SourceData: a semantic platform to make published data and figures discoverable

Friday, December 2, 2016, 11:00 am - 12:00 pm PDT
11th floor large conference room
This event is open to the public.
AI Seminar
Thomas Lemberger

Title:  SourceData: a semantic platform to make published data and figures discoverable

In scientific publications, data are visually depicted in figures or tables. However, the original data behind the figures – the ‘source data’ – are almost never available in a structured format that would make them findable and reusable. To address this issue, SourceData has built a suite of tools to capture the structure of published research data and to make published research papers discoverable based solely on their data content. SourceData converts the narrative descriptions provided in figure legends into standardized, machine-readable metadata. Each biological component in a figure is consistently identified via links to established public databases of biological terms. The experimental design is also captured in a structured format by classifying the role of each component. Biological entities are identified and classified manually, with computer assistance, in a web-based curation tool. A separate interface allows authors to verify the accuracy of the curated information. In a pilot project, the SourceData team has processed over 15,000 experiments from papers across 23 journals. The resulting web of connected data can be browsed through the SmartFigure application, which displays data in the context of related figures published in other papers and enables users to easily navigate between them. Users can also use the SourceData search engine to retrieve data directly, based on the design of an experiment. SourceData searches the structure of the data rather than relying on keyword indexing, thus avoiding the potentially subjective interpretation of results provided in the text.
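
As a rough illustration of the idea described in the abstract, the sketch below models a figure panel as a list of entities, each carrying an external identifier and a role, and then searches by experimental design rather than by keyword. It is not SourceData's actual schema or API: the class names, the role labels ("perturbed", "measured"), the `EXAMPLE:` identifiers and the `search_by_design` function are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Entity:
    name: str          # label as it appears in the figure legend
    external_id: str   # placeholder standing in for a public database link
    role: str          # hypothetical role label: "perturbed" or "measured"


@dataclass
class Panel:
    paper: str
    figure: str
    entities: List[Entity] = field(default_factory=list)


def search_by_design(panels, perturbed_id, measured_id):
    """Return panels in which one entity is perturbed and another is measured."""
    hits = []
    for panel in panels:
        roles = {(e.external_id, e.role) for e in panel.entities}
        if (perturbed_id, "perturbed") in roles and (measured_id, "measured") in roles:
            hits.append(panel)
    return hits


# Two made-up panels that mention the same entities with opposite roles.
panels = [
    Panel("paper-A", "Fig 1B", [
        Entity("gene X", "EXAMPLE:0001", "perturbed"),
        Entity("protein Y", "EXAMPLE:0002", "measured"),
    ]),
    Panel("paper-B", "Fig 3A", [
        Entity("protein Y", "EXAMPLE:0002", "perturbed"),
        Entity("gene X", "EXAMPLE:0001", "measured"),
    ]),
]

hits = search_by_design(panels, "EXAMPLE:0001", "EXAMPLE:0002")
print([(p.paper, p.figure) for p in hits])  # prints [('paper-A', 'Fig 1B')]
```

A keyword search for "gene X" and "protein Y" would return both panels; matching on identifier and role together returns only the panel with the requested design.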

Short bio:

Thomas Lemberger is Deputy Head of Scientific Publications at EMBO in Heidelberg, Germany, Chief Editor of the open access journal Molecular Systems Biology, and Project Leader of the SourceData project. Trained as a molecular biologist, Thomas earned his PhD at the University of Lausanne, Switzerland, where he studied hormonal regulation of gene expression by nuclear receptors. For his postdoctoral research, he moved to Heidelberg, Germany, where his research focused on the regulation of transcription in the brain. He joined EMBO as scientific editor in 2005 and has had editorial oversight of Molecular Systems Biology since the launch of the journal. He has recently initiated the SourceData project to build an open platform that makes scientific publications discoverable based on their data content.
