SciKnowMine Release Workshop - Bridging BioNLP and Biocuration

Biological Natural Language Processing ('BioNLP') holds great promise to support and accelerate biocuration (organizing published biomedical knowledge into online resources such as databases) but has not yet generated viable open technology for use within the community. This is an area of active research and is the subject of shared evaluations such as 'BioCreative 4'. As the closing meeting of an NSF-funded infrastructure project (called 'SciKnowMine', #0849977), we intend to (A) present an implementation of a system for document triage that we are currently deploying to the Mouse Genome Informatics (MGI) system, and (B) present and develop a strategic plan for open-source, community-driven tools that bridge the gap between curators committed to improving the quality of their informatics resources and computer science specialists developing novel NLP technology.

We illustrate this concept in Figure 1 below and seek to define and provide viable points of interaction between the two communities that can facilitate and support biocuration in active database systems. 

Figure 1: Schema for the development of bridge technology between biocurators and NLP researchers. 

The workshop will run from USC's Information Sciences Institute, and we will present talks from key stakeholders in this activity: (A) the developers of the SciKnowMine system; (B) NLP computer scientists developing novel applications based on text from articles curated by MGI; and (C) representatives of a number of biomedical databases, who will describe their requirements, along with other stakeholders (such as publishers and the organizers of the BioCreative evaluation). A half-day of presentations will be followed by panel discussions to plan and develop open-source strategies for developing biocuration systems based on the SciKnowMine platform.

Date: Monday, August 19th, 2013
Location: 'Bayview room', Marina del Rey Marriott, 4100 Admiralty Way, Marina del Rey, CA 90292.
Contact and Organizer: Gully Burns ([email protected])

Complete Video Coverage of the Workshop

The program mainly consisted of 25-minute presentations, each followed by 5 minutes for questions. We finished the day with an informative group discussion focussed on setting up collaborative projects and planning next steps.

Full PDF download of the program + abstracts

All videos are shared on USC/ISI's YouTube Channel

Opening Remarks (Gully Burns)


PART I - Review of the SciKnowMine Project

The SciKnowMine Project - Bridging BioNLP and Biocuration (Gully Burns)

Mouse Genome Informatics (MGI): the challenge to develop NLP strategies and tools to support publication-based curation (Janan Eppig) 

The Colorado Richly Annotated Full Text corpus: A corpus linguist’s perspective (Kevin B Cohen)



PART II - Biomedical Databases: At the cutting edge of knowledge

Reactome – Linking pathways, networks and disease (Robin Haw)

OntoPub: An Ontology-Driven, Concept-Based Biomedical Literature Search Engine (Dongui Lui) 

Wiki-based community annotation (Jim Hu) 



PART III - BioNLP: Computing over the complexity.

An NLP Voyage: Explorations with Information Extraction for Biocuration (Ellen Riloff) 

Mining Fulltext (Maximilian Haeussler) 

Clarifying ClearTK: A Case Study in Entity Tagging (James Gung) 



PART IV - Technological approaches applied to publishing space.

Mendeley’s vision of biocuration (William Gunn) 

The BioCreative Evaluation initiative (Cecilia Arighi) 

PART V - Forging the future: Group discussion. 

Meeting Attendees

Attendees consisted of NLP specialists, infrastructure creators, biocurators and students, including students and scientists from ISI and USC.

Thanks to everyone who participated. This was an awesome day.