Biomedical Knowledge Engineering Group

BMKEG Presentations

Invited Lecture: INCF 2013 - Using experimental design to design neuroinformatics data structures  

Video Link:

The interdisciplinary nature of neuroscience research leads to an explosion of different informatics tools, data structures, platforms and terminologies. A central difficulty faced by developers is that knowledge representations for any neuroscience subdomain must serve the domain-specific needs of that sub-community. Related representations overlap, contradict each other, and compete as standards. The process of standardization is itself difficult to organize within the community and even harder to enforce in practice, raising complex issues of ease of use, computability and data availability, as well as scientific correctness and philosophical purity.
In this talk, I present a novel, relatively simple conceptual design that makes a clear distinction between interpretive and observational knowledge to build a general framework for scientific data. Our methodology (called 'Knowledge Engineering from Experimental Design' or KEfED) uses an experiment's protocol to define the dependencies between its independent and dependent variables. These dependencies support the construction of a data structure that can capture (a) data points, (b) mean values, (c) statistical significance relations and (d) correlations. We will describe the underlying formalism of the KEfED approach, the tools we provide to help researchers build their own models, our approach to unifying and standardizing the definition of variables, the application of KEfED to complex neuroscience knowledge, and possible future research directions for this technology.
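The core idea above can be illustrated with a minimal sketch: a protocol declares its independent and dependent variables, recorded data points are full assignments of those variables, and derived quantities such as mean values are computed over subsets selected by the independent-variable conditions. This is a hypothetical illustration of the KEfED idea, not the actual KEfED tooling; the class and method names (`KefedModel`, `record`, `mean_of`) are invented for this example.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Any

@dataclass
class KefedModel:
    """Minimal KEfED-style experiment model (hypothetical API, for illustration).

    The protocol's independent variables (e.g. dose) index the dependent
    variables (e.g. response) that were measured under them.
    """
    independent: list[str]                       # variables the protocol manipulates
    dependent: list[str]                         # variables the protocol measures
    rows: list[dict[str, Any]] = field(default_factory=list)

    def record(self, **values):
        """Capture one data point: a full assignment of all declared variables."""
        missing = set(self.independent + self.dependent) - set(values)
        if missing:
            raise ValueError(f"missing variables: {missing}")
        self.rows.append(values)

    def mean_of(self, dep: str, **conditions):
        """Mean of a dependent variable under fixed independent-variable conditions."""
        selected = [r[dep] for r in self.rows
                    if all(r[k] == v for k, v in conditions.items())]
        return mean(selected)

# Usage: a one-factor protocol with one measured quantity.
m = KefedModel(independent=["dose"], dependent=["response"])
m.record(dose=1.0, response=0.8)
m.record(dose=1.0, response=1.2)
m.record(dose=2.0, response=2.0)
print(m.mean_of("response", dose=1.0))  # 1.0
```

In this toy form, significance relations and correlations would be further derived views over the same dependency structure, which is the point of deriving the data structure from the protocol rather than hand-designing it per study.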

Workshop: SciKnowMine 2013 - Bridging BioNLP and Biocuration 

Local page:

Biological Natural Language Processing ('BioNLP') holds great promise to support and accelerate biocuration (organizing published biomedical knowledge into online resources such as databases) but has not yet generated viable open technology for use within the community. This is an area of active research and is the subject of shared evaluations such as 'BioCreative 4'. As the closing meeting of an NSF-funded infrastructure project (called 'SciKnowMine', #0849977), we held a workshop to (A) present an implementation of a system for document triage that we are currently deploying to the Mouse Genome Informatics (MGI) system, and (B) present and develop a strategic plan for open-source, community-driven tools that bridge between curators committed to improving the quality of their informatics resources and computer science specialists developing novel NLP technology. The meeting was well attended by experts from both communities. In keeping with this blog's vision of examining how scientific breakthroughs develop by explicitly describing the paradigms that different disciplines inhabit, the workshop was designed entirely around the theme of finding connecting points between these two interdependent paradigms.

"Introducing paradigms as a viable structural guide for biomedical knowledge engineering"

Video link:

Following Thomas Kuhn's seminal 1962 book, in which he introduced the notion of scientific paradigms, we here describe a computational methodology that gives that concept a concrete formulation. I present this approach partly as a methodology for framing and scoping the knowledge representation and analysis work needed to build tools that serve a specific community. However, the approach also has technical implications for semantic web representations, the use of workflows and reasoning, and the way we derive content from existing scientific artefacts. We will explore this viewpoint in the context of a well-defined domain problem (biomarker studies of neurodegenerative diseases), with the strategic intent of developing a practical, scoped view of biomarker data that could serve as the basis of corollary work within AI computer science groups.

"Organizing the world’s scientific knowledge to make it universally accessible and powerful: Building the Breakthrough Machine"

Video Link:

Not all information is created equal. Accurate, innovative scientific knowledge generally has an enormous impact on humanity. It is the source of our ability to make predictions about our environment. It is the source of new technology (with all its attendant consequences, both positive and negative). It is also a continuous source of wonder and fascination. In general, however, the value and power of scientific knowledge is not reflected in the scale and structure of the information infrastructure used to house, store and share it. Many scientists use spreadsheets as their most sophisticated data management tool and publish their data only as PDF files in the literature. In this high-level talk, we describe a powerful new knowledge engineering framework for describing scientific observations within a broader strategic model of the scientific process. We describe general open-source tools for scientists to model and manage their data in an attempt to accelerate discovery. Using examples focussed on a high-value challenge problem, finding a cure for Parkinson's Disease, we present a high-level strategic approach that is both in keeping with Google's vision and values and could also provide a viable new research direction that would benefit from Google's massively scalable technology. Ultimately, we present an informatics research initiative for the 21st century: 'Building a Breakthrough Machine'.

"Structured nanopublications pertaining to the drugome"

Video Link:

No abstract available. This was a presentation given at the 'Beyond the PDF' meeting (in January 2010) about applying KEfED models to data processing workflows concerning the relationships between drugs and target sites in Tuberculosis. This idea was developed much further by Yolanda Gil's group, as described here: