Gully A.P.C. Burns

Research Interests

'Discovery Informatics'

My research interests are effectively encapsulated within the emerging domain of 'Discovery Informatics', which attempts to bring together many facets of AI research in the service of scientific research. Within this work, we focus on constructing tools and approaches that support the cycle of scientific investigation: accelerating it, tracking its provenance, populating it with data, and automating the reasoning processes within it. This involves collaboration with, and the application of technology from, several subfields.

Knowledge Representation and Reasoning, Ontology Engineering and Semantic Web 

Our primary contribution in this field is 'Knowledge Engineering from Experimental Design (KEfED)' (currently defined within the scope of the BioScholar project on this site).  

We have recently published a paper describing this approach to representing scientific observations in terms of the underlying variables used to capture and measure them (Russ et al. 2011). This work uses a mature knowledge representation and reasoning system (called PowerLoom) that provides a powerful, expressive formalism based on Common Logic (in contrast to OWL-based reasoning systems, which are grounded in Description Logics). We are also interested in leveraging object-oriented modeling techniques based on UML within a framework called the View-Primitive-Data-Model framework (VPDMf). We are working towards exporting these systems and methodologies to semantic web standards such as RDF and OWL within frameworks being developed under the rubric of 'nanopublications' or 'microattributions'.
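To give a flavor of the core idea, the following is a minimal, purely illustrative sketch (in Python) of modeling an observation in terms of the experimental variables used to produce and measure it. The class names, roles, and the toy tract-tracing example are all hypothetical; this is not the actual KEfED schema or the PowerLoom/VPDMf implementation.

```python
# Illustrative sketch of the KEfED idea: an observation is represented
# in terms of the experimental variables used to capture and measure it.
# All names and structure here are hypothetical, not the real KEfED schema.
from dataclasses import dataclass, field

@dataclass
class Variable:
    name: str   # e.g. "tracer-injection-site"
    role: str   # "parameter" (set by the experimenter) or "measurement"

@dataclass
class Observation:
    protocol: str                # experimental protocol the observation came from
    values: dict = field(default_factory=dict)  # variable name -> recorded value

# A toy tract-tracing observation: parameter variables fix the experimental
# context, and the measurement variable records what was actually observed.
variables = [
    Variable("tracer-injection-site", "parameter"),
    Variable("labeling-density", "measurement"),
]
obs = Observation(
    protocol="tract-tracing-experiment",
    values={
        "tracer-injection-site": "hippocampus CA1",
        "labeling-density": "dense",
    },
)

# Each measured value remains linked to the variables that define its meaning.
measured = [v.name for v in variables if v.role == "measurement"]
print(measured)                 # ['labeling-density']
print(obs.values[measured[0]])  # dense
```

The point of the sketch is that the measured value ("dense") is never free-floating: it is interpretable only relative to the protocol and the parameter settings that produced it, which is what makes downstream reasoning over such observations possible.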

Accelerating Biocuration, Natural Language Processing, Text Mining and Machine Reading

Populating these models requires biocuration from the literature: a challenging, time-consuming task that is expensive, error-prone, and largely a by-product of the antiquated models of knowledge publication that underlie the scientific literature. In an attempt to transform this process, we are developing infrastructure frameworks for text mining and information extraction designed to (a) accelerate biocuration and so speed up the population of scientific databases, (b) provide a platform through which computer scientists working in these fields can easily make their tools accessible to the biomedical scientists who need them, and (c) transform the scientific publishing process.

Information Integration, Domain Modeling 

Biomedical data are distributed across many systems and locations, and making use of the data contained in these systems is inherently an information integration problem. In collaboration with researchers from ISI's Information Integration group (principally Jose Luis Ambite, Pedro Szekely and Craig Knoblock), we are interested in incorporating online data into our high-level reasoning model based on the cycle of scientific investigation (see Fig 1).

User Interfaces, Data Visualization, Software Engineering, Tool Development and Deployment

This work is firmly embedded within the process of developing software that is deliverable and usable by biomedical scientists, biocurators and knowledge engineers. Although we are continuously striving to improve in this area, we have a track record of delivering practical systems and tools that people use. NeuroScholar, an early project developed from 2001 to 2006, was downloaded over 1800 times during its lifecycle. Our software development work is evaluated continuously against extrinsic performance criteria (e.g., subjective ease-of-use, understandability, effectiveness) that require evaluation at all stages of the development cycle (requirements specification – storyboard design – initial prototype – stable prototype – operational system). This is possibly the most challenging part of our work, since the primary metric of success for these projects is how well scientists use our systems to make new scientific discoveries.