Research
SciKnowMine: A community-driven framework for text mining tools in direct service to biocuration
The challenge of delivering effective computational support for curation of large-scale biomedical databases is still unsolved. The SciKnowMine project is aimed at providing knowledge engineering enhancements of existing biocuration systems via large-scale text processing pipelines bringing together multiple NLP tools developed using the UIMA framework.
[Figure: High-level architecture of SciKnowMine]
Details about the scope of this project are available in this paper, to appear in the Workshop "New Challenges for NLP Frameworks" collocated with the Seventh International Conference on Language Resources and Evaluation (LREC) 2010.
Related Publications
- Pokkunuri, S., Ramakrishnan, C., Riloff, E., Hovy, E., Burns, G. (2011) "The Role of Information Extraction in the Design of a Document Triage Application for Biocuration", ACL/HLT 2011 Workshop on Biomedical Natural Language Processing (BioNLP 2011). [pdf]
- "Building the Scientific Knowledge Mine (SciKnowMine): a community-driven framework for text mining tools in direct service to biocuration", Ramakrishnan et al., In Workshop "New Challenges for NLP Frameworks" collocated with the Seventh International Conference on Language Resources and Evaluation (LREC) 2010. [pdf]
Related Software
- LA-PDFText - Accurate Layout-Aware Text Extraction from Full-text PDF of Scientific Articles [Google Code]
Biomedical Knowledge Engineering - BioScholar
BioScholar is a Knowledge Engineering and Management system to support a single scientific worker (at the level of a graduate student or postdoctoral worker) to design, construct and manage a shared knowledge repository for a research group by curating and processing knowledge from the scientific literature.
Related Publications
- "Knowledge Engineering Tools for Reasoning with Scientific Observations and Interpretations; a Neural Connectivity Use Case", Thomas A. Russ, Cartic Ramakrishnan, Eduard H. Hovy, Mihail Bota, Gully A.P.C. Burns. BMC Bioinformatics 2011, 12:351. [pdf]
Related Software
- A Knowledge Engineering and Management system for biomedical scientists [Google Code]
Dissertation Research
My dissertation research was motivated by the need to effectively utilize textual knowledge in building intelligent systems. Over the years, several compelling examples have shown the need for automating the identification of emergent knowledge from text. This capability has also been listed as a milestone in Microsoft's 2020 vision. The need for such a capability was pointed out by Dr. Don Swanson when he discovered potential therapeutic uses of magnesium in alleviating some migraines. This was done by manually linking article titles, and the discovered associations were subsequently validated by wet-lab experiments. The utility of traversing named associations between objects was pointed out as far back as 1945 by Dr. Vannevar Bush in his MEMEX vision. All of these have served as motivation for my research.
Research threads
I have developed algorithms for entity and relationship extraction from text. Multi-relational knowledge extracted by these algorithms is represented using RDF. I have also developed an algorithm for informative subgraph discovery over the resulting RDF graph. For my dissertation I have focussed on the following research problems as they apply to the biomedical domain:
- Unsupervised Identification of compound entities in biomedical text - Named entities occurring in text can be structurally and semantically complex. Simple entities can be nested to form complex ones. Supervised approaches to the identification and typing of these nested entities have been explored. These, however, require training data. My research has sought to circumvent the need for training data for this problem. Details of my approach to this problem can be found in the following paper: EKAW 2008.
- Unsupervised Extraction of relationships from biomedical text - Dictionaries of terms in the biomedical domain contain entities organized in hierarchies (e.g. MeSH). My research on relationship extraction has sought to extract named relationships between these entities. I have investigated two approaches to this problem. Details are available in the following papers: WI 2008 and ISWC 2006.
- Subgraph discovery over Multi-relational Networks - Discovering patterns in graphs has long been an area of interest. In most approaches to such pattern discovery, either quantitative anomalies or the frequency of substructures is used to measure the interestingness of a pattern. My approach to this problem sought to define the "interestingness" of subgraphs based on domain semantics. In this work I adapted a fast connection subgraph algorithm to answer the following question, given an RDF graph: "What are the most relevant ways in which entity X is related to entity Y?" The response to this question is a subgraph connecting X to Y that contains the most relevant paths between X and Y. Details are available in the following paper: SIGKDD Explorations 2005.
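The question "What are the most relevant ways in which entity X is related to entity Y?" can be illustrated with a minimal sketch. This is not the adapted connection subgraph algorithm from the SIGKDD Explorations paper; it swaps in a much simpler stand-in that enumerates short simple paths over subject-predicate-object triples and ranks them by the product of per-predicate weights. The triples, the weights, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not data from the cited work.
TRIPLES = [
    ("magnesium", "inhibits", "platelet_aggregation"),
    ("platelet_aggregation", "associated_with", "migraine"),
    ("magnesium", "regulates", "vascular_tone"),
    ("vascular_tone", "associated_with", "migraine"),
    ("magnesium", "mentioned_with", "diet"),
    ("diet", "mentioned_with", "migraine"),
]

# Hypothetical per-predicate weights, a crude stand-in for "domain semantics".
WEIGHTS = {
    "inhibits": 0.9,
    "regulates": 0.8,
    "associated_with": 0.7,
    "mentioned_with": 0.2,
}


def build_adjacency(triples):
    """Index triples by node, treating each edge as traversable both ways."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
        adj[o].append((p, s))
    return adj


def simple_paths(adj, source, target, max_edges):
    """Yield cycle-free paths from source to target as lists of (node, predicate, next_node) hops."""
    stack = [(source, [], {source})]
    while stack:
        node, path, seen = stack.pop()
        if node == target and path:
            yield path
            continue
        if len(path) == max_edges:
            continue
        for pred, nxt in adj[node]:
            if nxt not in seen:
                stack.append((nxt, path + [(node, pred, nxt)], seen | {nxt}))


def path_score(path, weights):
    """Score a path as the product of its predicate weights (unknown predicates get 0.1)."""
    score = 1.0
    for _, pred, _ in path:
        score *= weights.get(pred, 0.1)
    return score


def top_k_subgraph(triples, source, target, k=2, max_edges=3, weights=WEIGHTS):
    """Return the k best-scoring paths and the union of their edges."""
    adj = build_adjacency(triples)
    ranked = sorted(
        simple_paths(adj, source, target, max_edges),
        key=lambda p: path_score(p, weights),
        reverse=True,
    )
    edges = set()
    for path in ranked[:k]:
        edges.update(path)
    return ranked[:k], edges


if __name__ == "__main__":
    paths, edges = top_k_subgraph(TRIPLES, "magnesium", "migraine")
    for path in paths:
        print(" -> ".join([path[0][0]] + [f"[{p}] {o}" for _, p, o in path]))
```

In this toy setting the weakly weighted `mentioned_with` path is left out of the returned subgraph, which mirrors the idea of ranking connections by semantic relevance rather than by frequency alone. Exhaustive path enumeration does not scale to large RDF graphs, so it should be read only as an illustration of the question being asked.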
