The Role of Information Extraction in the Design of a Document Triage Application for Biocuration

Title: The Role of Information Extraction in the Design of a Document Triage Application for Biocuration
Publication Type: Conference Paper
Year of Publication: 2011
Authors: S. Pokkunuri, C. Ramakrishnan, E. Riloff, E. H. Hovy, and G. A. P. C. Burns
Conference Name: Workshop on Biomedical Natural Language Processing (BioNLP 2011) at ACL/HLT 2011
Conference Location: Portland, Oregon
Abstract

Traditionally, automated triage of papers is performed using lexical (unigram, bigram, and sometimes trigram) features. This paper explores the use of information extraction (IE) techniques to create richer linguistic features than traditional bag-of-words models. Our classifier includes lexico-syntactic patterns and more-complex features that represent a pattern coupled with its extracted noun, represented both as a lexical term and as a semantic category. Our experimental results show that the IE-based features can improve performance over unigram and bigram features alone. We present intrinsic evaluation results of full-text document classification experiments to determine automatically whether a paper should be considered of interest to biologists at the Mouse Genome Informatics (MGI) system at the Jackson Laboratories. We also further discuss issues relating to design and deployment of our classifiers as an application to support scientific knowledge curation at MGI.

URL: http://www.isi.edu/sites/default/files/users/burns/bmkeg/bionlp-2011-Final.pdf