Next: FRAGMENT COMBINATION Up: The Generic Information Extraction Previous: PREPARSER

PARSER

This module takes a sequence of lexical items, and perhaps phrases, and normally tries to produce a parse tree for the entire sentence. Systems that do full-sentence parsing usually represent their rules either as a phrase structure grammar augmented with constraints on the application of the rules (Augmented Transition Networks, or ATNs), or as unification grammars in which the constraints are represented declaratively. The most commonly used parsing algorithm is chart parsing. Sentences are parsed bottom-up, with top-down constraints being applied. As fragmentary parsing becomes more prevalent, the top-down constraints cannot be used as much, since a fragment need not fit into a predicted full-sentence structure. Similar structures that span the same string of words are merged in order to bring the processing time down from exponential to polynomial.
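The merging of structures that span the same string of words can be illustrated with a small sketch. The code below is a minimal CKY-style bottom-up chart recognizer, not the parser of any particular system: it applies no top-down constraints, and the toy grammar, the lexicon, and the names LEXICON, BINARY_RULES, and chart_parse are all assumptions made for illustration. Each span of words maps to a set of categories, so analyses of the same span that yield the same category are stored only once.

```python
from collections import defaultdict

# Hypothetical toy lexicon and binary grammar (assumptions, not from the text).
LEXICON = {
    "the": {"Det"},
    "company": {"N"},
    "plant": {"N"},
    "acquired": {"V"},
}
BINARY_RULES = {
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
    ("NP", "VP"): {"S"},
}


def chart_parse(words):
    """Return a chart mapping (start, end) spans to the categories found there."""
    n = len(words)
    chart = defaultdict(set)
    # Seed the chart with the lexical categories of each word.
    for i, word in enumerate(words):
        chart[(i, i + 1)] |= LEXICON.get(word, set())
    # Build wider spans bottom-up from pairs of adjacent narrower spans.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for left in chart[(start, mid)]:
                    for right in chart[(mid, end)]:
                        # Set union merges duplicate categories on the same span.
                        chart[(start, end)] |= BINARY_RULES.get((left, right), set())
    return chart


full = chart_parse("the company acquired the plant".split())
fragment = chart_parse("acquired the plant".split())
```

Under these assumptions, the full sentence receives an S over the whole span, while the second input yields only a VP fragment. A fragmentary parser could still report that fragment, because the bottom-up chart records every constituent it finds whether or not a full-sentence parse exists.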

Recently, more and more systems have been abandoning full-sentence parsing in information extraction applications. Some of these systems recognize only fragments because, although they use the standard methods for full-sentence parsing, their grammars have very limited coverage. In other systems the parser applies domain-dependent, finite-state pattern-matching techniques rather than more complex processing, trying only to locate within the sentence the patterns that are of interest in the application.
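As a rough illustration of the pattern-matching approach, the sketch below uses a regular expression to locate one domain-dependent pattern and ignores the rest of the sentence. The acquisition domain, the crude definition of a name as a run of capitalized words, and the identifiers NAME, ACQUISITION, and find_acquisitions are all hypothetical choices for this example, not taken from any system described here.

```python
import re

# Hypothetical domain pattern (an assumption for illustration):
# a capitalized name, an acquisition verb, and another capitalized name.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
ACQUISITION = re.compile(
    rf"(?P<buyer>{NAME}) (?:acquired|bought|purchased) (?P<target>{NAME})"
)


def find_acquisitions(sentence):
    """Return (buyer, target) pairs for every match of the pattern."""
    return [
        (match.group("buyer"), match.group("target"))
        for match in ACQUISITION.finditer(sentence)
    ]


pairs = find_acquisitions(
    "Analysts said that Acme Corp acquired Widget Works last year."
)
```

No parse tree is built here: the words outside the pattern ("Analysts said that", "last year") are simply skipped, which is the sense in which such a parser tries only to locate the patterns of interest.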

Grammars for the parsing module are either developed manually over a long period of time or borrowed from another site. There has also been some work on statistically inferring grammar rules for some areas of the grammar.


Jerry Hobbs 2004-02-24