Next: The Walkthrough Example Up: The SRI MUC-5 JV-FASTUS Previous: Results of the Evaluation

System Architecture

The basic architecture of the English JV-FASTUS system is illustrated in Figure 1. The text is input to the cascade of transducers as a stream of ASCII characters. This amounts to a decision to treat all text as unformatted, which is not unreasonable for the English joint venture texts: they contain very little relevant formatted data such as tables, and when tables do occur, their format is idiosyncratic. The first transducer is the TOKENIZER, which produces symbolic and numeric tokens as output. These tokens are given to the PREPROCESSOR, which recognizes multiword lexical items and some company and personal names, and produces lexical items as output. The PHRASE PARSER then breaks the input stream into Noun Groups (the part of the noun phrase consisting of determiner, prenominal modifiers, and head noun), Verb Groups (auxiliaries and intervening adverbs, together with the main verb), and particles (single lexical items including conjunctions, prepositions, subordinating conjunctions, and relative pronouns). The PHRASE PARSER also identifies the head of each constituent, which, with some minor exceptions, is the only component of the constituent that influences subsequent processing. The PHRASE COMBINER takes the phrases output by the PHRASE PARSER and combines them into larger phrases of the same type. For example, adjacent noun groups may be merged into appositives, certain prepositional phrases are attached to their noun groups, and conjunctions of both verb groups and noun groups are combined. The combined phrases are input to the DOMAIN PATTERN RECOGNIZER, which nondeterministically matches the sentence against patterns that are relevant to the information to be extracted. The by-product of the match is a set of partially instantiated raw templates, which are merged by the MERGER. Finally, a POST PROCESSOR puts the raw templates into final form for printing.
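The cascade described above can be pictured as a chain of functions, each consuming the output of the one before it. The following is only a minimal sketch of the first three stages under strong simplifying assumptions: the word lists, the function names, and the (type, words, head) phrase representation are hypothetical illustrations, not the lexicon, grammar, or data structures of the actual system, which the text does not specify.

```python
import re
from functools import reduce

# Hypothetical toy lexicon, for illustration only.
MULTIWORDS = {("joint", "venture"): "joint_venture"}
AUXILIARIES = {"has", "have", "will", "is", "was"}
VERBS = {"formed", "form", "announced"}
PARTICLES = {"with", "and", "in", "of", "that", "which"}
BOUNDARY = PARTICLES | AUXILIARIES | VERBS


def tokenizer(text):
    """Split a character stream into symbolic (word) and numeric tokens."""
    return re.findall(r"[A-Za-z]+|\d+", text)


def preprocessor(tokens):
    """Merge known two-word sequences into single lexical items."""
    items, i = [], 0
    while i < len(tokens):
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if pair in MULTIWORDS:
            items.append(MULTIWORDS[pair])
            i += 2
        else:
            items.append(tokens[i])
            i += 1
    return items


def phrase_parser(items):
    """Group lexical items into (type, words, head) triples.

    NG = noun group, VG = verb group, P = particle. The head is taken
    to be the last word of the group, a simplification for this sketch.
    """
    phrases, i = [], 0
    while i < len(items):
        word = items[i].lower()
        if word in PARTICLES:
            phrases.append(("P", [items[i]], items[i]))
            i += 1
            continue
        j = i
        if word in AUXILIARIES or word in VERBS:
            # Verb group: any auxiliaries, then at most one main verb.
            while j < len(items) and items[j].lower() in AUXILIARIES:
                j += 1
            if j < len(items) and items[j].lower() in VERBS:
                j += 1
            kind = "VG"
        else:
            # Noun group: everything up to the next particle or verb word.
            while j < len(items) and items[j].lower() not in BOUNDARY:
                j += 1
            kind = "NG"
        phrases.append((kind, items[i:j], items[j - 1]))
        i = j
    return phrases


def run_cascade(text, stages=(tokenizer, preprocessor, phrase_parser)):
    """Feed the output of each stage into the next, in order."""
    return reduce(lambda data, stage: stage(data), stages, text)


if __name__ == "__main__":
    sentence = "Bridgestone has formed a joint venture with Union Carbide"
    for phrase in run_cascade(sentence):
        print(phrase)
```

Because each stage only sees the output of its predecessor, later stages such as a phrase combiner or pattern recognizer could be appended to `stages` without changing the earlier ones; that composability is the point of the sketch, not the toy rules themselves.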



Jerry Hobbs 2004-02-24