Next: The PHRASE COMBINER Up: System Architecture Previous: The PREPROCESSOR

The PHRASE PARSER

The next phase accepts the lexical items combined by the preprocessor as input and produces a sequence of phrases as output. The head of each phrase is identified, and if the head of the phrase corresponds to an object in the domain for which a template object is defined, then an object of the appropriate type is associated with the phrase. For example, if the noun group is "the Japanese company," the noun group is associated with an ENTITY object whose NATIONALITY slot is Japan.
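The association between a phrase head and a template object can be illustrated with a minimal sketch. The lexicon tables, the Entity class, and the rule that the rightmost word is the head are all assumptions made for illustration; the text does not show how the system stores this information.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lexicon entries, invented for this sketch.
HEAD_TO_TYPE = {"company": "ENTITY", "concern": "ENTITY", "venture": "ENTITY"}
NATIONALITY_ADJECTIVES = {"japanese": "Japan", "taiwanese": "Taiwan"}


@dataclass
class Entity:
    type: str
    nationality: Optional[str] = None


def associate_object(noun_group: str) -> Optional[Entity]:
    """Return a template object for the noun group, or None if its head
    does not correspond to a domain object."""
    words = noun_group.lower().split()
    if not words:
        return None
    head = words[-1]  # simplifying assumption: the rightmost word is the head
    obj_type = HEAD_TO_TYPE.get(head)
    if obj_type is None:
        return None
    entity = Entity(type=obj_type)
    # Fill the NATIONALITY slot from any nationality adjective before the head.
    for modifier in words[:-1]:
        if modifier in NATIONALITY_ADJECTIVES:
            entity.nationality = NATIONALITY_ADJECTIVES[modifier]
    return entity
```

With these tables, associate_object("the Japanese company") yields an ENTITY whose nationality is Japan, while a noun group such as "golf clubs" yields no object because its head is not in the hypothetical table.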

The phrase parser constructs only those phrases that can be reliably described by a regular language. Attachment ambiguities are preserved for later phases, where the phrases involved are either ignored as irrelevant or combined on the basis of domain-specific patterns when the combination can be done reliably.
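To make the idea of a regular-language phrase concrete, the following sketch recognizes simple noun groups with a regular expression over part-of-speech codes. The tag set and the pattern (optional determiner, any adjectives, one or more nouns) are assumptions chosen for illustration and are far simpler than the grammar the system uses.

```python
import re

# Hypothetical one-letter codes for part-of-speech tags.
TAG_CODES = {"DET": "D", "ADJ": "A", "NOUN": "N", "PREP": "P", "VERB": "V"}

# Assumed noun-group pattern: optional determiner, adjectives, then nouns.
NOUN_GROUP = re.compile(r"D?A*N+")


def find_noun_groups(tagged):
    """tagged is a list of (word, tag) pairs. Returns a list of
    (text, start, end, head) tuples, one per noun group found."""
    codes = "".join(TAG_CODES[tag] for _, tag in tagged)
    groups = []
    for match in NOUN_GROUP.finditer(codes):
        words = [word for word, _ in tagged[match.start():match.end()]]
        # The last noun of the group is taken as its head.
        groups.append((" ".join(words), match.start(), match.end(), words[-1]))
    return groups
```

For the tagged sequence "a local concern in Taiwan" the sketch returns two separate noun groups and makes no attempt to attach the prepositional phrase, which mirrors the decision to leave attachment to later phases.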

The basic grammar of English used in this phase is a superset of that used in the MUC-4 FASTUS system. The main differences involve more detailed processing of numbers consisting of mixed numeric and symbolic parts (e.g. 3 million), currency phrases (e.g. DM 2500), and the recognition of bank names. Possible companies are treated as proper nouns and can be combined to form noun groups just like other proper nouns referring to locations, companies, or people.
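The two kinds of number phrase mentioned above can be sketched with regular expressions. The multiplier table, the single currency code, and both function names are assumptions for illustration; the patterns the system actually uses are not given in the text.

```python
import re

# Hypothetical multiplier words; only "million" appears in the text.
MULTIPLIERS = {"thousand": 10**3, "million": 10**6, "billion": 10**9}

MIXED_NUMBER = re.compile(
    r"(\d+(?:\.\d+)?)\s+(thousand|million|billion)", re.IGNORECASE
)
# Only the DM code from the text's example is covered here.
CURRENCY = re.compile(r"(DM)\s?(\d[\d,]*(?:\.\d+)?)")


def parse_mixed_number(text):
    """Convert a phrase such as '3 million' to a number, or return None."""
    match = MIXED_NUMBER.fullmatch(text.strip())
    if match is None:
        return None
    return float(match.group(1)) * MULTIPLIERS[match.group(2).lower()]


def parse_currency(text):
    """Split a phrase such as 'DM 2500' into (code, amount), or return None."""
    match = CURRENCY.fullmatch(text.strip())
    if match is None:
        return None
    return match.group(1), float(match.group(2).replace(",", ""))
```

A fuller treatment would need many more currency codes and spelled-out numbers; the point here is only that both phrase types fit comfortably within a regular pattern.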

Lexical ambiguity can lead to multiple analyses at the end of the parsing phase. In general, longer phrases are preferred to shorter ones. In mixed-case texts, nominals with proper noun heads are preferred to other analyses if they are capitalized. In upper-case-only texts, company names are preferred to other analyses because of the central role that companies play in the joint venture domain. However, in upper-case-only texts, common nouns and verbs are preferred to location names when ambiguity arises, because of the relatively large number of locations in the gazetteer that overlap with ordinary English words.
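These preferences can be sketched as a ranking over competing analyses. The text does not say how the preferences interact, so the sketch assumes that length is compared first and that the category preference only breaks ties; the Analysis class and the category names are likewise hypothetical.

```python
from dataclasses import dataclass

# Hypothetical categories that count as having a proper noun head.
PROPER_CATEGORIES = {"COMPANY", "LOCATION", "PERSON"}


@dataclass(frozen=True)
class Analysis:
    category: str          # e.g. "COMPANY", "LOCATION", "COMMON", "VERB"
    start: int
    end: int
    capitalized: bool = False


def category_rank(analysis, upper_case_only):
    """Higher rank means more preferred among analyses of equal length."""
    if upper_case_only:
        if analysis.category == "COMPANY":
            return 2   # companies are central to the joint venture domain
        if analysis.category == "LOCATION":
            return 0   # gazetteer entries often overlap ordinary words
        return 1       # common nouns, verbs, and everything else
    # Mixed-case text: capitalized proper-noun analyses are preferred.
    if analysis.category in PROPER_CATEGORIES and analysis.capitalized:
        return 1
    return 0


def choose(analyses, upper_case_only):
    """Pick one analysis: longest first (assumed), then category rank."""
    return max(
        analyses,
        key=lambda a: (a.end - a.start, category_rank(a, upper_case_only)),
    )
```

Under these assumptions, an upper-case-only word that is both a gazetteer location and a common noun is analyzed as the common noun, while a word that could also be a company name is analyzed as the company.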

As an example, here is how the PHRASE PARSER analyzes the first sentence of the walkthrough example:

CN: "BRIDGESTONE " (0,1) Head: BRIDGESTONE
NG: "SPORTS " (1,2) Head: SPORTS
ACTIVE/PASSIVE: "SAID " (3,4) Head: SAID
NG: "FRIDAY " (4,5) Head: FRIDAY
NG: "IT " (5,6) Head: IT
ACTIVE: "HAS SET " (6,8) Head: SET
PREP: "UP " (8,9) Head: UP
NG: "JOINT-VENTURE " (9,12) Head: JOINT-VENTURE
PREP: "IN " (12,13) Head: IN
LOC: "TAIWAN " (13,14) Head: TAIWAN
PREP: "WITH " (14,15) Head: WITH
NG: "LOCAL CONCERN " (15,18) Head: CONCERN
CONJ: "AND " (18,19) Head: AND
NG: "JAPANESE TRADING HOUSE " (19,23) Head: HOUSE
INF: "TO PRODUCE " (23,25) Head: PRODUCE
NG: "GOLF CLUBS " (25,27) Head: CLUBS
INF: "TO BE " (27,29) Head: BE
ACTIVE/PASSIVE: "SHIPPED " (29,30) Head: SHIPPED
PREP: "TO " (30,31) Head: TO
LOC: "JAPAN " (31,32) Head: JAPAN
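Each line of the trace above gives a phrase category, the matched text, a span, and the head. As a convenience for reading such traces, the following sketch turns one line into a record. The Phrase type and the function name are hypothetical, and the line format is inferred only from the listing shown here.

```python
import re
from typing import NamedTuple, Optional


class Phrase(NamedTuple):
    category: str
    text: str
    start: int
    end: int
    head: str


# Format inferred from the listing: CATEGORY: "TEXT " (start,end) Head: HEAD
TRACE_LINE = re.compile(
    r'^(?P<category>[A-Z/]+): "(?P<text>[^"]*)" '
    r'\((?P<start>\d+),(?P<end>\d+)\) Head: (?P<head>\S+)$'
)


def parse_trace_line(line: str) -> Optional[Phrase]:
    """Parse one trace line into a Phrase, or return None if it does not match."""
    match = TRACE_LINE.match(line.strip())
    if match is None:
        return None
    return Phrase(
        category=match["category"],
        text=match["text"].strip(),  # drop the trailing space inside the quotes
        start=int(match["start"]),
        end=int(match["end"]),
        head=match["head"],
    )
```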

After phrase parsing, the system has entity objects representing a company named "BRIDGESTONE," a "JOINT-VENTURE," and a "LOCAL CONCERN." The local concern is assigned the location "TAIWAN" because that was the most recently mentioned location. The system did not recognize that a noun group with the head "HOUSE" could refer to a company, so no entity is created for "JAPANESE TRADING HOUSE."
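The behavior described here can be approximated by a single left-to-right pass that remembers the last location seen. This is only a sketch under stated assumptions: the set of entity-denoting heads is hypothetical, and the text says nothing beyond "most recently mentioned location" about how the location is chosen.

```python
# Hypothetical set of noun-group heads that denote entities.
# "HOUSE" is deliberately absent, matching the behavior described in the text.
ENTITY_HEADS = {"JOINT-VENTURE", "CONCERN"}


def build_entities(phrases):
    """phrases is a list of (category, text, head) triples in sentence order.
    Returns one dict per entity, with the most recent preceding location."""
    entities = []
    last_location = None
    for category, text, head in phrases:
        if category == "LOC":
            last_location = head
        elif category == "CN" or (category == "NG" and head in ENTITY_HEADS):
            entities.append({"name": text, "location": last_location})
    return entities


# A subset of the phrases from the trace, in their original order.
phrases = [
    ("CN", "BRIDGESTONE", "BRIDGESTONE"),
    ("NG", "JOINT-VENTURE", "JOINT-VENTURE"),
    ("LOC", "TAIWAN", "TAIWAN"),
    ("NG", "LOCAL CONCERN", "CONCERN"),
    ("NG", "JAPANESE TRADING HOUSE", "HOUSE"),
]
entities = build_entities(phrases)
```

On this input the sketch produces three entities, of which only the local concern carries a location, and nothing for the trading house.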


Jerry Hobbs 2004-02-24