The next phase accepts the lexical items combined by the preprocessor as input and produces a sequence of phrases as output. The head of each phrase is identified, and if the head of the phrase corresponds to an object in the domain for which a template object is defined, then an object of the appropriate type is associated with the phrase. For example, if the noun group is ``the Japanese company,'' the noun group is associated with an ENTITY object whose NATIONALITY slot is Japan.
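The association between a phrase head and a template object can be illustrated with a minimal sketch. The lookup tables, class names, and the `attach_object` function below are hypothetical and chosen only for illustration; they are not the system's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical lexicon: head nouns that correspond to a template object type.
HEAD_TO_TYPE = {"company": "ENTITY", "concern": "ENTITY", "venture": "ENTITY"}
# Hypothetical modifier table: nationality adjectives and the country they denote.
NATIONALITY = {"japanese": "Japan", "taiwanese": "Taiwan"}


@dataclass
class TemplateObject:
    type: str
    slots: dict = field(default_factory=dict)


@dataclass
class Phrase:
    label: str                      # e.g. "NG" for noun group
    words: list
    head: str
    obj: Optional[TemplateObject] = None


def attach_object(phrase: Phrase) -> Phrase:
    """Attach a template object if the head names a known domain object."""
    obj_type = HEAD_TO_TYPE.get(phrase.head.lower())
    if obj_type is None:
        return phrase               # head is not a domain object; leave phrase bare
    obj = TemplateObject(obj_type)
    for word in phrase.words:
        country = NATIONALITY.get(word.lower())
        if country is not None:
            obj.slots["NATIONALITY"] = country
    phrase.obj = obj
    return phrase


ng = attach_object(Phrase("NG", ["the", "Japanese", "company"], "company"))
print(ng.obj)
```

Under these assumptions, the noun group ``the Japanese company'' receives an ENTITY object whose NATIONALITY slot is Japan, while a phrase whose head is absent from the lexicon is passed through unchanged.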
The phrase parser constructs only those phrases that can be reliably described by a regular language. Attachment ambiguities are preserved for later phases, where they are either ignored as irrelevant or resolved on the basis of domain-specific patterns when the combination can be done reliably.
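As a rough illustration of what it means to describe phrases as a regular language, the sketch below recognizes noun groups with a regular expression over part-of-speech tags. The tag set, the single-letter encoding, and the pattern are assumptions made for this example; the actual grammar is considerably richer.

```python
import re

# Hypothetical tag set, encoded one character per token so that
# regex match offsets coincide with token indices.
TAG_CODE = {"DET": "D", "ADJ": "A", "N": "N", "V": "V", "PREP": "P"}

# Assumed noun-group pattern: optional determiner, any adjectives, one or more nouns.
NOUN_GROUP = re.compile(r"D?A*N+")


def noun_groups(tagged):
    """Return (start, end, words) for each noun group in a tagged sentence."""
    codes = "".join(TAG_CODE[tag] for _, tag in tagged)
    return [
        (m.start(), m.end(), [word for word, _ in tagged[m.start():m.end()]])
        for m in NOUN_GROUP.finditer(codes)
    ]


tagged = [
    ("the", "DET"), ("local", "ADJ"), ("concern", "N"),
    ("produces", "V"), ("golf", "N"), ("clubs", "N"),
]
groups = noun_groups(tagged)
print(groups)
```

The pattern deliberately stops at the noun head and does not attach prepositional phrases or other postmodifiers, which mirrors the decision to leave attachment to later phases.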
The basic grammar of English used in this phase is a superset of that used in the MUC-4 FASTUS system. The main differences involve more detailed processing of numbers consisting of mixed numeric and symbolic parts (e.g. 3 million), currency phrases (e.g. DM 2500), and the recognition of bank names. Possible companies are treated as proper nouns and can be combined to form noun groups in the same way as other proper nouns referring to locations, companies or people.
Lexical ambiguity can lead to multiple analyses at the end of the parsing phase. In general, longer phrases are preferred to shorter ones. In mixed-case texts, nominals with proper noun heads are preferred to other analyses if they are capitalized. In upper-case-only texts, company names are preferred to other analyses because of the central role that companies play in the joint venture domain. However, in upper-case-only texts, common nouns and verbs are preferred to location names when ambiguity arises, because of the relatively large number of locations in the gazetteer that overlap with ordinary English words.
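One way to picture these preferences is as a sort key over competing analyses. The sketch below is an assumption about how the rules might interact: the text does not say how length and category preferences are combined, so length is taken here as the primary criterion and category as the tie-breaker. The category labels and the `choose` function are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical labels for analyses whose head is a proper noun.
PROPER_KINDS = {"company", "location", "person"}


@dataclass(frozen=True)
class Analysis:
    kind: str               # "company", "location", "person", "common" or "verb"
    start: int
    end: int
    capitalized: bool = False


def category_rank(a: Analysis, upper_case_only: bool) -> int:
    """Lower rank means more preferred."""
    if upper_case_only:
        # Companies first; common nouns and verbs ahead of locations.
        return {"company": 0, "location": 2}.get(a.kind, 1)
    # Mixed case: capitalized proper-noun-headed nominals win.
    return 0 if (a.kind in PROPER_KINDS and a.capitalized) else 1


def choose(analyses, upper_case_only: bool) -> Analysis:
    """Pick the preferred analysis: longest first, then by category (assumed order)."""
    return min(
        analyses,
        key=lambda a: (-(a.end - a.start), category_rank(a, upper_case_only)),
    )


# Upper-case-only text: a word that is both a gazetteer location and a verb.
best = choose([Analysis("location", 0, 1), Analysis("verb", 0, 1)], upper_case_only=True)
print(best.kind)
```

With equal spans in upper-case-only text, the verb reading is chosen over the location reading and a company reading is chosen over either; in mixed-case text a capitalized proper-noun reading wins.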
As an example, here is how the PHRASE PARSER analyzes the first sentence of the walkthrough example:
CN: "BRIDGESTONE " (0,1) Head: BRIDGESTONE
NG: "SPORTS " (1,2) Head: SPORTS
ACTIVE/PASSIVE: "SAID " (3,4) Head: SAID
NG: "FRIDAY " (4,5) Head: FRIDAY
NG: "IT " (5,6) Head: IT
ACTIVE: "HAS SET " (6,8) Head: SET
PREP: "UP " (8,9) Head: UP
NG: "JOINT-VENTURE " (9,12) Head: JOINT-VENTURE
PREP: "IN " (12,13) Head: IN
LOC: "TAIWAN " (13,14) Head: TAIWAN
PREP: "WITH " (14,15) Head: WITH
NG: "LOCAL CONCERN " (15,18) Head: CONCERN
CONJ: "AND " (18,19) Head: AND
NG: "JAPANESE TRADING HOUSE " (19,23) Head: HOUSE
INF: "TO PRODUCE " (23,25) Head: PRODUCE
NG: "GOLF CLUBS " (25,27) Head: CLUBS
INF: "TO BE " (27,29) Head: BE
ACTIVE/PASSIVE: "SHIPPED " (29,30) Head: SHIPPED
PREP: "TO " (30,31) Head: TO
LOC: "JAPAN " (31,32) Head: JAPAN
At this point the system has entity objects representing a company named ``BRIDGESTONE,'' a ``JOINT-VENTURE'' and a ``LOCAL CONCERN.'' The local concern has a location of ``TAIWAN'' because that was the most recently mentioned location. The lexicon did not indicate that a noun group with the head ``HOUSE'' could refer to a company, so no entity is created for ``JAPANESE TRADING HOUSE.''
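The outcome described here can be reproduced with a small sketch that walks the phrase sequence in order and assigns each new entity the most recently mentioned location. The set of entity heads and the `build_entities` function are hypothetical; ``HOUSE'' is left out of the set on purpose to mirror the gap noted above.

```python
# Hypothetical set of noun-group heads known to denote entities.
ENTITY_HEADS = {"JOINT-VENTURE", "CONCERN"}

# (label, text, head) triples, a subset of the phrase parser output above.
phrases = [
    ("CN", "BRIDGESTONE", "BRIDGESTONE"),
    ("NG", "SPORTS", "SPORTS"),
    ("NG", "JOINT-VENTURE", "JOINT-VENTURE"),
    ("LOC", "TAIWAN", "TAIWAN"),
    ("NG", "LOCAL CONCERN", "CONCERN"),
    ("NG", "JAPANESE TRADING HOUSE", "HOUSE"),
    ("NG", "GOLF CLUBS", "CLUBS"),
    ("LOC", "JAPAN", "JAPAN"),
]


def build_entities(phrases):
    """Create entity records, giving each the last location seen before it."""
    entities = []
    last_location = None
    for label, text, head in phrases:
        if label == "LOC":
            last_location = text
        elif label == "CN" or (label == "NG" and head in ENTITY_HEADS):
            entities.append({"name": text, "location": last_location})
    return entities


entities = build_entities(phrases)
for entity in entities:
    print(entity)
```

Under these assumptions, three entities are created, only ``LOCAL CONCERN'' receives a location (``TAIWAN''), and ``JAPANESE TRADING HOUSE'' produces no entity.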