
The DOMAIN PATTERN RECOGNIZER

The DOMAIN PATTERN RECOGNIZER does the most critical work of the system by recognizing phrases that establish the most important relationships to be extracted. The DOMAIN PATTERN RECOGNIZER takes the output of the PHRASE COMBINER as input, and produces raw templates as output.

The PATTERN RECOGNIZER of the MUC-4 system had only one subphase. Because of the limited development time available, however, we recognized that we could not possibly account for every way a joint venture relationship can be expressed, so we decided to implement the JV-FASTUS PATTERN RECOGNIZER as a multiphase process. The output of an earlier phase was kept by the system only as long as it was consistent with the output of the later phases. Thus the earlier phases of the PATTERN RECOGNIZER could implement extremely general, loose patterns that served as defaults, which could be defeated by the output of more precise, specific patterns at higher levels.
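The defeasible-default idea can be sketched as follows. This is a minimal illustration, not the JV-FASTUS implementation: the `Relation` record and `combine_phases` function are hypothetical, and "consistent" is assumed here to mean only that a later phase has not assigned a different role to the same entity.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Relation:
    """Hypothetical extracted fact: `entity` fills `role` in a joint venture."""
    role: str    # e.g. "parent" or "joint_venture"
    entity: str


def combine_phases(phases: List[List[Relation]]) -> Dict[str, str]:
    """Merge phase outputs so later, more specific phases defeat earlier defaults.

    `phases` is ordered from the earliest (most general) phase to the latest
    (most specific). Scanning from the latest phase backwards, each entity
    keeps the first role found, so an earlier default survives only if no
    later phase has already decided that entity's role.
    """
    kept: Dict[str, str] = {}
    for phase in reversed(phases):
        for rel in phase:
            if rel.entity not in kept:
                kept[rel.entity] = rel.role
    return kept


# A loose default makes every company a parent; a more specific pattern
# then identifies Foobarco as the joint venture itself.
general = [Relation("parent", "IBM"), Relation("parent", "Intel"),
           Relation("parent", "Foobarco")]
specific = [Relation("joint_venture", "Foobarco")]
print(combine_phases([general, specific]))
# prints {'Foobarco': 'joint_venture', 'IBM': 'parent', 'Intel': 'parent'}
```

Ordering the scan from most to least specific keeps the override rule in one place: adding a new, more precise phase only means appending it to the list.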

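Inspection of the corpus indicated that there are three basic, general patterns that signal joint venture relationships with surprisingly high reliability. They are:

1. two or more company names, then ``joint venture'', then a single company name;
2. a single company name, then ``joint venture'', then two or more company names;
3. company names and ``joint venture'' in any other combination.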

The first pattern means that there are at least two occurrences of company names preceding the words ``joint venture'' (all other words are ignored) and a single company name following them. The parent entities are the first set of companies, and the joint venture entity is the single one. Typical instances of this pattern are ``The Toyota - General Motors joint venture, NUMMI...'' and ``IBM and Intel formed a joint venture called Foobarco.'' The second pattern is the ``passive'' variant of the first (although verb groups and their properties are completely ignored!): the single company preceding ``joint venture'' is the joint venture entity, and the companies following it are the parents. It matches sentences like ``Foobarco is a joint venture formed by IBM and Intel.'' Finally, the third pattern matches sentences that do not meet the number constraints of the first two patterns. In that case, all of the entities are parents. An example is ``IBM formed a joint venture with Intel to produce mainframes in Timbuktu.''
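The three default patterns reduce to counting company names on each side of ``joint venture''. The sketch below assumes a hypothetical token format (a plain word, or `("COMPANY", name)` for a name recognized by earlier phases) and assumes ``joint venture'' arrives as a single token; the exact number constraints of the second pattern are inferred from its description as the passive variant of the first.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical token format: a plain word, or ("COMPANY", name).
Token = Union[str, Tuple[str, str]]
JV = "joint venture"


def company(name: str) -> Tuple[str, str]:
    """Build a company-name token (illustrative helper)."""
    return ("COMPANY", name)


def companies(tokens: List[Token]) -> List[str]:
    """Company names in order, ignoring every other word."""
    return [t[1] for t in tokens if isinstance(t, tuple) and t[0] == "COMPANY"]


def match_jv(tokens: List[Token]) -> Optional[Tuple[List[str], Optional[str]]]:
    """Return (parents, joint_venture) using the three default patterns, or None."""
    if JV not in tokens:
        return None
    i = tokens.index(JV)
    before, after = companies(tokens[:i]), companies(tokens[i + 1:])
    if len(before) >= 2 and len(after) == 1:
        # Pattern 1: two or more companies, "joint venture", one company.
        return before, after[0]
    if len(before) == 1 and len(after) >= 2:
        # Pattern 2: the "passive" variant, with the roles reversed.
        return after, before[0]
    if before or after:
        # Pattern 3: any other combination; every company is a parent.
        return before + after, None
    return None


c = company
print(match_jv([c("IBM"), "and", c("Intel"), "formed", "a", JV, "called", c("Foobarco")]))
print(match_jv([c("Foobarco"), "is", "a", JV, "formed", "by", c("IBM"), "and", c("Intel")]))
print(match_jv([c("IBM"), "formed", "a", JV, "with", c("Intel"), "to", "produce", "mainframes"]))
# prints (['IBM', 'Intel'], 'Foobarco') twice, then (['IBM', 'Intel'], None)
```

Because verbs are never inspected, the same counting logic covers both ``formed a joint venture called'' and ``is a joint venture formed by''.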

It is, of course, easy to think of counterexamples to these patterns. Nevertheless, they help recall much more than they hurt precision, because they are only defaults that can be defeated by more precise information. We were initially skeptical that including such vague patterns would actually enhance system performance, but a test showed that they improved the system's F-metric by approximately 6 points.
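For reference, the F-metric combines precision P and recall R as below; assuming the equal weighting β = 1, F is their harmonic mean. A short derivation shows why a recall gain can outweigh an equal precision loss: the partial derivatives differ only in their numerators, so recall is worth more at the margin exactly when P > R. This is a first-order argument only and does not replace the measured result.

```latex
\begin{aligned}
F_\beta &= \frac{(\beta^2 + 1)\,P\,R}{\beta^2 P + R},
  \qquad F_1 = \frac{2PR}{P + R},\\
\frac{\partial F_1}{\partial R} &= \frac{2P^2}{(P + R)^2},
  \qquad \frac{\partial F_1}{\partial P} = \frac{2R^2}{(P + R)^2}.
\end{aligned}
```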

We eventually settled on three levels for the PATTERN RECOGNIZER. The first level consisted of the above patterns. The second level consisted of a very general pattern for recognizing ownership percentages expressed with active verbs; like the first-level patterns, it never actually examined the verbs or their properties. The third level included a similar pattern for ownership percentages expressed in the passive, which is more constrained because of the frequent use of the preposition ``by'', together with more obviously motivated patterns for joint ventures and products.
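The interaction between the second and third levels can be sketched as two ownership patterns, where the more constrained passive match defeats the loose active default. The token format (`("COMPANY", name)`, `("PERCENT", n)`) and the exact pattern shapes are assumptions for illustration, not the patterns JV-FASTUS actually used.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical token format: a word, ("COMPANY", name), or ("PERCENT", n).
Token = Union[str, Tuple[str, object]]
Ownership = Tuple[object, object, object]  # (owner, percentage, owned)


def tagged(tokens: List[Token], tag: str) -> List[Tuple[int, object]]:
    """(position, value) for every token carrying `tag`; other words are ignored."""
    return [(i, t[1]) for i, t in enumerate(tokens)
            if isinstance(t, tuple) and t[0] == tag]


def active_ownership(tokens: List[Token]) -> Optional[Ownership]:
    """Level 2: Company ... N% ... Company, read as 'first owns N% of last'.

    Like the first-level patterns, it never looks at the verb.
    """
    comps, pcts = tagged(tokens, "COMPANY"), tagged(tokens, "PERCENT")
    if len(comps) >= 2 and pcts and comps[0][0] < pcts[0][0] < comps[-1][0]:
        return comps[0][1], pcts[0][1], comps[-1][1]
    return None


def passive_ownership(tokens: List[Token]) -> Optional[Ownership]:
    """Level 3: Company ... N% ... by ... Company, read as 'last owns N% of first'."""
    comps, pcts = tagged(tokens, "COMPANY"), tagged(tokens, "PERCENT")
    if "by" not in tokens or len(comps) < 2 or not pcts:
        return None
    by = tokens.index("by")
    (owned_pos, owned), (owner_pos, owner) = comps[0], comps[-1]
    if owned_pos < pcts[0][0] < by < owner_pos:
        return owner, pcts[0][1], owned
    return None


def ownership(tokens: List[Token]) -> Optional[Ownership]:
    """The more constrained level-3 match, when present, defeats the level-2 default."""
    return passive_ownership(tokens) or active_ownership(tokens)


passive = [("COMPANY", "Foobarco"), "is", ("PERCENT", 51), "owned", "by", ("COMPANY", "IBM")]
active = [("COMPANY", "IBM"), "holds", ("PERCENT", 40), "of", ("COMPANY", "Foobarco")]
print(active_ownership(passive))  # ('Foobarco', 51, 'IBM'): the loose default misreads it
print(ownership(passive))         # ('IBM', 51, 'Foobarco')
print(ownership(active))          # ('IBM', 40, 'Foobarco')
```

The requirement that ``by'' sit between the percentage and the owner is what makes the passive pattern more constrained, and therefore safe to let it override the active default.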

In the first sentence of the walkthrough example, the system recognized the pattern ``BRIDGESTONE ... SAID IT HAS SET UP A JOINT VENTURE ... WITH A LOCAL CONCERN.'' This pattern produced a tie-up relationship with Bridgestone and a company as parent entities. The adverbial ``IN TAIWAN'' was recognized nondeterministically by a different pattern, which caused ``TAIWAN'' to be recorded as a default location for the joint venture and to serve as the referent for ``local'' in ``LOCAL CONCERN.'' As mentioned previously, the system did not realize that ``JAPANESE TRADING HOUSE'' was a company.
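A location recorded as a default can be modeled as a slot that a later, more precise pattern may overwrite but another default may not. The `TieUp` record and its slot names below are hypothetical and are not the actual raw template layout; the parent labels simply echo the walkthrough sentence.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TieUp:
    """Hypothetical raw template for a tie-up relationship."""
    parents: List[str] = field(default_factory=list)
    location: Optional[str] = None
    location_is_default: bool = False


def add_default_location(t: TieUp, place: str) -> TieUp:
    """Record `place` as a default location, never overwriting an existing one."""
    if t.location is None:
        t.location = place
        t.location_is_default = True
    return t


def add_specific_location(t: TieUp, place: str) -> TieUp:
    """A more precise pattern's location replaces any default."""
    t.location = place
    t.location_is_default = False
    return t


# The tie-up pattern supplies the parents; the independent "IN TAIWAN"
# pattern supplies only a default location.
tie = TieUp(parents=["BRIDGESTONE", "LOCAL CONCERN"])
add_default_location(tie, "TAIWAN")
print(tie)
# prints TieUp(parents=['BRIDGESTONE', 'LOCAL CONCERN'], location='TAIWAN', location_is_default=True)
```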


Jerry Hobbs 2004-02-24