The output of the PATTERN RECOGNIZER is a set of raw templates. These
templates match the structure of the officially specified templates
rather closely, but they differ enough that the output must be
normalized before printing to meet the specifications of the task.
This job falls to the POST PROCESSOR.
The POST PROCESSOR is a rather complicated, task-specific piece of
code that performs several mostly uninteresting functions.
The following tasks are assigned to the POST PROCESSOR:
- ENTITY-RELATIONSHIP objects are generated for entities involved
in joint ventures. (Subordinate ENTITY-RELATIONSHIPs are generated as
a result of patterns recognized when processing the text.)
- Ordered-pair slots are constructed where required. (The system
treats ordered-pair fills as full objects, as they were in the
original TIPSTER specifications, because this makes the merging
algorithm simpler.)
- String fills are extracted from the original text rather than
printed in the normalized, upper-case form used by JV-FASTUS.
- Company names are extracted from the original text and
normalized to ensure compliance with the specifications.
- Locations are disambiguated and normalized using information
from the gazetteer.
- SIC codes for product-service strings are generated.
Associating these codes with strings is really black magic: the
answer keys are very inconsistent on these codes, and in some cases
clearly wrong. We fill in a code only in those cases where we feel we
can guess the right answer at least 50 percent of the time.
- Dates are normalized and printed according to specifications.
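Two of the tasks listed above, recovering string fills from the
original text and emitting an SIC code only when the guess is likely
to be right, can be illustrated with a minimal Python sketch. It is
not the JV-FASTUS implementation. The Fill record, the character
offsets it carries, the SIC_GUESSES table, and both function names
are hypothetical, and the codes and accuracy estimates in the table
are placeholders chosen for illustration.

```python
from typing import NamedTuple, Optional


class Fill(NamedTuple):
    """A string fill as a recognizer might carry it (hypothetical layout)."""
    normalized: str  # upper-case internal form
    start: int       # character offset of the fill in the original text
    end: int         # exclusive end offset


def original_string(text: str, fill: Fill) -> str:
    """Return the fill as it appeared in the source text.

    Runs of whitespace (including line breaks) are collapsed to one space.
    """
    return " ".join(text[fill.start:fill.end].split())


# Placeholder table: keyword -> (code, estimated chance the guess is right).
# Assumption: such estimates would come from comparing guesses with the keys.
SIC_GUESSES = {
    "AUTOMOBILES": ("3711", 0.8),
    "SERVICES": ("7389", 0.3),
}


def sic_code(product_service: str, threshold: float = 0.5) -> Optional[str]:
    """Return a code only when its estimated accuracy reaches the threshold."""
    upper = product_service.upper()
    for keyword, (code, accuracy) in SIC_GUESSES.items():
        if keyword in upper and accuracy >= threshold:
            return code
    return None  # leave the slot empty rather than emit a poor guess


if __name__ == "__main__":
    text = "Toyota Motor Corp. will build  luxury\nautomobiles in Kentucky."
    fill = Fill("LUXURY AUTOMOBILES",
                text.index("luxury"), text.index(" in Kentucky"))
    print(original_string(text, fill))
    print(sic_code(fill.normalized))
    print(sic_code("consulting services"))
```

Leaving the slot empty below the threshold mirrors the 50 percent
rule stated in the list: a missing fill is preferred to a guess that
is more likely wrong than right.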
Jerry Hobbs
2004-02-24