Information extraction is the process of scanning text for information relevant to some interest, including extracting entities, relations, and, most challenging, events, that is, who did what to whom. It requires deeper analysis than keyword searches, but its aims fall short of the very hard and long-term problem of text understanding, where we seek to capture all the information in a text, along with the speaker's or writer's intention. Information extraction represents a midpoint on this spectrum, where the aim is to capture structured information without sacrificing feasibility.
Information extraction technology arose in response to the need for efficient processing of texts in specialized domains. Full-sentence parsers expended a great deal of effort trying to parse long sentences that were either irrelevant to the domain or contained much irrelevant material, thereby increasing the chance of error. Information extraction technology, by contrast, focuses only on the relevant parts of the text and ignores the rest.
In the last ten years, information extraction technology has advanced significantly. It has been applied primarily to domains of economic and military interest. There are now initial efforts to apply it to biomedical text (e.g., Humphreys et al., 2000; Thomas et al., 2000), and the time is ripe for further research.