Ellen Riloff
"Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping"
12/10/1999: [time not recorded]
[location not recorded]
Abstract: Information extraction systems usually require two dictionaries: a
semantic lexicon and a dictionary of extraction patterns for the
domain. We will present a multi-level bootstrapping algorithm that
generates both the semantic lexicon and extraction patterns
simultaneously. As input, our technique requires only unannotated
training texts and a handful of seed words for a category. We use a
"mutual bootstrapping" technique to alternately select the best
extraction pattern for the category and bootstrap its extractions into
the semantic lexicon, which then becomes the basis for selecting the
next extraction pattern. To make this approach more robust, we add a
second level of bootstrapping (meta-bootstrapping) that retains only
the most reliable lexicon entries produced by mutual bootstrapping and
restarts the process. We evaluated this multi-level bootstrapping
technique on a collection of corporate web pages and a corpus of
terrorism news articles. The algorithm produced high-quality
dictionaries for several semantic categories.
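
The two levels of bootstrapping described in the abstract can be sketched roughly as follows. This is a toy illustration, not the implementation presented in the talk: the function names, the scoring heuristic, the reliability measure, and the example patterns and words are all assumptions made for the sketch. Patterns are represented simply as a mapping from a pattern string to the set of words it extracts from the training texts.

```python
import math

def score(extractions, lexicon):
    """Assumed heuristic: fraction of extractions already in the lexicon,
    weighted by the log of how many lexicon members were extracted."""
    hits = len(extractions & lexicon)
    if hits == 0:
        return 0.0
    return (hits / len(extractions)) * math.log2(hits + 1)

def mutual_bootstrap(patterns, lexicon, rounds):
    """Alternately select the best unused pattern and add all of its
    extractions to a working copy of the lexicon."""
    lexicon = set(lexicon)
    chosen = []
    for _ in range(rounds):
        candidates = [p for p in patterns if p not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda p: score(patterns[p], lexicon))
        if score(patterns[best], lexicon) == 0.0:
            break  # no remaining pattern overlaps the lexicon
        chosen.append(best)
        lexicon |= patterns[best]
    return chosen, lexicon

def meta_bootstrap(patterns, seeds, outer_rounds=2, inner_rounds=2, keep=1):
    """Run mutual bootstrapping, keep only the most reliable new words,
    then restart from the enlarged permanent lexicon."""
    permanent = set(seeds)
    for _ in range(outer_rounds):
        chosen, temp = mutual_bootstrap(patterns, permanent, inner_rounds)
        new_words = temp - permanent

        # Assumed reliability measure: how many selected patterns
        # extracted the word (ties broken alphabetically).
        def support(word):
            return sum(1 for p in chosen if word in patterns[p])

        ranked = sorted(new_words, key=lambda w: (-support(w), w))
        permanent |= set(ranked[:keep])
    return permanent

# Hypothetical toy data for a "location" category.
patterns = {
    "offices in <x>": {"boston", "tokyo", "paris"},
    "headquartered in <x>": {"tokyo", "paris", "london"},
    "acquired by <x>": {"acme", "paris"},
}
seeds = {"boston", "tokyo"}

print(sorted(meta_bootstrap(patterns, seeds)))
```

On this toy input, the weakly supported extraction "acme" never enters the permanent lexicon, because only the best-supported word from each inner run is retained before the process restarts.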
Last updated: Mon Jun 19 17:44:06 2006