Publications
Wrapper induction by hierarchical data analysis
Abstract
An inductive algorithm, denominated STALKER, generates high-accuracy extraction rules based on user-labeled training examples. With the tremendous amount of information that becomes available on the Web on a daily basis, the ability to quickly develop information agents has become a crucial problem. A vital component of any Web-based information agent is a set of wrappers that can extract the relevant data from semistructured information sources.
The novel approach to wrapper induction provided herein is based on the idea of hierarchical information extraction, which turns the hard problem of extracting data from an arbitrarily complex document into a series of easier extraction tasks. Labeling the training data represents the major bottleneck in using wrapper induction techniques, and experimental results show that STALKER performs significantly better than other approaches; on one hand …
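The hierarchical idea in the abstract can be illustrated with a minimal sketch. It is not the STALKER algorithm itself: the rules below are written by hand rather than induced from labeled examples, and the landmark-style rule format, the helper names (`skip_to`, `extract`, `extract_all`, `extract_records`), and the sample page are all assumptions made for illustration. The sketch only shows how one hard extraction task can be split into a series of easier ones: first isolate the list, then each item, then each field inside an item.

```python
from typing import Dict, List, Optional

# Hypothetical sample page, invented for this sketch.
PAGE = (
    "<h1>Restaurants</h1><ul>"
    "<li><b>Luigi's</b> Phone: <i>555-0101</i></li>"
    "<li><b>Kyoto House</b> Phone: <i>555-0102</i></li>"
    "</ul>"
)


def skip_to(text: str, landmarks: List[str]) -> Optional[int]:
    """Consume the landmarks in order; return the index just past the last one."""
    pos = 0
    for mark in landmarks:
        found = text.find(mark, pos)
        if found == -1:
            return None  # rule does not match this text
        pos = found + len(mark)
    return pos


def extract(text: str, start_rule: List[str], end_mark: str) -> Optional[str]:
    """Return the text between the end of start_rule and the next end_mark."""
    begin = skip_to(text, start_rule)
    if begin is None:
        return None
    end = text.find(end_mark, begin)
    return None if end == -1 else text[begin:end]


def extract_all(text: str, start_mark: str, end_mark: str) -> List[str]:
    """Return every span delimited by start_mark and end_mark, in order."""
    items, pos = [], 0
    while True:
        begin = text.find(start_mark, pos)
        if begin == -1:
            break
        begin += len(start_mark)
        end = text.find(end_mark, begin)
        if end == -1:
            break
        items.append(text[begin:end])
        pos = end + len(end_mark)
    return items


def extract_records(page: str) -> List[Dict[str, Optional[str]]]:
    # Level 1: isolate the list inside the whole document.
    listing = extract(page, ["<ul>"], "</ul>") or ""
    # Level 2: split the list into individual items.
    items = extract_all(listing, "<li>", "</li>")
    # Level 3: extract each field from within a single, much smaller item.
    return [
        {
            "name": extract(item, ["<b>"], "</b>"),
            "phone": extract(item, ["Phone:", "<i>"], "</i>"),
        }
        for item in items
    ]


if __name__ == "__main__":
    for record in extract_records(PAGE):
        print(record)
```

Because each level only has to cope with the text handed down from its parent, the field rules can stay short (for example, the two-landmark rule for `phone`) even when the surrounding document is complex.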
- Date
- August 12, 2003
- Authors
- I Muslea, S Minton, CA Knoblock
- Inventors
- Ion Muslea, Steven Minton, Craig A Knoblock
- Patent_office
- US
- Patent_number
- 6606625
- Application_number
- 09587528