Learning the Common Structure of Data
Kristina Lerman and Steven Minton
Information Sciences Institute
Univ. of Southern California
Marina del Rey, CA 90292-6695
April 4, 2000
Abstract
The proliferation of online information sources has accentuated the need for tools that automatically validate
and recognize data. We present an efficient algorithm that learns structural information about data from
positive examples alone. We describe two Web wrapper maintenance applications that employ this algorithm.
The first application detects when a wrapper is not extracting correct data. The second application
automatically identifies data on Web pages so that the wrapper may be re-induced when the source
format changes.