Publications
Wrapper generation for semi-structured internet sources
Abstract
With the current explosion of information on the World Wide Web (WWW), a wealth of information on many different subjects has become available on-line. Numerous sources contain information that can be classified as semi-structured. At present, however, the only way to access the information is by browsing individual pages. We cannot query web documents in a database-like fashion based on their underlying structure. However, we can provide database-like querying for semi-structured WWW sources by building wrappers around these sources. We present an approach for semi-automatically generating such wrappers. The key idea is to exploit the formatting information in pages from the source to hypothesize the underlying structure of a page. From this structure the system generates a wrapper that facilitates querying of a source and possibly integrating it with other sources. We demonstrate the ease with …
- Date: December 1, 1997
- Authors: Naveen Ashish, Craig A Knoblock
- Journal: ACM Sigmod Record
- Volume: 26
- Issue: 4
- Pages: 8-15
- Publisher: ACM