Wrapper Maintenance
Abstract
A Web wrapper is a software application that extracts information from a semi-structured source and converts it to a structured format. Although semi-structured sources, such as Web pages, contain no explicitly specified schema, they do have an implicit grammar that can be used to identify relevant information in the document. A wrapper learning system analyzes page layout to generate either grammar-based or “landmark”-based extraction rules, which wrappers then use to extract data. Because these rules depend on the layout, even slight changes to it can break the wrapper and prevent it from extracting data correctly. Wrapper maintenance is a composite task that (1) verifies that the wrapper continues to extract data correctly from a source, and (2) repairs the wrapper so that it works on the changed pages.
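As a rough illustration of the ideas in the abstract, the following minimal sketch shows a landmark-based rule and a simple verification check. It is not the authors' method: the helper names (`extract`, `verify`), the rule format (a pair of start and end landmark strings), the sample pages, and the price pattern are all assumptions made for this example. Only the verification half of maintenance is shown; repair would require relearning the landmarks from the changed page.

```python
import re
from typing import Optional

def extract(page: str, start: str, end: str) -> Optional[str]:
    """Return the text between the first `start` landmark and the next `end` landmark."""
    i = page.find(start)
    if i == -1:
        return None  # start landmark missing: the layout has probably changed
    i += len(start)
    j = page.find(end, i)
    if j == -1:
        return None  # end landmark missing
    return page[i:j].strip()

def verify(value: Optional[str], pattern: str) -> bool:
    """Check that an extracted value still matches the format seen on earlier pages."""
    return value is not None and re.fullmatch(pattern, value) is not None

# Hypothetical rule and expected data format for a price field.
RULE = ("<b>Price:</b>", "<br>")
PRICE_PATTERN = r"\$\d+\.\d{2}"

# Hypothetical pages before and after a layout change.
old_page = "<p><b>Price:</b> $19.99<br>In stock</p>"
new_page = "<p><span class='price'>$19.99</span> In stock</p>"

for name, page in (("old", old_page), ("new", new_page)):
    value = extract(page, *RULE)
    status = "ok" if verify(value, PRICE_PATTERN) else "wrapper broken"
    print(name, repr(value), status)
```

In this sketch the rule works on the old page but finds no start landmark on the new one, so verification flags the wrapper as broken even though the price itself is still present on the page.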
- Date: September 10, 2025
- Authors: Kristina Lerman, Craig A. Knoblock
- Book: Encyclopedia of Database Systems
- Pages: 3565-3569