Publications

Wrapper Maintenance

Abstract

A Web wrapper is a software application that extracts information from a semi-structured source and converts it to a structured format. While semi-structured sources, such as Web pages, contain no explicitly specified schema, they do have an implicit grammar that can be used to identify relevant information in the document. A wrapper learning system analyzes page layout to generate either grammar-based or “landmark”-based extraction rules that wrappers use to extract data. As a consequence, even slight changes in the page layout can break the wrapper and prevent it from extracting data correctly. Wrapper maintenance is a composite task that (1) verifies that the wrapper continues to extract data correctly from a source, and (2) repairs the wrapper so that it works on the changed pages.
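
The two maintenance steps named in the abstract can be illustrated with a small sketch. It is not the authors' method: the landmark rule, the sample pages, the price pattern used for verification, and the helper names (extract_between, looks_valid, relearn_rule) are all hypothetical, and the repair step is a deliberately naive stand-in for a real wrapper learning system.

```python
import re
from typing import Optional, Tuple

# Hypothetical landmark rule: the data sits between these two strings.
RULE: Tuple[str, str] = ("<b>", "</b>")

# Assumed pattern describing what correctly extracted data looks like.
PRICE_PATTERN = re.compile(r"\$\d+(?:\.\d{2})?")


def extract_between(page: str, start_landmark: str, end_landmark: str) -> Optional[str]:
    """Return the text between two landmarks, or None if either is missing."""
    start = page.find(start_landmark)
    if start == -1:
        return None
    start += len(start_landmark)
    end = page.find(end_landmark, start)
    if end == -1:
        return None
    return page[start:end].strip()


def looks_valid(value: Optional[str]) -> bool:
    """Verification: does the extracted value match the expected pattern?"""
    return value is not None and PRICE_PATTERN.fullmatch(value) is not None


def relearn_rule(page: str) -> Optional[Tuple[str, str]]:
    """Naive repair: locate data matching the pattern and use the tags
    immediately around it as the new landmarks."""
    match = PRICE_PATTERN.search(page)
    if match is None:
        return None
    tag_start = page.rfind("<", 0, match.start())
    tag_end = page.find(">", match.end())
    if tag_start == -1 or tag_end == -1:
        return None
    return page[tag_start:match.start()], page[match.end():tag_end + 1]


# Made-up pages: the same data before and after a layout change.
old_page = "<tr><td>Price:</td><td><b>$19.99</b></td></tr>"
new_page = "<div class='price'>Price: <span>$19.99</span></div>"

old_value = extract_between(old_page, *RULE)  # rule works on the old layout
new_value = extract_between(new_page, *RULE)  # landmarks are gone after the change

if not looks_valid(new_value):                # step (1): verification fails
    repaired_rule = relearn_rule(new_page)    # step (2): repair the rule
    if repaired_rule is not None:
        new_value = extract_between(new_page, *repaired_rule)
```

In this sketch the layout change removes the `<b>` landmark, so the original rule extracts nothing from the new page; the verification check notices this, and the repair step derives new landmarks from the tags surrounding data that still matches the expected pattern.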

Date
September 10, 2025
Authors
Kristina Lerman, Craig A Knoblock
Book
Encyclopedia of Database Systems
Pages
3565-3569