Publications

Machine Learning Techniques for Wrapper Maintenance

Abstract

The proliferation of online information has led to an increased use of wrappers for extracting data from Web sources and transforming it into a structured format. The resulting data can then be used to build new enterprise applications. While most previous research has focused on generating wrappers quickly and efficiently, the development of tools for wrapper maintenance has received less attention. This is an important problem because Web sources often change in ways that prevent wrappers from operating correctly. In this chapter, we describe machine learning techniques for verifying that a wrapper is working correctly and repairing it if it is not. Our approach is to learn structural descriptions of data and use these descriptions to verify that the wrapper is correctly extracting data. The repair algorithm automatically recovers from Web source format changes by identifying data so that a new wrapper may be …
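The idea of verifying a wrapper against learned structural descriptions can be illustrated with a small sketch. It is not the chapter's algorithm. It assumes a simple description language in which each extracted value is turned into a sequence of token classes (digits become `NUMBER`, capitalized words `CAPS`, other words `ALPHA`, and punctuation is kept literally). The learner keeps the sequences that occur often enough in known-good training data, and verification checks what fraction of newly extracted values still match. The helper names (`signature`, `learn_patterns`, `verify`) and the `min_support` and `threshold` parameters are hypothetical. The match-rate threshold is a simplified stand-in for a more careful statistical comparison.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of digits, runs of letters, or single punctuation marks.
TOKEN_RE = re.compile(r"\d+|[A-Za-z]+|[^\w\s]")


def token_class(tok):
    """Map a token to a coarse structural class (an assumed description language)."""
    if tok.isdigit():
        return "NUMBER"
    if tok.isalpha():
        return "CAPS" if tok[0].isupper() else "ALPHA"
    return tok  # punctuation is kept literally, e.g. "$" or "."


def signature(value):
    """Structural description of one extracted value as a tuple of token classes."""
    return tuple(token_class(t) for t in TOKEN_RE.findall(value))


def learn_patterns(examples, min_support=0.2):
    """Keep the signatures that cover at least `min_support` of the training examples."""
    counts = Counter(signature(v) for v in examples)
    n = len(examples)
    return {sig for sig, c in counts.items() if c / n >= min_support}


def verify(patterns, values, threshold=0.8):
    """Treat the wrapper as working if enough newly extracted values match a learned pattern."""
    if not values:
        return False  # no extracted data at all is treated as a failure
    matched = sum(signature(v) in patterns for v in values)
    return matched / len(values) >= threshold


if __name__ == "__main__":
    patterns = learn_patterns(["$12.99", "$7.50", "$103.00"])
    print(verify(patterns, ["$5.25", "$19.99"]))          # prices still look like prices
    print(verify(patterns, ["Add to cart", "In stock"]))  # wrapper now extracts the wrong text
```

In the usage example, a price field trained on values like `$12.99` passes verification when new prices arrive. It fails when a page layout change causes the wrapper to pick up button labels instead. A real maintenance system would also need the repair step described in the abstract, which this sketch does not attempt.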

Date
September 22, 2025
Authors
Kristina Lerman, Steven N Minton, Craig A Knoblock
Book
Virtual Enterprise Integration: Technological and Organizational Perspectives
Pages
334-350
Publisher
IGI Global