Publications
View Validation: A Case Study for Wrapper Induction and Text Classification
Abstract
Wrapper induction algorithms, which use labeled examples to learn extraction rules, are a crucial component of information agents that integrate semi-structured information sources. Multi-view wrapper induction algorithms reduce the amount of training data required by exploiting several types of rules (i.e., views), each of which is sufficient to extract the relevant data. All multi-view algorithms rely on the assumption that the views are sufficiently compatible for multi-view learning (i.e., most examples are labeled identically in all views). In practice, it is unclear whether two views are sufficiently compatible for solving a new, unseen learning task. To cope with this problem, we introduce a view validation algorithm: given a learning task, the algorithm predicts whether the views are sufficiently compatible for solving that particular task. We use information acquired while solving several exemplar learning tasks to train a classifier that discriminates between the tasks for which the views are sufficiently compatible for multi-view learning and those for which they are not. For both wrapper induction and text classification, view validation requires only a modest amount of training data to make high-accuracy predictions.
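The idea in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a single hypothetical feature (the fraction of examples that two views label identically) and swaps in a one-threshold decision stump for the classifier, with made-up exemplar tasks. The names `agreement_rate`, `train_stump`, and `views_compatible` are illustrative only.

```python
from typing import Sequence


def agreement_rate(view1_labels: Sequence[int], view2_labels: Sequence[int]) -> float:
    """Fraction of examples that the two views label identically."""
    if len(view1_labels) != len(view2_labels) or len(view1_labels) == 0:
        raise ValueError("views must label the same non-empty set of examples")
    matches = sum(a == b for a, b in zip(view1_labels, view2_labels))
    return matches / len(view1_labels)


def train_stump(rates: Sequence[float], compatible: Sequence[bool]) -> float:
    """Pick the threshold t that minimizes training errors of the rule
    'views are compatible iff agreement rate >= t'.

    Assumption: a decision stump stands in for whatever classifier is
    actually trained on the exemplar tasks.
    """
    best_t, best_err = None, len(rates) + 1
    for t in sorted(set(rates)):
        err = sum((r >= t) != c for r, c in zip(rates, compatible))
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def views_compatible(rate: float, threshold: float) -> bool:
    """Predict whether the views are compatible enough for a new task."""
    return rate >= threshold


# Hypothetical exemplar tasks: agreement rate observed on each task, and
# whether the views turned out to be sufficiently compatible on it.
exemplar_rates = [0.95, 0.90, 0.60, 0.55]
exemplar_compatible = [True, True, False, False]
threshold = train_stump(exemplar_rates, exemplar_compatible)

# New, unseen task: label the same four examples in both views.
new_rate = agreement_rate([1, 0, 1, 1], [1, 1, 0, 1])
prediction = views_compatible(new_rate, threshold)
```

With these made-up numbers the stump separates the exemplar tasks at an agreement rate of 0.90, so the new task, whose views agree on only half of the examples, is predicted to be insufficiently compatible.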
- Date: September 22, 2025
- Authors: Ion Muslea, Steven Minton, Craig A Knoblock