Publications
Exploiting structure within data for accurate labeling using conditional random fields
Abstract
Automatically assigning semantic class labels such as WindSpeed, Flight Number and Address to data obtained from structured sources, including databases or web pages, is an important problem in data integration because it enables researchers to identify the contents of these sources. Automatic semantic annotation is difficult because of the variety of formats used for each semantic type (e.g., Date) as well as the similarity between different semantic types (e.g., Humidity and Chance of Precipitation). In this paper, we show that by exploiting different kinds of latent structure within data we can perform this task accurately. We show that this improvement holds in spite of higher complexity in terms of both the inference procedure and the increased number of labels. We study how increasing the amount of structure taken into account by the model improves the accuracy of semantic labeling. Finally, we show that when exploiting all the relationships, we obtain a significant improvement in field labeling accuracy over the regular-expression-based approach, while still keeping the complexity low.
- Date: January 17, 2026
- Authors: Aman Goel, Craig A Knoblock, Kristina Lerman
- Journal: Proceedings on the International Conference on Artificial Intelligence (ICAI)
- Pages: 1
- Publisher: The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp)