Exploiting structure within data for accurate labeling using conditional random fields

Abstract

Automatically assigning semantic class labels such as WindSpeed, Flight Number and Address to data obtained from structured sources, including databases and web pages, is an important problem in data integration, since it enables researchers to identify the contents of these sources. Automatic semantic annotation is difficult because of the variety of formats used for each semantic type (e.g., Date) as well as the similarity between different semantic types (e.g., Humidity and Chance of Precipitation). In this paper, we show that by exploiting different kinds of latent structure within data we can perform this task accurately. We show that this improvement happens in spite of higher complexity in terms of both the inference procedure and the increased number of labels. We study how increasing the amount of structure taken into account by the model improves the accuracy of semantic labeling. Finally, we show that when exploiting all the relationships, we obtain a significant improvement in field labeling accuracy over the regular-expression-based approach, while still keeping the complexity low.

Date: 2012
Authors: Aman Goel, Craig A Knoblock, Kristina Lerman
Journal: Proceedings on the International Conference on Artificial Intelligence (ICAI)
Pages: 1
Publisher: The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp)
