Semantic labeling of online information sources

Abstract

To combine data from heterogeneous sources, software agents must first understand the semantics of each source, as expressed in its source model. Currently, source modeling is done manually, but as large numbers of sources come online, it is impractical to expect users to continue modeling them by hand. We describe two machine learning techniques for automatically modeling information sources: one that uses the source’s metadata, contained in a Web Service Definition file, and one that uses the source’s content, to classify the semantics of the data it uses. We go beyond previous work and verify predictions by invoking the source with sample data of the predicted type. We report performance results for both methods and validate our approach on several live Web sources. In addition, we describe the application of semantic modeling within the CALO project.

Date: July 1, 2007
Authors: Kristina Lerman, Anon Plangprasopchok, Craig A. Knoblock
Journal: International Journal on Semantic Web and Information Systems (IJSWIS)
Volume: 3
Issue: 3
Pages: 36-56
Publisher: IGI Global