Publications
Exploiting secondary sources for automatic object consolidation
Abstract
Information sources on the web are controlled by different organizations or people, use different text formats, and contain varying inconsistencies. Therefore, any system that integrates information from different data sources must consolidate data from these sources. Data from many web sources may not contain enough information to be consolidated accurately, even by state-of-the-art object consolidation systems. We present an approach that accurately and automatically consolidates data from various sources by using a state-of-the-art object consolidation system in conjunction with a mediator system. When the object consolidation system cannot confidently determine whether two records refer to the same entity, the mediator system automatically determines which secondary sources to query. The object consolidation system then uses this additional information to improve the accuracy of consolidation between datasets.
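The control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the token-overlap similarity, the thresholds `low` and `high`, the sample records, and the `geocoder` lookup standing in for a secondary source are all hypothetical choices made for illustration. The sketch also assumes a single secondary source, whereas the mediator in the paper selects among several.

```python
# Illustrative sketch only; every name, value, and threshold is an assumption.

def tokens(value):
    """Lowercased whitespace tokens of a field value."""
    return set(str(value).lower().split())

def jaccard(a, b):
    """Token-set overlap, a simple stand-in for a real similarity measure."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def score(rec_a, rec_b):
    """Mean similarity over the fields that both records have."""
    shared = sorted(set(rec_a) & set(rec_b))
    if not shared:
        return 0.0
    return sum(jaccard(rec_a[f], rec_b[f]) for f in shared) / len(shared)

def classify(s, low, high):
    """Three-way decision: match, non-match, or uncertain."""
    if s >= high:
        return "match"
    if s < low:
        return "non-match"
    return "uncertain"

def consolidate(rec_a, rec_b, secondary_source, low=0.3, high=0.6):
    """Return (decision, queried); the secondary source is queried only
    when the first pass cannot decide confidently."""
    first = classify(score(rec_a, rec_b), low, high)
    if first != "uncertain":
        return first, False
    # Enrich both records with whatever the secondary source returns.
    enriched_a = {**rec_a, **secondary_source(rec_a)}
    enriched_b = {**rec_b, **secondary_source(rec_b)}
    return classify(score(enriched_a, enriched_b), low, high), True

# Hypothetical secondary source: a lookup table playing the role of a geocoder.
GEOCODES = {
    "12224 Ventura Blvd": "34.1430,-118.3960",
    "12224 Ventura Boulevard": "34.1430,-118.3960",
}

def geocoder(record):
    code = GEOCODES.get(record.get("address"))
    return {"geocode": code} if code else {}

a = {"name": "Art's Deli", "address": "12224 Ventura Blvd"}
b = {"name": "Art's Delicatessen", "address": "12224 Ventura Boulevard"}

print(consolidate(a, b, lambda r: {}))  # no extra information available
print(consolidate(a, b, geocoder))      # secondary source adds a shared field
```

In this toy setup the two records are too dissimilar on name and address alone to decide, so the extra field from the secondary source is what tips the decision. A real mediator would additionally choose which source to query based on the attributes available in each record.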
- Date
- 2003
- Authors
- Martin Michalowski, Snehal Thakkar, Craig A. Knoblock
- Journal
- Proceedings of the 2003 KDD Workshop on Data Cleaning, Record Linkage, and Object Consolidation
- Pages
- 34-36