Publications
A two-step blocking scheme learner for scalable link discovery
Abstract
A two-step procedure for learning a link-discovery blocking scheme is presented. Link discovery is the problem of linking entities between two or more datasets. Identifying owl:sameAs links is an important special case. A blocking scheme is a one-to-many mapping from entities to blocks. Blocking methods avoid O(n²) comparisons by clustering entities into blocks and limiting the evaluation of link specifications to entity pairs within blocks. Current link-discovery blocking methods use blocking schemes that are either tailored to owl:sameAs links or rely on assumptions about the underlying link specifications. The presented framework learns blocking schemes for arbitrary link specifications. The first step of the algorithm is unsupervised and performs dataset mapping between a pair of dataset collections. The second step is supervised and learns blocking schemes on structurally heterogeneous dataset pairs. Application to RDF is accomplished by representing the RDF dataset in property table form. The method is empirically evaluated on four real-world test collections ranging over various domains and tasks.
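To illustrate the general idea of blocking described in the abstract, here is a minimal Python sketch. It is not the learned scheme from the paper: the blocking scheme is a hand-written, hypothetical one (lowercased tokens of a `name` field), and the names `token_scheme`, `candidate_pairs`, and the sample records are assumptions made for this example only. It shows how a one-to-many mapping from entities to blocks restricts comparison to pairs that share a block.

```python
from collections import defaultdict
from itertools import product


def token_scheme(record):
    """Hypothetical blocking scheme: map a record to one block per name token."""
    return set(record["name"].lower().split())


def candidate_pairs(dataset_a, dataset_b, scheme):
    """Return index pairs (i, j) whose records share at least one block."""
    # Each block keeps the members it received from each dataset separately.
    blocks = defaultdict(lambda: ([], []))
    for i, record in enumerate(dataset_a):
        for key in scheme(record):
            blocks[key][0].append(i)
    for j, record in enumerate(dataset_b):
        for key in scheme(record):
            blocks[key][1].append(j)

    # Only pairs inside the same block are kept for later evaluation.
    pairs = set()
    for left, right in blocks.values():
        pairs.update(product(left, right))
    return pairs


# Illustrative records, invented for this sketch.
dataset_a = [{"name": "Alan Turing"}, {"name": "Grace Hopper"}]
dataset_b = [{"name": "A. Turing"}, {"name": "Hopper, Grace"}, {"name": "Ada Lovelace"}]

pairs = candidate_pairs(dataset_a, dataset_b, token_scheme)
all_pairs = len(dataset_a) * len(dataset_b)
```

In this toy case only the pairs that share a token survive, so a link specification would be evaluated on fewer pairs than the full cross product. A learned scheme, as in the paper, would choose the blocking keys from data instead of fixing them by hand.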
- Date: 2014
- Authors: Mayank Kejriwal, Daniel P Miranker
- Journal: ISWC OM Workshop