Publications

Scalable generation of type embeddings using the ABox

Abstract

Structured knowledge bases gain their expressive power from both the ABox and TBox. While the ABox is rich in data, the TBox contains the ontological assertions that are often necessary for logical inference. The crucial links between the ABox and the TBox are served by is-a statements (formally a part of the ABox) that connect instances to types, also referred to as classes or concepts. Latent space embedding algorithms, such as RDF2Vec and TransE, have been used to great effect to model instances in the ABox. Such algorithms work well on large-scale knowledge bases like DBpedia and Geonames, as they are robust to noise and are low-dimensional and real-valued. In this paper, we investigate a supervised algorithm for deriving type embeddings in the same latent space as a given set of entity embeddings. We show that our algorithm generalizes to hundreds of types, and via incremental execution, achieves near-linear scaling on graphs with millions of instances and facts. We also present a theoretical foundation for our proposed model, and the means of validating the model. The empirical utility of the embeddings is illustrated on five partitions of the English DBpedia ABox. We use visualization and clustering to show that our embeddings are in good agreement with the manually curated TBox. We also use the embeddings to perform a soft clustering on 4 million DBpedia instances in terms of the 415 types explicitly participating in is-a relationships in the DBpedia ABox. Lastly, we present a set of results obtained by using the embeddings to recommend types for untyped instances. Our method is shown to outperform another feature …

Date
2017
Authors
Mayank Kejriwal, Pedro Szekely
Journal
Open Journal of Semantic Web (OJSW)
Volume
4
Issue
1
Pages
20-34
Publisher
RonPub