Publications

KR2RML: An Alternative Interpretation of R2RML for Heterogenous Sources.

Abstract

Data sets are generated today at an ever-increasing rate in a host of new formats and vocabularies, each with its own data quality issues and limited, if any, semantic annotations. Without semantic annotation and cleanup, integrating across these data sets is difficult. Approaches exist for integration by semantically mapping such data using R2RML and its extension for heterogeneous sources, RML, into RDF, but they are not easily extendable or scalable, nor do they provide facilities for cleaning. We present an alternative interpretation of R2RML paired with a source-agnostic R2RML processor that supports data cleaning and transformation. With this approach, it is easy to add new input and output formats without modifying the language or the processor, while supporting the efficient cleaning, transformation, and generation of billion-triple data sets.

Date
October 12, 2015
Authors
Jason Slepicka, Chengye Yin, Pedro A Szekely, Craig A Knoblock
Conference
COLD