Karma

A Data Integration Tool

Karma is available as open source under the Apache 2 License.

Principal Investigators

  • Craig Knoblock
  • Pedro Szekely

Karma is an information integration tool that enables users to quickly and easily integrate data from a variety of data sources including databases, spreadsheets, delimited text files, XML, JSON, KML and Web APIs. Users integrate information by modeling it according to an ontology of their choice using a graphical user interface that automates much of the process. Karma learns to recognize the mapping of data to ontology classes and then uses the ontology to propose a model that ties together these classes. Users then interact with the system to adjust the automatically generated model. During this process, users can transform the data as needed to normalize data expressed in different formats and to restructure it. Once the model is complete, users can publish the integrated data as RDF or store it in a database.

All project publications are listed under Publications. The best paper on the technical aspects of Karma is our ESWC'2012 paper, and the best application paper is our ESWC'2013 paper, which received the best in-use paper award at the conference.

Karma Innovations

Ease of use:

Karma uses programming-by-example, learning techniques and a Steiner tree optimization algorithm to automate as much of the mapping process as possible, so that end users can map their data to a chosen ontology. Users adjust the automatically generated model using a graphical user interface and never see the complex mapping rules used in other systems.
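To give a feel for how a Steiner tree formulation can tie classes together, here is a minimal sketch. It is not Karma's implementation: it uses a simplified textbook heuristic (connect the required nodes along shortest paths, cheapest pairs first), and the graph, class names and edge weights are hypothetical. A fuller heuristic would also re-run a spanning tree over the result and prune unused leaves.

```python
import heapq
from itertools import combinations

def dijkstra(graph, source):
    """Return (distance, predecessor) maps from source over an undirected weighted graph."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph[node].items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    return dist, prev

def connect_terminals(graph, terminals):
    """Approximate a Steiner tree: join terminals along shortest paths, cheapest pairs first.

    Assumes the graph is connected, so every terminal can reach every other one.
    """
    paths = {t: dijkstra(graph, t) for t in terminals}
    links = sorted((paths[a][0][b], a, b) for a, b in combinations(terminals, 2))
    component = {t: t for t in terminals}  # simple union-find over terminals

    def find(x):
        while component[x] != x:
            x = component[x]
        return x

    edges = set()
    for _, a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # already connected, skip to avoid redundant paths
        component[root_a] = root_b
        # Walk predecessors back from b to a to recover the edges of the path.
        prev = paths[a][1]
        node = b
        while node != a:
            edges.add(frozenset((node, prev[node])))
            node = prev[node]
    return edges

# Hypothetical ontology graph: nodes are classes, weights are link costs.
ontology = {
    "Person": {"Organization": 1, "Place": 3},
    "Organization": {"Person": 1, "Place": 1},
    "Place": {"Person": 3, "Organization": 1, "Event": 1},
    "Event": {"Place": 1},
}
# Classes that columns were mapped to and that the model must connect.
model_edges = connect_terminals(ontology, ["Person", "Event"])
```

In this toy graph the cheapest way to connect Person and Event runs through Organization and Place, so the proposed model contains those intermediate classes even though no column was mapped to them.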

Hierarchical sources:

Many systems have been developed to map tabular sources to ontologies. Karma is unique in that it also supports hierarchical data sources such as XML, JSON and KML.

Web APIs:

In addition to static sources (databases and files), Karma supports data integration from Web APIs, enabling users to leverage the thousands of data sources that are available today via Web APIs.

Semantic models:

Karma uses ontologies as a basis for integrating information, leveraging the class and property hierarchies, domain and range information and other ontology constructs to help users integrate their data. Karma lets users combine multiple ontologies so that they can map their data to standard vocabularies.

Scalable processing:

Users work with a subset of their data to define the models that integrate their data sources. This enables Karma to offer a responsive user interface when users are defining the model that integrates their data. Karma can then use these models in batch mode to integrate large data sources.

Data transformations:

Karma offers a programming-by-example interface to enable users to define data transformation scripts that transform data expressed in multiple data formats into a common format.
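The idea of learning a transformation from examples can be illustrated with a very small sketch. This is not Karma's learning algorithm: it only searches a fixed, hypothetical list of candidate date formats for one that reproduces the user's example, then applies it to the rest of the column. The format list and sample values are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical hypothesis space: input formats the learner may consider.
CANDIDATE_FORMATS = ["%m/%d/%Y", "%d/%m/%Y", "%Y-%m-%d"]
TARGET_FORMAT = "%Y-%m-%d"  # the common format we normalize to

def learn_format(examples):
    """Return the first candidate format consistent with every (raw, expected) example."""
    for fmt in CANDIDATE_FORMATS:
        try:
            if all(
                datetime.strptime(raw, fmt).strftime(TARGET_FORMAT) == expected
                for raw, expected in examples
            ):
                return fmt
        except ValueError:
            continue  # this format cannot parse one of the examples
    return None

def transform(values, fmt):
    """Apply the learned input format to a column of raw values."""
    return [datetime.strptime(v, fmt).strftime(TARGET_FORMAT) for v in values]

# The user shows one input and the output they want; here the day comes first.
examples = [("03/04/2012", "2012-04-03")]
learned = learn_format(examples)
normalized = transform(["25/12/2011", "01/02/2013"], learned)
```

A single example is enough here to rule out the month-first reading of the input; with an ambiguous example the user would supply another one.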

Case Studies

Integration of Bio-Informatics Data

We used Karma to replicate the mappings done in a scenario from the Semantic MediaWiki Linked Data Extension (SMW-LDE) work [Becker et al.] where researchers integrated ABA, Uniprot, KEGG Pathway, PharmGKB and Linking Open Drug Data datasets by mapping them to a common ontology. Papers: ESWC'2012, ISWC'2011 Linked Science.

Mapping USC Faculty Data to VIVO

VIVO is a system to build researcher networks across institutions. In this case study we used Karma to map data about USC faculty to the VIVO ontology and to publish the data in the RDF format that the VIVO system can ingest. Karma enables users to ingest data into VIVO by interacting with an easy-to-use graphical user interface, and does not require knowledge of SPARQL or other Web languages such as XSLT or XQuery. The video shows how to ingest the sample files provided in the VIVO Data Ingest Guide.

Karma at the VIVO'2012 Conference in Miami, Florida: Abstract (PDF), Slides (Powerpoint), Ontology and datasets (zip)

Smithsonian American Art Museum

In this case study we used Karma to convert records of 44,000 of the museum’s holdings to Linked Open Data according to the Europeana Data Model (EDM). The records are stored in several tables in a SQL Server database. Using Karma we modeled these tables in terms of the EDM ontology and converted the data into RDF. We are creating 5-star Linked Data, linked to DBpedia, the Getty Union List of Artist Names (ULAN)® and the NY Times Linked Data. Our paper on this work received the best in-use paper award at the ESWC'2013 conference. The Linked Data is now deployed: each time you visit an artist page on the Smithsonian American Art Museum web site, a SPARQL query is issued to retrieve links to Wikipedia and the NY Times.

Geospatial Data and Services

In this case study we show how Karma could be used to help first responders plan evacuations of affected personnel in the event of a fire in an oil field. We used Karma to integrate publicly available data about oil well locations in MS Excel format, data about personnel locations from a text file, and information about the extent of the fire and the location of care centers from a KML file. In this example, no detailed road network information is available for the region in question, so our software extracted the road network data from a USGS map. In this case study we also show how Karma can invoke services that perform complex geospatial reasoning to 1) subtract from the road network the roads that intersect the region affected by the fire, 2) compute the shortest evacuation path for each person avoiding roads that go through the fire, and 3) perform a simulation of the likely spread of the fire based on wind conditions extracted from a public weather service. Users can perform the information integration tasks, invoke the services interactively and visualize the results on a map using Karma.
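Steps 1 and 2 can be pictured with a small graph sketch. This is an illustration only, not the services used in the case study: the road network, node names and segment lengths are hypothetical, and the geometric test for whether a road intersects the fire region is assumed to have been done already, so the blocked segments are simply given as a list.

```python
import heapq

def remove_roads(roads, blocked):
    """Return a copy of the road network without the blocked segments."""
    blocked = {frozenset(segment) for segment in blocked}
    return {
        node: {n: w for n, w in neighbors.items() if frozenset((node, n)) not in blocked}
        for node, neighbors in roads.items()
    }

def shortest_path(roads, start, goal):
    """Dijkstra over {node: {neighbor: length}}; returns the node list, or None if unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, length in roads.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + length, neighbor, path + [neighbor]))
    return None

# Hypothetical road network: lengths in arbitrary units.
roads = {
    "well": {"junction": 2, "ridge": 5},
    "junction": {"well": 2, "care_center": 2},
    "ridge": {"well": 5, "care_center": 4},
    "care_center": {"junction": 2, "ridge": 4},
}
# Segments assumed to intersect the fire region.
burning = [("junction", "care_center")]

direct_route = shortest_path(roads, "well", "care_center")
safe_route = shortest_path(remove_roads(roads, burning), "well", "care_center")
```

With the burning segment removed, the route switches from the short path through the junction to the longer but passable path over the ridge.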

Integration of Environmental Data

In this case study we used Karma to help environmental scientists construct a model of the metabolism of the Merced river in California. An important bottleneck that the scientists face is preparing the data used to fit and run the models. In this case, data came from the California Data Exchange Center (CDEC), the scientists' own sensors, and weather information from NOAA. The CDEC and NOAA data were accessible via web services, and the scientists' data was available in CSV files. In addition, the data used different formats to represent dates, times and units, had different time resolutions, and contained errors. We used Karma to retrieve, clean, normalize, integrate and publish the data. Karma published the data in the format needed by the executable models, and produced semantic metadata that allowed the WINGS workflow system to help users compose the different parts of the workflow. In addition, Karma exported the data preparation procedures in a script that could be executed every day to produce fresh data. This allowed WINGS to automatically execute the workflow every day. Paper: ISWC'2011.

Rapidly Integrating Services into the Linked Data Cloud

The amount of data available in the Linked Data cloud continues to grow. Yet, few services consume and produce linked data. There is recent work that allows a user to define a linked service from an online service, which includes the specifications for consuming and producing linked data, but building such models is time consuming and requires specialized knowledge of RDF and SPARQL. We present a new approach that allows domain experts to rapidly create semantic models of services by demonstration in an interactive web-based interface. First, the user provides examples of the service request URLs. Then, the system automatically proposes a service model the user can refine interactively. Finally, the system saves a service specification using a new expressive vocabulary that includes lowering and lifting rules. This approach empowers end users to rapidly model existing services and immediately use them to consume and produce linked data. Paper: ISWC'2012.
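The first step, working from example request URLs, can be sketched as follows. This only shows how the endpoint and candidate input parameters might be pulled out of the examples; it is not the system's modeling algorithm, and the URLs, parameter names and function name are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl

def extract_inputs(example_urls):
    """Group example request URLs by endpoint and collect the values seen for each parameter."""
    services = {}
    for url in example_urls:
        parts = urlsplit(url)
        endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
        params = services.setdefault(endpoint, {})
        for name, value in parse_qsl(parts.query):
            params.setdefault(name, []).append(value)
    return services

# Hypothetical example requests supplied by the user.
examples = [
    "http://api.example.org/neighbourhood?lat=40.78&lng=-73.96",
    "http://api.example.org/neighbourhood?lat=34.05&lng=-118.24",
]
inputs = extract_inputs(examples)
```

The sample values collected per parameter are the kind of evidence a system could use to suggest a semantic type for each input, which the user then confirms or corrects.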

Team

  • Craig Knoblock
  • Pedro Szekely
  • Jose Luis Ambite
  • Shubham Gupta
  • Maria Muslea
  • Mohsen Taheriyan
  • Bo Wu
  • Yao-Yi Chiang
  • Vineet Gadodia
  • Shaarif Zia
  • Hao Zhang
  • Jianliang Chen
  • Yuting Liu
  • Shraddha Deshmukh
  • Shrikanth Narayanan
  • Ayush Jaiswal
  • Ying Zhang
  • Animesh Manglik
  • Akshay Ramesh Dani

Publications

A Graph-based Approach to Learn Semantic Descriptions of Data Sources. Taheriyan, M.; Knoblock, C. A.; Szekely, P.; and Ambite, J. L. In Proceedings of the 12th International Semantic Web Conference (ISWC 2013), 2013.
Learning Data Transformation Rules through Examples: Preliminary Results. Wu, B.; Szekely, P.; and Knoblock, C. A. In Ninth International Workshop on Information Integration on the Web (IIWeb 2012), 2012.
Using Conditional Random Fields to Exploit Token Structure and Labels for Accurate Semantic Annotation. Goel, A.; Knoblock, C. A.; and Lerman, K. In Proceedings of the 25th National Conference on Artificial Intelligence (AAAI-11), San Francisco, CA, 2011.
Building Data Integration Queries by Demonstration. Tuchinda, R.; Szekely, P.; and Knoblock, C. A. In Proceedings of the International Conference on Intelligent User Interface, January 2007.
Connecting the Smithsonian American Art Museum to the Linked Data Cloud. Szekely, P.; Knoblock, C. A.; Yang, F.; Zhu, X.; Fink, E.; Allen, R.; and Goodlander, G. In Proceedings of the 10th Extended Semantic Web Conference, Montpellier, May 2013. Awarded Best In-Use Paper at ESWC 2013.
Mapping Existing Data Sources into VIVO. Knoblock, C. A.; Szekely, P.; Muslea, M.; and Gupta, S. August 2012.
Exploiting Semantics of Web Services for Geospatial Data Fusion. Szekely, P.; Knoblock, C. A.; Gupta, S.; Taheriyan, M.; and Wu, B. In Proceedings of the SIGSPATIAL International Workshop on Spatial Semantics and Ontologies (SSO 2011), Chicago, IL, 2011.
Rapidly Integrating Services into the Linked Data Cloud. Taheriyan, M.; Knoblock, C. A.; Szekely, P.; and Ambite, J. L. In Proceedings of the 11th International Semantic Web Conference (ISWC 2012), 2012.
Building Mashups by Demonstration. Tuchinda, R.; Knoblock, C. A.; and Szekely, P. ACM Transactions on the Web (TWEB), 5(3). July 2011.
Exploiting Structure within Data for Accurate Labeling Using Conditional Random Fields. Goel, A.; Knoblock, C. A.; and Lerman, K. In Proceedings of the 14th International Conference on Artificial Intelligence (ICAI), 2012.
Semi-Automatically Mapping Structured Sources into the Semantic Web. Knoblock, C. A.; Szekely, P.; Ambite, J. L.; Gupta, S.; Goel, A.; Muslea, M.; Lerman, K.; and Mallick, P. In Proceedings of the Extended Semantic Web Conference, Crete, Greece, 2012.
Interactively Mapping Data Sources into the Semantic Web. Knoblock, C. A.; Szekely, P.; Ambite, J. L.; Gupta, S.; Goel, A.; Muslea, M.; Lerman, K.; and Mallick, P. In Proceedings of the First International Workshop on Linked Science 2011 in Conjunction with the 10th International Semantic Web Conference, Bonn, Germany, 2011.
Building Geospatial Mashups to Visualize Information for Crisis Management. Gupta, S.; and Knoblock, C. A. In Proceedings of the 7th International Conference on Information Systems for Crisis Response and Management, 2010.
Semi-Automatically Modeling Web APIs to Create Linked APIs. Taheriyan, M.; Knoblock, C. A.; Szekely, P.; and Ambite, J. L. In Proceedings of the ESWC 2012 Workshop on Linked APIs, 2012.
Mind Your Metadata: Exploiting Semantics for Configuration, Adaptation, and Provenance in Scientific Workflows. Gil, Y.; Szekely, P.; Villamizar, S.; Harmon, T. C.; Ratnakar, V.; Gupta, S.; Muslea, M.; Silva, F.; and Knoblock, C. A. In Proceedings of the 10th International Semantic Web Conference (ISWC 2011), 2011.
Building Mashups by Example. Tuchinda, R.; Szekely, P.; and Knoblock, C. A. In Proceedings of the 2008 International Conference on Intelligent User Interface, January 2008.