Linking and Building Ontologies of Linked Data
Rahul Parundekar, Craig A. Knoblock and José-Luis Ambite
University of Southern California,
Information Sciences Institute
4676 Admiralty Way, Marina del Rey, CA 90292
{parundek,knoblock,ambite}@isi.edu
This page provides the dataset used in the paper on Linking and Building Ontologies of Linked Data.
This dataset is organized as follows:
- Each of the 5 source pairs discussed in the paper has a compressed file containing the instance pairs given as input to the algorithm and the alignments the algorithm generated from them.
- Each compressed file contains 3 comma-separated values (*.csv) files: the instance pairs, the alignments before post-processing, and the alignments after post-processing. In each of these files, two data sources, source1 and source2 (which may be the same), are being aligned.
- The instancepairs_source1_source2 file:
- Each row in the file represents an instance pair, which is a join of the flattened property-value pairs of the instances from each source (see the paper). The join is on the property that asserts instance equivalence.
- The first line in the csv file lists the properties of each source. It is of the form
uri_1, property1_source1, ..., propertyn_source1, uri_2, property1_source2, ..., propertym_source2
- Every other line in the file contains the URI of the instance from the first source, followed by the value of each of its properties under the corresponding column ('?' if no value exists), and then a similar vector for the instance from the second source.
- Preprocessing has already been performed on these instances.
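As an illustration of the row layout described above, here is a minimal sketch that splits each instance-pair row into its two halves and treats '?' as a missing value. It assumes the second URI column is literally named uri_2 (the name can be passed in if the real header differs), and the sample property names and URIs are hypothetical, not taken from the dataset.

```python
import csv
import io

MISSING = "?"  # the marker this dataset uses when a property has no value


def split_instance_pairs(csv_text, second_uri_column="uri_2"):
    """Return a list of (source1_dict, source2_dict) tuples, one per row.

    second_uri_column is an assumption: it names the header cell where the
    columns of the second source begin.
    """
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader)
    split_at = header.index(second_uri_column)
    pairs = []
    for row in reader:
        # Map the '?' placeholder to None so missing values are explicit.
        cells = [None if value == MISSING else value for value in row]
        left = dict(zip(header[:split_at], cells[:split_at]))
        right = dict(zip(header[split_at:], cells[split_at:]))
        pairs.append((left, right))
    return pairs


# Hypothetical two-line file, used only to exercise the function.
sample = (
    "uri_1, name, population, uri_2, label, country\n"
    "http://example.org/a/1, Springfield, ?, http://example.org/b/9, Springfield, US\n"
)
pairs = split_instance_pairs(sample)
```

To read one of the real files, pass its contents (for example from open(path, newline="").read()) instead of the sample string.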
- The alignments_source1_source2 file:
- Each row in the file represents an alignment hypothesis generated by the algorithm, along with the statistics that support it.
- The first line in the csv file contains the column headings.
The columns in this file are
- Restriction class from Ontology 1: (R1) property-value pairs representing restriction class from Source/Ontology 1
- Restriction class from Ontology 2: (R2) property-value pairs representing restriction class from Source/Ontology 2
- |Img(R1) int R2| / |Img(R1)|: support score for the alignment from the first source. (See R from paper in Fig. 5 Metrics)
- |Img(R1) int R2| / |R2|: support score for the alignment from the second source. (See P from paper in Fig. 5 Metrics)
- Relation: Equivalent, R1 subset R2 or R2 subset R1
- Size of Intersection
- Size of Restriction 1
- Size of Restriction 2
- These alignments were produced by the algorithm described in the paper.
- Important Note: Post-processing has not yet been applied to the alignments in this file.
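The two support scores can be sanity-checked against the three size columns. The sketch below assumes that 'Size of Restriction 1' corresponds to |Img(R1)| and 'Size of Restriction 2' to |R2|; that mapping is an assumption made for illustration and should be checked against the paper. The numbers in the example call are made up.

```python
def support_scores(size_intersection, size_r1, size_r2):
    """Recompute the two support ratios from the size columns.

    Assumption: size_r1 stands for |Img(R1)| and size_r2 for |R2|.
    Returns (score_from_first_source, score_from_second_source); a score is
    None when its denominator is zero.
    """
    from_first = size_intersection / size_r1 if size_r1 else None
    from_second = size_intersection / size_r2 if size_r2 else None
    return from_first, from_second


# Hypothetical sizes: 90 shared instances, 100 in R1, 120 in R2.
r_score, p_score = support_scores(90, 100, 120)
```

Comparing the recomputed values with the score columns in a row is a quick way to confirm which size column feeds which ratio.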
- The results_source1_source2 file:
- Each row in the file represents an alignment generated by the algorithm after post-processing, along with its supporting statistics.
- The columns in the file are similar to the alignments_source1_source2 file.
These columns are:
- Restriction class from Ontology 1: (R1) property-value pairs representing restriction class from Source/Ontology 1
- Restriction class from Ontology 2: (R2) property-value pairs representing restriction class from Source/Ontology 2
- |Img(R1) int R2| / |Img(R1)|: support score for the alignment from the first source. (See R from paper in Fig. 5 Metrics)
- |Img(R1) int R2| / |R2|: support score for the alignment from the second source. (See P from paper in Fig. 5 Metrics)
- Relation: Equivalent, Similar, R1 subset R2 or R2 subset R1
- Size of Intersection
- Size of Restriction 1
- Size of Restriction 2
- These are the alignments that result after post-processing.
- Sorting the .csv file on 'Relation' (alphabetical) and then 'Size of Intersection' (descending) gives a clear picture of the alignments generated. One can use any spreadsheet editor that handles .csv files
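For readers who prefer a script to a spreadsheet, the same sort can be sketched with Python's standard csv module. The sketch assumes the header cells are spelled exactly 'Relation' and 'Size of Intersection'; the sample rows are hypothetical.

```python
import csv
import io


def sort_alignments(csv_text):
    """Sort rows by Relation (alphabetical), then Size of Intersection (descending).

    Assumption: the header cells are exactly 'Relation' and
    'Size of Intersection'.
    """
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = list(reader)
    # Negating the integer size gives descending order inside each relation.
    rows.sort(key=lambda r: (r["Relation"], -int(r["Size of Intersection"])))
    return rows


# Hypothetical rows, reduced to the two columns that drive the sort.
sample = (
    "Relation,Size of Intersection\n"
    "R1 subset R2,40\n"
    "Equivalent,10\n"
    "Equivalent,250\n"
)
sorted_rows = sort_alignments(sample)
```

The sorted rows can be written back out with csv.DictWriter if a sorted copy of the file is wanted.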
Dataset Files:
If you have questions or comments, please contact me (Rahul Parundekar) at parundek at usc.edu, or any of the authors listed on the paper.