Inducing Source Definitions: System Demo

This is a simple offline demo of our source induction system, which learns a definition for a new service from a set of known services. For an explanation of the research work underpinning this demo, click here.

System Input

Input to the system consists of a mediated schema (a set of semantic types and domain relations), a set of source definitions, and a target source (the source whose definition is to be learnt). The input for a particular problem is shown below. The corresponding system input file can be found here.
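As a rough illustration of what such an input contains, the sketch below represents literals and Datalog-style source definitions as Python dataclasses. The class and field names (`Literal`, `Definition`, `Problem`) are hypothetical and are not taken from the system's actual input-file format. Following the notation used later on this page, a `$` prefix marks an input (bound) argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A predicate applied to argument names, e.g. DistanceBetweenZips(zip2,zip1,dist2)."""
    predicate: str
    args: tuple[str, ...]

    def __str__(self) -> str:
        return f"{self.predicate}({','.join(self.args)})"


@dataclass(frozen=True)
class Definition:
    """A conjunctive (Datalog-style) rule: head :- body1, body2, ..."""
    head: Literal
    body: tuple[Literal, ...] = ()

    def __str__(self) -> str:
        if not self.body:
            return f"{self.head}."
        return f"{self.head} :- {', '.join(str(lit) for lit in self.body)}."


@dataclass
class Problem:
    """Hypothetical container for one induction problem (not the real file format)."""
    semantic_types: set[str]
    domain_predicates: dict[str, tuple[str, ...]]  # name -> semantic type of each argument
    comparison_predicates: set[str]                # e.g. "<"
    sources: list[Definition]                      # definitions of the known sources
    target: Literal                                # head of the source to be learnt


# The definition learnt later in this demo, expressed with these classes.
learned = Definition(
    head=Literal("ZipCodesWithin", ("$zip1", "$dist1", "zip2", "dist2")),
    body=(
        Literal("DistanceBetweenZips", ("zip2", "zip1", "dist2")),
        Literal("<", ("dist2", "dist1")),
    ),
)
print(learned)
# prints ZipCodesWithin($zip1,$dist1,zip2,dist2) :- DistanceBetweenZips(zip2,zip1,dist2), <(dist2,dist1).
```

Making the classes frozen keeps definitions hashable, which is convenient if a search needs to remember which candidates it has already scored.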

The Mediated Schema

Semantic Types:
Domain Predicates:
Comparison Predicates:

The Sources

The Target Predicate



The Search

The system performs a best-first search through the space of plausible source definitions, evaluating each candidate definition by invoking the relevant sources. The different levels of the search are shown below. At each level, the best-performing partial definition is expanded, and the score of each resulting definition is calculated (shown in front of the definition).
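The level-by-level procedure just described can be sketched as follows. This is a minimal sketch, assuming the caller supplies an `expand` function (generating the next-level refinements of a definition) and a `score` function (which, in the real system, would invoke the sources and compare their outputs); both names are hypothetical. It mirrors the behaviour described on this page, expanding only the best definition of each level and stopping once a level brings no improvement. A full best-first search would additionally keep a frontier of unexpanded alternatives.

```python
from typing import Callable, Iterable, TypeVar

D = TypeVar("D")


def levelwise_search(
    start: D,
    expand: Callable[[D], Iterable[D]],
    score: Callable[[D], float],
) -> tuple[D, float]:
    """Expand the best definition of each level; stop when a level brings no improvement."""
    best, best_score = start, score(start)
    while True:
        # Score every expansion of the current best definition.
        scored = [(score(child), child) for child in expand(best)]
        if not scored:
            return best, best_score
        # Compare on score only, so definitions never need to be orderable.
        level_score, level_best = max(scored, key=lambda pair: pair[0])
        if level_score <= best_score:
            # The next level did not improve on the current best: return it.
            return best, best_score
        best, best_score = level_best, level_score


# Toy usage: "definitions" are integers and the score peaks at 7.
best, best_score = levelwise_search(0, lambda n: [n + 1, n + 2], lambda n: -(n - 7) ** 2)
print(best, best_score)  # prints "7 0"
```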

In the output below, the search is performed over combinations of the source predicates rather than the domain predicates. Doing so eliminates the need to perform query reformulation, but would be far less efficient in domains where multiple known sources provide the same or similar functionality, since each such source would produce its own, largely redundant, candidate definitions. In this domain, searching over the source predicates does not change the results.

Search Level 0

Expanding: .

Search Level I

Expanding: ZipCodesWithin(_,_,_,_).

Search Level II

Expanding: ZipCodesWithin(X0,_,X2,X3) :- DistanceBetweenZips(X2,X0,X3).
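The Level I candidates, one of which is the definition expanded at Level II, come from adding a source literal to the empty definition `ZipCodesWithin(_,_,_,_)`. The sketch below enumerates such candidates by binding each argument of the new literal to a distinct head position of the same semantic type, naming head position i as `Xi` as in the output above. The semantic type names (`zip`, `distance`) are assumptions, since the mediated schema is not reproduced here, and the sketch ignores repeated variables, body-only variables and input/output binding constraints that a real system would also handle.

```python
from itertools import permutations
from typing import Sequence


def candidate_definitions(
    head_name: str,
    head_types: Sequence[str],
    pred_name: str,
    pred_types: Sequence[str],
) -> list[str]:
    """Enumerate one-literal definitions whose body arguments are bound to
    distinct head positions of matching semantic type."""
    candidates = []
    for positions in permutations(range(len(head_types)), len(pred_types)):
        # Skip bindings that would join arguments of different semantic types.
        if any(head_types[p] != t for p, t in zip(positions, pred_types)):
            continue
        head_args = ["_"] * len(head_types)
        for p in positions:
            head_args[p] = f"X{p}"
        body_args = [f"X{p}" for p in positions]
        candidates.append(
            f"{head_name}({','.join(head_args)}) :- {pred_name}({','.join(body_args)})."
        )
    return candidates


# Assumed semantic types for the target and for the known source.
head_types = ("zip", "distance", "zip", "distance")
pred_types = ("zip", "zip", "distance")
for c in candidate_definitions("ZipCodesWithin", head_types, "DistanceBetweenZips", pred_types):
    print(c)
```

Of the 24 ways to place three arguments into four head positions, the type check leaves only 4, including `ZipCodesWithin(X0,_,X2,X3) :- DistanceBetweenZips(X2,X0,X3).`; this is one reason semantic types are part of the mediated schema.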

The last candidate evaluated at search level II has the highest score at that level and is not improved upon by the next level of search (not shown), so it is returned by the best-first search. Giving the variables more intuitive names, the definition can be rewritten as:

ZipCodesWithin($zip1,$dist1,zip2,dist2) :- DistanceBetweenZips(zip2,zip1,dist2), <(dist2,dist1).
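To make the meaning of this rule concrete, the sketch below evaluates it over a toy, in-memory stand-in for the `DistanceBetweenZips` source. The zip codes (`Z1` to `Z4`) and distances are made up for illustration and the function name is hypothetical; the actual system obtains such tuples by invoking the source itself.

```python
# Made-up distances between made-up zip codes (illustrative only, not real data).
_PAIRS = {("Z1", "Z2"): 3.0, ("Z1", "Z3"): 12.5, ("Z1", "Z4"): 7.0, ("Z2", "Z3"): 10.0}
# Store both orientations so the toy relation is symmetric.
DISTANCE_BETWEEN_ZIPS = {(a, b, d) for (a, b), d in _PAIRS.items()} | {
    (b, a, d) for (a, b), d in _PAIRS.items()
}


def zip_codes_within(zip1: str, dist1: float) -> set[tuple[str, float]]:
    """ZipCodesWithin($zip1,$dist1,zip2,dist2) :-
    DistanceBetweenZips(zip2,zip1,dist2), <(dist2,dist1)."""
    return {
        (zip2, dist2)
        for (zip2, z, dist2) in DISTANCE_BETWEEN_ZIPS
        if z == zip1 and dist2 < dist1  # join on zip1, then apply the comparison
    }


print(sorted(zip_codes_within("Z1", 8.0)))  # prints [('Z2', 3.0), ('Z4', 7.0)]
```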

This definition states that the source returns the zip codes (zip2) that lie within a given radius (dist1) of the input zip code (zip1), along with their respective distances (dist2). This is indeed the correct definition for the target predicate.
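Since candidates are evaluated by invoking sources, one way to turn that evaluation into a number is to run the target source and a candidate definition on the same inputs and measure how far their output tuples agree. The sketch below assumes a simple scoring function, the average Jaccard overlap of the two output sets across sampled inputs; this choice, the function names and the toy tuples are assumptions for illustration, and the system's actual scoring function may differ. It shows why the Level II definition without the comparison scores lower than the final definition with `<(dist2,dist1)`.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of output tuples (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def score(target_outputs: dict, candidate_outputs: dict) -> float:
    """Average Jaccard overlap over the sampled inputs of the target source."""
    if not target_outputs:
        raise ValueError("need at least one sampled input")
    total = sum(
        jaccard(out, candidate_outputs.get(inp, set()))
        for inp, out in target_outputs.items()
    )
    return total / len(target_outputs)


# Hypothetical observations for the input ($zip1, $dist1) = ("Z1", 8.0).
target = {("Z1", 8.0): {("Z2", 3.0), ("Z4", 7.0)}}
without_comparison = {("Z1", 8.0): {("Z2", 3.0), ("Z4", 7.0), ("Z3", 12.5)}}
with_comparison = {("Z1", 8.0): {("Z2", 3.0), ("Z4", 7.0)}}

print(score(target, without_comparison))  # 2 of 3 tuples agree
print(score(target, with_comparison))     # prints 1.0
```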


For questions or comments regarding this demo, please contact Mark Carman.