Consolidate

Proposes tuple-level matches between two relations, based on transformation weights and mapping rules.

 

Usage

consolidate (rel1, rel2, rules, weights, conditions : proposed-matches)

 

Details

Returns the combined attributes of rel1 and rel2 for every pair of tuples that is marked as mapped. The transformation weights and mapping rules are passed to the operator as relations and, together with the consolidation conditions, are used to map records from rel1 to matching records in rel2.
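The conditions string pairs a column of rel1 with a column of rel2. A minimal Python sketch of how such a string could be split into column pairs is shown here. The function name parse_conditions is hypothetical and is not part of the operator; the only behavior taken from this page is that the left-hand side of each expression belongs to rel1 and the right-hand side to rel2.

```python
def parse_conditions(conditions):
    """Split 'a=b, c=d' into [(rel1_column, rel2_column), ...].

    Illustrative helper only; the operator's real parser is not documented.
    """
    pairs = []
    for clause in conditions.split(","):
        clause = clause.strip()
        if not clause:
            continue
        left, sep, right = clause.partition("=")
        if not sep or not left.strip() or not right.strip():
            raise ValueError(f"malformed condition: {clause!r}")
        # Left-hand side names a rel1 column, right-hand side a rel2 column.
        pairs.append((left.strip(), right.strip()))
    return pairs


print(parse_conditions("name=name, street=street, zip=zip"))
# prints [('name', 'name'), ('street', 'street'), ('zip', 'zip')]
```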

 

Notes

·  If any transformation weight isn’t specified, its value will be set to 0.

·  Each column name used in the weights relation must match the rel1 column name used in the condition expression.

·  The column names used in the rule strings stored in the rules relation must match the column names used in the weights relation.

·  In the conditions, the left-hand side of an expression is assumed to belong to rel1 and the right-hand side of the expression to rel2.

·  Within a rule string in the rules relation, comparisons can only be joined with AND.

·  Supported transformation weights are: Equality, Prefix, Suffix, Word Add, Word Drop, Substring, Abbreviation, Soundex.
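Two of the notes above can be illustrated with a short Python sketch: an unspecified weight falling back to 0, and a rule string being split on AND. All names here (weight_for, parse_rule, rule_holds) are hypothetical. The sketch assumes rule terms use only the < and > comparisons seen in the example on this page, and it takes the per-column scores as given, because this page does not describe how the operator derives them from the weights.

```python
import re

# Rows of a weights relation: (transformation, column, weight).
TRANS_WEIGHTS = [
    ("EQUALITY", "name", 0.87),
    ("EQUALITY", "street", 0.465),
    ("EQUALITY", "zip", 0.899),
    ("ABBREVIATION", "name", 0.22),
    ("ABBREVIATION", "street", 0.886),
    ("ABBREVIATION", "zip", 0.0),
]


def weight_for(weights, transformation, column):
    """Return the weight for (transformation, column), or 0 if unspecified."""
    table = {(t.upper(), c): w for t, c, w in weights}
    return table.get((transformation.upper(), column), 0.0)


# One comparison inside a rule string, e.g. "name>0.765".
# Assumption: only "<" and ">" occur, as in the example rules.
_TERM = re.compile(r"^\s*(\w+)\s*(<|>)\s*([0-9]*\.?[0-9]+)\s*$")


def parse_rule(rule_string):
    """Split a rule string on AND into (column, operator, threshold) terms."""
    terms = []
    for part in re.split(r"\s+AND\s+", rule_string.strip()):
        match = _TERM.match(part)
        if match is None:
            raise ValueError(f"unsupported rule term: {part!r}")
        terms.append((match.group(1), match.group(2), float(match.group(3))))
    return terms


def rule_holds(terms, scores):
    """True when every term is satisfied by the given per-column scores."""
    for column, op, threshold in terms:
        score = scores.get(column, 0.0)
        if op == ">" and not score > threshold:
            return False
        if op == "<" and not score < threshold:
            return False
    return True


print(weight_for(TRANS_WEIGHTS, "SOUNDEX", "name"))  # unspecified weight
print(parse_rule("name>0.765 AND zip>0.88"))
```

Because the terms of a rule are joined only by AND, rule_holds can stop at the first term that fails.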

 

Known Bugs

None.

 

Example

Using the input:

RELATION people1: name char, street char, city char, zip number

John Smith|11 Street St.|Los Angeles|90007

Steve Jones|65 Vent Blvd|Santa Monica|90034

Megan Overton|7 Cons Street|Minneapolis|55414

Jack Johnson|88291 Off Road|Jackson| 82002

Fred Philly|5469 Giza St.|Seattle|98195

 

RELATION people2: name char, street char, state char, zip number

Smith, John|11 Street Street|CA|90007

Steve Collins|23 Vent St.|CA|90045

J Johnson|88291 Off Road|WY|82002

Sarah Kjoberg|443 Cool Lane|MN|55455

 

RELATION trans_weights: transformation char, column char, weight number

        EQUALITY|name|0.87

        EQUALITY|street|0.465

        EQUALITY|zip|0.899

        ABBREVIATION|name|0.22

        ABBREVIATION|street|0.886

        ABBREVIATION|zip|0

 

RELATION mapping_rules: rule_string char, mapped char

        name>0.765 AND zip>0.88|Yes

        street<0.65 AND zip<0.44|No

        

executing the plan:

PLAN test

{

INPUT: stream people1, stream people2, stream mapping_rules, stream trans_weights

OUTPUT: stream matches

BODY

{

            consolidate (people1, people2, mapping_rules, trans_weights, "name=name, street=street, zip=zip" : matches)

}

}

will generate the following output:

---------------------------------------------------

RELATION: test_matches

attrs: rec1name, rec1street, rec1city, rec1zip,

        rec2name, rec2street, rec2state, rec2zip

---------------------------------------------------

John Smith|11 Street St.|Los Angeles|90007|Smith, John|11 Street Street|CA|90007

Jack Johnson|88291 Off Road|Jackson| 82002|J Johnson|88291 Off Road|WY|82002

---------------------------------------------------
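The output relation above carries every attribute of both inputs, with rec1 and rec2 prefixed to the column names. A small Python sketch that reproduces this naming pattern and row layout follows. The helper names are hypothetical, and the sketch only mirrors what is visible in the example; it is not the operator's actual output routine.

```python
def column_names(schema):
    """Extract column names from a schema such as 'name char, zip number'."""
    return [field.split()[0] for field in schema.split(",")]


def output_attrs(schema1, schema2):
    """Prefix rel1 columns with 'rec1' and rel2 columns with 'rec2'."""
    return (["rec1" + name for name in column_names(schema1)]
            + ["rec2" + name for name in column_names(schema2)])


def joined_row(row1, row2):
    """Concatenate two pipe-delimited rows into one output row."""
    return "|".join(row1.split("|") + row2.split("|"))


attrs = output_attrs("name char, street char, city char, zip number",
                     "name char, street char, state char, zip number")
print(", ".join(attrs))
print(joined_row("John Smith|11 Street St.|Los Angeles|90007",
                 "Smith, John|11 Street Street|CA|90007"))
```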