Proposes tuple-level matches between two relations based on transformation weights and mapping rules.
consolidate (rel1, rel2, rules, weights, conditions : proposed-matches)
Returns all tuples, each made up of the attributes of a rel1 record and a rel2 record, that are marked as mapped. The transformation weights and mapping rules are passed to the operator as relations and, together with the consolidation conditions, are used to map records from rel1 to matching records in rel2.
· If any transformation weight isn’t specified, its value will be set to 0.
· The name of a column used in the weights relation must match the name of the rel1 column used in the condition expression.
· The column names used in the rule strings stored in the rules relation must match the column names used in the weights relation.
· In the conditions, the left-hand side of an expression is assumed to belong to rel1 and the right-hand side of the expression to rel2.
· Within a rule string in the rules relation, the terms of a mapping rule can be joined only with AND.
· Supported transformation weights are: Equality, Prefix, Suffix, Word Add, Word Drop, Substring, Abbreviation, Soundex.
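The notes above can be illustrated with a minimal Python sketch. It is not the operator's implementation: the helper names (parse_conditions, weight_for, parse_rule, rule_holds) are hypothetical, and so is the "scores" dictionary. This page does not describe how transformation weights are combined into a per-column score, so the sketch takes the scores as given. It also assumes that rule terms use only the < and > comparisons that appear in the example rules.

```python
import re

def parse_conditions(conditions):
    """Split "a=b, c=d" into (rel1_column, rel2_column) pairs.

    The left-hand side of each expression belongs to rel1,
    the right-hand side to rel2.
    """
    pairs = []
    for term in conditions.split(","):
        left, right = term.split("=")
        pairs.append((left.strip(), right.strip()))
    return pairs

def weight_for(weights, transformation, column):
    """Look up a weight; a weight that is not specified counts as 0."""
    return weights.get((transformation.upper(), column), 0)

# One rule term: column name, comparison, numeric threshold (assumed grammar).
_TERM = re.compile(r"^\s*(\w+)\s*(<|>)\s*([0-9]*\.?[0-9]+)\s*$")

def parse_rule(rule_string):
    """Split a rule string on AND into (column, operator, threshold) terms."""
    terms = []
    for part in re.split(r"\s+AND\s+", rule_string.strip()):
        match = _TERM.match(part)
        if match is None:
            raise ValueError(f"unsupported rule term: {part!r}")
        terms.append((match.group(1), match.group(2), float(match.group(3))))
    return terms

def rule_holds(terms, scores):
    """Return True when every AND-ed term holds for the given column scores.

    `scores` is a hypothetical mapping of column name to score.
    """
    for column, op, threshold in terms:
        score = scores.get(column, 0)
        if op == ">" and not score > threshold:
            return False
        if op == "<" and not score < threshold:
            return False
    return True
```

For instance, parse_rule("name>0.765 AND zip>0.88") yields two terms, and rule_holds accepts a record pair only if both terms hold for its scores.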
None.
Using the input:
RELATION people1: name char, street char, city char, zip number
John Smith|11 Street St.|Los Angeles|90007
Steve Jones|65 Vent Blvd|Santa Monica|90034
Jack Johnson|88291 Off Road|Jackson| 82002
RELATION people2: name char, street char, state char, zip number
Smith, John|11 Street Street|CA|90007
Steve Collins|23 Vent St.|CA|90045
J Johnson|88291 Off Road|WY|82002
Sarah Kjoberg|443 Cool Lane|MN|55455
RELATION trans_weights: transformation char, column char, weight number
EQUALITY|name|0.87
EQUALITY|street|0.465
EQUALITY|zip|0.899
ABBREVIATION|name|0.22
ABBREVIATION|street|0.886
ABBREVIATION|zip|0
RELATION mapping_rules: rule_string char, mapped char
name>0.765 AND zip>0.88|Yes
street<0.65 AND zip<0.44|No
when executing the plan:
PLAN test
{
INPUT: stream people1, stream people2, stream mapping_rules, stream trans_weights
OUTPUT: stream matches
BODY
{
consolidate (people1, people2, mapping_rules, trans_weights, "name=name, street=street, zip=zip" : matches)
}
}
will generate the following output:
---------------------------------------------------
RELATION: test_matches
attrs: rec1name, rec1street, rec1city, rec1zip,
rec2name, rec2street, rec2state, rec2zip
---------------------------------------------------
John Smith|11 Street St.|Los Angeles|90007|Smith, John|11 Street Street|CA|90007
Jack Johnson|88291 Off Road|Jackson| 82002|J Johnson|88291 Off Road|WY|82002
---------------------------------------------------
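The layout of the output relation can be sketched in Python as follows. The naming pattern is inferred from this one example only: each rel1 attribute appears prefixed with rec1, each rel2 attribute with rec2, and the relation name looks like the plan name and the output stream name joined by an underscore. The function names are hypothetical and not part of the product.

```python
def output_relation_name(plan_name, stream_name):
    """Relation name as it appears in the example: <plan>_<stream> (inferred)."""
    return f"{plan_name}_{stream_name}"

def output_attrs(rel1_attrs, rel2_attrs):
    """Prefix rel1 attributes with rec1 and rel2 attributes with rec2."""
    return ["rec1" + a for a in rel1_attrs] + ["rec2" + a for a in rel2_attrs]

def output_row(rec1, rec2):
    """Concatenate a matched pair of records into one pipe-delimited row."""
    return "|".join(list(rec1) + list(rec2))

attrs = output_attrs(
    ["name", "street", "city", "zip"],
    ["name", "street", "state", "zip"],
)
row = output_row(
    ["John Smith", "11 Street St.", "Los Angeles", "90007"],
    ["Smith, John", "11 Street Street", "CA", "90007"],
)
```

Here attrs holds the eight attribute names listed in the output header, and row reproduces the first output line of the example.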