USC/ISI Automatic Data Extraction Demo

Step 6: Find correct examples of CITY

After the extracts are clustered, each cluster is scored by how similar it is to the set of training examples, and the clusters are printed in descending order of score. Usually, the top cluster contains the correct examples of the data field, and only the correct examples.
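
The score-and-rank step described above can be sketched roughly as follows. This is only an illustration, not the demo's actual method: the page does not say which similarity measure is used, so the sketch assumes a simple token-overlap (Jaccard) measure, and its scores fall between 0 and 1 rather than on the scale of the scores printed on this page. The function names, the training examples, and cluster 12 are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set overlap between two token sets, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def score_cluster(cluster: List[str], training: List[str]) -> float:
    """Assumed scoring rule: average, over the cluster's extracts,
    of each extract's best similarity to any training example."""
    if not cluster or not training:
        return 0.0
    train_tokens = [set(t.lower().split()) for t in training]
    total = 0.0
    for extract in cluster:
        tokens = set(extract.lower().split())
        total += max(jaccard(tokens, t) for t in train_tokens)
    return total / len(cluster)


def rank_clusters(
    clusters: Dict[int, List[str]], training: List[str]
) -> List[Tuple[int, float]]:
    """Return (cluster_id, score) pairs in descending order of score."""
    scored = [(cid, score_cluster(members, training))
              for cid, members in clusters.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical inputs: the training examples and cluster 12 are made up.
training_examples = ["Brooklyn", "Orlando"]
clusters = {
    37: ["Brooklyn", "Memphis", "Orlando"],
    12: ["NY", "TN", "FL"],
}

ranking = rank_clusters(clusters, training_examples)
for cluster_id, score in ranking:
    print(f"Cluster {cluster_id}, score {score:.4f}")
```

With these made-up inputs, the cluster of city names ranks above the cluster of state abbreviations, which mirrors the behaviour described above: the cluster most similar to the training examples is printed first.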

Cities identified by the algorithm are highlighted in red and shown in context.

Cluster 37, score 15.0000
From Page 4
    T Schumer Brooklyn , NY
    Sheila Schumer 100 Caton Ave Brooklyn , NY
    Marvin Schumer 1938 82nd St Brooklyn , NY
    Charles Schumer 9 Prospect Park W Brooklyn , NY
    Charles Schumer 1628 Kings Hwy Brooklyn , NY
From Page 3
    E Presley 1138 N Mcneil St Memphis , TN
From Page 1
    A Smith Orlando , FL
    A Smith 777 W Lancaster Rd Orlando , FL
    A Smith 539 El Vedado Ave Orlando , FL
    A Smith 5118 City St Orlando , FL
    A Smith 5 Channing Ave Orlando , FL
    A Smith 3214 Dupree Ave Orlando , FL
    A Smith 2633 Bent Willow Cir Orlando , FL
    A Smith 206 E Concord St Orlando , FL
    A Smith 20 W Lucerne Cir Orlando , FL
    A Smith 1041 Colyer St Orlando , FL


Copyright: USC Information Sciences Institute 2000