USC/ISI Automatic Data Extraction Demo

Step 6: Find the correct examples of the ISBN field

After the extracts are clustered, each cluster is scored by how similar it is to the set of training examples. The clusters are then printed in descending order of score. Usually the top cluster contains the correct examples of the data field, and only those.
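The page does not say how the similarity score is computed. The sketch below assumes one possible approach. Each extract is reduced to a coarse character-class pattern (digits become `9`, letters become `A`). A cluster's score is then the sum, over its extracts, of the best `difflib.SequenceMatcher` ratio against any training example's pattern. Clusters are sorted in descending order of score. The names `shape`, `cluster_score`, and `rank_clusters`, the pattern encoding, and the use of `SequenceMatcher` are all hypothetical stand-ins, not the demo's actual method.

```python
from difflib import SequenceMatcher


def shape(token: str) -> str:
    """Hypothetical feature: map each character to a coarse class pattern."""
    out = []
    for ch in token:
        if ch.isdigit():
            out.append("9")      # any digit
        elif ch.isalpha():
            out.append("A")      # any letter
        else:
            out.append(ch)       # keep punctuation such as '-' as-is
    return "".join(out)


def cluster_score(cluster: list[str], training: list[str]) -> float:
    """Assumed scoring rule: sum of each extract's best similarity to a training example."""
    train_shapes = [shape(t) for t in training]
    total = 0.0
    for extract in cluster:
        s = shape(extract)
        total += max(SequenceMatcher(None, s, t).ratio() for t in train_shapes)
    return total


def rank_clusters(clusters: list[list[str]], training: list[str]) -> list[tuple[float, list[str]]]:
    """Return (score, cluster) pairs in descending order of score."""
    scored = [(cluster_score(c, training), c) for c in clusters]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Illustrative inputs only; these are not data from the demo.
training = ["0-13-110362-8", "0-262-03384-4"]
clusters = [
    ["0-201-63361-2", "1-55860-880-2"],  # ISBN-like extracts
    ["$12.95", "$8.00"],                 # prices
    ["Hardcover"],                       # binding type
]

for rank, (score, cluster) in enumerate(rank_clusters(clusters, training)):
    print(f"Cluster {rank}, score {score:.4f}: {cluster}")
```

With these made-up inputs, the ISBN-like cluster ranks first because its extracts share the digit-and-hyphen pattern of the training examples. The `Hardcover` cluster scores 0 because its pattern has no characters in common with them.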


ISBNs identified by the algorithm are highlighted in red and shown in context.

Cluster 0, score 18.0000


Copyright: USC Information Sciences Institute 2000