USC/ISI Automatic Data Extraction Demo

Step 6: Find correct examples of AUTHOR

After the extracts are clustered, each cluster is scored by how similar it is to the set of training examples, and the clusters are printed in descending order of score. The top cluster usually contains the correct examples of the data field, and only the correct examples.
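The scoring and ranking step can be sketched in a few lines of Python. This is only an illustration, not the demo's actual implementation: the page does not say how similarity to the training examples is computed, so the sketch assumes a simple stand-in measure (the number of extracts in a cluster whose coarse token pattern matches that of some training example). The names Cluster, token_shape, score_cluster and rank_clusters are hypothetical, as are the sample inputs.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    cluster_id: int
    extracts: list[str]


def token_shape(text: str) -> tuple[str, ...]:
    """Map each whitespace-separated token to a coarse type."""
    shapes = []
    for tok in text.split():
        if tok[0].isupper():
            shapes.append("Cap")      # capitalized word or initial
        elif tok.isalpha():
            shapes.append("lower")    # lowercase word
        elif tok.isdigit():
            shapes.append("num")      # number
        else:
            shapes.append("punct")    # punctuation or mixed token
    return tuple(shapes)


def score_cluster(cluster: Cluster, training_examples: list[str]) -> float:
    """Assumed similarity: count extracts whose shape matches a training example."""
    known = {token_shape(example) for example in training_examples}
    return float(sum(1 for e in cluster.extracts if token_shape(e) in known))


def rank_clusters(clusters: list[Cluster], training_examples: list[str]):
    """Return (score, cluster) pairs in descending order of score."""
    scored = [(score_cluster(c, training_examples), c) for c in clusters]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hypothetical training examples and clusters, for illustration only.
training = ["Toni Morrison", "John H . Holland"]
clusters = [
    Cluster(88, ["Paperback", "Hardcover"]),
    Cluster(122, ["Charles Frazier", "Edward Ball", "Tom M . Mitchell"]),
]

for score, cluster in rank_clusters(clusters, training):
    print(f"Cluster {cluster.cluster_id}, score {score:.4f}")
```

With these made-up inputs the cluster of author-like extracts is printed first, which mirrors the ordering of the listing on this page. A real system would likely use a richer similarity measure than exact pattern matching.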

Author names identified by the algorithm are highlighted in red and shown in context.

Cluster 122, score 16.0000
From Page 1
    by Toni Morrison ( Afterword )
From Page 2
    by Charles Frazier
From Page 3
    by Edward Ball
From Page 4
    by Fo Dario , Dario Fo ,
    Fo Dario , Dario Fo , Rupert Lowe (
    Dario Fo , Rupert Lowe ( Translator )
From Page 5
    by Barry Unsworth
From Page 6
    by Jared Diamond
From Page 7
    by Margaret Laurence
From Page 8
    by Fannie Flagg
From Page 9
    by Edward Osborne Wilson , Sarah Landry (
    Edward Osborne Wilson , Sarah Landry ( Photographer )
From Page 10
    by Richard Dawkins
From Page 11
    by Tom Brokaw
From Page 12
    by John H . Holland
From Page 13
    by Henry Roth
From Page 14
    by Tom M . Mitchell , Thomas
    Mitchell , Thomas M . Mitchell
From Page 15
    by Stuart Kauffman
From Page 16
    by Jeffrey Y . Tsao
 

Cluster 123, score 16.0000


Cluster 88, score 15.0000




Copyright: USC Information Sciences Institute 2000