USC/ISI Automatic Data Extraction Demo
Step 6: Find correct examples of TITLE
After the extracts are clustered, each cluster is given a score according to
how similar it is to the set of training examples. The clusters are printed in
descending order according to their scores. Usually, the top cluster contains
the correct examples of the data field, and only the correct examples.
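The scoring and ordering described above can be sketched roughly as follows. The demo does not state which similarity measure it uses, so `score_cluster` is a hypothetical stand-in that simply counts how many training examples appear inside at least one extract of a cluster. The cluster ids and sample strings are illustrative only, not taken from the demo's internals.

```python
from typing import Dict, List, Tuple


def score_cluster(extracts: List[str], training: List[str]) -> float:
    """Hypothetical similarity score (an assumption, not the demo's measure):
    the number of training examples contained in at least one extract."""
    return float(sum(any(t in e for e in extracts) for t in training))


def rank_clusters(
    clusters: Dict[int, List[str]], training: List[str]
) -> List[Tuple[int, float]]:
    """Score every cluster and return (cluster id, score) pairs,
    highest score first."""
    scored = [(cid, score_cluster(ex, training)) for cid, ex in clusters.items()]
    # sorted() is stable, so clusters with equal scores keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative data: ids 1-3 are made up for this sketch.
training = ["The Bluest Eye", "The Selfish Gene"]
clusters = {
    1: ["Amazon . com : buying info"],
    2: ["The Bluest Eye by", "The Selfish Gene by"],
    3: ["The Bluest Eye by", "Call It Sleep by"],
}

ranking = rank_clusters(clusters, training)
for cid, score in ranking:
    print(f"Cluster {cid}, score {score:.4f}")
```

Under this assumed measure, two clusters that both cover every training example receive the same score, so a tie at the top of the listing is possible.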
Titles identified by the algorithm are highlighted in red and shown in context.
Cluster 115, score 8.0000
- The Bluest Eye by
- Sacred Hunger ( Norton Paperback Fiction ) by
- The Selfish Gene by
- Call It Sleep by
- Machine Learning ( McGraw - Hill Series in Computer Science ) by
- At Home in the Universe : The Search for Laws of Self - Organization and Complexity by
- Materials Fundamentals of Molecular Beam Epitaxy by
- Harry Potter and the Sorcerer ' s Stone by
Cluster 116, score 8.0000
- Amazon . com : buying info : The Bluest Eye
- Amazon . com : buying info : Sacred Hunger ( Norton Paperback Fiction )
- Amazon . com : buying info : Sociobiology : The Abridged Edition
- Amazon . com : buying info : The Selfish Gene
- The Greatest Generation ( Random House Large Print ) [ LARGE PRINT ] by
- Amazon . com : buying info : Call It Sleep
- Amazon . com : buying info : Machine Learning ( McGraw - Hill Series in Computer Science )
- Amazon . com : buying info : At Home in the Universe : The Search for Laws of Self - Organization and Complexity
- Amazon . com : buying info : At Home in the Universe : The Search for Laws of Self - Organization and Complexity
- Amazon . com : buying info : Materials Fundamentals of Molecular Beam Epitaxy
- Amazon . com : buying info : Harry Potter and the Sorcerer ' s Stone
Copyright: USC Information Sciences Institute 2000