USC/ISI Automatic Data Extraction Demo

Step 5: Clustering titles

We use clustering to help identify correct extracts by finding subsets of extracts that share common features. The extracts are clustered according to a set of positional features, including adjacent landmarks, position on the page, and whether the extract is visible.
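As a rough illustration of the idea, the sketch below groups candidate extracts whose positional features match exactly. It is not the demo's actual clustering algorithm, which is not described on this page. The `Extract` fields, the landmark strings, the position buckets, and the `cluster_by_features` helper are all hypothetical names chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Extract:
    text: str
    prev_landmark: str  # markup just before the extract (hypothetical feature)
    next_landmark: str  # markup just after the extract (hypothetical feature)
    position: int       # coarse position bucket on the page (hypothetical feature)
    visible: bool       # whether the extract is rendered to the reader


def cluster_by_features(extracts):
    """Group extract texts whose positional features are identical.

    Assumption: exact-match grouping stands in for whatever similarity
    measure the real system uses.
    """
    groups = defaultdict(list)
    for e in extracts:
        key = (e.prev_landmark, e.next_landmark, e.position, e.visible)
        groups[key].append(e.text)
    # Dicts preserve insertion order, so clusters appear in first-seen order.
    return list(groups.values())


# Made-up feature values; only the title strings come from the output below.
extracts = [
    Extract("The Bluest Eye", "<b>", "</b>", 1, True),
    Extract("The Selfish Gene", "<b>", "</b>", 1, True),
    Extract("The New York Times Book Review", "<i>", "</i>", 3, True),
    Extract("The Bluest Eye", "<title>", "</title>", 0, False),
]

clusters = cluster_by_features(extracts)
for i, members in enumerate(clusters):
    print(f"Printing cluster {i} of {len(members)} objects")
    for text in members:
        print(text)
```

Under these assumptions, the same title text can land in more than one cluster when it appears in different page contexts, which is consistent with the output below, where "The Bluest Eye" shows up in several clusters.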
Next: Score clusters

TITLE : 117 clusters identified among 1030 possible extracts
Printing 116th cluster of 11 objects
The Bluest Eye
Sacred Hunger ( Norton Paperback Fiction )
The Abridged Edition
The Selfish Gene
Greatest Generation ( Random House Large Print )
Call It Sleep
Machine Learning ( McGraw - Hill Series in Computer Science )
The Search for Laws of Self - Organization and Complexity
At Home in the Universe : The Search for Laws of Self - Organization and Complexity
Materials Fundamentals of Molecular Beam Epitaxy
Harry Potter and the Sorcerer ' s Stone

Printing 43rd cluster of 9 objects
Learning Sets of Rules
Genetic Algorithms 10 . Learning Sets of Rules
Computational Learning Theory
Bayesian Learning 7 . Computational Learning Theory
Evaluating Hypotheses 6 . Bayesian Learning 7 . Computational Learning Theory
Artificial Neural Networks
Decision Tree Learning 4 . Artificial Neural Networks
Decision Tree Learning
Concept Learning and General - to - Specific Ordering 3 . Decision Tree Learning

Printing 109th cluster of 12 objects
The Bluest Eye
The Bluest Eye
The New York Times Book Review
Fried Green Tomatoes
The Selfish Gene
The New York Times Book Review
The Washington Post Book World
The New York Times Book Review
Harry Potter and the Sorcerer ' s Stone
Harry Potter and the Philosopher ' s Stone
Harry Potter and the Sorcerer ' s Stone
Harry Potter and the Sorcerer ' s Stone

Printing 114th cluster of 20 objects
On Human Nature
River Out of Eden : A Darwinian View of Life ( Science Masters Series )
The Meme Machine
The Extended Phenotype : The Long Reach of the Gene ( Popular Science )
The Blind Watchmaker : Why the Evidence of Evolution Reveals a Universe Without Design
Blind Man ' s Bluff : The Untold Story of American Submarine Espionage
The Greatest Generation Speaks : Letters and Reflections
Hidden Order : How Adaptation Builds Complexity
An Introduction to Genetic Algorithms ( Complex Adaptive Systems Series )
The Control of Nature
In Suspect Terrain
The Rise of David Levinsky ( Twentieth - Century Classics )
Dangling Man ( Penguin Twentieth - Century Classics )
The Victim : A Novel ( Penguin Twentieth Century Classics )
Statistical Learning Theory
Reinforcement Learning : An Introduction ( Adaptive Computation and Machine Learning )
The End of Certainty : Time , Chaos , and the New Laws of Nature
Hidden Order : How Adaptation Builds Complexity
Harry Potter and the Prisoner of Azkaban
Harry Potter and the Chamber of Secrets

Printing 115th cluster of 8 objects *** correct cluster ***
The Bluest Eye
Sacred Hunger ( Norton Paperback Fiction )
The Selfish Gene
Call It Sleep
Machine Learning ( McGraw - Hill Series in Computer Science )
At Home in the Universe : The Search for Laws of Self - Organization and Complexity
Materials Fundamentals of Molecular Beam Epitaxy
Harry Potter and the Sorcerer ' s Stone





Copyright: USC Information Sciences Institute 2000