USC/ISI Automatic Data Extraction Demo

Step 4: Extracting TITLE from test pages using data prototype

Using the patterns of the data prototype learned from the training examples (Step 1), we identify possible extracts on the new (test) pages. Some patterns are specific and identify only correct examples of the data field; other patterns are general and also identify many extraneous examples. The test pages in this example come from the same source as the training pages (amazon.com). However, because the layout of the source has changed, the existing wrapper no longer extracts the data from the new pages.
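
The difference between specific and general patterns can be illustrated with a minimal Python sketch. This is not the demo's actual implementation: the token types (CAPS, LOWER, NUM, PUNCT), the pattern format, the function names, and the sample sentence are all assumptions made for illustration. A pattern here is a tuple whose elements are either a literal token or a token type, and an extract is any token span that begins with a start pattern and ends with an end pattern.

```python
import re

def tokenize(text):
    # Split into words, numbers, and single punctuation marks.
    return re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", text)

def token_type(tok):
    # Hypothetical coarse token types; the real prototype may use others.
    if tok.isalpha():
        return "CAPS" if tok[0].isupper() else "LOWER"
    if tok.isdigit():
        return "NUM"
    return "PUNCT"

def matches(pattern, tokens, i):
    # True if every pattern element equals the token itself or its type.
    if i < 0 or i + len(pattern) > len(tokens):
        return False
    return all(p == t or p == token_type(t)
               for p, t in zip(pattern, tokens[i:]))

def find_extracts(text, start_patterns, end_patterns, max_len=12):
    # Enumerate every span that begins with a start pattern and
    # ends with an end pattern (max_len caps the span length in tokens).
    tokens = tokenize(text)
    found = []
    for s in range(len(tokens)):
        starts = [p for p in start_patterns if matches(p, tokens, s)]
        if not starts:
            continue
        first_end = s + min(len(p) for p in starts) - 1
        last_end = min(s + max_len, len(tokens)) - 1
        for e in range(first_end, last_end + 1):
            if any(e - len(p) + 1 >= s and matches(p, tokens, e - len(p) + 1)
                   for p in end_patterns):
                found.append(" ".join(tokens[s:e + 1]))
    return found

sentence = "praised by The New York Times Book Review last week"

# A general start pattern (any two capitalized words) fires at many positions.
general = find_extracts(sentence, [("CAPS", "CAPS")], [("CAPS",)])

# A more specific start pattern (the literal "The" plus a capitalized word)
# fires at only one position.
specific = find_extracts(sentence, [("The", "CAPS")], [("CAPS",)])

print(len(general), len(specific))
```

On this sample sentence the general pattern yields overlapping candidates such as "The New York", "New York Times" and "Book Review", which resembles the nested extracts in the list below; the specific pattern yields only spans that start at "The".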


Next: Cluster extracts
or View final results


Titles identified by the data prototype on test pages
(partial list of 1030 extracts from 20 pages)
Page 1
Harry Potter and the Sorcerer ' s Stone 
Harry Potter and the Sorcerer ' s Stone 
Harry Potter and the Chamber of Secrets 
Harry Potter and the Prisoner of Azkaban 
Newbery Medal Book 
Medal Book 
Harry Potter & the Chamber of Secrets 
The Iron Giant 
Iron Giant 
Harry Potter and the Sorcerer ' s Stone 
Hogwarts School of Witchcraft 
Hogwarts School of Witchcraft and Wizardry 
Harry Potter and the Sorcerer ' s Stone 
Harry Potter and the Philosopher ' s Stone 
National Book Award 
National Book Award , the Smarties Prize , the Children ' s Book 
Book Award 
Book Award , the Smarties Prize , the Children ' s Book 
Smarties Prize , the Children ' s Book 
Book 
Harry Potter and the Chamber of Secrets 
Harry Potter and the Prisoner of Azkaban 
Harry Potter and the Sorcerer ' s Stone 
Leaky Cauldron , Diagon Alley , and Hogwarts School of Witchcraft 
Leaky Cauldron , Diagon Alley , and Hogwarts School of Witchcraft and Wizardry 
Diagon Alley , and Hogwarts School of Witchcraft 
Diagon Alley , and Hogwarts School of Witchcraft and Wizardry 
Hogwarts School of Witchcraft 
Hogwarts School of Witchcraft and Wizardry 
Reader Jim Dale 
Jim Dale 
The New York 
The New York Times 
The New York Times Book 
The New York Times Book Review 
Page 2
Materials Fundamentals of Molecular 
Materials Fundamentals of Molecular Beam Epitaxy 
Molecular 
Molecular Beam Epitaxy 
Beam Epitaxy 
Materials Fundamentals of Molecular 
Materials Fundamentals of Molecular Beam Epitaxy 
Molecular 
Molecular Beam Epitaxy 
Beam Epitaxy 
From Book News 
Book News 
Book 
Page 3
At Home in the Universe : The Search for Laws 
At Home in the Universe : The Search for Laws of Self 
At Home in the Universe : The Search for Laws of Self - Organization and Complexity 
The Search for Laws 
The Search for Laws of Self 
The Search for Laws of Self - Organization and Complexity 
At Home in the Universe : The Search for Laws 
At Home in the Universe : The Search for Laws of Self 
At Home in the Universe : The Search for Laws of Self - Organization and Complexity 
The Search for Laws 
The Search for Laws of Self 
The Search for Laws of Self - Organization and Complexity 
Oxford Univ Pr 
Univ Pr 
Hidden Order : How Adaptation Builds 
Hidden Order : How Adaptation Builds Complexity 
How Adaptation Builds 
How Adaptation Builds Complexity 
Adaptation Builds 
Adaptation Builds Complexity 
Builds 
Builds Complexity 
The Emerging Science 
The Emerging Science at the Edge of Order 
The Emerging Science at the Edge of Order and Chaos 
Emerging Science 
Emerging Science at the Edge of Order 
Emerging Science at the Edge of Order and Chaos 
The End of Certainty 
The End of Certainty : Time , Chaos , and the New Laws of Nature 
New Laws of Nature 
The Search for Order 
The Universe and Eye 
Page 4
Machine Learning ( McGraw - Hill Series in Computer 
Machine Learning ( McGraw - Hill Series in Computer Science ) 
Hill Series in Computer 
Hill Series in Computer Science ) 
Computer 
Computer Science ) 
Machine Learning ( McGraw - Hill Series in Computer 
Machine Learning ( McGraw - Hill Series in Computer Science ) 
Hill Series in Computer 
Hill Series in Computer Science ) 
Computer 
Computer Science ) 
Hill College Div 
College Div 
Carnegie Mellon University 
Mellon University 
Reinforcement Learning : An Introduction ( Adaptive Computation and Machine 
Reinforcement Learning : An Introduction ( Adaptive Computation and Machine Learning ) 
An Introduction ( Adaptive Computation and Machine 
An Introduction ( Adaptive Computation and Machine Learning ) 
Adaptive Computation and Machine 
Adaptive Computation and Machine Learning ) 
Machine 
Machine Learning ) 
Statistical Learning Theory 
Statistical Learning Theory ; Vladimir Naumovich Vapnik 
Learning Theory 
Learning Theory ; Vladimir Naumovich Vapnik 
Vladimir Naumovich Vapnik 
Naumovich Vapnik 
Learning 
Learning Systems for Signal 
Learning Systems for Signal Processing , Communications and Control 
Learning Systems for Signal 
Page 5
Call It Sleep 
It Sleep 
Call It Sleep 
It Sleep 
The Victim : A Novel ( Penguin Twentieth Century 
The Victim : A Novel ( Penguin Twentieth Century Classics 
The Victim : A Novel ( Penguin Twentieth Century Classics ) 
Penguin Twentieth Century 
Penguin Twentieth Century Classics 
Penguin Twentieth Century Classics ) 
Twentieth Century 
Twentieth Century Classics 
Twentieth Century Classics ) 
Century 
Century Classics 
Century Classics ) 
Dangling Man ( Penguin Twentieth - Century Classics ) 
Penguin Twentieth - Century Classics ) 
Century Classics ) 
The Rise of David 
The Rise of David Levinsky ( Twentieth - Century Classics ) 
David 
David Levinsky ( Twentieth - Century Classics ) 
Century Classics ) 
The Merriam - Webster Encyclopedia of Literature 
Webster Encyclopedia of Literature 
Henry 
New York City 
York City 
The New York 
The New York Times 
The New York Times Book 
The New York Times Book Review 
New York 
New York Times 
New York Times Book 
New York Times Book Review 
Call It Sleep 
It Sleep 
James Joyce ' s Ulysses 
Perhaps The great American 
Page 6
In Suspect Terrain 
Suspect Terrain 
The Control of Nature 
Fourteen Weeks to Popular 
Popular 
Eldridge Moores , a tectonicist at the University of California 
Eldridge Moores , a tectonicist at the University of California at Davis 
Sierra Nevadas to the Great Central Valley 
Great Central Valley 
Central Valley 
North America along Interstate 
Page 7 
Artificial 
Artificial Systems : An Introductory Analysis 
Artificial Systems : An Introductory Analysis With 
Artificial Systems : An Introductory Analysis With Applications 
Artificial Systems : An Introductory Analysis With Applications to Biology 
An Introductory Analysis 
An Introductory Analysis With 
An Introductory Analysis With Applications 
An Introductory Analysis With Applications to Biology 
Introductory Analysis 
Introductory Analysis With 
Introductory Analysis With Applications 
Introductory Analysis With Applications to Biology 
Introductory Analysis With Applications to Biology , Control , and Artificial Intelligence ( Complex A ) 
Analysis 
Analysis With 
Analysis With Applications 
Analysis With Applications to Biology 
Analysis With Applications to Biology , Control , and Artificial Intelligence ( Complex A ) 
Page 8
The Greatest Generation 
The Greatest Generation ( Random House Large 
The Greatest Generation ( Random House Large Print 
The Greatest Generation ( Random House Large Print ) 
Greatest Generation 
Greatest Generation ( Random House Large 
Greatest Generation ( Random House Large Print 
Greatest Generation ( Random House Large Print ) 
Random House Large 
Random House Large Print 
Random House Large Print ) 
House Large 
House Large Print 
House Large Print ) 
Large 
Large Print 
Large Print ) 
The Greatest Generation 
The Greatest Generation ( Random House Large 
The Greatest Generation ( Random House Large Print 
The Greatest Generation ( Random House Large Print ) 
Page 9
The Selfish Gene 
Selfish Gene 
The Selfish Gene 
Selfish Gene 
Oxford Univ Pr 
Univ Pr 
The Blind Watchmaker 
The Blind Watchmaker : Why the Evidence 
The Blind Watchmaker : Why the Evidence of Evolution 
The Blind Watchmaker : Why the Evidence of Evolution Reveals a Universe 
The Blind Watchmaker : Why the Evidence of Evolution Reveals a Universe Without Design 
Blind Watchmaker 
Blind Watchmaker : Why the Evidence 
Blind Watchmaker : Why the Evidence of Evolution 
Blind Watchmaker : Why the Evidence of Evolution Reveals a Universe 
Blind Watchmaker : Why the Evidence of Evolution Reveals a Universe Without Design 
Evolution 
Evolution Reveals a Universe 
Evolution Reveals a Universe Without Design 




Copyright: USC Information Sciences Institute 2000