USC/ISI Automatic Data Extraction Demo

Step 4: Extracting ISBN from test pages using data prototype

Using the patterns of the data prototype learned from the training examples (Step 1), we identify candidate extracts on the new (test) pages. Some patterns are specific and match only correct examples of the data field; others are general and also match many extraneous strings. The test pages in this example came from the same source (amazon.com). However, because the layout of the source had changed, the wrapper no longer extracted the data from the new pages.
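The difference between specific and general patterns can be illustrated with a minimal Python sketch. It is not the demo's learning algorithm: the two regular expressions are hand-written, hypothetical stand-ins for learned patterns, and the sample page text is made up for illustration.

```python
import re

# Hypothetical stand-ins for learned patterns. The real data prototype is
# learned from training examples; these regexes are written by hand.
SPECIFIC = re.compile(r"ISBN:\s*(\d{9}[\dX])\b")  # needs an "ISBN:" label
GENERAL = re.compile(r"\b(\d{9}[\dX])\b")         # any 10-character ISBN-like token


def find_candidates(page_text):
    """Return (specific_matches, general_matches) for one page of text."""
    return SPECIFIC.findall(page_text), GENERAL.findall(page_text)


# Made-up page text, not taken from the test pages.
sample_page = "Order #1234567890 shipped. ISBN: 0590353403 Paperback"
specific, general = find_candidates(sample_page)
print("specific:", specific)
print("general: ", general)
```

On this sample, the specific pattern returns only the labelled ISBN, while the general pattern also returns the order number, which is the kind of extraneous extract described above.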


Next: View final results


ISBN identified by the data prototype on test pages
Page 1    0590353403
Page 2    0399144498
Page 3    0127016252
Page 4    0195111303
Page 5    0070428077
Page 6    0374522928
Page 7    0374106452
Page 8    0262581116
Page 9    0375705694
etc...


Copyright: USC Information Sciences Institute 2000