USC/ISI Automatic Data Extraction Demo

Step 4: Extracting AVAILABILITY from test pages using data prototype

Using the patterns of the data prototype learned from the training examples (Step 1), we identify candidate extracts on the new (test) pages. Some patterns are specific and match only correct examples of the data field; others are general and also match many extraneous strings. The test pages in this example came from the same source (amazon.com) as the training pages. However, because the layout of the source had changed, the original wrapper no longer extracted the data from the new pages.
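The difference between a specific and a general pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the demo does not describe its actual pattern language, so the token classes (`NUMBER`, `CAPS`, `ALPHA`), the helper names, the two example patterns, and the sample page text are all hypothetical.

```python
import re

# Hypothetical token classes; the real data prototype may use different ones.
TOKEN_CLASSES = {
    "NUMBER": re.compile(r"^\d+$"),
    "CAPS": re.compile(r"^[A-Z][a-z]*$"),
    "ALPHA": re.compile(r"^[A-Za-z]+$"),
}


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def token_matches(spec, token):
    """A spec is either a token class name or a literal token."""
    cls = TOKEN_CLASSES.get(spec)
    if cls is not None:
        return bool(cls.match(token))
    return spec == token


def find_extracts(pattern, text):
    """Return every token window in text that matches the pattern."""
    tokens = tokenize(text)
    n = len(pattern)
    hits = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(token_matches(s, t) for s, t in zip(pattern, window)):
            hits.append(" ".join(window))
    return hits


# Assumed patterns: one specific, one general.
specific = ["Usually", "ships", "within", "NUMBER", "hours"]
general = ["CAPS", "ALPHA"]

# Made-up page text, not taken from the demo's test pages.
page = "Our Price: $20 . Availability: Usually ships within 24 hours . Order Now"

print(find_extracts(specific, page))
print(find_extracts(general, page))
```

On this sample text the specific pattern matches only the availability string, while the general pattern also picks up unrelated phrases such as "Our Price" and "Order Now". That is the kind of extraneous match a later step has to filter out.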


Next: View final results


Availability identified by the data prototype on test pages
Page 1
Usually ships within 24 hours 
Page 2
Usually ships within 24 hours
Page 3
Usually ships within 24 hours 
Page 4
Usually ships within 24 hours 
Page 5
Usually ships within 24 hours 
Page 6
Usually ships within 24 hours 
Page 7
Usually ships within 24 hours 
Page 8
Usually ships within 24 hours 
Page 9
Usually ships within 24 hours 




Copyright: USC Information Sciences Institute 2000