USC/ISI Automatic Data Extraction Demo

Step 4: Extracting PRICE from test pages using the data prototype

Using the patterns of the data prototype learned from the training examples (Step 1), we identify candidate extracts on the new (test) pages. Some patterns are specific and match only correct instances of the data field. Others are general and also match many extraneous strings. The test pages come from the same source as the training pages (amazon.com). However, the site's layout has changed, so the wrapper no longer extracts data from the new pages.
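The sketch below shows how specific and general patterns can behave differently on the same page. It assumes that a pattern is a sequence of elements, where each element is either a literal token or a token type. The token types (NUMBER, ALPHA, PUNCT), the helper names `tokenize`, `token_type_matches`, and `find_extracts`, and the sample page text are all hypothetical. They are not the system's actual representation. Matches are joined with spaces between tokens, mirroring the format of the listing on this page.

```python
import re
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    # Split text into runs of digits, runs of letters, and single
    # punctuation characters; whitespace is dropped.
    return re.findall(r"\d+|[A-Za-z]+|[^\sA-Za-z\d]", text)


def token_type_matches(token: str, element: str) -> bool:
    """Return True if a token satisfies one pattern element.

    Hypothetical scheme: NUMBER, ALPHA, and PUNCT are wildcard token types;
    any other element is a literal token that must match exactly.
    """
    if element == "NUMBER":
        return token.isdigit()
    if element == "ALPHA":
        return token.isalpha()
    if element == "PUNCT":
        return len(token) == 1 and not token.isalnum() and not token.isspace()
    return token == element


def find_extracts(tokens: Sequence[str], pattern: Sequence[str]) -> List[str]:
    """Slide the pattern over the token stream and return every match."""
    n = len(pattern)
    hits = []
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(token_type_matches(t, e) for t, e in zip(window, pattern)):
            hits.append(" ".join(window))
    return hits


if __name__ == "__main__":
    page = "List Price: $25.00 Our Price: $17.95 Edition 2.1"
    tokens = tokenize(page)
    # Specific pattern: anchored on a literal "$", so it matches only prices.
    print(find_extracts(tokens, ["$", "NUMBER", ".", "NUMBER"]))
    # General pattern: also matches the edition number, an extraneous extract.
    print(find_extracts(tokens, ["NUMBER", ".", "NUMBER"]))
```

On this sample page, the specific pattern returns `['$ 25 . 00', '$ 17 . 95']`. The general pattern also returns `'2 . 1'`, which is why later steps need to separate correct extracts from extraneous ones.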


Next: Cluster extracts
or View final results


Prices identified by the data prototype on test pages
Page 1
25 . 00 
20 . 00 
17 . 95 
Page 2
30 . 00 
21 . 00 
15 . 00
Page 3
66 . 00 
Page 4
14 . 36 
17 . 95 
15 . 95 
39 . 95 
Page 5
94 . 80 
29 . 95
Page 6
14 . 00 
12 . 95
Page 7
21 . 00 
68 . 00 
28 . 00 
15 . 00 
Page 8
20 . 00 
15 . 00 
Page 9
20 . 00 
19 . 96 
24 . 95 
Page 10
90 . 00 
10 . 36 
12 . 95
Page 11
25 . 95 
Page 12
25 . 95 
14 . 95 
Page 13
10 . 36 
12 . 95 
Page 14
99 . 00 
Page 15
10 . 00 
10 . 00 
11 . 96 
14 . 95 
Page 16
10 . 95 
Page 17
30 . 00 
21 . 00 
Page 18
24 . 00 
16 . 80 
Page 19
20 . 00 
10 . 00 
15 . 00 
10 . 36 
12 . 95
Page 20
10 . 36 
12 . 95




Copyright: USC Information Sciences Institute 2000