USC/ISI Automatic Data Extraction Demo

Step 5: Clustering prices

We use clustering to help identify correct extracts by finding subsets of extracts that share common features. The extracts are clustered according to a set of positional features: adjacent landmarks, position on the page, and whether the extract is visible. Each cluster is then given a score according to how similar it is to the set of training examples. The top-scoring cluster should contain the correct extracts.
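
The idea can be illustrated with a minimal Python sketch. It is not the demo's actual implementation: the `Extract` record, its field names, the landmark strings, the training example, and the scoring rule (mean fraction of matching positional features) are all assumptions made for illustration. Clustering is simplified here to grouping extracts with identical feature tuples.

```python
from collections import defaultdict
from typing import NamedTuple


class Extract(NamedTuple):
    """Hypothetical record for one candidate extract and its positional features."""
    text: str
    prev_landmark: str  # token just before the extract (assumed feature)
    next_landmark: str  # token just after the extract (assumed feature)
    page_region: str    # coarse position on the page (assumed feature)
    visible: bool       # whether the extract is rendered on the page


def features(e):
    """Return the positional feature tuple used for clustering."""
    return (e.prev_landmark, e.next_landmark, e.page_region, e.visible)


def cluster_extracts(extracts):
    """Group extracts that share exactly the same positional features."""
    clusters = defaultdict(list)
    for e in extracts:
        clusters[features(e)].append(e)
    return dict(clusters)


def score(signature, training):
    """Mean fraction of features a cluster shares with the training examples."""
    total = 0.0
    for t in training:
        shared = sum(a == b for a, b in zip(signature, features(t)))
        total += shared / len(signature)
    return total / len(training)


def best_cluster(extracts, training):
    """Return the members of the top-scoring cluster."""
    clusters = cluster_extracts(extracts)
    top = max(clusters, key=lambda sig: score(sig, training))
    return clusters[top]


# Invented sample data: the feature values below are illustrative only.
extracts = [
    Extract("10.36", "Our Price:", "You Save", "body", True),
    Extract("16.80", "Our Price:", "You Save", "body", True),
    Extract("12.95", "List Price:", "Our Price:", "body", True),
    Extract("15.00", "value=", ">", "form", False),
]
training = [Extract("9.99", "Our Price:", "You Save", "body", True)]

best = best_cluster(extracts, training)
print([e.text for e in best])
```

With this invented data, the two extracts whose features match the training example form the top-scoring cluster. A real system would likely tolerate partial feature matches when forming clusters rather than requiring identical tuples.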

Next: Score clusters


PRICE : 3 clusters identified among 49 possible extracts
Printing 0th cluster of 15 objects *** correct cluster ***
10 . 36
10 . 36
16 . 80
21 . 00
10 . 95
11 . 96
10 . 36
25 . 95
10 . 36
19 . 96
20 . 00
94 . 80
14 . 36
66 . 00
21 . 00

Printing 1th cluster of 14 objects
12 . 95
12 . 95
24 . 00
30 . 00
14 . 95
12 . 95
25 . 95
12 . 95
24 . 95
21 . 00
14 . 00
17 . 95
30 . 00
17 . 95

Printing 2th cluster of 20 objects
15 . 00
10 . 00
20 . 00
10 . 00
10 . 00
99 . 00
14 . 95
90 . 00
20 . 00
15 . 00
15 . 00
28 . 00
68 . 00
12 . 95
29 . 95
39 . 95
15 . 95
15 . 00
20 . 00
25 . 00





Copyright: USC Information Sciences Institute 2000