Homework 2
Due Date: February 7, 2002
In this homework you are required to wrap the following two sites:
- Geocoder:
  - URL: http://cronus.isi.edu:8080/geocode.html
  - Type: form
  - Input parameters: street address, city, state, zip code
    Example input:
    - street address: 4676 Admiralty Way
    - city: Marina del Rey
    - state: CA
    - zip: 90292
  - Output parameters (to be extracted from the answer page):
    For five addresses of your choice, you should extract the following information:
    - Latitude
    - Longitude
    - Error Info
  - Attention: please be patient with the geocoder!
    (On its best day, it takes about 10 seconds to retrieve a location.)
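The form-wrapping step for the geocoder can be sketched in Python. This is a minimal illustration, not the wrapper tool used in class. The form field names (`address`, `city`, `state`, `zip`), the use of a GET query, and the `Latitude:`/`Longitude:`/`Error:` labels on the answer page are all assumptions; check them against the actual form and answer page before relying on them.

```python
import re
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODER_URL = "http://cronus.isi.edu:8080/geocode.html"

def build_query(street, city, state, zip_code):
    """Encode the four input parameters as a GET query string.

    Assumption: these field names are hypothetical; read the real ones from
    the name="..." attributes of the form's <input> elements.
    """
    params = {"address": street, "city": city, "state": state, "zip": zip_code}
    return GEOCODER_URL + "?" + urlencode(params)

def fetch(url, timeout=60):
    """Download a page; the generous timeout allows for the slow geocoder."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("latin-1")

# Hypothetical answer-page labels; adjust them to the real page layout.
LAT_RE = re.compile(r"Latitude:\s*(-?\d+(?:\.\d+)?)")
LON_RE = re.compile(r"Longitude:\s*(-?\d+(?:\.\d+)?)")
ERR_RE = re.compile(r"Error:\s*([^<\n]+)")

def extract_location(page):
    """Return (latitude, longitude, error_info); missing values are None."""
    lat, lon, err = LAT_RE.search(page), LON_RE.search(page), ERR_RE.search(page)
    return (
        float(lat.group(1)) if lat else None,
        float(lon.group(1)) if lon else None,
        err.group(1).strip() if err else None,
    )
```

Against the live form, one call per test address would look like `extract_location(fetch(build_query("4676 Admiralty Way", "Marina del Rey", "CA", "90292")))`. Returning `None` rather than raising keeps the Error Info column usable when a lookup fails.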
- CIA World Factbook:
  - URL: http://www.cia.gov/cia/publications/factbook
  - Type: frame-based navigation
  - Input parameters: none
  - Output parameters (to be extracted from the site):
    For five countries of your choice, you should extract the following information:
    - Country Name
    - Location
    - Geographic Coordinates
    - Map Reference
    - Area - comparative
    - Climate
    - Terrain
  - Note: in solving this problem, you must follow the procedure used in class:
    define an entry connector, use a wrapper W-1 to get the correct frame,
    create a wrapper W-2 that extracts the list of country names and URLs,
    and learn a wrapper W-3 that takes as input the URLs extracted by W-2
    and extracts the information of interest.
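The data flow of the W-1/W-2/W-3 procedure can be illustrated with a small Python sketch that works on already-downloaded HTML. It is not the procedure as implemented in the class tool: here W-3 is a hand-written label-based extractor rather than a learned wrapper. The frame name, the `geos/xx.html` link pattern, and the assumption that each field appears on the country page as `Label:` followed by its value are all hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

class TagCollector(HTMLParser):
    """Collect <frame> attributes and (href, text) pairs of <a> links."""

    def __init__(self):
        super().__init__()
        self.frames = []  # attribute dicts of <frame> tags
        self.links = []   # (href, link text) tuples
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "frame":
            self.frames.append(attrs)
        elif tag == "a" and "href" in attrs:
            self._href, self._text = attrs["href"], []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def parse(html):
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector

def w1_select_frame(entry_url, entry_html, frame_name):
    """W-1: return the absolute URL of the frame that holds the country list."""
    for frame in parse(entry_html).frames:
        if frame.get("name") == frame_name and "src" in frame:
            return urljoin(entry_url, frame["src"])
    raise ValueError("no frame named " + frame_name)

def w2_country_links(frame_url, frame_html, href_pattern=r"geos/\w+\.html"):
    """W-2: return (country name, absolute URL) pairs for country-page links."""
    return [(text, urljoin(frame_url, href))
            for href, text in parse(frame_html).links
            if text and re.search(href_pattern, href)]

# Field labels as we assume they appear on the country pages.
FIELDS = ["Location", "Geographic coordinates", "Map references",
          "Area - comparative", "Climate", "Terrain"]

def w3_extract(country_html, fields=FIELDS):
    """W-3 stand-in: map each field label to the text that follows "Label:"."""
    text = re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", country_html))
    boundary = "|".join(re.escape(f) for f in fields)
    result = {}
    for field in fields:
        # A value runs until the next known label or the end of the page.
        m = re.search(re.escape(field) + r":\s*(.*?)\s*(?=(?:" + boundary + r"):|$)", text)
        result[field] = m.group(1) if m else None
    return result
```

The Country Name comes from W-2's link text, and each URL from W-2 is fed to W-3. Because the boundary only stops at the listed labels, text from unlisted sections that sit between two listed fields would leak into a value; a learned W-3 avoids this by keying on the page's actual markup.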
What to turn in:
For each learned wrapper (note that for the Factbook you will learn
several wrappers), you should print four screenshots:
- one that shows the structure of the extracted data (see example);
- one that shows the training pages (see example); remember that you are
  not allowed to train your wrapper on the 5 pages that are used for testing;
- one that shows the learned rules (see example);
- one that shows the output for the five test pages (see example).
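The rule about keeping the five test pages out of training can be checked mechanically before you take the screenshots. This is a minimal sketch, assuming each page is identified by its URL; `check_split` is a hypothetical helper, not part of any course tool.

```python
def check_split(training_urls, test_urls, n_test=5):
    """Raise ValueError if the train/test split breaks the assignment's rules."""
    training, test = set(training_urls), set(test_urls)
    if len(test) != n_test:
        raise ValueError(f"expected {n_test} distinct test pages, got {len(test)}")
    overlap = training & test
    if overlap:
        # A test page was also used for training, which is not allowed.
        raise ValueError("test pages used for training: " + ", ".join(sorted(overlap)))
    return training, test
```

Run it once per wrapper with the URLs you trained on and the five URLs you will show in the output screenshot.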
If you use Internet Explorer, you can view a sample output.