WHIRL: A Set of 111 Sources

Original Sites: can be found at the URLs provided in the Spitprogdb.pl files
IE/WG by: William Cohen (email: wcohen@research.att.com)
Project: WHIRL
Institution: AT&T Labs
Algorithm/Approach: hand-written Perl scripts
Main Paper: Autonomous Agents 1998
Other Papers: -

Librarian's Comments

Because these 111 sources were wrapped before RISE was created, converting all of them to the new format would require considerable effort. Consequently, we present the sources here in their original WHIRL format, which should be easy to understand based on the description provided here.

The four compressed files below were created by the author in order to avoid name clashes. If you expand any of these archives, you will obtain, among other things, a set of *.HTML and *.XML directories (the original submission page provides further details).

For instance, if you expand birds-snap.tar.gz, one such pair of newly created directories will be states.HTML and states.XML, which contain the files 0001.html and 0021.html, and 0001.stir and 0021.stir, respectively. Each XXXX.html file represents a one-page information source, while the corresponding XXXX.stir file contains the data to be extracted (see the description of the STIR format).
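
The pairing described above can be sketched in Python as follows. This is a minimal illustration, assuming an archive has already been expanded into a local directory; the function name pair_sources and the root path are hypothetical, and only the *.HTML / *.XML directory layout and the XXXX.html / XXXX.stir naming are taken from the description above. The sketch does not parse the STIR files themselves.

```python
from pathlib import Path


def pair_sources(root):
    """Return (html_path, stir_path) pairs found under an expanded archive.

    Assumes each NAME.HTML directory has a sibling NAME.XML directory, and
    that XXXX.html in the former corresponds to XXXX.stir in the latter.
    """
    root = Path(root)
    pairs = []
    for html_dir in sorted(root.glob("*.HTML")):
        if not html_dir.is_dir():
            continue
        # states.HTML -> states.XML
        xml_dir = html_dir.with_suffix(".XML")
        for html_file in sorted(html_dir.glob("*.html")):
            stir_file = xml_dir / (html_file.stem + ".stir")
            # Keep only pages that have a matching extraction file.
            if stir_file.is_file():
                pairs.append((html_file, stir_file))
    return pairs
```

Calling pair_sources on the directory where birds-snap.tar.gz was expanded would, under these assumptions, return one pair per page, such as states.HTML/0001.html together with states.XML/0001.stir. Pages without a matching .stir file are skipped rather than reported.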


