ISI Web Search Agents Find WI 2003 Conference Grand Prize

November 6, 2003

A graduate student and an ISI senior project leader brought back a "best paper" prize from a recent IEEE Conference on Web Intelligence and Artificial Agents.

Craig Knoblock and Shou-de Lin returned from Halifax, Nova Scotia with a certificate and a check for $500 for their paper on "Exploiting a Search Engine to Develop More Flexible Web Agents," judged best of all presented.

The competition was stiff. "WI 2003" received nearly 600 papers for the October 13-16 gathering devoted to research on Web Intelligence, "a new direction," according to conference organizers, "that explores the fundamental roles as well as practical impacts of Artificial Intelligence and advanced Information Technology on the next generation of Web-empowered products, systems, services, and activities. It is the key and the most urgent research field of IT in the era of World Wide Web and agent intelligence."

Knoblock has long worked in this field. In addition to his appointments at ISI and the USC School of Engineering's department of computer science, he is the chief scientist of Fetch Technologies, a company that is commercializing automated systems for extracting information from the World Wide Web.

Lin, a Ph.D. student who earned his B.S. at National Taiwan University and his M.S. in the Electrical Engineering department at the University of Michigan, is currently a research assistant at ISI working with Dr. Hans Chalupsky.

The prizewinning paper is an extension of Lin's work in a course taught by Knoblock in spring 2002. It successfully solves two seemingly straightforward but quite stubborn problems in automatic information retrieval via intelligent web agents, and uses the solutions to illuminate the general issues involved in using agents to go beyond search engine products.

The first challenge was to create an "Internet reverse geocoder." An ordinary geocoder returns the longitude and latitude of any given street address. The reverse geocoder has to do the opposite: given a longitude and latitude, come up with a street address. The problem is complicated by the fact that on some streets, addresses are very close together, while on others they are spread far apart.
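
One way to picture the reverse problem is as a search over house numbers using an ordinary forward geocoder. The sketch below is illustrative only and is not taken from the paper: `toy_geocode` is a hypothetical stand-in for a real web geocoder, the street and its coordinates are made up, and the search assumes that position changes steadily along the street, so the distance to the target falls and then rises as the house number grows.

```python
from typing import Callable, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)


def toy_geocode(number: int) -> Coord:
    """Hypothetical forward geocoder for a single made-up street.

    Numbers 1-100 are packed closely together and 101-200 are spread
    far apart, mimicking the uneven spacing described in the article.
    """
    if number <= 100:
        offset = number * 0.00001
    else:
        offset = 0.001 + (number - 100) * 0.0001
    return (34.0, -118.0 + offset)


def squared_distance(a: Coord, b: Coord) -> float:
    """Squared straight-line distance; adequate for comparing nearby points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def reverse_geocode(
    target: Coord,
    geocode: Callable[[int], Coord],
    low: int,
    high: int,
) -> int:
    """Return the house number in [low, high] that geocodes closest to target.

    Binary search on the assumption (ours, for illustration) that the
    distance to the target decreases and then increases along the street.
    """
    while low < high:
        mid = (low + high) // 2
        here = squared_distance(geocode(mid), target)
        next_door = squared_distance(geocode(mid + 1), target)
        if here <= next_door:
            high = mid      # the closest number is mid or lower
        else:
            low = mid + 1   # the closest number is above mid
    return low
```

Because the search compares distances rather than assuming evenly spaced numbers, it copes with both the dense and the sparse halves of the toy street. For example, `reverse_geocode(toy_geocode(150), toy_geocode, 1, 200)` returns 150 after about eight rounds of lookups instead of 200.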

The second was also challenging: an address lookup module that automatically fills in missing address fields from the known ones.

In both cases, web agents had to go to existing web search engine sites that had clues to the desired information (the ones used were Mapblast, Yahoo and Yahoo Yellow Pages, and White Pages Address Lookup), extract the clues automatically from the sites, and then process (and sometimes reprocess) the clues to get the needed result.

The authors generalized from the techniques used to solve the problem "by translating it into an equivalent constraint satisfaction problem, which simplifies the implementation of the recall-driven IE tasks."
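
To give a feel for what a constraint satisfaction framing can look like, the sketch below treats each address field as a variable and each information source as a relation the completed address must agree with. It is a minimal illustration under our own assumptions, not the authors' formulation: the two tables are made-up stand-ins for what web sources might return, the street names are invented, and the solver simply tries every combination of candidate values.

```python
from itertools import product
from typing import Dict, List

# Made-up lookup data standing in for results scraped from web sources.
ZIP_TABLE = [
    {"zip": "90292", "city": "Marina del Rey", "state": "CA"},
    {"zip": "90089", "city": "Los Angeles", "state": "CA"},
    {"zip": "48109", "city": "Ann Arbor", "state": "MI"},
]
STREET_TABLE = [
    {"street": "12 Example Way", "zip": "90292"},
    {"street": "34 Sample Street", "zip": "90089"},
]

FIELDS = ["street", "city", "state", "zip"]
RELATIONS = [ZIP_TABLE, STREET_TABLE]


def domain(field: str, known: Dict[str, str]) -> List[str]:
    """Candidate values for a field: the known value, or everything seen."""
    if field in known:
        return [known[field]]
    values = {row[field] for rel in RELATIONS for row in rel if field in row}
    return sorted(values)


def consistent(assignment: Dict[str, str]) -> bool:
    """True if the assignment matches at least one row of every relation."""
    for rel in RELATIONS:
        columns = rel[0].keys()
        if not any(all(assignment[c] == row[c] for c in columns) for row in rel):
            return False
    return True


def complete_address(known: Dict[str, str]) -> List[Dict[str, str]]:
    """Return every full address that agrees with the known fields."""
    domains = [domain(field, known) for field in FIELDS]
    solutions = []
    for values in product(*domains):
        candidate = dict(zip(FIELDS, values))
        if consistent(candidate):
            solutions.append(candidate)
    return solutions
```

Calling `complete_address({"street": "12 Example Way"})` yields a single completed address with city "Marina del Rey", state "CA", and zip "90292", while `complete_address({"state": "CA"})` yields two candidates. A practical solver would prune candidates as it goes rather than enumerate every combination.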

In the prize competition, the conference reviewers selected eight papers out of the hundreds submitted and forwarded them to the conference chairs. The chairs selected three of the eight for the final round, with the authors appearing for a public question and answer session. Lin did the presentation.

"They didn't explicitly say why our paper won," he said, "but they told me I delivered a decent talk and answered the questions properly."

Knoblock's Web research is incorporated in other ongoing ISI projects, including a new Web services engine called Proteus and a Web air fare shopping algorithm called "Hamlet," developed in conjunction with the University of Washington (see accompanying web links).

Besides this best paper award, another Lin paper, "Using Unsupervised Link Discovery Methods to Find Interesting Facts and Connections in a Bibliography Dataset," this one co-authored by Chalupsky, won second place in the open problem of ACM KDDCup 2003, part of the 9th ACM Knowledge Discovery and Data Mining conference held in August 2003 in Washington, D.C. (see accompanying web links).