CSCI 599: Information Integration on the Web

Spring 2002

Instructor: Craig Knoblock

Office Hours:

Teaching Assistant: Ion Muslea

Office Hours:

Meeting Time:


There is an abundance of data available on the Internet, and there are many opportunities to combine this information to build new applications and tools. However, there are many obstacles to exploiting the available data. In many cases, the information is not available in a structured representation, it is complicated to navigate to the required information, the format of the information changes over time, and the terminology used to describe the data varies from one data source to the next.

This course will focus on the basic foundations and techniques of Information Integration as it applies to the Web. There has been a great deal of interest and research on this topic over the last few years, and the course will cover the research and tools for addressing the technical problems. The topics covered will include structured data representations such as XML, view integration techniques, machine learning techniques for turning web sites into structured data sources, high-performance query execution systems based on dataflow, constraint-based integration systems, and approaches to resolving naming inconsistencies across sites.

The class will be run as a lecture course with lots of student participation and hands-on experience. As an integral part of the course, each student will develop and build their own integrated Web application using the research and tools covered in the class.

Course Syllabus and Schedule

Course Materials: Notes, Slides, Papers, and Homeworks

  • Lecture 11:

  • Lecture 12:

  • Lecture 13: