CSCI 599
Information Integration on the Web
Spring 2003  

Instructor: Craig Knoblock (knoblock@isi.edu)

Meeting Time: Tuesday 2-4:50pm

Location: THH 119

Office Hours:

Tuesday 1-2pm (PHE 416)

Tuesday 4:50-5:30pm (THH 119)

By Appointment (ISI 907)

Teaching Assistant: Snehal Thakkar (thakkar@isi.edu)

TA Office Hours:

Monday 2-3pm (SAL 200C)

Tuesday 10-11am (SAL 200C)

Course Web Page: http://www.isi.edu/info-agents/courses/iiweb

 

There is an abundance of data available on the Internet and there are many opportunities to combine this information to build new applications and tools.  However, there are many obstacles to exploiting the available data.  In many cases the information is not available in a structured representation, it is complicated to navigate to the required information, the format of the information changes over time, and the terminology used to describe the data varies from one data source to the next. 

This course will focus on the basic foundations and techniques in Information Integration as it applies to the Web.  There has been a great deal of interest and research over the last few years on this topic and the course will cover the research and tools for addressing the technical problems.  The topics covered will include structured data representation such as XML, view integration techniques, machine learning techniques for turning web sites into structured data sources, high-performance query execution systems based on streaming dataflow, constraint-based integration systems, and approaches to resolving naming inconsistencies across sites. 

The class will be run as a lecture course with substantial student participation and hands-on experience.  As an integral part of the course, each student will design and build an integrated Web application or a related research project using the research and tools covered in class.

 

Prerequisites:

CSCI561 -- Introduction to AI

CSCI585 -- Database Systems

 

Recommended Course:

CSCI571 -- Issues of Programming Language Design

 

Grading:

Class participation -- 10%

Homework -- 30%

Course project -- 30%

Mid-term and Final Exam -- 30%

 

Books: There is no required textbook.  We will be reading technical papers on each topic.

 

Lab: SAL 200C (there is a $175 lab fee for this course)

 

Course Syllabus and Schedule

 

·        Lecture 1 (January 14)

o       Topics

§         Introduction

§         Overview of the course

§         Application demonstrations

§         Course projects

 

·        Lecture 2 (January 21)

o       Topics

§         Wrapper Learning

§         Agent Builder (Snehal Thakkar, TA)

 

·        Lecture 3 (January 28)

o       Topics

§         Automatic Wrapper Generation

§         Advanced Agent Builder (Snehal Thakkar, TA)

o       Homework

§         Assignment 1 Due

·        Wrapper building

 

·        Lecture 4 (February 4)

o       Topics

§         Wrapper Maintenance

§         Automatic Source Modeling

§         Web Services (Snehal Thakkar, TA)

o       Homework

§         Assignment 2 Due

·        Wrapper building

 

·        Lecture 5 (February 11)

o       Topics

§         Dataflow Architectures for Plan Execution

§         Theseus Agent Execution System

o       Homework

§         Assignment 3 Due

·        .NET Integration

 

·        Lecture 6 (February 18)

o       Topics

§         The Semantic Web  (Professor Yolanda Gil)

§         XML Query Processing

o       Homework

§         Assignment 4 Due

·        Plan Execution in Theseus

 

·        Lecture 7 (February 25)

o       Topics

§         Adaptive Execution Strategies

§         Speculative Execution

§         Advanced Theseus

o       Homework

§         Assignment 5 Due

·        XQuery

 

·        Lecture 8 (March 4)

o       Topics

§         Constraint Integration

§         Heracles Constraint Integration System (Dr. Jose Luis Ambite)

o       Homework

§         Assignment 6 Due

·        Advanced execution in Theseus

§         Written Project Proposals Due

 

·        Lecture 9 (March 11)

o       Topics

§         Schema Mapping

§         Record Linkage

o       Homework

§         Assignment 7 Due

·        Constraint Integration in Heracles

o       Exams

§         Mid-term

 

·        Spring Break (March 18)

 

·        Lecture 10 (March 25)

o       Topics

§         Query Planning for Information Integration

 

·        Lecture 11 (April 1)

o       Topics

§         Case Studies of Integration Systems:

·        Global-as-View Systems

·        Local-as-View Systems

o       Homework

§         Project Status Report Due

 

·        Lecture 12 (April 8)

o       Topics

§         Optimizing Query Plans

·        Planning by Rewriting

·        Selective Materialization

·        Sensing Actions

 

·        Lecture 13 (April 15)

o       Topics

§         Geospatial Data Integration

§         Q&A Review

 

·        Project Presentations (April 22)

 

·        Project Presentations (April 29)

 

·        Final Exam (May 6, 2-4pm)