CSCI 599
Information Integration on the Web

 

Instructor: Craig Knoblock (Knoblock@isi.edu)

Office Hours:

Thursday 1-2pm (PHE 416)

Thursday 4:50-5:20pm (THH 112)

By Appointment (ISI 941)

Teaching Assistant: TBD

TA Office Hours:

TBD (SAL 200c)

Meeting Time: Thursday 2-4:50pm

Location: TBD

Course Web Page: http://www.isi.edu/info-agents/courses/csci599

 

There is an abundance of data available on the Internet, and there are many opportunities to combine this information to build new applications and tools.  However, there are many obstacles to exploiting the available data.  In many cases the information is not available in a structured representation, navigating to the required information is complicated, the format of the information changes over time, and the terminology used to describe the data varies from one data source to the next.

 

This course will focus on the foundations and techniques of Information Integration as it applies to the Web.  There has been a great deal of interest and research on this topic over the last few years, and the course will cover the research and tools for addressing the technical problems.  The topics covered will include structured data representations such as XML, view integration techniques, machine learning techniques for turning web sites into structured data sources, high-performance query execution systems based on dataflow, constraint-based integration systems, and approaches to resolving naming inconsistencies across sites.

 

The class will be run as a lecture course with lots of student participation and hands-on experience.  As an integral part of the course, each student will develop and build their own integrated Web application using the research and tools covered in the class.

 

Prerequisites:

CSCI561 -- Introduction to AI

CSCI585 -- Database Systems

 

Recommended Course:

CSCI571 -- Issues of Programming Language Design

 

Grading:

Class participation -- 10%

Homework -- 30%

Course project -- 30%

Mid-term and Final Exam -- 30%

 

Books: There is no required textbook.  We will be reading technical papers on each topic.

 

Course Syllabus and Schedule

 

·        Lecture 1 (January 16)

o       Topics

§         Introduction

§         Overview of the course

§         Application demonstrations

§         Course requirements

§         Course projects

 

·        Lecture 2 (January 23)

o       Topics

§         Wrapper Learning

§         Agent Builder

 

·        Lecture 3 (January 30)

o       Topics

§         Automatic Wrapper Generation

§         Advanced Agent Builder     

o       Homework

§         Assignment 1 Due

·        Wrapper building

 

·        Lecture 4 (February 6)

o       Topics

§         Wrapper Maintenance

§         Automatic Source Modeling

§         .Net Framework

o       Homework

§         Assignment 2 Due

·        Wrapper building

 

·        Lecture 5 (February 13)

o       Topics

§         The Semantic Web

§         XML Query Processing

o       Homework

§         Assignment 3 Due

·        .Net Integration

 

·        Lecture 6 (February 20)

o       Topics

§         Dataflow Architectures for Plan Execution

§         Theseus Agent Execution System

o       Homework

§         Assignment 4 Due

·        XQuery

 

·        Lecture 7 (February 27)

o       Topics

§         Adaptive Execution Strategies

§         Speculative Execution

§         Advanced Theseus

o       Homework

§         Assignment 5 Due

·        Plan Execution in Theseus

 

·        Lecture 8 (March 6)

o       Topics

§         Constraint Integration

§         Heracles Constraint Integration System

o       Homework

§         Assignment 6 Due

·        Advanced execution in Theseus

§         Written Project Proposals Due

 

·        Lecture 9 (March 13)

o       Topics

§         Schema Mapping

§         Record Linkage

o       Homework

§         Assignment 7 Due

·        Constraint Integration in Heracles

o       Exams

§         Mid-term

 

·        Spring Break (March 20)

 

·        Lecture 10 (March 27)

o       Topics

§         Query Planning for Information Integration

 

·        Lecture 11 (April 3)

o       Topics

§         Case Studies of Integration Systems:

·        Global as View Systems

·        Local as View Systems

o       Homework

§         Project Status Report Due

 

·        Lecture 12 (April 10)

o       Topics

§         Optimizing Query Plans

·        Planning by rewriting

·        Selective Materialization

·        Sensing Actions

 

·        Lecture 13 (April 17)

o       Topics

§         Geospatial Data Integration

§         Q&A Review

 

·        Project Presentations (April 24)

 

·        Project Presentations (May 1)

 

·        Final Exam (TBD)