CSCI 548
Information Integration on the Web
Spring 2004
  

Instructor: Craig Knoblock (knoblock@isi.edu)

Meeting Time: Thursdays 2-4:50pm

Location: THH 116

Office Hours:

Thursdays 5-6pm (PHE 416)

Tuesdays 4-5pm (ISI 922 or by phone: 310-448-8786)

Teaching Assistant: Rattapoom Tuchinda (pipet@isi.edu)

TA Office Hours:

          Mondays 10-11:30am (SAL 200C)

          Wednesdays 10-11:30am (SAL 200C)

 

Grader: Rahul Bakshi (rbakshi@usc.edu)

Course Web Page: USC Blackboard (learn.usc.edu)

 

There is an abundance of data available on the Internet and there are many opportunities to combine this information to build new applications and tools.  However, there are many obstacles to exploiting the available data.  In many cases the information is not available in a structured representation, it is complicated to navigate to the required information, the format of the information changes over time, and the terminology used to describe the data varies from one data source to the next.  

This course will focus on the basic foundations and techniques of Information Integration as it applies to the Web.  There has been a great deal of interest and research on this topic over the last few years, and the course will cover the research and tools for addressing the technical problems.  The topics covered will include structured data representations such as XML, view integration techniques, machine learning techniques for turning web sites into structured data sources, high-performance query execution systems based on streaming dataflow, constraint-based integration systems, and approaches to resolving naming inconsistencies across sites.

The class will be run as a lecture course with lots of student participation and hands-on experience.  As an integral part of the course, each student will develop and build their own integrated Web application or related research project using the research and tools covered in the class.

 

Prerequisites:

CSCI561 -- Introduction to AI

CSCI585 -- Database Systems

 

Recommended Courses:

CSCI571 -- Issues of Programming Language Design

CSCI573 -- Advanced AI

 

Grading:

Homework -- 30%

Course project -- 35%

            (Paper 15%, Demo 10%, Presentation 10%)

Quizzes -- 10%

Final Exam -- 25%

 

Books: There is no required textbook.  We will be reading technical papers on each topic.

 

Lab: SAL 200C (there is a $175 lab fee for this course)

 

Course Syllabus and Schedule

 

·        Lecture 1 (January 15)

o       Topics

§         Introduction

§         Overview of the course

§         Course projects

§         XML Query Processing

 

·        Lecture 2 (January 22)

o       Topics

§         Wrapper Learning and Maintenance

§         Agent Builder (Rattapoom Tuchinda)

o       Homework

§         Assignment 1 Due

·        XQuery

 

·        Lecture 3 (January 29)

o       Topics

§         Automatic Wrapper Generation (Prof. Kristina Lerman)

§         Advanced Agent Builder (Rattapoom Tuchinda)

o       Homework

§         Assignment 2 Due

·        Wrapper Building

 

 

·        Lecture 4 (February 5)

o       Topics

§         Dataflow Architectures for Plan Execution

§         Theseus Agent Execution System (Dr. Greg Barish)

o       Homework

§         Assignment 3 Due

·        Advanced Wrapper Building

 

·        Lecture 5 (February 12)

o       Topics

§         Data Integration

§         Prometheus mediator (Snehal Thakkar)

o       Homework

§         Assignment 4 Due

·        Execution Plans

 

·        Lecture 6 (February 19)

o       Topics

§         Data Integration (cont.)

§         Web service integration (Snehal Thakkar)

o       Homework

§         Assignment 5 Due

·        Mediator Integration

 

·        Lecture 7 (February 26)

o       Topics

§         Constraint Integration

§         Heracles Constraint Integration System (Dr. Jose Luis Ambite)

o       Homework

§         Assignment 6 Due

·        Service integration

 

·        Lecture 8 (March 4)

o       Topics

§         Record Linkage

§         Apollo record linkage (Martin Michalowski)

o       Homework

§         Assignment 7 Due

·        Constraint Integration in Heracles

§         Project Proposals Due

 

 

 

·        Lecture 9 (March 11)

o       Topics

§         Handling Inconsistency

§         The Semantic Web (Prof. Yolanda Gil)

o       Homework

§         Assignment 8 Due

·        Record Linkage

 

·        Spring Break (March 18)

 

·        Lecture 10 (March 25)

o       Topics

§         Adaptive Execution Strategies

§         Speculative Execution (Dr. Greg Barish)

 

·        Lecture 11 (April 1)

o       Topics

§         Optimizing Query Plans

§         Planning by rewriting (Dr. Jose Luis Ambite)

o       Homework

§         Project Status Report Due

 

·        Lecture 12 (April 8)

o       Topics

§         Schema/Ontology Mapping (Prof. Anhai Doan)

§         Automatic Source Modeling

 

·        Lecture 13 (April 15)

o       Topics

§         Geospatial Data Integration

§         Biological Data Integration (Salim Khan)

 

·        Project Presentations (April 22)

 

·        Project Presentations (April 27, 2-4:50pm)

o       Location: TBD

 

·        Project Presentations (April 29)

 

·        Final Exam (Tuesday, May 4, 2-4pm)

o       Location: TBD