
CSCI 548
Information Integration on the Web
Spring 2004
Instructor: Craig Knoblock (knoblock@isi.edu)
Meeting Time: Thursdays
Location: THH 116
Office Hours:
Thursdays
Tuesdays
Teaching Assistant: Rattapoom Tuchinda (pipet@isi.edu)
TA Office Hours:
Mondays
Wednesdays
Grader: Rahul Bakshi (rbakshi@usc.edu)
Course Web Page: USC Blackboard (learn.usc.edu)
There is an abundance of data available on the
Internet and there are many opportunities to combine this information to build
new applications and tools. However,
there are many obstacles to exploiting the available data. In many cases the information is not
available in a structured representation, it is complicated to navigate to the
required information, the format of the information changes over time, and the
terminology used to describe the data varies from one data source to the next.
This course will focus on the basic foundations and
techniques in Information Integration as it applies to the Web. There has been a great deal of interest and
research over the last few years on this topic and the course will cover the
research and tools for addressing the technical problems. The topics covered will include structured
data representation such as XML, view integration techniques, machine learning
techniques for turning web sites into structured data sources, high-performance
query execution systems based on streaming dataflow, constraint-based
integration systems, and approaches to resolving naming inconsistencies across
sites.
The class will be run as a lecture course with
lots of student participation and hands-on experience. As an integral part of the course, each
student will develop and build their own integrated Web application or related
research project using the research and tools covered in the class.
Prerequisites:
CSCI561 -- Introduction to AI
CSCI585 -- Database Systems
Recommended Courses:
CSCI571 -- Issues of Programming Language Design
CSCI573 -- Advanced AI
Grading:
Homework -- 30%
Course project -- 35% (Paper 15%, Demo 10%, Presentation 10%)
Quizzes -- 10%
Final Exam -- 25%
Books: There is no required textbook. We will be reading technical papers on
each topic.
Lab: SAL 200C (there is a $175 lab fee for this course)
· Lecture 1 (January 15)
  o Topics
    § Introduction
    § Overview of the course
    § Course projects
    § XML Query Processing
· Lecture 2 (January 22)
  o Topics
    § Wrapper Learning and Maintenance
    § Agent Builder (Rattapoom Tuchinda)
    § Assignment 1 Due
      · XQuery
· Lecture 3 (January 29)
  o Topics
    § Automatic Wrapper Generation (Prof. Kristina Lerman)
    § Advanced Agent Builder (Rattapoom Tuchinda)
  o Homework
    § Assignment 2 Due
· Lecture 4 (February 5)
  o Topics
    § Dataflow Architectures for Plan Execution
    § Theseus Agent Execution System (Dr. Greg Barish)
  o Homework
    § Assignment 3 Due
      · Advanced
· Lecture 5 (February 12)
  o Topics
    § Data Integration
    § Prometheus mediator (Snehal Thakkar)
  o Homework
    § Assignment 4 Due
      · Execution Plans
· Lecture 6 (February 19)
  o Topics
    § Data Integration (cont.)
    § Web service integration (Snehal Thakkar)
  o Homework
    § Assignment 5 Due
      · Mediator Integration
· Lecture 7 (February 26)
  o Topics
    § Constraint Integration
    § Heracles Constraint Integration System (Dr. Jose Luis Ambite)
  o Homework
    § Assignment 6 Due
      · Service integration
· Lecture 8 (March 4)
  o Topics
    § Record Linkage
    § Apollo record linkage (Martin Michalowski)
  o Homework
    § Homework 7 Due
      · Constraint Integration in Heracles
    § Project Proposals Due
· Lecture 9 (March 11)
  o Topics
    § Handling Inconsistency
    § The Semantic Web (Prof. Yolanda Gil)
  o Homework
    § Homework 8 Due
      · Record Linkage
· Spring Break (March 18)
· Lecture 10 (March 25)
  o Topics
    § Adaptive Execution Strategies
    § Speculative Execution (Dr. Greg Barish)
· Lecture 11 (April 1)
  o Topics
    § Optimizing Query Plans
    § Planning by rewriting (Dr. Jose Luis Ambite)
  o Homework
    § Project Status Report Due
· Lecture 12 (April 8)
  o Topics
    § Schema/Ontology Mapping (Prof. Anhai Doan)
    § Automatic Source Modeling
· Lecture 13 (April 15)
  o Topics
    § Geospatial Data Integration
    § Biological Data Integration (Salim Khan)
· Project Presentations (April 22)
· Project Presentations (April 27,
  o Location: TBD
· Project Presentations (April 29)
· Final Exam (Tuesday, May 4,
  o Location: TBD