
CSCI 599
Information Integration on the Web
Spring 2003
Instructor: Craig Knoblock (knoblock@isi.edu)
Meeting Time: Tuesday 2-4:50pm
Location: THH 119
Office Hours:
Tuesday 1-2pm (PHE 416)
Tuesday 4:50-5:30pm (THH 119)
By Appointment (ISI 907)
Teaching Assistant: Snehal Thakkar (thakkar@isi.edu)
TA Office Hours:
Monday 2-3pm (SAL 200c)
Tuesday 10-11am (SAL 200c)
Course Web Page: http://www.isi.edu/info-agents/courses/iiweb
There is an abundance of data available on the Internet, and there are many
opportunities to combine this information to build new applications and tools.
However, there are many obstacles to exploiting the available data. In many
cases the information is not available in a structured representation, it is
complicated to navigate to the required information, the format of the
information changes over time, and the terminology used to describe the data
varies from one data source to the next.
This course will focus on the basic foundations and techniques of information
integration as they apply to the Web. There has been a great deal of interest
and research on this topic over the last few years, and the course will cover
the research and tools for addressing the technical problems. The topics
covered will include structured data representations such as XML, view
integration techniques, machine learning techniques for turning web sites into
structured data sources, high-performance query execution systems based on
streaming dataflow, constraint-based integration systems, and approaches to
resolving naming inconsistencies across sites.
The class will be run as a lecture course with lots of student participation
and hands-on experience. As an integral part of the course, each student will
develop and build their own integrated Web application or related research
project using the research and tools covered in the class.
Prerequisites:
CSCI561 -- Introduction to AI
CSCI585 -- Database Systems
Recommended Course:
CSCI571 -- Issues of Programming Language Design
Grading:
Class participation -- 10%
Homework -- 30%
Course project -- 30%
Mid-term and Final Exam -- 30%
Books: There is no required textbook. We will be reading technical papers on
each topic.
Lab: SAL 200C (there is a $175 lab fee for this course)
· Lecture 1 (January 14)
  o Topics
    ▪ Introduction
    ▪ Overview of the course
    ▪ Application demonstrations
    ▪ Course projects
· Lecture 2 (January 21)
  o Topics
    ▪ Wrapper Learning
    ▪ Agent Builder (Snehal Thakkar, TA)
· Lecture 3 (January 28)
  o Topics
    ▪ Automatic Wrapper Generation
    ▪ Advanced Agent Builder (Snehal Thakkar, TA)
  o Homework
    ▪ Assignment 1 Due
      · Wrapper building
Lecture 4 (February 4)
  o Topics
    ▪ Wrapper Maintenance
    ▪ Automatic Source Modeling
    ▪ Web Services (Snehal Thakkar, TA)
  o Homework
    ▪ Assignment 2 Due
      · Wrapper building
·
Lecture 5 (February 11)
  o Topics
    ▪ Dataflow Architectures for Plan Execution
    ▪ Theseus Agent Execution System
  o Homework
    ▪ Assignment 3 Due
      · .Net Integration
·
Lecture 6 (February 18)
  o Topics
    ▪ The Semantic Web (Professor Yolanda Gil)
    ▪ XML Query Processing
  o Homework
    ▪ Assignment 4 Due
      · Plan Execution in Theseus
·
Lecture 7 (February 25)
  o Topics
    ▪ Adaptive Execution Strategies
    ▪ Speculative Execution
    ▪ Advanced Theseus
  o Homework
    ▪ Assignment 5 Due
      · XQuery
·
Lecture 8 (March 4)
  o Topics
    ▪ Constraint Integration
    ▪ Heracles Constraint Integration System (Dr. Jose Luis Ambite)
  o Homework
    ▪ Homework 6 Due
      · Advanced execution in Theseus
    ▪ Written Project Proposals Due
·
Lecture 9 (March 11)
  o Topics
    ▪ Schema Mapping
    ▪ Record Linkage
  o Homework
    ▪ Homework 7 Due
      · Constraint Integration in Heracles
  o Exams
    ▪ Mid-term
· Spring Break (March 18)
·
Lecture 10 (March 25)
  o Topics
    ▪ Query Planning for Information Integration
·
Lecture 11 (April 1)
  o Topics
    ▪ Case Studies of Integration Systems:
      · Global as View Systems
      · Local as View Systems
  o Homework
    ▪ Project Status Report Due
·
Lecture 12 (April 8)
  o Topics
    ▪ Optimizing Query Plans
      · Planning by rewriting
      · Selective Materialization
      · Sensing Actions
·
Lecture 13 (April 15)
  o Topics
    ▪ Geospatial Data Integration
    ▪ Q&A Review
· Project Presentations (April 22)
· Project Presentations (April 29)
· Final Exam (May 6, 2-4pm)