
CSCI 599
Information Integration on the Web
Instructor: Craig Knoblock (Knoblock@isi.edu)
Office Hours:
Thursday 1-2pm (PHE 416)
Thursday 4:50 – 5:20 (THH 112)
By Appointment (ISI 941)
Teaching Assistant: TBD
TA Office Hours: TBD (SAL 200c)
Meeting Time: Thursday 2-4:50pm
Location: TBD
Course Web Page: http://www.isi.edu/info-agents/courses/csci599
There is an abundance of data available on the Internet, and there are many opportunities to combine this information to build new applications and tools. However, there are many obstacles to exploiting the available data. In many cases the information is not available in a structured representation, it is complicated to navigate to the required information, the format of the information changes over time, and the terminology used to describe the data varies from one data source to the next.
This course will focus on the foundations and techniques of information integration as it applies to the Web. There has been a great deal of interest and research on this topic over the last few years, and the course will cover the research and tools for addressing the technical problems. The topics covered will include structured data representations such as XML, view integration techniques, machine learning techniques for turning web sites into structured data sources, high-performance query execution systems based on dataflow, constraint-based integration systems, and approaches to resolving naming inconsistencies across sites.
The class will be run as a lecture course with lots of student participation and hands-on experience. As an integral part of the course, each student will develop and build their own integrated Web application using the research and tools covered in the class.
Prerequisites:
CSCI561 -- Introduction to AI
CSCI585 -- Database Systems
Recommended Course:
CSCI571 -- Issues of Programming Language Design
Grading:
Class participation -- 10%
Homework -- 30%
Course project -- 30%
Mid-term and Final Exam -- 30%
Books: There is no required textbook. We will be reading technical papers on each topic.
· Lecture 1 (January 16)
  o Topics
    § Introduction
    § Overview of the course
    § Application demonstrations
    § Course requirements
    § Course projects
· Lecture 2 (January 23)
  o Topics
    § Wrapper Learning
    § Agent Builder
· Lecture 3 (January 30)
  o Topics
    § Automatic Wrapper Generation
    § Advanced Agent Builder
  o Homework
    § Assignment 1 Due
      · Wrapper building
· Lecture 4 (February 6)
  o Topics
    § Wrapper Maintenance
    § Automatic Source Modeling
    § .Net Framework
  o Homework
    § Assignment 2 Due
      · Wrapper building
· Lecture 5 (February 13)
  o Topics
    § The Semantic Web
    § XML Query Processing
  o Homework
    § Assignment 3 Due
      · .Net Integration
· Lecture 6 (February 20)
  o Topics
    § Dataflow Architectures for Plan Execution
    § Theseus Agent Execution System
  o Homework
    § Assignment 4 Due
      · XQuery
· Lecture 7 (February 27)
  o Topics
    § Adaptive Execution Strategies
    § Speculative Execution
    § Advanced Theseus
  o Homework
    § Assignment 5 Due
      · Plan Execution in Theseus
· Lecture 8 (March 6)
  o Topics
    § Constraint Integration
    § Heracles Constraint Integration System
  o Homework
    § Homework 6 Due
      · Advanced execution in Theseus
    § Written Project Proposals Due
· Lecture 9 (March 13)
  o Topics
    § Schema Mapping
    § Record Linkage
  o Homework
    § Homework 7 Due
      · Constraint Integration in Heracles
  o Exams
    § Mid-term
· Spring Break (March 20)
· Lecture 10 (March 27)
  o Topics
    § Query Planning for Information Integration
· Lecture 11 (April 3)
  o Topics
    § Case Studies of Integration Systems:
      · Global as View Systems
      · Local as View Systems
  o Homework
    § Project Status Report Due
· Lecture 12 (April 10)
  o Topics
    § Optimizing Query Plans
      · Planning by rewriting
      · Selective Materialization
      · Sensing Actions
· Lecture 13 (April 17)
  o Topics
    § Geospatial Data Integration
    § Q&A Review
· Project Presentations (April 24)
· Project Presentations (May 1)
· Final Exam (TBD)