CSCI 599: Information Integration on the Web
Spring 2003
Instructor: Craig Knoblock (knoblock@isi.edu)
Office Hours:
- Tuesday 1:00pm - 2:00pm (PHE 416)
- Tuesday 4:50pm - 5:30pm (THH 112)
- By Appointment (ISI 907)
Teaching Assistant: Snehal Thakkar (thakkar@isi.edu)
Office Hours:
- Monday 2-3pm (SAL 200c)
- Tuesday 10-11am (SAL 200c)
Meeting Time:
- Tuesday 2:00-4:50 (THH 119)
Location:
What's New
- April 29th, 2003: All lecture slides have been posted
- April 16th, 2003: Consolidate Operator for Theseus is Available. Documentation, Jar File
- April 14th, 2003: All readings have been posted on the web
- March 26th, 2003: Readings for Lecture 11 and slides for Lectures 9 and 10 are posted
- March 7th, 2003: Readings for Lectures 9 and 10 are posted
- March 5th, 2003: Lecture notes for Lecture 8, Homework 7, and Heracles download instructions are posted
- February 28th, 2003: Lecture notes for Lecture 7, readings for Lecture 8, and Homework 6 are posted
- February 12th, 2003: Theseus release is available here
Course Materials: Notes, Slides, Papers, and Homeworks
- Lecture 1: Introduction
- Readings:
- The Ariadne approach to web-based information integration
C. Knoblock, S. Minton, J.L. Ambite, N. Ashish, I. Muslea, A. Philpot, S. Tejada.
International Journal of Cooperative Information Systems (IJCIS), Special Issue on Intelligent Information Agents: Theory and Applications, 10(1/2):145-169, 2001.
- The TheaterLoc Application
G. Barish, C. Knoblock, Y.-S. Chen, S. Minton, A. Philpot, C. Shahabi.
In Proceedings of the 12th Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-2000), Austin, TX, 2000.
- Slides:
- Lecture 2: Wrapper Learning
- Lecture 3: Automated Wrapper Generation
- Readings:
- Automatic Data Extraction from Lists and Tables in Web Sources,
Kristina Lerman, Craig A. Knoblock, and Steven Minton,
Automatic Text Extraction and Mining workshop (ATEM-01), IJCAI-01, Seattle, WA, August 2001.
- RoadRunner: Towards Automatic Data Extraction from Large Web Sites,
V. Crescenzi, G. Mecca, P. Merialdo,
The VLDB Journal, 109-118, 2001.
- Using Grammatical Inference to Automate Information Extraction from the Web,
Theodore W. Hong and Keith L. Clark,
Lecture Notes in Computer Science, 2168, 2001.
- Slides:
- HW 2: Advanced AgentBuilder
- Lecture 4: Automatic Source Modeling
- Lecture 5: Dataflow Architectures
- Lecture 6: XQuery and Semantic Web
- Lecture 7: Dataflow Architectures (Cont'd)
- Readings:
- Speculative Execution for Information Gathering Plans,
Greg Barish and Craig A. Knoblock,
Proceedings of the Sixth International Conference on AI Planning and Scheduling (AIPS-2002), Toulouse, France. April 2002.
- Eddies: Continuously Adaptive Query Processing,
Ron Avnur and Joseph M. Hellerstein,
In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, May 2000.
- Partial Results for Online Query Processing,
Vijayshankar Raman and Joseph M. Hellerstein,
In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, Wisconsin, June 2002.
- Slides:
- HW 6: Advanced Theseus
- Lecture 8: Constraint Integration
- Readings:
- Mixed-Initiative, Multi-source Information Assistants,
Craig A. Knoblock, Steve Minton, Jose Luis Ambite, Maria Muslea, Jean Oh, and Martin Frank,
The Tenth International World Wide Web Conference (WWW10), Hong Kong, 2001.
- Smartclients: Constraint satisfaction as a paradigm for scaleable intelligent information system,
M. Torrens and B. Faltings,
In Workshop: Artificial Intelligence for Electronic Commerce (AAAI-00), 1999.
- Slides:
- HW 7: Heracles release, Oracle XQuery program (optional). Use the username heracles and the same password as for the Theseus download.
- Lecture 9: Record Linkage
- Readings:
- Reconciling Schemas of Disparate Data Sources: A Machine Learning Approach,
A. Doan, P. Domingos, and A. Halevy
Proceedings of the ACM SIGMOD Conf. on Management of Data, 2001.
- Learning domain-independent string transformation weights for high accuracy object identification,
Sheila Tejada, Craig A. Knoblock, and Steven Minton,
In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002), Edmonton, Alberta, Canada, 2002.
- Interactive deduplication using active learning,
Sunita Sarawagi and Anuradha Bhamidipaty,
In Proc. of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002), Edmonton, Canada, July 2002.
- Slides:
- Lecture 10: Data Integration
- Lecture 11: Data Integration (Continued)
- Lecture 12: Data Integration (Continued)
- Readings:
- Semi-automatic wrapper generation for Internet information sources
Naveen Ashish and Craig A. Knoblock
Proceedings of the Second IFCIS International Conference on Cooperative Information Systems, Kiawah Island, SC, 1997.
- Information Gathering Plans with Sensing Actions
Naveen Ashish, Craig A. Knoblock, and Alon Levy
European Conference on Planning, ECP-97, Toulouse, France.
- Flexible and scalable query planning in distributed and heterogeneous environments,
Jose Luis Ambite and Craig A. Knoblock.
In Proceedings of the Fourth International Conference on Artificial Intelligence Planning Systems, Pittsburgh, PA, 1998.
- Slides:
- Lecture 12: Data Integration (Continued)
- Readings:
- Integrating GIS and Imagery through XML-Based Information Mediation,
A. Gupta, R. Marciano, I. Zaslavsky and C. Baru,
Proc. NSF International Workshop on Integrated Spatial Databases: Digital Images and GIS, Portland, Maine, June 1999 (Lecture Notes in Computer Science, 1737, Springer-Verlag, 211-34).
- Slides: