Proceedings of

 

IJCAI-03 Workshop on

Information Integration on the Web

(IIWeb-03)


 

 

 

 

 

 

 


August 9 - 10, 2003
Acapulco, Mexico


 

 

 


Edited by:
Subbarao Kambhampati, Arizona State University
Craig A. Knoblock, University of Southern California

 

 

 

 

 

IJCAI-03 Workshop on Information Integration on the Web


Organizing Committee

Craig Knoblock (Co-Chair), University of Southern California
Subbarao Kambhampati (Co-Chair), Arizona State University
Lise Getoor, University of Maryland
Alon Halevy, University of Washington
Sheila McIlraith, Stanford University

Program Committee

William Cohen, Carnegie Mellon University
Hasan Davulcu, Arizona State University
Anhai Doan, University of Illinois, Urbana-Champaign
Juliana Freire, Oregon Graduate Institute
C. Lee Giles, Pennsylvania State University
Joseph M. Hellerstein, University of California, Berkeley
Nick Kushmerick, University College Dublin
Andrew McCallum, University of Massachusetts Amherst
Giansalvatore Mecca, Università della Basilicata
Renee Miller, University of Toronto
Ami Motro, George Mason University
Jeffrey Naughton, University of Wisconsin
Louiqa Raschid, University of Maryland
Marie-Christine Rousset, University of Paris-Sud
Sheila Tejada, University of New Orleans

Sponsored by the Research Institute for Advanced Computer Science (RIACS)


 

 

Foreword

 

Effective integration of heterogeneous databases and information sources has been cited as the most pressing challenge in spheres as diverse as corporate data management, homeland security, counter-terrorism and the human genome project. An important impediment to scaling up integration frameworks to large-scale applications has been the fact that the autonomous and decentralized nature of the data sources constrains the mediators to operate with very little information about the structure, scope, profile, quality and inter-relations of the information sources they are trying to integrate.


The purpose of this workshop is to bring together researchers who work in a variety of areas related to the larger problem of integrating information on the Web. These include machine learning, data mining, automated planning, constraint reasoning, databases, view integration, information extraction, the semantic web, web services, and other related areas.


We were fortunate to assemble a diverse group of researchers from the AI and DB communities to help us in organizing this workshop. The workshop call for papers had a very good response. We received 40 submissions spanning a diverse set of issues relevant to information integration. Each submission was reviewed by at least two members of the program committee. Lise Getoor independently coordinated the reviews of papers co-authored by the co-chairs.


To encourage discussion, the workshop program is structured into topic-oriented panels and poster sessions. In addition to the contributed papers, the program also contains two invited panels--one on the perspectives of companies engaged in information integration technology and the other on the perspectives of funding agencies.


We would like to thank our organizing and program committees for their invaluable input and thoughtful reviews. We thank Alma Nava for handling the review process and both Alma Nava and Kristin Ghent for assembling the proceedings. We would also like to thank the Research Institute for Advanced Computer Science (RIACS) for providing financial support for the workshop.


Subbarao Kambhampati
Craig Knoblock

Workshop Co-Chairs


Workshop Schedule

 

Aug 9, 2003

 

8:45-9:00am Opening Remarks

 

9:00-10:30am: Wrapping and Extracting (Chair: Nick Kushmerick)

 

Integrating Information to Bootstrap Information Extraction from Web Sites

Fabio Ciravegna, Alexiei Dingli, David Guthrie and Yorick Wilks

 

Trainability: Developing a responsive learning system

Steven N. Minton, Sorinel I. Ticrea and Jennifer Beach

 

On the Power of Semantic Partitioning of Web Documents

Guizhen Yang, Saikat Mukherjee, Wenfang Tan, I.V. Ramakrishnan & Hasan Davulcu

 

10:30-11:00am: Coffee Break

 

11:00-12:00pm: Name Matching (Chair: Andrew McCallum)

 

Employing Trainable String Similarity Metrics for Information Integration

Mikhail Bilenko and Raymond J. Mooney

 

A Comparison of String Distance Metrics for Name-Matching Tasks

William W. Cohen, Pradeep Ravikumar, Stephen E. Fienberg

 

12:00-1:00pm: Lunch Provided

 

1:00-2:30: Schema Matching (Chair: Subbarao Kambhampati)

 

Evaluating Matching Algorithms: the Monotonicity Principle

Ateret Anaby-Tavor, Avigdor Gal and Alberto Trombetta

 

Object Matching for Information Integration: A Profiler-Based Approach

AnHai Doan, Ying Lu, Yoonkyong Lee and Jiawei Han

 

Corpus-based Schema Matching

Jayant Madhavan, Philip Bernstein, Kuang Chen, Alon Halevy and Pradeep Shenoy

 

2:30-4:30: Poster Session (overlaps with coffee break)

      

     2:30-3:00pm 2-minute poster advertisements

 

Wrapping and Extracting

Information Extraction from Tree Documents by Learning Subtree Delimiters

Boris Chidlovskii

 

 

Reconfigurable Web Wrapper Agents for Web Information Integration

Chun-Nan Hsu, Chia-Hui Chang, Harianto Siek, Jiann-Jyh Lu, Jen-Jie Chiou

 

Expressive Power of Tree and String Based Wrappers

Daisuke Ikeda, Yasuhiro Yamada and Sachio Hirokawa

 

Domain Event Extraction and Representation with Domain Ontology

Shih-Hung Wu, Tzong-Han Tsai and Wen-Lian Hsu

 

Name Matching

Toward Conditional Models of Identity Uncertainty with Application to Proper Noun Coreference

Andrew McCallum and Ben Wellner

 

Meta-Data and Statistics

A Method for Semantically Enhancing the Service Discovery Capabilities of UDDI

Rama Akkiraju, Richard Goodwin, Prashant Doshi and Sascha Roeder

 

Source Update Capture in Information Agents

Naveen Ashish, Deepak Kulkarni and Yao Wang

 

Registry-Based Support for Information Integration

Deborah L. McGuinness and Paulo Pinheiro da Silva

 

Query Processing and Execution

Combining Classification and Transduction for Value Prediction in Speculative Plan Execution

Greg Barish and Craig A. Knoblock

 

Visual Programming of Web Data Aggregation Applications

Robert Baumgartner, Georg Gottlob and Marcus Herzog


Two-phase Query Modification using Semantic Relations based on Ontologies

Kaoru Hiramatsu, Jun-ichi Akahani and Tetsuji Satoh

 

Integrating Information, Applications and Services on the Web

Juan C. Lavariega and Lorena G. Gomez-Martinez

 

Representation & Management

 

An Ontology-Based Knowledge Management Platform

Arantza Aldea, Rene Banares-Alcantara, Jaime Bocio, Javier Gramajo, David Isern, Antonis Kokossis, Laureano Jiménez, Antonio Moreno and David Riano

 

Concept Linking for Information Integration in Open Book and Sentinel

Stuart Watt

 

Building Data Integration Systems: A Mass Collaboration Approach

AnHai Doan and Robert McCann

 

3:30-4:00: Coffee Break

 

4:30-6:00: Panel on The Economics of Information Integration: The Practical View of II on the Web (Chair: William Cohen)

 

William Cohen, Carnegie Mellon

Alon Halevy, University of Washington

Steven Minton, Fetch Technologies

David Pennock, Overture

 

 

 

 

 

Aug 10, 2003

 

9:00-10:45am: Meta-Data and Statistics (Chair: Chen Li)

 

Statistics Gathering for Learning from Distributed, Heterogeneous and Autonomous Data Sources

Doina Caragea, Jaime Reinoso, Adrian Silvescu, and Vasant Honavar

 

Deep Annotation for Information Integration

Siegfried Handschuh, Steffen Staab, Raphael Volz and Leo Meyer

 

Automatically attaching semantic metadata to Web Services

Andreas Hess and Nicholas Kushmerick

 

Frequency-Based Coverage Statistics Mining for Data Integration

Zaiqing Nie and Subbarao Kambhampati

 

10:45-11:15am: Coffee Break

 

11:15-12:30pm: Bio-informatics (Chair: Naveen Ashish)

 

Exploring Life Sciences Data Sources

Zoe Lacroix, Felix Naumann, Louiqa Raschid, & Maria Esther Vidal

 

Query Answering Using Ontologies in Agent-based Resource Sharing Environment for Biological Web Information Integrating

Jiann-Jyh Lu & Chun-Nan Hsu

 

12:30-1:30pm: Lunch Provided



 

1:30pm-3:30: Query Processing and Execution (Chair: Alon Halevy)

 

Towards Inconsistency Management in Data Integration Systems

Ariel Fuxman and Renee J. Miller

 

Querying Distributed Data through Distributed Ontologies: A Simple but Scalable Approach

Francois Goasdoue and Marie-Christine Rousset

 

Describing and Utilizing Constraints to Answer Queries in Data-Integration Systems

Chen Li

 

Efficient Execution of Recursive Integration Plans

Snehal Thakkar and Craig A. Knoblock


3:30-4:00: Coffee Break

 

4:00-5:00: Panel on Future Funding for Information Integration (Chair: Craig Knoblock)

 

Michael Pazzani, National Science Foundation

Barney Pell, National Aeronautics and Space Administration

Nick Kushmerick, on Science Foundation Ireland

 

 

5:00-5:30: Closing Discussion


Table of Contents

 

 

Wrapping and Extracting

 

Information Extraction from Tree Documents by Learning Subtree Delimiters .......... 3

Boris Chidlovskii

 

Integrating Information to Bootstrap Information Extraction from Web Sites .......... 9

Fabio Ciravegna, Alexiei Dingli, David Guthrie and Yorick Wilks

 

Reconfigurable Web Wrapper Agents for Web Information Integration .......... 15

Chun-Nan Hsu, Chia-Hui Chang, Harianto Siek, Jiann-Jyh Lu, Jen-Jie Chiou

 

Expressive Power of Tree and String Based Wrappers .......... 21

Daisuke Ikeda, Yasuhiro Yamada and Sachio Hirokawa

 

Trainability: Developing a responsive learning system .......... 27

Steven N. Minton, Sorinel I. Ticrea and Jennifer Beach

 

Domain Event Extraction and Representation with Domain Ontology .......... 33

Shih-Hung Wu, Tzong-Han Tsai and Wen-Lian Hsu

 

On the Power of Semantic Partitioning of Web Documents .......... 39

Guizhen Yang, Saikat Mukherjee, Wenfang Tan, I.V. Ramakrishnan & Hasan Davulcu

 

 

Schema Matching

 

Evaluating Matching Algorithms: the Monotonicity Principle .......... 47

Ateret Anaby-Tavor, Avigdor Gal and Alberto Trombetta

 

Object Matching for Information Integration: A Profiler-Based Approach .......... 53

AnHai Doan, Ying Lu, Yoonkyong Lee and Jiawei Han

 

Corpus-based Schema Matching .......... 59

Jayant Madhavan, Philip Bernstein, Kuang Chen, Alon Halevy and Pradeep Shenoy

 

 

Name Matching

 

Employing Trainable String Similarity Metrics for Information Integration .......... 67

Mikhail Bilenko and Raymond J. Mooney

 

A Comparison of String Distance Metrics for Name-Matching Tasks .......... 73

William W. Cohen, Pradeep Ravikumar, Stephen E. Fienberg

 

Toward Conditional Models of Identity Uncertainty with Application to Proper Noun Coreference .......... 79

Andrew McCallum and Ben Wellner

Meta-Data & Statistics

 

A Method for Semantically Enhancing the Service Discovery Capabilities of UDDI .......... 87

Rama Akkiraju, Richard Goodwin, Prashant Doshi and Sascha Roeder

 

Source Update Capture in Information Agents .......... 93

Naveen Ashish, Deepak Kulkarni and Yao Wang

 

Statistics Gathering for Learning from Distributed, Heterogeneous and Autonomous Data Sources .......... 99

Doina Caragea, Jaime Reinoso, Adrian Silvescu, and Vasant Honavar

 

Deep Annotation for Information Integration .......... 105

Siegfried Handschuh, Steffen Staab, Raphael Volz and Leo Meyer

 

Automatically attaching semantic metadata to Web Services .......... 111

Andreas Hess and Nicholas Kushmerick

 

Registry-Based Support for Information Integration .......... 117

Deborah L. McGuinness and Paulo Pinheiro da Silva

 

Frequency-Based Coverage Statistics Mining for Data Integration .......... 123

Zaiqing Nie and Subbarao Kambhampati

 

 

Query Processing and Execution

 

Combining Classification and Transduction for Value Prediction in Speculative Plan Execution .......... 131

Greg Barish and Craig A. Knoblock

 

Visual Programming of Web Data Aggregation Applications .......... 137

Robert Baumgartner, Georg Gottlob and Marcus Herzog

 

Towards Inconsistency Management in Data Integration Systems .......... 143

Ariel Fuxman and Renee J. Miller

 

Querying Distributed Data through Distributed Ontologies: A Simple but Scalable Approach .......... 149

Francois Goasdoue and Marie-Christine Rousset

 

Two-phase Query Modification using Semantic Relations based on Ontologies .......... 155

Kaoru Hiramatsu, Jun-ichi Akahani and Tetsuji Satoh

 

Integrating Information, Applications and Services on the Web .......... 159

Juan C. Lavariega and Lorena G. Gomez-Martinez

 

Describing and Utilizing Constraints to Answer Queries in Data-Integration Systems .......... 163

Chen Li

 

Efficient Execution of Recursive Integration Plans .......... 169

Snehal Thakkar and Craig A. Knoblock

 

Representation and Management

 

An Ontology-Based Knowledge Management Platform .......... 177

Arantza Aldea, Rene Banares-Alcantara, Jaime Bocio, Javier Gramajo, David Isern, Antonis Kokossis, Laureano Jiménez, Antonio Moreno and David Riano

 

 

Building Data Integration Systems: A Mass Collaboration Approach .......... 183

AnHai Doan and Robert McCann

 

Concept Linking for Information Integration in Open Book and Sentinel .......... 189

Stuart Watt

 

 

Bio-Informatics Integration

 

Query Answering Using Ontologies in Agent-based Resource Sharing Environment for Biological Web Information Integrating .......... 197

Jiann-Jyh Lu & Chun-Nan Hsu

 

Exploring Life Sciences Data Sources .......... 203

Zoe Lacroix, Felix Naumann, Louiqa Raschid, & Maria Esther Vidal

 

 

Abstracts

 

Using Categorical Clustering in Schema Discovery .......... 211

Periklis Andritsos and Renee J. Miller

 

Constraint-driven hierarchical information extraction .......... 213

Thomas Lee

 

 

Author Index .......... 215