Chen Li
"Integrating Information from Heterogeneous Data Sources"
1/24/2003: 10:30 AM - 12:00 PM
11th Floor Small Conference Room
Abstract: The goal of information integration is to support seamless access to
heterogeneous, autonomous data sources. Many data-integration systems
use a mediation architecture, in which a mediator accepts a user query
and answers the query by accessing relevant sources through wrappers.
In this talk I will focus on two research problems in information
integration. The first one is how to do query processing and
optimization in the presence of limited query capabilities, i.e., data
sources do not allow simple scans of their data. I will discuss
several challenges such as how to describe source restrictions, how to
compute mediator capabilities, and how to answer queries efficiently.
The second problem is efficient record linkage. That is, given two
lists of records from two different sources, we want to determine all
record pairs that are similar to each other, where the overall
similarity between two records is defined based on domain-specific
similarities over individual attributes constituting the record. I
will report some of the initial results of our research conducted in
the Flamingo Project on data cleansing.
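
To make the record linkage problem above concrete, here is a minimal Python sketch of the naive baseline: compare every pair of records, combine per-attribute similarities into an overall score, and keep the pairs above a threshold. The attribute names, weights, threshold, and function names are all illustrative assumptions, not the method developed in the Flamingo Project. The all-pairs comparison is quadratic in the list sizes, which is the cost that efficient record linkage tries to avoid.

```python
from difflib import SequenceMatcher


def string_sim(a, b):
    """Case-insensitive string similarity in [0, 1] (assumed metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def year_sim(a, b, max_gap=5):
    """Numeric similarity in [0, 1]; drops to 0 once years differ by max_gap."""
    return max(0.0, 1.0 - abs(a - b) / max_gap)


# Hypothetical domain-specific similarity functions and weights per attribute.
ATTRIBUTE_SIMS = {
    "name": (string_sim, 0.7),
    "year": (year_sim, 0.3),
}


def record_sim(r, s):
    """Overall similarity: weighted sum of the per-attribute similarities."""
    return sum(w * f(r[k], s[k]) for k, (f, w) in ATTRIBUTE_SIMS.items())


def link_records(left, right, threshold=0.8):
    """Return index pairs (i, j) whose overall similarity meets the threshold.

    Naive all-pairs comparison, shown only to define the problem.
    """
    return [
        (i, j)
        for i, r in enumerate(left)
        for j, s in enumerate(right)
        if record_sim(r, s) >= threshold
    ]
```

For example, calling link_records with one list holding {"name": "Chen Li", "year": 2001} and another holding {"name": "chen li", "year": 2001} and {"name": "Zzz Qqq", "year": 1980} links only the first pair.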
About Chen Li: Dr. Chen Li is an assistant professor in the Department of Information
and Computer Science at the University of California, Irvine. He
received his Ph.D. degree in Computer Science from Stanford University
in 2001, and his B.S. degree in Computer Science from Tsinghua
University, China, in 1994. His research interests are in the fields
of database and information systems, including data integration, data
warehouses, data cleansing, multimedia databases, and XML.
Last updated: Mon Jun 19 17:44:06 2006