University of Southern California


Craig A. Knoblock
 
 
    
  Current Projects 
 Information Extraction 
  Wrapper methods provide extraction techniques for semi-structured sources, such as similar-looking Web pages, but much of the data on the World Wide Web exists in an unstructured and ungrammatical form.  
 Map Extraction 
  Raster maps are widely available for areas around the globe and are an important source of geospatial data. Compared with other geospatial data, raster maps are easily accessible and provide geographic features that are difficult to find elsewhere, such as landmarks in historical maps.  
 Mashup Construction 
  Mashups provide an integrated and effective approach for extracting, integrating, and viewing diverse information. Some interesting examples of Mashups on the Internet are Zillow and WikiMapia. However, creating a Mashup with existing technologies such as Yahoo Pipes and Intel Mashmaker often requires programming knowledge and familiarity with widgets.  
 Geospatial Information Fusion 
  The ability to reason over geospatial entities using publicly available information is greatly enhanced by the abundance of geospatial data sources on the Internet. Traditional data sources such as satellite imagery, maps, gazetteers and vector data have long been used in geographic information systems (GIS).  
 Map Discovery 
  A huge number of high-quality maps are available on the Internet, and they can be used to extract useful geospatial information about the regions they describe. For example, by aligning satellite images with these maps, we can label the streets automatically.  
 Source Modeling 
  Only a very small portion of data on the Web is semantically annotated and available for use within Information Integration applications. Semantically annotating existing Web sources requires significant manual effort that must be repeated for each new data source.  
 Data Integration 
  We can utilize various extraction techniques to extract data from a wide variety of sources. However, different sources often have different schemas, access methods, and coverage. To address this issue, we have developed a data integration framework called Prometheus that facilitates uniform access to the sources.  
 Entity Linkage 
  Current approaches for linking information across sources, often called record linkage, require finding common attributes between the sources and comparing the records using those attributes. This often leads to unsatisfactory results because the sources frequently lack information or contain incorrect or outdated information.  
  Past Projects 
 Constraint-based Integration 
  People use search engines today to find information, but in many cases what people actually want is an application that allows them to access a set of related sources, extract the information they need, and integrate the data in ways that allow them to solve their problems.  
 Geospatial Data Alignment 
  We used a wide variety of geospatial and textual data available on the Internet to identify objects in satellite imagery efficiently and accurately. To demonstrate the utility of our technique, we built an application that takes satellite imagery from online sources and annotates the buildings that appear in it.  
 Plan Execution 
  Theseus is an execution platform for information agents. Its goals are to allow complex information management plans to be easily specified and to provide an infrastructure that optimizes the execution of such plans.  
 Record Linkage 
  The task of object identification arises when integrating information from multiple websites. The same data objects can appear in inconsistent text formats across sites, making it difficult to identify matching objects using exact text matching.  
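To illustrate why exact text matching falls short, here is a minimal sketch of approximate matching between two lists of names. It uses a generic string-similarity score from Python's standard `difflib` module, which is not necessarily the method used in this project; the function names, the normalization rules, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means the normalized strings are equal."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_records(left, right, threshold=0.8):
    """Pair each left record with its best-scoring right record.

    A pair is kept only if its score reaches the (assumed) threshold.
    """
    pairs = []
    for l in left:
        best = max(right, key=lambda r: similarity(l, r), default=None)
        if best is not None and similarity(l, best) >= threshold:
            pairs.append((l, best))
    return pairs
```

For example, `match_records(["Cafe Del Rey, Inc."], ["cafe del rey inc"])` pairs the two strings even though they are not character-for-character identical. A realistic system would compare several attributes and tune the threshold against labeled data.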
 Wrapper Learning 
  With the expansion of the Web, computer users have gained access to a large variety of comprehensive information repositories. However, the Web is based on a browsing paradigm that makes it difficult to retrieve and integrate data from multiple sources.  
 Wrapper Maintenance 
  Wrappers facilitate access to Web-based information sources by providing a uniform querying and data extraction capability. For example, a wrapper for a yellow pages source can take a query for a Mexican restaurant near Marina del Rey, CA, retrieve the Web page containing the results of the query, and extract each restaurant's name, address, and phone number.  
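The extraction step of such a wrapper can be sketched as follows. This is a hand-written rule over a made-up page layout, not a learned wrapper and not the project's implementation: the HTML snippet, the listing names, the phone numbers, and the `extract_listings` function are all hypothetical.

```python
import re

# Hypothetical result page; real yellow pages markup will differ.
SAMPLE_PAGE = """
<li class="listing"><b>Casa Ejemplo</b> <span>123 Example Ave, Marina del Rey, CA</span> <i>(310) 555-0100</i></li>
<li class="listing"><b>Taqueria Muestra</b> <span>456 Sample St, Marina del Rey, CA</span> <i>(310) 555-0101</i></li>
"""

# One extraction rule per listing: name, address, and phone number.
LISTING = re.compile(
    r'<li class="listing"><b>(?P<name>.*?)</b>\s*'
    r'<span>(?P<address>.*?)</span>\s*'
    r'<i>(?P<phone>.*?)</i></li>'
)


def extract_listings(html: str) -> list:
    """Return one dict per listing found in the page."""
    return [m.groupdict() for m in LISTING.finditer(html)]
```

Calling `extract_listings(SAMPLE_PAGE)` returns one dictionary per listing with `name`, `address`, and `phone` keys. A rule like this stops matching as soon as the site changes its layout, which is the problem wrapper maintenance addresses.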
Background