| | Source modeling automatically builds rich semantic descriptions of online sources, including the ontological types of the data a source provides and the functional relationships between its inputs and outputs. We use machine learning methods to learn from known sources and then build semantic models of new online sources. These techniques can be used to automatically discover and model new sources of information, which can then be integrated with other sources of data.
| |
| | The proliferation of data and data sources has compounded one of the Information Age's major ironies: incompatible data sources and inadequate methods for integration. We have developed tools that integrate data at both the schema level and the data level. Our work includes the development of Prometheus, an information mediation system that facilitates uniform access to data sources, and EntityBases, a scalable record linkage approach to integrating entities across heterogeneous sources.
| |
| | The rapid expansion of geospatial data sources on the Internet has opened up tremendous possibilities for integrating those sources to provide new information. We have developed tools to automatically discover maps, extract the various layers from those maps, and align them with current satellite or aerial imagery of a region. Through geospatial information fusion, we are integrating nontraditional sources, such as phone books, with existing sources, such as satellite images, maps and vector data, to automatically identify roads and structures in imagery.
| |
| | Such techniques have significant applicability to problems such as earthquake disaster intervention and recovery.
| |
| | Mashups such as Zillow and Wikimapia offer an integrated, effective means to extract, integrate and view diverse information. But creating a mashup often requires programming expertise, which puts it out of reach of many otherwise capable Web users. We have developed a mashup building tool, called Karma, that enables users to create a mashup in a seamless, interactive process. We are currently refining Karma and applying it to the problem of integrating data to help develop effective cancer treatments.
| |
| | Vast amounts of Web data are unstructured, ungrammatical and visually inconsistent from one source to the next. By exploiting a reference set, that is, other sources of data within the same domain, we can extract and query this unstructured data far more effectively. These techniques can be used to better organize and query data from sources such as Craigslist or eBay.
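
To illustrate the reference-set idea, here is a minimal sketch and not the project's actual method. It assumes a tiny hypothetical reference set of car makes and models and uses plain token overlap as a stand-in for whatever matching technique the real system uses. The function names, the records and the sample post are all invented for illustration.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_reference_match(post, reference_set):
    """Find the reference record best covered by the post's tokens.

    The score is the fraction of a record's tokens that appear in the
    post. This simple overlap measure is an assumption of the sketch.
    Returns (record, score), or (None, 0.0) when nothing overlaps.
    """
    post_tokens = tokens(post)
    best, best_score = None, 0.0
    for record in reference_set:
        record_tokens = tokens(" ".join(record.values()))
        if not record_tokens:
            continue
        score = len(post_tokens & record_tokens) / len(record_tokens)
        if score > best_score:
            best, best_score = record, score
    return best, best_score


# Hypothetical reference set: clean, structured records for one domain.
REFERENCE_SET = [
    {"make": "Honda", "model": "Civic"},
    {"make": "Honda", "model": "Accord"},
    {"make": "Toyota", "model": "Camry"},
]

# An unstructured, ungrammatical post of the kind found in classified ads.
post = "93 honda civic 4dr, runs great, $2000 obo"
record, score = best_reference_match(post, REFERENCE_SET)
```

Once a post is linked to a reference record, the record's clean attributes (here `make` and `model`) can be attached to the post, which makes the post queryable like structured data.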
| |
| | As people interact on the social Web, their activity affects the structure of the Web itself, with complex feedback between individual and collective decisions producing qualitatively new online behaviors. We are developing a mathematical framework that will both model the collective behavior of social Web users and use these models to predict trends. For example, the framework may help forecast which news stories on Digg or Twitter will become popular.
| |
| | Many social Web sites allow users to create content and annotate it with descriptive metadata, such as tags, and to organize content within personal hierarchies. Structured social metadata offers invaluable evidence for learning how a community organizes knowledge. But such metadata also tends to be sparse, shallow, ambiguous, noisy and inconsistent. We are developing machine learning methods to aggregate social metadata to improve information discovery and learn common taxonomies (folksonomies).
| |
| | This project is funded by the National Institute of Mental Health. It creates a data warehouse for all the demographic, phenotypic and genotypic data present on the NIMH website. The data warehouse is accessible through a simple web interface where researchers can run queries and save the results. This data will also be integrated with new data sources to allow queries across the different data sources.