We now have access to large amounts of data for making decisions, but it is difficult to combine these data in order to act on them. The problem is that the data are scattered across different sources, in different formats and schemas, and without metadata to describe their meaning and provenance. Data may reside in databases, Excel spreadsheets, or CSV, XML or JSON files, or may be accessible only via a Web service or REST API. My research objective is to help consumers of these data easily clean, transform and combine them for analysis, and to help providers publish their data with the appropriate metadata so that they are more useful to consumers.
Our approach is based on two ideas: semantics and examples. When tools understand the meaning of data, they can more effectively help users combine them in a meaningful way. To this end, we are developing techniques to semi-automatically infer the semantics of data from examples. Users then show the system, using sample data, how they want the data combined and processed, and the system infers a workflow that can be run in batch mode on large datasets (big data).
I am interested in both technology and applications. Our information integration toolkit, Karma, is open source software that you can download to solve your own information integration problems. I also collaborate with multiple organizations to apply Karma in building applications in domains such as intelligence analysis, bioinformatics, cultural heritage and business intelligence.