4676 Admiralty Way, Marina del Rey, CA 90292
Detailed information about my work:
I like working with students. More info here.
Nowadays we have access to lots of data to make decisions, but it is difficult to combine these data to act on them. The problem is that these data are scattered across different sources, in different formats and schemas, and with no metadata to describe their meaning and provenance. Data can be in databases, Excel spreadsheets, CSV, XML or JSON files, or accessible only via a Web service or REST API. My research objective is to help the consumers of these data to easily clean, transform and combine data to do analysis, and to help providers publish their data with the appropriate metadata so it is more useful to consumers.
Our approach is based on two ideas: semantics and examples. When tools understand the meaning of data, they can more effectively help users combine it in a meaningful way. To this end, we are developing techniques to semi-automatically infer the semantics of the data from examples. Users then show the system, using the sample data, how they want the data combined and processed, and the system infers a workflow that can be used in batch on large datasets (big data).
I am interested in technology and applications. Our information integration toolkit, Karma, is open source software that you can download to solve your information integration problems. I also collaborate with multiple organizations to apply Karma to build interesting applications in multiple domains such as intelligence analysis, bioinformatics, cultural heritage and business intelligence.
Here is a video that illustrates how we use Karma to publish the data from the Smithsonian American Art Museum as Linked Open Data:
If you want to schedule an appointment with me, do the following. 1) Take a look at my calendar to find free times. Note that I blocked off Monday afternoon and all day Wednesday to be on campus, so it shows busy, but check for the little blocks to see when I am really busy; 2) Send me email with suggested times.
I enjoy working with students. We currently have over 10 students working with us in various capacities, including two Ph.D. students, Mohsen Taheriyan and Bo Wu. I am happy to serve as dissertation advisor, and happy to guide Directed Research. There are many interesting projects that you can work on. Some are individual projects where you can work on a technical problem related to Karma, and others are group projects where you will work with a group of students to build an interesting application. For example, I am assembling a team of students for Fall 2013 to work on a smart museum tour guide that helps visitors construct a customized museum tour that guides them to see the artworks they are interested in.
I expect DR students to work hard, to produce an interesting demonstration and a video of the work, and to contribute to a research publication (full conference paper, workshop paper or demo/poster paper). You must be prepared to work at least one day a week at ISI; we will give you office space and there is a shuttle that runs between campus and ISI every day. At ISI you will enjoy the opportunity to interact with other students and all the members of our team. I will also meet with you an additional day per week on campus to review progress and discuss any issues that come up. If you are interested, send me your CV and we can set up an appointment to discuss research topics. I also require students to complete a small programming assignment before approving the DR.
I like to meet with students at the Tutor Cafe right between Tutor Hall and EEB, right here. You will find me there at least every Wednesday between 2:30 and 3:30pm.
At ISI I work in Craig Knoblock's Information Integration Group, and I collaborate very closely with him on most projects. I collaborate with Jose Luis Ambite on information integration topics, with Gully Burns on bioinformatics data integration, with Yolanda Gil on provenance and workflows, with Yao-Yi Chiang on data mining and geospatial data integration, and with Rajiv Maheswaran and Yu-Han Chang on analysis of spatio-temporal data.
I am working with Rudi Studer, Andreas Harth and Steffen Stadtmüller from KIT on combining their Dat-Fu engine with Karma to support integration of dynamic data; with Ruben Verborgh from Ghent University to integrate entity resolution (reconciliation) and entity extraction capabilities in Karma; with Freddy Priyatna in Oscar Corcho's group to use Karma in his work with Google Fusion tables; with Alex Viggio and other folks from the VIVO community to use Karma as a data ingestion tool for VIVO; with Rachel Allen from the Smithsonian American Art Museum and Eleanor Fink on our work to publish museum data to the Linked Open Data cloud; with Joan Cobb from the Getty on publication of the Getty vocabularies to the Linked Open Data cloud; and with Miel Vander Sande and an enthusiastic group of USC undergraduates to adapt his wonderful everythingisconnected.be work to produce stories using the Smithsonian American Art Museum data.
I am always looking for new opportunities to collaborate, so please send me a note if you see any topics of mutual interest. Nowadays, I attend the Semantic Web conferences (ISWC and ESWC) and the Intelligent User Interfaces Conference (IUI), so look for me there.
In the past, I was conference chair for UIST and IUI, and I was IUI program co-chair in 2013. I regularly review for HCI, semantic web and AI conferences. I figure I should review at least as many papers as I send. I often have at least 2 coauthors, so things should balance out.
Lately, I became interested in promoting Semantic Web in Latin America. In 2012 and 2013 I taught summer courses on Semantic Web at the Universidad de los Andes, my undergraduate college, and Pontificia Universidad Javeriana, both in Bogota, Colombia. Both times I had enthusiastic students and it was a pleasure to teach the course. I intend to go back every summer to teach this class (I would like to do it in Medellin in 2014, and I need an invitation, hint?). I am also working with a team from the Universidad de los Andes in Bogota and Universidad Simon Bolivar in Caracas on a bid to host the 2015 International Semantic Web Conference, yes ISWC, in Latin America.
There is a group of Latin American Semantic Web researchers, scattered all around the world, but eager to work to promote Semantic Web technologies in Latin America. Boris Villazon-Terrazas is doing the heavy lifting organizing the group, kudos to him, and if you can help, please email me or contact Boris.
Here is a list of other projects I worked on in the past.
Scientific metadata containing semantic descriptions of scientific data is expensive to capture and is typically not used across entire data analytic processes. We present an approach based on Karma and WINGS where semantic metadata is generated as scientific data is being prepared, and then subsequently used to configure models and to customize them to the data. The metadata captured includes sensor descriptions, data characteristics, data types, and process documentation. This metadata is then used in a workflow system to select analytic models dynamically and to set up model parameters automatically. In addition, all aspects of data processing are documented, and the system is able to generate extensive provenance records for new data products based on the metadata. Papers: ISWC'2011, Poster: AGU'2011. Collaborators: Craig Knoblock, Yolanda Gil, Tom Harmon.
Movement data can be combined with geospatial information and transformed into probabilistic graphical models that represent both social and temporal relationships between objects in the observed area. We then apply machine-learning techniques to cluster patterns in these graphical models to assist human users in performing strategic level analysis such as behavior prediction and anomaly detection. We extended Karma to enrich the track data with information that provides the features that the analysis algorithms need to detect patterns of life in the track data and enables users to understand the context of the data. Collaborators: Craig Knoblock, Yu-Han Chang, Rajiv Maheswaran.
We extended Karma to enrich sensor data with information that enables users to understand the context in which the sensor data was collected. With Karma, users first extract from open sources a variety of information for the region of interest: business information from online phone books and directories, weather, news, events, and road vectors from raster maps. Once information is extracted, Karma helps users geolocate it, and then integrate it and associate it with sensors and buildings. Karma then exports all the information in KML layers, so that the Cosmopolis game engine can load it and show it in the 3D environment. Collaborators: Craig Knoblock, Mike Zyda.
COMPASS is an interactive real-time tool that analyzes schedule uncertainty for a stochastic task network. An important feature is that it concurrently calculates stochastic critical paths and critical tasks. COMPASS visualizes this information on top of a traditional Gantt view, giving users insight into how delays caused by uncertain durations propagate down the schedule. Users use sliders to adjust the distribution of the duration of any set of activities and see in real-time the effects on the start and end times of activities, and the critical paths they give rise to. Evaluations with 10 users show that users can use COMPASS to answer a variety of questions about the possible evolutions of a schedule (e.g., what is the likelihood that all activities will complete before a given date?). Movie: COMPASS features, Paper: IUI'2012. Students: Yan Wang, Huihui Chen, Karan Singh. Collaborators: Rajiv Maheswaran, Yu-Han Chang.
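To make the kind of question COMPASS answers concrete, here is a minimal sketch that estimates the likelihood of finishing before a deadline, and how often each task lies on the critical path, by plain Monte Carlo sampling. This is not necessarily how COMPASS computes these quantities; the task network, the triangular duration distributions, and the names `TASKS`, `simulate_once` and `analyze` are all assumptions made up for illustration.

```python
import random

# Hypothetical task network: name -> (predecessors, (low, mode, high) duration in days).
TASKS = {
    "design": ([], (2, 3, 5)),
    "build": (["design"], (4, 6, 10)),
    "docs": (["design"], (1, 2, 8)),
    "ship": (["build", "docs"], (1, 1, 2)),
}
ORDER = ["design", "build", "docs", "ship"]  # a topological order of TASKS


def simulate_once(rng):
    """Sample one schedule; return (finish time, set of tasks on its critical path)."""
    finish, driver = {}, {}
    for name in ORDER:
        preds, (low, mode, high) = TASKS[name]
        # The predecessor that finishes last determines when this task can start.
        last = max(preds, key=lambda p: finish[p], default=None)
        start = finish[last] if last is not None else 0.0
        finish[name] = start + rng.triangular(low, high, mode)
        driver[name] = last
    # Walk back from the task that finishes last to recover the critical path.
    node = max(ORDER, key=lambda n: finish[n])
    end = finish[node]
    critical = set()
    while node is not None:
        critical.add(node)
        node = driver[node]
    return end, critical


def analyze(deadline, runs=5000, seed=0):
    """Estimate P(finish <= deadline) and the fraction of runs in which each task is critical."""
    rng = random.Random(seed)
    on_time = 0
    counts = {name: 0 for name in ORDER}
    for _ in range(runs):
        end, critical = simulate_once(rng)
        on_time += end <= deadline
        for name in critical:
            counts[name] += 1
    return on_time / runs, {n: c / runs for n, c in counts.items()}
```

Calling `analyze(12)` returns the estimated probability of finishing within 12 days together with a per-task criticality fraction. Changing a task's `(low, mode, high)` triple and re-running plays the role of moving a slider, although an interactive tool would need something much faster than re-sampling from scratch.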
VizScript is a generic framework that expedites the process of creating visualizations to debug and understand complex multi-agent systems. VizScript combines a generic application instrumentation, a knowledge-base, and simple scene definition primitives with a reasoning system, to produce an easy to use visualization platform. Using VizScript, users are able to recreate the visualizations for a complex multi-agent system with an order-of-magnitude less effort than was required in a Java implementation. Papers: IUI'2008a, IUI'2008b. Student: Jin Jing (graduated), Collaborators: Rajiv Maheswaran, Romeo Sanchez.
VizPattern is an interactive visual query environment that uses a comic strip metaphor to enable users to easily and quickly define and locate complex temporal patterns. Evaluations provide evidence that VizPattern is applicable in many domains, and that it enables a wide variety of users to answer questions about temporal data faster and with fewer errors than existing state-of-the-art visual analysis systems. Movie, Papers: VL/HCC'2009, VAST'2010. Student: Jin Jing (graduated).
The Living Classroom project sought to offer children unique, differentiated learning experiences that reflect their specific needs and to do so with the affordability and ease of use that brings premiere teaching within the reach of every American student. Its goal: to provide teachers with the comprehensive, integrated information they need to customize each child's experiences readily every school day. Project web site
The objective is to create distributed intelligent software systems that will help fielded units adapt their mission plans as the situation around them changes and impacts their plans. Intelligent software Coordinators do this by reasoning about the tasks assigned to a given unit, the task timings, how the tasks interact with those of other units, and by evaluating possible changes such as changing task timings, task assignments, or selecting from pre-planned contingencies. Movies: AAMAS/ICAPS 2010 demo, system, Papers: ICAPS'2005 (wkshp), AAAI SS'2006, AAMAS'2006, AAMAS MSDM'2006, AAMAS LSMAS'2006, AAAI'2007, AAMAS'2008, ICAPS'2008, AAMAS'2009, ICAPS PSUU'2010, AAMAS OPTMAS'2010, AAMAS'2010 (best demo), AAMAS CHACIE'2010, AAMAS ARDE'2010, PRIMA'2011
The objective is to develop intelligent autonomous radio relay nodes that exploit movement to establish and manage mesh networks in urban settings using small, inexpensive, smart robotic radio relay nodes. As the situation changes, the nodes will adapt the network, self-healing if nodes are destroyed, stretching if soldiers move. Through movement and density, the LANdroids will enable effective communications in complex non-line-of-sight environments. Our bio-inspired, distributed control algorithm called TENTACLES directs robots' exploration to grow tentacles starting from soldiers of the gateway, establishing links when tentacles meet. Tentacles are dissolved when they fail to meet other tentacles. Simulation movies 1, 2, 3. Paper: IROS'2009.
The Commander's Coordinator retrieves current status from the units in the field and assembles an integrated plan overview that shows Commanders who is doing what and where. It monitors plan execution predicting possible failures, presents options for Commanders to choose from, intelligently alerts affected units about possible ripple effects, and distributes plan modifications to repair plans.
The objective was to build a human-in-the-loop scheduling system that enables operators to state scheduling goals and that helps them refine their goals and the schedule to produce schedules that balance many conflicting goals. The system was implemented in the context of US Marines aviation, and was used to help operators produce weekly and daily flight schedules. The system was deployed for testing at the US Marines air base in Yuma, Arizona and onboard ships operating in the Middle East. More details available from the Final Report to DARPA, ANTS ebook and the project web site.
The objective was to build a system that enables end-users to specify rules for selecting the best contracts for fulfilling orders in a purchasing system. The system receives requisitions for items in electronic form, queries a database of preferred vendors to determine possible sources of supply, and applies rules to filter and rank these sources according to policies established by contract managers. The system was developed and tested in the context of applications for the Defense Logistics Agency (DLA). Project final documentation.
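The filter-and-rank step described above can be pictured with a small sketch. It does not reproduce the actual DLA system or its rule language; the `Source` fields, the two sample rules and the ranking policy are hypothetical and only show the general shape of applying end-user rules to candidate sources of supply.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """A candidate source of supply (all fields are hypothetical)."""
    vendor: str
    unit_price: float
    delivery_days: int
    on_contract: bool


def select_sources(sources, filter_rules, rank_key):
    """Keep the sources that pass every filter rule, then order them by the ranking policy."""
    eligible = [s for s in sources if all(rule(s) for rule in filter_rules)]
    return sorted(eligible, key=rank_key)


# Example policy a contract manager might state:
# only vendors on contract, delivery within 10 days, cheapest first, faster delivery breaks ties.
filter_rules = [
    lambda s: s.on_contract,
    lambda s: s.delivery_days <= 10,
]
rank_key = lambda s: (s.unit_price, s.delivery_days)

candidates = [
    Source("Acme", 12.50, 7, True),
    Source("Bolt", 11.00, 14, True),   # fails the delivery rule
    Source("Core", 12.50, 3, True),
    Source("Dyna", 9.75, 5, False),    # fails the on-contract rule
]
ranked = select_sources(candidates, filter_rules, rank_key)
print([s.vendor for s in ranked])  # prints ['Core', 'Acme']
```

Keeping the rules as separate small predicates mirrors the idea that end-users, not programmers, state the policies: each rule can be added, removed or reordered without touching the selection logic.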