Completed Projects

AMR

Abstract Meaning Representation

The AMR Bank is a set of English sentences paired with simple, readable semantic representations. We hope that it will spur new research in natural language understanding, generation, and translation. Please visit the AMR page for more details. Thanks to NSF (IIS-0908532) for funding the initial design of AMR, and to DARPA MRP (FA-8750-09-C-0179) for supporting a group to construct consensus annotations and the AMR Editor. The initial AMR bank was built under DARPA DEFT FA-8750-13-2-0045 (PI: Stephanie Strassel; co-PIs: Kevin Knight, Daniel Marcu, and Martha Palmer) and DARPA BOLT HR0011-12-C-0014 (PI: Kevin Knight).

Visit AMR

CWIC

Communicating Intelligently with Computers

In this project, we work on human/robot communication and on creative language generation. This work is supported by Contract W911NF-15-1-0543 with the US Defense Advanced Research Projects Agency (DARPA).

DECODE

Deciphering Historical Manuscripts

In collaboration with colleagues at Uppsala University (Sweden), we are collecting enciphered manuscripts from the European modern era (1600-1800) and developing software to automatically decipher them. This work is supported by the Swedish Research Council and a gift from Google, Inc.

Discerning Group Biases in Online Communities via Linguistic Analysis

Cultural models are shared understandings of the world held by particular cultures, which help individuals interpret the world around them. These models provide people with internal mental heuristics for what is right, what is just, and what is edible, among other things. Understanding these models helps us better understand the nature of the cultures that adopt them, as they provide rules for everyday life. The underlying mechanisms of these models are currently collected through ethnographic analysis, undertaken by social scientists using specialized tools such as interviews with key informants, pile sorts, and other techniques that extract the underlying workings of a culture. While these tools give social scientists “thick data” about the cultures being studied, the process is cumbersome and time-consuming: it often requires months to years of fieldwork and cannot easily scale across different types of stereotypes. Fortunately, these shared understandings are redundancies that machine learning techniques can exploit to quickly understand the underlying principles of a culture. In the past decade, online communities have generated enormous quantities of textual data, including product reviews, discussion forums, and other digital content. This provides both an opportunity and a challenge for computer scientists and social scientists to study, at scale, the underlying group biases present in online content. The primary scientific goal of the project is to develop a computational framework for rapid, data-driven ethnographic analysis of online communities using statistical machine learning and computational linguistic tools. This work is carried out with funding from the DARPA UGB program, HR0011890019.

Visit project

DIG

Domain-Specific Insight Graphs

DIG is a domain-specific indexing, search and analysis system. The DIG system harnesses state-of-the-art open source software combined with an open architecture and flexible set of APIs to facilitate the integration of a variety of extraction and analysis tools.

Please visit the DIG page for more details. This research is supported in part by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) under contract number FA8750-14-C-0240, and in part by the National Science Foundation under Grant No. 1117913.

Visit DIG

ELICIT

A System for Extracting and Organizing Causal Information

Optimal decision making requires ready access to disparate sources of structured (e.g., databases) and unstructured (e.g., natural language) information. ELICIT researchers are developing a framework that integrates concepts of causality, factual knowledge, and meta-reasoning into a model-driven knowledge graph representation that allows decision makers to access relevant knowledge. Teammates include Rensselaer Polytechnic Institute, CMU, and Lockheed Martin ATL. Sponsored by the Defense Advanced Research Projects Agency (DARPA) Causal Exploration program (FA8650-17-C-7715).

Visit ELICIT

L2K2R2

Learning to Know to Read

Scientists are overwhelmed with scientific literature. If we can build machines to read scientific papers and understand them, we can help science move faster. This work is sponsored by DARPA Big Mechanism (W911NF-14-1-0364).

Viola

Neural Generation with Improvised Dialogues and Common Sense Reasoning

The goal of the Alexa Prize Challenge is to build a social bot whose conversations with users receive an average rating of at least 4.0 out of 5 and last more than 20 minutes. Our approach draws inspiration from improvisational theatre and common sense reasoning to build a social bot that is robust beyond the preset topics that bounded previous systems. This work is sponsored by Amazon.

Visit Viola