The Natural Language Group


Current Projects

CWIC: Communicating Intelligently with Computers

In this project, we work on human/robot communication and on creative language generation. This work is supported by Contract W911NF-15-1-0543 with the US Defense Advanced Research Projects Agency (DARPA).


ELICIT:  A System for Extracting and Organizing Causal Information

Optimal decision making requires ready access to disparate sources of structured (e.g., databases) and unstructured (e.g., natural language) information.  ELICIT researchers are developing a framework that integrates concepts of causality, factual knowledge, and meta-reasoning into a model-driven knowledge graph representation that allows decision makers to access relevant knowledge.  Teammates include Rensselaer Polytechnic Institute, CMU, and Lockheed Martin ATL.  This work is sponsored by the Defense Advanced Research Projects Agency (DARPA) Causal Exploration program (FA8650-17-C-7715).

ELISA: Exploiting Language Information for Situational Awareness

Today's automatic parsers, translators, extractors, and dictionaries cover a tiny fraction of the world's languages. Can we use general knowledge of how language works to extend the reach of natural language tools?  In this project, we develop technology for rapidly constructing information extraction (IE), machine translation (MT), and topic and sentiment processing capabilities for new languages.  Our collaborators are ICSI, Brno University of Technology, University of Pennsylvania, University of Notre Dame, Rensselaer Polytechnic Institute, and Next Century, Inc.  This work is carried out with funding from DARPA (HR0011-15-C-0115).

SARAL: Summarization and Domain-Adaptive Retrieval Across Languages

How can a monolingual English speaker access and understand text and speech material in low-resource foreign languages?  In this project, we develop cross-lingual retrieval and summarization techniques that will work for any language in the world, given minimal resources to work with.  Our collaborators are University of Notre Dame, Rensselaer Polytechnic Institute, Massachusetts Institute of Technology, Northeastern University, University of Massachusetts, and Idiap.  This work is supported by the IARPA MATERIAL program.  (Acknowledgement: The research is based upon work supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via AFRL Contract FA8650-17-C-9116.  The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government.  The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.)

Previous Projects

AMR: Abstract Meaning Representation

The AMR Bank is a set of English sentences paired with simple, readable semantic representations.  We hope that it will spur new research in natural language understanding, generation, and translation.  Please visit the AMR page for details. Thanks to NSF (IIS-0908532) for funding the initial design of AMR, and to DARPA MRP (FA-8750-09-C-0179) for supporting a group to construct consensus annotations and the AMR Editor. The initial AMR bank was built under DARPA DEFT FA-8750-13-2-0045 (PI: Stephanie Strassel; co-PIs: Kevin Knight, Daniel Marcu, and Martha Palmer) and DARPA BOLT HR0011-12-C-0014 (PI: Kevin Knight).

DECODE: Deciphering Historical Manuscripts

In collaboration with colleagues at Uppsala University (Sweden), we are collecting enciphered manuscripts from the European modern era (1600-1800) and developing software to automatically decipher them. This work is supported by the Swedish Research Council and a gift from Google, Inc.


DIG: Domain-Specific Insight Graphs

DIG is a domain-specific indexing, search, and analysis system.  The DIG system combines state-of-the-art open source software with an open architecture and a flexible set of APIs to facilitate the integration of a variety of extraction and analysis tools.  Please visit the DIG page for more details.  This research is supported in part by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) under contract number FA8750-14-C-0240, and in part by the National Science Foundation under Grant No. 1117913.


L2K2R2: Learning to Know to Read

Scientists are overwhelmed with scientific literature. If we can build machines to read scientific papers and understand them, we can help science move faster.  This work is sponsored by DARPA Big Mechanism (W911NF-14-1-0364).