Current Projects

CHIRON: Complementing Human Intelligence to Recognize Opponent Narratives
Detecting Bias in the Law
SADIRI: Stylometric Authorship Discernment & Interpretation for Realistic Inputs
SBIR: Machine Translation for Indo-Pacific Low Resource Languages

CHIRON

Complementing Human Intelligence to Recognize Opponent Narratives

Whether engaged in high-stakes negotiation or in line to purchase coffee, humans constantly make decisions that come down to trusting the word of some other communicating entity that may or may not have malicious intent, and that may or may not be human. Our studies have shown that an entity projecting confidence is likely to be believed by an unsuspecting human, even when the entity provides false or even harmful information. For particularly thorny decisions and sensitive situations, it can be helpful to have a friend to talk to, solicit advice from, explore alternate opinions with, and bounce ideas off of. The goal of this project is to create a centaur, a human user with an ever-present machine advisor, who is better at navigating a complex communication landscape than a human alone. This work is a collaboration with the University of Maryland and the University of Sydney. This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Agreement No. HR00112490374. Approved for public release; distribution is unlimited.



Detecting Bias in the Law

Biases in language models (LMs) can cause real-world financial and legal harms to the targets of those biases. For instance, LMs are used to make investment decisions based on company news and earnings calls; if an investment concerns, say, a particular ethnic group, a biased model may interpret ambiguous speech in an unflattering light, leading to a decision not to invest. LMs have been used to make first-pass hiring decisions and have specifically disfavored female applicants. We anticipate the use of LMs in the acceleration of legal decision-making, an especially important and high-risk domain. We believe the first step to combatting bias is to identify it, ideally before actual harm is done. In this work we will develop a technique for detecting bias in text, specifically in the legal domain. Our previous work showed that it is possible to alter a model's bias simply by exposing it to text from a targeted community; the resulting model's bias shifts in line with the bias of that community, as reflected by the words its members use. We will develop tools that go a step further than simply identifying bias and that leverage key explainability components of current architectures to determine the specific words, phrases, and documents in the legal corpus that contribute to the bias. Identifying these documents can be the first component of a system that reveals the harms specific laws may impose on targeted communities. This work is a collaboration with USC's Gould School of Law.

SADIRI

Stylometric Authorship Discernment & Interpretation for Realistic Inputs

The goal of SADIRI is to develop effective, explainable, and multilingual technology for authorship attribution and privacy protection. We are developing techniques to generate a linguistically informed feature space that is robust to topic and genre shift, allowing us to generate an “authorial fingerprint” vector for each document (or set of documents) that the system reads. This fingerprint is a neural representation of the author’s unique “style”, distinct from any content present in the original documents. We use these fingerprints directly to perform authorship attribution tasks (“who wrote this tweet?”) and probe them automatically to generate meaningful linguistic explanations for system decisions. Working on the flip side of the attribution problem, we are also developing techniques to “translate” documents from one style to another (replacing the author’s original “fingerprint”), allowing us to protect author privacy while preserving meaning and fluency. Our collaborators are the University of Michigan, the University of Maryland, and the University of Birmingham. This research is based upon work supported by IARPA's HIATUS program, contract #2022-22072200006.


SBIR

Machine Translation for Indo-Pacific Low Resource Languages

Most of the languages present in the Indo-Pacific region are not supported by commercial translation services. The Indo-Pacific region exhibits the greatest language diversity of any comparable region in the world. For instance, the expansive Austronesian family encompasses over 1,200 languages, while the diverse Papuan languages, spread across multiple families and unclassified languages, account for more than 800, making Papua New Guinea the most linguistically diverse place on earth. Research in Natural Language Processing (NLP) has spawned a wide range of practical applications, from machine translation to stance detection and summarization technologies. However, these advances have been concentrated primarily on languages spoken within educated, industrialized, and affluent societies. Furthermore, in the commercial realm, the majority of translation technologies are accessed via APIs and offer no customized model training. Moreover, most commercial services carry out higher-level analysis such as stance detection on translated media rather than in the original language, where best practices would advise development. We previously developed an any-to-English multilingual translation model capable of translating more than six hundred languages to English. In this SBIR we will transition the model's zero-shot and fine-tunable translation functionality to a user-friendly interface for practical deployment. This work is a collaboration with Inferlink Corp. (Prime).