ISI at ACL ’24

by Julia Cohen

Published on August 8th, 2024

Last updated on August 15th, 2024

At the 62nd Annual Meeting of the Association for Computational Linguistics (ACL), held August 11-16, 2024 in Bangkok, Thailand, researchers from USC’s Information Sciences Institute (ISI), a unit of the USC Viterbi School of Engineering, will present 11 research papers covering a broad range of topics. The annual meeting is one of the premier conferences for natural language research.

Research Spotlights 2024

Spotting the Scoop: Using AI to Predict Newsworthiness

In the paper Tracking the Newsworthiness of Public Documents, Alexander Spangher, a computer science Ph.D. student at the USC Viterbi School of Engineering who previously worked as a data scientist at The New York Times, addresses the challenge journalists face in identifying newsworthy stories from vast amounts of textual data, such as leaks, bills and press releases. The researchers focus on news coverage of local public policy in the San Francisco Bay Area by The San Francisco Chronicle. The team uses probabilistic relational modeling to gather and link news articles, public policy documents and meeting recordings, showing that this low-annotation linking methodology outperforms other retrieval methods. The study also introduces a new task, newsworthiness prediction, which aims to determine whether a policy item will receive coverage.

Winning a Board Game, but Not the Conversation

In More Victories, Less Cooperation: Assessing Cicero’s Diplomacy Play, researchers examine Cicero, an AI agent for the board game Diplomacy, focusing on its strategic and communicative abilities. ISI authors Yanze Wang; Ulf Hermjakob, ISI senior research scientist; and Jonathan May, research associate professor at the USC Viterbi School of Engineering, along with co-authors from the University of Maryland, Princeton University and the University of Sydney, found that while Cicero surpasses human players in strategy, it struggles with effective communication, a key element of the game. They used abstract meaning representation to separate in-game tactics from core language. Across 24 games involving over 200 hours of play, the findings show that Cicero excels in strategy but falls short in deception and persuasion. This reliance on strategy highlights Cicero’s current limitations in achieving true communicative and cooperative AI.

To “Howdy,” or Not to “Howdy,” That Is the Question

The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model Performance explores how changing the wording of prompts affects the performance of large language models (LLMs) in labeling data. Co-author Abel Salinas, ISI research assistant, said: “We are relying on these models for so many things, asking for output in certain formats, and wondering in the back of our heads, ‘what effect do prompt variations or output formats actually have?’ So we were excited to finally find out.” Salinas, along with Fred Morstatter, USC Viterbi research assistant professor of computer science, tested different prompt variations and found that even small changes, like adding a space or starting with a friendly greeting, can alter the LLM’s responses. They also discovered that specific formatting requests and attempts to bypass content restrictions can significantly impact the labeled data. These findings emphasize the importance of designing prompts carefully to ensure LLMs produce reliable and consistent results.

Not Feeling It: AI’s Emotional Disconnect

In the paper Whose Emotions and Moral Sentiments do Language Models Reflect?, ISI researchers explore how language models (LMs) represent emotional and moral perspectives and their impact on tasks like content moderation and hate speech detection. Unlike previous research on how well LMs mimic social group opinions, this study measures “affective alignment,” or how well LMs’ emotional tones match those of different groups. Comparing 36 LMs’ responses to Twitter messages, researchers found significant mismatches with both liberal and conservative groups, mismatches even larger than the U.S. partisan divide. Despite efforts to steer LMs toward specific viewpoints, liberal biases persisted. Co-author Zihao He, a computer science Ph.D. student at the USC Viterbi School of Engineering, explained, “Aligning emotional responses improves the understanding and acceptance of AI-generated content, but our findings show that inherent biases are deeply entrenched in these models.” This highlights the need to address emotional biases in LMs to ensure fairer and more accurate representations.

Complete list of accepted ISI papers below:

Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs (Main Conference)
Siyuan Wang, Zhongyu Wei, Yejin Choi, Xiang Ren

GPT is Not an Annotator: The Necessity of Human Annotation in Fairness Benchmark Construction (Main Conference)
Virginia K. Felkner, Jennifer A. Thompson, Jonathan May

More Victories, Less Cooperation: Assessing Cicero’s Diplomacy Play (Main Conference)
Wichayaporn Wongkamjan, Feng Gu, Yanze Wang, Ulf Hermjakob, Jonathan May, Brandon M. Stewart, Jonathan K. Kummerfeld, Denis Peskoff, Jordan Lee Boyd-Graber

Relying on the Unreliable: The Impact of Language Models’ Reluctance to Express Uncertainty (Main Conference)
Kaitlyn Zhou, Jena D. Hwang, Xiang Ren, Maarten Sap

Tracking the Newsworthiness of Public Documents (Main Conference)
Alexander Spangher, Emilio Ferrara, Ben Welsh, Nanyun Peng, Serdar Tumgoren, Jonathan May

Tree-of-Traversals: A Zero-Shot Reasoning Algorithm for Augmenting Black-box Language Models with Knowledge Graphs (Main Conference)
Elan Markowitz, Anil Ramakrishna, Jwala Dhamala, Ninareh Mehrabi, Charith Peris, Rahul Gupta, Kai-Wei Chang, Aram Galstyan

Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification (Findings)
Soumya Sanyal, Tianyi Xiao, Jiacheng Liu, Wenya Wang, Xiang Ren

The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model Performance (Findings)
Abel Salinas, Fred Morstatter

Faithful Persona-based Conversational Dataset Generation with Large Language Models (Findings)
Pegah Jandaghi, XiangHai Sheng, Xinyi Bai, Jay Pujara, Hakim Sidahmed

Whose Emotions and Moral Sentiments do Language Models Reflect? (Findings)
Zihao He, Siyi Guo, Ashwin Rao, Kristina Lerman

BotEval: Facilitating Interactive Human Evaluation (Demo)
Hyundong Cho, Thamme Gowda, Yuyang Huang, Zixun Lu, Tianli Tong, Jonathan May

Published on August 12th, 2024

