ISI at the NAACL ‘25 Conference

At the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), held April 29 to May 4, 2025, in Albuquerque, New Mexico, researchers from USC Viterbi’s Information Sciences Institute (ISI) will present nine papers across a wide range of natural language processing topics, including interpretability, reasoning, creativity, and alignment.
Organized by NAACL, which provides a regional focus for members of the Association for Computational Linguistics (ACL) in North, Central, and South America, the annual conference is one of the premier venues for natural language processing research. This year, NAACL received a record 3,099 paper submissions, up from last year’s high of 2,434. The acceptance rate was 22% for the main conference, and 37% when including both the main conference and Findings of NAACL.
Research Spotlights
Teaching AI to Plan Like a Writer
Whether it’s a journalist outlining a story or a scientist deciding which experiments to run, planning is a key part of the creative process. But while large language models (LLMs) can produce fluent text, they often struggle to decide on and carry out the steps that precede writing in a meaningful way. In their NAACL 2025 tutorial, Creative Planning with Language Models: Practice, Evaluation and Applications, ISI research assistants Alexander Spangher and Tenghao Huang, together with co-authors from Microsoft and UCLA, explore how planning has been learned and deployed in creative workflows. The session brings together approaches for learning to plan from complete or partial data and for applying those plans in creative domains such as computational journalism and web-based agents.
“Planning is where we’ve seen some of the most exciting improvements in language modeling in recent months — think about models like OpenAI’s o1, DeepSeek’s R1, or other models that ‘think’ before they write,” said Spangher. “Planning models only learn to do their wonders when they have a clearly defined reward — or a signal that tells them that what they’ve generated is good or bad. However, creative contexts are some of the most challenging settings in which to learn to plan, because what counts as a ‘good’ or ‘bad’ creative output is highly subjective. We want to bridge the gap between what humans do when they plan a piece of writing and what models are currently capable of.” Spangher and his co-authors will present the tutorial on Saturday, May 3, 2025.
Why Can’t LLMs Count the Letters in ‘Strawberry’?
Despite their impressive performance on complex tasks like math problem solving and code generation, LLMs often fail at surprisingly simple ones. In their paper, LLM The Genius Paradox: A Linguistic and Math Expert’s Struggle with Simple Word-based Counting Problems, ISI research assistant Nan Xu, a Ph.D. student in the Thomas Lord Department of Computer Science at the USC Viterbi School of Engineering, and ISI Lead Scientist Xuezhe Ma investigate why state-of-the-art models struggle with basic word-level counting questions, such as identifying how many times a letter appears in a word. The paper systematically evaluates common explanations, including tokenization, model architecture, and training data coverage, and finds that none fully explains the issue. The authors also test several strategies for improving performance and show that prompting models to reason step-by-step is the most reliable fix.
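One common suspect, tokenization, is easy to visualize: a byte-pair-encoding tokenizer hands the model subword token IDs rather than individual letters, so character counts are never directly observed. The minimal sketch below uses OpenAI’s open-source tiktoken library purely for illustration; it is not drawn from the paper’s experiments.

```python
# A minimal sketch (not from the paper) of why subword tokenization makes
# character-level counting hard: the model sees token IDs, not letters.
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # a common BPE vocabulary
tokens = enc.encode("strawberry")
print(tokens)                                # a few subword token IDs
print([enc.decode([t]) for t in tokens])     # the subword pieces the model sees
print("strawberry".count("r"))               # 3 -- trivial at the character level
```

The last line hints at why step-by-step prompting helps: spelling the word out letter by letter moves the problem back to a level where counting is trivial.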
Making Alignment with Human Values More Practical
Preference optimization (*PO) has become one of the leading methods for aligning language models with human values, but deploying these techniques often requires time-consuming hyperparameter tuning. In A Practical Analysis of Human Alignment with *PO, ISI Research Assistant Kian Ahrabian and ISI Principal Scientist Jay Pujara, working with collaborators at Microsoft, take a closer look at how these methods perform in real-world settings. “We wanted to move beyond reported performances and ask what alignment methods actually work in practice,” said Ahrabian. The paper compares several leading alignment algorithms across a wide range of conditions, focusing on robustness and practical usability. In parallel, the authors propose a simple extension to an existing state-of-the-art algorithm that yields more concise responses without sacrificing quality.
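For readers unfamiliar with the *PO family, the sketch below implements the per-example loss of Direct Preference Optimization (DPO), one of its best-known members, in its standard textbook form. This is an illustration of the class of methods the paper analyzes, not necessarily the exact variant or extension the authors propose.

```python
# A minimal sketch (standard DPO, one member of the *PO family; the numbers
# and framing are ours, not the paper's) of the loss on a single example.
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one (chosen, rejected) response pair.

    logp_*     -- log-probability of the response under the model being tuned
    ref_logp_* -- log-probability under the frozen reference model
    beta       -- a sensitivity hyperparameter; tuning knobs like this are
                  exactly the practical burden the paper examines
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))

# The chosen response gained probability relative to the reference while the
# rejected one lost it, so the margin is positive and the loss is small.
print(dpo_loss(logp_w=-10.0, logp_l=-12.0, ref_logp_w=-10.5, ref_logp_l=-11.5))
```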
Letting Style Transfer Learn From Its Mistakes — with a Little Help from the Past
Text style transfer challenges models to rewrite content in a new voice—like turning Shakespeare into tweets—while keeping the meaning intact. In Style Transfer with Multi-iteration Preference Optimization (STAMP), ISI Research Engineer Shuai Liu and Principal Scientist Jonathan May take inspiration from an unexpected source: the early days of machine translation.
Instead of training a model all at once, their approach lets it learn step-by-step, improving by studying its own successes and failures. Borrowing a “hope vs. fear” strategy from classic translation work, the model compares examples it wants to create with ones it wants to avoid, which helps it get better with each iteration. STAMP also introduces a smarter way to create training examples and a system for balancing fluency, meaning, and style. The result? Models that beat state-of-the-art baselines on major style transfer benchmarks. “Instead of training a model all at once, we let the model learn from its own successes and failures iteratively,” said Liu. “This encourages the model to produce higher-quality style transfers within its capabilities and avoid generating poor ones it might otherwise tend to produce.”
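A rough sketch of the iterative “hope vs. fear” loop is below. This is illustrative pseudocode under our own assumptions, not STAMP’s released implementation, and the placeholder scorer stands in for the paper’s balance of fluency, meaning preservation, and target-style strength.

```python
# A minimal sketch (illustrative only, not STAMP's actual code) of building
# "hope vs. fear" preference pairs from a model's own outputs each iteration.
import random

def score(rewrite):
    # Placeholder: STAMP balances fluency, meaning preservation, and style
    # strength here; a random number merely keeps this sketch runnable.
    return random.random()

def build_preference_pairs(sample_rewrite, sources, n_samples=8):
    pairs = []
    for src in sources:
        candidates = [sample_rewrite(src) for _ in range(n_samples)]
        ranked = sorted(candidates, key=score, reverse=True)
        hope, fear = ranked[0], ranked[-1]  # best vs. worst of the model's own outputs
        pairs.append((src, hope, fear))     # train a *PO update on these, then repeat
    return pairs

# Toy usage: a word-shuffling "model" stands in for a style-transfer LLM.
pairs = build_preference_pairs(
    lambda s: " ".join(random.sample(s.split(), len(s.split()))),
    ["to be or not to be"],
)
print(pairs)
```

Because both the hope and the fear come from the model itself, each round of preference optimization pushes the model toward rewrites it can already reach, which is the intuition behind Liu’s comment above.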
Complete list of accepted ISI papers and tutorials below:
Aggregation Artifacts in Subjective Tasks Collapse Large Language Models’ Posteriors
Georgios Chochlakis, Alexandros Potamianos, Kristina Lerman, Shrikanth Narayanan
Creative Planning with Language Models: Practice, Evaluation and Applications (Tutorial)
Alexander Spangher, Tenghao Huang, Philippe Laban, Nanyun Peng
From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning
Nan Xu, Fei Wang, Sheng Zhang, Hoifung Poon, Muhao Chen
KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs
Elan Markowitz, Krupa Galiya, Greg Ver Steeg, Aram Galstyan
LLM The Genius Paradox: A Linguistic and Math Expert’s Struggle with Simple Word-based Counting Problems
Nan Xu, Xuezhe Ma
Personalized Help for Optimizing Low-Skilled Users’ Strategy
Feng Gu, Wichayaporn Wongkamjan, Jonathan K. Kummerfeld, Denis Peskoff, Jonathan May, Jordan Boyd-Graber
A Practical Analysis of Human Alignment with *PO
Kian Ahrabian, Xihui Lin, Barun Patra, Vishrav Chaudhary, Alon Benhaim, Jay Pujara, Xia Song
Style Transfer with Multi-iteration Preference Optimization
Shuai Liu, Jonathan May
Tuning-Free Personalized Alignment via Trial-Error-Explain In-Context Learning
Hyundong Cho, Karishma Sharma, Nicolaas Jedema, Leonardo F. R. Ribeiro, Alessandro Moschitti, Ravi Krishnan, Jonathan May
Note: Every effort was made to include all ISI-affiliated papers at NAACL25. If your paper was inadvertently left out, please let us know at [email protected] so the list can be updated.
Published on April 29th, 2025
Last updated on May 6th, 2025