November 30, 2023
Machine Learning with Human Fault-Tolerance
Abstract: In machine learning, we have long recognized the need to build systems that can tolerate hardware faults and software faults. In this talk, I propose the need for a third kind of fault-tolerance: human fault-tolerance. The methods used to develop, evaluate, and deploy machine learning systems today assume that the humans who build and use them are rational actors making highly informed decisions based on consistent preferences—this is far from true in practice. We can address the failures of these assumptions by drawing from economics, a field that has long been aware of how unfounded beliefs about human behavior can go wrong. Specifically, I will cover how we can develop theoretically grounded tools that discover human mistakes, design algorithms and methods for robustly eliciting and incorporating human feedback, and implement end-to-end platforms that make ML and NLP more transparent and reproducible. This line of work has led to the creation of datasets, models, and platforms that have been widely adopted by industry giants like Amazon, Google, and Meta.
Bio: Kawin Ethayarajh is a 5th year PhD student at Stanford University, where he works on bringing human fault-tolerance to machine learning. His research draws from economics to make machine learning and NLP more robust to the irrational, inconsistent, and uninformed human decisions made at every step. His work has been supported by a Facebook Fellowship and an NSERC PGS-D, and he has received an Outstanding Paper Award at ICML 2022. He co-created the Stanford Human Preferences dataset and the Dynaboard platform (behind Dynabench).
Talk Details: https://www.isi.edu/events/4157/machine-learning-with-human-fault-tolerance/
November 16, 2023
Cultural Knowledge and Cultural Biases: Analyzing the Multilingual Performance of Text-to-Image Models
Abstract: Despite being ostensibly trained solely on English data, most text-to-image (T2I) models carry some degree of multilingual capability, with significant variation in performance between models and languages. To guide the future development of T2I systems, it is desirable to both measure and qualitatively analyze these language-specific performance variations, in order to mitigate cross-lingual disparities in performance as well as language-specific demographic biases.
To quantify multilingual performance we introduce the Conceptual Coverage Across Languages (CoCo-CroLa) benchmark, which allows us to measure the “possession” of a set of tangible noun “concepts” across English, Spanish, German, Chinese, Japanese, Hebrew, and Indonesian. This technique allows us to estimate how well-suited a model is to a target language as well as identify model-specific weaknesses, spurious correlations, and biases without any a priori assumptions of their form. We demonstrate how it can be used to rank T2I models in terms of multilinguality, and that despite its simplicity our method captures the necessary conditions for the impressive “creative” generative abilities users expect from T2I models.
We then build on this benchmarking work with a detailed qualitative analysis of “failure” and “success” cases for specific concepts. Even in the “possession” case, concepts are expressed differently across languages. These qualitative cross-lingual variations in model behaviors form a continuous spectrum of ethical acceptability, running the gamut from culturally variable popular dog breeds to racially-biased sexualization in depictions of women. While the edge cases are easy to laud or condemn, drawing the line of acceptability in between them is an open ethical question as well as an open technical challenge. Unfortunately, interventions that successfully remove the most deleterious biases also erase cultural distinctiveness, motivating a need for more targeted interventions in future work.
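As a rough illustration of the benchmark's aggregation logic, here is a minimal sketch: the function names, threshold, and score inputs are all hypothetical, and the actual benchmark scores images generated per concept and per language with an image-text similarity model rather than taking precomputed numbers.

```python
# Hypothetical sketch of a CoCo-CroLa-style aggregation. In practice,
# each score would come from comparing a generated image against the
# concept via an image-text similarity model; here scores are given.

def concept_possession(scores, threshold=0.5):
    """Fraction of generated images whose similarity score meets an
    assumed threshold: a stand-in for 'possessing' the concept."""
    return sum(s >= threshold for s in scores) / len(scores)

def rank_models(model_scores):
    """Rank models by mean possession across languages and concepts.
    model_scores maps a model name to a list of possession scores."""
    means = {m: sum(v) / len(v) for m, v in model_scores.items()}
    return sorted(means, key=means.get, reverse=True)
```

Under this assumed aggregation, a model's multilinguality ranking falls out of averaging possession over the concept-language grid.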
Bio: Michael Saxon is a CS Ph.D. candidate in the NLP Group at the University of California, Santa Barbara. His research is driven by a desire to improve our objective understanding of the semantic capabilities of large generative AI systems, in particular generative image and language models. Toward this goal he focuses on developing novel data resources and metrics to model semantic phenomena in generative models, as well as techniques for model-driven dataset improvement that remove biases and spurious correlations. He has previously interned at Meta AI and Amazon working on NLP and speech, and is supported by the NSF Graduate Research Fellowship Program.
Talk Details: https://www.isi.edu/events/4137/nl-seminar-cultural-knowledge-and-cultural-biases-analyzing-the-multilingual-performance-of-text-to-image-models/
November 9, 2023
Manipulating Large Language Model Predictions Through Data
Abstract: Large language models use large amounts of unmoderated data at each stage of the training and deployment pipeline. In this talk, I will show how these lax requirements enable adversaries to manipulate both training and test data, allowing a myriad of possible attacks. First, during training time, I will show that adversaries can modify instruction-tuning datasets to systematically manipulate predictions across a range of tasks or induce degenerate outputs across hundreds of arbitrary tasks, using as few as 100 poison examples. At inference time, additional data is often used in retrieval- or tool-augmented models. Naturally, these models will face information from a wide variety of sources that have varying degrees of quality. Humans are faced with this same range of sources but can make judgements of trustworthiness based on factors like the style of argumentation or the recency of information. We show not only that model predictions differ significantly from human credibility judgements, but also that gaps in this judgement create opportunities for adversaries to manipulate answers to user queries.
Bio: Alexander Wan is a third-year undergraduate at UC Berkeley majoring in Computer Science, Statistics, and Mathematics. He works closely with folks at the Berkeley NLP Group and the MSU Heterogeneous Learning and Reasoning lab, with a focus on improving the robustness and interpretability of large language models. He's also more broadly interested in the intersection of machine learning and cognitive science: using current ML models to better understand human cognition and building more robust models through cognitively inspired architectures and training.
Talk Details: https://www.isi.edu/events/4160/nl-seminar-manipulating-large-language-model-predictions-through-data/
November 2, 2023
What We Learned from 570K ChatGPT Interaction Logs In The Wild
Abstract: Chatbots such as GPT-4 and ChatGPT are currently serving millions of users. Despite their widespread use, there remains a lack of public datasets that showcase how these tools are used by users in practice. In this talk, I will introduce (InThe)WildChat, a corpus of 570K user-ChatGPT conversations, which comprises over 1.5 million interaction turns. I will show that, compared to other popular user-chatbot interaction datasets, WildChat offers the most diverse user prompts and presents the richest variety of potentially toxic use-cases. Finally, I will demonstrate the potential utility of this dataset in fine-tuning state-of-the-art instruction following models.
Bio: Wenting Zhao is a Ph.D. candidate in Computer Science at Cornell University. Her research focuses on improving the reasoning capabilities of large language models by exploiting explicit problem structures. She organized an ACL tutorial on complex reasoning over natural language and the second Workshop on Natural Language Reasoning and Structured Explanations. She has done internships at IBM Research, Amazon Alexa, and AI2 Mosaic.
Talk Details: https://www.isi.edu/events/4133/nl-seminar-what-we-learned-from-570k-chatgpt-interaction-logs-in-the-wild/
October 26, 2023
Design Criteria for Human-Centered Natural Language Generation
Abstract: Large language models have made substantial steps towards generating human-like language. However, this endeavor to mimic human language comes with potential drawbacks. By mimicking and appropriating human language, these systems produce language that inherits the harms and cognitive biases of humans while failing to ensure features like clarity and transparency. My research asks: how can generated language avoid the harms of natural language while supporting safe human-AI collaboration?
Starting with the researchers, I study the quality criteria of natural language generation, using mixed-methods approaches to reveal design decisions made consciously and subconsciously by natural language generation practitioners. Looking through datasets of natural language, I identify the origins of language appropriation and illustrate the safety risks of mimicry via the linguistic miscalibration of language models. Lastly, I study how humans perceive the appropriation of social behaviors such as politeness and refusal, and the risks they may pose in chat settings. What I find throughout my research is that language models inappropriately appropriate the style, linguistic cues, and prosocial language of the human text they are trained on. My future work seeks to develop design criteria for generated language, centered on user needs, and training methods to achieve them.
Bio: Kaitlyn Zhou is currently pursuing her PhD in computer science at Stanford University, advised by Dan Jurafsky. Her research focuses on investigating the unintended consequences that stem from the appropriation of natural language by language models. Her work delves into various aspects, including the fairness implications associated with the evaluation of natural language generation, the linguistic miscalibration displayed by language models, and the misplaced overconfidence of publicly deployed chatbots. Kaitlyn has previously spent summers at Microsoft Research and the Allen Institute for Artificial Intelligence. She is funded by the Stanford Graduate Fellowship and her visualization techniques have gained recognition in prominent publications like The New York Times and the Wall Street Journal. In 2018, Kaitlyn was appointed by Washington State Governor Jay Inslee to the University of Washington Board of Regents.
Talk Details: https://www.isi.edu/events/4135/design-criteria-for-human-centered-natural-language-generation/
October 19, 2023
Interactive AI Systems Specialized in Social Influence
Abstract: AI research has so far focused on modeling common human skills, such as building systems to see, read, or talk. As these systems gradually reach human-level performance on standard benchmarks, it is increasingly important to develop next-generation interactive AI systems with more advanced human skills that can function in realistic and critical applications such as providing personalized emotional support. In this talk, I will cover (1) how to build such expert-like AI systems specialized in social influence that can persuade, negotiate, and cooperate with other humans during conversations; (2) how humans perceive such specialized AI systems, a study that validates the necessity of Autobot Law and proposes guidance for regulating them; and (3) our proposed privacy notion, Selective Differential Privacy, and an algorithm for training privacy-preserving models with high utility, since these systems become more prone to leaking users' private information as they grow more powerful. Finally, I will conclude with my long-term vision of building a natural interface between human intelligence and machine intelligence via dialogue, using a multi-angle approach that combines Artificial Intelligence, Human-Computer Interaction, and the social sciences to develop expert AI systems for everyone.
Bio: Weiyan Shi is a postdoc at Stanford NLP and an incoming assistant professor at Northeastern University starting in 2024. Her research interests are in Natural Language Processing (NLP), especially in social influence dialogue systems such as persuasion, negotiation, and recommendation. She has also worked on privacy-preserving NLP applications. She is recognized as a Rising Star in Machine Learning by the University of Maryland. Her work on personalized persuasive dialogue systems was nominated for ACL 2019 best paper. She was also a core team member behind a Science publication on the first negotiation AI agent, Cicero, that achieves a human level in the game of Diplomacy. This work has been featured in The New York Times, The Washington Post, MIT Technology Review, Forbes, and other major media outlets.
Talk Details: https://www.isi.edu/events/4076/nl-seminar-interactive-ai-systems-specialized-in-social-influence/
October 5, 2023
On formulating and evaluating language agents
Abstract: Language agents are AI systems that use large language models (LLMs) to interact with the world. While various methods have been developed, it is often hard to systematically understand or evaluate them. In this talk, we present Cognitive Architectures for Language Agents (CoALA), a theoretical framework grounded in the classical research on cognitive architectures that makes sense of existing agents and sheds light on future directions. We also present three benchmarks (WebShop, InterCode, Collie) for developing and evaluating language agents using web, code, and grammar tasks, respectively. Notably, all three are scalable and practical, with simple and faithful evaluation metrics that do not rely on human preference labeling or LLM scoring.
Bio: Shunyu Yao is a final-year Ph.D. student advised by Karthik Narasimhan in the Princeton NLP Group. His research focuses on language agents and is supported by the Harold W. Dodds Fellowship from Princeton.
Talk Details: https://www.isi.edu/events/4171/on-formulating-and-evaluating-language-agents/
August 31, 2023
Phishing Emails, Improvised Explosive Devices and Quantum: A Natural Language Understanding Perspective
April 20, 2023
Modular Language Models
April 13, 2023
Drinking From The Firehose of Science
March 30, 2023
Getting AI to Do Things I Can't
March 23, 2023
Designing and Evaluating Language Models for Human Interaction
March 9, 2023
Enhancing Machine Translation with Large Language Models via Optimizing In-Context Examples and Dictionary-Based Prompting
February 2, 2023
Scaling unlocks emergent abilities in language models
January 12, 2023
Bias and Power in NLP