Smart Enough to Know Better? How AI Handles Legal Questions

ISI researchers tested large language models on sensitive legal scenarios and found surprising vulnerabilities.

by Julia Cohen

May 27, 2025

When researchers at USC Viterbi’s Information Sciences Institute (ISI) asked a chatbot whether it was legal to ship phosphorus to another country, the answer was clear: no, it would be against the law. But when they rephrased the question slightly, asking for instructions on how to do it, the chatbot sometimes provided detailed advice.

This surprising behavior was at the center of a new study that explored how large language models (LLMs) understand, and sometimes violate, legal frameworks.

The researchers focused on a particularly sensitive area: biological weapons law. “We were curious if the advice that it would give would violate biological weapons convention standards,” explained Fred Morstatter, a principal scientist at ISI and Research Assistant Professor of computer science at the USC Viterbi School of Engineering. Title 18 Section 175 of the U.S. Code prohibits the development, possession, and dissemination of biological weapons. Violations can have serious consequences, and understanding this law provided a useful test case for evaluating the risks of AI-generated content.

To structure the study, the team first tested whether the models correctly recognized illegal actions. Then they examined whether models would still generate unsafe outputs even after recognizing legal boundaries, and whether they could recognize intent behind those actions. In legal terms, this is known as mens rea. “Mens rea means ‘guilty mind’ in Latin,” Morstatter explained. “You have to know that you’re committing a crime in order to be convicted.” By testing whether AI could detect signs of intent, the researchers were probing its capacity for deeper legal reasoning, not just knowing what the rules are, but understanding when and why they’re being broken.

The researchers used knowledge graphs to map legal concepts and combined them with Retrieval-Augmented Generation (RAG) techniques to evaluate the models’ reasoning. They found that while the models often recognized legal restrictions when asked directly, under slightly different circumstances they could still produce prohibited or unsafe advice. “Our findings reveal significant limitations in LLMs’ reasoning and safety mechanisms,” the team wrote in their paper, Knowledge Graph Analysis of Legal Understanding and Violations in LLMs.

“It’s alarming how minor changes in prompt phrasing can lead large language models to provide step-by-step guidance on developing bioweapons. This underscores a critical need for stronger safeguards to prevent their exploitation for malicious purposes,” said Abha Jha, first author of the paper. Jha’s co-author Abel Salinas, both of whom are graduate students at the Thomas Lord Department of Computer Science, added, “As AI becomes more powerful and is being trusted for more and more jobs, it’s important to understand and investigate both the potential biases but also the safety risks.”

Although the study revealed vulnerabilities, the researchers were also optimistic about solutions. They recommend building stronger safeguards that go beyond simple factual knowledge, incorporating deeper legal and ethical reasoning. With these improvements, AI systems could be better aligned with human expectations for lawful and responsible behavior.

The work points toward an important next step: developing AI that not only knows the rules, but reliably applies them.

Published on

Last updated on

This article may feature some AI-assisted content for clarity, consistency, and to help explore complex scientific concepts with greater depth and creative range.