“Attention Is All You Need”: USC Alumni Paved Path for ChatGPT

by Caitlin Dawson

Published on March 9th, 2023

Last updated on May 19th, 2023

ChatGPT has taken the world by storm, but the seeds of the groundbreaking technology were sown at the USC Viterbi School of Engineering. The seminal paper “Attention Is All You Need,” which laid the foundation for ChatGPT and other generative AI systems, was co-authored by Ashish Vaswani, a PhD computer science graduate (’14), and Niki Parmar, a master’s in computer science graduate (’15).

The landmark paper was presented at the 2017 Conference on Neural Information Processing Systems (NeurIPS), one of the top conferences in AI and machine learning. In the paper, the researchers introduced the transformer architecture, a powerful type of neural network that has become widely used for natural language processing tasks, from text classification to language modeling.

The paper, published at NeurIPS in 2017, has been cited more than 60,000 times.

Transformer models apply a mathematical technique called “attention” that allows the model to selectively focus on different words and phrases of the input text and to generate more coherent, contextually relevant responses. By understanding the relationships between words in a text, the model can better capture the underlying meaning and context of the input. ChatGPT uses a variant of the transformer called the GPT (Generative Pre-trained Transformer).
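The mechanism can be sketched in a few lines of code. The following is a minimal illustration of the scaled dot-product attention described in the paper, written in plain NumPy; the sequence length, dimensions, and random inputs are toy values chosen for the example, not anything from the paper or from ChatGPT.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal scaled dot-product attention, per Vaswani et al. (2017).

    Q, K, V: (seq_len, d_k) arrays holding one query, key, and value
    vector per word of the input text.
    """
    d_k = Q.shape[-1]
    # Score every word against every other word in a single matrix multiply.
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len)
    # Softmax turns scores into weights that sum to 1, so the model can
    # focus selectively on the most relevant words.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted mix of the value vectors.
    return weights @ V

# Toy self-attention over 4 words with 8-dimensional representations.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)   # (4, 8)
```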

The transformer architecture is considered a paradigm shift in artificial intelligence and natural language processing, making Recurrent Neural Networks (RNNs), the once-dominant architecture in language processing models, largely obsolete. It is considered a crucial element of ChatGPT’s success, alongside other innovations in deep learning and open-source distributed training.

“The important components in this paper were doing parallel computation across all the words in the sentence and the ability to learn and capture the relationships between any two words in the sentence,” said Parmar, “not just neighboring words as in long short-term memory networks and convolutional neural network-based models.”
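A rough sketch of the contrast Parmar describes, using toy data and illustrative weight matrices rather than any real model: a recurrent network is forced to walk through the sentence one word at a time, while attention produces a score for every pair of words, near or far, in one parallel matrix operation.

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, d = 6, 8
x = rng.normal(size=(seq_len, d))        # one toy vector per word

# Recurrent-style processing: an inherently sequential loop, since each
# step depends on the hidden state produced by the previous step.
W_h = rng.normal(size=(d, d)) * 0.1
W_x = rng.normal(size=(d, d)) * 0.1
h = np.zeros(d)
for t in range(seq_len):                 # cannot be parallelized over t
    h = np.tanh(h @ W_h + x[t] @ W_x)

# Attention-style processing: one matrix multiply scores every pair of
# words at once, whether they are adjacent or far apart.
pairwise_scores = x @ x.T / np.sqrt(d)   # (seq_len, seq_len)
print(pairwise_scores.shape)
```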

A universal model

Vaswani refers to ChatGPT as “a clear landmark in the arc of AI.”

“There is going to be a time before ChatGPT and a time after ChatGPT,” said Vaswani, the paper’s first author. “We’re seeing the beginnings of profound tools for thought that will eventually make us much more capable in the digital world.”

But such a landmark wasn’t necessarily his goal when he started working on the transformer model in 2016.

“For me, personally, I was seeking a universal model,” he said. “A single model that would consolidate all modalities and exchange information between them, just like the human brain.”

As of publication, “Attention Is All You Need” has received more than 60,000 citations, according to Google Scholar. Its citation count continues to climb as researchers build on its insights and apply the transformer architecture to new problems, from image and music generation to predicting protein properties for medicine.

It also set the stage for a third USC Viterbi engineer to carry the torch and catapult this foundational research into the product we know today as ChatGPT. In September 2022, Barret Zoph (BS Computer Science ’16) joined OpenAI as tech lead, playing a fundamental role in developing the language model.

“It’s especially interesting to think back to when I first got started on research in 2016. The amount of progress from that time to where it is now is almost unfathomable,” said Zoph.

“I think exponential growth is quite a hard thing to grasp. But I think [models like ChatGPT] are going to continue to get better, to help people and to improve their day-to-day lives.”

As an undergraduate student at USC, Zoph worked with professors Kevin Knight and David Kempe on research and “studying deeper problems in computer science,” he said.

“I think getting involved in research as an undergraduate is a really great thing that’s offered at USC, and finding the right professor can make a world of difference,” said Zoph, who also worked at Google Brain after graduation before joining OpenAI.

“Understanding deep learning systems, how to build the infrastructure, and train the models has been a relevant skill for me throughout my entire career. But in general, I learned a lot about having a good research process, trying to tackle hard problems and thinking deeply about the things I’m working on.”

In keeping with this legacy, on March 9, USC President Carol Folt announced that the university is launching the Center for Generative AI and Society with $10 million for research that will draw together leading experts from Viterbi and four other schools: USC Annenberg School for Communication and Journalism, USC School of Cinematic Arts, USC Iovine and Young Academy and USC Rossier School of Education.

A USC connection

Born in India and raised there and in the Middle East, Vaswani interned at both IBM and Google before joining USC as a computer science PhD candidate in 2004, working under the supervision of Liang Huang, now a professor at Oregon State University, and David Chiang, now at the University of Notre Dame.

Vaswani visited Papua New Guinea in 2012 for a project on natural language processing to document endangered languages. Photo/David Chiang.

“Ashish was my first PhD student and one of the very first people to see the potential for deep learning in natural language processing back in 2011,” said Chiang.

Huang refers to Vaswani as a “visionary” during his time at USC and recalls him building a GPU workstation in his office from scratch when few people understood the importance of GPUs in AI or natural language processing (NLP).

“To me, Ashish’s transformer paper is kind of a ‘reward’ for his early belief in deep learning,” said Huang. “At that time, the nascent paradigm of deep learning, being rather immature, was by no means the dominant paradigm in AI or NLP. However, Ashish showed a unique and keen interest. He was enthusiastic, unique, thinking differently, and ahead of his time.”

During his time as a computer science student, Vaswani worked in the USC Information Sciences Institute’s natural language processing group, led by former USC Computer Science Professor Kevin Knight (who also served as research advisor to Zoph).

With Knight, Vaswani worked on neural language models, early versions of what underlies ChatGPT. In a paper titled “Decoding with Large-Scale Neural Language Models Improves Translation,” Vaswani and his co-authors showed that neural language models improved the accuracy of automatic translation. He also co-authored a paper titled “Simple, Fast Noise-Contrastive Estimation for Large RNN Vocabularies,” which developed a technique for efficiently training neural language models.

Pursuing bold ideas

Vaswani said he thrived at USC, where he was free to explore the creative realms of his research and pursue his burgeoning interest in deep learning.

“It wasn’t just an intellectual environment – there was a certain culture that focused on pragmatic, real results, and communication was so important,” said Vaswani. “Everything I learned at USC shaped how I do my research and how I learn and absorb information. It was a vibrant, tremendous research group pursuing bold ideas, and that’s rare.”

After graduation, he joined Google Brain as a research scientist in 2016. A year later, he co-authored the pioneering paper with a team of researchers including his Google Brain colleague and fellow USC graduate Niki Parmar. Vaswani and Parmar had first met at USC when Vaswani gave a guest lecture on neural networks, and the pair became fast friends and research collaborators.

Niki Parmar (center) and Morteza Dehghani (right) pictured at the USC campus. Photo/Morteza Dehghani.

As a master’s student, Parmar joined the Computational Social Science Lab led by Morteza Dehghani, an associate professor of psychology and computer science.

“I was working on applying NLP techniques to better understand the behavioral dynamics between users on social media websites and how it related to moral values and homophily studies,” said Parmar.

Parmar joined Google right after graduation, where she researched state-of-the-art models for sentence similarity and question answering.

“Niki collaborated on several projects with me, and it was clear that she is simply an outstanding thinker,” said Dehghani, who first met Parmar in an introductory AI class he taught in 2013.

“I am immensely proud of her,” added Dehghani, who also co-authored a paper with Vaswani. “I knew that Ashish and Niki could do amazing things together, but I never imagined that they would change our entire field.”

For everyone who worked with Vaswani and Parmar, the paper’s longevity and legacy are a remarkable feat and a testament to their prescience.

“An amazing thing about the transformers paper is that it’s held strong for six years now,” said Knight. “Given the current rapid pace of developments in NLP and machine learning—where you see new state-of-the-art algorithms every few months—you’d expect transformers to be replaced by now. So, they tapped into something fundamental there.”

Huang agrees, adding: “Wherever I visit, I can find Ashish’s paper on people’s desks everywhere.”
