Seminars and Events

Artificial Intelligence Seminar

Experiments in Scaling Reinforcement Learning with Verifiable Rewards

Event Details

Speaker: Nathan Lambert, Allen Institute

Virtual event

Join Zoom Meeting

https://usc.zoom.us/j/94409584905?pwd=Sm5LVkd0bndUdEluM3piK0NWTUQrUT09

Meeting ID: 944 0958 4905
Passcode: 822247

Abstract

With the release of DeepSeek’s R1 reasoning model, interest in reinforcement learning may be at an all-time high. Academics are pouring energy into the space, trying to replicate DeepSeek’s results and to establish the trade-offs and capabilities of this new era of reinforcement learning on language models. This talk discusses these new results with language models trained with Reinforcement Learning with Verifiable Rewards (RLVR), our efforts to scale this approach for Ai2’s OLMo and Tülu language models, earlier hints we may have missed suggesting that RL is more effective than it is given credit for, and some history from my background in model-based RL and robotics. The goal of the talk is to combine recent historical context on language modeling with cutting-edge RL research to forecast how the rapidly expanding language model industry may change in the near future.

Speaker Bio

Nathan Lambert is a Senior Research Scientist and post-training lead at the Allen Institute for AI, focusing on building open language models. He also founded and operates Interconnects.ai to increase transparency and understanding of current AI models and systems.

Previously, he helped build an RLHF research team at Hugging Face. He received his PhD from the University of California, Berkeley, where he worked at the intersection of machine learning and robotics. He was advised by Professor Kristofer Pister in the Berkeley Autonomous Microsystems Lab and by Roberto Calandra at Meta AI Research. He was lucky to intern at Facebook AI and DeepMind during his PhD. Nathan was awarded the UC Berkeley EECS Demetri Angelakos Memorial Achievement Award for Altruism for his efforts to improve community norms.

If the speaker approves recording, this seminar will be posted on the USC/ISI YouTube page within 1-2 business days: https://www.youtube.com/user/USCISI.
Subscribe here to learn more about upcoming seminars: https://www.isi.edu/events/
Host: Eric Boxer
POC: Justina Gilleland