Seminars and Events
Experiments in Scaling Reinforcement Learning with Verifiable Rewards
Event Details
Speaker: Nathan Lambert, Allen Institute for AI
Virtual event
Join Zoom Meeting
https://usc.zoom.us/j/94409584905?pwd=Sm5LVkd0bndUdEluM3piK0NWTUQrUT09
Meeting ID: 944 0958 4905
Passcode: 822247
Abstract
With the release of DeepSeek’s R1 reasoning model, interest in reinforcement learning may be at an all-time high. Academics are pouring energy into the space, trying to replicate DeepSeek’s results and establish the clear trade-offs and capabilities of this new era of reinforcement learning on language models. This talk discusses new results with language models trained with Reinforcement Learning with Verifiable Rewards (RLVR), our efforts to scale these methods for Ai2’s OLMo and Tülu language models, hints we may have missed indicating that RL is more effective than people give it credit for, and some history from my background in model-based RL and robotics. The goal of the talk is to combine (recent) historical context on language modeling with cutting-edge RL research to forecast how the rapidly expanding language model industry may change in the near future.
Speaker Bio
Nathan Lambert is a Senior Research Scientist and post-training lead at the Allen Institute for AI, focusing on building open language models. He also founded and operates Interconnects.ai to increase transparency and understanding of current AI models and systems.
Previously, he helped build an RLHF research team at HuggingFace. He received his PhD from the University of California, Berkeley, working at the intersection of machine learning and robotics, advised by Professor Kristofer Pister in the Berkeley Autonomous Microsystems Lab and Roberto Calandra at Meta AI Research. He was lucky to intern at Facebook AI and DeepMind during his PhD. Nathan was awarded the UC Berkeley EECS Demetri Angelakos Memorial Achievement Award for Altruism for his efforts to improve community norms.