Seminars and Events
Benchmarking MLLMs for Embodied Decision Making and Cognitive World Modeling
Event Details
Location: ISI #1135
Speaker: Yu (Bryan) Zhou, UCLA
Abstract: While MLLMs have been widely used for spatial cognition and decision making in embodied agent environments, we still lack a systematic understanding of their performance because they are usually applied in different domains and for different purposes, and are built on different inputs and outputs. This presentation will introduce two benchmarks that aim to shed light on this problem through systematic definitions and interfaces that support the formalized evaluation of various abilities and input-output specifications of MLLM-based embodied agent modules. Furthermore, we will detail scalable data collection methods and verifiable evaluation methods based on simulation environments.
Zoom Meeting Link
Meeting ID: 919 9598 8425
Passcode: 2025
Speaker Bio
Yu Zhou is a second-year CS Ph.D. student at UCLA, co-advised by Prof. Nanyun Peng and Prof. Kai-Wei Chang. His research focuses on Multimodality (Vision + Language) and Embodied Learning. He is also a part-time research scientist intern in the Segment Anything Team at Meta Superintelligence Labs. Prior to his Ph.D., he was a visiting researcher at Stanford SVL, working with Prof. Jiajun Wu and Prof. Fei-Fei Li. He has organized the Workshop on Foundation Models + Embodied Agents at CVPR 2025 and the Embodied Agent Interface Challenge at NeurIPS 2025. His work received the Best Paper Award at the SoCal NLP Symposium 2024.
Subscribe here to learn more about upcoming seminars: https://www.isi.edu/events/
Hosts: Jonathan May and Dhananjay Ashok
POC: Maura Covaci