Seminars and Events

Artificial Intelligence Seminar

Do Blind AI Navigation Agents Build Maps?

Event Details

The embodiment hypothesis is the idea that “intelligence emerges in the interaction of an agent with an environment and as a result of sensorimotor activity”. Imagine walking up to a home robot and asking, “Hey robot – can you go check if my laptop is on my desk? And if so, bring it to me.” Or asking an egocentric AI assistant (operating on your smart glasses), “Hey – where did I last see my keys?” To be successful, such an embodied agent would need a range of skills – visual perception (to recognize and map scenes and objects), language understanding (to translate questions and instructions into actions), and action (to move and find things in a changing environment). I will first give an overview of work from my group at Georgia Tech and our collaborators at FAIR, building up to this grand goal of embodied AI.

Next, I will dive into a recent project in which we asked whether machines – specifically, navigation agents – build cognitive maps. Concretely, we train ‘blind’ AI agents – with sensing limited to egomotion alone – to perform PointGoal navigation (‘go to delta-x, delta-y relative to start’) via reinforcement learning. We find that blind AI agents are surprisingly effective navigators in unseen environments (~95% success). Further, we find that (1) these blind agents utilize memory over long horizons (remembering ~1,000 steps of past experience in an episode); (2) this memory enables them to take shortcuts, i.e., efficiently travel through previously unexplored parts of the environment; (3) maps emerge in this memory, i.e., a detailed occupancy grid of the environment can be decoded from the agent’s memory; and (4) the emergent maps are selective and task-dependent – the agent forgets unnecessary excursions and remembers only the end points of such detours. Overall, our experiments and analysis show that blind AI agents take shortcuts and build cognitive maps purely from learning to navigate, suggesting that cognitive maps may be a natural solution to the problem of navigation, and shedding light on the internal workings of AI navigation agents.
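To make the setup above concrete, the following is a minimal sketch (in PyTorch) of what a ‘blind’ PointGoal agent with recurrent memory, and a separate probe that decodes an occupancy grid from that memory, might look like. This is not the authors’ implementation: the class names (BlindPointGoalAgent, OccupancyProbe), the 5-dimensional egomotion-plus-goal observation, and all sizes are illustrative assumptions chosen only to mirror the ideas in the abstract.

# Minimal sketch, assuming PyTorch. The "blind" agent observes only its
# egomotion and the PointGoal vector; a GRU provides the long-horizon memory,
# and a linear probe is trained separately (post hoc) to decode an occupancy
# grid from the hidden state.

import torch
import torch.nn as nn

class BlindPointGoalAgent(nn.Module):
    """GRU policy whose only observations are egomotion and the goal vector."""

    def __init__(self, hidden_size=512, num_actions=4):
        super().__init__()
        # Observation: (dx, dy, dtheta) egomotion + (rho, phi) polar goal = 5 dims
        self.encoder = nn.Linear(5, hidden_size)
        self.memory = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.policy_head = nn.Linear(hidden_size, num_actions)  # actor
        self.value_head = nn.Linear(hidden_size, 1)              # critic

    def forward(self, obs, hidden=None):
        # obs: (batch, time, 5); hidden: (1, batch, hidden_size) or None
        x = torch.relu(self.encoder(obs))
        out, hidden = self.memory(x, hidden)
        return self.policy_head(out), self.value_head(out), hidden


class OccupancyProbe(nn.Module):
    """Linear decoder that reads an occupancy grid out of the agent's memory."""

    def __init__(self, hidden_size=512, grid_cells=32 * 32):
        super().__init__()
        self.decode = nn.Linear(hidden_size, grid_cells)

    def forward(self, hidden_state):
        # Returns per-cell logits for free vs. occupied space.
        return self.decode(hidden_state)


if __name__ == "__main__":
    agent = BlindPointGoalAgent()
    probe = OccupancyProbe()
    obs = torch.zeros(1, 1000, 5)            # ~1,000 steps of egomotion + goal
    logits, values, h = agent(obs)
    occupancy_logits = probe(h.squeeze(0))   # decode a 32x32 grid from memory
    print(logits.shape, occupancy_logits.shape)

In the actual work the policy is trained with reinforcement learning on the navigation reward alone; the probe never influences the agent, which is what makes any decodable map an emergent property of learning to navigate rather than something the agent was trained to produce.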

Speaker Bio

Dhruv Batra is an Associate Professor in the School of Interactive Computing at Georgia Tech and a Research Scientist at Facebook AI Research (FAIR). The long-term goal of his research is to develop agents that 'see' (or more generally perceive their environment through vision, audition, or other senses), 'talk' (i.e., hold a natural language dialog grounded in their environment), 'act' (e.g., navigate their environment and interact with it to accomplish goals), and 'reason' (i.e., consider the long-term consequences of their actions). He is a recipient of the Presidential Early Career Award for Scientists and Engineers (PECASE) 2019, a number of young researcher awards (ECASE-Army award 2018, ONR YIP award 2017, NSF CAREER award 2014, ARO YIP award 2014), and several best paper awards and nominations (ICCV 2019, EMNLP 2017) and teaching commendations. His research is supported by NSF, ARO, ARL, ONR, DARPA, Amazon, Google, Microsoft, and NVIDIA. Research from his lab has been extensively covered in the media (with varying levels of accuracy) by CNN, BBC, CNBC, Bloomberg Business, The Boston Globe, MIT Technology Review, Newsweek, The Verge, New Scientist, and NPR.