Language as a Scaffold for Visual Recognition

Friday, April 20, 2018, 3:00 pm - 4:00 pm PDT
Conf. Rms #1135 and #1137
This event is open to the public.
NL Seminar
Mark Yatskar (AI2)

Abstract: In this talk we propose to use natural language as a guide for what people can perceive about the world from images, and for what machines should ultimately aim to see as well. We discuss two recent structured prediction efforts in this vein: scene graph parsing in Visual Genome, a framework derived from captions, and visual semantic role labeling in imSitu, a formalism built on FrameNet and WordNet. In scene graph parsing, we examine the problem of modeling higher-order repeating structure (motifs) and present new state-of-the-art baselines and methods. We then look at the problem of semantic sparsity in visual semantic role labeling: individually infrequent combinations of output semantics are collectively frequent. We present new compositional and data-augmentation methods for dealing with this challenge, significantly improving on prior work.

Bio: Mark Yatskar is a postdoc at the Allen Institute for Artificial Intelligence and a recipient of its Young Investigator Award. His primary research lies at the intersection of language and vision, natural language generation, and ethical computing. He received his Ph.D. from the University of Washington, advised by Luke Zettlemoyer and Ali Farhadi. In 2016 he received an EMNLP Best Paper Award, and his work has been featured in Wired and the New York Times.