Artificial Intelligence

Visual Question Answering: the Good, the Bad, and the Ugly

Friday, July 20, 2018, 3:00pm - 4:00pm PDTiCal
Conf. Rm #1135
This event is open to the public.
NL Seminar
Wei-Lun "Harry" Chao (USC --> OSU)

Abstract: Visual question answering (Visual QA) requires comprehending and reasoning with both visual and language information, a characteristic ability that AI should strive to achieve. Merely in the past three years, over a dozen datasets have been released, together with many learning-based models that have been narrowing the gap between the humans’ performance and the machines’. On one popular dataset VQA, the state-of-the-art model achieves 71.4% accuracy, just 17% shy of that by humans.

While seemingly remarkable, it needs a deeper investigation on what knowledge the machine actually learns—does it understand the multi-modal information? Or it relies on and over-fits to the incidental dataset statistics. Moreover, current experimental setups mainly focus on training and testing within the same dataset. It is unclear how the learned model can be applied to the real environment where both the visual and language data might have mismatch.

In this talk, I will present our recent studies to answer these questions. We show that the dataset design has a significant impact on what a model learns. Specifically, the resulting model can ignore the visual information, the question, or both while still doing well on the task. We thus propose automatic procedures to remedy such design deficiencies. We then show that the mismatch in language hinders transferring a learned model across datasets. To this end, we develop a domain adaptation algorithm for Visual QA to facilitate knowledge transfer. Finally, I will present a probabilistic framework of Visual QA algorithms to effectively leverage the answer semantics, drastically increasing the transferability. I will conclude the talk with future directions to advance Visual QA.

Bio: Wei-Lun (Harry) Chao is a Computer Science PhD candidate at University of Southern California, working with Fei Sha. His research interests are in machine learning and its applications to computer vision, artificial intelligence, and health care. His recent work has focused on transfer learning toward vision and language understanding in the wild. His earlier research includes work on probabilistic inference, structured prediction for video summarization, and face understanding. He will be joining The Ohio State University as an assistant professor in 2019 Fall, following a one-year postdoc at Cornell University.

« Return to Events