Advances in Vision-and-Language Research Beyond Performance

Vision-and-language models (e.g., CLIP) have achieved remarkable performance on a wide array of tasks. Nevertheless, with great power comes great responsibility. In this talk, I will discuss two emerging research directions for vision-and-language models beyond performance: fairness and privacy. First, we identify a unique gender bias problem in image search and propose two simple yet effective solutions to debias the models during both training and inference. Then, we view language as a recipient of fairness to study whether multimodal models treat languages equally, and investigate the multilingual accuracy disparity across other fairness dimensions including gender, race, and age. Finally, I will briefly talk about our recent efforts on building a privacy-preserving embodied agent that protects people’s privacy while serving them effectively.

Xin (Eric) Wang is an Assistant Professor of Computer Science and Engineering at UC Santa Cruz. His research interests include Natural Language Processing, Computer Vision, and Machine Learning, with an emphasis on building embodied AI agents that can communicate with humans using natural language to perform real-world multimodal tasks. He obtained his Ph.D. from UC Santa Barbara and his Bachelor's degree from Zhejiang University. He also interned at Google AI, Facebook AI Research, Microsoft Research, and Adobe Research. Xin has served as an Area Chair for ACL, NAACL, EMNLP, etc., and as a Senior Program Committee (SPC) member for AAAI and IJCAI. He has organized multiple workshops and tutorials at CVPR, ICCV, ACL, NAACL, AACL, etc. He received the CVPR Best Student Paper Award in 2019.

Host: Muhao Chen, POC: Amy Feng

