Seminars and Events

Artificial Intelligence Seminar

Does the Data Induce Capacity Control in Deep Learning?

Event Details

Accepted statistical wisdom suggests that larger the model class, the more likely it is to overfit the training data. And yet, deep networks generalize extremely well. The larger the deep network, the better its accuracy on new data. This talk seeks to shed light upon this apparent paradox. We will argue that deep networks are successful because of a characteristic structure in the space of learning tasks. The input correlation matrix for typical tasks has a peculiar (“sloppy”) eigenspectrum where, in addition to a few large eigenvalues (salient features), there are a large number of small eigenvalues that are distributed uniformly over exponentially large ranges. This structure in the input data is strongly mirrored in the representation learned by the network. A number of quantities such as the Hessian, the Fisher Information Matrix, as well as others activation correlations and Jacobians, are also sloppy. Even if the model class for deep networks is very large, there is an exponentially small subset of models (in the number of data) that fit such sloppy tasks. This talk will demonstrate the first analytical non-vacuous generalization bound for deep networks that does not use compression. We will also discuss an application of these concepts that develops new algorithms for semi-supervised learning.References1. Does the data induce capacity control in deep learning?. Rubing Yang, Jialin Mao, and Pratik Chaudhari. [arXiv preprint, 2021] https://arxiv.org/abs/2110.141632. Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks. Pratik Chaudhari and Stefano Soatto [ICLR ’18] https://arxiv.org/abs/1710.110293. Deep Reference Priors: What is the best way to pretrain a model? Yansong Gao, Rahul Ramesh, Pratik Chaudhari. https://arxiv.org/abs/2202.00187

Speaker Bio

Pratik Chaudhari is an Assistant Professor in Electrical and Systems Engineering and Computer and Information Science at the University of Pennsylvania. He is a member of the GRASP Laboratory. From 2018—19, he was a Senior Applied Scientist at Amazon Web Services and a Postdoctoral Scholar in Computing and Mathematical Sciences at Caltech. Pratik received his PhD (2018) in Computer Science from UCLA, his Master's (2012) and Engineer's (2014) degrees in Aeronautics and Astronautics from MIT. He was a part of NuTonomy Inc. (now Hyundai- Aptiv Motional) from 2014—16. He is the recipient of the NSF CAREER award in 2022.

YOU ONLY NEED TO REGISTER ONCE TO ATTEND THE ENTIRE SERIES – We will send you email announcements with details of the upcoming speakers.
Register in advance for this webinar: https://usc.zoom.us/webinar/register/WN__0VhakI6Q6i3JsasdmNWcA. After registering, you will receive an email confirmation containing information about joining the Zoom webinar.

The recording for this Seminar talk will be posted on our USC/ISI YouTube page within 1-2 business days: https://www.youtube.com/user/USCISI.

Host: Muhao Chen POC: Alma Nava