Announcement revised 11/22/2021 to include the Zoom link.

---------------------------------------------------------------------------------


Yi Zhang will present his FPO "Generalization of Deep Neural Networks in Supervised Learning, Generative Modeling and Adaptive Data Analysis" on Friday, December 3, 2021 at 3PM via Zoom.


Zoom Link: https://princeton.zoom.us/j/91333853352


The members of his committee are as follows:

Examiners: Sanjeev Arora (Adviser), Jason Lee, Karthik Narasimhan

Readers: Chi Jin, Elad Hazan


A copy of his thesis is available upon request; please email gradinfo@cs.princeton.edu to request one.


Everyone is invited to attend the talk.


Abstract follows below:


Why can neural nets with a vast number of parameters, trained on small datasets, still accurately classify unseen data? This "generalization mystery" has become a central question in deep learning. Beyond the traditional supervised learning setting, the success of deep learning extends to many other regimes where our understanding of generalization is even more limited. In this thesis, we begin with supervised learning and ultimately aim to shed light on the generalization performance of deep neural nets in generative modeling and adaptive data analysis by presenting novel theoretical frameworks and practical tools.


First, we prove a generalization bound for supervised deep neural networks, building on the empirical observation that the inference computations of deep nets trained on real-life datasets are highly resistant to noise. Following the information-theoretic principle that noise stability indicates redundancy and compressibility, we propose a new succinct compression of the trained net, which leads to drastically better generalization estimates.


Next, we establish a finite-capacity analysis of Generative Adversarial Networks (GANs). Our study gives insights into the limitations of GANs' ability to learn distributions, and we provide empirical evidence that well-known GAN approaches do result in degenerate solutions. Despite these negative results, we demonstrate a surprising positive use case of GANs: the test performance of deep neural net classifiers can be predicted accurately using synthetic data generated from a GAN model that was trained on the same training set.


Finally, we probe the question: have deep learning models overfitted to standard datasets such as ImageNet after years of data reuse? We provide a simple estimate, Rip Van Winkle's Razor, for measuring overfitting due to data overuse. It relies upon a new notion of the amount of information that would have to be provided to an expert referee who is familiar with the field and the relevant mathematics, but who has just been woken up after falling asleep at the moment the test set was created (as in the fairy tale). We show this estimate is non-vacuous for many ImageNet models.