Nikunj Saunshi will present his General Exam on Wednesday, October 23, 2019 at 9:15am in CS 402. 

The members of his committee are Sanjeev Arora (adviser), Elad Hazan, and Karthik Narasimhan.  Everyone is invited to attend his talk, and those faculty wishing to remain for the oral exam following are welcome to do so.  His abstract and reading list follow below.

Title: A theoretical framework to analyze contrastive learning methods for representation learning

Abstract: Contrastive learning, reminiscent of word2vec, is a family of methods that use unlabeled data to learn data representations (e.g., sentence embeddings) that are useful for downstream classification tasks. Training leverages access to semantically similar pairs of data points and encourages high inner product between representations of similar pairs and low inner product between representations of random pairs. We provide a theoretical framework for understanding such methods by positing that similar pairs come from the same latent class. Using a within-class independent sampling assumption, we show that small unsupervised loss in contrastive learning guarantees good performance on linear classification tasks involving these latent classes. This novel connection between the unlabeled data distribution and downstream tasks also helps us show (labeled) sample complexity benefits of such representation learning methods. We then extend this framework by leveraging a graph-theoretic view of the unlabeled data distribution, which motivates a more relaxed and realistic data model for similar pairs: cross subclass independence (CSI). Under this new assumption, we show similar guarantees for downstream classification performance via a duality-based analysis. Using the graph formulation, we also highlight the need to incorporate assumptions about the inductive biases of the representation function class in order to show guarantees under even weaker assumptions.
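
As a rough illustration of the objective described above (one common instantiation with a single negative sample and logistic loss; the exact form used in the talk may differ), the unsupervised contrastive loss for a representation function f can be written as

    L_un(f) = E_{(x, x^+) \sim D_{sim}, \, x^- \sim D_{neg}} [ \log( 1 + \exp( f(x)^\top f(x^-) - f(x)^\top f(x^+) ) ) ]

where D_{sim} denotes the distribution over semantically similar pairs and D_{neg} the distribution of random negative samples (these symbols are illustrative, not from the abstract). Minimizing L_un pushes f(x)^\top f(x^+) above f(x)^\top f(x^-), i.e., it encourages higher inner product for similar pairs than for random pairs.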

Reading list
=========================================
 
1. Shalev-Shwartz, S. and Ben-David, S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
2. Logeswaran, L. and Lee, H. An efficient framework for learning sentence representations. In Proceedings of the International Conference on Learning Representations, 2018.
3. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. Distributed representations of words and phrases and their compositionality. In Neural Information Processing Systems, 2013.
4. Arora, S., Li, Y., Liang, Y., Ma, T., and Risteski, A. A latent variable model approach to PMI-based word embeddings. Transactions of the Association for Computational Linguistics, 2016.
5. Gutmann, M. and Hyvarinen, A. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 297–304, 2010.
6. Ma, Z. and Collins, M. Noise contrastive estimation and negative sampling for conditional models: Consistency and statistical efficiency. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018.
7. Maurer, A., Pontil, M., and Romera-Paredes, B. The benefit of multitask representation learning. Journal of Machine Learning Research, 2016.
8. Syed, U. and Schapire, R. A reduction from apprenticeship learning to classification. In Advances in Neural Information Processing Systems, 2010.
9. Ross, S., Gordon, G., and Bagnell, D. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011.
10. Sun, W., Vemula, A., Boots, B., and Bagnell, D. Provably efficient imitation learning from observation alone. In Proceedings of the 36th International Conference on Machine Learning, 2019.
 
==========================================