Orestis Plevrakis will present his General Exam "A Theoretical Analysis of Contrastive Unsupervised Representation Learning" on May 13, 2019 at 2pm in CS 401.

The members of his committee are Sanjeev Arora (adviser), Elad Hazan, and Rob Schapire.

Everyone is invited to attend his talk, and faculty wishing to remain for the oral exam that follows are welcome to do so. His abstract and reading list appear below.

Abstract:
Recent empirical work has successfully used unlabeled data to learn feature representations that are broadly useful in downstream classification tasks. Several of these methods are reminiscent of the well-known word2vec embedding algorithm: leveraging the availability of pairs of semantically “similar” data points and “negative samples,” the learner forces the inner products of representations of similar pairs to be higher, on average, than those with negative samples. We use the term contrastive learning for such algorithms and present a theoretical framework for analyzing them by introducing latent classes and hypothesizing that semantically similar points are sampled from the same latent class. This framework allows us to show provable guarantees on the performance of the learned representations on the average classification task comprising a subset of the same latent classes. Our generalization bound also shows that the learned representations can reduce the (labeled) sample complexity of downstream tasks. We conduct controlled experiments in both the text and image domains to support the theory.
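To make the objective concrete, below is a minimal Python/NumPy sketch of one contrastive step of the kind described above. The linear featurizer, the logistic form of the loss, and all dimensions and variable names are illustrative assumptions, not the exact objective analyzed in the talk: an anchor and its "similar" point are drawn from the same latent class, negatives are drawn independently, and the loss pushes the anchor-positive inner product above each anchor-negative inner product.

    # Illustrative sketch (assumed, not the talk's exact objective): a logistic
    # contrastive loss l = log(1 + sum_i exp(<f(x), f(neg_i)> - <f(x), f(pos)>)),
    # where the positive is sampled from the same latent class as the anchor.
    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_rep, k = 32, 16, 4                       # input dim, representation dim, number of negatives
    W = rng.normal(scale=0.1, size=(d_rep, d_in))    # linear featurizer f(x) = W x (illustrative choice)

    def contrastive_loss(W, x, x_pos, x_negs):
        """Logistic contrastive loss for one anchor, one positive, and k negatives."""
        f_x, f_pos = W @ x, W @ x_pos
        f_negs = x_negs @ W.T                        # shape (k, d_rep)
        margins = f_negs @ f_x - f_pos @ f_x         # <f(x), f(neg_i)> - <f(x), f(pos)>
        return np.log1p(np.exp(margins).sum())

    # Toy data: anchor and positive share a latent class mean; negatives do not.
    mu = rng.normal(size=d_in)
    x, x_pos = mu + 0.1 * rng.normal(size=d_in), mu + 0.1 * rng.normal(size=d_in)
    x_negs = rng.normal(size=(k, d_in))
    print("contrastive loss:", contrastive_loss(W, x, x_pos, x_negs))

Minimizing such a loss over many (anchor, positive, negatives) samples is what drives inner products of same-class representations above cross-class ones; the guarantees mentioned in the abstract concern reusing representations learned this way on classification tasks drawn from subsets of the latent classes.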


References: 
1. Shalev-Shwartz, S. and Ben-David, S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.

2. Logeswaran, L. and Lee, H. An efficient framework for learning sentence representations. In Proceedings of the International Conference on Learning Representations, 2018.

3. Arora, S., Li, Y., Liang, Y., Ma, T., and Risteski, A. RAND-WALK: A latent variable model approach to word embeddings. arXiv preprint arXiv:1502.03520, 2015.

4. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, 2013.

5. Blum, A. and Mitchell, T. Combining labeled and unlabeled data with co-training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, COLT '98, 1998.

6. Kumar, A., Niculescu-Mizil, A., Kavukcuoglu, K., and Daumé III, H. A binary classification framework for two-stage multiple kernel learning. In Proceedings of the 29th International Conference on Machine Learning, ICML '12, 2012.

7. Srebro, N. How good is a kernel when used as a similarity measure? In COLT, 2007.

8. Kulis, B. Metric learning: A survey. Foundations and Trends in Machine Learning, 5(4):287–364, 2012.

9. Gutmann, M. and Hyvärinen, A. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 297–304, 2010.

10. Ma, Z. and Collins, M. Noise contrastive estimation and negative sampling for conditional models: Consistency and statistical efficiency. arXiv preprint arXiv:1809.01812, 2018.