2. Logeswaran, L. and Lee, H. An efficient framework for learning sentence representations. In Proceedings of the International Conference
on Learning Representations, 2018.
3. Arora, S., Li, Y., Liang, Y., Ma, T., and Risteski, A. RAND-WALK: A latent variable model approach to word embeddings. arXiv preprint arXiv:1502.03520, 2015.
4. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, 2013.
5. Blum, A. and Mitchell, T. Combining labeled and unlabeled data with co-training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, COLT '98, 1998.
6. Kumar, A., Niculescu-Mizil, A., Kavukcuoglu, K., and Daumé III, H. A binary classification framework for two-stage multiple kernel learning. In Proceedings of the 29th International Conference on Machine Learning, ICML '12, 2012.
7. Srebro, N. How good is a kernel when used as a similarity measure? In Proceedings of the 20th Annual Conference on Learning Theory, COLT '07, 2007.
8. Kulis, B. Metric learning: A survey. Foundations and Trends in Machine Learning, 5(4):287–364, 2012.
9. Gutmann, M. and Hyvärinen, A. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 297–304, 2010.
10. Ma, Z. and Collins, M. Noise contrastive estimation and negative sampling for conditional models: Consistency and statistical efficiency. arXiv preprint arXiv:1809.01812, 2018.