Dingli Yu will present his General Exam on Thursday, October 24, 2019 at 9:30am in CS 401. The members of his committee are as follows: Sanjeev Arora (adviser), Jason D. Lee (ELE), and Jia Deng. Everyone is invited to attend his talk, and those faculty wishing to remain for the oral exam following are welcome to do so. His abstract and reading list follow below.

Title: Harnessing the Power of Infinitely Wide Deep Nets

Abstract:
Recent research shows that the following two models are equivalent: (a) infinitely wide neural networks (NNs) trained under l2 loss by gradient descent with an infinitesimally small learning rate, and (b) kernel regression with respect to so-called Neural Tangent Kernels (NTKs) (Jacot et al., 2018). An efficient algorithm to compute the NTK, as well as its convolutional counterpart (CNTK), appears in Arora et al. (2019a) and yielded the finding that the classification accuracy of the CNTK on CIFAR-10 is within 6-7% of that of the corresponding CNN architecture (the best figure being around 78%). Here, we show that the NTK and CNTK, or their enhanced versions, perform strongly in the following tasks (an illustrative kernel-regression formula and code sketch follow the abstract):

1. On a standard testbed of classification/regression tasks from the UCI database, an NTK SVM beats the previous gold standard, Random Forests (RF), and also the corresponding finite nets.
2. On the VOC07 testbed for few-shot image classification tasks on ImageNet with transfer learning (Goyal et al., 2019), replacing the linear SVM currently used with a Convolutional NTK SVM consistently improves performance.
3. On CIFAR-10 with 10-640 training samples, the CNTK consistently beats ResNet-34 by 1%-3%. On the whole CIFAR-10 dataset, the CNTK achieves 89% accuracy, matching the performance of AlexNet (Krizhevsky et al., 2012), by incorporating horizontal flip data augmentation, a pre-processing technique proposed by Coates et al. (2011), and a new operation called Local Average Pooling (LAP), which preserves efficient computability of the kernel and inherits the spirit of standard data augmentation using pixel shifts.
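For concreteness, a standard formulation of model (b) is kernel (ridge) regression with the NTK as the kernel: writing \Theta for the NTK, X for the training inputs, and y for the training labels, the predictor is

    f(x) = \Theta(x, X)^\top \left( \Theta(X, X) + \lambda I \right)^{-1} y,

where the ridgeless limit \lambda \to 0 corresponds to training the infinitely wide net to convergence under the l2 loss.

The following minimal Python sketch illustrates how such a kernel would be used in tasks 1 and 2: it computes the fully connected ReLU NTK via the standard recursion (assuming unit-normalized inputs; normalization conventions, depth, and hyperparameters vary across papers, and the convolutional CNTK is more involved) and feeds the resulting Gram matrix to an SVM as a precomputed kernel. The data here are random placeholders, not one of the UCI tasks.

import numpy as np
from sklearn.svm import SVC

def ntk_gram(X1, X2, depth=3):
    # NTK Gram matrix between rows of X1 and X2 for a depth-`depth` fully
    # connected ReLU net, assuming every row has unit norm.
    sigma = X1 @ X2.T        # Sigma^(0): input inner products (= correlations)
    theta = sigma.copy()     # running NTK
    for _ in range(depth):
        rho = np.clip(sigma, -1.0, 1.0)
        sigma_dot = (np.pi - np.arccos(rho)) / np.pi                                # E[relu'(u) relu'(v)]
        sigma = (np.sqrt(1.0 - rho ** 2) + rho * (np.pi - np.arccos(rho))) / np.pi  # E[relu(u) relu(v)]
        theta = theta * sigma_dot + sigma
    return theta

# Toy usage with random unit-norm data standing in for a small classification task.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 20))
X_train /= np.linalg.norm(X_train, axis=1, keepdims=True)
X_test = rng.normal(size=(20, 20))
X_test /= np.linalg.norm(X_test, axis=1, keepdims=True)
y_train = rng.integers(0, 2, size=100)

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(ntk_gram(X_train, X_train), y_train)    # n_train x n_train Gram matrix
preds = clf.predict(ntk_gram(X_test, X_train))  # n_test x n_train cross-kernel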
Reading list:

1. Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2018.
2. Peter L. Bartlett and Shahar Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463-482, 2002.
3. Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, and Ruosong Wang. On exact computation with an infinitely wide neural net. arXiv preprint arXiv:1904.11955, 2019a.
4. Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, and Ruosong Wang. Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks. arXiv preprint arXiv:1901.08584, 2019b.
5. Simon S. Du, Xiyu Zhai, Barnabas Poczos, and Aarti Singh. Gradient descent provably optimizes over-parameterized neural networks. In International Conference on Learning Representations, 2019.
6. Simon S. Du, Jason D. Lee, Haochuan Li, Liwei Wang, and Xiyu Zhai. Gradient descent finds global minima of deep neural networks. arXiv preprint arXiv:1811.03804, 2018.
7. Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. Do ImageNet classifiers generalize to ImageNet? arXiv preprint arXiv:1902.10811, 2019.
8. Matthew Olson, Abraham Wyner, and Richard Berk. Modern neural networks generalize on small data sets. In Advances in Neural Information Processing Systems, pp. 3619-3628, 2018.
9. Manuel Fernandez-Delgado, Eva Cernadas, Senen Barro, and Dinani Amorim. Do we need hundreds of classifiers to solve real world classification problems? The Journal of Machine Learning Research, 15(1):3133-3181, 2014.
10. Julien Mairal, Piotr Koniusz, Zaid Harchaoui, and Cordelia Schmid. Convolutional kernel networks. In Advances in Neural Information Processing Systems, pp. 2627-2635, 2014.
11. Adam Coates, Andrew Ng, and Honglak Lee. An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 215-223, 2011.
12. Shuxiao Chen, Edgar Dobriban, and Jane H. Lee. Invariance reduces variance: Understanding data augmentation in deep learning and beyond. arXiv preprint arXiv:1907.10905, 2019.