Zachary Teed will present his General Exam, "Video to Depth with Differentiable Structure from Motion," on Tuesday, January 14, 2020, at 4pm in CS 401.
The members of his committee are Jia Deng (adviser), Olga Russakovsky, and Szymon Rusinkiewicz.
Everyone is invited to attend his talk, and those faculty wishing to remain for the oral exam following are welcome to do so. His abstract and reading list follow below.
Title: Video to Depth with Differentiable Structure from Motion
Abstract: In video to depth, the task is to estimate depth from a video sequence. This problem has traditionally been approached using Structure from Motion (SfM), which takes a collection of images as input and jointly optimizes over 3D structure and camera motion. In parallel, deep learning has been highly successful on a number of 3D reconstruction tasks, but as recent work has shown, it is often hard to train generic network layers to directly utilize multiview geometry. We propose an end-to-end learning architecture for predicting depth from video which combines the representation ability of neural networks with the geometric principles governing image formation. We compose a collection of classical geometric algorithms, which are converted into trainable modules and combined into an end-to-end differentiable architecture. We evaluate our approach on five separate datasets and outperform all existing single-view or multi-view approaches, while also demonstrating that our method can generalize across datasets.
Textbook: Szeliski, Richard. Computer Vision: Algorithms and Applications. Springer Science & Business Media, 2010.
Papers:
Engel, Jakob, Thomas Schöps, and Daniel Cremers. "LSD-SLAM: Large-scale direct monocular SLAM." European Conference on Computer Vision. Springer, Cham, 2014.
Engel, Jakob, Vladlen Koltun, and Daniel Cremers. "Direct sparse odometry." IEEE Transactions on Pattern Analysis and Machine Intelligence 40.3 (2017): 611-625.
Kümmerle, Rainer, et al. "g2o: A general framework for graph optimization." 2011 IEEE International Conference on Robotics and Automation. IEEE, 2011.
Ranftl, Rene, et al. "Dense monocular depth estimation in complex dynamic scenes." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
Schönberger, Johannes L., and Jan-Michael Frahm. "Structure-from-motion revisited." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
Quiroga, Julian, et al. "Dense semi-rigid scene flow estimation from RGBD images." European Conference on Computer Vision. Springer, Cham, 2014.
Amos, Brandon, and J. Zico Kolter. "OptNet: Differentiable optimization as a layer in neural networks." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR.org, 2017.
Dosovitskiy, Alexey, et al. "FlowNet: Learning optical flow with convolutional networks." Proceedings of the IEEE International Conference on Computer Vision. 2015.
Kendall, Alex, et al. "End-to-end learning of geometry and context for deep stereo regression." Proceedings of the IEEE International Conference on Computer Vision. 2017.
Tang, Chengzhou, and Ping Tan. "BA-Net: Dense bundle adjustment network." arXiv preprint arXiv:1806.04807 (2018).
Zhou, Huizhong, Benjamin Ummenhofer, and Thomas Brox. "DeepTAM: Deep tracking and mapping." Proceedings of the European Conference on Computer Vision (ECCV). 2018.