Zeyu Ma will present his General Exam, "Multiview Stereo with Cascaded Epipolar RAFT," on Monday, May 2, 2022 at 10:00 AM in CS 402 and via Zoom.

Zoom link: https://princeton.zoom.us/j/4673688313

Committee Members: Jia Deng (advisor), Szymon Rusinkiewicz, Felix Heide

Abstract:
Multiview stereo (MVS) is an important task in 3D computer vision. It seeks to reconstruct a full 3D model, typically in the form of a dense 3D point cloud, from multiple RGB images with known camera intrinsics and poses. It is a difficult task that remains unsolved; the main challenge is producing a 3D model that is not only accurate but also complete, that is, with no parts missing and all fine details recovered.

Classical methods essentially formulate multiview stereo as an optimization problem, seeking the 3D model that is most compatible with the observed images. Compatibility is typically based on some hand-designed notion of photo-consistency, which assumes that pixels that are projections of the same 3D point should have similar appearance.

Many of the latest results in multiview stereo are achieved by deep networks. In particular, many recent leading methods are variants of MVSNet, a deep architecture that consists of two main steps: (1) constructing a 3D cost volume in the frustum of a reference view by warping features from other views, and (2) using 3D convolutional layers to transform, or regularize, the cost volume before using it to predict a depth map. The resulting depth maps, one from each reference view, are then combined into a single 3D point cloud through a heuristic procedure. However, a drawback of MVSNet is that regularizing the 3D plane-sweeping cost volume with 3D convolutions can be costly in computation and memory.

We therefore propose CER-MVS (Cascaded Epipolar RAFT Multiview Stereo), a new approach based on the RAFT (Recurrent All-Pairs Field Transforms) architecture developed for optical flow. CER-MVS introduces five changes to RAFT: epipolar cost volumes, cost volume cascading, multiview fusion of cost volumes, dynamic supervision, and multiresolution fusion of depth maps. CER-MVS differs significantly from prior work in multiview stereo in that it operates by iteratively updating a disparity field. Experiments show that our approach achieves state-of-the-art performance on the DTU and Tanks-and-Temples benchmarks (both the intermediate and advanced sets).

Reading List: https://docs.google.com/document/d/1ELU3MmNlojf4FnyrLcFScN7AsARx7cJCA0VvAnQX...

Everyone is invited to attend the talk, and faculty wishing to remain for the oral exam that follows are welcome to do so.

Louis Riehl
Graduate Administrator
Computer Science Department, CS213
Princeton University
(609) 258-8014