Kyle Genova will present his Pre-FPO on Thursday, January 14, 2021 at 10am via Zoom.

Zoom link:
https://princeton.zoom.us/j/95990892901
Meeting ID: 959 9089 2901

The members of his committee are as follows: Tom Funkhouser (advisor), Olga Russakovsky (reader), Forrester Cole (reader, Google), Adam Finkelstein (examiner), Szymon Rusinkiewicz (examiner).

All are welcome to attend.

Abstract:
Often, computer vision tasks require understanding or generating data that is explicitly three-dimensional. Such tasks can be relatively low level, like recovering geometric surface properties for object insertion or relighting, or high level, like semantic segmentation of point clouds for mapping or navigation. Yet the most common form of input data or data label is an image. RGB and depth cameras are ubiquitous sensors, and it is also often easier to collect labels at scale in 2D than in 3D. Based on this observation, we propose methods for recovering 3D shapes from input 2D views, and for understanding 3D data from 2D supervision.

First, we demonstrate a method for recovering the geometry of human faces where both the input data and the labels are 2D RGB images. This is possible only because face geometry can be represented with a domain-specific parametric 3D morphable model, which can be learned separately from 3D data. We then consider more general classes of shape, for which we propose a novel representation, the Structured Implicit Function. We show that this representation is useful for learning general shape priors and correspondences, and demonstrate that it can be used to recover 3D shape from RGB images. We extend this representation with local deep implicit functions, and demonstrate its utility for completing partial 2D depth observations of shapes more robustly than existing approaches.

In addition to methods for recovering 3D geometry from views, we propose methods for understanding geometry from views and for understanding how geometry is typically viewed. The first method enables training a 3D semantic segmentation network from only a labeled image collection, whereas previous methods require labeled 3D training data. We then propose a method to synthesize views that mimic a diverse input image collection, by modeling the collection as samples from a view distribution that can be repeatedly sampled by rendering synthetic geometry. Our methods demonstrate that leveraging and understanding the relationship between input views and geometry is useful for a spectrum of explicitly 3D tasks.