[talks] Fisher Yu FPO - March 27, 2018 at 10:30 am Room CS 402

Tue Mar 20 17:38:17 EDT 2018

Good afternoon,

Fisher Yu will present his FPO on March 27, 2018 at 10:30 am in Room CS 402.

The members of his Committee are: 

Thomas Funkhouser - Princeton University
Adam Finkelstein - Princeton University
Olga Russakovsky - Princeton University

All are welcome to attend.

Title:  Pixel-Level Prediction: From Geometry to Semantics

Abstract:  

Pixel-level prediction generalizes a wide range of computer vision tasks, including semantic image segmentation and dense depth prediction. These tasks are fundamental for image recognition and receive continual attention from the community. However, although they share common traits that may admit a general solution, they are usually studied in isolation because of their different domain characteristics. This thesis aims to study the essential problems behind those tasks and shed light on a general framework.

This thesis starts with an algorithm that can predict plausible depth from almost identical images based on geometric optimization. The motion between those images is called "Accidental Motion". The analysis of accidental motion shows that motion optimization has special convexity properties. It leads to a reconstruction pipeline that can produce a plausible dense depth map for the reference image, which is shown to enable depth-based camera effects.

The second part then studies learning pixel representations to predict semantic properties based on a single reference image. Previous work usually uses learned upsampling to recover the pixel-level information. This work proposes to use Dilated Convolution to transform classification networks such that high-resolution prediction is achieved without upsampling. Dilated Convolution can also yield an exponential increase in the receptive field, which is ideal for learning global context. A context module based on this property is proposed that can improve network performance significantly and consistently. Dilation is still a standard component in state-of-the-art methods for semantic image segmentation.
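
As a rough illustration of the receptive-field claim above, the following is a minimal sketch and not the thesis implementation. It assumes stride-1 layers with a 3-tap kernel and dilation rates that double at each layer (1, 2, 4, 8); the function names receptive_field and dilated_conv1d are hypothetical and chosen only for this example.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded 1D dilated convolution (cross-correlation form).

    The output has the same length as the input, so resolution is kept
    without any upsampling step. Assumes an odd-length kernel.
    """
    k = len(kernel)
    center = k // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    # Doubling dilations: the field grows exponentially with depth.
    print(receptive_field(3, [1, 2, 4, 8]))
    # Same depth without dilation: the field grows only linearly.
    print(receptive_field(3, [1, 1, 1, 1]))
    # An impulse spreads to taps two samples apart, at full resolution.
    print(dilated_conv1d([0, 0, 0, 1, 0, 0, 0], [1, 1, 1], dilation=2))
```

Under these assumptions, four layers with doubling dilation cover 31 samples, while four undilated layers cover only 9, with the same number of weights per layer.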

Further study of dilated residual networks shows that the same high-resolution prediction can also improve image classification results. This indicates that no essential difference in network architecture exists between image classification and segmentation. Further inspection of class activation maps and layer responses uncovers peculiar gridding patterns and their cause. This finding leads to new designs of convolutional networks that can remove the gridding artifacts and produce activations with better spatial consistency. The new networks can improve the performance of both image classification and semantic segmentation.

The presented methods and results may inspire new research in building a unified framework for image recognition of geometry and semantics.

Barbara Mooring
Interim Graduate Coordinator
Computer Science Department
Princeton University