[talks] Fisher Yu will present his Pre-FPO on Monday, July 18, 2016 at 1pm in CS 402

Nicki Gotsis ngotsis at CS.Princeton.EDU
Wed Jul 13 15:22:11 EDT 2016


Fisher Yu will present his Pre-FPO on Monday, July 18, 2016 at 1pm in CS 402.

The members of his committee are: Thomas Funkhouser (adviser), Vladlen Koltun (reader; Intel), Szymon Rusinkiewicz (reader), Adam Finkelstein (non-reader), Kai Li (non-reader).

Everyone is invited to attend his talk.  The talk title and abstract follow below:

Title: Pixel-Level Prediction of Depth and Semantics in Images

Abstract:

Pixel-level prediction is a common task in computer vision. The human
visual system can assign each pixel 3D properties, such as depth and
surface normal, and semantic meaning, such as object category and
boundary. Pixel-level information can help algorithms process online
content at finer granularity and help autonomous agents navigate
complicated environments. I will present my research on 3D geometry for
predicting pixel depth in previously intractable cases, and on
convolutional networks that substantially improve image segmentation
results. I will also talk about my efforts to create a large-scale
image dataset that may open new possibilities in computer vision
research.

I will first talk about pixel-level depth prediction from a set of
images with very small motion. The images are taken from videos
recorded while a person tries to hold the camera still, so the only
camera motion comes from hand tremor. The scenario is challenging
because the baseline between views is very small. We analyze the
structure-from-motion (SfM) problem from a geometric perspective and
derive an algorithm that can recover the camera motion in most cases.
Further, we develop a multi-view stereo method robust to depth noise to
predict the depth at each pixel. We also show that the recovered depth
is useful for several computational photography applications.
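To see why a very small baseline is hard (an illustration of the general issue, not the talk's actual method), consider the standard rectified-stereo relation Z = f * b / d for focal length f, baseline b, and disparity d. Differentiating gives |dZ| = (Z^2 / (f * b)) * |dd|, so for a fixed amount of disparity noise, depth error grows as the baseline shrinks:

```python
# Illustration only: depth-from-disparity error vs. baseline.
# All numbers below are made-up example values, not from the talk.

def depth(f, b, d):
    """Depth from disparity in a rectified stereo pair: Z = f*b/d."""
    return f * b / d

def depth_error(f, b, z, disparity_err):
    """First-order depth error |dZ| = (Z^2 / (f*b)) * |dd|."""
    return z ** 2 / (f * b) * disparity_err

f = 1000.0   # focal length in pixels
z = 2.0      # scene depth in metres
err = 0.5    # half-pixel disparity noise

wide = depth_error(f, 0.10, z, err)     # 10 cm baseline
tiny = depth_error(f, 0.005, z, err)    # 5 mm hand-tremor baseline
print(wide, tiny)  # the tiny baseline is 20x more sensitive
```

The same half-pixel of disparity noise that is harmless at a 10 cm baseline becomes a large depth error at a millimetre-scale baseline, which is why near-still footage demands a noise-robust stereo method.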

To predict the semantics of each pixel, that is, to perform semantic
image segmentation, we turn to convolutional networks, which are more
powerful than the alternatives. Many works have studied how to use
networks designed for image classification as a basis for solving the
segmentation problem. I argue that in order to design a better model
for image segmentation, we have to change this basis to make it more
suitable for segmentation. Toward this goal, I introduce dilated
convolution as an additional building block for semantic segmentation
and study its properties. We use dilated convolution to build a deeper
network with a much wider receptive field than the original image
classification network. Our model achieves competitive performance on
several datasets.
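The core idea behind dilated convolution can be shown in a few lines (a minimal 1-D NumPy sketch of the concept, not the talk's implementation): the filter taps are spaced `dilation` samples apart, so stacking layers with dilations 1, 2, 4, ... grows the receptive field exponentially while the number of weights per layer stays fixed.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """1-D dilated convolution (valid padding): the k taps of w are
    spaced `dilation` samples apart, so one layer covers a span of
    (k - 1) * dilation + 1 input samples with only k weights."""
    k = len(w)
    span = (k - 1) * dilation + 1
    out_len = len(x) - span + 1
    return np.array([
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(out_len)
    ])

# Stacking 3-tap layers with dilations 1, 2, 4 yields a receptive
# field of 1 + 2*(1 + 2 + 4) = 15 samples per output.
x = np.arange(32, dtype=float)
y = x
for d in (1, 2, 4):
    y = dilated_conv1d(y, np.array([1.0, 1.0, 1.0]) / 3.0, d)
print(len(y))  # 32 - 15 + 1 = 18 outputs
```

With ordinary (dilation-1) convolutions, three 3-tap layers would cover only 7 samples; the dilated stack covers 15 with the same weight count, which is the property that lets a segmentation network aggregate wide context without losing resolution.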

Finally, I will introduce my ongoing project, LSUN, which aims to make
it possible to exploit weakly annotated data to help solve the image
segmentation problem. In the LSUN project, we have collected orders of
magnitude more labeled and unlabeled images than any previous dataset.
To reach this scale, we design an iterative method that combines the
advantages of classification models and human labeling. We also
introduce statistical tests along the way so that the final labeled set
has controlled precision. So far, we have collected around 70 million
labeled images with about 90% labeling accuracy. Additional experiments
show that the LSUN dataset is poised to give us more insight into
understanding images.
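One way to picture such an iterative labeling loop (a hypothetical sketch; the function names, thresholds, and the simple point-estimate check below are illustrative assumptions, not the actual LSUN pipeline) is: keep images the current model scores confidently, then audit a random human-labeled sample to verify that precision stays above the target before accepting the batch.

```python
import random

def accept_batch(unlabeled, classifier_score, human_label,
                 keep_threshold=0.95, audit_size=100,
                 target_precision=0.9):
    """Hypothetical one-round labeling sketch: auto-accept confident
    images, then audit a random sample with human labels and only keep
    the batch if estimated precision meets the target."""
    accepted = [img for img in unlabeled
                if classifier_score(img) >= keep_threshold]
    audit = random.sample(accepted, min(audit_size, len(accepted)))
    correct = sum(1 for img in audit if human_label(img))
    precision = correct / max(len(audit), 1)
    # A proper statistical test on the audit sample would gate the
    # batch here; a bare point estimate stands in for it.
    return accepted if precision >= target_precision else []

# Toy run: scores stand in for both model confidence and ground truth.
scores = {f"img{i}": i / 100 for i in range(101)}
kept = accept_batch(list(scores),
                    classifier_score=lambda im: scores[im],
                    human_label=lambda im: scores[im] > 0.9)
print(len(kept))  # the 6 images scored at or above 0.95
```

In practice the accepted images would be fed back to retrain the classifier, so each round labels more data than the last while the audit keeps precision under control.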

