Fisher Yu will present his Pre-FPO on Monday, July 18, 2016 at 1pm in CS 402. The members of his committee are: Thomas Funkhouser (adviser), Vladlen Koltun (reader; Intel), Szymon Rusinkiewicz (reader), Adam Finkelstein (non-reader), Kai Li (non-reader). Everyone is invited to attend his talk. The talk title and abstract follow below:

Title: Pixel-Level Prediction of Depth and Semantics in Images

Abstract: Pixel-level prediction is a common task in computer vision. Humans can assign each pixel 3D properties, such as depth and surface normal, and semantic meaning, such as object category and boundaries. Pixel-level information can help algorithms process online content at a finer granularity and help autonomous agents navigate complicated environments. I will present my research on 3D geometry methods that predict per-pixel depth in cases previously considered impossible, and on convolutional networks that substantially improve image segmentation results. I will also talk about my efforts to create a large-scale image dataset that may open new possibilities in computer vision research.

I will first talk about pixel-level depth prediction from a set of images with very small motion. The images are taken from videos that record the slight hand motion of a person trying to hold the camera still. This scenario is challenging because of the very small baseline. We analyze the structure-from-motion (SfM) problem from a geometric perspective and develop an algorithm that recovers camera motion in most cases. Further, we develop a multi-view stereo method that is robust to depth noise to predict the depth at each pixel. We also show that the recovered depth can be useful in computational photography applications.

To predict the semantics of each pixel, that is, to perform semantic image segmentation, we turn to convolutional networks, which are more powerful than the alternatives. Many works have studied how to use networks designed for image classification as the basis for solving the segmentation problem.
I argue that to design a better model for image segmentation, we have to change this basis to make it better suited to segmentation. Toward this goal, I introduce dilated convolution as an additional building block for semantic segmentation and study its properties. We use dilated convolution to build a deeper network that has a much wider receptive field than the original image classification network. Our model achieves competitive performance on several datasets.

Finally, I will introduce my ongoing project, LSUN, which aims to make it possible to use weakly annotated data to help solve the image segmentation problem. In the LSUN project, we have collected orders of magnitude more labeled and unlabeled images than any previous dataset. To reach this scale, we designed an iterative method that combines the strengths of classification models and human labeling. We also introduce statistical tests throughout the process so that the final labeled set has controlled precision. So far, we have collected around 70 million labeled images with about 90% labeling accuracy. Additional experiments show that the LSUN dataset is poised to give us more insight into image understanding.
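For readers unfamiliar with dilated convolution, the following is a minimal Python sketch of why it widens the receptive field. It is a generic illustration, not the network described in the talk, and it assumes stride-1 layers with no padding. The function names `dilated_conv1d` and `receptive_field` are hypothetical helpers introduced here for illustration.

```python
def dilated_conv1d(x, w, d):
    """Valid (unpadded) 1-D dilated convolution, computed as cross-correlation
    like most CNN layers: kernel tap j reads input x[i + j * d]."""
    span = (len(w) - 1) * d  # distance between the first and last tap
    return [sum(w[j] * x[i + j * d] for j in range(len(w)))
            for i in range(len(x) - span)]


def receptive_field(kernel_size, dilations):
    """Receptive field (in pixels) of a stack of stride-1 convolutions.
    A layer with kernel size k and dilation d widens it by (k - 1) * d."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


print(dilated_conv1d(list(range(10)), [1, 1, 1], 2))  # [6, 9, 12, 15, 18, 21]
print(receptive_field(3, [1] * 7))                     # 15: linear growth
print(receptive_field(3, [1, 2, 4, 8, 16, 32, 64]))    # 255: exponential growth
```

With seven 3-tap layers, doubling the dilation at each layer yields a receptive field of 255 pixels versus 15 without dilation, while each layer keeps the same number of weights and no resolution is lost to pooling.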