Shuran Song will present her FPO "Data-Driven 3D Scene Understanding" on Tuesday, 10/23/2018 at 10am in CS 402.

The members of her committee are as follows: Thomas Funkhouser (adviser); Examiners: Adam Finkelstein, Szymon Rusinkiewicz, and Thomas Funkhouser; Readers: Olga Russakovsky and Alberto Rodriguez (MIT).

A copy of her thesis is available upon request. Everyone is invited to attend her talk. The talk title and abstract follow below:

Data-Driven 3D Scene Understanding

Intelligent robots require advanced vision capabilities to perceive and interact with the real physical world. While computer vision has made great strides in recent years, its predominant paradigm still focuses on analyzing image pixels to infer two-dimensional outputs (e.g., 2D bounding boxes or labeled 2D pixels), which remain far from sufficient for real-world robotics applications. This dissertation presents the use of amodal 3D scene representations that enable intelligent systems not only to recognize what is seen (e.g., Am I looking at a chair?), but also to predict contextual information about the complete 3D scene beyond visible surfaces (e.g., What could be behind the table? Where should I look to find an exit?).

More specifically, it presents a line of work that demonstrates the power of these representations. First, it shows how an amodal 3D scene representation can be used to improve performance on a traditional task such as object detection. We present SlidingShapes and DeepSlidingShapes for the task of amodal 3D object detection, where the system is designed to fully exploit the 3D information provided by depth images. Second, we introduce the task of semantic scene completion and our approach SSCNet, whose goal is to produce a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation. Third, we introduce the task of semantic-structure view extrapolation and our approach Im2Pano3D, which aims to predict the 3D structure and semantic labels for a full 360° panoramic view of an indoor scene given only a partial observation. Finally, we present two large-scale datasets (SUN RGB-D and SUNCG) that enable research on data-driven 3D scene understanding.

This dissertation demonstrates that leveraging complete 3D scene representations not only significantly improves algorithms' performance on traditional computer vision tasks, but also paves the way for new scene understanding tasks that were previously considered ill-posed given only 2D representations.