Shuran Song will present her FPO "Data-Driven 3D Scene Understanding" on Tuesday, 10/23/2018 at 10am in CS 402.

The members of her committee are as follows: Thomas Funkhouser (adviser); Examiners: Adam Finkelstein, Szymon Rusinkiewicz, and Thomas Funkhouser; Readers: Olga Russakovsky and Alberto Rodriguez (MIT).

A copy of her thesis is available upon request. Everyone is invited to attend her talk. The talk title and abstract follow below:

Data-Driven 3D Scene Understanding

Intelligent robots require advanced vision capabilities to perceive and interact with the real physical world. While computer vision has made great strides in recent years, its predominant paradigm still focuses on analyzing image pixels to infer two-dimensional outputs (e.g., 2D bounding boxes or labeled 2D pixels), which remain far from sufficient for real-world robotics applications. This dissertation presents the use of amodal 3D scene representations that enable intelligent systems not only to recognize what is seen (e.g., Am I looking at a chair?), but also to predict contextual information about the complete 3D scene beyond visible surfaces (e.g., What could be behind the table? Where should I look to find an exit?).

More specifically, it presents a line of work that demonstrates the power of these representations. First, it shows how an amodal 3D scene representation can be used to improve performance on a traditional task such as object detection. We present SlidingShapes and DeepSlidingShapes for the task of amodal 3D object detection, where the system is designed to fully exploit the 3D information provided by depth images. Second, we introduce the task of semantic scene completion and our approach SSCNet, whose goal is to produce a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation. Third, we introduce the task of semantic-structure view extrapolation and our approach Im2Pano3D, which aims to predict the 3D structure and semantic labels for a full 360° panoramic view of an indoor scene given only a partial observation. Finally, we present two large-scale datasets (SUN RGB-D and SUNCG) that enable research on data-driven 3D scene understanding.

This dissertation demonstrates that leveraging complete 3D scene representations not only significantly improves algorithms' performance on traditional computer vision tasks, but also paves the way for new scene understanding tasks that were previously considered ill-posed given only 2D representations.