Yinda Zhang will present his general exam on Monday, May 09, 2016 at 10am in CS 402.
The members of his committee are Jianxiong Xiao (adviser), Tom Funkhouser, and Szymon Rusinkiewicz. Everyone is invited to attend his talk, and those faculty wishing to remain for the oral exam following are welcome to do so. His abstract and reading list follow below.

Reading list:

[DalalTriggs] Histograms of Oriented Gradients for Human Detection.
[SIFT] Distinctive Image Features from Scale-Invariant Keypoints.
[ExemplarSVMs] Ensemble of Exemplar-SVMs for Object Detection and Beyond.
[DPM] Object Detection with Discriminatively Trained Part-Based Models.
[SelectiveSearch] Segmentation as Selective Search for Object Recognition.
[Context] Auto-Context and Its Application to High-Level Vision Tasks.
[RCNN] Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.
[Places] Learning Deep Features for Scene Recognition Using Places Database.
[Room Layout] Efficient Exact Inference for 3D Indoor Scene Understanding.
[Geometric Context] Geometric Context from a Single Image.
[Spatial Layout] Recovering the Spatial Layout of Cluttered Rooms.

Textbook: Richard Szeliski, Computer Vision: Algorithms and Applications.

Abstract:

My research mainly focuses on scene understanding, which spans a variety of topics that allow a computer to perceive and interact with its surrounding environment. These topics, e.g. scene parsing, eye gaze tracking, and scene classification, play important roles in many critical robotics applications.

The goal of scene parsing is to understand the spatial layout of the environment and localize objects in 3D space. Unlike traditional object recognition, scene parsing analyzes the input image at a larger scale involving multiple objects, which allows the investigation of relationships among objects. Scene parsing also takes advantage of evidence beyond the visual appearance shown in the color or depth image, such as past and future states of objects, underlying physical rules, functionality, and interaction between humans and objects. All of these key factors for scene understanding are more invariant and more natural to model in 3D space.

Our first effort toward 3D scene parsing is PanoContext, which models an entire scene in 3D space. Experiments show that, based solely on 3D context, PanoContext achieves performance comparable to state-of-the-art 2D object detectors. The output of the algorithm can be directly rendered in 3D for a virtual exploration of the indoor environment. Recently, we proposed a context-aware deep learning architecture for scene parsing. We hardwire prior context knowledge into the network, which can be optimized end-to-end. The model requires only a single pass over the data to detect all the major objects in the room, and is therefore very efficient.

Besides these, I have also worked actively on creating large-scale datasets for scene classification, eye-tracking saliency, and room layout estimation. Building a dataset several orders of magnitude bigger than the state of the art requires new interfaces and algorithms that enable data collection on crowdsourcing platforms and achieve efficiency sub-linear in the scale of the data. While 2D image datasets already reach millions of images, RGB-D datasets still contain only roughly 10K images, mostly due to the laborious labelling procedure.
We are currently working on utilizing a large amount of unlabelled 3D reconstructed scenes to learn a better feature representation for 3D objects, and thus achieve better 3D object recognition performance.