[talks] Yinda Zhang will present his general exam on Monday, May 09, 2016 at 10am in CS 402.

Nicki Gotsis ngotsis at CS.Princeton.EDU
Mon May 2 13:28:35 EDT 2016


Yinda Zhang will present his general exam on Monday, May 09, 2016 at 10am in CS 402.

The members of his committee are Jianxiong Xiao (adviser), Tom Funkhouser, and Szymon Rusinkiewicz.

Everyone is invited to attend his talk, and those faculty wishing to remain for the oral exam following are welcome to do so.  His abstract and reading list follow below.

Reading list:
[DalalTriggs] Histograms of oriented gradients for human detection.
[SIFT] Distinctive image features from scale-invariant keypoints.
[ExemplarSVMs] Ensemble of exemplar-svms for object detection and beyond.
[DPM] Object Detection with Discriminatively Trained Part Based Models.
[SelectiveSearch] Segmentation as selective search for object recognition.
[Context] Auto-context and Its Application to High-level Vision Tasks.
[RCNN] Rich feature hierarchies for accurate object detection and semantic segmentation.
[Places] Learning Deep Features for Scene Recognition using Places Database.
[Room Layout] Efficient Exact Inference for 3D Indoor Scene Understanding.
[Geometric Context] Geometric Context from a Single Image.
[Spatial Layout] Recovering the Spatial Layout of Cluttered Rooms.
Textbook:
Richard Szeliski, Computer Vision: Algorithms and Applications

Abstract:
My research mainly focuses on scene understanding, which includes a
variety of topics that allow computers to perceive and interact with the
surrounding environment. These topics, e.g. scene parsing, eye gaze
tracking, and scene classification, play important roles in many critical
robotics applications.

The goal of scene parsing is to understand the spatial layout of the
environment and localize objects in 3D space. Unlike traditional object
recognition, scene parsing analyzes the input image at a larger scale
that involves multiple objects, which allows the investigation of
relationships among objects. Scene parsing also takes advantage of
evidence beyond the visual appearance shown in the color or depth image,
such as past and future states of an object, underlying physical rules,
functionality, and interactions between humans and objects. All of these
key factors for scene understanding are more invariant and more naturally
modeled in 3D space. Our first effort toward 3D scene parsing is
PanoContext, which models an entire scene in 3D space. The experiments
show that, based solely on 3D context, PanoContext achieves performance
comparable to state-of-the-art 2D object detectors. The output of the
algorithm can be directly rendered in 3D for a virtual exploration of the
indoor environment. Recently, we proposed a context-aware deep learning
architecture for scene parsing. We hardwire prior context knowledge into
the network, which can be optimized end-to-end. The model requires only a
single pass over the data to detect all the major objects in the room,
and is therefore very efficient.
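
As a rough illustration of why whole-scene 3D context helps, the sketch
below (Python/NumPy) scores a hypothetical room hypothesis by combining
per-object appearance scores with a pairwise 3D context term. This is not
the actual PanoContext pipeline; the categories, expected distances, and
weights are made-up placeholders.

# Illustrative sketch only -- not the PanoContext implementation.
# A room hypothesis is a set of objects, each with a category, a 3D box
# center (meters), and an appearance score from some detector.
import numpy as np

hypothesis = [
    {"category": "bed",        "center": np.array([1.0, 2.0, 0.3]), "appearance": 0.8},
    {"category": "nightstand", "center": np.array([1.9, 2.1, 0.3]), "appearance": 0.4},
    {"category": "tv",         "center": np.array([4.0, 0.5, 1.0]), "appearance": 0.6},
]

# Placeholder pairwise prior: expected distance (meters) between category
# pairs; a real system would learn such statistics from annotated scenes.
EXPECTED_DIST = {
    ("bed", "nightstand"): 1.0,
    ("bed", "tv"): 3.0,
    ("nightstand", "tv"): 3.0,
}

def context_score(objects, sigma=1.0):
    """Reward pairs of objects whose 3D distance matches the prior."""
    score = 0.0
    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            key = tuple(sorted((objects[i]["category"], objects[j]["category"])))
            if key not in EXPECTED_DIST:
                continue
            dist = np.linalg.norm(objects[i]["center"] - objects[j]["center"])
            score += np.exp(-(dist - EXPECTED_DIST[key]) ** 2 / (2 * sigma ** 2))
    return score

def hypothesis_score(objects, w_context=0.5):
    """Total score = appearance evidence + weighted 3D context agreement."""
    return sum(o["appearance"] for o in objects) + w_context * context_score(objects)

print("room hypothesis score:", hypothesis_score(hypothesis))

A full system would search over many such hypotheses (room layouts plus
object sets) and keep the best-scoring one; the sketch only conveys that
pairwise 3D relationships can re-rank hypotheses that per-object
appearance alone cannot separate.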

Besides these, I have also worked actively on creating large-scale
datasets for scene classification, eye tracking saliency, and room layout
estimation. Building a dataset several orders of magnitude larger than
the current state of the art requires new interfaces and algorithms that
enable data collection on crowdsourcing platforms and keep the annotation
effort sub-linear in the scale of the data.

While the scale of 2D image datasets has already reached millions, RGB-D
image datasets still contain roughly 10K images, mostly due to the
laborious labelling procedure. We are currently working on utilizing a
large amount of unlabelled 3D reconstructed scenes to learn a better
feature representation for 3D objects, and thus achieve better 3D object
recognition performance.

