A 2D + 3D Rich Data Approach to Scene Understanding
Jianxiong Xiao,
Massachusetts Institute of Technology
Wednesday, March 13, 2013, 4:30pm
Computer Science 105
On your one-minute walk from the coffee machine to your desk each
morning, you pass by dozens of scenes -- a kitchen, an elevator, your
office -- and you effortlessly recognize them and perceive their 3D
structure. But this one-minute scene-understanding problem has been an
open challenge in computer vision for decades. Recently, researchers
have come to realize that big data is critical for building
scene-understanding systems that can recognize the semantics and
reconstruct the 3D structure. In this talk, I will share my experience
in leveraging big data for scene understanding, shifting the paradigm
from 2D view-based categorization to 3D place-centric representations.
To push the traditional 2D representation to the limit, we built the
Scene Understanding (SUN) Database, a large collection of images that
exhaustively spans all scene categories. However, the lack of a "rich"
representation still significantly limits the traditional recognition
pipeline. An image is a 2D array, but the world is 3D and our eyes see
it from a particular viewpoint; this is not traditionally modeled. This
paradigm shift toward rich representation also opens up new challenges
that require a new kind of big data -- data with extra descriptions,
namely rich data. Specifically, we focus on a highly valuable kind of
rich data -- multiple viewpoints in 3D -- and we build the SUN3D
database to obtain an integrated "place-centric" representation of
scenes. This novel representation with rich data opens up exciting new
opportunities for integrating scene recognition over space and for
obtaining a scene-level reconstruction of large environments. It also
has many applications such as organizing big visual data to provide
photo-realistic indoor 3D maps. Finally, I will discuss some open
challenges and my future plans for rich data and representation.
Jianxiong Xiao is a Ph.D. candidate in the Computer Science and
Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of
Technology (MIT). Before that, he received a B.Eng. and an M.Phil. from
the Hong Kong University of Science and Technology. His research
interests are in computer vision, with a focus on scene understanding.
His work has received the Best Student Paper Award at the European
Conference on Computer Vision (ECCV) in 2012, and has appeared in
the popular press. Jianxiong was awarded the Google U.S./Canada Ph.D.
Fellowship in Computer Vision in 2012 and the MIT CSW Best Research Award in
2011. More information can be found on his website:
http://mit.edu/jxiao.