[talks] Talk by Alex Berg on Monday March 6

Tue Feb 28 18:43:40 EST 2017

PIXL Lunch Talk

Speaker: *Alex Berg*
Date: Monday, March 06, 2017
Time: 12:30-1:30PM (lunch is served)
Location: CS 402

*Title: *Rethinking object detection in computer vision

*Abstract: *Object detection is one of the core problems in computer 
vision and is a lens through which to view the field. It brings together 
machine learning (classification, regression) with the variation in 
appearance of objects and scenes with pose and articulation (lighting, 
geometry), and the difficulty of what to recognize for what purpose 
(semantics), all in a setting where computational complexity is not 
something to talk about in abstract terms, but matters every millisecond 
for inference and where it can take exaflops to train a model (computation).

I will talk about our ongoing work attacking all fronts of the detection 
problem. One is the speed-accuracy trade-off, which determines the 
settings where it is reasonably possible to use detection. Our work on 
single shot detection (SSD) is currently the leading approach [1,2]. 
Another direction is moving beyond detecting the presence and location 
of an object to detecting 3D pose. We are working both on learning 
deep-network models of how visual appearance changes with pose and 
object [3] and on integrating pose estimation as a first-class 
element in detection [4].

One place where pose is especially important is for object detection in 
the world around us, e.g. in robotics, as opposed to on isolated internet 
images without context. I call this setting "situated recognition". A 
key illustration that this setting is under-addressed is the lack of 
work in computer vision on the problem of active vision, where 
perception is integrated in a loop with sensor platform motion, a key 
challenge in robotics. I will present our work on a new approach to 
collecting datasets for training and evaluating situated recognition [5], 
allowing computer vision researchers to study active vision, for 
instance training networks using reinforcement learning on densely 
sampled real RGBD imagery without the difficulty of operating a 
robot in the training loop. This is a counterpoint to recent work using 
simulation and CG for such reinforcement learning, where our use of real 
images allows studying and evaluating real-world perception.

I will also briefly mention our lower-level work on computation for 
computer vision and deep learning algorithms and building tools for 
implementation on GPUs and FPGAs, as well as other ongoing projects.

Collaborators for major parts of this talk:
UNC students: Wei Liu, Cheng-Yang Fu, Phil Ammirato, Ric Poirson, 
Eunbyung Park
Outside academic collaborator: Prof. Jana Kosecka (George Mason University)
Adobe: Duygu Ceylan, Jimei Yang, Ersin Yumer
Google: Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed
Amazon: Ananth Ranga, Ambrish Tyagi

[1] SSD: Single Shot MultiBox Detector. Wei Liu, Dragomir Anguelov, 
Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander 
C. Berg. ECCV 2016. https://arxiv.org/pdf/1512.02325.pdf

[2] DSSD: Deconvolutional Single Shot Detector. Cheng-Yang Fu, Wei Liu, 
Ananth Ranga, Ambrish Tyagi, Alexander C. Berg. arXiv preprint 
arXiv:1701.06659. https://arxiv.org/pdf/1701.06659.pdf

[3] Transformation-Grounded Image Generation Network for Novel 3D View 
Synthesis. Eunbyung Park, Jimei Yang, Ersin Yumer, Duygu Ceylan, 
Alexander C. Berg. To appear, CVPR 2017.

[4] Fast Single Shot Detection and Pose Estimation. Patrick Poirson, 
Philip Ammirato, Cheng-Yang Fu, Wei Liu, Jana Kosecka, Alexander C. Berg. 
3DV 2016. https://arxiv.org/pdf/1609.05590

[5] A Dataset for Developing and Benchmarking Active Vision. Phil 
Ammirato, Patrick Poirson, Eunbyung Park, Jana Kosecka, Alexander C. 
Berg. To appear, ICRA 2017.

*Bio: *Alex Berg's research concerns computational visual recognition. 
His work addresses aspects of computer, human, and robot vision. He has 
worked on general object recognition in images, action recognition in 
video, human pose identification in images, image parsing, face 
recognition, image search, and large-scale machine learning. He 
co-organizes the ImageNet Large Scale Visual Recognition Challenge, and 
organized the first Large-Scale Learning for Vision workshop. He is 
currently an associate professor in computer science at UNC Chapel Hill. 
Prior to that, he was on the faculty at Stony Brook University, a 
research scientist at Columbia University, and a research scientist at 
Yahoo! Research. His PhD at U.C. Berkeley developed a novel approach to 
deformable template matching. He earned a BA and MA in Mathematics from 
Johns Hopkins University and learned to race sailboats at SSA in 
Annapolis. In 2013, his work received the Marr Prize.

https://scholar.google.com/citations?user=jjEht8wAAAAJ&pagesize=100 
http://acberg.com/
