<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">PIXL Lunch Talk<br>
</p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">Speaker: <b>Alex Berg</b><br>
Date: Monday, March 06, 2017<br>
Time: 12:30-1:30PM (lunch is served)<br>
Location: CS 402<br>
</p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);"><b>Title</b><b>: </b>Rethinking object detection in
computer vision<br>
</p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);"><strong>Abstract: </strong>Object detection is one of the
core problems in computer vision and is a lens through which to
view the field. It brings together machine learning
(classification, regression) with the variation in appearance of
objects and scenes with pose and articulation (lighting, geometry)
, and the difficulty of what to recognize for what purpose
(semantics) all in a setting where computational complexity is not
something to talk about in abstract terms, but matters every
millisecond for inference and where it can take exaflops to train
a model (computation).</p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">I will talk about our ongoing work attacking all fronts of
the detection problem. One is the speed-accuracy trade-off, which
determines the settings where it is reasonably possible to use
detection. Our work on single shot detection (SSD) is currently
the leading approach [1,2]. Another direction is moving beyond
detecting the presence and location of an object to detecting 3D
pose. We are working on both learning deep-network models of how
visual appearance changes with pose and object [3], as well as
integrating pose estimation as a first class element in detection
[4].</p>
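<p>To make the "single shot" idea concrete, the sketch below shows
the core of an SSD-style detection head in PyTorch. It is an
illustrative simplification, not the implementation from [1]: the
channel sizes, anchor counts, and class count are made-up
placeholders. The point is that every cell of every feature map
directly predicts class scores and box offsets for a set of default
boxes in one forward pass, with no separate proposal stage.</p>
<pre>
# Minimal sketch of an SSD-style detection head (illustration only;
# see [1] for the real architecture). Each backbone feature map gets
# a small conv layer that predicts, for every spatial cell and every
# default box anchored there, class scores plus 4 box offsets.
import torch
import torch.nn as nn

class SSDHead(nn.Module):
    def __init__(self, in_channels, num_anchors, num_classes):
        super().__init__()
        # One 3x3 conv per feature-map scale.
        self.heads = nn.ModuleList(
            nn.Conv2d(c, num_anchors * (num_classes + 4), 3, padding=1)
            for c in in_channels
        )
        self.num_classes = num_classes

    def forward(self, feature_maps):
        preds = []
        for fmap, head in zip(feature_maps, self.heads):
            out = head(fmap)  # (N, A*(C+4), H, W)
            n = out.shape[0]
            # One row per default box: (N, H*W*A, C+4).
            out = out.permute(0, 2, 3, 1).reshape(n, -1, self.num_classes + 4)
            preds.append(out)
        return torch.cat(preds, dim=1)

# Two fake multi-scale feature maps standing in for a backbone.
feats = [torch.randn(1, 512, 38, 38), torch.randn(1, 256, 19, 19)]
head = SSDHead([512, 256], num_anchors=4, num_classes=21)
print(head(feats).shape)  # torch.Size([1, 7220, 25])
</pre>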
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">One place where pose is especially important is for object
detection in the world around us, e.g in robotics, as opposed to
on isolated internet images without context. I call this setting
"situated recognition". A key illustration that this setting is
under addressed is the lack of work in computer vision on the
problem of active vision, where perception is integrated in a loop
with sensor platform motion, a key challenge in robotics. I will
present our work on a new approach to collecting datasets for
training and evaluating situated recognition, allowing computer
vision researchers to study active vision, for instance training
networks using reinforcement learning on a densely sampled data of
real RGBD imagery without the difficulty of operating a robot in
the training loop. This is a counterpoint to recent work using
simulation and CG for such reinforcement learning, where our use
of real images allows studying and evaluating real-world
perception.</p>
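<p>As a rough illustration of how a densely captured scene can stand
in for a simulator, consider the toy sketch below. Everything in it
is hypothetical (the names, graph layout, and observation format are
invented; see [5] for the actual dataset design): because images
were captured from a dense grid of camera poses, an agent "moves" by
jumping between pre-captured frames, so no robot is needed in the
training loop.</p>
<pre>
# Hypothetical sketch of a "dataset as environment" (invented names;
# see [5] for the real dataset). Each node is a real captured RGB-D
# frame; each action edge leads to the frame captured at the
# adjacent camera pose, so an agent can explore without a robot.

class ActiveVisionEnv:
    def __init__(self, frames, moves, start_pose):
        self.frames = frames   # pose_id -> captured observation
        self.moves = moves     # (pose_id, action) -> neighbor pose_id
        self.pose = start_pose

    def step(self, action):
        # "Moving" just retrieves the real image taken at the next
        # pose; unknown moves leave the agent where it is.
        self.pose = self.moves.get((self.pose, action), self.pose)
        return self.frames[self.pose]

# Toy two-pose "scene": stepping forward returns the other capture.
frames = {0: {"rgb": "img_000.png", "depth": "depth_000.png"},
          1: {"rgb": "img_001.png", "depth": "depth_001.png"}}
moves = {(0, "forward"): 1, (1, "backward"): 0}
env = ActiveVisionEnv(frames, moves, start_pose=0)
print(env.step("forward"))  # returns the frame captured at pose 1
</pre>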
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">I will also briefly mention our lower-level work on
computation for computer vision and deep learning algorithms and
building tools for implementation on GPUS and fPGAs, as well as
other ongoing projects.</p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">Collaborators for major parts of this talk UNC Students-
Wei Liu, Cheng-Yang Fu, Phil Ammirato, Ric Poirson, Eunbyung Park
Outside academic collaborator- Prof. Jana Kosecka (George Mason
University) Adobe: Duygu Ceylan, Jimei Yang, Ersin Yumer; Google:
Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed
Amazon: Ananth Ranga, Ambrish Tyagi</p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">[1] SSD: Single Shot MultiBox Detector Wei Liu, Dragomir
Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang
Fu, Alexander C. Berg ECCV 2016
<a class="moz-txt-link-freetext" href="https://arxiv.org/pdf/1512.02325.pdf">https://arxiv.org/pdf/1512.02325.pdf</a></p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">[2] DSSD : Deconvolutional Single Shot Detector Cheng-Yang
Fu, Wei Liu, Ananth Ranga, Ambrish Tyagi, and Alexander C. Berg
arXiv preprint arXiv:1701.06659
<a class="moz-txt-link-freetext" href="https://arxiv.org/pdf/1701.06659.pdf">https://arxiv.org/pdf/1701.06659.pdf</a></p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">[3] Transformation-Grounded Image Generation Network for
Novel 3D View Synthesis Eunbyung Park, Jimei Yang, Ersin Yumer,
Duygu Ceylan, Alexander C. Berg To appear CVPR 2017</p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">[4] Fast Single Shot Detection and Pose Estimation Patrick
Poirson, Philip Ammirato, Cheng-Yang Fu, Wei Liu, Jana Kosecka,
Alexander C. Berg 3DV 2016 <a class="moz-txt-link-freetext" href="https://arxiv.org/pdf/1609.05590">https://arxiv.org/pdf/1609.05590</a></p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">[5] A Dataset for Developing and Benchmarking Active Vision
Phil Ammirato,Patrick Poirson, Eunbyung Park, Jana Kosecka, and
Alexander C. Berg to appear ICRA 2017</p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);"><strong>Bio: </strong>Alex Berg's research concerns
computational visual recognition. His work addresses aspects of
computer, human, and robot vision. He has worked on general object
recognition in images, action recognition in video, human pose
identification in images, image parsing, face recognition, image
search, and large-scale machine learning. He co-organizes the
ImageNet Large Scale Visual Recognition Challenge, and organized
the first Large-Scale Learning for Vision workshop. He is
currently an associate professor in computer science at UNC Chapel
Hill. Prior to that he was on the faculty at Stony Brook
University, a research scientist at Columbia University, and
research scientist at Yahoo! Research. His PhD at U.C. Berkeley
developed a novel approach to deformable template matching. He
earned a BA and MA in Mathematics from Johns Hopkins University
and learned to race sailboats at SSA in Annapolis. In 2013, his
work received the Marr prize.</p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);"><a class="moz-txt-link-freetext" href="https://scholar.google.com/citations?user=jjEht8wAAAAJ&pagesize=100">https://scholar.google.com/citations?user=jjEht8wAAAAJ&pagesize=100</a>
<a class="moz-txt-link-freetext" href="http://acberg.com/">http://acberg.com/</a></p>
</body>
</html>