<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">PIXL Lunch Talk<br>
</p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">Speaker: <b>Alex Berg</b><br>
Date: Monday, March 06, 2017<br>
Time: 12:30-1:30PM (lunch is served)<br>
Location: CS 402<br>
</p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);"><b>Title</b><b>: </b>Rethinking object detection in
computer vision<br>
</p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);"><strong>Abstract: </strong>Object detection is one of the
core problems in computer vision and is a lens through which to
view the field. It brings together machine learning
(classification, regression) with the variation in appearance of
objects and scenes with pose and articulation (lighting, geometry)
, and the difficulty of what to recognize for what purpose
(semantics) all in a setting where computational complexity is not
something to talk about in abstract terms, but matters every
millisecond for inference and where it can take exaflops to train
a model (computation).</p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">I will talk about our ongoing work attacking all fronts of
the detection problem. One is the speed-accuracy trade-off, which
determines the settings where it is reasonably possible to use
detection. Our work on single shot detection (SSD) is currently
the leading approach [1,2]. Another direction is moving beyond
detecting the presence and location of an object to detecting 3D
pose. We are working on both learning deep-network models of how
visual appearance changes with pose and object [3], as well as
integrating pose estimation as a first class element in detection
[4].</p>
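<p>To make the "single shot" idea concrete, the sketch below shows
the core of an SSD-style detection head in PyTorch. It is an
illustrative simplification, not the implementation from [1]: the
channel sizes, anchor counts, and class count are made-up
placeholders. The point is that every cell of every feature map
directly predicts class scores and box offsets for a set of default
boxes in one forward pass, with no separate proposal stage.</p>
<pre>
# Minimal sketch of an SSD-style detection head (illustration only;
# see [1] for the real architecture). Each backbone feature map gets
# a small conv layer that predicts, for every spatial cell and every
# default box anchored there, class scores plus 4 box offsets.
import torch
import torch.nn as nn

class SSDHead(nn.Module):
    def __init__(self, in_channels, num_anchors, num_classes):
        super().__init__()
        # One 3x3 conv per feature-map scale.
        self.heads = nn.ModuleList(
            nn.Conv2d(c, num_anchors * (num_classes + 4), 3, padding=1)
            for c in in_channels
        )
        self.num_classes = num_classes

    def forward(self, feature_maps):
        preds = []
        for fmap, head in zip(feature_maps, self.heads):
            out = head(fmap)  # (N, A*(C+4), H, W)
            n = out.shape[0]
            # One row per default box: (N, H*W*A, C+4).
            out = out.permute(0, 2, 3, 1).reshape(n, -1, self.num_classes + 4)
            preds.append(out)
        return torch.cat(preds, dim=1)

# Two fake multi-scale feature maps standing in for a backbone.
feats = [torch.randn(1, 512, 38, 38), torch.randn(1, 256, 19, 19)]
head = SSDHead([512, 256], num_anchors=4, num_classes=21)
print(head(feats).shape)  # torch.Size([1, 7220, 25])
</pre>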
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">One place where pose is especially important is for object
detection in the world around us, e.g in robotics, as opposed to
on isolated internet images without context. I call this setting
"situated recognition". A key illustration that this setting is
under addressed is the lack of work in computer vision on the
problem of active vision, where perception is integrated in a loop
with sensor platform motion, a key challenge in robotics. I will
present our work on a new approach to collecting datasets for
training and evaluating situated recognition, allowing computer
vision researchers to study active vision, for instance training
networks using reinforcement learning on a densely sampled data of
real RGBD imagery without the difficulty of operating a robot in
the training loop. This is a counterpoint to recent work using
simulation and CG for such reinforcement learning, where our use
of real images allows studying and evaluating real-world
perception.</p>
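<p>As a rough illustration of how a densely captured scene can stand
in for a simulator, consider the toy sketch below. Everything in it
is hypothetical (the names, graph layout, and observation format are
invented; see [5] for the actual dataset design): because images
were captured from a dense grid of camera poses, an agent "moves" by
jumping between pre-captured frames, so no robot is needed in the
training loop.</p>
<pre>
# Hypothetical sketch of a "dataset as environment" (invented names;
# see [5] for the real dataset). Each node is a real captured RGB-D
# frame; each action edge leads to the frame captured at the
# adjacent camera pose, so an agent can explore without a robot.

class ActiveVisionEnv:
    def __init__(self, frames, moves, start_pose):
        self.frames = frames   # pose_id -> captured observation
        self.moves = moves     # (pose_id, action) -> neighbor pose_id
        self.pose = start_pose

    def step(self, action):
        # "Moving" just retrieves the real image taken at the next
        # pose; unknown moves leave the agent where it is.
        self.pose = self.moves.get((self.pose, action), self.pose)
        return self.frames[self.pose]

# Toy two-pose "scene": stepping forward returns the other capture.
frames = {0: {"rgb": "img_000.png", "depth": "depth_000.png"},
          1: {"rgb": "img_001.png", "depth": "depth_001.png"}}
moves = {(0, "forward"): 1, (1, "backward"): 0}
env = ActiveVisionEnv(frames, moves, start_pose=0)
print(env.step("forward"))  # returns the frame captured at pose 1
</pre>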
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">I will also briefly mention our lower-level work on
computation for computer vision and deep learning algorithms and
building tools for implementation on GPUS and fPGAs, as well as
other ongoing projects.</p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">Collaborators for major parts of this talk UNC Students-
Wei Liu, Cheng-Yang Fu, Phil Ammirato, Ric Poirson, Eunbyung Park
Outside academic collaborator- Prof. Jana Kosecka (George Mason
University) Adobe: Duygu Ceylan, Jimei Yang, Ersin Yumer; Google:
Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed
Amazon: Ananth Ranga, Ambrish Tyagi</p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">[1] SSD: Single Shot MultiBox Detector Wei Liu, Dragomir
Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang
Fu, Alexander C. Berg ECCV 2016
<a class="moz-txt-link-freetext" href="https://arxiv.org/pdf/1512.02325.pdf">https://arxiv.org/pdf/1512.02325.pdf</a></p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">[2] DSSD : Deconvolutional Single Shot Detector Cheng-Yang
Fu, Wei Liu, Ananth Ranga, Ambrish Tyagi, and Alexander C. Berg
arXiv preprint arXiv:1701.06659
<a class="moz-txt-link-freetext" href="https://arxiv.org/pdf/1701.06659.pdf">https://arxiv.org/pdf/1701.06659.pdf</a></p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">[3] Transformation-Grounded Image Generation Network for
Novel 3D View Synthesis Eunbyung Park, Jimei Yang, Ersin Yumer,
Duygu Ceylan, Alexander C. Berg To appear CVPR 2017</p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">[4] Fast Single Shot Detection and Pose Estimation Patrick
Poirson, Philip Ammirato, Cheng-Yang Fu, Wei Liu, Jana Kosecka,
Alexander C. Berg 3DV 2016 <a class="moz-txt-link-freetext" href="https://arxiv.org/pdf/1609.05590">https://arxiv.org/pdf/1609.05590</a></p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);">[5] A Dataset for Developing and Benchmarking Active Vision
Phil Ammirato,Patrick Poirson, Eunbyung Park, Jana Kosecka, and
Alexander C. Berg to appear ICRA 2017</p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);"><strong>Bio: </strong>Alex Berg's research concerns
computational visual recognition. His work addresses aspects of
computer, human, and robot vision. He has worked on general object
recognition in images, action recognition in video, human pose
identification in images, image parsing, face recognition, image
search, and large-scale machine learning. He co-organizes the
ImageNet Large Scale Visual Recognition Challenge, and organized
the first Large-Scale Learning for Vision workshop. He is
currently an associate professor in computer science at UNC Chapel
Hill. Prior to that he was on the faculty at Stony Brook
University, a research scientist at Columbia University, and
research scientist at Yahoo! Research. His PhD at U.C. Berkeley
developed a novel approach to deformable template matching. He
earned a BA and MA in Mathematics from Johns Hopkins University
and learned to race sailboats at SSA in Annapolis. In 2013, his
work received the Marr prize.</p>
<p style="color: rgb(0, 0, 0); font-family: arial, helvetica,
sans-serif; font-size: medium; font-style: normal;
font-variant-ligatures: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans: 2;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: 2; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255, 255,
255);"><a class="moz-txt-link-freetext" href="https://scholar.google.com/citations?user=jjEht8wAAAAJ&pagesize=100">https://scholar.google.com/citations?user=jjEht8wAAAAJ&pagesize=100</a>
<a class="moz-txt-link-freetext" href="http://acberg.com/">http://acberg.com/</a></p>
</body>
</html>