[talks] Colloquium Speaker Abhinav Gupta March 23

Thu Mar 17 09:20:41 EDT 2011

Beyond Naming: Image Understanding via Physical, Functional and Causal Relationships
Abhinav Gupta, Carnegie Mellon University
Wednesday, March 23, 4:30pm
Computer Science 105

What does it mean to "understand" an image? One popular answer is simply naming the objects seen in the image. During the last decade, most computer vision researchers have focused on this "object naming" problem. While there has been great progress in detecting things like "cars" and "people", such a level of understanding still cannot answer even basic questions about an image, such as "What is the geometric structure of the scene?", "Where in the image can I walk?" or "What is going to happen next?" In this talk, I will show that it is beneficial to go beyond mere object naming and harness relationships between objects for image understanding. These relationships can provide crucial high-level constraints that help construct a globally consistent model of the scene, and they allow for powerful ways of understanding and interpreting the underlying image. Specifically, I will present image and video understanding systems that incorporate: (1) physical relationships between objects via a qualitative 3D volumetric representation; (2) functional relationships between objects and actions via data-driven physical interactions; and (3) causal relationships between actions via a storyline representation. I will demonstrate the importance of these relationships on a diverse set of real-world images and videos.