Andy Zeng will present his FPO "Learning Visual Affordances for Robotic Manipulation" on Thursday, 10/24/2019 at 1:30pm in CS 402
Andy Zeng will present his FPO "Learning Visual Affordances for Robotic Manipulation" on Thursday, 10/24/2019 at 1:30pm in CS 402.

The members of his committee are as follows: Thomas Funkhouser (Adviser); Examiners: Olga Russakovsky, Adam Finkelstein, and Thomas Funkhouser; Readers: Szymon Rusinkiewicz and Alberto Rodriguez (MIT).

A copy of his thesis is available upon request; please email ngotsis@cs.princeton if you would like a copy. Everyone is invited to attend his talk. The talk title and abstract follow below.

Title: Learning Visual Affordances for Robotic Manipulation

Abstract: A human's remarkable ability to manipulate unfamiliar objects with little prior knowledge of them is a constant inspiration for robotics research. Yet despite its practical value and the interest of the research community, robust manipulation of novel objects in cluttered environments remains a largely unsolved problem. Classic solutions (e.g. those involving 6D object pose estimation) typically require prior knowledge of the objects (e.g. class categories or 3D CAD models), which may not be available outside of highly constrained settings. More recent deep learning methods using end-to-end convolutional networks (e.g. raw pixels to motor torques) have the potential to model complex skills that generalize, but they remain highly data-inefficient, and robot data (e.g. trial and error) is expensive.

In this thesis, we consider an approach to learning manipulation called visual affordances. The idea is to use classic controllers to design motion primitives, then use convolutional networks to map from visual observations (e.g. images) to the perceived affordances (e.g. confidence scores or action-values) of the primitives for every pixel of the input. By leveraging dense equivariant state and action representations, this formulation can be used to acquire complex vision-based manipulation skills (e.g. pushing, grasping, throwing) on real robot platforms that generalize to novel objects, while using orders of magnitude less data.

While visual affordances may not be directly compatible with classic planning frameworks that involve explicit forward simulation or propagation, in this thesis we show that it is possible to work around this limitation by extending the formulation with model-free reinforcement learning to sequence primitive picking motions into more complex manipulation policies. We also study how it can be combined with residual physics (learning to predict residual values on top of control-parameter estimates from an initial analytical controller) to enable learning end-to-end visuomotor policies that leverage the benefits of analytical models while still maintaining the capacity (via data-driven residuals) to account for real-world dynamics that are not explicitly modeled. Finally, we conclude by discussing the limitations of learning visual affordances, which suggest directions for future work.
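For readers who find a concrete picture helpful, below is a minimal sketch (not taken from the thesis) of the visual-affordance formulation described in the abstract, assuming a PyTorch-style setup: a fully convolutional network maps an image to a dense map of per-pixel affordance scores for a motion primitive, the primitive is executed at the highest-scoring pixel, and a small residual-physics helper adds a learned correction on top of an analytical controller's parameter estimate. All names here (AffordanceNet, select_action, residual_physics) are illustrative, not the author's API.

# Illustrative sketch of per-pixel visual affordances (hypothetical names).
import torch
import torch.nn as nn

class AffordanceNet(nn.Module):
    """Fully convolutional network: RGB image -> dense per-pixel affordance scores."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=1),  # one affordance score per pixel
        )

    def forward(self, rgb):              # rgb: (B, 3, H, W)
        return self.net(rgb).squeeze(1)  # scores: (B, H, W)

def select_action(scores):
    """Execute the motion primitive at the pixel with the highest predicted affordance."""
    b, h, w = scores.shape
    flat_idx = scores.view(b, -1).argmax(dim=1)
    rows = torch.div(flat_idx, w, rounding_mode="floor")
    cols = flat_idx % w
    return torch.stack((rows, cols), dim=1)  # (row, col) per image

def residual_physics(analytical_param, learned_residual):
    """Residual physics as described in the abstract: final control parameter =
    analytical estimate + data-driven residual predicted by a network."""
    return analytical_param + learned_residual

if __name__ == "__main__":
    model = AffordanceNet()
    image = torch.rand(1, 3, 224, 224)       # placeholder visual observation
    with torch.no_grad():
        pixel = select_action(model(image))  # pixel at which to execute the primitive
    print("execute primitive at pixel:", pixel.tolist())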