Andy Zeng will present his Pre FPO, "Learning Visual Affordances for Robotic Manipulation," on Tuesday, February 26, 2019 at 10am in CS 401.

The members of his committee are as follows: Tom Funkhouser (adviser), Szymon Rusinkiewicz, Olga Russakovsky, Adam Finkelstein, Alberto Rodriguez (MIT).

All are welcome to attend.  Please see below for the talk title and abstract.

Title: Learning Visual Affordances for Robotic Manipulation

Abstract:
How do we use machine learning algorithms to enable robots to intelligently interact with the physical world? On the one hand, we can adapt the classic sense-plan-act paradigm by replacing individual modules (e.g., pose estimation) with deep networks -- but these algorithms remain specific to their training scenarios and suffer from error propagation. On the other hand, we can learn end-to-end models that map from raw sensory data (RGB images) to low-level control (motor torques) -- but these models are highly data-inefficient, and robot data is expensive. In this talk, I will propose a middle ground: learning the visual affordances of motion primitives. The idea is to use stable control algorithms to design motion primitives, then use deep learning to map from visual observations to the perceived affordances (e.g., confidence scores or action-values) of these primitives. By abstracting low-level control away from the learning, this formulation significantly improves data efficiency while exhibiting versatility and generalization.
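To make the formulation concrete, here is a minimal sketch of one way such an affordance model could look: a small fully convolutional network that maps an RGB-D observation to a dense map of per-pixel affordance scores for a single motion primitive, with the action executed at the highest-scoring pixel. The architecture, layer sizes, and 4-channel input are illustrative assumptions, not the exact networks used in this work.

    # Hypothetical sketch: dense affordance prediction for one motion primitive.
    import torch
    import torch.nn as nn

    class AffordanceNet(nn.Module):
        def __init__(self, in_channels=4):  # assumed RGB + depth input
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
                nn.Conv2d(32, 1, 1),  # one affordance score per pixel
            )

        def forward(self, obs):  # obs: (B, in_channels, H, W)
            # Returns a dense map of affordance scores in [0, 1], shape (B, 1, H, W).
            return torch.sigmoid(self.decoder(self.encoder(obs)))

    # Action selection: execute the primitive at the pixel with the highest score.
    # affordances = model(obs)
    # best_pixel = affordances.flatten(1).argmax(dim=1)

The key design choice this illustrates is that the network only scores where to apply a primitive; the low-level motion itself is handled by a separate, stable controller.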

First, I will show how this formulation can be used to learn manipulation policies for multiple grasping primitives that generalize to new objects, enabling a real-world picking system to win 1st place (stow task) at the Amazon Robotics Challenge. Next, I will discuss how we can adapt this framework to enable long-term planning for sequencing manipulation primitives (e.g., pushing and grasping) with self-supervised deep reinforcement learning. This results in a system that learns to execute complex manipulation strategies that generalize to new objects, while requiring orders of magnitude less data than prior work. Lastly, I will talk about how we can augment this formulation by inferring additional control parameters -- in particular, by learning to predict residuals on top of parameter estimates from physics-based control. This enables the creation of TossingBot, a robot arm capable of grasping arbitrary objects and throwing them to target locations outside its maximum reach range. TossingBot throws more accurately than the average Google engineer, learns within a few hours via self-supervision, generalizes to new objects and locations, and achieves picking speeds twice as fast as state-of-the-art systems at 500+ mean picks per hour. In all settings, our experiments demonstrate that learning affordances for motion primitives enables systems to quickly learn complex manipulation skills that scale to novel scenarios.
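As a rough illustration of the residual-on-physics idea, the sketch below combines an analytic ballistic estimate of the release speed needed to reach a target with a small learned correction predicted from visual features. The ballistic formula, network shape, and feature input are assumptions for illustration, not the actual TossingBot parameterization.

    # Hypothetical sketch: physics-based release-speed estimate plus a learned residual.
    import math
    import torch
    import torch.nn as nn

    G = 9.8  # gravitational acceleration (m/s^2)

    def ballistic_release_speed(distance, release_angle=math.pi / 4):
        # Ideal projectile motion on flat ground: distance = v^2 * sin(2*theta) / g,
        # so the required release speed is v = sqrt(distance * g / sin(2*theta)).
        return math.sqrt(distance * G / math.sin(2 * release_angle))

    class ResidualSpeedNet(nn.Module):
        def __init__(self, feat_dim=64):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(feat_dim, 64), nn.ReLU(),
                nn.Linear(64, 1),  # predicted correction (delta) on release speed
            )

        def forward(self, visual_features):  # visual_features: (B, feat_dim)
            return self.mlp(visual_features)

    # Final control parameter = physics estimate + learned residual:
    # v_release = ballistic_release_speed(target_distance) + residual_net(features)

The physics model supplies a reasonable initial estimate for any target location, while the learned residual absorbs object-specific effects (grasp offsets, aerodynamics) that the analytic model ignores.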