New robots can see into their future
By Brett Israel, Media relations | DECEMBER 4, 2017
UC Berkeley researchers have developed a robotic learning
technology that enables robots to imagine the future of their actions so they
can figure out how to manipulate objects they have never encountered before. In
the future, this technology could help self-driving cars anticipate future
events on the road and produce more intelligent robotic assistants in homes,
but the initial prototype focuses on learning simple manual skills entirely
from autonomous play.
Using this technology, called visual foresight, the
robots can predict what their cameras will see if they perform a particular
sequence of movements. These robotic imaginations are still relatively simple
for now – predictions made only several seconds into the future – but they are
enough for the robot to figure out how to move objects around on a table
without disturbing obstacles. Crucially, the robot can learn to perform these
tasks without any help from humans or prior knowledge about physics, its
environment or what the objects are. That’s because the visual imagination is
learned entirely from scratch from unattended and unsupervised exploration,
where the robot plays with objects on a table. After this play phase, the robot
builds a predictive model of the world, and can use this model to manipulate
new objects that it has not seen before.
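As a rough sketch of what such a learned model offers, the interface boils down to: given the current camera frame and a candidate sequence of motor commands, return the frames the camera is expected to see. The Python below is a hypothetical illustration; the function names, array shapes, and the toy stand-in model are assumptions, not the team's code.

```python
import numpy as np

def predict_frames(model, current_frame, actions):
    """Roll a learned video-prediction model forward (illustrative only).

    current_frame: (H, W, 3) camera image observed now.
    actions:       (T, action_dim) candidate sequence of motor commands.
    Returns a (T, H, W, 3) array of imagined future frames.
    In the real system, `model` would be a learned, action-conditioned
    video-prediction network; here it is a placeholder callable.
    """
    frames = []
    frame = current_frame
    for action in actions:
        frame = model(frame, action)   # imagine one step ahead
        frames.append(frame)
    return np.stack(frames)

# Toy stand-in "model": darkens the image slightly each step.
toy_model = lambda frame, action: np.clip(frame * 0.99, 0.0, 1.0)

imagined = predict_frames(toy_model, np.random.rand(64, 64, 3), np.zeros((10, 4)))
print(imagined.shape)  # (10, 64, 64, 3)
```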
“In the same way that we can imagine how our actions will
move the objects in our environment, this method can enable a robot to
visualize how different behaviors will affect the world around it,” said Sergey
Levine, assistant professor in Berkeley’s Department of Electrical Engineering
and Computer Sciences, whose lab developed the technology. “This can enable
intelligent planning of highly flexible skills in complex real-world
situations.”
The research team will perform a demonstration of the
visual foresight technology at the Neural Information Processing Systems
conference in Long Beach, California, on December 5.
At the core of this system is a deep learning technology
based on convolutional recurrent video prediction, or dynamic neural advection
(DNA). DNA-based models predict how pixels in an image will move from one frame
to the next based on the robot’s actions. Recent improvements to this class of
models, as well as greatly improved planning capabilities, have enabled robotic
control based on video prediction to perform increasingly complex tasks, such
as sliding toys around obstacles and repositioning multiple objects.
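Conceptually, a DNA-style model outputs, for every pixel, a small normalized set of weights over that pixel's neighborhood in the previous frame, and the next frame is synthesized by averaging the previous frame under those weights. The snippet below is a simplified, single-channel illustration of that advection step in numpy; in the actual models the kernels are predicted by a recurrent network conditioned on the robot's actions, and the function name and shapes here are assumptions.

```python
import numpy as np

def advect(prev_frame, kernels):
    """Synthesize the next frame by moving pixels of the previous one.

    prev_frame: (H, W) grayscale image at time t.
    kernels:    (H, W, k, k) per-pixel weights over a k x k neighborhood,
                assumed non-negative and summing to 1 at each pixel
                (in a learned model these would come from a softmax).
    Returns the predicted (H, W) image at time t+1.
    """
    H, W, k, _ = kernels.shape
    pad = k // 2
    padded = np.pad(prev_frame, pad, mode="edge")
    out = np.zeros((H, W))
    for dy in range(k):
        for dx in range(k):
            # Each kernel tap copies a shifted version of the previous frame.
            out += kernels[:, :, dy, dx] * padded[dy:dy + H, dx:dx + W]
    return out

# Example: uniform 5x5 kernels simply blur the frame.
H, W, k = 48, 64, 5
kernels = np.full((H, W, k, k), 1.0 / (k * k))
next_frame = advect(np.random.rand(H, W), kernels)
print(next_frame.shape)  # (48, 64)
```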
“In the past, robots have learned skills with a human
supervisor helping and providing feedback. What makes this work exciting is
that the robots can learn a range of visual object manipulation skills entirely
on their own,” said Chelsea Finn, a doctoral student in Levine’s lab and
inventor of the original DNA model.
With the new technology, a robot pushes objects on a
table, then uses the learned prediction model to choose motions that will move
an object to a desired location. Using this model, learned from raw camera
observations alone, the robots teach themselves to avoid obstacles and push
objects around obstructions.
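In spirit, this kind of planning amounts to imagining many candidate action sequences and keeping the one whose predicted outcome lands closest to the goal. The toy random-shooting sketch below makes that loop concrete; a simple 2-D point stands in for the tracked object, whereas the real system rolls out the learned video model and scores plans by where a designated pixel ends up. All names here are illustrative.

```python
import numpy as np

def plan(simulate, start, goal, horizon=10, n_candidates=200, rng=None):
    """Pick the action sequence whose imagined outcome lands nearest the goal.

    simulate(start, actions) -> predicted final state. In the real system this
    would roll out the learned video-prediction model and track where a
    user-designated pixel (the object) ends up.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    best_actions, best_cost = None, np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, 2))  # candidate pushes
        final = simulate(start, actions)
        cost = np.linalg.norm(final - goal)                   # distance to goal
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost

# Toy stand-in dynamics: each action nudges the object by a small step.
toy_simulate = lambda start, actions: start + 0.1 * actions.sum(axis=0)

actions, cost = plan(toy_simulate, start=np.zeros(2), goal=np.array([0.5, -0.3]))
print(round(cost, 3))
```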
“Humans learn object manipulation skills without any
teacher through millions of interactions with a variety of objects during their
lifetime. We have shown that it is possible to build a robotic system that also
leverages large amounts of autonomously collected data to learn widely
applicable manipulation skills, specifically object pushing skills,” said
Frederik Ebert, a graduate student in Levine’s lab who worked on the project.
Since control through video prediction relies only on
observations that can be collected autonomously by the robot, such as through
camera images, the resulting method is general and broadly applicable. In contrast
to conventional computer vision methods, which require humans to manually label
thousands or even millions of images, building video prediction models only
requires unannotated video, which can be collected by the robot entirely
autonomously. Indeed, video prediction models have also been applied to
datasets that represent everything from human activities to driving, with
compelling results.
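Because the training signal is simply “predict the frame that comes next,” fitting such a model needs nothing beyond raw video and a reconstruction loss. The minimal PyTorch loop below illustrates the idea, with random tensors standing in for the robot's video; the real models are recurrent, action-conditioned and far larger, so this is a sketch of the training signal, not of the actual architecture.

```python
import torch
import torch.nn as nn

# Tiny convolutional predictor: current frame in, next frame out.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, kernel_size=3, padding=1), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    # Unannotated video: consecutive frame pairs, no human labels needed.
    frame_t = torch.rand(8, 3, 64, 64)    # batch of current frames
    frame_t1 = torch.rand(8, 3, 64, 64)   # the frames that actually followed
    loss = nn.functional.mse_loss(model(frame_t), frame_t1)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```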
“Children can learn about their world by playing with
toys, moving them around, grasping, and so forth. Our aim with this research is
to enable a robot to do the same: to learn about how the world works through
autonomous interaction,” Levine said. “The capabilities of this robot are still
limited, but its skills are learned entirely automatically, and allow it to
predict complex physical interactions with objects that it has never seen
before by building on previously observed patterns of interaction.”
The Berkeley scientists are continuing to research
control through video prediction, focusing on further improving video
prediction and prediction-based control, as well as developing more
sophisticated methods by which robots can collect more focused video data,
for complex tasks such as picking and placing objects and manipulating soft and
deformable objects such as cloth or rope, and assembly.