Reinforcing the value of simulation: Teaching dexterity to a real robot hand
With Project DeXtreme, Nvidia researchers show how training in simulation enables complex manipulation skills to transfer to a real robot hand
The human hand is one of the most remarkable outcomes of millions of years of evolution. The ability to pick up all sorts of objects and use them as tools is a crucial differentiator allowing us to shape the world around us.
For robots to work in the everyday human world, the ability to deftly interact with our tools and the environment around them is critical. Without that capability, they will continue to be useful only in specialized domains such as factories or warehouses.
While it has been possible to teach robots with legs how to walk for some time, robots with hands have generally proven to be much trickier to control. A hand with fingers has more joints, and they must move in specific coordinated ways to accomplish a given task.
Traditional robotics control methods with precisely pre-programmed grasps and motions are incapable of the kind of generalized fine motor control skills that humans take for granted.
One approach to these problems has been the application of Deep Reinforcement Learning (RL) techniques that train a neural network to control the robot’s joints. With deep RL, a robot learns from trial and error and is rewarded for the successful completion of the assigned task.
Unfortunately, this technique can require millions or even billions of samples to learn from, making it almost impossible to apply directly to real robots.
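To make the trial-and-error idea concrete, here is a minimal REINFORCE-style sketch of reward-driven policy learning. Everything in it, the gym-style environment interface, the observation and action sizes, and the exploration noise, is an illustrative placeholder rather than DeXtreme's actual training setup, which runs at far larger scale in Isaac Gym:

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 24, 16   # illustrative sizes only, not the real robot's

# Policy network: maps the hand's observations to a mean action.
policy = nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ReLU(), nn.Linear(256, ACT_DIM))
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

def rollout(env, steps=64):
    """One trial: act with exploration noise, record rewards and log-probs."""
    obs = env.reset()
    log_probs, rewards = [], []
    for _ in range(steps):
        mean = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Normal(mean, 0.1)  # Gaussian exploration
        action = dist.sample()
        obs, reward, done, _ = env.step(action.numpy())
        log_probs.append(dist.log_prob(action).sum())
        rewards.append(reward)
        if done:
            break
    return log_probs, rewards

def update(log_probs, rewards):
    """REINFORCE: make the actions taken in high-reward trials more likely."""
    episode_return = sum(rewards)
    loss = -torch.stack(log_probs).sum() * episode_return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Even with more sophisticated algorithms, the learning loop has this same shape, which is why so many samples are needed: each episode provides only a weak signal about which actions were actually responsible for the reward.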
Applying Simulation
Enter Nvidia’s Isaac robotics simulator, which enables robots to be trained inside a simulated universe that can run more than 10,000 times faster than the real world, and yet obeys the laws of physics.
Using Isaac Gym, a robotics simulator built for RL training, Nvidia researchers on the DeXtreme project taught a robot hand how to manipulate a cube to match a provided target position and orientation, or pose. The neural network brain learned to do this entirely in simulation before being transplanted to control a robot in the real world.
Similar work has been shown only once before, by researchers at OpenAI, and their setup required a far more sophisticated and expensive robot hand, a cube instrumented with precise motion-tracking sensors, and, last but not least, a supercomputing cluster of hundreds of computers for training.
Democratizing Dexterity
The hardware used by the DeXtreme project was chosen to be as simple and inexpensive as possible so that researchers worldwide can replicate our experiments. The robot itself is an Allegro Hand, which costs roughly one-tenth as much as some alternatives, has four fingers instead of five, and has no moving wrist.
We use off-the-shelf RGB cameras to track the cube visually; they can be repositioned easily as needed without requiring special hardware. The cube itself is 3D-printed, with stickers affixed to each face.
DeXtreme is trained using Isaac Gym, which provides an end-to-end GPU-accelerated simulation environment for reinforcement learning. Nvidia PhysX simulates the world on the GPU, and results stay in GPU memory during the training of the deep learning control policy network.
As a result, training can happen on a single Omniverse OVX server. Training a good policy takes about 32 hours on this system, equivalent to 42 years of a single robot’s experience in the real world.
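As a quick sanity check on those figures (our arithmetic, not a number reported by the project):

\[
42\ \text{years} \times 8760\ \tfrac{\text{hours}}{\text{year}} \approx 368{,}000\ \text{hours}, \qquad \frac{368{,}000\ \text{hours}}{32\ \text{hours}} \approx 11{,}500,
\]

an aggregate speedup consistent with the more than 10,000 times faster simulation mentioned above.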
Not needing a separate CPU cluster for simulation means a 10-200x reduction in computing costs for training at current cloud rental rates, dramatically reducing both training time and cost.
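The pattern that enables this is stepping thousands of environments in parallel as batched tensor operations, with observations, rewards, and rollouts never leaving GPU memory. Below is a schematic sketch of that pattern under simplified assumptions of our own; ToyVecEnv, its dynamics, and the sizes are illustrative stand-ins, not Isaac Gym's actual API:

```python
import torch
import torch.nn as nn

NUM_ENVS, OBS_DIM, ACT_DIM = 16384, 24, 16   # illustrative sizes
device = "cuda" if torch.cuda.is_available() else "cpu"

class ToyVecEnv:
    """Toy stand-in for a GPU-resident simulator: the state of every environment
    lives in one tensor, so stepping all of them is a single batched operation."""
    def __init__(self, n):
        self.state = torch.zeros(n, OBS_DIM, device=device)

    def reset(self):
        self.state.normal_()
        return self.state

    def step(self, actions):
        # Placeholder dynamics and reward; a real GPU physics engine goes here.
        self.state = 0.9 * self.state + 0.1 * actions.mean(dim=1, keepdim=True)
        rewards = -self.state.abs().mean(dim=1)
        return self.state, rewards

policy = nn.Sequential(nn.Linear(OBS_DIM, 256), nn.ReLU(),
                       nn.Linear(256, ACT_DIM)).to(device)

env = ToyVecEnv(NUM_ENVS)
obs = env.reset()
for _ in range(100):
    with torch.no_grad():
        actions = policy(obs)            # one batched forward pass drives every env
    obs, rewards = env.step(actions)     # no CPU round trip: data stays on the GPU
```

Because each step is one batched tensor operation, throughput scales with GPU parallelism rather than with the number of CPU cores available for simulation.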
Perception and Synthetic Data
In order for the robot to know the current position and orientation of the cube it’s holding, a perception system is needed. To keep costs low and leave open the potential for manipulation of other objects in the future, DeXtreme uses three off-the-shelf cameras and another neural network that can interpret the cube pose.
This network is trained on about 5 million frames of synthetic data generated using Omniverse Replicator, with no real images whatsoever. Despite never seeing a real photograph, it learns to estimate the cube's pose under challenging real-world conditions.
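As a rough illustration of what a pose-interpreting network can look like, here is a minimal sketch of our own devising; the actual DeXtreme perception model's architecture and training losses differ. The network maps an image to a 3D position plus an orientation expressed as a unit quaternion:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CubePoseNet(nn.Module):
    """Minimal pose regressor: image in, 3D position and unit quaternion out."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(128, 7)          # 3 for position + 4 for quaternion

    def forward(self, images):                 # images: (batch, 3, H, W)
        out = self.head(self.backbone(images))
        pos, quat = out[:, :3], out[:, 3:]
        quat = F.normalize(quat, dim=1)        # project onto unit quaternions
        return pos, quat
```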
To make the training more robust, we use a technique called domain randomization to vary lighting and camera positions, plus data augmentation to apply random crops, rotations, and backgrounds.
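A sketch of what such an augmentation pipeline might look like with torchvision; the specific parameter values are our illustrative choices, not DeXtreme's exact recipe:

```python
import torchvision.transforms as T

# Illustrative augmentations applied to rendered training frames.
augment = T.Compose([
    T.RandomResizedCrop(224, scale=(0.7, 1.0)),                   # random crops
    T.RandomRotation(degrees=15),                                 # random rotations
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),  # lighting variation
    T.ToTensor(),
])
# Random backgrounds are typically handled separately, by compositing the rendered
# cube over a random image using the segmentation mask the synthetic renderer provides.
```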
The DeXtreme pose estimation system is very reliable and can estimate accurate poses even when the cube is partly occluded from view or the image has significant motion blur.
Real Robots Are Still Challenging
One of the key reasons to use simulation is that training robots directly in the real world is riddled with challenges. For example, robot hardware is prone to breaking after excessive use, and iteration cycles and turnaround times for experiments can be slow.
During our experiments, we often found ourselves repairing the hand after prolonged use: tightening loose screws, replacing ribbon cables, and letting the hand rest and cool down after running 10-15 trials.
Simulation lets us sidestep many of these issues by training on a robot that doesn't wear out, and it provides the large diversity of data needed to learn challenging tasks. At the same time, because simulations can run much faster than real time, the iteration cycle is massively improved.
When training in simulation, the most significant challenge is bridging the gap between simulation and the real world. To address this, DeXtreme uses domain randomization of the physics properties set in the simulator, changing object masses, friction levels, and other attributes at scale across over a hundred thousand simulated environments at once.
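Schematically, per-environment physics randomization amounts to drawing a fresh set of simulator parameters for every environment instance. In the sketch below, the parameter names and ranges are our own illustrative choices, not DeXtreme's actual values:

```python
import torch

NUM_ENVS = 100_000   # one independent draw per simulated environment

def sample_physics_params(n, device="cpu"):
    """Draw fresh physics properties for each environment.
    Parameter names and ranges are illustrative, not DeXtreme's values."""
    return {
        "cube_mass":     torch.empty(n, device=device).uniform_(0.05, 0.25),   # kg
        "friction":      torch.empty(n, device=device).uniform_(0.3, 1.2),
        "joint_damping": torch.empty(n, device=device).uniform_(0.01, 0.5),
        "gravity_z":     torch.empty(n, device=device).uniform_(-10.5, -9.1),  # m/s^2
    }

params = sample_physics_params(NUM_ENVS)
# A policy trained across all of these variations at once cannot overfit to any
# single set of dynamics, which is what makes it robust on the real robot.
```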
One interesting upshot of these randomizations is that we train the AI with all kinds of unusual combinations of scenarios, which translates to robustness when performing the task in the real world.
For instance, most of our experiments on the real robot took place with a slightly malfunctioning thumb due to a loose connection on the circuit board. Even so, we were pleasantly surprised that the policies still transferred reliably from simulation to the real world.
Sim-to-Real
Future breakthroughs in robotic manipulation will enable a new wave of robotics applications beyond traditional industrial uses. At the heart of the DeXtreme project is the message that simulation can be an incredibly effective tool for training complex robotic systems, even ones that need to handle environments with objects in continual contact with the robot.
We hope that by demonstrating this using relatively low-cost hardware, we can inspire others to use our simulation tools and build on this work.
For further details on the DeXtreme project, check out the paper and visit the project webpage.
For a further dive into simulators and how they can impact robotics projects, download the latest version of Omniverse Isaac Sim, read this blog that covers the topic, and learn about training your own reinforcement learning policies!
About the Authors
- Gavriel State is a Senior Director for Simulation and AI at Nvidia, based in Toronto, where he leads efforts involving applications of AI technology to simulation systems and vice versa. Previously, Gavriel founded TransGaming Inc. and spent 15 years focused on games and rendering technologies.
- Ankur Handa is a Research Scientist in the Nvidia Seattle Robotics group led by Dieter Fox. Prior to that, he was a Research Scientist at OpenAI, and before that he was a Dyson Fellow at Imperial College London. He finished his PhD with Prof. Andrew Davison at Imperial College London and did a two-year post-doc with Prof. Roberto Cipolla at the University of Cambridge. His papers won the Best Industry Paper Award at BMVC 2014 and were finalists for the Best Manipulation Paper Award and the Best Student Paper Award at ICRA 2019.