Summary
Focus: Learning Robot Locomotion from Simulation
Traditional robots: fixed, restricted in factories
New vision: robots interacting with people in everyday life
Challenge: human environment is very unstructured and dynamic
DeepMind Approach: use deep reinforcement learning to control robots
Previously very successful at playing games in virtual environments (Go, StarCraft)
Locomotion:
Enables us to explore the Earth
Many applications: healthcare, delivery, etc.
Simulation:
Many technologies are available to model the locomotion of realistic bodies in simulation (e.g. from the computer graphics community)
Simulation
Much faster and more scalable than real-world experiments
Safe
But not fully accurate (performance in simulation does not fully translate to real-world performance)
This is the Sim-To-Real problem
Approaches
Fast simulation in GPUs
Imitation learning from animals
Simulation from small amounts of video
Simulation for robots
Physics: robot body, collision detection, contact solver, numerical integrator
Sensor simulation
Actuator simulation
Robot control API
Scene creation and management: robots, objects, humans
Robot training method
Trained with Deep Reinforcement Learning (PPO) interacting with a physics simulation (PyBullet)
Deployed on real robot
Reinforcement learning:
Agent (algorithm that makes decisions)
Environment (robot body, world, etc.)
Agent observes the environment, takes an action that affects it, receives a reward (immediately or later in time), and adjusts its behavior to maximize reward
Formulation:
Observations: joint angles, roll, pitch
Actions: desired motor angles
Reward: maximize movement towards a given goal, minimize energy expenditure
Early termination: robot falls
Policy: 2 layer neural network
Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning, CoRL 2021
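The formulation above can be sketched as a toy Gym-style environment. Everything concrete here (joint count, placeholder dynamics, reward weights, fall threshold) is an illustrative assumption, not the actual training setup:

```python
import random

class LocomotionEnvSketch:
    """Toy stand-in for the RL formulation above (hypothetical, not the real system)."""

    NUM_JOINTS = 12  # assumption: quadruped with 3 joints per leg

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.joint_angles = [0.0] * self.NUM_JOINTS
        self.roll, self.pitch = 0.0, 0.0
        self.x = 0.0  # distance travelled toward the goal
        return self._observe()

    def _observe(self):
        # Observations: joint angles, roll, pitch
        return self.joint_angles + [self.roll, self.pitch]

    def step(self, desired_angles):
        # Actions: desired motor angles; joints track them with lag (placeholder physics)
        energy = 0.0
        for i, target in enumerate(desired_angles):
            delta = target - self.joint_angles[i]
            self.joint_angles[i] += 0.5 * delta
            energy += delta * delta
        progress = 0.01 * sum(abs(a) for a in desired_angles) / len(desired_angles)
        self.x += progress
        self.roll += self.rng.gauss(0.0, 0.01)
        self.pitch += self.rng.gauss(0.0, 0.01)
        # Reward: movement toward the goal minus an energy penalty
        reward = progress - 0.001 * energy
        # Early termination: the robot "falls" if roll/pitch grow too large
        done = abs(self.roll) > 0.5 or abs(self.pitch) > 0.5
        return self._observe(), reward, done
```

A PPO learner (the 2-layer policy network mentioned above) would interact with `step`/`reset` in the usual observe-act-reward loop.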
Improving robot performance in reality despite the Sim-To-Real gap
Challenge: a policy trained in simulation doesn't directly produce good results in reality
Unmodeled dynamics (e.g. robot body softer than expected)
Wrong simulation parameters (e.g. incomplete CAD file)
Inaccurate contact models
Communication latency
Actuator dynamics
Stochastic real environment (e.g. roughness of floor/carpet)
Numerical accuracy
Overcoming the gap
Analyze each component of the system to identify the best parameters
Actuator model is the most critical gap
Traditional: analytical models
Neural network models of actuators
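A learned actuator model of the kind mentioned above can be sketched as a small MLP that maps a short history of position errors and joint velocities to a torque, in place of an analytical model. The layer sizes, history length, and inputs are assumptions for illustration:

```python
import math
import random

def make_mlp(sizes, seed=0):
    """Randomly initialized small MLP (tanh hidden layers); weights would
    normally be trained on logged real-robot actuator data."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
             for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((w, b))
    return layers

def actuator_torque(layers, pos_error_hist, vel_hist):
    """Predict joint torque from a short history of position errors and
    velocities, replacing an analytical actuator model with a learned one."""
    x = list(pos_error_hist) + list(vel_hist)
    for i, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x[0]

# Usage: 3-step histories of position error and velocity -> predicted torque
net = make_mlp([6, 16, 16, 1])
tau = actuator_torque(net, [0.10, 0.08, 0.05], [0.0, 0.2, 0.4])
```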
Domain Randomization
Sample physical parameters from some distribution
Train robot in simulation across all those parameters
Learned policy is much more robust across different realities, even though it has no way to measure which world/environment it is operating in
Actions are more conservative/robust
Responses to events are more diverse
Robot’s peak performance is worse
But much more consistent across different scenarios
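A minimal sketch of the randomization step, with made-up parameter names and ranges: each training episode draws a new "reality" that the policy never observes directly.

```python
import random

# Hypothetical parameters and ranges; real ranges come from knowledge of the system.
PARAM_RANGES = {
    "mass_scale":     (0.8, 1.2),   # +/- 20% body mass
    "friction":       (0.5, 1.25),  # floor friction coefficient
    "motor_strength": (0.8, 1.0),   # fraction of nominal torque
    "latency_s":      (0.0, 0.04),  # control-loop latency in seconds
}

def sample_physics(rng):
    """Draw one random 'reality' used to configure the simulator for an episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

# Each episode reconfigures the simulator with a fresh sample; the policy never
# sees which parameters are active, so it must perform well across all of them.
rng = random.Random(0)
episode_params = [sample_physics(rng) for _ in range(1000)]
```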
Reduce Sim-to-Real gap: Automatic System Identification
Technique:
Physics simulation based on initial physical parameters (not carefully tuned, so not that accurate)
Controller learning:
Parametrized using spline control nodes to ensure smooth transitions across motions
Covariance Matrix Adaptation Evolution Strategy (CMA-ES): gradient-free optimizer
Trained policy doesn't work in reality, so the loop is closed: the physics model is adjusted to improve its accuracy (Automatic System Identification)
Using key parameters as indicators of sim-to-real discrepancy
PD gains
Center of mass
CMA-ES:
Given a set of parameter samples
Fit a Gaussian distribution
Evaluate the quality of all members, remove the low-performing ones
Fit another Gaussian distribution to the survivors
Generate more samples from the new Gaussian distribution
Repeat until the sample population reaches high quality
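The loop above, in its simplest diagonal-Gaussian form. Note the hedge: full CMA-ES additionally adapts a covariance matrix via evolution paths; this simplified version is closer to the cross-entropy method, but it follows the same sample / cull / refit cycle:

```python
import random

def cmaes_sketch(objective, mean, std, iters=30, pop=32, elite_frac=0.25, seed=0):
    """Sample -> rank -> cull -> refit loop (diagonal Gaussian; lower objective
    is better). A sketch of the described procedure, not full CMA-ES."""
    rng = random.Random(seed)
    n_elite = max(2, int(pop * elite_frac))
    mean, std = list(mean), list(std)
    for _ in range(iters):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop)]
        samples.sort(key=objective)          # rank by quality
        elites = samples[:n_elite]           # remove the low performers
        mean = [sum(e[i] for e in elites) / n_elite for i in range(len(mean))]
        std = [max(1e-6,
                   (sum((e[i] - mean[i]) ** 2 for e in elites) / n_elite) ** 0.5)
               for i in range(len(mean))]    # refit the Gaussian on the survivors
    return mean

# Example: identify two "physical parameters" (say, a PD gain and a center-of-mass
# offset) that minimize a sim-vs-real discrepancy score (toy quadratic here).
best = cmaes_sketch(lambda p: (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2,
                    mean=[0.0, 0.0], std=[2.0, 2.0])
```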
Key lessons
Converges in 2 iterations, using 12 s of robot data
Can overfit physical parameters: the fitted values become unphysical and do not transfer to other tasks
Need to select a subset of physical parameters
Learning by Imitating Animals
Approach:
Take motion capture data of animals
Translate motion to robot bodies
Adjust policy training to add new optimization target: minimize distance between robot and reference motion
Robot learns to walk like the animal while still walking successfully despite its different body
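One way to express the added optimization target is a combined reward with an imitation term that peaks when the robot's pose matches the retargeted reference motion. The weights and exponential shaping below are illustrative assumptions, not the actual reward used:

```python
import math

def imitation_reward(robot_pose, reference_pose, task_reward,
                     w_imitate=0.5, scale=2.0):
    """Task reward blended with an imitation term that rewards matching the
    retargeted animal reference pose (weights/shaping are assumptions)."""
    sq_dist = sum((q - q_ref) ** 2 for q, q_ref in zip(robot_pose, reference_pose))
    imitate = math.exp(-scale * sq_dist)  # 1 on a perfect match, -> 0 as poses diverge
    return (1.0 - w_imitate) * task_reward + w_imitate * imitate
```

Tracking the reference exactly scores highest; when the robot's body cannot reproduce a pose, the term degrades smoothly instead of failing outright.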
Bridging the Sim-To-Real gap via Domain Adaptation
Randomly sample physical parameters
Take all physical parameters (>100 dim space) and map them to a low-dimensional space (10-20 dim) using an auto-encoder
Put encoded description of physical parameters as input to robot’s policy
Real-world runs
May not know the physical parameters
Use optimization to find the best choice of parameters that maximize reward
Follow-on work: retune dynamically so the policy is responsive to rapid changes to physics (e.g. wind)
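A sketch of the two pieces described above: a latent encoding of the physical parameters fed to the policy, and a deployment-time search that optimizes that latent directly against reward measured on the robot (the true parameters being unknown). The encoder, dimensions, population sizes, and shrink schedule are all assumptions:

```python
import random

def encode(params, proj):
    """Toy linear stand-in for the trained auto-encoder that compresses the
    >100-dim physical parameters into a 10-20 dim latent."""
    return [sum(w * p for w, p in zip(row, params)) for row in proj]

def search_latent(real_rollout_reward, latent_dim=8, iters=20, pop=16,
                  n_elite=4, seed=0):
    """Deployment-time search: optimize the policy's latent input directly
    against reward measured on the real robot (gradient-free)."""
    rng = random.Random(seed)
    mean, std = [0.0] * latent_dim, [1.0] * latent_dim
    for _ in range(iters):
        zs = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop)]
        zs.sort(key=real_rollout_reward, reverse=True)  # higher reward first
        elites = zs[:n_elite]
        mean = [sum(z[i] for z in elites) / n_elite for i in range(latent_dim)]
        std = [max(0.05, s * 0.9) for s in std]  # simple shrink schedule
    return mean
```

Here `real_rollout_reward` stands for running the policy on the robot with a candidate latent and measuring the return; the follow-on work mentioned above would rerun this search online as the physics change.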
Open Sim-To-Real problems
Complex dynamics (soft objects, fluids)
Realistic rendering (e.g. capturing what visual sensors will actually see)
Scalable creation of diverse scenes (e.g. furniture, nature, messy room)
Modeling humans and human behaviors/reactions to robot actions