I read this a few times. As it turns out, every really smart idea is “obvious” AFTER someone else thinks of it. Why is this? Likely fire and the wheel were like that too.
In a nutshell, what I think they are saying is that if “K” is the “kinematic space” and “D” is the “dynamic space”, K+D is very much smaller than K*D. So much smaller that K+D can be trained quickly on consumer-level GPUs.
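To put rough numbers on K+D versus K*D, here is a toy count. The sizes are made up for illustration (they are not from the paper), and treating K and D as simple counts of cases is my own simplification.

```python
# Hypothetical sizes, chosen only for illustration (assumed, not from the paper).
K = 1_000  # distinct kinematic cases
D = 1_000  # distinct dynamic cases

joint = K * D     # learning both at once: every combination of the two
separate = K + D  # learning each on its own, one after the other

print(joint, separate, joint // separate)  # 1000000 2000 500
```

With these made-up sizes the separated problem is 500 times smaller, and the gap widens as either space grows.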
Typically what people have done is put the full physics-based simulation on the computer and then used RL to train it. That takes hundreds of hours but works. During training the robots fall down a lot, but after billions of cycles they get better. What Disney has done here is skip the physics at first: just ignore things like momentum and gravity and train movement. Then turn on the gravity, balance and physics and RL train on “velocity”.
I think what also makes it easier is that the trained motions are imitative: they are created from human motion. So the simulated robot does not need to spend hours and days discovering that “head-up-feet-down” is the best way to stand.
So they seem to have reduced the cost to train from huge to manageable. Next, let’s do this for the mechanics.