During a December demo of Tesla's Optimus robot, industry insiders noticed that a human operator was likely controlling the robot behind the scenes.
Known as teleoperation, it's a common practice in the world of robotics.
The fact that Elon had to resort to it, however, shows just how far AI-powered robots still have to go.
And a large part of the reason is that the way AI companies train LLMs doesn't carry over to training robots.
Why AI and Robots Are Different
LLMs are trained on a giant pile of human-generated text.
Books.
Code.
Reddit arguments that should’ve stayed in drafts.
From there, machine learning specialists train the AI to identify patterns in the data, allowing the LLM to become extremely good at predicting what 'should' come next in a string of text.
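To make that concrete, here's a toy sketch of the idea. The word-counting "model" and the tiny corpus below are made up for illustration; a real LLM learns billions of neural-network parameters over the books, code, and forum posts mentioned above.

```python
# A toy sketch of next-token prediction: count which word tends to
# follow which, then predict the most common successor.
# (Real LLMs use neural networks, not word counts -- this is just
# to show the core idea of "predict what comes next".)
from collections import Counter, defaultdict

# Hypothetical stand-in for the giant pile of human-generated text.
corpus = "the robot picks up the box and the robot sets down the box".split()

# Count how often each word follows each other word (a bigram model).
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the statistically most likely next word."""
    return next_counts[word].most_common(1)[0][0]

print(predict_next("the"))    # whichever word followed "the" most often
print(predict_next("robot"))  # same idea, one word over
```

That's all "prediction" means here: pattern-matching over past text.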
Robots don't have that luxury: to provide value beyond repeating a highly specific movement, a robot has to legitimately understand the world around it.
Today's AIs, however - including everything from ChatGPT to Nano Banana - have zero understanding of the physical reality around them or the person using them.
Instead, they generate text, images, and videos based on prompts. Nothing more, nothing less.
And that's the problem.
Why Robot Demos Are Scams
A lot of robotics today is still some flavor of:
scripted motions
carefully staged environments
narrow task training
massive human babysitting behind the scenes (aka teleoperation)
These demos make for impressive social media videos, but only to the untrained eye.
Because anyone who understands what's going on knows the only reason the demo works is that it was rehearsed dozens (if not hundreds) of times using:
the same lighting
the same objects
the same table
the same room
the same “please don’t breathe near it” constraints
But if you move an object by two inches, or put the robot on an uneven floor, the cracks start to show.
Which is why 2026 will not be the year of the robot.
In fact, it's possible the 2020s won't even be the DECADE of the robot.
Not because the motors aren’t strong.
But because they lack real-world understanding.
Embodied Learning
Embodied learning (aka embodied AI) refers to the idea of having an intelligent system (agent, AI, whatever) learn by interacting with the world around it.
Touching it.
Messing up in it.
Building an internal model of “if I do X, Y happens.”
Think about it.
A toddler doesn’t read 10 million pages about walking.
Instead, they wobble, fall, and try again (and again and again) until they finally learn.
That is the essence of real-world learning.
And at the core of that biological process is data collection.
The baby's brain is forming new neural pathways that remember what happens when it moves its leg a fraction of an inch the right way versus the wrong way.
What happens when it tenses its core versus relaxing it.
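In code, that trial-and-error loop looks something like the sketch below. The balance "physics," the threshold, and the numbers are all made up for illustration; a real robot gathers this feedback through physical sensors and motors.

```python
# A minimal sketch of embodied, trial-and-error learning -- the
# "if I do X, Y happens" loop. The world model here is just a table.
import random

outcomes = {}  # the agent's internal model: action -> did it stay upright?

def try_step(lean: float) -> bool:
    """Stand-in for the physical world: lean too far and you fall.
    (Hypothetical stability threshold, purely for illustration.)"""
    return abs(lean) < 0.2

# Wobble, fall, try again -- and remember what happened each time.
for attempt in range(1000):
    lean = round(random.uniform(-1.0, 1.0), 1)  # pick an action to try
    stayed_up = try_step(lean)                  # act, then observe the result
    outcomes[lean] = stayed_up                  # update the internal model

stable = sorted(a for a, ok in outcomes.items() if ok)
print(f"Learned stable range: {stable[0]} to {stable[-1]}")
```

Run enough attempts, and the agent ends up with a crude internal model of which actions keep it upright, learned from interaction rather than from text.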
Intelligence + Analysis
It's the perfect combination of intelligence (via the human brain) and learning via interactive data collection.
And the lack of that combination is precisely what's holding today's robots back.
It's not that they can't perform the movements required to be useful in the real world (robotics companies made massive progress on fine motor control in 2025).
The problem is that their understanding of the world around them is no better than a real-life toddler's.
While toddlers are cute little angels, they're completely worthless in terms of being productive members of society.
In fact, even the robots currently being deployed in Amazon's warehouses do not have true intelligence.
Instead, they're highly trained to scan barcodes and repeat a very finite number of repetitive motions (most of which involve picking up and setting down boxes).
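In other words, something closer to the sketch below: a fixed sequence replayed on a loop. The coordinates and the move_arm/set_gripper names are hypothetical, not any real warehouse robot's API.

```python
# What "scripted" means in practice: hardcoded waypoints, no perception.
# Move the box two inches and this routine grabs empty air.
PICK_AND_PLACE = [
    ("move_arm", (0.40, 0.10, 0.30)),  # hover over where the box *should* be
    ("move_arm", (0.40, 0.10, 0.05)),  # descend to the rehearsed grip height
    ("set_gripper", "closed"),         # grab -- or miss, if anything shifted
    ("move_arm", (0.40, 0.10, 0.30)),  # lift
    ("move_arm", (0.70, 0.10, 0.05)),  # carry to the drop-off spot
    ("set_gripper", "open"),           # release
]

def run_script(script):
    for command, arg in script:
        print(f"{command}({arg})")  # a real controller would drive motors here

run_script(PICK_AND_PLACE)
```

Impressive on a video feed, but there's no understanding anywhere in that loop.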
The Brutal Reality
A good analogy is that of self-driving cars.
For Elon to accumulate the volume of data needed to get self-driving cars off the ground, his team had to collect Tesla sensor data across billions of miles of real-world driving.
The process took years and years, and required entire server rooms full of hard drives just to store the data.
From there, machine learning specialists spent years teaching their AI how to interpret all that data.
And once that was done, it took another couple years for the cars to be reliable enough to get government approval for pilot testing (their current status).
Point being, it's taken well over a decade for Elon - the world's tech genius - to get self-driving cars even remotely close to being ready for widespread usage.
And the same can be said for robots in 2026.
Progress is Painfully Slow
While companies like Boston Dynamics and Unitree are making progress, their progress pales in comparison to the insanely rapid improvements we've seen in AI from 2023 to today.
[Image: Boston Dynamics' Atlas robot]
Not because the money isn't there.
But because the process of training robots to understand the world around them is exponentially harder than that of training text LLMs.
Better training = Fewer errors.
LLMs went mainstream because they're right often enough to be useful, with very few real-world consequences for when they get something wrong.
Robots Can't Screw Up Like LLMs Can
Robots, however, don't benefit from that same level of forgiveness.
Wrong in robotics means:
broken objects
broken ankles
lawsuits
headlines
viral video of a robot going rogue and attacking its engineers
The solution is training based on real-world interaction.
It's something Yann LeCun, considered by many to be one of the "Godfathers of AI," understands very well (as explained in this video).
In fact, as today's LLMs seem to be plateauing in their capabilities, many in the industry predict we'll need to graduate beyond text-based machine learning - and adopt something like embodied learning - if we ever want to achieve AGI.
It's a sentiment I personally agree with.
And until that type of learning becomes more affordable - and can be done at scale - we're not going to see a "GPT-3 to GPT-4o" level jump in robot performance anytime soon.
Catch you next time,

Chris Laub
Head of Product, Sentient AI