In a twist on artificial intelligence (AI), computer scientists have
programmed machines to be curious—to explore their surroundings on
their own and learn for the sake of learning. The new approach could
allow robots to learn even faster than they can now. Someday they
might even surpass human scientists in forming hypotheses and pushing
the frontiers of what’s known.
“Developing curiosity is a problem that’s core to intelligence,” says
George Konidaris, a computer scientist who runs the Intelligent Robot
Lab at Brown University and was not involved in the research. “It’s
going to be most useful when you’re not sure what your robot is going
to have to do in the future.”
Over the years, scientists have worked on algorithms for curiosity,
but copying human inquisitiveness has been tricky. For example,
most methods can’t assess an artificial agent’s gaps in knowledge to
predict what it will find interesting before it sees it.
(Humans can sometimes judge how interesting a book will be by its cover.)
Todd Hester, a computer scientist currently at Google DeepMind in
London, hoped to do better. “I was looking for ways to make computers
learn more intelligently, and explore as a human would,” he says.
“Don’t explore everything, and don’t explore randomly, but try to
do something a little smarter.”
So Hester and Peter Stone, a computer scientist at the University of
Texas at Austin, developed a new algorithm, Targeted Exploration with
Variance-And-Novelty-Intrinsic-Rewards (TEXPLORE-VENIR), that relies
on a technique called reinforcement learning. In reinforcement
learning, a program tries something, and if the move brings it closer
to some ultimate goal, such as the end of a maze, it receives a small
reward and is more likely to try the maneuver again in the future.
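In code, that core loop can be sketched in a few lines. The following is a generic tabular Q-learning illustration of reinforcement learning, not the researchers’ own implementation; all constants and function names are illustrative.

```python
import random
from collections import defaultdict

# A generic tabular Q-learning sketch, for illustration only (this is
# not the TEXPLORE-VENIR code). Actions that lead toward reward get
# higher estimated values and so become more likely to be tried again.

ALPHA = 0.1    # learning rate: how far each update moves the estimate
GAMMA = 0.95   # discount factor: how much future reward counts
EPSILON = 0.1  # exploration rate: fraction of moves chosen at random

q_values = defaultdict(float)  # (state, action) -> estimated long-run reward

def choose_action(state, actions):
    """Mostly pick the highest-valued action; occasionally explore at random."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q_values[(state, a)])

def update(state, action, reward, next_state, actions):
    """Nudge the tried action's value toward reward plus discounted future value."""
    best_next = max(q_values[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    q_values[(state, action)] += ALPHA * (target - q_values[(state, action)])
```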
DeepMind has used reinforcement learning to allow programs to master
Atari games and the board game Go through random experimentation. But
TEXPLORE-VENIR, like other curiosity algorithms, also sets an internal
goal: the program rewards itself for comprehending something new, even
if the knowledge doesn’t get it closer to the ultimate goal.
As TEXPLORE-VENIR learns and builds a model of the world, it rewards
itself for discovering information unlike anything it has seen before—for
example, finding distant spots on a map or, in a culinary application,
exotic recipes. It also rewards itself for reducing uncertainty—for
becoming familiar with those places and recipes. “They’re fundamentally
different types of learning and exploration,” Konidaris says.
“Balancing them is really important. And I like that this paper did
both of those.”
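To make those two signals concrete, here is a minimal Python sketch of self-assigned curiosity bonuses: a novelty bonus for states unlike anything seen before, and a variance bonus for states where the agent’s learned model is uncertain. The distance measure, the model ensemble, and the weights are assumptions made for illustration, not the paper’s exact formulation; states are assumed to be numeric vectors and models callables that predict outcomes.

```python
import numpy as np

# Illustrative weights; the paper's exact formulation may differ.
NOVELTY_WEIGHT = 1.0
VARIANCE_WEIGHT = 1.0

def novelty_bonus(state, visited_states):
    """Self-assigned reward for states far from everything seen so far."""
    if not visited_states:
        return NOVELTY_WEIGHT
    nearest = min(np.linalg.norm(state - s) for s in visited_states)
    return NOVELTY_WEIGHT * nearest

def variance_bonus(state, model_ensemble):
    """Self-assigned reward where learned models of the world disagree,
    i.e., where uncertainty is high and becoming familiar would reduce it."""
    predictions = np.stack([model(state) for model in model_ensemble])
    return VARIANCE_WEIGHT * predictions.var(axis=0).mean()

def intrinsic_reward(state, visited_states, model_ensemble):
    """Total curiosity bonus: seek out the new, then grow familiar with it."""
    return (novelty_bonus(state, visited_states)
            + variance_bonus(state, model_ensemble))
```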
http://www.sciencemag.org/news/2017/05/scientists-imbue-robots-curiosity