Hi Law,
Off-policy learning is a very cool idea.
When an agent interacts with the environment, the experience it gathers
depends on its behavior policy. For instance, if a robot goes towards a
ramp, a high accelerometer reading is likely, or at least more likely
than if the robot moves towards a flat area or does not move at all.
Some behavior policies can be dangerous. For instance, if the robot
moves very close to the stairs, it may fall from a great height and
break when it hits the floor.
As another example, imagine a factory manager who wants to optimize the
manufacturing process. He could change some parameters that control the
machines; however, if he makes a mistake, the production line could
collapse (and lose a lot of money!). The manager could consider learning
the optimal parameters with RL, but he wants to be conservative when
trying out parameters. The question then is: "can the manager learn
about policies less conservative than his actual behavior policy?"
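One common answer is importance sampling: weight each observed outcome
by how much more (or less) likely the target policy was to take that
action than the behavior policy. Below is a minimal Python sketch of
that idea for a one-step version of the factory problem. The three
parameter settings, their mean rewards, the noise level, and both
policies are made-up assumptions for illustration only.

```python
import random

# Hypothetical setup: three settings for the machines, from the most
# conservative (0) to the most aggressive (2). Mean rewards are made up.
MEAN_REWARD = [1.0, 1.5, 2.0]

# Behavior policy b: mostly conservative, but it still tries every setting
# with nonzero probability, which importance sampling requires wherever
# the target policy might act.
BEHAVIOR = [0.8, 0.15, 0.05]
# Target policy pi: a less conservative policy the manager is curious about.
TARGET = [0.2, 0.3, 0.5]


def simulate_reward(action, rng):
    """Noisy profit of one production run (hypothetical environment)."""
    return MEAN_REWARD[action] + rng.gauss(0.0, 0.5)


def off_policy_estimate(n_runs, seed=0):
    """Estimate TARGET's expected reward using runs chosen by BEHAVIOR only."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        action = rng.choices(range(3), weights=BEHAVIOR)[0]
        reward = simulate_reward(action, rng)
        rho = TARGET[action] / BEHAVIOR[action]  # importance-sampling ratio
        total += rho * reward
    return total / n_runs


if __name__ == "__main__":
    true_value = sum(p * m for p, m in zip(TARGET, MEAN_REWARD))
    print(f"expected reward under the target policy: {true_value:.3f}")
    print(f"estimate from behavior data only: {off_policy_estimate(100_000):.3f}")
```

The catch is that the manager still has to try the aggressive settings
now and then, and when the ratios get large (here up to 10) the estimate
becomes noisy, so it needs many runs.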
Another example of off-policy learning comes up naturally with general
value functions (GVFs). Imagine that you want to learn a predictive
representation of the environment. In other words, you are not
interested in a classical grammar of concepts (such as "ramp"); rather,
you want to be able to predict how the environment responds to your
actions. For instance, you may want to predict that when you move north,
the accelerometer reading will rise very fast (note that you do not
really care whether there is a change in the slope of the terrain called
a "ramp", only that your reading will shoot up).
This is where off-policy learning becomes very relevant. Although you
behave according to one policy, you may want to predict what would have
happened if you had followed another policy, or a thousand different
policies. This way, you could learn a lot about the environment from
just a single stream of data! Even if you usually move west, you could
predict something like "what would the accelerometer reading be if I
moved north, east, or south?"
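To make this concrete, here is a minimal Python sketch of learning
several predictions from one stream of experience, using tabular
off-policy TD(0) with per-step importance-sampling ratios. Everything
about the world is a hypothetical assumption: a 3x3 grid that wraps
around at the edges, a made-up accelerometer cumulant that is high on a
"ramp" row, a behavior policy that moves west 70% of the time, and one
target policy per compass direction ("always move north", and so on).

```python
import random

GAMMA = 0.9   # discount: how far ahead each prediction looks (assumed)
ALPHA = 0.02  # step size (assumed)
SIZE = 3      # hypothetical 3x3 grid that wraps around at the edges

# (row change, column change) for each move; row 0 is the "north" edge.
ACTIONS = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}
# Hypothetical behavior policy: usually west, sometimes the other directions.
BEHAVIOR = {"north": 0.1, "south": 0.1, "east": 0.1, "west": 0.7}
STATES = [(r, c) for r in range(SIZE) for c in range(SIZE)]


def step(state, action):
    """Deterministic move on the wrapped grid."""
    dr, dc = ACTIONS[action]
    return ((state[0] + dr) % SIZE, (state[1] + dc) % SIZE)


def accelerometer(state):
    """Hypothetical cumulant: the reading is high on the ramp (row 0), else low."""
    return 1.0 if state[0] == 0 else 0.0


def learn_predictions(n_steps, seed=0):
    """Tabular off-policy TD(0): one prediction per target policy
    'always move <direction>', all learned from a single behavior stream."""
    rng = random.Random(seed)
    values = {target: {s: 0.0 for s in STATES} for target in ACTIONS}
    names, probs = list(BEHAVIOR), list(BEHAVIOR.values())
    state = (1, 1)
    for _ in range(n_steps):
        action = rng.choices(names, weights=probs)[0]
        nxt = step(state, action)
        cumulant = accelerometer(nxt)
        for target, v in values.items():
            # pi(a|s) is 1 for the target's own direction and 0 otherwise.
            rho = (1.0 if action == target else 0.0) / BEHAVIOR[action]
            td_error = cumulant + GAMMA * v[nxt] - v[state]
            v[state] += ALPHA * rho * td_error
        state = nxt
    return values


def true_predictions(target, n_iters=500):
    """Reference answer: iterate the target policy's Bellman equation."""
    v = {s: 0.0 for s in STATES}
    for _ in range(n_iters):
        v = {s: accelerometer(step(s, target)) + GAMMA * v[step(s, target)]
             for s in STATES}
    return v


if __name__ == "__main__":
    learned = learn_predictions(100_000)
    for target in ACTIONS:
        ref = true_predictions(target)
        print(target, round(learned[target][(1, 0)], 3), round(ref[(1, 0)], 3))
```

All four predictions come from the same mostly-westward stream; with
these deterministic dynamics the learned tables should approach the
values from `true_predictions`. With function approximation instead of a
table, plain off-policy TD can diverge, which is why gradient-TD methods
were developed.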
Hope this helps!
Best,
Sergio
> --
> You received this message because you are subscribed to the
> "Reinforcement Learning Mailing List" group.
> To post to this group, send email to rl-...@googlegroups.com
> To unsubscribe from this group, send email to rl-list-u...@googlegroups.com
> For more options, visit this group at http://groups.google.com/group/rl-list?hl=en
--
You are the master of the moments of your life.
--Paramahansa Yogananda
Make your day awesome:
http://www.youtube.com/embed/-aEhSgyWZe0