Question about "off-policy reinforcement learning"


MRS FENG

Nov 9, 2009, 7:30:07 AM
to Reinforcement Learning Mailing List
Hello everyone!

Recently I read the paper "Adaptive importance sampling for value
function approximation in off-policy reinforcement learning". I want
to study policy gradient reinforcement learning, and I have some
questions.
1. How should I choose the initial target policy and the sampling policy?
2. We can collect many episodes of samples from a policy. How should the
initial state be set in different episodes?
3. If I want to run simulations on the Mountain-Car problem, does the
continuous state space need to be discretized? And what type of policy
should be used in policy gradient reinforcement learning?
Does anyone have MATLAB code for a Mountain-Car simulation? If you
do, could you send it to me?
Can someone here give me a hint? Many thanks!

Best regards,
MRS FENG

Itauma Isong Itauma

Nov 14, 2009, 1:24:02 AM
to rl-...@googlegroups.com
Hello Mrs Feng,

Though I am a novice when it comes to RL, I would like to answer your questions based on my understanding.
1) The initial target policy (or policy matrix) can be a set of random values, or values chosen by intuition, which you then improve (optimize) over several training episodes in your domain (environment).
2) You can use a random function in MATLAB to place your agent at a different start position in each training episode while you improve your policy or action matrix.
3) I have seen several links relating to implementations of the Mountain Car problem using SARSA, for example: http://www.dia.fi.upm.es/~jamartin/download.htm
I hope this helps until someone else explains better.
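As a rough illustration of points 1) and 2), here is a minimal sketch in Python rather than MATLAB. It assumes the Mountain Car dynamics as given in Sutton and Barto's book (position in [-1.2, 0.5], velocity in [-0.07, 0.07], reward -1 per step). The function names, the 20x20 grid, and the choice of drawing start states uniformly over the whole state space are my own assumptions for illustration, not something taken from the paper or from the linked code.

```python
import math
import random

# Bounds follow the Sutton & Barto formulation of Mountain Car.
X_MIN, X_MAX = -1.2, 0.5      # position range; the goal is at X_MAX
V_MIN, V_MAX = -0.07, 0.07    # velocity range
ACTIONS = (-1, 0, 1)          # reverse, coast, forward

def random_start(rng):
    """Draw a start state uniformly over the state space (one possible choice)."""
    return rng.uniform(X_MIN, X_MAX), rng.uniform(V_MIN, V_MAX)

def step(x, v, action):
    """Apply one transition and return (x, v, reward, done)."""
    v = v + 0.001 * action - 0.0025 * math.cos(3 * x)
    v = min(max(v, V_MIN), V_MAX)
    x = x + v
    if x <= X_MIN:            # hitting the left wall stops the car
        x, v = X_MIN, 0.0
    done = x >= X_MAX
    x = min(x, X_MAX)
    return x, v, -1.0, done

def discretize(x, v, bins=20):
    """Map a continuous state to a (position, velocity) grid cell."""
    i = min(int((x - X_MIN) / (X_MAX - X_MIN) * bins), bins - 1)
    j = min(int((v - V_MIN) / (V_MAX - V_MIN) * bins), bins - 1)
    return i, j

def init_table(rng, bins=20, scale=0.01):
    """Small random initial action values: table[i][j][k] for cell (i, j), action k."""
    return [[[rng.uniform(-scale, scale) for _ in ACTIONS]
             for _ in range(bins)] for _ in range(bins)]

if __name__ == "__main__":
    rng = random.Random(0)
    table = init_table(rng)
    for episode in range(3):
        x, v = random_start(rng)          # a different start in each episode
        i, j = discretize(x, v)
        # Greedy action with respect to the (still random) initial table.
        k = max(range(len(ACTIONS)), key=lambda a: table[i][j][a])
        print(episode, (i, j), ACTIONS[k], step(x, v, ACTIONS[k]))
```

A grid like this is only one way to handle the continuous state space; tile coding or another function approximator would be the usual alternative, and a policy gradient method would normally use a parameterized stochastic policy instead of a table.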
One other thing: don't forget that the agent needs a reward signal that guides it to the goal. So if you are using Q-learning, you would have an initial reward matrix that helps you optimize your action matrix.
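To make the role of the reward concrete, here is a minimal Python sketch of a single tabular Q-learning update. It assumes the reward is simply whatever the environment returns at each step (for Mountain Car in Sutton and Barto's book, -1 per step until the goal is reached). The function name `q_learning_update` and the dictionary layout of `Q` are hypothetical and chosen only for this example.

```python
def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning backup in place and return the TD error.

    Q maps each state to a list of action values.
    """
    # At a terminal transition there is no future value to bootstrap from.
    target = r if done else r + gamma * max(Q[s_next])
    td_error = target - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error

if __name__ == "__main__":
    # Two states, two actions, arbitrary initial values.
    Q = {0: [0.0, 0.0], 1: [1.0, 2.0]}
    print(q_learning_update(Q, s=0, a=1, r=-1.0, s_next=1, done=False,
                            alpha=0.5, gamma=1.0))
    print(Q)
```

Because the target uses the maximum over next-state action values and not the action the behaviour policy actually takes, this update is itself off-policy, which is one reason Q-learning is a common starting point before the importance-sampling methods.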
You could also check this link for several pieces of MATLAB code based on Richard S. Sutton's book on RL: http://waxworksmath.com/Authors/N_Z/Sutton/sutton.html

Best wishes :-)
--
It is best to Light a Candle than being the cause of Darkness