GQ(lambda) and Greedy-GQ pseudo code

Abeyruwan, Saminda Wishwajith

Feb 1, 2011, 12:35:50 AM

to rl-...@googlegroups.com

Hi All,

I am currently working on improving a set of behaviors for 3D simulated autonomous robots (RoboCup environment, RoboCanes team). My objective is to learn a goalkeeper behavior using RL. I have already used Sarsa(lambda) with basis functions, and the agents show a remarkable improvement. I would like to test GQ(lambda) and Greedy-GQ for the goalkeeper behavior too. I would really appreciate it if someone could point me to the pseudocode for these two algorithms. Any advice on implementation details would also be greatly appreciated.

Thank you!

Sam

Saminda Abeyruwan
PhD Student
Dept. of Computer Science,
University of Miami

s.abe...@umiami.edu
(305) 457 9753
http://blog.saminda.org/
________________________________________

Tom Schaul

Feb 1, 2011, 7:24:56 AM

to rl-list

Hi Sam,

Adam White has already posted that information to this mailing list; see here:
http://groups.google.com/group/rl-list/browse_thread/thread/c42ec212f58b5237/e9d7627366121dbd?hl=en&lnk=gst&q=gq#e9d7627366121dbd

Also, there is a Python implementation of GQ(lambda) available in our
PyBrain library:
https://github.com/pybrain/pybrain/blob/master/pybrain/rl/learners/valuebased/linearfa.py#L213

Good luck with RoboCup!

Tom

--
http://www.idsia.ch/~tom/


Hamid Reza Maei

Feb 1, 2011, 1:06:43 PM

to rl-...@googlegroups.com

Hi Sam and Tom,

As Tom mentioned, pseudocode for GQ(lambda) is already available in several places. GQ(lambda) is designed for prediction problems, but it can also be used for control. For example, take a greedy target policy \pi (\pi(a*|s)=1 where a* is the action with the highest action value, and \pi(a|s)=0 for all other actions). GQ(lambda) with lambda=0 then takes the form of Greedy-GQ, which, unlike Q-learning, is guaranteed to be stable (provably so with a fixed or slowly changing behavior policy).

Thus, you can use GQ(lambda) code for implementing Greedy-GQ as well. If you have any further questions about the algorithm, I would be happy to help.
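
To make the lambda=0 case concrete, here is a minimal sketch of a single Greedy-GQ step with linear function approximation. It is an illustration under stated assumptions, not the code from the earlier thread or from PyBrain: the function name greedy_gq_update, the step sizes alpha and beta, the secondary weight vector w, and the list-based feature vectors are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def greedy_gq_update(
    theta: Vector,                # primary weights: Q(s, a) ~ theta . phi(s, a)
    w: Vector,                    # secondary weights used by the correction term
    phi: Vector,                  # features of the (state, action) just taken
    next_phis: Sequence[Vector],  # features of every action in the next state
    reward: float,
    gamma: float,                 # discount factor
    alpha: float,                 # step size for theta (assumed name)
    beta: float,                  # step size for w (assumed name)
) -> Tuple[Vector, Vector]:
    """One Greedy-GQ step: GQ(lambda) with lambda = 0 and a greedy target policy."""
    if next_phis:
        # Greedy target policy: all probability on the highest-valued next action.
        phi_next = max(next_phis, key=lambda f: dot(theta, f))
    else:
        # Assumption: a terminal next state is represented by zero features.
        phi_next = [0.0] * len(phi)

    # TD error with respect to the greedy next action.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    w_phi = dot(w, phi)

    # Gradient-corrected update for theta, then the update for w.
    new_theta = [
        t + alpha * (delta * p - gamma * w_phi * pn)
        for t, p, pn in zip(theta, phi, phi_next)
    ]
    new_w = [wi + beta * (delta - w_phi) * p for wi, p in zip(w, phi)]
    return new_theta, new_w
```

With w held at zero the theta update reduces to the familiar Q-learning step; the extra term involving w is what distinguishes Greedy-GQ. In practice the behavior policy that selects actions (for example, epsilon-greedy) is kept separate from this update.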

Best,
Hamid
