|RL in continuous state-action spaces benchmark||sarah mehraban||12/20/11 11:18 AM|
I'm working on a reinforcement learning agent that can learn in continuous state spaces and produce continuous-valued actions, and
I need a good benchmark to test it.
A very famous one is the cart-pole balancing problem, but in this problem discrete actions lead to better results than continuous ones. Two facts explain this: first, the task is well suited to bang-bang actions; second, using only two actions greatly simplifies the learning problem and reduces the learning time.
Continuous-valued actions can lead to a better policy, and I need a benchmark that shows this.
I would really appreciate it if you could point me to some.
|Re: [rl-list] RL in continuous state-action spaces benchmark||Alejandro||12/20/11 11:56 AM|
> I need a good benchmark to test it.
The following papers deal with continuous action spaces, and include
Strösslin, T., & Gerstner, W. (2003). Reinforcement learning in
van Hasselt, H., & Wiering, M. A. (2007). Reinforcement Learning in
|Re: [rl-list] RL in continuous state-action spaces benchmark||Marek Grzes||12/20/11 11:31 AM|
Perhaps, the boat problem could be useful for you. It was used, e.g.,
|Re: [rl-list] RL in continuous state-action spaces benchmark||Marc Deisenroth||12/20/11 11:21 AM|
you could try the cart-pole swing-up (plus balancing). Also, if you
|Re: [rl-list] RL in continuous state-action spaces benchmark||Gerhard Neumann||12/20/11 12:11 PM|
> I need a good benchmark to test it.
You can also find some simple benchmarks in [1]. In general, you can use any control task with continuous actions (even the cart-pole balancing task, although it is quite boring...) if you just add a squared punishment term for the control action to your reward function. The agent then has to learn an energy-efficient controller, and the bang-bang policy is suboptimal in this case.
[1] Neumann, G.; Peters, J. (2009). Fitted Q-iteration by Advantage Weighted Regression, Advances in Neural Information Processing Systems 22 (NIPS 2008)
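To make the squared punishment term concrete, here is a minimal Python sketch. Everything in it is an assumption chosen for illustration and not taken from the cited paper: the function names, the penalty weight `action_cost`, and the toy plant x' = x + dt * u, which stands in for a real benchmark such as the cart-pole.

```python
import numpy as np

def penalized_reward(task_reward, action, action_cost=0.1):
    """Task reward minus a squared penalty on the control action."""
    action = np.atleast_1d(np.asarray(action, dtype=float))
    return float(task_reward - action_cost * np.sum(action ** 2))

def bang_bang(x, u_max=1.0):
    """Two-action policy: always push at full force towards the origin."""
    return -u_max if x > 0 else u_max

def proportional(x, gain=1.0, u_max=1.0):
    """Continuous policy: force proportional to the error, clipped to the limits."""
    return float(np.clip(-gain * x, -u_max, u_max))

def rollout(policy, x0=1.0, dt=0.1, steps=100, action_cost=0.1):
    """Sum of penalized rewards for a policy on the toy plant x' = x + dt * u."""
    x, total = x0, 0.0
    for _ in range(steps):
        u = policy(x)
        # Task reward is -x^2 (hypothetical choice); the action penalty is added on top.
        total += penalized_reward(-x ** 2, u, action_cost)
        x = x + dt * u
    return total

if __name__ == "__main__":
    print("bang-bang:   ", rollout(bang_bang))
    print("proportional:", rollout(proportional))
```

On this toy plant the two-action controller keeps paying the full action cost while it chatters around the origin, so the clipped proportional controller ends up with the higher return. The same wrapper idea should carry over to any task whose reward you can modify.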
|Re: [rl-list] RL in continuous state-action spaces benchmark||Martin Riedmiller||12/20/11 4:25 PM|
you can find a couple of challenging continuous control benchmarks and
Hafner, Roland and M. Riedmiller. Reinforcement learning in feedback
You will find results on our neural RL approach for continuous actions
We will make the plants shortly also available in our software framework
|Re: RL in continuous state-action spaces benchmark||Ari||12/20/11 6:20 PM|
Also worth mentioning are the domains used in the paper: Binary Action
Search for Learning Continuous-Action Control Policies, by Jason Pazis
and Michail G. Lagoudakis.
As has been mentioned by others here, one approach in that paper is to
utilize a common domain, but to use a different reward structure
(which is what they do with the inverted pendulum).
|Re: [rl-list] RL in continuous state-action spaces benchmark||Lei Wu||12/20/11 7:11 PM|
I'm running experiments on a rotary single inverted pendulum with reinforcement learning methods:
These methods are used to learn in continuous spaces, but I have not finished the experiments yet.
2011/12/21 sarah mehraban <sarah.m...@gmail.com>
|Re: [rl-list] RL in continuous state-action spaces benchmark||Arun Tejasvi Chaganty||12/21/11 4:41 AM|
I'm not sure why this hasn't been mentioned, but there are several
continuous domains at the RL competition as well:
* Acrobot - continuous state space; discrete actions
An implementation using RL-glue, with suitable visualisations, etc.
|Re: [rl-list] RL in continuous state-action spaces benchmark||Hado van Hasselt||12/21/11 5:10 AM|
The double-pole cart pole [1] is a good benchmark, similar to the
The Helicopter domain from the RL competition [2,3], as mentioned by
I have some (C++) code available for both these domains, if anyone is
|Re: [rl-list] RL in continuous state-action spaces benchmark||José Antonio Martín H.||12/26/11 3:49 AM|
I've been able to learn successfully in the Helicopter problem using the
The learning is not of the same quality as the evolutionary approach
The idea is to use a dedicated learning agent for every actuator and
I have not tested it with your suggestion for modifying the reward
On 21/12/2011 14:10, Hado van Hasselt wrote:
|Re: [rl-list] RL in continuous state-action spaces benchmark||shimon....@gmail.com||12/27/11 10:01 AM|
We built a Python implementation of the two Helicopter domains used in the RL competitions, as well as an even harder one we designed for our own experiments. The code is available here:
The following article describes the neuroevolutionary methods we applied to these problems:
As policy search methods, they do not rely on the Markov property and thus do not require the alteration to the reward function that Hado describes.
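To illustrate why a policy search method only needs whole-episode returns, here is a minimal Python sketch. It is a simple (1+1) hill climber, not the neuroevolutionary methods from the article, and the toy plant, the one-parameter linear policy, and all names and settings are assumptions made for illustration.

```python
import numpy as np

def episode_return(weight, x0=1.0, dt=0.1, steps=100):
    """Total return of the policy u = clip(weight * x) on the toy plant x' = x + dt * u."""
    x, total = x0, 0.0
    for _ in range(steps):
        u = float(np.clip(weight * x, -1.0, 1.0))
        total += -x ** 2  # per-step rewards are only summed; the search never sees them
        x = x + dt * u
    return total

def hill_climb(fitness, w0=0.0, sigma=0.5, iterations=200, seed=0):
    """(1+1) search: perturb the parameter and keep it only if the return improves."""
    rng = np.random.default_rng(seed)
    best_w, best_f = w0, fitness(w0)
    for _ in range(iterations):
        candidate = best_w + sigma * rng.normal()
        f = fitness(candidate)
        if f > best_f:
            best_w, best_f = candidate, f
    return best_w, best_f

if __name__ == "__main__":
    w, f = hill_climb(episode_return)
    print("best weight:", w, "return:", f)
```

The search loop treats `fitness` as a black box that maps policy parameters to a single number. It never bootstraps from per-step values, so nothing in it depends on the state signal being Markov.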