RL in continuous state-action spaces benchmark

Showing 1-13 of 13 messages
RL in continuous state-action spaces benchmark Sarah 12/20/11 11:18 AM
Hi,
I'm working on a reinforcement learning agent that learns in continuous state spaces and produces continuous-valued actions,
and I need a good benchmark to test it.
A very famous one is the cart-pole balancing problem, but in that problem discrete actions lead to better results than continuous ones. This can be explained by two facts: first, the task is well suited to bang-bang actions, and second, using only two actions greatly simplifies the learning problem and reduces the learning time.
Using continuous-valued actions can lead to better policies, and I need a benchmark that shows this.
I would really appreciate it if you could introduce me to some.
 
Thanks.
Re: [rl-list] RL in continuous state-action spaces benchmark Alejandro 12/20/11 11:56 AM
> I need a good benchmark to test it.

The following papers deal with continuous action spaces, and include
some environments you can try.

Strösslin, T., & Gerstner, W. (2003). Reinforcement learning in
continuous state and action space. Artificial Neural Networks-ICANN.

van Hasselt, H., & Wiering, M. A. (2007). Reinforcement Learning in
Continuous Action Spaces. 2007 IEEE International Symposium on
Approximate Dynamic Programming and Reinforcement Learning (ADPRL),
272-279.

Re: [rl-list] RL in continuous state-action spaces benchmark Marek Grzes 12/20/11 11:31 AM
Perhaps the boat problem could be useful for you. It was used, e.g.,
here: http://books.nips.cc/papers/files/nips20/NIPS2007_0959.pdf

Cheers,
Marek

> --
> You received this message because you are subscribed to the
> "Reinforcement Learning Mailing List" group.
> To post to this group, send email to rl-...@googlegroups.com
> To unsubscribe from this group, send email to
> rl-list-u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/rl-list?hl=en

Re: [rl-list] RL in continuous state-action spaces benchmark Marc Deisenroth 12/20/11 11:21 AM
Hi,

you could try the cart-pole swing-up (plus balancing). Also, if you
set the sampling interval to something small (e.g., 0.1 seconds), then
bang-bang control can run into problems.
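Marc's suggestion can be sketched with the standard cart-pole dynamics; the parameter values, the Euler integrator, and the constant test force below are my own illustrative assumptions, not Marc's actual setup.

```python
import math

# Standard cart-pole parameters (illustrative values).
GRAVITY = 9.81
CART_MASS = 1.0
POLE_MASS = 0.1
POLE_HALF_LENGTH = 0.5

def step(state, force, dt=0.1):
    """Euler-integrate the cart-pole for one control interval of dt seconds.

    theta is measured from upright, so the swing-up task starts at
    theta = pi (pole hanging down); force is a continuous-valued action.
    """
    x, x_dot, theta, theta_dot = state
    total_mass = CART_MASS + POLE_MASS
    tmp = (force + POLE_MASS * POLE_HALF_LENGTH
           * theta_dot ** 2 * math.sin(theta)) / total_mass
    theta_acc = (GRAVITY * math.sin(theta) - math.cos(theta) * tmp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0
                            - POLE_MASS * math.cos(theta) ** 2 / total_mass))
    x_acc = tmp - (POLE_MASS * POLE_HALF_LENGTH * theta_acc
                   * math.cos(theta) / total_mass)
    return (x + dt * x_dot, x_dot + dt * x_acc,
            theta + dt * theta_dot, theta_dot + dt * theta_acc)

def reward(state):
    """+1 when the pole is upright, -1 when it hangs straight down."""
    return math.cos(state[2])

# Start hanging down and apply a constant push, just to exercise the dynamics.
state = (0.0, 0.0, math.pi, 0.0)
for _ in range(20):
    state = step(state, force=5.0)
```

With a coarse control interval like dt=0.1, full-force bang-bang switching tends to overshoot near the upright position, which is the effect Marc alludes to.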

Marc


Re: [rl-list] RL in continuous state-action spaces benchmark Gerhard Neumann 12/20/11 12:11 PM
> I need a good benchmark to test it.

You can also find some simple benchmarks in [1]. In general, you can use any control task with continuous actions (even the cart-pole balancing task, though that one is quite boring...) if you just add a squared punishment term for the control action to your reward function. You then want to learn an energy-efficient controller, and the bang-bang policy is suboptimal in this case.

best...

Gerhard

[1] Neumann, G.; Peters, J. (2009). Fitted Q-iteration by Advantage Weighted Regression, Advances in Neural Information Processing Systems 22 (NIPS 2008)
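Gerhard's squared action-penalty idea fits in a few lines; the quadratic state cost on the pole angle and the weight `r` below are illustrative assumptions, not values from [1].

```python
# Sketch of the reward shaping Gerhard describes: take any continuous-action
# control task and subtract a squared penalty on the control action.
def reward(theta, action, r=0.01):
    """Negative quadratic state cost minus a squared control penalty.

    With the action penalty, the optimal controller is energy-efficient,
    so a bang-bang policy (always applying maximum force) is suboptimal.
    """
    return -theta ** 2 - r * action ** 2

# A full-force bang-bang action is strictly worse than a gentler action
# when both reach the same pole angle:
assert reward(0.1, 10.0) < reward(0.1, 2.0)
```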
Re: [rl-list] RL in continuous state-action spaces benchmark Martin Riedmiller 12/20/11 4:25 PM
Hi,

you can find a couple of challenging continuous control benchmarks and
detailed description (underwater vehicle, airplane, magnet levitation,
hvac) in the following paper:

Hafner, R., & Riedmiller, M. (2011). Reinforcement learning in feedback
control. Machine Learning, 27(1):55–74, Springer Netherlands. Available
online at http://dx.doi.org/10.1007/s10994-011-5235-x or upon request at
riedm...@informatik.uni-freiburg.de.

You will find results on our neural RL approach for continuous actions
(NFQCA), as well as comparisons to classical controller designs.

We will also make the plants available shortly in our software framework
CLSquare (or earlier, upon request).

Best,

Martin Riedmiller
MLL University of Freiburg, Germany

Re: RL in continuous state-action spaces benchmark Ari 12/20/11 6:20 PM
Also worth mentioning are the domains used in the paper: Binary Action
Search for Learning Continuous-Action Control Policies, by Jason Pazis
and Michail G. Lagoudakis.

As others here have mentioned, one approach in that paper is to use a
common domain but with a different reward structure (which is what they
do with the inverted pendulum).

 - Ari
Re: [rl-list] RL in continuous state-action spaces benchmark Lei Wu 12/20/11 7:11 PM
I'm doing experiments on the rotary single inverted pendulum with these reinforcement learning methods:
CACLA
wire fitting
Ex<a>

These methods are designed for learning in continuous spaces, but I have not finished the experiments yet.

Re: [rl-list] RL in continuous state-action spaces benchmark Arun Tejasvi Chaganty 12/21/11 4:41 AM
Hello,
  I'm not sure why this hasn't been mentioned, but there are several
continuous domains at the RL competition as well:

  * Acrobot - continuous state space, discrete actions
  * Helicopter - 12-dimensional continuous state space, 4-dimensional
continuous action space
  * Octopus - 82-dimensional continuous state space, 32-dimensional
continuous action space

An implementation using RL-glue, with suitable visualisations, etc.
can be found here: http://2009.rl-competition.org/software.php

Cheers,
--
Arun Tejasvi Chaganty
http://arun.chagantys.org/

Re: [rl-list] RL in continuous state-action spaces benchmark Hado van Hasselt 12/21/11 5:10 AM
Hi,

The double-pole cart pole [1] is a good benchmark, similar to the
normal cart pole. An advantage is that it has actually been used quite
a lot, so it is easy to compare any result to previous algorithms.

The Helicopter domain from the RL competition [2,3], as mentioned by
Arun Chaganty just now, is also a good benchmark. However, please note
that the reward should be changed! In the competition, a reward
(penalty, actually) was issued when the helicopter crashed. But this
reward depended on the number of time steps that had passed, which
makes the reward non-Markovian. This may be a reason why we only saw
evolutionary approaches (which operate on whole episodes) and no
temporal-difference algorithms (or other RL techniques) on this domain
in the competition. This is easily fixed by changing the penalty on a
crash to a fixed amount.
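The non-Markovian crash penalty and the fix Hado describes can be illustrated with a toy sketch; the episode length and penalty magnitudes are made up, not the actual competition values.

```python
EPISODE_LENGTH = 6000  # assumed episode length, not the competition's value

def competition_crash_penalty(t):
    # The penalty grows with the remaining time steps, so it depends on the
    # elapsed time t. If t is not part of the state, the reward is
    # non-Markovian from the agent's point of view.
    return -(EPISODE_LENGTH - t) * 100.0

def fixed_crash_penalty(t):
    # The fix: a constant penalty, independent of when the crash happens.
    return -1e6

# The same crash state yields different rewards at different times under
# the competition scheme, but not under the fixed scheme:
assert competition_crash_penalty(10) != competition_crash_penalty(5000)
assert fixed_crash_penalty(10) == fixed_crash_penalty(5000)
```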

I have some (C++) code available for both these domains, if anyone is
interested. This code can be used in combination with the RL C++ code
(rlcpp) on my homepage [4].

Best,
Hado

[1] http://homepages.cwi.nl/~hasselt/papers/RL_in_Continuous_Spaces/Experiment_on_Double_Pole_C.html
[2] http://2009.rl-competition.org/
[3] http://code.google.com/p/rl-competition/
[4] http://homepages.cwi.nl/~hasselt/code.html


Re: [rl-list] RL in continuous state-action spaces benchmark José Antonio Martín H. 12/26/11 3:49 AM
Hi Hado.

I've been able to learn successfully in the Helicopter problem using the
TD method Ex<a> without changing the reward function.

The learning is not of the same quality as the evolutionary approach
that I used in the RL competition, but it works fine to keep the
helicopter safe.

The idea is to use a dedicated learning agent for every actuator and
to select the state variables for each agent carefully.
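This per-actuator decomposition could be sketched as follows; the agent interface, the toy linear policy, and the state-variable indices are all illustrative assumptions, not José's actual implementation.

```python
# One learning agent per actuator, each seeing only a hand-picked slice of
# the state vector. The indices and the placeholder linear policy are
# illustrative; a real agent (e.g. Ex<a> or CACLA) would learn its output.
class ActuatorAgent:
    def __init__(self, state_indices):
        self.state_indices = state_indices

    def act(self, state):
        local_state = [state[i] for i in self.state_indices]
        return -0.1 * sum(local_state)  # toy proportional control, not learned

# Four helicopter actuators, each watching plausibly relevant variables:
agents = [
    ActuatorAgent([0, 3]),  # e.g. longitudinal cyclic: forward velocity, pitch
    ActuatorAgent([1, 4]),  # e.g. lateral cyclic: sideways velocity, roll
    ActuatorAgent([2]),     # e.g. collective: vertical velocity
    ActuatorAgent([5]),     # e.g. tail rotor: yaw rate
]

def joint_action(state):
    """Concatenate each agent's scalar action into the full action vector."""
    return [agent.act(state) for agent in agents]
```

Each agent then faces a one-dimensional continuous action problem over a low-dimensional slice of the state, which is the simplification José exploits.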

I have not tested it with your suggestion for modifying the reward
function but I will do (thanks for the tip!)

Best,
Jose.

--
/ .- .-.. .-.. / -.-- --- ..- / -. . . -.. / .. ... / .-.. --- ...- .
José Antonio Martín H., Ph.D.          E-Mail: jama...@fdi.ucm.es
Computer Science Faculty               Phone: (+34) 91 3947650
Complutense University of Madrid       Fax: (+34) 91 3947527
C/ Prof. José García Santesmases, s/n  28040 Madrid, Spain
web: http://www.fdi.ucm.es/profesor/jamartinh/
El orden es el recurso no renovable    Order is the truly nonrenewable
más importante                         resource.
.-.. --- ...- . / .. ... / .- .-.. .-.. / .-- . / -. . . -..

Re: [rl-list] RL in continuous state-action spaces benchmark shimon....@gmail.com 12/27/11 10:01 AM
We built a Python implementation of the two Helicopter domains used in the RL competitions, as well as an even harder one we designed for our own experiments.  The code is available here:

http://staff.science.uva.nl/~whiteson/helicopter.zip

The following article describes the neuroevolutionary methods we applied to these problems:

http://staff.science.uva.nl/~whiteson/pubs/b2hd-koppejanei11.html

As policy search methods, they do not rely on the Markov property and thus do not require the alteration to the reward function that Hado describes.

Cheers,
Shimon

-------------------------------------------------------------
Shimon Whiteson | Assistant Professor
Intelligent Autonomous Systems Group
Informatics Institute | University of Amsterdam
-------------------------------------------------------------
Science Park 904 | 1098 XH Amsterdam
+31 (0)20.525.8701 | +31 (0)6.3851.0110
http://staff.science.uva.nl/~whiteson
-------------------------------------------------------------