The following papers deal with continuous action spaces and include
some environments you can try.
Strösslin, T., & Gerstner, W. (2003). Reinforcement learning in
continuous state and action space. Artificial Neural Networks - ICANN 2003.
van Hasselt, H., & Wiering, M. A. (2007). Reinforcement learning in
continuous action spaces. 2007 IEEE International Symposium on
Approximate Dynamic Programming and Reinforcement Learning (ADPRL),
272-279.
Cheers,
Marek
Hi,
you could try the cart-pole swing-up (plus balancing). Also, if you
set the sampling period to something coarse (0.1 seconds), then
bang-bang control can run into problems; see the sketch below.
Marc
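For concreteness, here is a minimal C++ sketch of cart-pole dynamics
with an adjustable sampling period. The physical constants, the simple
Euler integration, and the toy bang-bang policy are illustrative
assumptions, not code from this thread.

// Minimal cart-pole dynamics sketch with an adjustable sampling period.
// Constants and the Euler integration are illustrative assumptions.
#include <cmath>
#include <cstdio>

struct CartPole {
    // State: cart position/velocity, pole angle/angular velocity.
    // theta = pi means the pole hangs straight down (swing-up start).
    double x = 0.0, x_dot = 0.0;
    double theta = 3.141592653589793, theta_dot = 0.0;

    // Assumed constants: gravity, cart mass, pole mass, pole half-length.
    static constexpr double g = 9.81, m_c = 1.0, m_p = 0.1, l = 0.5;

    // One Euler step; 'force' is the continuous action and 'dt' is the
    // sampling period Marc refers to.
    void step(double force, double dt) {
        const double s = std::sin(theta), c = std::cos(theta);
        const double tmp =
            (force + m_p * l * theta_dot * theta_dot * s) / (m_c + m_p);
        const double theta_acc =
            (g * s - c * tmp) / (l * (4.0 / 3.0 - m_p * c * c / (m_c + m_p)));
        const double x_acc = tmp - m_p * l * theta_acc * c / (m_c + m_p);
        x += dt * x_dot;          x_dot += dt * x_acc;
        theta += dt * theta_dot;  theta_dot += dt * theta_acc;
    }
};

int main() {
    CartPole cp;
    const double dt = 0.1;  // coarse sampling period: hard for bang-bang
    for (int t = 0; t < 100; ++t) {
        // Toy bang-bang policy: always apply maximum force either way.
        const double u = (cp.theta_dot > 0.0) ? 10.0 : -10.0;
        cp.step(u, dt);
    }
    std::printf("angle after 10 s: %f rad\n", cp.theta);
    return 0;
}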
You could perhaps also try the HIV infection control problem described in:
http://www.montefiore.ulg.ac.be/~ernst/CDC_2006.pdf
It is a rather difficult benchmark for algorithms that learn a state-action value function.
All the best,
Damien
Prof. Damien ERNST
University of Liège - Dpt. of Elec. Eng.,
Building B28, Parking P32, B-4020 Liège, BELGIUM.
Email : der...@ulg.ac.be
Homepage : http://www.montefiore.ulg.ac.be/~ernst/
----- Original Message -----
From: "sarah mehraban" <sarah.m...@gmail.com>
To: rl-...@googlegroups.com
Sent: Tuesday, December 20, 2011, 20:18:47
Subject: [rl-list] RL in continuous state-action spaces benchmark
Hi,
I'm working on a reinforcement learning agent that can learn in
continuous state spaces and produce continuous-valued actions, and
I need a good benchmark to test it.
A very famous one is the cart-pole balancing problem, but in that
problem using discrete actions leads to better results than using
continuous ones. This can be explained by two facts: first, the task
is well suited to bang-bang actions, and second, using only two
actions greatly simplifies the learning problem and reduces the
learning time. Using continuous-valued actions can lead to a better
policy, and I need a benchmark that shows this.
I would really appreciate it if you could introduce me to some.
Thanks.
You can find a couple of challenging continuous control benchmarks,
with detailed descriptions (underwater vehicle, airplane, magnetic
levitation, HVAC), in the following paper:
Hafner, R., & Riedmiller, M. (2011). Reinforcement learning in feedback
control. Machine Learning, 27(1), 55–74. Available online at
http://dx.doi.org/10.1007/s10994-011-5235-x or upon request at
riedm...@informatik.uni-freiburg.de.
There you will find results for our neural RL approach for continuous
actions (NFQCA), as well as comparisons to classical controller designs.
We will shortly also make the plants available in our software framework
CLSquare (or earlier, upon request).
Best,
Martin Riedmiller
MLL University of Freiburg, Germany
--
* Acrobot - continuous state space, discrete actions
* Helicopter - 12-dimensional continuous state space, 4-dimensional
continuous action space
* Octopus - 82-dimensional continuous state space, 32-dimensional
continuous action space
An implementation using RL-Glue, with suitable visualisations, etc.,
can be found here: http://2009.rl-competition.org/software.php
Cheers,
--
Arun Tejasvi Chaganty
http://arun.chagantys.org/
The double-pole cart-pole [1] is a good benchmark, similar to the
normal cart-pole. An advantage is that it has been used quite a lot,
so it is easy to compare new results to previous algorithms.
The Helicopter domain from the RL competition [2,3], as mentioned by
Arun Chaganty just now, is also a good benchmark. However, please note
that the reward should be changed! In the competition, a reward
(a penalty, actually) was issued when the helicopter crashed, but this
penalty depended on the number of time steps that had passed, which
makes the reward non-Markovian. This may be why we only saw
evolutionary approaches (which operate on whole episodes) and no
temporal-difference algorithms (or other RL techniques) on this domain
in the competition. It is easily fixed by changing the penalty on a
crash to a fixed amount, as in the sketch below.
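To make the suggested fix concrete, here is a small C++ sketch; the
function names, the per-step cost form, and the penalty magnitude are
assumptions, not the competition's actual code.

// Sketch of the reward change described above (names and constants are
// assumptions). The point: the crash penalty must depend only on the
// crash event itself, not on the episode's step counter.
#include <vector>

// Assumed per-step cost: squared deviation from the hover state.
double stepReward(const std::vector<double>& state) {
    double cost = 0.0;
    for (double s : state) cost += s * s;
    return -cost;
}

// Non-Markovian (competition-style): the penalty varies with how far
// into the episode the crash occurs, i.e., with a hidden step counter.
double crashRewardTimeDependent(int elapsedSteps, int episodeLength,
                                double worstStepCost) {
    return -(episodeLength - elapsedSteps) * worstStepCost;
}

// Markovian fix: a fixed penalty depending only on the crash itself.
double crashRewardFixed() {
    return -1.0e6;  // assumed magnitude; any fixed constant works
}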
I have some (C++) code available for both these domains, if anyone is
interested. This code can be used in combination with the RL C++ code
(rlcpp) on my homepage [4].
Best,
Hado
[1] http://homepages.cwi.nl/~hasselt/papers/RL_in_Continuous_Spaces/Experiment_on_Double_Pole_C.html
[2] http://2009.rl-competition.org/
[3] http://code.google.com/p/rl-competition/
[4] http://homepages.cwi.nl/~hasselt/code.html
I've been able to learn successfully in the Helicopter problem using the
TD method Ex<a> without changing the reward function.
The learning is not of the same quality as the evolutionary approach
that I used in the RL competition, but it works well enough to keep
the helicopter safe.
The idea is to use a dedicated learning agent for every actuator and
to select the state variables for each agent carefully.
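A rough C++ sketch of that decomposition (the structure and names are
guesses for illustration, not Jose's code): one agent per actuator,
each observing only a hand-picked subset of the state variables, with
a toy linear policy standing in for the learned one (e.g., Ex<a>).

// One agent per actuator, each seeing only selected state variables.
#include <cstddef>
#include <vector>

struct ActuatorAgent {
    std::vector<std::size_t> stateIdx;  // state variables this agent sees
    std::vector<double> weights;        // one weight per selected variable

    // Toy linear policy; a real agent would learn this mapping.
    double act(const std::vector<double>& state) const {
        double a = 0.0;
        for (std::size_t k = 0; k < stateIdx.size(); ++k)
            a += weights[k] * state[stateIdx[k]];
        return a;  // continuous action for this single actuator
    }
};

// Compose the joint continuous action from the independent agents,
// e.g., four agents for the helicopter's 4-dimensional action space.
std::vector<double> jointAction(const std::vector<ActuatorAgent>& agents,
                                const std::vector<double>& state) {
    std::vector<double> action;
    action.reserve(agents.size());
    for (const ActuatorAgent& ag : agents)
        action.push_back(ag.act(state));
    return action;
}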
I have not tested it with your suggestion for modifying the reward
function, but I will (thanks for the tip!).
Best,
Jose.
--
/ .- .-.. .-.. / -.-- --- ..- / -. . . -.. / .. ... / .-.. --- ...- .
José Antonio Martín H., Ph.D.        E-Mail: jama...@fdi.ucm.es
Computer Science Faculty             Phone: (+34) 91 3947650
Complutense University of Madrid     Fax: (+34) 91 3947527
C/ Prof. José García Santesmases, s/n, 28040 Madrid, Spain
web: http://www.fdi.ucm.es/profesor/jamartinh/
Order is the truly nonrenewable resource.
.-.. --- ...- . / .. ... / .- .-.. .-.. / .-- . / -. . . -..
Our code for the helicopter domain is available at
http://staff.science.uva.nl/~whiteson/helicopter.zip
The following article describes the neuroevolutionary methods we applied to these problems:
http://staff.science.uva.nl/~whiteson/pubs/b2hd-koppejanei11.html
As policy search methods, they do not rely on the Markov property and thus do not require the alteration to the reward function that Hado describes.
Cheers,
Shimon
-------------------------------------------------------------
Shimon Whiteson | Assistant Professor
Intelligent Autonomous Systems Group
Informatics Institute | University of Amsterdam
-------------------------------------------------------------
Science Park 904 | 1098 XH Amsterdam
+31 (0)20.525.8701 | +31 (0)6.3851.0110
http://staff.science.uva.nl/~whiteson
-------------------------------------------------------------