Cart-pole regulator with neural network approximation


Hiteshi Sharma

Oct 6, 2016, 5:04:54 PM
to rl-...@googlegroups.com
Hello,

Has anyone simulated the benchmark cart-pole balancing problem with function approximation through
neural nets? I was trying to implement it with reference to the paper "Neural Fitted Q Iteration - First
Experiences with a Data Efficient Neural Reinforcement Learning Method" by Martin Riedmiller.

I am confused about the criterion for a successful policy here. Conventionally, a policy for the cart-pole
problem is considered successful if the pole is balanced for >= 100,000 steps, but in this paper a controller
is successful if, at the end of the episode, the pole is still upright and the cart is at its target position 0
within a ±0.05 m tolerance.
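
For concreteness, here is how I currently read the two criteria (a rough sketch only; the pole-angle threshold and the names below are my own assumptions, not taken from the paper):

# Rough sketch of the two success criteria. The pole-angle threshold
# (angle_tolerance) and all names are my own illustrative choices.

def conventional_success(balanced_steps, min_steps=100_000):
    """Conventional criterion: the pole stays balanced for at least min_steps."""
    return balanced_steps >= min_steps

def paper_style_success(final_cart_position, final_pole_angle,
                        position_tolerance=0.05, angle_tolerance=0.1):
    """Paper-style criterion: at episode end the cart is at the target
    position 0 within +/- position_tolerance metres and the pole is still
    upright (here: |angle| below an illustrative angle_tolerance in radians)."""
    return (abs(final_cart_position) <= position_tolerance
            and abs(final_pole_angle) <= angle_tolerance)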

--
Regards,
Hiteshi Sharma.

Martin Riedmiller

Oct 7, 2016, 2:23:47 AM
to rl-...@googlegroups.com
Hi,

The criterion used here is more ambitious and reflects a bit more of what a well-performing
controller should generally achieve: to control the dynamic system to a given setpoint and
keep the values there within a certain tolerance band. This is more closely related to what
one would expect from a 'classical' controller.
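
As a purely illustrative sketch (the tolerance values and window length are placeholders, not the exact numbers from the paper), such a setpoint criterion could be checked over the final part of a trajectory roughly like this:

def holds_setpoint(cart_positions, pole_angles, setpoint=0.0,
                   position_band=0.05, angle_band=0.1, hold_steps=100):
    """Illustrative check: over the last hold_steps of the trajectory the cart
    stays within +/- position_band metres of the setpoint and the pole stays
    within +/- angle_band radians of upright."""
    return (all(abs(x - setpoint) <= position_band for x in cart_positions[-hold_steps:])
            and all(abs(a) <= angle_band for a in pole_angles[-hold_steps:]))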

Regards,
Martin

Martin Riedmiller,
Google DeepMind

Chuck Anderson

Oct 7, 2016, 8:57:08 AM
to Reinforcement Learning Mailing List
The cart-pole problem is often used to illustrate how optimizing a simple performance measure based on a single variable, such as the pole angle, can result in a policy that also optimizes other variables, such as the cart position and the cart and pole velocities. To evaluate a learned policy, the cart-pole simulation should be started in multiple states across the state space and the state monitored for some number of steps while it is controlled by the learned policy. This reveals cases in which a learned policy maintains balance but performs poorly if the pole ever falls. Examples of both of these points appear in our recent paper:

C. Anderson, M. Lee, and D. Elliott. Faster Reinforcement Learning After Pretraining Deep Networks to Predict State Dynamics. Proceedings of the 2015 International Joint Conference on Neural Networks, Killarney, Ireland, 2015. DOI: 10.1109/IJCNN.2015.7280824. Winner of Best Overall Paper Award.
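
For illustration, a multi-start evaluation loop along these lines might look roughly as follows (the environment and policy interfaces, the start-state grids, the state layout, and the thresholds are all assumptions for the sketch, not taken from our paper):

import itertools
import numpy as np

# Sketch only: `policy` maps a state vector to an action, and `make_env(start_state)`
# returns an environment object with reset() and step(action) -> next_state.
# Both interfaces are hypothetical, as are the start-state grids, the assumed state
# layout [cart_pos, cart_vel, pole_angle, pole_vel], and the thresholds.

def evaluate_policy(policy, make_env, num_steps=1000,
                    position_band=0.05, angle_band=0.1):
    """Run the learned policy from a grid of start states and record, for each
    start, whether the pole stayed up and the cart ended near the target."""
    start_positions = np.linspace(-1.0, 1.0, 5)   # illustrative cart positions (m)
    start_angles = np.linspace(-0.2, 0.2, 5)      # illustrative pole angles (rad)
    results = []
    for x0, theta0 in itertools.product(start_positions, start_angles):
        env = make_env(np.array([x0, 0.0, theta0, 0.0]))
        state = env.reset()
        fell = False
        for _ in range(num_steps):
            state = env.step(policy(state))
            if abs(state[2]) > np.pi / 2:         # pole past horizontal: failure
                fell = True
                break
        success = (not fell
                   and abs(state[0]) <= position_band   # cart near target 0
                   and abs(state[2]) <= angle_band)     # pole still upright
        results.append(((x0, theta0), success))
    return results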


Chuck Anderson