The cart-pole problem is often used to illustrate how optimizing a simple performance measure based on a single variable, such as the pole angle, can produce a policy that also controls other variables well, such as the cart position and the cart and pole velocities. To evaluate a learned policy, the cart-pole simulation should be started from multiple states across the state space and the state monitored for some number of steps while the learned policy controls it. This reveals cases in which a learned policy maintains balance but recovers poorly if the pole ever falls; a rough sketch of such an evaluation loop appears below. Examples of both of these points appear in our recent paper:
Faster Reinforcement Learning After Pretraining Deep Networks to Predict State Dynamics. C. Anderson, M. Lee, and D. Elliott.
Proceedings of the 2015 International Joint Conference on Neural Networks, Killarney, Ireland, 2015.
Winner of the Best Overall Paper Award. DOI: 10.1109/IJCNN.2015.7280824
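As a rough illustration of the evaluation procedure described above, here is a minimal sketch in Python. It uses the commonly cited cart-pole dynamics and constants (Barto, Sutton, and Anderson, 1983) rather than anything specific to the paper, and the `evaluate` loop, sampling ranges, step limits, and the placeholder bang-bang policy are all assumptions for illustration, not the method or code from the paper.

```python
import numpy as np

# Standard cart-pole dynamics (Barto, Sutton, and Anderson, 1983),
# integrated with Euler steps.  These constants are commonly used values,
# not necessarily those used in the paper.
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
POLE_HALF_LENGTH = 0.5
FORCE_MAG = 10.0
TAU = 0.02  # seconds per simulation step


def step(state, force):
    """Advance the cart-pole one time step. State is (x, x_dot, theta, theta_dot)."""
    x, x_dot, theta, theta_dot = state
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    temp = (force + MASS_POLE * POLE_HALF_LENGTH * theta_dot**2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t**2 / TOTAL_MASS))
    x_acc = temp - MASS_POLE * POLE_HALF_LENGTH * theta_acc * cos_t / TOTAL_MASS
    return (x + TAU * x_dot,
            x_dot + TAU * x_acc,
            theta + TAU * theta_dot,
            theta_dot + TAU * theta_acc)


def evaluate(policy, n_starts=100, n_steps=1000, seed=0):
    """Roll out a policy from many initial states spread across the state space
    and record how long the pole stays balanced and where the system ends up."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_starts):
        # Initial states sampled broadly, not just near the upright pose.
        state = (rng.uniform(-2.0, 2.0),   # cart position (m)
                 rng.uniform(-1.0, 1.0),   # cart velocity (m/s)
                 rng.uniform(-0.5, 0.5),   # pole angle (rad)
                 rng.uniform(-1.0, 1.0))   # pole angular velocity (rad/s)
        steps_balanced = 0
        for _ in range(n_steps):
            force = FORCE_MAG if policy(state) == 1 else -FORCE_MAG
            state = step(state, force)
            x, _, theta, _ = state
            if abs(theta) > 12 * np.pi / 180 or abs(x) > 2.4:
                break  # pole fell or cart left the track
            steps_balanced += 1
        results.append((steps_balanced, state))
    return results


# Placeholder policy for illustration only: push toward the side the pole leans.
naive_policy = lambda s: 1 if s[2] + 0.5 * s[3] > 0 else 0

for steps, final_state in evaluate(naive_policy, n_starts=5, n_steps=500):
    print(f"balanced for {steps} steps, final state = {np.round(final_state, 3)}")
```

Recording both the number of balanced steps and the final state from each start makes it easy to spot policies that balance from mild starting states but never recover once the pole begins to fall.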
Chuck Anderson