Dear Brian Tanner,
we investigated the Acrobot benchmark from the RL-Library
(Rev. 1306) and found unexpected behavior: in the
update method of class AcrobotState, the loop iterates
until the isTerminal condition is met. After the loop,
this condition can be invalidated again, because the state
variables are scaled to fit the interval (see attached
file). The output of the method (after the terminal
condition was invalidated) is:
[...]
count= 4
theta 1 (before - now): -3.358119595066313 -3.141592653589793
theta 2 (before - now): 1.7629735282673185 1.7629735282673185
theta 1 dot (before - now): -1.352284224611665 0.0
theta 2 dot (before - now): 0.8900410793475375 0.8900410793475375
[...]
We now wonder whether this is desired behavior or a bug.
From our point of view, we would expect the scaling to
happen inside the loop, so that the terminal condition
cannot be invalidated afterwards. If this behavior is
intended, could you please briefly explain the reasoning
behind it?
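
To illustrate what we mean, here is a minimal Python sketch. It is not the RL-Library code: it assumes the textbook Acrobot tip-height test with a goal height of 1.0, and it assumes the scaling step clamps theta 1 to [-pi, pi] and zeroes its velocity when clamping occurs, as the values above suggest. The names GOAL_HEIGHT, is_terminal, and clamp are hypothetical.

```python
import math

GOAL_HEIGHT = 1.0  # assumed goal height, not taken from the RL-Library source


def is_terminal(theta1, theta2):
    # Textbook Acrobot tip-height test (an assumption about the real check):
    # the episode ends once the tip rises above the goal line.
    return -(math.cos(theta1) + math.cos(theta1 + theta2)) > GOAL_HEIGHT


def clamp(theta1, theta1_dot):
    # Assumed scaling step: clamp theta1 to [-pi, pi] and zero the velocity
    # if the angle had to be clamped.
    if abs(theta1) > math.pi:
        return math.copysign(math.pi, theta1), 0.0
    return theta1, theta1_dot


# "before" values from the reported output.
theta1, theta2 = -3.358119595066313, 1.7629735282673185
theta1_dot = -1.352284224611665

terminal_before = is_terminal(theta1, theta2)
theta1_scaled, theta1_dot_scaled = clamp(theta1, theta1_dot)
terminal_after = is_terminal(theta1_scaled, theta2)

print("terminal before scaling:", terminal_before)
print("terminal after scaling: ", terminal_after)
```

Under these assumptions the check holds for the "before" values and no longer holds for the "now" values. Applying the clamp inside the loop, before each isTerminal test, would keep the two consistent.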
Best regards,
Michael Baumann
--
Michael Baumann
International Graduate School "Dynamic Intelligent Systems"
Department of Computer Science - Knowledge-Based Systems
University of Paderborn, Warburger Str. 100, D-33098 Paderborn
Office: O4.149
Email: mbau...@uni-paderborn.de
Phone: +49 (0) 5251 60-3352
Fax: +49 (0) 5251 60-1763
<example.txt>