Re: Questions regarding the Acrobot Implementation

Brian Tanner

May 23, 2012, 11:35:57 AM

to Michael Baumann, Timo Klerx, rl-li...@googlegroups.com, Adam White

Hi Michael. I'm cc'ing the rl-library list because someone else might have something clever to say about this. This is not my code so I can only read it as you are.

However, having said that, I don't completely understand your question.

The loop in the update method that you describe iterates until isTerminal() returns true OR the loop has executed 4 times. In other words, we do up to 4 iterations of the physical state update at each time step, and the isTerminal() loop condition is only there so we can bail out early if we reach the terminal region of the state space.

Ok, now that I am looking through your attachment and the source code (http://code.google.com/p/rl-library/source/browse/trunk/projects/environments/acrobot/src/org/rlcommunity/environments/acrobot/AcrobotState.java?r=1324) again, I see you have added some debug code because the answer to isTerminal() is different right after the loop from what it is after the variables are scaled.

So let me rephrase your concern and you can tell me if you agree. You are concerned that we may exit the while loop because isTerminal()=true and then after scaling the variables, isTerminal() may become false, and this seems like a problem. I tend to agree with you.

Whether this is a big problem or a small one depends on whether isTerminal() gives valid answers when theta1 and theta2 are outside of [-PI,PI]. If it does, then the outcome is really weird. If not, then perhaps we sometimes bail out of the loop too early, because we do not limit the angles to [-PI,PI] before checking the terminal condition inside the loop, but we do limit them after the loop.
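To make the concern concrete, here is a minimal Java sketch. It is not the rl-library code: the height-based terminal test, the unit link lengths, the goal height of 1.0, and the names TerminalCheckSketch and clamp are all my assumptions. It only evaluates that assumed test on the theta values from your debug output, once as they were right after the loop and once after limiting theta1 to [-PI,PI].

```java
// Hypothetical sketch; NOT the rl-library AcrobotState code.
final class TerminalCheckSketch {
    // Assumed values: unit link lengths and a goal height of 1.0.
    static final double L1 = 1.0;
    static final double L2 = 1.0;
    static final double GOAL_HEIGHT = 1.0;

    // Assumed terminal test: height of the tip above the pivot exceeds the goal.
    static boolean isTerminal(double theta1, double theta2) {
        double tipHeight = -(L1 * Math.cos(theta1) + L2 * Math.cos(theta1 + theta2));
        return tipHeight > GOAL_HEIGHT;
    }

    // Limit an angle to [-PI, PI], as the post-loop scaling appears to do.
    static double clamp(double theta) {
        return Math.max(-Math.PI, Math.min(Math.PI, theta));
    }

    public static void main(String[] args) {
        // "before" values taken from the debug output in this thread.
        double theta1 = -3.358119595066313;
        double theta2 = 1.7629735282673185;
        System.out.println("right after the loop: " + isTerminal(theta1, theta2));
        System.out.println("after limiting theta1: " + isTerminal(clamp(theta1), theta2));
    }
}
```

Under these assumptions the first check is true and the second is false, so limiting theta1 after the loop is enough to flip the answer. Whether the real isTerminal() behaves the same way depends on how it actually combines the angles.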

Am I on the right track? What are your thoughts?

Brian Tanner

PhD Student

University of Alberta

br...@tannerpages.com

example.txt

Timo Klerx

May 23, 2012, 12:43:02 PM

to Brian Tanner, Michael Baumann, rl-li...@googlegroups.com, Adam White

Hi Brian,

Thank you for your answer.
Michael has probably gone home already, so I will answer, since I am also involved in this problem.

You are on the right track.
We also thought that isTerminal() could give wrong answers when theta1/2 are outside of [-PI,PI]: the angles are simply subtracted from each other in isTerminal(), so the result of that subtraction can be larger or smaller for out-of-range angles than it is after theta1/2 have been scaled to [-PI,PI].
That is why we thought of moving the scaling of theta1/2 (ll. 110-127) into the while loop, between lines 108 and 109.
We were just unsure whether there was a reason for placing the scaling outside the while loop.
If not, we will simply put the scaling inside the loop; the observed error then no longer occurs.
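
A minimal sketch of the change we have in mind. This is not the actual AcrobotState code: the class name, the step size DT, the placeholder integrateOnce() and the height-based isTerminal() are assumptions made only for illustration. The clamp that sets the angular velocity to zero follows what the debug output suggests (theta 1 limited to -PI, theta 1 dot set to 0.0).

```java
// Hypothetical sketch of the proposed change; NOT the rl-library code.
final class ClampInsideLoopSketch {
    static final double GOAL_HEIGHT = 1.0; // assumed goal height (unit link lengths)
    static final double DT = 0.05;         // assumed integration step

    double theta1, theta2, theta1Dot, theta2Dot;

    ClampInsideLoopSketch(double theta1, double theta2, double theta1Dot, double theta2Dot) {
        this.theta1 = theta1;
        this.theta2 = theta2;
        this.theta1Dot = theta1Dot;
        this.theta2Dot = theta2Dot;
    }

    // Assumed terminal test: tip height above the pivot exceeds the goal.
    boolean isTerminal() {
        return -(Math.cos(theta1) + Math.cos(theta1 + theta2)) > GOAL_HEIGHT;
    }

    // Placeholder for the real dynamics: a plain Euler step without torques.
    void integrateOnce() {
        theta1 += DT * theta1Dot;
        theta2 += DT * theta2Dot;
    }

    // Limit both angles to [-PI, PI]; zero the velocity when a limit is hit.
    void clampAngles() {
        if (Math.abs(theta1) > Math.PI) {
            theta1 = Math.signum(theta1) * Math.PI;
            theta1Dot = 0.0;
        }
        if (Math.abs(theta2) > Math.PI) {
            theta2 = Math.signum(theta2) * Math.PI;
            theta2Dot = 0.0;
        }
    }

    // Runs up to 4 sub-steps and returns how many were executed.
    int update() {
        int count = 0;
        while (!isTerminal() && count < 4) {
            count++;
            integrateOnce();
            clampAngles(); // scaling moved inside the loop
        }
        return count;
    }
}
```

Because nothing modifies the state after the loop, the isTerminal() answer that ends the loop is also the answer the caller sees once update() returns.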

Best regards,
Timo

On 2012-05-23, at 2:36 AM, Michael Baumann wrote:

Dear Brian Tanner,

We are investigating the Acrobot benchmark from the RL-Library (Rev. 1306) and found unexpected behavior: in the update method of class AcrobotState, the loop iterates until the isTerminal condition is met. After the loop, this condition can be destroyed because the state variables are scaled to fit the interval (see attached file). The output of the method (after the terminal condition was destroyed) is:

[...]
count= 4
theta 1 (before - now): -3.358119595066313 -3.141592653589793
theta 2 (before - now): 1.7629735282673185 1.7629735282673185
theta 1 dot (before - now): -1.352284224611665 0.0
theta 2 dot (before - now): 0.8900410793475375 0.8900410793475375
[...]

We now wonder whether this is desired behavior or a bug. From our point of view, we would expect the scaling to happen inside the loop, so that the terminal condition cannot be destroyed afterwards. If the current behavior is intended, could you please briefly explain the reasoning behind it?

Best regards,
Michael Baumann

--
Michael Baumann

International Graduate School "Dynamic Intelligent Systems"
Department of Computer Science - Knowledge-Based Systems
University of Paderborn, Warburger Str. 100, D-33098 Paderborn

Office: O4.149
Email: mbau...@uni-paderborn.de
Phone: +49 (0) 5251 60-3352
Fax: +49 (0) 5251 60-1763
<example.txt>
