Meaning of conditions

freek.geor...@gmail.com

Oct 3, 2017, 3:46:33 AM

to gps-help

Dear Chelsea and Sergey,

First of all, thanks for making your code available!

I am adapting the code to include a TensorFlow policy and more difficult tasks in the Box2D environment.
When I changed the arm's initial position to a randomly selected one, even the trajectory optimization algorithm no longer produced sensible results.

The docstring of AlgorithmTrajOpt seems to suggest that the solution is to use a separate condition for each initial position: "sample_lists: List of SampleList objects for each condition."

1. Is this true? What is the idea behind the conditions?
2. Could having separate trajectories for each condition make trajectory optimization work from different initial positions? Or only if the condition is known?
3. How does having separate trajectories for different conditions aid performance during testing, when the condition is unknown?

My initial guess was that different conditions are really different tasks, and that the random starting position was something that the network could deal with.

freek.geor...@gmail.com

Oct 6, 2017, 5:42:01 AM

to gps-help

So I just read lines 42-43 in agent_box2d.py

    self._worlds = [world(self.x0[i], target, render)
                    for i in range(self._hyperparams['conditions'])]

which seem to indicate that conditions are just different initial positions. Since a single element, self.x0[i], is passed to each world, x0 apparently should not be one array with a value for every joint angle, but rather a list of such arrays, one per condition.
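To make that reading concrete, here is a minimal sketch of a configuration in which x0 is a list with one state array per condition. It is not taken from the GPS code base: StubWorld, make_worlds, and the 'target' and 'render' keys are hypothetical stand-ins, and the state values are made up.

```python
import numpy as np


class StubWorld:
    """Hypothetical stand-in for a Box2D world; it only stores its arguments."""

    def __init__(self, x0, target, render=False):
        self.x0 = np.asarray(x0, dtype=float)
        self.target = np.asarray(target, dtype=float)
        self.render = render


def make_worlds(hyperparams, world_cls=StubWorld):
    """Build one world per condition, each with its own initial state."""
    x0 = hyperparams['x0']
    n_conditions = hyperparams['conditions']
    # Assumption: x0 must hold exactly one initial state per condition.
    if len(x0) != n_conditions:
        raise ValueError('expected %d initial states, got %d'
                         % (n_conditions, len(x0)))
    return [world_cls(x0[i], hyperparams['target'], hyperparams['render'])
            for i in range(n_conditions)]


# Made-up example: three conditions, each a 2-D initial state.
hyperparams = {
    'conditions': 3,
    'x0': [np.array([0.0, 0.0]),
           np.array([0.5, -0.5]),
           np.array([1.0, 0.3])],
    'target': np.array([0.0, 0.0]),
    'render': False,
}
worlds = make_worlds(hyperparams)
```

Note that the length check cannot catch every mistake: a single flat array whose length happens to equal the number of conditions would pass it, and each world would then receive one scalar instead of a full state.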

On Tuesday, October 3, 2017 at 09:46:33 UTC+2, freek.geor...@gmail.com wrote:

freek.geor...@gmail.com

Oct 9, 2017, 9:04:09 AM

to gps-help

So another three days later, I think I've understood why there are different 'conditions' for different initial positions.

A trajectory can only be learned from one fixed starting point to one fixed end point. Trajectories are therefore local, but they are used to guide the global policy. This is why the trajectories for separate conditions need to be managed separately. The resulting policy, however, is global and can be viewed as an abstraction of the local trajectories.
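A toy sketch of that idea, under heavy assumptions: it is not the GPS implementation. Each condition gets its own made-up local linear controller, samples are kept in one list entry per condition, and a plain least-squares fit stands in for the step that trains the global policy on the pooled samples. All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# One start state per condition (hypothetical 2-D states).
x0_per_condition = [np.array([0.0, 1.0]),
                    np.array([1.0, 0.0]),
                    np.array([-1.0, -1.0])]

# One local linear controller per condition: u = K @ x + k (values made up).
local_controllers = [
    (np.array([[-1.0, 0.0]]), np.array([0.1])),
    (np.array([[-0.8, -0.2]]), np.array([0.0])),
    (np.array([[-1.2, 0.1]]), np.array([-0.1])),
]


def sample_condition(x0, K, k, n_samples=50, noise=0.1):
    """Draw states near x0 and label them with the local controller's action."""
    X = x0 + noise * rng.standard_normal((n_samples, x0.size))
    U = X @ K.T + k
    return X, U


# Samples are managed separately: one (states, actions) pair per condition.
sample_lists = [sample_condition(x0, K, k)
                for x0, (K, k) in zip(x0_per_condition, local_controllers)]

# Supervised step: pool all conditions and fit a single global linear policy.
X_all = np.vstack([X for X, _ in sample_lists])
U_all = np.vstack([U for _, U in sample_lists])
X_aug = np.hstack([X_all, np.ones((X_all.shape[0], 1))])  # append bias column
W, *_ = np.linalg.lstsq(X_aug, U_all, rcond=None)


def global_policy(x):
    """Action of the fitted global policy; it takes no condition index."""
    return np.append(x, 1.0) @ W
```

The point of the sketch is the last function: the local controllers are indexed by condition, while global_policy depends only on the state, so it can be queried from a start state that belongs to none of the training conditions.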
