Dear Chelsea and Sergey,
First of all, thanks for making your code available!
I am adapting the code to include a TensorFlow policy and more difficult tasks in the Box2D environment.
When I changed the initial position of the arm to a randomly selected one, even the trajectory optimization algorithm no longer produced sensible results.
The docstring of AlgorithmTrajOpt seems to suggest that using a separate condition for each initial position is the solution: "sample_lists: List of SampleList objects for each condition."
1. Is this true? What is the idea behind the conditions?
2. Could having separate trajectories for each condition make trajectory optimization work from different initial positions? Or would that only work if the condition is known?
3. How does having separate trajectories for different conditions aid performance during testing, when the condition is unknown?
My initial guess was that different conditions were really different tasks, and that a random starting position was something the network could learn to deal with.