So if I understand it correctly, the cost is approximated quadratically around zero, like this:

(End-to-End Training of Deep Visuomotor Policies, page 30)
Is that done just to simplify the notation? Otherwise the original [x0, u0] would have to be added everywhere.
It is nice that this results in a controller that does not need the previous trajectory, only the state.
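
To check my own understanding, here is how I would write it out. This is only a sketch in my own notation, not copied from the paper: \hat{x}_t, \hat{u}_t stand for the nominal point (the [x0, u0] above), \ell_{xu} and \ell_{xu,xu} for the gradient and Hessian of the cost at that point, and K_t, k_t for the feedback gain and offset of the controller.

```latex
% General second-order expansion around the nominal point (\hat{x}_t, \hat{u}_t)
\ell(x_t, u_t) \approx \ell(\hat{x}_t, \hat{u}_t)
  + \begin{bmatrix} x_t - \hat{x}_t \\ u_t - \hat{u}_t \end{bmatrix}^{\top} \ell_{xu}
  + \frac{1}{2}
    \begin{bmatrix} x_t - \hat{x}_t \\ u_t - \hat{u}_t \end{bmatrix}^{\top}
    \ell_{xu,xu}
    \begin{bmatrix} x_t - \hat{x}_t \\ u_t - \hat{u}_t \end{bmatrix}

% With the nominal point taken to be zero (\hat{x}_t = 0, \hat{u}_t = 0)
\ell(x_t, u_t) \approx
  \frac{1}{2}
    \begin{bmatrix} x_t \\ u_t \end{bmatrix}^{\top}
    \ell_{xu,xu}
    \begin{bmatrix} x_t \\ u_t \end{bmatrix}
  + \begin{bmatrix} x_t \\ u_t \end{bmatrix}^{\top} \ell_{xu}
  + \text{const}

% The controller in deviation form, rewritten as a function of the state only
u_t = \hat{u}_t + k_t + K_t (x_t - \hat{x}_t)
    = K_t x_t + \underbrace{\left( k_t + \hat{u}_t - K_t \hat{x}_t \right)}_{\text{constant offset}}
```

If this is right, the nominal trajectory does not disappear. It is folded into the constant offset, so at run time the controller only has to be evaluated on the current state.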
On Sunday, 13 August 2017 at 19:10:15 UTC-5, Marvin Zhang wrote: