The tricky calculations: u = Kx + k and the Taylor expansions of the cost and its derivatives

Lanhai Liu

Jul 31, 2017, 5:41:58 AM
to gps-help
In the end-to-end paper, the dynamics become
x_{t+1} = fx * x_t + fu * u_t + fc
and the controller becomes
u = Kx + k

In the original iLQG, I think these are instead written in terms of deviations:
Δx_{t+1} = fx Δx_t + fu Δu_t
and
Δu = KΔx + k

The only way I can reconcile the two sets of equations is if
the nominal x_t and u_t are somehow always kept at 0.
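
To make the comparison concrete, here is a small numeric sketch of another possible reading (my own, not taken from the paper or the GPS code). It assumes a nominal point (x_hat, u_hat, x_next_hat) and checks that the deviation form and the absolute form describe the same map once the nominal point is folded into the offsets k and fc. All names and the random values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dX, dU = 4, 2

# Hypothetical nominal point and deviation-form quantities (random, for illustration).
x_hat = rng.normal(size=dX)
u_hat = rng.normal(size=dU)
x_next_hat = rng.normal(size=dX)
K = rng.normal(size=(dU, dX))
k_dev = rng.normal(size=dU)
fx = rng.normal(size=(dX, dX))
fu = rng.normal(size=(dX, dU))

# Fold the nominal point into the constant terms.
k_abs = k_dev + u_hat - K @ x_hat            # so that u = K x + k_abs
fc = x_next_hat - fx @ x_hat - fu @ u_hat    # so that x' = fx x + fu u + fc

# Compare both forms at an arbitrary (non-zero) state and control.
x = rng.normal(size=dX)
u = rng.normal(size=dU)

u_from_deviation = u_hat + K @ (x - x_hat) + k_dev
u_from_absolute = K @ x + k_abs
gap_u = np.max(np.abs(u_from_deviation - u_from_absolute))

x_next_from_deviation = x_next_hat + fx @ (x - x_hat) + fu @ (u - u_hat)
x_next_from_absolute = fx @ x + fu @ u + fc
gap_x = np.max(np.abs(x_next_from_deviation - x_next_from_absolute))

print(gap_u, gap_x)
```

If this reading is right, the k in u = Kx + k is not the same number as the k in Δu = KΔx + k, and x_t, u_t do not have to be 0.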

At the bottom of page 30, the paper says:
we assume that all Taylor expansions here are recentered around zero.
But I cannot find how this is done.
And what exactly is recentered? The dynamics F, Fx, Fu, or the cost l, lx, lxx?

In the source code, I found that the dynamics are fitted under the model
x_{t+1} = fx * x_t + fu * u_t + fc
so fx and fu can be obtained by linear regression.
So I think F, Fx, Fu are not being "recentered around zero"?
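
For reference, this is a minimal sketch of how fx, fu and fc could be fitted by ordinary least squares from samples that are not centered at zero. It is my own illustration with synthetic data and hypothetical names. The actual GPS code may do something more elaborate, so please do not read it as the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
dX, dU, N = 3, 2, 200

# Hypothetical ground-truth affine dynamics, used only to generate samples.
fx_true = rng.normal(size=(dX, dX))
fu_true = rng.normal(size=(dX, dU))
fc_true = rng.normal(size=dX)

# Samples are deliberately NOT centered at zero.
X = rng.normal(loc=2.0, size=(N, dX))
U = rng.normal(loc=-1.0, size=(N, dU))
noise = 1e-3 * rng.normal(size=(N, dX))
X_next = X @ fx_true.T + U @ fu_true.T + fc_true + noise

# Regress x_{t+1} on [x_t, u_t, 1]; the column of ones absorbs the offset fc.
A = np.hstack([X, U, np.ones((N, 1))])
W, *_ = np.linalg.lstsq(A, X_next, rcond=None)  # W has shape (dX + dU + 1, dX)

fx = W[:dX].T
fu = W[dX:dX + dU].T
fc = W[-1]

print(np.max(np.abs(fx - fx_true)), np.max(np.abs(fc - fc_true)))
```

In this form the constant fc carries the information about where the samples sit, which is why the fit works directly on x_t and u_t rather than on deviations.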

The cost function, on the other hand, does seem to be recentered, as in algorithm.py:
lx(0) = lx(x0) + lxx(x0)(0-x0)
l(0) = l(x0) + lx(x0)(0-x0) + 1/2 * lxx(x0)(0-x0)^2
So I think the costs are recentered, or rather replaced by other values l(0), lx(0).
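
Here is a small check of what these two formulas do, as I understand them. It is a sketch with made-up numbers (x0, l0, lx0, lxx are hypothetical), not the code from algorithm.py. It compares the quadratic expansion written about x0 with the same expansion rewritten in powers of x itself.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3

# Hypothetical expansion point and Taylor coefficients at x0.
x0 = rng.normal(size=d)
l0 = 1.7
lx0 = rng.normal(size=d)
M = rng.normal(size=(d, d))
lxx = M @ M.T  # symmetric positive semi-definite Hessian

def expansion_about_x0(x):
    """Quadratic model in the deviation dx = x - x0."""
    dx = x - x0
    return l0 + lx0 @ dx + 0.5 * dx @ lxx @ dx

# Recenter: value and gradient of the SAME quadratic model at x = 0.
r = 0.0 - x0
l_zero = l0 + lx0 @ r + 0.5 * r @ lxx @ r
lx_zero = lx0 + lxx @ r

def expansion_about_zero(x):
    """The recentered model, written directly in x (no deviation)."""
    return l_zero + lx_zero @ x + 0.5 * x @ lxx @ x

x = rng.normal(size=d)
gap = abs(expansion_about_x0(x) - expansion_about_zero(x))
print(gap)
```

If the gap is zero up to rounding, then the recentering does not change the quadratic model at all; it only rewrites it so that it can be used with x instead of Δx.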

In iLQG, we first need
fx, fu, lx, lu, lxx, lux, luu
to compute the control update Δu in a backward pass;
K and k are just terms used to compute Δu.

In the GPS paper, by contrast,
we replace lx, lu with lx(0), lu(0),
get fx, fu by linear regression,
then run the same backward pass
and compute K and k.

Then we get a control signal
u = Kx + k
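
As far as I can tell, the backward pass itself does not care whether the variables are deviations or absolute values, as long as the constant terms (fc and the recentered lx, lu) are carried along. Below is a minimal sketch of such a pass on a random toy problem. It is written by me and is not copied from the GPS code; the names Fm, fv, Cm, cv, lqr_backward and total_cost are my own assumptions, with Fm = [fx fu], fv = fc, Cm the cost Hessian and cv the recentered cost gradient.

```python
import numpy as np

def lqr_backward(Fm, fv, Cm, cv):
    """Backward pass in absolute coordinates (sketch).

    Dynamics: x_{t+1} = Fm[t] @ [x_t; u_t] + fv[t]
    Cost:     0.5 * y^T Cm[t] y + cv[t]^T y, with y = [x_t; u_t]
    Returns K, k such that u_t = K[t] @ x_t + k[t].
    """
    T, dX = fv.shape
    dU = Fm.shape[2] - dX
    ix, iu = slice(0, dX), slice(dX, dX + dU)
    K = np.zeros((T, dU, dX))
    k = np.zeros((T, dU))
    Vxx = np.zeros((dX, dX))  # value function Hessian at t + 1
    Vx = np.zeros(dX)         # value function gradient at zero, at t + 1
    for t in reversed(range(T)):
        # Q-function: the offset fv[t] only enters the linear term.
        Qtt = Cm[t] + Fm[t].T @ Vxx @ Fm[t]
        Qt = cv[t] + Fm[t].T @ (Vx + Vxx @ fv[t])
        # Minimize over u: u = -Quu^{-1} (Qux x + Qu).
        K[t] = -np.linalg.solve(Qtt[iu, iu], Qtt[iu, ix])
        k[t] = -np.linalg.solve(Qtt[iu, iu], Qt[iu])
        Vxx = Qtt[ix, ix] + Qtt[ix, iu] @ K[t]
        Vx = Qt[ix] + Qtt[ix, iu] @ k[t]
        Vxx = 0.5 * (Vxx + Vxx.T)  # keep symmetric against rounding
    return K, k

def total_cost(x0, us, Fm, fv, Cm, cv):
    """Roll out an open-loop control sequence and sum the cost."""
    x, J = x0, 0.0
    for t in range(len(us)):
        y = np.concatenate([x, us[t]])
        J += 0.5 * y @ Cm[t] @ y + cv[t] @ y
        x = Fm[t] @ y + fv[t]
    return J

# Random toy problem in which neither x0 nor the offsets are zero.
rng = np.random.default_rng(3)
T, dX, dU = 5, 3, 2
Fm = 0.5 * rng.normal(size=(T, dX, dX + dU))
fv = rng.normal(size=(T, dX))
A = rng.normal(size=(T, dX + dU, dX + dU))
Cm = A @ A.transpose(0, 2, 1) + np.eye(dX + dU)  # positive definite
cv = rng.normal(size=(T, dX + dU))

K, k = lqr_backward(Fm, fv, Cm, cv)

# Closed-loop rollout with u = K x + k on the absolute state.
x0 = rng.normal(size=dX)
x, us = x0, []
for t in range(T):
    u = K[t] @ x + k[t]
    us.append(u)
    x = Fm[t] @ np.concatenate([x, u]) + fv[t]
us = np.array(us)

J_opt = total_cost(x0, us, Fm, fv, Cm, cv)
J_perturbed = total_cost(x0, us + 1e-2 * rng.normal(size=us.shape), Fm, fv, Cm, cv)
print(J_opt <= J_perturbed)
```

This is only a numerical sanity check on a linear-quadratic toy problem, not the detailed proof I am asking about.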

Judging from the results of GPS, this evidently works.
But is there a detailed proof of why it does?

My other question is about the recentering of the cost and its derivatives:
lx(0) = lx(x0) + lxx(x0)(0-x0) 
l(0) = l(x0) + lx(x0)(0-x0) + 1/2 * lxx(x0)(0-x0)^2

Why compute lx(0) and l(0) by Taylor expansion?
The original l, lx, lxx are all computed by functions in cost_utils.py.
If we want l(0) and lx(0), we could just call the evallogl2term function.

So what is the difference between "recentering" and just computing it with evallogl2term(0)?
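
To show what I mean by the two options, here is a sketch with a hypothetical non-quadratic cost, log(1 + x^T x). It is only a stand-in and is not evallogl2term; the functions cost, cost_grad and cost_hess and the point x0 are my own assumptions. Option (a) recenters the expansion taken at x0, option (b) evaluates the true cost and gradient at zero.

```python
import numpy as np

def cost(x):
    """Hypothetical non-quadratic cost (a stand-in, not evallogl2term)."""
    return np.log(1.0 + x @ x)

def cost_grad(x):
    return 2.0 * x / (1.0 + x @ x)

def cost_hess(x):
    s = 1.0 + x @ x
    return 2.0 * np.eye(x.size) / s - 4.0 * np.outer(x, x) / s**2

# Expansion point, e.g. where a sample actually is.
x0 = np.array([1.0, -0.5])
l0, lx0, lxx0 = cost(x0), cost_grad(x0), cost_hess(x0)

# (a) Recenter the quadratic expansion taken at x0.
r = -x0
l_recentered = l0 + lx0 @ r + 0.5 * r @ lxx0 @ r
lx_recentered = lx0 + lxx0 @ r

# (b) Evaluate the true cost and gradient directly at zero.
zero = np.zeros_like(x0)
l_direct = cost(zero)
lx_direct = cost_grad(zero)

# The recentered model, evaluated back at x0, still matches the true cost there.
model_at_x0 = l_recentered + lx_recentered @ x0 + 0.5 * x0 @ lxx0 @ x0

print(l_recentered, l_direct)
print(lx_recentered, lx_direct)
print(model_at_x0, l0)
```

In this toy case the two options give different numbers: (a) seems to be the local model around x0 merely written in coordinates centered at zero, while (b) would be a different local model, valid near the origin. For an exactly quadratic cost the two would coincide. I would like to confirm whether this is the intended distinction.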