A question about the GPS policy update

alba cheng

Dec 17, 2016, 8:57:19 PM
to gps-help
I'm still reading through the code, and I am confused about the GPS policy update. In the paper "End to End...", GPS solves an optimization of the following form:

The lambda * mu_pi term should be taken into account when GPS solves this optimization. The code appears to handle it as follows:
(algorithm_badmm.py, _update_policy)
I then noticed that lambda_K and lambda_k are interesting. They are calculated like this:

My guess is that lambda_K * X + lambda_k is lambda_mu_t, but I still don't understand why they can be calculated this way.
Any help would be appreciated!
[Attached inline images 1-3: screenshots of the formula and code referred to above]

thob....@gmail.com

Feb 18, 2017, 9:00:09 AM
to gps-help
Hi Alba,

If I understand correctly, lambda_K is a higher-moment (covariance) term used to estimate the constraint in the End-to-End paper, but in this case policy_dual_rate_covar is set to 0.0 (in algorithm/config.py), so you can ignore this term.
Therefore target_mu = mu_p - (lambda_k)*(prc^-1)*w_t. If you expand (mu_pi - target_mu)*prc*(mu_pi - target_mu), you recover the original expression in the paper, up to a constant term that does not depend on mu_pi.
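The expansion described above can be checked numerically. The following is only a sketch under assumptions, not code from the GPS repository: the helper names (target_mean, quadratic_to_target, expanded_form) are made up for illustration, prc is assumed symmetric positive definite, lam stands in for lambda_K * X + lambda_k at a single time step, and a factor of 1/2 is put in front of the quadratic. Under those assumptions, the quadratic around target_mu should differ from "quadratic around mu_p plus w_t * lam^T (mu_pi - mu_p)" only by a constant that does not depend on mu_pi.

```python
import numpy as np

def target_mean(mu_p, prc, lam, w_t):
    """Shifted target (hypothetical helper): mu_p - w_t * prc^{-1} * lam."""
    # Solving with (prc / w_t) is the same as multiplying prc^{-1} * lam by w_t.
    return mu_p - np.linalg.solve(prc / w_t, lam)

def quadratic_to_target(mu_pi, tgt_mu, prc):
    """0.5 * (mu_pi - tgt_mu)^T prc (mu_pi - tgt_mu)."""
    d = mu_pi - tgt_mu
    return 0.5 * d @ prc @ d

def expanded_form(mu_pi, mu_p, prc, lam, w_t):
    """0.5 * (mu_pi - mu_p)^T prc (mu_pi - mu_p) + w_t * lam^T (mu_pi - mu_p)."""
    d = mu_pi - mu_p
    return 0.5 * d @ prc @ d + w_t * lam @ d

rng = np.random.default_rng(0)
dU = 3                                   # action dimension (arbitrary choice)
A = rng.standard_normal((dU, dU))
prc = A @ A.T + dU * np.eye(dU)          # symmetric positive definite precision
mu_p = rng.standard_normal(dU)           # trajectory-distribution mean action
lam = rng.standard_normal(dU)            # assumed Lagrange multiplier vector
w_t = 0.7                                # assumed per-time-step weight

# The term left over after expanding; it does not involve mu_pi.
const = 0.5 * w_t**2 * lam @ np.linalg.solve(prc, lam)

tgt_mu = target_mean(mu_p, prc, lam, w_t)
gaps = []
for _ in range(5):
    mu_pi = rng.standard_normal(dU)      # arbitrary candidate policy mean
    gaps.append(quadratic_to_target(mu_pi, tgt_mu, prc)
                - expanded_form(mu_pi, mu_p, prc, lam, w_t))

print("max deviation from constant:", np.max(np.abs(np.array(gaps) - const)))
```

Because the gap is the same for every mu_pi, minimizing the quadratic around target_mu gives the same minimizer as minimizing the quadratic around mu_p with the extra linear multiplier term, which is the point the reply is making. Note that in this form w_t multiplies the linear term.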

Btw, I don't know where w_t (nu) from the paper has gone. On page 10, equation 2, it should be the coefficient of the KL-divergence term, but it disappears from the equation in the image. If it is the coefficient of the KL-divergence term, I don't understand why it appears in the second term of target_mu.