A doubt about the GPS policy update

alba cheng

Dec 17, 2016, 8:57:19 PM

to gps-help

I'm still trying to read the code, and I am confused about the GPS policy update. In the paper "End to End...", GPS solves the optimization given by the following formula:


  The lambda*mu_pi term should be taken into account when GPS solves this optimization. The code seems to be written as follows:
(algorithm_badmm.py _update_policy)
 Then I noticed that lambda_K and lambda_k are interesting. They are calculated like this:

  I guess lambda_K * X + lambda_k is lambda_mu_t, but why it can be calculated like this still puzzles me.
  Any help would be appreciated!

Auto Generated Inline Image 1

Auto Generated Inline Image 2

Auto Generated Inline Image 3

thob....@gmail.com

Feb 18, 2017, 9:00:09 AM

to gps-help

Hi Alba,

If I understand correctly, lambda_K belongs to a higher moment (the covariance) used to estimate the constraint in the "End to End..." paper. In this case, however, policy_dual_rate_covar is set to 0.0 (in algorithm/config.py), so you can drop this term.

Therefore target_mu = mu_p - (lambda_k)*(prc^-1)*w_t. If you expand the term (mu_pi - target_mu)*prc*(mu_pi - target_mu), it becomes the original expression in the paper once you drop the constant terms, i.e. the terms that do not depend on mu_pi.
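
The expansion above can be checked numerically. The following is only a minimal NumPy sketch, not the GPS code: all variable names and the random values are made up for illustration, prc is assumed to be a symmetric positive definite precision matrix, and lam stands in for the multiplier at one time step (lambda_k here, or lambda_K * x + lambda_k if the lambda_K term is kept).

```python
import numpy as np

rng = np.random.default_rng(0)
dU = 3  # action dimension (arbitrary, for illustration only)

# Hypothetical quantities for a single time step.
A = rng.standard_normal((dU, dU))
prc = A @ A.T + dU * np.eye(dU)   # symmetric positive definite precision
mu_p = rng.standard_normal(dU)    # mean action of the trajectory distribution
mu_pi = rng.standard_normal(dU)   # mean action of the policy
lam = rng.standard_normal(dU)     # stand-in for the multiplier (assumption)
w_t = 0.7                         # stand-in for the weight w_t (assumption)

# Shifted target: target_mu = mu_p - w_t * prc^-1 * lam.
target_mu = mu_p - w_t * np.linalg.solve(prc, lam)

d_target = mu_pi - target_mu
d_traj = mu_pi - mu_p

# Left-hand side: quadratic form centered on the shifted target.
lhs = d_target @ prc @ d_target

# Right-hand side: quadratic around mu_p, a term linear in mu_pi,
# and a constant that does not depend on mu_pi.
quadratic = d_traj @ prc @ d_traj
linear = 2.0 * w_t * lam @ d_traj
constant = w_t**2 * lam @ np.linalg.solve(prc, lam)

identity_holds = bool(np.isclose(lhs, quadratic + linear + constant))
print(identity_holds)
```

In other words, minimizing the quadratic around target_mu with respect to mu_pi is the same, up to a constant, as minimizing the quadratic around mu_p plus a term linear in mu_pi whose coefficient is proportional to lam. How the factor w_t in that linear term lines up with the coefficients in the paper is not settled by this check.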

By the way, I don't know where the w_t (nu) from the paper has gone. On page 10, equation 2, it should be the coefficient of the KL-divergence term, but it is missing from the equation in the image. And if it is the coefficient of the KL-divergence term, I don't know why it appears in the second term of target_mu.
