Hi Alba,
If I understand correctly, Lambda_K is a higher-moment (covariance) term used to estimate the constraint in the E-E paper. In this case, however, they set policy_dual_rate_covar = 0.0 (in algorithm/config.py), so you can drop this term.
Therefore, target_mu = mu_p - (lambda_k)*(prc^-1)*w_t. If you then expand the term (mu_pi - target_mu)*prc*(mu_pi - target_mu), it becomes the original expression in the paper once you drop the constant terms (the ones that do not depend on mu_pi).
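
If it helps, here is a small NumPy sketch that checks this expansion numerically. It is only my reading of the formula, not code from the repo: I assume lambda_k is a vector of the same size as mu, w_t is a scalar, and prc is a symmetric positive definite matrix, and all the values are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 3

# Assumed shapes (not taken from the repo): prc is a symmetric positive
# definite precision matrix, lambda_k is a vector, w_t is a scalar.
A = rng.normal(size=(dim, dim))
prc = A @ A.T + dim * np.eye(dim)
mu_p = rng.normal(size=dim)
mu_pi = rng.normal(size=dim)
lambda_k = rng.normal(size=dim)
w_t = 0.7

# target_mu = mu_p - w_t * prc^-1 * lambda_k
target_mu = mu_p - w_t * np.linalg.solve(prc, lambda_k)


def quad(v):
    """Quadratic form v^T prc v."""
    return v @ prc @ v


# Left side: the quadratic written around target_mu.
lhs = quad(mu_pi - target_mu)

# Right side: quadratic around mu_p, plus a term linear in mu_pi,
# plus a constant that does not depend on mu_pi.
d = mu_pi - mu_p
const = w_t**2 * lambda_k @ np.linalg.solve(prc, lambda_k)
rhs = quad(d) + 2.0 * w_t * lambda_k @ d + const

expansion_holds = bool(np.isclose(lhs, rhs))
print(expansion_holds)
```

So, under these assumptions, the quadratic around target_mu equals the quadratic around mu_p plus a term linear in (mu_pi - mu_p) plus a constant, which is why the constant can be dropped when optimizing over mu_pi.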
By the way, I don't know where the w_t (nu) from the paper has gone. On page 10, equation 2, it should be the coefficient of the KL-divergence term, but it disappears in the equation in the image. And if it is the coefficient of the KL-divergence term, why does it appear in the second term of target_mu?