kl_div[t] = max(
0,
-0.5 * mu_t.T.dot(M_new - M_prev).dot(mu_t) -
mu_t.T.dot(v_new - v_prev) - c_new + c_prev -
0.5 * np.sum(sigma_t * (M_new-M_prev)) - 0.5 * logdet_new +
0.5 * logdet_prev
)
I can't figure out why the KL divergence can be computed like this. Any help would be appreciated!
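My best guess at where this comes from (which I tried to verify numerically): this looks like the per-timestep KL between two time-varying linear-Gaussian policies p(u|x) = N(Kx + k, Sigma), taken under the joint state-action marginal N(mu_t, sigma_t) with z = [x; u]. If M, v, c are the quadratic coefficients of the policy log-density, i.e. log p(u|x) = -0.5 z'Mz - z'v - c - 0.5 log|Sigma| + const, then applying the Gaussian identity E[z'Az] = mu'A mu + tr(A sigma) to log p_new - log p_prev reproduces the snippet term by term; the max(0, .) just guards against small negative values from numerical error. The sketch below checks both steps with numpy/scipy. The construction of M, v, c and the helper names (quad_params, random_policy) and dimensions are my own assumptions, not taken from the original code:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
dX, dU = 3, 2  # hypothetical state/action dimensions

def quad_params(K, k, pol_covar):
    # Coefficients of log N(u; K x + k, pol_covar) viewed as a quadratic
    # in z = [x; u]:
    #   log p(u|x) = -0.5 z'Mz - z'v - c - 0.5 log|pol_covar| + const
    inv = np.linalg.inv(pol_covar)
    M = np.block([[K.T @ inv @ K, -K.T @ inv],
                  [-inv @ K,      inv]])
    v = np.concatenate([K.T @ inv @ k, -inv @ k])
    c = 0.5 * k @ inv @ k
    return M, v, c

def random_policy():
    K = rng.standard_normal((dU, dX))
    k = rng.standard_normal(dU)
    A = rng.standard_normal((dU, dU))
    return K, k, A @ A.T + dU * np.eye(dU)  # SPD action covariance

K_new, k_new, S_new = random_policy()
K_prev, k_prev, S_prev = random_policy()
M_new, v_new, c_new = quad_params(K_new, k_new, S_new)
M_prev, v_prev, c_prev = quad_params(K_prev, k_prev, S_prev)
logdet_new = np.linalg.slogdet(S_new)[1]
logdet_prev = np.linalg.slogdet(S_prev)[1]

# Step 1: the quadratic form reproduces the conditional log-density exactly.
z = rng.standard_normal(dX + dU)
x, u = z[:dX], z[dX:]
quad = (-0.5 * z @ M_new @ z - z @ v_new - c_new
        - 0.5 * logdet_new - 0.5 * dU * np.log(2 * np.pi))
exact = multivariate_normal.logpdf(u, K_new @ x + k_new, S_new)
assert np.isclose(quad, exact)

# Step 2: E[z'Az] = mu'A mu + tr(A sigma) turns the expected log-density
# difference into the closed form from the snippet (without the max(0, .)
# clamp, since mu_t/sigma_t here are arbitrary rather than the new
# trajectory's own marginal).
mu_t = rng.standard_normal(dX + dU)
B = rng.standard_normal((dX + dU, dX + dU))
sigma_t = 0.1 * (B @ B.T) + np.eye(dX + dU)  # arbitrary SPD joint covariance

closed_form = (-0.5 * mu_t.T.dot(M_new - M_prev).dot(mu_t) -
               mu_t.T.dot(v_new - v_prev) - c_new + c_prev -
               0.5 * np.sum(sigma_t * (M_new - M_prev)) -
               0.5 * logdet_new + 0.5 * logdet_prev)

# Monte Carlo estimate of E_{z ~ N(mu_t, sigma_t)}[log p_new - log p_prev].
zs = rng.multivariate_normal(mu_t, sigma_t, size=200_000)
xs, us = zs[:, :dX], zs[:, dX:]

def logpdf(K, k, S):
    diff = us - xs @ K.T - k
    inv = np.linalg.inv(S)
    return (-0.5 * np.einsum('ni,ij,nj->n', diff, inv, diff)
            - 0.5 * np.linalg.slogdet(S)[1] - 0.5 * dU * np.log(2 * np.pi))

mc_estimate = np.mean(logpdf(K_new, k_new, S_new) -
                      logpdf(K_prev, k_prev, S_prev))
assert abs(closed_form - mc_estimate) < 0.1  # loose Monte Carlo tolerance
```

If (mu_t, sigma_t) is the joint marginal of the new trajectory distribution, this expectation is exactly E_new[log p_new - log p_prev], i.e. the per-timestep contribution to KL(p_new || p_prev), which is why the result is clamped at zero in the original code.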