HW1 possible typo?

gregory...@gmail.com

Nov 27, 2007, 2:01:56 AM

to CS281A: Statistical Learning Theory (Fall 2007)

Hi,

Does anyone know if problem 1 has a typo? I think it might make more
sense if we were asked to minimize KL(p(x)||q(x)) instead of
KL(q(x)||p(x)). Because of the asymmetry of the KL divergence, the
optimal updates are probably different depending on which one we want
to minimize. (Just to be clear, I'm assuming KL(a(x)||b(x)) = integral
a(y) log(a(y)/b(y)) dy.)
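
To make the asymmetry concrete, here is a minimal Python sketch using the discrete analogue of the definition above (a sum instead of an integral). The two distributions p and q are made-up examples, not anything taken from the homework.

```python
import math

def kl(a, b):
    """Discrete KL(a || b) = sum_y a(y) * log(a(y) / b(y)), in nats.

    Terms with a(y) == 0 contribute 0, by the usual convention."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

p = [0.9, 0.1]   # hypothetical skewed distribution
q = [0.5, 0.5]   # hypothetical uniform distribution

print(kl(p, q))  # KL(p || q), approx 0.368
print(kl(q, p))  # KL(q || p), approx 0.511
```

Swapping the arguments changes the value, so minimizing one direction over a family of distributions is in general a different problem from minimizing the other.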

tfl
-greg

gregory...@gmail.com

Nov 27, 2007, 2:16:54 AM

to CS281A: Statistical Learning Theory (Fall 2007)

Yeah, okay, never mind. (Though I was under the impression that the
first argument of the KL divergence is typically the "ugly"
distribution. Why is this the case?)

Percy Liang

Nov 27, 2007, 3:35:49 AM

to cs281a...@googlegroups.com

When you do maximum likelihood estimation, you are minimizing
KL(\hat p(x) || p_\theta(x)), where \hat p(x) is your empirical
distribution and p_\theta(x) is your model, which you are trying to
optimize.
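
A small Python sketch of this equivalence, assuming a Bernoulli model p_\theta(x=1) = \theta and a made-up set of ten coin flips (nothing here comes from the homework). Since KL(\hat p || p_\theta) = -H(\hat p) - (1/N) \sum_i \log p_\theta(x_i), and the entropy term does not depend on \theta, the \theta that minimizes the KL is the one that maximizes the average log-likelihood. A grid search finds the same value both ways.

```python
import math

def kl(a, b):
    """Discrete KL(a || b) in nats; terms with a(y) == 0 contribute 0."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

def avg_log_lik(data, theta):
    """(1/N) * sum_i log p_theta(x_i) for a Bernoulli(theta) model."""
    return sum(math.log(theta if x == 1 else 1 - theta) for x in data) / len(data)

data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # hypothetical coin flips
m = sum(data) / len(data)                # sample mean
p_hat = [1 - m, m]                       # empirical distribution over {0, 1}
entropy = -sum(pi * math.log(pi) for pi in p_hat if pi > 0)

thetas = [i / 1000 for i in range(1, 1000)]  # grid over (0, 1)
best_kl = min(thetas, key=lambda t: kl(p_hat, [1 - t, t]))
best_ll = max(thetas, key=lambda t: avg_log_lik(data, t))
print(best_kl, best_ll)

# Check KL(p_hat || p_theta) = -H(p_hat) - avg log-likelihood at some theta.
t = 0.3
print(kl(p_hat, [1 - t, t]), -entropy - avg_log_lik(data, t))
```

Both searches land on the sample mean 0.7, and the last line prints the same number twice (up to floating-point rounding). The empirical distribution is fixed and the model is the thing being optimized, which is why the model sits in the second argument here.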

In variational inference, we are minimizing KL(q(x) || p(x)), where
now the first argument q(x) is what you're trying to optimize.
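
To see why the direction matters for the optimizer (the point of the original question), here is a hedged Python sketch with a hypothetical target p over two correlated binary variables and a fully factorized q(x1, x2) = Bern(x1; a) Bern(x2; b), as in a mean-field approximation. It simply grid-searches (a, b) under each direction of the KL; it is not the update rule from the homework.

```python
import math
from itertools import product

# Hypothetical target p(x1, x2): the two variables usually agree.
p = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

def q_joint(a, b):
    """Fully factorized q(x1, x2) = Bern(x1; a) * Bern(x2; b)."""
    return {(x1, x2): (a if x1 else 1 - a) * (b if x2 else 1 - b)
            for x1, x2 in product((0, 1), repeat=2)}

def kl(r, s):
    """Discrete KL(r || s) over the same keys; zero-mass terms contribute 0."""
    return sum(r[k] * math.log(r[k] / s[k]) for k in r if r[k] > 0)

grid = [i / 100 for i in range(1, 100)]        # a, b in (0, 1)
pairs = [(a, b) for a in grid for b in grid]

forward = min(pairs, key=lambda ab: kl(p, q_joint(*ab)))  # argmin KL(p || q)
reverse = min(pairs, key=lambda ab: kl(q_joint(*ab), p))  # argmin KL(q || p)

print("argmin KL(p||q):", forward)
print("argmin KL(q||p):", reverse)
```

Minimizing KL(p || q) recovers the product of the marginals, (0.5, 0.5), while minimizing KL(q || p) locks onto one of the two modes and lands on (0.25, 0.25) or its mirror image (0.75, 0.75), because q is heavily penalized wherever it puts mass that p considers unlikely.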

-Percy