problem 4

4 views
Skip to first unread message

percy...@gmail.com

unread,
Nov 1, 2007, 8:28:47 PM11/1/07
to CS281A: Statistical Learning Theory (Fall 2007)
The way the model is set up, \lambda_k depends on x_i, which is
problematic for having a well-defined generative model of x because
you end up getting cycles in your graphical model. You can do one of
two things:

- Do the problem without the dependence on x_i.
- Do the problem as is, but consider the x_i that \lambda_k depends
on as a separate copy from the one that's generated.

In any case, the EM algorithm can be carried through without any
difficulties, because the E-step conditions on x, so the two
potentials corresponding to the cycle are just treated as two
potentials - the fact that they sum to 1 in one direction or the other
is irrelevant for the posterior computation.

The reason that \lambda_k depends on x_i is that it is useful to have
the dependence in practice (common words will probably have higher
values of \lambda_k for large k).

Will Chang

unread,
Nov 3, 2007, 10:44:14 PM11/3/07
to CS281A: Statistical Learning Theory (Fall 2007)
I have a general question regarding this problem.

I take part (b) to mean that we are given some string of words, and
from it we must deduce the lambdas. Do we also have to deduce the
p_k's? Or are those just derived from the counts of word collocations?

Percy Liang

unread,
Nov 4, 2007, 4:08:05 PM11/4/07
to cs281a...@googlegroups.com
Yes, derive the parameters for the p_k's as part of the EM algorithm...

In practice, because of the problem with maximum likelihood pointed
out in (c), people end up deriving p_k's from coccurrence counts on a
separate set of words.

-Percy

Reply all
Reply to author
Forward
0 new messages