problem 4

percy...@gmail.com

unread,

Nov 1, 2007, 8:28:47 PM11/1/07

to CS281A: Statistical Learning Theory (Fall 2007)

The way the model is set up, \lambda_k depends on x_i, which is
problematic for having a well-defined generative model of x because
you end up getting cycles in your graphical model. You can do one of
two things:

- Do the problem without the dependence on x_i.
- Do the problem as is, but consider the x_i that \lambda_k depends
on as a separate copy from the one that's generated.

In any case, the EM algorithm can be carried through without any
difficulties, because the E-step conditions on x, so the two
potentials corresponding to the cycle are just treated as two
potentials - the fact that they sum to 1 in one direction or the other
is irrelevant for the posterior computation.

The reason that \lambda_k depends on x_i is that it is useful to have
the dependence in practice (common words will probably have higher
values of \lambda_k for large k).

Will Chang

unread,

Nov 3, 2007, 10:44:14 PM11/3/07

to CS281A: Statistical Learning Theory (Fall 2007)

I have a general question regarding this problem.

I take part (b) to mean that we are given some string of words, and
from it we must deduce the lambdas. Do we also have to deduce the
p_k's? Or are those just derived from the counts of word collocations?

Percy Liang

unread,

Nov 4, 2007, 4:08:05 PM11/4/07

to cs281a...@googlegroups.com

Yes, derive the parameters for the p_k's as part of the EM algorithm...

In practice, because of the problem with maximum likelihood pointed
out in (c), people end up deriving p_k's from coccurrence counts on a
separate set of words.

-Percy

Reply all

Reply to author

Forward