- Do the problem without the dependence on x_i.
- Do the problem as is, but consider the x_i that \lambda_k depends
on as a separate copy from the one that's generated.
In any case, the EM algorithm can be carried through without any
difficulties, because the E-step conditions on x, so the two
potentials corresponding to the cycle are just treated as two
potentials - the fact that they sum to 1 in one direction or the other
is irrelevant for the posterior computation.
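
As a rough sketch of the first option (no dependence on x_i), the E- and M-steps for the lambdas might look like the following. It assumes the model has the form P(x_i | history) = \sum_k \lambda_k p_k(x_i | x_{i-k}) with the p_k's held fixed. The function name `em_mixture_weights` and the dict layout for the p_k's are made up for illustration and are not part of the problem statement.

```python
def em_mixture_weights(words, p, n_iters=50):
    """Estimate lambda_1..lambda_K by EM with the p_k's held fixed.

    words: list of tokens.
    p: list of K dicts (assumed layout); p[k][(prev, cur)] is the
       probability of cur given the word prev that sits k+1 positions back.
    """
    K = len(p)
    lam = [1.0 / K] * K  # start from uniform weights
    for _ in range(n_iters):
        counts = [0.0] * K
        for i in range(K, len(words)):
            # E-step: posterior over which component generated words[i],
            # conditioning on the observed words.
            joint = [lam[k] * p[k].get((words[i - k - 1], words[i]), 0.0)
                     for k in range(K)]
            z = sum(joint)
            if z == 0.0:
                continue  # no component explains this position; skip it
            for k in range(K):
                counts[k] += joint[k] / z
        total = sum(counts)
        if total == 0.0:
            break
        # M-step: the new weights are the normalized expected counts.
        lam = [c / total for c in counts]
    return lam
```

Making \lambda_k depend on a word would amount to keeping one such vector of expected counts per word instead of a single global one.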
\lambda_k depends on x_i because the dependence is useful in
practice (common words will probably have higher values of \lambda_k
for large k).
I take part (b) to mean that we are given some string of words, and
from it we must deduce the lambdas. Do we also have to deduce the
p_k's? Or are those just derived from the counts of word collocations?
In practice, because of the problem with maximum likelihood pointed
out in (c), people end up deriving the p_k's from co-occurrence counts
on a separate set of words.
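
For concreteness, deriving one p_k from co-occurrence counts could be sketched as below. This is only an illustration: it assumes p_k(x_i | x_{i-k}) is the plain relative frequency of the pair at distance k, with no smoothing, and `skip_bigram_probs` is a hypothetical helper name.

```python
from collections import Counter

def skip_bigram_probs(words, k):
    """Relative-frequency estimate of p_k(cur | prev) at distance k."""
    pair_counts = Counter()
    prev_counts = Counter()
    for i in range(k, len(words)):
        prev, cur = words[i - k], words[i]
        pair_counts[(prev, cur)] += 1
        prev_counts[prev] += 1
    # Normalize each pair count by how often prev appeared as a context.
    return {pair: c / prev_counts[pair[0]] for pair, c in pair_counts.items()}
```

The counts would be collected on one set of words and the lambdas then fit on a different one.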
-Percy