Am I missing something?
~ Will
Non-identifiability generally arises with hidden-variable models and
discrete variables. As an example, note that a mixture of
multinomials is itself a multinomial distribution. Even in the good
old HMM with discrete variables, the parameters are not identifiable
in general. For example, if you permute the identities of the hidden
states (and permute their parameters to match), you get exactly the
same likelihood.
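
To make both points concrete, here is a minimal sketch in plain
Python. The parameter values, the toy data, and the helper name
mixture_log_likelihood are made up for illustration (they are not
from the problem set), and each data point is assumed to be a single
draw from one of three outcomes.

```python
import math

def mixture_log_likelihood(data, weights, components):
    """Log-likelihood of i.i.d. single draws under a mixture of categoricals."""
    total = 0.0
    for x in data:
        # p(x) = sum_z p(z) * p(x | z)
        px = sum(w * comp[x] for w, comp in zip(weights, components))
        total += math.log(px)
    return total

# Hypothetical parameters: 2 hidden states, 3 observable outcomes.
weights = [0.3, 0.7]
components = [[0.5, 0.25, 0.25], [0.1, 0.2, 0.7]]
data = [0, 2, 2, 1, 0, 2]

ll = mixture_log_likelihood(data, weights, components)

# Swap the identities of the two hidden states (and their parameters).
ll_swapped = mixture_log_likelihood(data, weights[::-1], components[::-1])

# Collapse the mixture into one multinomial over the 3 outcomes.
marginal = [sum(w * comp[k] for w, comp in zip(weights, components))
            for k in range(3)]
ll_single = mixture_log_likelihood(data, [1.0], [marginal])

print(ll, ll_swapped, ll_single)  # all three agree up to rounding
```

All three numbers agree up to floating-point rounding, so the data
cannot distinguish these parameter settings from one another.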
In general, the result that EM gives us won't be totally meaningless
because we are still (locally) maximizing the likelihood of the data,
so for the purpose of density estimation (where we only care about
the induced distribution p(x)), we are doing our job (although in
the simple example of problem 3, a much more direct way to do density
estimation is to just use a single multinomial distribution).
One way to address this non-identifiability is to put a prior over
the parameters, which will break ties between parameters that yield
the same likelihood. One can still use EM with a prior in this case;
it then finds a MAP estimate instead of a maximum-likelihood one.
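
As a rough sketch of what EM with a prior could look like for a
mixture of multinomials: the only change from ordinary EM is that the
M-step adds pseudo-counts. Everything below is an assumption made for
illustration, not something from the problem set: a symmetric
Dirichlet prior with a single concentration alpha >= 1 on every
parameter vector, single-draw data points, and the helper names
normalize, log_posterior and map_em_step.

```python
import math

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def log_posterior(data, weights, components, alpha):
    """Log-likelihood plus Dirichlet(alpha) log-prior, up to a constant."""
    ll = sum(math.log(sum(w * c[x] for w, c in zip(weights, components)))
             for x in data)
    lp = sum((alpha - 1) * math.log(p) for p in weights)
    lp += sum((alpha - 1) * math.log(p) for c in components for p in c)
    return ll + lp

def map_em_step(data, weights, components, alpha):
    """One EM iteration with a MAP M-step (assumes alpha >= 1)."""
    K, V = len(weights), len(components[0])
    state_counts = [0.0] * K
    emit_counts = [[0.0] * V for _ in range(K)]
    # E-step: accumulate expected counts from the posterior over hidden states.
    for x in data:
        joint = [weights[k] * components[k][x] for k in range(K)]
        z = sum(joint)
        for k in range(K):
            r = joint[k] / z
            state_counts[k] += r
            emit_counts[k][x] += r
    # M-step: expected counts plus (alpha - 1) pseudo-counts, then renormalize.
    new_weights = normalize([n + alpha - 1 for n in state_counts])
    new_components = [normalize([n + alpha - 1 for n in row])
                      for row in emit_counts]
    return new_weights, new_components

# Hypothetical toy data and starting point.
data = [0, 2, 2, 1, 0, 2, 2, 1]
weights = [0.4, 0.6]
components = [[0.6, 0.2, 0.2], [0.2, 0.3, 0.5]]
alpha = 2.0

history = [log_posterior(data, weights, components, alpha)]
for _ in range(20):
    weights, components = map_em_step(data, weights, components, alpha)
    history.append(log_posterior(data, weights, components, alpha))
```

The values in history should never decrease, since each iteration
improves (or leaves unchanged) the log-posterior. One caveat on the
choice of prior: a symmetric prior like this one scores relabelings
of the hidden states identically, so it can only break ties between
settings that differ by more than a permutation.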
-Percy