mixture of linear regression in Stan

Marco Inacio


Mar 9, 2016, 5:09:48 PM

to stan-...@googlegroups.com

When trying to fit a mixture of two linear regressions in Stan (using the discrete parameter marginalization method), the sampler gets stuck at a single model if the regressions have more than two regressors.

See the attached files:

full.R: mix two regressions with two regressors and one intercept. modelIndex gets stuck in model 2.

simple.R: mix two regressions with one regressor and no intercept. Works fine: modelIndex average is close to 1.5. Also worked fine with one regressor and one intercept.

Note that both files mix two identical regressions, so modelIndex mean should be close to 1.5 in both.
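For readers without the attachments, here is a minimal sketch of what the marginalization computes. It is not the code from full.R or simple.R; it assumes a single model-level index with equal prior weights, and the function name `model_index_probs` is hypothetical. Given each model's log-likelihood at one posterior draw, the probability of each index value follows from a log-sum-exp normalization:

```python
import numpy as np

def model_index_probs(log_lik_per_model, log_prior_weights):
    """Posterior probability of each model index at one draw.

    Both arguments are 1-D arrays with one entry per model
    (hypothetical inputs, not taken from the attached files).
    """
    log_joint = np.asarray(log_lik_per_model) + np.asarray(log_prior_weights)
    # Log-sum-exp, shifted by the maximum for numerical stability.
    m = log_joint.max()
    log_marginal = m + np.log(np.sum(np.exp(log_joint - m)))
    return np.exp(log_joint - log_marginal), log_marginal

log_weights = np.log([0.5, 0.5])

# Two models that fit equally well: both indices are equally likely,
# so the expected index is 1.5.
probs_equal, _ = model_index_probs([-120.0, -120.0], log_weights)
expected_index = float(np.dot(probs_equal, [1, 2]))

# A gap of 10 log-likelihood units pushes almost all mass onto one index.
probs_gap, _ = model_index_probs([-130.0, -120.0], log_weights)
```

With identical log-likelihoods the index averages to 1.5, which is the behavior expected above; a modest gap in log-likelihood between the two parameter sets at a given draw is enough to make the index look stuck.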

This problem also happened with more complex models and with some finite mixtures (the clusters do not switch, which is unexpected since the model is not identified). What is the source of this problem? I think I read something about this a while ago, but I don't remember where.

Also, does anyone know (or know of an article about) how well it works, in terms of convergence for example, to use posterior samples to estimate P(D)? That is, fit each model separately in Stan, estimate P(D) for each, and use those estimates to compute the model weights:

$\frac{1}{P(D)} = \int_{\Theta} \frac{P(\theta|D)}{P(D|\theta)} d\theta \approx \frac{1}{S} \sum_{s=1}^{S} \frac{1}{P(D|\theta_s)}$

(the equality holds because:)

$\int_{\Theta} \frac{P(\theta|D)}{P(D|\theta)} d\theta = \int_{\Theta} \frac{P(D|\theta) P(\theta)}{P(D|\theta) P(D)} d\theta = \int_{\Theta} \frac{P(\theta)}{P(D)} d\theta = \frac{1}{P(D)} \int_{\Theta} P(\theta) d\theta = \frac{1}{P(D)}$
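The estimator above can be tried on a model whose P(D) is known in closed form. The sketch below is an illustration under assumptions that are not part of the original question: a normal likelihood with unit variance, a normal prior on the mean with variance `tau2`, and exact posterior draws in place of Stan output. All function names are hypothetical.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def log_lik(theta, y):
    """log P(D | theta) for each theta, with y_i ~ Normal(theta, 1)."""
    theta = np.atleast_1d(theta)
    return log_normal_pdf(y[None, :], theta[:, None], 1.0).sum(axis=1)

def posterior_params(y, tau2):
    """Conjugate posterior mean and variance for theta ~ Normal(0, tau2)."""
    post_var = 1.0 / (y.size + 1.0 / tau2)
    return post_var * y.sum(), post_var

def exact_log_marginal(y, tau2):
    """log P(D) = log P(D|t) + log P(t) - log P(t|D), evaluated at t = 0."""
    post_mean, post_var = posterior_params(y, tau2)
    return (log_lik(0.0, y)[0]
            + log_normal_pdf(0.0, 0.0, tau2)
            - log_normal_pdf(0.0, post_mean, post_var))

def harmonic_mean_log_marginal(log_lik_draws):
    """log of 1 / mean(1 / P(D|theta_s)), computed on the log scale."""
    neg = -np.asarray(log_lik_draws)
    m = neg.max()
    return -(m + np.log(np.mean(np.exp(neg - m))))

rng = np.random.default_rng(0)
y = rng.normal(0.5, 1.0, size=20)   # simulated data (assumption)
tau2 = 100.0                        # diffuse prior variance (assumption)

exact = exact_log_marginal(y, tau2)
post_mean, post_var = posterior_params(y, tau2)

# Repeat the estimate on independent sets of exact posterior draws.
estimates = []
for _ in range(5):
    draws = rng.normal(post_mean, np.sqrt(post_var), size=4000)
    estimates.append(harmonic_mean_log_marginal(log_lik(draws, y)))

print("exact log P(D):", exact)
print("harmonic mean estimates:", estimates)
```

With a prior much wider than the posterior, as assumed here, the terms 1/P(D|θ_s) are heavy-tailed, so the repeated estimates can disagree with each other and with the exact value even with thousands of draws.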

full.R

simple.R

Michael Betancourt


Mar 9, 2016, 5:12:13 PM

to stan-...@googlegroups.com

This is the harmonic mean estimator and it is the worst. Seriously, it is an atrocious estimator. As is any estimator that tries to compute the marginal likelihood from posterior samples alone.

Marco Inacio


Mar 9, 2016, 7:02:56 PM

to stan-...@googlegroups.com

Thanks. That's sad but also half-expected: it seemed too easy for such a complex literature, but I couldn't find it there without knowing its name.

Should I count nested sampling among the bad methods? Do you have any suggested algorithm for starters?
