We don’t do a lot of model comparison in Stan.
I think “transdimensional MCMC” must be Kruschke’s rather non-standard term
for what’s essentially a mixture model where you’re going to look at the mixture
weighting. What I’m guessing Andrew would tell you to do is just make predictions
with the mixture model.
You can code it in Stan for arbitrary models as follows:
In your model, define two local variables lp1 and lp2 for the log probabilities
of each model. Then increment those variables with the log probs of the models.
Then use a mixture model (there’s a chapter in the Stan manual).
In your simple case:
> model {
>   modelIndex ~ categorical(model_probs);
>
>   if (modelIndex == 1) {
>     y ~ some_distribution(m1_params);
>   } else {
>     y ~ some_other_distribution(m2_params);
>   }
> }
it’d look like this (in more traditional stats/CS notation):
parameters {
  real<lower=0,upper=1> lambda;
  …
}
model {
  real lp1;
  real lp2;
  lp1 <- foo_log(y, theta1);
  lp2 <- bar_log(y, theta2);
  increment_log_prob(log_sum_exp(log(lambda) + lp1,
                                 log1m(lambda) + lp2));
}
The key is that you’re setting up a mixture model with mixing ratio lambda:
p(y | theta1,theta2,lambda) = lambda * foo(y | theta1) + (1 - lambda) * bar(y | theta2).
The rest is just algebra, keeping in mind that we’re working on the log scale and
log_sum_exp(a,b) = log(exp(a) + exp(b))
and
log1m(c) = log(1 - c)
Those forms are much more arithmetically stable.
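To see why the stable forms matter, here's a quick sketch in Python (not Stan) of what the built-ins compute; the naive formula log(exp(a) + exp(b)) fails for the large negative log probabilities that come up in practice, because exp() underflows to zero:

```python
import math

def log_sum_exp(a, b):
    # Shift by the max before exponentiating so exp() can't underflow
    # both terms to zero (or overflow for large positive arguments).
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log1m(c):
    # log(1 - c), accurate even when c is very close to 0.
    return math.log1p(-c)

# The naive version would compute log(exp(-1000) + exp(-1001));
# exp(-1000) underflows to 0.0, so that's log(0) = -inf.
print(math.exp(-1000.0))              # 0.0
print(log_sum_exp(-1000.0, -1001.0))  # about -999.687
```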
This should be much more efficient than Gibbs sampling a discrete model indicator, because of the Rao-Blackwell theorem (which, roughly summarized, says you're better off computing an expectation analytically than estimating it by sampling).
If your models are more complicated, you need to set lp1 and lp2 to
their respective log probs. It's probably easiest to initialize lp1 and lp2 to 0
and then increment them for each sampling statement.
If you have more than 2 models, you have to declare lambda as a simplex[K],
declare lp as a vector[K], increment lp as usual for each model,
then use the vector form of log_sum_exp, which will just be:
increment_log_prob(log_sum_exp(log(lambda) + lp));
because log() is vectorized and + is just vector arithmetic. In fact, it might be
easier to follow if you do it this way even for two models:
parameters {
  simplex[2] lambda;
  …
}
model {
  vector[2] lp;
  lp[1] <- foo_log(y, theta1);
  lp[2] <- bar_log(y, theta2);
  increment_log_prob(log_sum_exp(log(lambda) + lp));
}
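The arithmetic in that last line is easy to check outside Stan. Here's a Python sketch (with made-up log probabilities) showing that the vectorized form, log_sum_exp(log(lambda) + lp), matches the explicit two-component formula from above:

```python
import math

def log_sum_exp(v):
    # Vector version: shift by the max for numerical stability.
    m = max(v)
    return m + math.log(sum(math.exp(x - m) for x in v))

lp  = [-12.3, -10.7]   # hypothetical log probs under models 1 and 2
lam = [0.4, 0.6]       # simplex of mixing weights

# Vectorized form: log(lambda) + lp elementwise, then log_sum_exp.
vec_form = log_sum_exp([math.log(l) + p for l, p in zip(lam, lp)])

# Explicit two-component form: log(lambda) + lp1 and log1m(lambda) + lp2.
explicit = log_sum_exp([math.log(lam[0]) + lp[0],
                        math.log1p(-lam[0]) + lp[1]])

print(vec_form)   # the two agree to floating-point precision
print(explicit)
```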
I should put this in the manual!
- Bob