> On Oct 14, 2016, at 4:45 PM, Haziq Jamil <haziq...@gmail.com> wrote:
>
> On Friday, 14 October 2016 16:45:20 UTC+1, Bob Carpenter wrote:
> You can only fit a hard spike-and-slab variable selection model
> in Stan with small numbers of variables.
>
> It's also very hard to fit in JAGS or other Gibbs samplers.
> They will tend to get stuck in modes and not mix well on the
> Bernoulli variate indicating if a variable is selected or not.
> The true posterior is not going to be 100% on or off (that is,
> full Bayes won't do variable selection), and there are likely
> to be many modes. You can see that analytically if you write out
> the posterior for the value of a coefficient.
>
> In a Gibbs sampling procedure for the original model above, the conditional densities are Gaussian for the betas and Bernoulli for the gammas. I'm not sure what you mean by multimodal.
The usual kind: the posterior has a local mode for each
configuration of selected variables.
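To spell out the algebra (just a sketch, in the beta/gamma
notation from your message): marginally,

    p(beta_j | y) = sum_gamma p(gamma | y) p(beta_j | gamma, y),

a mixture over all 2^p indicator configurations. The terms with
gamma_j = 0 contribute a spike at zero, and each configuration of
the remaining indicators contributes its own slab component. With
correlated predictors those slabs sit at different locations, so
the marginal posterior of a single coefficient is multimodal.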
> Also, when you say the Bernoulli variates get stuck and don't mix well (e.g. gamma_j keeps being sampled as 1, 1, 1, 1, ... with some occasional zeroes), surely the model is telling us that this variable is important and shouldn't be thrown out?
What happens is that the chain can report that because it gets
stuck, not because Pr[gamma_j = 1 | data] = 1 in the true posterior.
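For small p you can check this directly by enumerating all 2^p
models and computing the exact posterior. Here's a minimal sketch
under my own toy assumptions (hard spike and slab, known noise
variance sigma2, N(0, tau2) slabs, flat prior over models); none
of this is code from the thread:

    # Toy setup: p = 3, two nearly collinear predictors plus noise.
    import itertools
    import numpy as np
    from scipy.stats import multivariate_normal

    rng = np.random.default_rng(1)
    n, p, sigma2, tau2 = 50, 3, 1.0, 4.0

    x1 = rng.normal(size=n)
    X = np.column_stack([x1,                              # true predictor
                         x1 + 0.05 * rng.normal(size=n),  # near-duplicate
                         rng.normal(size=n)])             # pure noise
    y = X[:, 0] + np.sqrt(sigma2) * rng.normal(size=n)

    models, logml = [], []
    for gamma in itertools.product([0, 1], repeat=p):
        Xg = X[:, np.array(gamma, dtype=bool)]
        # beta integrated out: y | gamma ~ N(0, sigma2 I + tau2 Xg Xg')
        cov = sigma2 * np.eye(n) + tau2 * (Xg @ Xg.T)
        models.append(gamma)
        logml.append(multivariate_normal.logpdf(y, np.zeros(n), cov))

    w = np.exp(np.array(logml) - max(logml))
    w /= w.sum()  # flat prior: these are posterior model weights
    for j in range(p):
        incl = sum(wi for g, wi in zip(models, w) if g[j])
        print(f"Pr[gamma_{j + 1} = 1 | y] = {incl:.3f}")

For the two near-duplicate columns the exact inclusion
probabilities typically come out well below 1, because the
posterior splits its mass between the two ways of explaining the
same signal. That split mass is exactly what a stuck chain never
visits.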
> If you mean the betas don't mix well, then sure, that's a problem. But the Gibbs conditionals seem to be nice distributions, so I'm failing to see how this can be an issue.
:-)  Gibbs is scale invariant but not rotation invariant. When
you introduce correlations among your parameters (such as the
strong correlation between the indicators of correlated
variables), it can fail to mix. There really is a huge literature
on this, as I said before; it's mainly about the kinds of phase
transitions that happen among the modes.
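Here's the simplest demonstration I can write down: a George &
McCulloch-style SSVS Gibbs sampler (continuous spike, so the
conditionals really are the nice Gaussian and Bernoulli ones you
describe) on two nearly collinear predictors. Again a toy of my
own, with made-up values for v0, v1, and the design:

    # SSVS: beta_j ~ (1 - gamma_j) N(0, v0) + gamma_j N(0, v1),
    # v0 tiny (spike), v1 large (slab), gamma_j ~ Bernoulli(pi).
    import numpy as np

    rng = np.random.default_rng(7)
    n, sigma2, v0, v1, pi = 100, 1.0, 0.001, 4.0, 0.5

    x1 = rng.normal(size=n)
    X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=n)])
    y = X[:, 0] + np.sqrt(sigma2) * rng.normal(size=n)

    p = X.shape[1]
    gamma = np.ones(p, dtype=int)
    XtX, Xty = X.T @ X, X.T @ y
    draws = []
    for it in range(5000):
        # beta | gamma, y: conjugate Gaussian update
        prec = XtX / sigma2 + np.diag(1.0 / np.where(gamma == 1, v1, v0))
        V = np.linalg.inv(prec)
        beta = rng.multivariate_normal(V @ (Xty / sigma2), V)
        # gamma_j | beta_j: Bernoulli with spike-vs-slab density odds
        for j in range(p):
            log_odds = (np.log(pi / (1 - pi)) + 0.5 * np.log(v0 / v1)
                        + 0.5 * beta[j] ** 2 * (1 / v0 - 1 / v1))
            gamma[j] = rng.random() < 1 / (1 + np.exp(-log_odds))
        draws.append(gamma.copy())

    draws = np.array(draws)
    print("marginal means:", draws.mean(axis=0))
    print("flip rate:", np.mean(np.any(draws[1:] != draws[:-1], axis=1)))

The two indicator configurations (1, 0) and (0, 1) explain the
data about equally well, but moving between them requires passing
through states the conditionals make very improbable, so the flip
rate is typically tiny and the marginal means are untrustworthy
for any feasible run length.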
> The posterior mean of the gamma_j samples is interpreted as the posterior inclusion probability of the corresponding variable X_j (Ntzoufras, 2011). So yes, agreed that Bayes doesn't turn variables 100% on or off, but inference on variable or model selection is done based on these probabilities, and one could choose the model with the highest posterior probability or the median probability model (Barbieri and Berger, 2004).
Assuming you can put a prior on models that makes sense.
Usually it doesn't, and all this model comparison and model
weighting stuff is pretty useless.
And your choices of which variables are on or off are not
independent. So you can't just look at the marginals. You
get another combinatorial problem there.
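A hypothetical concrete case: with two near-duplicate predictors,
the joint posterior might put 0.45 on {x1 only} and 0.45 on {x2
only}. Each marginal inclusion probability is then about 0.5, and
thresholding the marginals hands you either {x1, x2} or the empty
model, the two configurations the posterior likes least.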
> There's a vast MCMC literature on the problems posed by
> combinatorial models like variable selection, with Ising spin models
> making up the bulk of that because they have real physical
> analogues.
I would like to emphasize this point. Seriously, check out Ising
models; they have exactly the same problems you're going to run
into.
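For what that looks like, here's a minimal single-site Gibbs
sampler on a 2-D Ising model below the critical temperature
(another toy of my own, just to make the analogy concrete):

    # Ferromagnetic Ising on an L x L torus; here beta is the
    # inverse temperature, not a regression coefficient, and
    # beta > beta_c ~ 0.44 puts us in the ordered phase with
    # two magnetization modes.
    import numpy as np

    rng = np.random.default_rng(0)
    L, beta, sweeps = 16, 0.6, 200

    spins = np.ones((L, L), dtype=int)  # start in the all-up mode
    for sweep in range(sweeps):
        for i in range(L):
            for j in range(L):
                # sum of the four neighbors, periodic boundaries
                nb = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                      + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
                # exact Gibbs conditional: p(s_ij = +1 | neighbors)
                p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * nb))
                spins[i, j] = 1 if rng.random() < p_up else -1
    print("magnetization:", spins.mean())

Start it all up and the magnetization stays near +1; start it all
down and it stays near -1. Every single-site conditional is
trivial to sample, just like the gamma_j conditionals, and the
chain still can't cross between the modes in any feasible number
of sweeps. Same disease, cleaner organism.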
> You will be able to diagnose this using multiple chains if
> you give them diffuse initializations. I can't recall how
> JAGS initializes, but I think it may be deterministically by
> default. Masanao, Yu-Sun, and Andrew built support for multiple
> chains into R2jags, but I can't recall how the inits work.
Ditto.
- Bob