This post (https://r-nimble.org/variable-selection-in-nimble-using-reversible-jump-mcmc) describes how to do variable selection in NIMBLE using reversible jump MCMC (RJMCMC). This looks very cool, but as I played around with it, I realized that I have a question (probably very elementary, and I should know the answer, but I've been puzzling over it) about how to interpret the posteriors for the regression coefficients.
Say a coefficient's posterior is close to (but not at) zero, and the posterior inclusion probability for the covariate is about 0.6. What is the correct interpretation of the posterior for that coefficient? Should it be summarized only over the iterations where z[i] == 1 (which moves the mean away from zero), or over all the iterations, including those where z[i] == 0 (which may produce a bimodal posterior: a larger mode at zero and a smaller one away from zero)? I'd think the latter is correct, and the greater the spike at zero, the less important that variable is.
Or, does one choose an inclusion threshold (say 0.5), and then refit the model, including only the covariates that exceed the threshold?
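To make the conditional-vs-marginal distinction above concrete, here is a small sketch (in Python with simulated draws, not actual NIMBLE output) of the two summaries one could compute from an RJMCMC chain. It assumes, as in spike-and-slab style output, that the coefficient is recorded as exactly zero on iterations where the variable is excluded; the names (`n_iter`, `z`, `beta`) and the numbers (inclusion probability 0.6, slab centered at 0.5) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical RJMCMC output for one coefficient: when the inclusion
# indicator z == 0 the recorded value is exactly 0; when z == 1 the
# draw comes from the conditional posterior (here centered at 0.5).
n_iter = 10_000
z = rng.binomial(1, 0.6, size=n_iter)                    # inclusion indicator
beta = np.where(z == 1, rng.normal(0.5, 0.2, size=n_iter), 0.0)

p_incl = z.mean()                        # posterior inclusion probability (~0.6)
conditional_mean = beta[z == 1].mean()   # summary over z[i] == 1 draws only
marginal_mean = beta.mean()              # summary over all draws, spike included

# Because excluded draws are exactly zero, the marginal mean is the
# conditional mean shrunk by the inclusion probability:
#   marginal_mean == p_incl * conditional_mean
print(p_incl, conditional_mean, marginal_mean)
```

The identity in the final comment shows why the two summaries answer different questions: the conditional mean describes the effect size *given* the variable is in the model, while the marginal mean folds model uncertainty into a single (model-averaged, and therefore shrunken) estimate.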
In another (fairly large) dataset where I tried to apply this variable selection approach, one variable had an inclusion probability > 0.8, but its posterior mean was also very, very close to zero. Maybe that indicates that variables are more likely to be included when a dataset is very large?
Also, the example linked above uses independent priors for z and beta, but I read in Hooten and Hobbs (2015) that this can cause problems if the prior for beta is too vague. If one uses a more constrained prior, is the independent-priors approach still reasonable?
The waters are deeper than I anticipated...
Glenn