The problem is that the beta and alpha estimates are nowhere near each other! I realize that the model hasn't fully converged in JAGS (again showing the tremendous efficiency of HMC!), but it does eventually converge when everything is run about 10x longer, and the estimates are still nowhere near each other. Looking at the posterior summaries, it looks like the taualpha estimate increases at the expense of taub in STAN as compared with JAGS, where the reverse seems to happen. Entirely removing the hyper-parameter taub doesn't change the result in JAGS or STAN, so they're still different. What could possibly be the cause of this? What am I missing? Hopefully there is a relatively straightforward explanation that will save me a few days of frustration.
  taualpha ~ normal(0, 100)T[0,];
  taus ~ normal(0, 100)T[0,];

are very different from the priors in your JAGS model

  tau.alpha ~ dnorm(0, 100)I(0,)
  taus ~ dnorm(0, 100)I(0,)

Also, it is not necessary to explicitly truncate the priors on taualpha and taus in Stan in this particular case because they are declared as non-negative and the amount of truncated mass is a constant.
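Ben's remark that "the amount of truncated mass is a constant" can be checked numerically: for a normal centered at zero with fixed scale, the mass above the truncation point 0 is always 0.5, so the truncation only shifts the log-density by a constant. A minimal sketch in Python (not part of the original thread):

```python
from statistics import NormalDist

# For a normal(0, sd) prior truncated below at 0, the retained mass is
# 1 - cdf(0) = 0.5 for any fixed sd, so the T[0,] normalization in Stan
# contributes only an additive constant to the log posterior.
for sd in (1.0, 10.0, 100.0):
    mass = 1 - NormalDist(0.0, sd).cdf(0.0)
    print(mass)  # 0.5 each time
```

Since Stan only needs the log posterior up to an additive constant, declaring the parameter with a `<lower=0>` bound and omitting the explicit truncation gives the same posterior here.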
Thanks for the quick reply, Ben. Yes, I suspected that precision vs. SD might be the issue, but I'm not sure I immediately follow how I haven't accounted for this... For example, in JAGS, I then transform

  taus ~ dnorm(0, 100)I(0,)
  pre.alpha <- pow(tau.alpha, -2)

And pre.alpha goes into the normal prior for alpha. Isn't this the same thing then as what I've done in STAN? Or do I need to throw a sqrt on the 100 in STAN?... I think this is me being daft!

Again, thanks,
Andrew
EDIT: So pg. 249 suggests that Stan's normal(1, 2) is the same distribution as JAGS's dnorm(1, 0.25). In which case, dnorm(1, 0.25) is the same as the following:

  normal(1, pre)
  pre <- pow(tau, -2)
  tau <- 2
So, isn't this what I've done in my script, and so this isn't the problem?...
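The conversion being discussed can be checked numerically: Stan's normal takes a standard deviation sigma, while BUGS/JAGS's dnorm takes a precision tau, and the two describe the same distribution exactly when tau = sigma^-2. A quick Python sketch (illustration only, not from the thread):

```python
import math
import random

random.seed(0)

sigma = 2.0        # Stan: normal(mu, sigma) is parameterized by a standard deviation
tau = sigma ** -2  # BUGS/JAGS: dnorm(mu, tau) is parameterized by a precision
assert tau == 0.25

# Sampling with sd = 1/sqrt(tau) reproduces the Stan parameterization.
sd_from_tau = 1.0 / math.sqrt(tau)
draws = [random.gauss(0.0, sd_from_tau) for _ in range(200_000)]

# Mean is 0 here, so the sample SD is just the root mean square of the draws.
sample_sd = math.sqrt(sum(x * x for x in draws) / len(draws))
print(round(sample_sd, 1))  # close to sigma = 2.0
```

So pow(tau, -2) correctly converts an SD into a precision, which is why the confusion in this thread turns out not to be the parameterization itself.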
You're right, sorry. I didn't look closely enough at your .bug file. But a more likely thing is that in the .stan file you have

  beta ~ normal(0, taub);

while in the .bug file you have

  for (i in 1:4) {
    beta[i] ~ dnorm(0, 1e-3)
  }
Although taub has a prior, it seems disconnected from the rest of the DAG.
Ben
You should be able to see now that this still leads to different parameter estimates...
mu[i] <- beta[1] * xitrue[1,i] + beta[2] * xitrue[2,i] + rans[i] + beta[3]*xitrue[1,i]*xitrue[2,i] + alpha[PRED[i]];
> round(t(colMeans(as.array(fit.stan))), 3)
          chains
parameters chain:1 chain:2 chain:3
  taus       0.178   0.229   0.398
  tau        0.335   0.333   0.333
  taub       0.349   0.346   0.301
  beta[1]   -0.124  -0.093  -0.192
  beta[2]    0.087   0.101  -0.245
  beta[3]    0.084   0.071   0.232
  beta[4]    0.054   0.055   0.071
  taualpha   2.183   2.046   1.814
  alpha[1]  -1.062  -1.058  -0.536
  alpha[2]  -0.978  -0.985  -1.111
  alpha[3]  -1.127  -1.122  -1.020
  mu[1]     -1.217  -1.217  -1.211
  mu[2]     -1.245  -1.245  -1.245
  mu[3]     -1.530  -1.527  -1.531
  mu[4]     -1.013  -1.012  -1.010
  mu[5]     -0.876  -0.874  -0.874
  mu[6]     -0.780  -0.781  -0.779
  mu[7]     -0.944  -0.947  -0.945
  mu[8]     -1.282  -1.282  -1.282
  lp__     492.389 490.340 479.517

> round(t(colMeans(jagsfit8$BUGSoutput$sims.array)), 3)
             [,1]    [,2]    [,3]
alpha[1]   -0.013  -0.031  -0.038
alpha[2]   -0.006  -0.020  -0.015
alpha[3]   -0.035  -0.034  -0.031
beta[1]     1.238   0.885   1.021
beta[2]     1.154   1.017   1.173
beta[3]     0.795   0.938   1.229
beta[4]     0.247   0.330  -0.045
deviance  514.308 514.954 514.549
mu[1]      -1.208  -1.218  -1.213
mu[2]      -1.244  -1.246  -1.243
mu[3]      -1.532  -1.526  -1.526
mu[4]      -1.010  -1.010  -1.012
mu[5]      -0.874  -0.876  -0.878
mu[6]      -0.775  -0.777  -0.775
mu[7]      -0.947  -0.943  -0.947
mu[8]      -1.285  -1.285  -1.288
tau         0.334   0.335   0.335
tau.alpha   0.084   0.091   0.084
taub        2.366   2.158   2.318
taus        0.092   0.076   0.078

pairs(fit.stan, panel = function(...) smoothScatter(..., add = TRUE))

On Sunday, February 24, 2013 3:16:23 AM UTC-5, atzap wrote:

  You should be able to see now that this still leads to different parameter estimates...

Yes, now that I actually ran your code, although I shortened the Stan run, which might not have been the best idea in retrospect. Now, I am guessing that there may be multiple modes?

  mu[i] <- beta[1] * xitrue[1,i] + beta[2] * xitrue[2,i] + rans[i] + beta[3]*xitrue[1,i]*xitrue[2,i] + alpha[PRED[i]];
It seems like having both rans[i] and alpha[PRED[i]] could induce multimodality that would leave the mus unaffected, but I couldn't investigate that because rans didn't get saved. But I could sort of see a problem by looking at the chains individually.
On Sunday, February 24, 2013 1:24:48 PM UTC-5, Ben Goodrich wrote:

  On Sunday, February 24, 2013 3:16:23 AM UTC-5, atzap wrote:

    You should be able to see now that this still leads to different parameter estimates...

  Yes, now that I actually ran your code, although I shortened the Stan run, which might not have been the best idea in retrospect. Now, I am guessing that there may be multiple modes?

    mu[i] <- beta[1] * xitrue[1,i] + beta[2] * xitrue[2,i] + rans[i] + beta[3]*xitrue[1,i]*xitrue[2,i] + alpha[PRED[i]];

  It seems like having both rans[i] and alpha[PRED[i]] could induce multimodality that would leave the mus unaffected, but I couldn't investigate that because rans didn't get saved. But I could sort of see a problem by looking at the chains individually.

Thanks for doing that, Ben. I hadn't thought of multimodality. But if this was the case, shouldn't taus and taualpha be correlated, which they're not?
Anyway, if you do run for longer, you can get STAN to converge. It does so in a consistently different place than where JAGS converges. Without comparing the different software, I would have assumed that the model fitted. Now, I don't know whether to trust where it ends up or whether there is in fact some multimodality that cannot be detected because within a single program (i.e. STAN or JAGS), it always converges to the same spot. I guess that in addition to what is going on here, I'm also asking how can I trust the results?... Again, the same mu's are predicted between the two programs.
  Thanks for doing that Ben. I hadn't thought of multimodality. But if this was the case shouldn't taus and taualpha be correlated, which they're not?

Maybe, but I would suspect correlation in the levels rather than the standard deviations. In other words, you could add a constant to all the rans and subtract a constant from all the alphas and get the same mus without affecting any of the variances. One thing you might do is restrict the alpha vector to have a sum of zero by, say, making the third element be the additive inverse of the sum of the first two.
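The additive non-identifiability Ben describes can be verified with a toy calculation (a sketch, not from the thread; the variable names mirror the model, with made-up values): shifting every rans[i] up by a constant and every alpha down by the same constant leaves each mu unchanged, which is exactly what lets the two programs settle in different places while agreeing on the mus.

```python
import math

# Toy version of mu[i] = rans[i] + alpha[PRED[i]] (dropping the fixed-effect
# terms, which don't involve rans or alpha). Values are illustrative only.
rans = [0.3, -0.1, 0.5, 0.2]
alpha = [-1.0, -0.9]
pred = [0, 1, 0, 1]  # group index for each observation (0-based here)

def mus(r, a):
    return [ri + a[p] for ri, p in zip(r, pred)]

# Add a constant to all rans and subtract it from all alphas:
c = 7.0
shifted = mus([r + c for r in rans], [a - c for a in alpha])

same = all(math.isclose(x, y) for x, y in zip(mus(rans, alpha), shifted))
print(same)  # True: the mus are unaffected, so the likelihood can't pin
             # down the levels of rans and alpha separately
```

A sum-to-zero constraint on alpha, as Ben suggests, removes this degree of freedom.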
despite what neff, Rhat, and visual inspection of the chain traces suggest.
My first thought was that you must be running it too long! 4 million iterations looks to be overkill! Maybe 1000 iterations or 10,000 would suffice? But are you saying that some of these parameters (e.g., taualpha and rans[1]) did not actually converge even in the simulations shown above?
On Thursday, February 28, 2013 10:30:35 PM UTC-5, Andrew Gelman wrote:

  My first thought was that you must be running it too long! 4 million iterations looks to be overkill! Maybe 1000 iterations or 10,000 would suffice? But are you saying that some of these parameters (e.g., taualpha and rans[1]) did not actually converge even in the simulations shown above?
3) Irrespective of STAN, the JAGS fit suggests convergence (though with the old Rhat). As you can see in the printout below, it converges at different values than STAN and the ML estimates.
Can you try running again with a smaller step size? In particular, run with a constant step size set to 1/10th the value to which adaptation converges and see if that changes the convergence of the chain.
library(rjags)
bdat5 <- dget("http://dl.dropbox.com/u/7489119/bdat5")
bdat5$Omega <- NULL
params = c("taus", "tau", "taub", "beta", "taualpha", "alpha", "mu", "rans")
bdat5$xi <- t(bdat5$xi)
params[5] <- "tau.alpha"
its = function() {
  list(taub = runif(1), tau.alpha = runif(1), taus = runif(1), tau = runif(1))
}
m1 <- jags.model("http://dl.dropbox.com/u/7489119/file6af85c783c103.bug", data = bdat5, inits = its, n.chains = 3, n.adapt = 1, quiet = FALSE)
adapt(m1, n.iter = 100, end.adaptation = F)
update(m1, n.iter = 100 * .1)
m1 <- coda.samples(m1, params, 200)
summary(m1)

Hopefully I've understood your instructions correctly this time!

I've called attr(fit.stan@sim$samples[[1]], "sampler_params")$stepsize in my original workspace, and it seems like the different chains are converging at different step-sizes (0.3 - 0.5). I've selected the mean and am now adding equal_step_sizes = T, epsilon = .004 in the call to stan. Is that correct? From the manual, it looks like epsilon sets the step size?...
This is a limitation of the between-chain variance vs. within-chain variance approach to convergence diagnostics (and effective sample size calculations). If there is a lack of identification of some parameters or multimodality then some parameters may look good and others bad. Then again, genuine multimodality is a hard problem to deal with using MCMC. For your longer runs, can you check whether some of the purely within-chain convergence diagnostics in the coda package indicate lack of convergence for the chains individually?
# run schools example
fit.stan <- stan(model_code = schools_code, data = schools_dat, iter = 10000, warmup = 8000, thin = 10, chains = 4)
# process into MCMC list
fit.stan2 <- as.array(fit.stan)
fit.stan2 <- as.mcmc.list(list(as.mcmc(fit.stan2[,1,]), as.mcmc(fit.stan2[,2,]), as.mcmc(fit.stan2[,3,]), as.mcmc(fit.stan2[,4,])))
# calculate p-values (if you believe in such things) associated with Z-scores from Geweke on each chain
round(2*pnorm(-abs(sapply(geweke.diag(fit.stan2), function(x){x$z}))), 3)
# calculate Heidel on each chain
heidel.diag(fit.stan2)
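The p-value line above converts each Geweke Z-score into a two-sided tail probability via 2*pnorm(-abs(z)). The same arithmetic can be sketched in Python (an illustration, not part of the original R snippet):

```python
from statistics import NormalDist

def geweke_p(z: float) -> float:
    """Two-sided p-value for a Geweke Z-score, i.e. 2 * pnorm(-abs(z))."""
    return 2 * NormalDist().cdf(-abs(z))

# A Z-score of 1.96 sits at the familiar 5% two-sided threshold.
print(round(geweke_p(1.96), 3))  # 0.05
```

Small p-values here flag chains whose early and late segments have different means, i.e. a within-chain signal of non-convergence that the between-chain Rhat can miss.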