Hi,
I have been running the model shown below on fake data I generate in R (code also below). For each posterior draw of 'lambda' I capture the pointwise log likelihood in the 'generated quantities' block, in order to estimate the expected log pointwise predictive density (elpd) as in Vehtari et al. (2016). For this I use the R package 'loo', which gives the following:
          Estimate    SE
elpd_waic  -3999.4 115.2
p_waic         7.9   0.6
waic        7998.8 230.4
So the elpd estimate is -3999.4, with a standard error of ~115. In R I can recapitulate the elpd estimate itself, but not its standard error. My question is -- how is that standard error calculated? I couldn't find how it is done in the paper.
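For reference, the loo call is roughly the following ('fit' is the rstan fit object for the program at the bottom of this post):

library(loo)
lLik <- extract_log_lik(fit, parameter_name = "lLoglikelihood")  # draws x data points
waic(lLik)  # produces the table above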
In a naive attempt to reproduce this standard error I just calculate the total log likelihood for each posterior draw using
vELPD <- sapply(1:400, function(i) sum(lLoglikelihood[i, ]))  # one total per draw
where 400 is the number of posterior draws and lLoglikelihood is the extracted log likelihood matrix (draws x data points). I then look at the spread of vELPD as a gauge of the uncertainty in the elpd (I know this isn't quite the right quantity, but I only want an order-of-magnitude estimate). This yields a standard deviation of around 0.85!
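In other words:

sd(vELPD)   # ~ 0.85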
The reason this concerns me is that if I do proper cross-validation (k-fold), I get an elpd of about -3996, and the variation across draws is again small, ~1. So there is roughly a 100-fold difference between the two estimates of the uncertainty in model fit! (The k-fold procedure is sketched below.)
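Roughly, the k-fold run looks like this (10 folds; the Stan program below, assumed saved as 'poisson.stan'; iteration counts etc. are incidental):

library(rstan)
mod <- stan_model("poisson.stan")              # the program at the bottom of this post
folds <- sample(rep(1:10, length.out = n))     # random fold assignment
elpd_i <- numeric(n)
for (k in 1:10) {
  train <- folds != k
  fit_k <- sampling(mod,
                    data = list(K = sum(train), N = N[train], X = X[train]),
                    iter = 1000, chains = 2)
  lambda_draws <- extract(fit_k)$lambda        # posterior draws of lambda
  for (i in which(!train)) {
    # held-out pointwise predictive density, averaged over the posterior draws
    elpd_i[i] <- log(mean(dpois(X[i], lambda_draws * N[i])))
  }
}
sum(elpd_i)                                    # comes out around -3996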
I am probably making a mistake somewhere along the way, so forgive me if the above is rubbish. Best, Ben
## R fake data
n <- 1000                                 # number of data points
N <- rbinom(n, size = 100, prob = 0.5)    # 'exposure' for each observation
X <- rnbinom(n, mu = N*0.1, size = 1)     # overdispersed counts with mean 0.1*N
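These are passed to Stan along the lines of (assuming rstan, with the program below saved as 'poisson.stan'):

library(rstan)
fit <- stan(file = "poisson.stan",
            data = list(K = n, N = N, X = X),
            iter = 2000, chains = 4)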
## Stan program
data{
  int K;                      // number of observations
  vector[K] N;                // exposures
  int X[K];                   // counts
}
parameters{
  real<lower=0> lambda;
}
model{
  X ~ poisson(lambda*N);
  lambda ~ normal(0.5,0.5);   // prior, truncated at zero by the constraint
}
generated quantities{
  vector[K] lLoglikelihood;   // pointwise log likelihood, for loo/WAIC
  for(i in 1:K)
    lLoglikelihood[i] = poisson_lpmf(X[i]|N[i]*lambda);
}