Obtain summary of posterior predictive distribution


Kenneth Lopiano

May 23, 2013, 2:21:47 PM
to r-inla-disc...@googlegroups.com
When response values are set to NA and control.predictor = list(compute=TRUE), the values in summary.fitted.values correspond to posterior summaries of the linear predictor. However, we are interested in summaries of the posterior predictive distribution; that is, we would like summaries of predicted values of the response, not of the linear predictor.

To obtain posterior summaries of predicted values, I specify the overall error term as an "iid" random effect and fix the precision of the Gaussian family to be extremely large, control.family = list(initial=12, fixed=TRUE). Does this seem reasonable? A working example is described below.

Consider the following working example based on the attached dataset and R code (commented). The data come from a split-plot experiment with two whole-plot factors (one with 2 levels and one with 6 levels) and one split-plot factor (with 3 levels). There were 4 replicates at each of the 12 combinations of the whole-plot factors. A model for the data can be written as

Y_{ijkl} = \alpha_i + \beta_j + (\alpha\beta)_{ij} + \eta_{l(ij)} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \epsilon_{l(ijk)}

where i indexes whole-plot factor 1, j indexes whole-plot factor 2, k indexes the split-plot factor, and l indexes replicate. We assume the whole-plot error term $\eta_{l(ij)}$ is normally distributed with mean 0 and variance $\sigma^2_{\eta}$, and the split-plot error term $\epsilon_{l(ijk)}$ is normally distributed with mean 0 and variance $\sigma^2_{\epsilon}$.

The data set has the following columns: ID, Response, repXwp1wp2, wp1, sp, wp2.

The attached R code reads in the data, creates a data frame for predicting the response at each of the 36 combinations of wp1, sp, and wp2, and merges the two data frames. Then the model with $\epsilon$ specified as an "iid" effect is fit and posterior summaries are obtained. Finally, the model with $\epsilon$ taken as the family error term is fit and posterior summaries are obtained. The results for both models are plotted in 3×2 arrays in posteriorpred.pdf and posteriorpredlinpred.pdf. Notice that the credible intervals are wider in the plots in posteriorpred.pdf.
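For concreteness, the "iid trick" fit described above might be specified roughly as follows. This is only a sketch based on the column names listed above (the data frame name "dat" and the use of interaction terms in the formula are my assumptions; the actual attached script may differ):

    ## sketch only: all three-way interactions among the factors via wp1*wp2*sp
    formula = Response ~ wp1 * wp2 * sp +
        f(repXwp1wp2, model = "iid") +   ## whole-plot error eta_{l(ij)}
        f(ID, model = "iid")             ## split-plot error epsilon as an iid effect

    fit = inla(formula, data = dat,
               control.predictor = list(compute = TRUE),
               ## fix the Gaussian family precision to be extremely large,
               ## so the iid term on ID absorbs the observation-level error
               control.family = list(initial = 12, fixed = TRUE))

With this setup, the fitted values for rows where Response is NA reflect the posterior predictive distribution rather than just the linear predictor.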



data.sub.csv
posteriorpred.pdf
posteriorpredlinpred.pdf
for-forum-split-plot.R

INLA help

May 23, 2013, 3:38:11 PM
to Kenneth Lopiano, r-inla-disc...@googlegroups.com
Thanks,

yes, you are correct. I interpret your post as a comment only, so I
follow up with my own comment on it. Yes, it is a cool trick.

Here are two models:

A. y ~ 1 + x with family="gaussian"

B. y ~ 1 + x + f(idx, model="iid"), with idx = 1:n and family="gaussian"
with a fixed high precision.

The difference is that the linear predictor in A is 1 + x, while in B it is
1 + x + f(idx), which is *the observed data* if y[idx[i]] is observed, and the
predictive distribution for a new observation if y[idx[i]] = NA. So what
you need to do is replicate the model.

This simpler example demonstrates this.

n = 100
x = rnorm(n)
y = 1 + x + rnorm(n, sd = 0.1)

## model A: plain Gaussian regression
r = inla(y ~ 1 + x,
         data = data.frame(y, x),
         control.predictor = list(compute = TRUE),
         control.family = list(
             hyper = list(prec = list(param = c(1, 0.01)))))

## model B: replicate the data, setting the second copy of y to NA,
## and add an iid effect while fixing the family precision very high
xx = c(x, x)
yy = c(y, rep(NA, n))
ii = c(1:n, 1:n)
rr = inla(yy ~ 1 + xx +
              f(ii, model = "iid",
                hyper = list(prec = list(param = c(1, 0.01)))),
          data = data.frame(yy, xx),
          control.family = list(
              hyper = list(prec = list(initial = 20, fixed = TRUE))),
          control.predictor = list(compute = TRUE))

## this should be almost zero
r$summary.fixed - rr$summary.fixed

## plot of the i'th observation with the predictive distribution as well
i = 1
plot(inla.smarginal(rr$marginals.linear.predictor[[n + i]]), type = "l")
abline(v = y[i])
title(paste("observation", i))
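As a small follow-up (my addition, not from the original reply): because rows n+1 through 2n of the replicated data are the NA copies, their linear-predictor summaries are summaries of the predictive distribution. Assuming the objects n and rr from the example above:

    ## posterior predictive summaries for the replicated (NA) rows
    pred = rr$summary.linear.predictor[n + (1:n), ]
    pred[1, c("mean", "sd", "0.025quant", "0.975quant")]

The 0.025 and 0.975 quantile columns give 95% predictive intervals, which should be wider than the corresponding intervals from model A.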






Best,
Håvard



--
Håvard Rue
he...@r-inla.org
