can I make posterior predictions in rstanarm and exclude the variance from an effect I don't care about?

106 views
Skip to first unread message

Seth Flaxman

unread,
Jul 27, 2016, 6:52:11 AM7/27/16
to Stan users mailing list
I'm doing a meta-analysis and I've got a model in which I include a study-level effect:

m = stan_glmer(y ~ 1 + x + (1|studyid), data=data)

At prediction time, I want to make population level predictions, excluding this extra source of study-level variance. I could setup a prediction dataset with studyid = "_NEW_studyid" but I don't think that's right--instead of a population level prediction I believe that this gives me a prediction for what I'd find in a new study. (Meaning the means should be the same as what I want, but with wider credible intervals...?)

I can of course generate any posterior quantity I want using the draws, I just wanted to know whether I was missing some simple way to get what I wanted without doing that.
Seth

Ben Goodrich

unread,
Jul 27, 2016, 10:59:47 AM7/27/16
to Stan users mailing list, fla...@gmail.com
On Wednesday, July 27, 2016 at 6:52:11 AM UTC-4, Seth Flaxman wrote:
I'm doing a meta-analysis and I've got a model in which I include a study-level effect:

m = stan_glmer(y ~ 1 + x + (1|studyid), data=data)

At prediction time, I want to make population level predictions, excluding this extra source of study-level variance. I could setup a prediction dataset with studyid = "_NEW_studyid" but I don't think that's right--instead of a population level prediction I believe that this gives me a prediction for what I'd find in a new study. (Meaning the means should be the same as what I want, but with wider credible intervals...?)

If the studyid variable contains _any_ level that was not among the levels that were used when the model was estimated, then it grabs the _NEW_studyid shift in the intercept for that level when doing posterior_predict(). You don't have to name the new level _NEW_.

Ben

Seth Flaxman

unread,
Jul 27, 2016, 11:26:03 AM7/27/16
to Stan users mailing list
OK, good to know. But there's no simple way to simple way exclude this
extra variance, right?
> --
> You received this message because you are subscribed to the Google Groups
> "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to stan-users+...@googlegroups.com.
> To post to this group, send email to stan-...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Ben Goodrich

unread,
Jul 27, 2016, 11:42:09 AM7/27/16
to Stan users mailing list, fla...@gmail.com
On Wednesday, July 27, 2016 at 11:26:03 AM UTC-4, Seth Flaxman wrote:
OK, good to know. But there's no simple way to simple way exclude this
extra variance, right?

No, because then it wouldn't be a genuine posterior prediction.

Seth Flaxman

unread,
Jul 28, 2016, 6:41:10 AM7/28/16
to Ben Goodrich, Stan users mailing list
I guess you're making a semantic distinction in terms of what a
"posterior prediction" is? What I'm trying to predict is a quantity of
interest that's derived from the posterior; I don't think I'm doing
anything weird here. But I can see why it's not at all easy to make
this automatic.

In case it's helpful for anyone...here's a quick workaround which just
zeros out the draws for the parameter I don't care about (feedback
welcome):

i = grep("^b\\[",rownames(model$stan_summary))[1]
j = which(rownames(model$stan_summary) == "b[(Intercept)
studyid:_NEW_studyid]")

m = length(model$stanfit@sim$samples[[1]][[1]])
for(k in 1:length(model$stanfit@sim$samples))
model$stanfit@sim$samples[[k]][[paste0("b[",j - i + 1,"]")]] = rep(0,m)

predictions = posterior_predict(model, data)

Ben Goodrich

unread,
Jul 28, 2016, 11:55:31 AM7/28/16
to Stan users mailing list, fla...@gmail.com
On Thursday, July 28, 2016 at 6:41:10 AM UTC-4, Seth Flaxman wrote:
I guess you're making a semantic distinction in terms of what a
"posterior prediction" is? What I'm trying to predict is a quantity of
interest that's derived from the posterior; I don't think I'm doing
anything weird here. But I can see why it's not at all easy to make
this automatic.

Those predictions aren't posterior. If the new observation came from an old grouping level, you would integrate over its intercept shift to come up with a predictive distribution that is consistent with Bayes Rule. If the new observation came from a new grouping level, you would integrate over the prior distribution of intercept shifts using the posterior distribution of the group-level variance to come up with a predictive distribution that is consistent with Bayes Rule. But if you fix the intercept shift to be zero rather than integrating over it, then the predictions are not consistent with Bayes Rule.

Ben

Seth Flaxman

unread,
Jul 29, 2016, 8:07:53 AM7/29/16
to Stan users mailing list
Right, but what I want is a population-level estimate; integrating out
would be for the situation in which I got a new grouping, but that
answers a different question. See, for example, predict.lme
(https://stat.ethz.ch/R-manual/R-devel/library/nlme/html/predict.lme.html)
where you can set level=0.

Seth
Reply all
Reply to author
Forward
0 new messages