WAIC in the context of spatial models, data imputation/censoring, and/or measurement error


Cody Ross

Dec 7, 2013, 7:07:34 PM
to stan-...@googlegroups.com
This question doesn't deal with Stan itself, so feel free to delete the thread if it doesn't fit the purpose of the Stan mailing list.

I have been working on a few spatial models, and models with measurement error on the dependent variable; I'd like to find a way of doing model comparison.

First, I am wondering how WAIC (http://www.stat.columbia.edu/~gelman/research/unpublished/waic_understand_final.pdf) deals with data imputation and measurement error. Can the lppd and the effective number of parameters be calculated when the dependent variable is itself a random node?

Secondly, WAIC, like other metrics, seems to be derived under assumptions of independent errors. Some people have found that measures like AIC and BIC perform reasonably in model comparison in spite of violating these assumptions (using simulated data). Would it be insane to include a WAIC comparison in a paper comparing spatial models?

Bob Carpenter

Dec 7, 2013, 8:33:24 PM
to stan-...@googlegroups.com
Maybe Andrew has some thoughts. If not, I'd suggest
writing to Aki Vehtari, one of the co-authors, who works
on Gaussian Processes and spatial stats.

- Bob
> --
> You received this message because you are subscribed to the Google Groups "stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

Michael Betancourt

Dec 7, 2013, 8:51:46 PM
to stan-...@googlegroups.com
> This question doesn't deal with Stan itself, so feel free to delete the thread if it doesn't fit the purpose of the Stan mailing list.

Oh, we love these kinds of questions.

> I have been working on a few spatial models, and models with measurement error on the dependent variable; I'd like to find a way of doing model comparison.
>
> First, I am wondering how WAIC (http://www.stat.columbia.edu/~gelman/research/unpublished/waic_understand_final.pdf) deals with data imputation and measurement error. Can the lppd and the effective number of parameters be calculated when the dependent variable is itself a random node?
>
> Secondly, WAIC, like other metrics, seems to be derived under assumptions of independent errors. Some people have found that measures like AIC and BIC perform reasonably in model comparison in spite of violating these assumptions (using simulated data). Would it be insane to include a WAIC comparison in a paper comparing spatial models?

Actually, AIC is an approximation to WAIC, which is an approximation to the KL divergence between the posterior predictive distribution and the true data distribution (assuming one exists). For WAIC it's really not an assumption of independent errors, but rather an assumption that each datum is really two independent data (as in cross-validation).
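For reference, here is a sketch of the two criteria on the deviance scale. It follows the definitions in the Gelman, Hwang and Vehtari paper linked in the question. The notation is: y_i are the n data points, theta^s are the S posterior draws, k is the number of parameters, and V_{s=1}^S is the sample variance over draws. Loosely, WAIC makes two substitutions relative to AIC. It replaces the plug-in log-likelihood at the maximum-likelihood estimate with the posterior-averaged lppd. It also replaces the parameter count k with the data-driven p_waic.

```latex
\begin{align*}
\mathrm{AIC}       &= -2\log p(y \mid \hat\theta_{\mathrm{mle}}) + 2k \\
\mathrm{lppd}      &= \sum_{i=1}^{n} \log\Big(\frac{1}{S}\sum_{s=1}^{S} p(y_i \mid \theta^{s})\Big) \\
p_{\mathrm{waic}2} &= \sum_{i=1}^{n} \mathrm{V}_{s=1}^{S}\,\log p(y_i \mid \theta^{s}) \\
\mathrm{WAIC}      &= -2\,\big(\mathrm{lppd} - p_{\mathrm{waic}2}\big)
\end{align*}
```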

The nice thing about WAIC is that it's coordinate independent and correctly incorporates any measurement error in the model. Data imputation is a different beast: it rarely comes from a consistent model, so there's no way to validate it in a mathematically consistent way.

Daniel Lee

Dec 7, 2013, 11:13:19 PM
to stan-...@googlegroups.com
On Sat, Dec 7, 2013 at 8:51 PM, Michael Betancourt <betan...@gmail.com> wrote:
> This question doesn't deal with Stan itself, so feel free to delete the thread if it doesn't fit the purpose of the Stan mailing list.

Oh, we love these kinds of questions.


Just in case it wasn't clear from the second part of Michael's response, he was serious. We do love these kinds of questions, and we are happy to deal with statistical issues.

So... keep 'em coming.


Daniel
 

Andrew Gelman

Dec 8, 2013, 12:27:21 PM
to stan-...@googlegroups.com
Hi, just by the way, here's the final version of that paper: http://www.stat.columbia.edu/~gelman/research/published/waic_understand3.pdf

Currently, I think the easiest (really, the only) way to calculate WAIC in Stan is by defining a WAIC variable in the transformed parameters block.  Actually I think it would make sense for us (I guess that means me) to do it for an example and put it up in the Stan models.

Mauricio Garnier-Villarreal

Dec 8, 2013, 2:13:36 PM
to stan-...@googlegroups.com, gel...@stat.columbia.edu


Hi

I think it would be great to have an example of how to calculate the WAIC in Stan. I have been using some example code to calculate the WAIC in post-processing (I attach it here); this code was posted in another discussion about this topic. I would also like to hear your opinion about this method of calculating the WAIC. Or is the transformed parameters block really the only proper place to do it?

thanks

bye
waic_experiments.r

Andrew Gelman

Dec 8, 2013, 2:36:15 PM
to Mauricio Garnier-Villarreal, stan-...@googlegroups.com
Just to be clear:  I don't really think that the transformed parameters block is the only place to compute Waic; I just see that as the cleanest way to do it.
A


Bob Carpenter

Dec 8, 2013, 9:37:45 PM
to stan-...@googlegroups.com
@Andrew --- an example would be great.

@Mauricio --- you can do the calculations like you do on the outside.

Using the transformed parameters block, you might save some
redundant coding.

Neither way is as clean as we'd like.

Keep in mind that the overall log probability computed within Stan is on the unconstrained parameters and is not normalized. It includes
the log probability from the model itself and also from any Jacobians
involved in transforming constrained variables.
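For the post-processing route, here is a minimal sketch in Python (NumPy only). It assumes you have saved the pointwise log-likelihood log p(y_i | theta^s) for every posterior draw s and data point i as an S x n matrix, for example as a quantity computed in Stan and extracted afterwards. The names `waic` and `log_mean_exp` are hypothetical. The sketch deliberately does not use the overall log probability, because, as noted above, that quantity is unnormalized and includes Jacobian terms.

```python
import numpy as np

def log_mean_exp(a, axis=0):
    """Numerically stable log(mean(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))

def waic(log_lik):
    """WAIC on the deviance scale from an (S, n) matrix of log p(y_i | theta^s)."""
    log_lik = np.asarray(log_lik, dtype=float)
    lppd_i = log_mean_exp(log_lik, axis=0)             # log pointwise predictive density
    p_waic1_i = 2.0 * (lppd_i - log_lik.mean(axis=0))  # p_waic1: difference form
    p_waic2_i = log_lik.var(axis=0, ddof=1)            # p_waic2: sample variance over draws
    lppd, p_waic2 = lppd_i.sum(), p_waic2_i.sum()
    return {"lppd": lppd, "p_waic1": p_waic1_i.sum(),
            "p_waic2": p_waic2, "waic": -2.0 * (lppd - p_waic2)}

# Toy usage: y_i ~ N(mu, 1), with approximate posterior draws mu ~ N(ybar, 1/n).
rng = np.random.default_rng(1)
n, S = 100, 4000
y = rng.normal(0.0, 1.0, size=n)
mu = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=S)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu[:, None]) ** 2
res = waic(log_lik)
print(res)  # p_waic1 and p_waic2 should both come out roughly 1 (one parameter, mu)
```

The same function applies unchanged whichever way the log-likelihood matrix was produced (inside Stan or in R/Python afterwards); only the S x n layout matters.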

- Bob


> On Sunday, December 8, 2013 11:27:21 AM UTC-6, Andrew Gelman wrote:
>
> Hi, just by the way, here's the final version of that paper:
> http://www.stat.columbia.edu/~gelman/research/published/waic_understand3.pdf

Mauricio Garnier-Villarreal

Dec 9, 2013, 6:15:13 PM
to stan-...@googlegroups.com
Thank you very much, Bob and Andrew.

An example would be great.

Great work on Stan, it's great!

Andrew Gelman

Dec 10, 2013, 11:38:41 AM
to stan-...@googlegroups.com
As Bob would say, it's on my to-do list!

Aki Vehtari

Dec 12, 2013, 10:06:12 AM
to stan-...@googlegroups.com
Cody> "First, I am wondering how WAIC (http://www.stat.columbia.edu/~gelman/research/unpublished/waic_understand_final.pdf) deals with data imputation and measurement error. Can the lppd and the effective number of parameters be calculated when the dependent variable is itself a random node?"

I assume that you have a multivariate dependent variable y_ij, where i=1,...,n and j=1,...,p. I think it is easiest to first think about how CV would work, since WAIC is an approximation of CV. There are two options.

The first option is the pure M-open approach (see http://dx.doi.org/10.1214/12-SS102 for a discussion of M-open etc.), where you compute the lppd only for the observed outputs. For example, if p=2, the collection of lppd_i's contains a mix of terms p_post(-i)(y_i1), p_post(-i)(y_i2), p_post(-i)(y_i1,y_i2), depending on which observations y_ij are missing. As the same observations are missing for all models, the obtained lppd_CV is OK for model comparison.
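Here is a minimal sketch of this first (observed-outputs-only) option, in Python with NumPy. It assumes you have stored log p(y_ij | theta^s) for every draw s, unit i and output j in an S x n x p array, along with an n x p boolean mask of which y_ij were observed. Entries for missing y_ij may hold anything, e.g. NaN, because they are masked out. The sketch also assumes that the p outputs are conditionally independent given theta, so the joint density of unit i's observed outputs is the product over observed j. With correlated outputs, you would instead need the marginal joint density of the observed subset. The function name `lppd_observed` is hypothetical. Note that it computes the within-sample lppd that WAIC uses, not the leave-one-out lppd_CV.

```python
import numpy as np

def lppd_observed(log_lik, observed):
    """lppd using only observed outputs.

    log_lik  : (S, n, p) array of log p(y_ij | theta^s); missing entries are ignored.
    observed : (n, p) boolean array, True where y_ij was observed.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    # Replace missing entries by 0 so they drop out of the sum over outputs j;
    # column i then holds log p(y_i,obs | theta^s) for unit i's observed subset.
    unit_ll = np.where(observed[None, :, :], log_lik, 0.0).sum(axis=2)  # (S, n)
    unit_ll = unit_ll[:, observed.any(axis=1)]  # units with nothing observed contribute nothing
    m = unit_ll.max(axis=0)
    lppd_i = m + np.log(np.mean(np.exp(unit_ll - m), axis=0))
    return lppd_i.sum()

# Toy check with p = 2: unit 0 has both outputs, unit 1 is missing its second output.
S = 3
log_lik = np.full((S, 2, 2), np.log(0.5))
log_lik[:, 1, 1] = np.nan  # the missing entry's value is never used
observed = np.array([[True, True], [True, False]])
print(lppd_observed(log_lik, observed))  # equals log(0.25) + log(0.5) = log(0.125)
```

Because the same mask is used for every model being compared, the resulting values stay comparable across models.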

The second option is to use a mix of M-open and M-completed/closed approaches. For example, if p=2, the collection of lppd_i's contains the terms p_post(-i)(y_i1,y_i2), where in the case of missing data we integrate over the missing-data distribution using the imputation model (which may be problematic if the imputation algorithm is not consistent, as Mike warned).

In the case of measurement error, I assume that you have all observations and it's just that your model is slightly more complicated.

WAIC can be used in these cases, too. The first option is simpler to implement.

Cody> "Secondly, WAIC, like other metrics, seems to be derived under assumptions of independent errors. Some people have found that measures like AIC and BIC perform reasonably in model comparison in spite of violating these assumptions (using simulated data). Would it be insane to include a WAIC comparison in a paper comparing spatial models?"

If WAIC is derived from cross-validation, it does not need the assumption of independent errors. The prediction task consists of a collection of independent marginal predictions (1a). Whether this is the correct prediction task, or whether you should care more about the joint predictions (1b), depends on your decision problem. Although WAIC and CV are asymptotically equal, they assume slightly different prediction tasks. WAIC is interested in predictions only at the observed covariate values (2a). CV is interested in predictions also at not-yet-observed covariate values (2b). If your answer is 1a+2a, then WAIC is a sane choice, although it seems that it can sometimes break when using very flexible models (like Gaussian process or Markov random field spatial models). See the slides http://www.lce.hut.fi/~ave/slides_trondheim.pdf (we are writing a paper about this and further developments).
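The WAIC/CV relationship can be checked in a toy case where everything is available in closed form. The sketch below is in Python (NumPy only). The model is an assumption chosen purely for illustration: y_i ~ N(mu, 1) with a N(0, 10^2) prior on mu. This prior is conjugate, so exact leave-one-out predictive densities need no MCMC refitting. With exchangeable data and a model this simple, the two estimates of expected log predictive density should agree closely. The check says nothing about the very flexible Gaussian process or Markov random field models for which WAIC may break.

```python
import numpy as np

def log_normal_pdf(x, mean, sd):
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((x - mean) / sd) ** 2

rng = np.random.default_rng(2013)
n, sigma, tau0, S = 50, 1.0, 10.0, 4000
y = rng.normal(0.3, sigma, size=n)

def posterior(y_subset):
    """Conjugate posterior of mu (known sigma, prior N(0, tau0^2)): returns (mean, sd)."""
    prec = 1.0 / tau0**2 + len(y_subset) / sigma**2
    return (y_subset.sum() / sigma**2) / prec, np.sqrt(1.0 / prec)

# WAIC side: posterior draws from the full-data fit, pointwise log-likelihood (S, n).
post_mean, post_sd = posterior(y)
mu_draws = rng.normal(post_mean, post_sd, size=S)
log_lik = log_normal_pdf(y[None, :], mu_draws[:, None], sigma)
m = log_lik.max(axis=0)
lppd_i = m + np.log(np.mean(np.exp(log_lik - m), axis=0))
p_waic2_i = log_lik.var(axis=0, ddof=1)
elpd_waic = np.sum(lppd_i - p_waic2_i)

# Exact LOO: closed-form posterior without point i, then score y_i predictively.
elpd_loo = 0.0
for i in range(n):
    mean_i, sd_i = posterior(np.delete(y, i))
    pred_sd = np.sqrt(sigma**2 + sd_i**2)  # posterior predictive sd for a new y
    elpd_loo += log_normal_pdf(y[i], mean_i, pred_sd)

print(f"elpd_waic = {elpd_waic:.3f}, elpd_loo = {elpd_loo:.3f}")
```

On the deviance scale, WAIC is -2 * elpd_waic. The quantity compared here is the pointwise (1a) prediction task, summed over independent marginal predictions.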

Mike> "Actually AIC is an approximation to WAIC, which is an approximation to the KL divergence between the predictive posterior predictive distribution and the true data distribution (assuming one exists).  It's really not as assumption of independent errors for WAIC, but rather an assumption that each datum is really two independent data (as in cross validation)."

Cross-validation assumes exchangeability, not independence.

To summarise: if your choice is between WAIC or some other *IC, use WAIC. If you can use CV, it is a more robust choice.

cs

Dec 20, 2013, 4:41:40 PM
to stan-...@googlegroups.com
I've read this chapter (http://www.stat.columbia.edu/~gelman/research/unpublished/waic_understand_final.pdf) and still have some questions about how to compute WAIC for a hierarchical model. Let me describe what I think is a correct way to compute WAIC...

Let's say I have data from N subjects (i=1,2,3,...,N), and each subject has T trials. I build a model that has P individual parameters, and each individual parameter has its own hyperparameters. Assume each individual parameter comes from a normal distribution, so the group parameters are the mean and SD of each individual parameter.

1. First compute each subject's lppd_i (Eq. 5 in the chapter). I use the notation lppd_i to indicate that this is the lppd of subject i. First calculate p(y_i | theta^s). I would do this by programming the model again in Python (or R) and using the posterior samples. For each subject, use the posterior samples (s=1,2,3,...,S) of the individual parameters (not the group parameters) to compute the probability of that subject's T trials under each of the S samples. Then calculate lppd_i using Eq. 5 for each subject.
2. Compute pWAIC, either pWAIC1 or pWAIC2 (Eq. 11). Again, use each subject's posterior samples of the individual parameters.
3. Compute each subject's WAIC_i (WAIC_i = -2*lppd_i + 2*pWAIC_i).
4. Sum all subjects' WAIC_i to get a single value (WAIC = sum_{i=1}^N WAIC_i).
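The group-level (per-subject) computation in the steps above can be sketched in Python with NumPy. The sketch assumes you have evaluated log p(y_it | theta^s) for every draw s, subject i and trial t, using draw s of subject i's individual-level parameters, and stored the results in an S x N x T array. It also assumes that trials are conditionally independent given those parameters, so a subject's log-likelihood is the sum over its trials. The sketch uses p_waic2 and reports WAIC on the deviance scale, -2 lppd + 2 p_waic, as defined in the paper. The function name `waic_by_subject` is hypothetical.

```python
import numpy as np

def waic_by_subject(log_lik):
    """Group-level WAIC: each subject's T trials are scored jointly.

    log_lik[s, i, t] = log p(y_it | theta_i^s) for draw s, subject i, trial t.
    Returns (total WAIC, per-subject WAIC_i), both on the deviance scale.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    subj_ll = log_lik.sum(axis=2)                 # (S, N): log p(y_i | theta^s)
    m = subj_ll.max(axis=0)
    lppd_i = m + np.log(np.mean(np.exp(subj_ll - m), axis=0))  # step 1, stable log-mean-exp
    p_waic2_i = subj_ll.var(axis=0, ddof=1)                     # step 2, p_waic2 per subject
    waic_i = -2.0 * lppd_i + 2.0 * p_waic2_i                     # step 3
    return waic_i.sum(), waic_i                                  # step 4

# Sanity check: if every draw gives the same log-likelihood, p_waic2 = 0 and
# WAIC reduces to -2 * lppd. Here 3 subjects each have 4 trials at log p = -1.
total, per_subject = waic_by_subject(np.full((10, 3, 4), -1.0))
print(total, per_subject)  # expected: total 24.0, each subject 8.0
```

Summing over trials before taking the log-mean-exp is what makes this the group-level version. Computing lppd and p_waic2 on the unsummed S x (N*T) matrix would instead give the observation-level version.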

If someone could check whether this is a correct way to compute WAIC for a hierarchical model, that would be great. Thanks.

CS

Andrew Gelman

Dec 20, 2013, 4:48:39 PM
to stan-...@googlegroups.com
Hi, I'm working on an example of Waic in Stan using R (but the code should be clear enough that it could be copied into python etc).  It's possible to use increment_log_prob to compute the Waic without repeating the code.  If I get some examples working and they seem to work, we are thinking about implementing some more automatic way to do it within Stan.

Regarding the statistical details: we recommend using p_waic2 as our default. Regarding the hierarchical model, the predictive evaluation can be done at the observation level or at the group level; which one to use depends on which aspect of the model is being checked. See Section 2.5 of the final version of the paper: http://www.stat.columbia.edu/~gelman/research/published/waic_understand3.pdf
But I think that what will really help are some fully worked-out examples in Stan.
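The observation-level versus group-level distinction can be made concrete for the N-subject, T-trial setup in the previous message. The two choices differ only in what counts as one unit inside lppd and p_waic2. The formulas below are a sketch that assumes trials are conditionally independent given theta. They use the same notation as the paper, with V_{s=1}^S denoting the sample variance over the S posterior draws. In both cases WAIC = -2 (lppd - p_waic2).

```latex
\begin{align*}
\text{observation level:}\quad
\mathrm{lppd} &= \sum_{i=1}^{N}\sum_{t=1}^{T}\log\Big(\frac{1}{S}\sum_{s=1}^{S} p(y_{it}\mid\theta^{s})\Big), \\
p_{\mathrm{waic}2} &= \sum_{i=1}^{N}\sum_{t=1}^{T}\mathrm{V}_{s=1}^{S}\,\log p(y_{it}\mid\theta^{s}); \\[4pt]
\text{group level:}\quad
\mathrm{lppd} &= \sum_{i=1}^{N}\log\Big(\frac{1}{S}\sum_{s=1}^{S}\prod_{t=1}^{T} p(y_{it}\mid\theta^{s})\Big), \\
p_{\mathrm{waic}2} &= \sum_{i=1}^{N}\mathrm{V}_{s=1}^{S}\,\sum_{t=1}^{T}\log p(y_{it}\mid\theta^{s}).
\end{align*}
```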
