New papers on LOO/WAIC and Stan


Andrew Gelman

Jul 16, 2015, 12:08:03 AM
to stan-...@googlegroups.com
Aki, Jonah, and I have released the much-discussed paper on LOO and WAIC in Stan:
We (that is, Aki) now recommend LOO rather than WAIC, especially now that we have an R function to quickly compute LOO using Pareto smoothed importance sampling.

Also, two new tutorial articles on Stan will be appearing:
The two articles have very similar titles but surprisingly little overlap.

I guess I should blog this…

See you
A

Jonah

Jul 16, 2015, 1:51:46 AM
to stan-...@googlegroups.com, gel...@stat.columbia.edu
The R package mentioned in the paper is on GitHub here. There is a version on CRAN, but it needs to be updated, so please install from GitHub for now. I'll post an update here when the new version is also on CRAN.

Ben Goodrich

Jul 16, 2015, 2:41:12 AM
to stan-...@googlegroups.com, gel...@stat.columbia.edu
The estimates of the shape parameter of the generalized Pareto are at least as interesting as the LOO estimate. This is like a more Bayesian and more general version of Cook's distance for linear models.


On Thursday, July 16, 2015 at 12:08:03 AM UTC-4, Andrew Gelman wrote:

Aki Vehtari

Jul 16, 2015, 3:20:38 AM
to stan-...@googlegroups.com, gel...@stat.columbia.edu
Matlab version of the PSIS-LOO code is available at https://github.com/avehtari/MatlabPSIS
(hopefully later included in the MatlabStan package)

Aki

Jonah

Jul 17, 2015, 11:26:58 AM
to stan-...@googlegroups.com, gel...@stat.columbia.edu
The latest version of the loo R package (0.1.2) is now up on CRAN and should be installable for most people by running

install.packages("loo")

although depending on various things (your operating system, R version, CRAN mirror, what you ate for breakfast, etc.) you might need

install.packages("loo", type = "source")
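Once the package is installed, the basic workflow is short. A minimal, self-contained sketch of what it looks like — here a simulated S x N matrix of pointwise log-likelihoods stands in for one extracted from a real Stan fit (e.g. via the package's extract_log_lik() helper on a model whose generated quantities block computes log_lik):

```r
library(loo)

# Toy stand-in for a real analysis: simulate an S x N matrix of pointwise
# log-likelihoods, where S = number of posterior draws and N = number of
# observations. In practice this matrix would come from a Stan fit.
set.seed(1)
S <- 1000
N <- 50
y <- rnorm(N)                       # "observed" data
mu <- rnorm(S, mean = 0, sd = 0.1)  # draws of the mean parameter
log_lik <- sapply(y, function(yi) dnorm(yi, mean = mu, sd = 1, log = TRUE))

# PSIS-LOO estimate of the expected log predictive density
loo_result <- loo(log_lik)
print(loo_result)
```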

Aki Vehtari

Aug 5, 2015, 4:00:47 AM
to Stan users mailing list, gel...@stat.columbia.edu
PSIS-LOO code is now also included in the MatlabStan package https://github.com/brian-lau/MatlabStan

Aki

Ben Goodrich

Sep 2, 2015, 4:52:56 PM
to Stan users mailing list, gel...@stat.columbia.edu
On Thursday, July 16, 2015 at 2:41:12 AM UTC-4, Ben Goodrich wrote:
The estimates of the shape parameter of the generalized Pareto are at least as interesting as the LOO estimate. This is like a more Bayesian and more general version of Cook's distance for linear models.

What do we think of the general principle that a prior should be strong enough that none of the generalized Pareto shape estimates are greater than 1 and not too many are greater than 0.5? My thinking is that if these estimates are too sensitive to particular observations, then the model is overfitting in-sample, which is one of the things we are trying to prevent with priors. If so, then we should be encouraging people to look at this even if they are not doing model comparison with LOO.
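For what it's worth, checking this doesn't require doing any model comparison: the shape estimates are per-observation, so they can be tabulated against those thresholds directly. A sketch, with a simulated log-likelihood matrix standing in for one from a real fit, and assuming the object returned by loo() exposes the shape estimates under an element named pareto_k:

```r
library(loo)

# Simulated S x N log-likelihood matrix as a stand-in for a real fit's output
set.seed(2)
mu <- rnorm(1000)
y <- rnorm(30)
log_lik <- sapply(y, function(yi) dnorm(yi, mean = mu, sd = 1, log = TRUE))

loo_result <- loo(log_lik)

# One generalized Pareto shape estimate per observation; the element name
# 'pareto_k' is an assumption about the package's return value.
k <- loo_result$pareto_k

# Tabulate against the thresholds discussed above
table(cut(k, breaks = c(-Inf, 0.5, 1, Inf),
          labels = c("k <= 0.5", "0.5 < k <= 1", "k > 1")))
```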

Ben

Andrew Gelman

Sep 2, 2015, 4:55:46 PM
to Ben Goodrich, Aki Vehtari, Jonah Sol Gabry, Stan users mailing list
Aki?  Jonah?

Jonah Sol Gabry

Sep 2, 2015, 5:36:17 PM
to Andrew Gelman, Ben Goodrich, Aki Vehtari, Stan users mailing list
I'll defer to Aki for any definitive answer, but this makes sense to me. In particular, for regression models, if the leave-one-out distribution is a bad approximation to the posterior distribution for point i, that could indicate leverage issues and suggest that more shrinkage would be good.

Aki Vehtari

Sep 3, 2015, 10:09:12 AM
to Jonah Sol Gabry, Andrew Gelman, Ben Goodrich, Stan users mailing list
The difference between lppd_i and loo_i has been used as a sensitivity measure
(see, e.g., Gelfand et al. 1992). The Pareto shape parameter estimate k is
likely to be large if the difference between lppd_i and loo_i is large. It's
not yet clear to me whether the Pareto shape parameter estimate k would be
better than lppd_i - loo_i, but at least we know that the estimate of
lppd_i - loo_i is too small if k is close to 1 or larger, so it might be
better to look at k.

In the stack example with the normal model, k for one observation is large,
but with the Student-t model k is smaller. The normal model is the same as a
Student-t model with a very strong prior on the degrees of freedom. So it's
not just about having a strong prior or more shrinkage, but about having a
model which can describe the observations well. With increased shrinkage and a
non-robust observation model, that one observation could still be surprising.

Naturally, it's not always the best solution to change to a more robust
observation model allowing "outliers". Instead it might be better to make the
regression function more nonlinear (that is, to use a less strong prior), or
to transform covariates, or to add more covariates.

So I do recommend looking at the Pareto shape parameter values, but I don't
recommend increasing shrinkage just because the values are large.

Aki
