clarification/question about LOO and pareto's K

Chang Feng Hsun

Apr 1, 2017, 5:21:48 PM
to Stan users mailing list
Hi All,

I'd like to make sure that my understanding of looic and Pareto's K is correct so that I can really use this method to compare my multiple models.

According to the manual, the looic estimate from the loo package is 2*elpd_loo, where elpd_loo is the expected log pointwise predictive density. In plain English, elpd_loo (the expected log pointwise predictive density) is the probability of predicting the ith data point using the current data set without the ith data point. So it is expressed as eqn (4) and (5) of Vehtari A., Gelman A., and Gabry J. 2016. This is Bayesian because we are using the posterior probability of the parameters and the probability of the ith data point given the estimated parameters to calculate elpd_loo. In other words, looic is a measure of how well this model fits the data.
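(For reference, this is roughly how I am computing it in R. It's just a minimal sketch that assumes my Stan program saves the pointwise log-likelihood in a generated quantities variable called log_lik, and that fit is the resulting stanfit object:)

library(loo)

# S x N matrix of pointwise log-likelihood values
# (S posterior draws, N observations)
log_lik_mat <- extract_log_lik(fit, parameter_name = "log_lik")

# PSIS-LOO estimate
loo_fit <- loo(log_lik_mat)
print(loo_fit)   # reports elpd_loo, p_loo, looic and the Pareto K diagnostics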

The loo package uses the Pareto-smoothed importance sampling (PSIS) method to estimate the expected log pointwise predictive density. In this method, there is a critical parameter K (Pareto's K) for each data point that judges whether that data point is being accurately predicted. If a point is not being predicted accurately (low probability, I guess??), its Pareto's K would be large. According to the manual, a large Pareto's K (>0.7) means the model posterior would be too different if that data point were removed. This suggests that the model is not capturing the data well (i.e. some data points with high K are highly influential and not being considered by the model).
Following this logic, I cannot trust the looic estimate if most (say >80%) of the Pareto's K values are higher than 0.7.
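(I am checking the K values like this; pareto_k_values() and pareto_k_table() are the accessors I found in the loo documentation, so I hope I am using them as intended:)

k_vals <- pareto_k_values(loo_fit)   # one K estimate per data point
mean(k_vals > 0.7)                   # fraction of points above the 0.7 threshold
pareto_k_table(loo_fit)              # summary table of the diagnostics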

To deal with the issue of high Pareto's K, one can either change the model structure or transform the data. Are there any other solutions?

Am I understanding and interpreting loo correctly?
Thank you for your help!
OSCAR

Aki Vehtari

Apr 2, 2017, 5:15:18 AM
to Stan users mailing list
Hi Oscar,

On Sunday, April 2, 2017 at 12:21:48 AM UTC+3, Chang Feng Hsun wrote:
According to the manual, the looic estimate from the loo package is 2*elpd_loo, where elpd_loo is the expected log pointwise predictive density.

looic is -2*elpd and thus they are equal up to this historically used multiplier -2. I prefer the elpd scale.
 
In plain English, elpd_loo (the expected log pointwise predictive density) is the probability of predicting the ith data point using the current data set without the ith data point.

Be more careful with the difference between probability density and probability. If the predicted observations are discrete, then we have probabilities. If the predicted observations are continuous, we have densities. Otherwise correct.
 
So it is expressed as eqn (4) and (5) of Vehtari A., Gelman A., and Gabry J. 2016. This is Bayesian because we are using the posterior probability of the parameters and the probability of the ith data point given the estimated parameters to calculate elpd_loo.

It's Bayesian because we integrate over the posterior distribution to get the predictive distribution (which can be continuous or discrete), and we are estimating the predictive performance of that predictive distribution.
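In symbols (my shorthand for the two equations referenced above):

\mathrm{elpd}_{\mathrm{loo}} = \sum_{i=1}^{n} \log p(y_i \mid y_{-i}),
\qquad
p(y_i \mid y_{-i}) = \int p(y_i \mid \theta)\, p(\theta \mid y_{-i})\, d\theta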
 
In other words, looic is a measure of how well this model fits the data.

It's a measure of how well the model predicts future data, assuming that the future data come from the same distribution as the observed data.
 

The loo package uses the Pareto-smoothed importance sampling (PSIS) method to estimate the expected log pointwise predictive density. In this method, there is a critical parameter K (Pareto's K) for each data point that judges whether that data point is being accurately predicted. If a point is not being predicted accurately (low probability, I guess??),

Not necessarily low probability...
 
its Pareto's K would be large. According to the manual, a large Pareto's K (>0.7) means the model posterior would be too different if that data point were removed.

but more accurately it is what you wrote, that is, the full data posterior is too different compared to the leave-one-out posterior for the importance sampling to work well.
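The reason is that the leave-one-out posterior is approximated by reweighting the full-posterior draws theta^(s) with importance ratios that are, roughly,

r_i^{(s)} \propto \frac{p(\theta^{(s)} \mid y_{-i})}{p(\theta^{(s)} \mid y)} \propto \frac{1}{p(y_i \mid \theta^{(s)})},

and Pareto's K is the estimated tail shape of these ratios: the larger K is, the heavier the tail and the less reliable the importance sampling estimate.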
 
This suggests that the model is not capturing the data well (i.e. some data points with high K are highly influential and not being considered by the model).

This suggests that there can be significant model misspecification (and some data points are considered too much). But there are also cases where the model can be the true one (just with unknown parameters), and still the full posterior and the leave-one-out posterior can be too different for the importance sampling to work well. This is common, for example, in the case of flexible latent variable models such as Gaussian processes.
 
Following this logic, I cannot trust the looic estimate if most (say >80%) of the Pareto's K values are higher than 0.7.

In this case, you cannot trust the elpd (or looic) estimate computed with importance sampling.
 

To deal with the issue of high Pareto's K, one can either change the model structure or transform the data. Are there any other solutions?


You can use k-fold-CV to compute the corresponding elpd (or looic), which you can trust more because it does not use importance sampling.
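For example, something along these lines in R (just a sketch of the idea; fit_model() and log_lik_heldout() are placeholders for refitting your Stan model on the training folds and computing the log predictive density of the held-out points, since those depend entirely on your model):

K <- 10
N <- nrow(data)
fold <- sample(rep(1:K, length.out = N))   # random fold assignment

elpd_pointwise <- numeric(N)
for (k in 1:K) {
  train <- data[fold != k, ]
  test  <- data[fold == k, ]
  fit_k <- fit_model(train)   # placeholder: refit the Stan model on the training folds
  # placeholder: log of the posterior-averaged predictive density
  # for each held-out observation, using the draws from fit_k
  elpd_pointwise[fold == k] <- log_lik_heldout(fit_k, test)
}
elpd_kfold <- sum(elpd_pointwise)   # multiply by -2 to put it on the looic scale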
 

Aki