Hi Paul,
The behaviour of log p(y_i|y_{-i}) is probably best understood through the lens of proper scoring rules (see e.g. Gneiting and Raftery's 2007 JASA paper), as it's essentially the log-score.
Let the true predictive distribution be G, the claimed predictive distribution F, and the observed value y, and denote the score by S(F, y). (For CPO, the distributions are the leave-one-out predictive distributions.) log-CPO is a positively oriented score, i.e. a large value is "good".
A proper score has the property that its expectation over the true distribution is optimized when the claimed distribution F matches the true one, G, i.e.
S(F, G) := E_{y~G}[S(F, y)]
is maximized when F = G.
Assume that the true predictive distribution is G = N(0, 1). Then F = N(0, 10^6) and F = N(0, 10^{-6}) would both receive a lower score than F = N(0, 1), on average.
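This is easy to verify numerically. The sketch below (a Monte Carlo check, not anything model-specific) estimates the expected log-score S(F, G) under G = N(0, 1) for three claimed predictive scales; the standard deviations 10^{-3} and 10^3 stand in for the over- and under-confident alternatives:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y = rng.standard_normal(100_000)  # draws from the true G = N(0, 1)

# Monte Carlo estimate of S(F, G) = E_{y~G}[log f(y)] for three claimed sds
scores = {sd: norm.logpdf(y, loc=0, scale=sd).mean() for sd in (1e-3, 1.0, 1e3)}
print(scores)
```

The correct scale 1.0 gets the highest average log-score; the too-narrow predictive is punished far more harshly than the too-wide one, which is typical of the log-score.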
So we can use proper scores to compare different predictions under the same circumstances, e.g. two different models for the same data.
But in your example it seems you wanted to change what you were conditioning _on_, and that's not the situation handled by these methods. They only cover the case where there is a fixed (but unknown) distribution to be predicted.
So the total sum or average of log-CPO values isn't really useful in itself. But if you take the pairwise differences between the scores computed from two different models for the same data, then they become more comparable. The variability of each difference still depends on the true G_i distribution, but at least the ordering is informative.
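As a minimal sketch of that pairwise comparison, with made-up log-CPO vectors standing in for the output of two fitted models (the numbers are purely illustrative):

```python
import numpy as np

# Hypothetical log-CPO values from models A and B on the same 5 observations
log_cpo_a = np.array([-1.2, -0.8, -2.1, -0.9, -1.5])
log_cpo_b = np.array([-1.4, -0.7, -2.6, -1.1, -1.6])

# Pointwise paired differences: a positive entry favours model A
# for that observation, and the sum gives the overall ordering.
diff = log_cpo_a - log_cpo_b
print(diff, diff.sum())
```

The pairing is what matters: each difference is taken under the same G_i, so the per-observation variability partly cancels, unlike the raw sums.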
If one knew the true G_i, one could compute the theoretical maximum score expectation, but that's only possible in toy examples where the true conditional distributions are already known (and these aren't always philosophically well-defined).
For internal model validation, the PIT values are much more meaningful, as they are interpretable for a single model, without the need to compare with another model.
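For concreteness, here is a toy PIT check. The predictive CDFs would normally be the model's leave-one-out predictives; here a single N(0, 1) predictive stands in for all of them, so the data are calibrated by construction, and the PIT values u_i = F_i(y_i) should look uniform on (0, 1):

```python
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(1)
y = rng.standard_normal(200)  # observations

# Toy stand-in: the same N(0, 1) predictive for every observation
# (in practice, use each point's leave-one-out predictive CDF F_i)
pit = norm.cdf(y)  # u_i = F_i(y_i)

# Under a calibrated model the PIT values are ~ Uniform(0, 1);
# a histogram or a uniformity test makes miscalibration visible
print(kstest(pit, "uniform").pvalue)
```

A U-shaped PIT histogram indicates an under-dispersed predictive, a hump-shaped one an over-dispersed predictive, which is exactly the single-model diagnostic the log-CPO sum cannot give you.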