Question about Interpretation of Validation Metrics from inla.group.cv


Guido Fioravanti

May 12, 2025, 10:18:55 AM
to R-inla discussion group

Dear INLA group,

I have a question regarding the interpretation of three validation metrics obtained using the inla.group.cv function. My goal is to compare a calibration model (a spatio-temporal regression model) against a data fusion model (Bayesian melding), where the latter is a joint model with two likelihoods.

For each model, I computed the following monthly metrics:

  • Negative logarithmic score (LS)

  • Dawid-Sebastiani score (DS)

  • Kullback-Leibler divergence (KLD)
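
For concreteness, these three quantities can be written in closed form for Gaussian distributions. The sketch below is a generic illustration of the formulas, not the inla.group.cv output format (the function names are hypothetical):

```python
import math

def log_score(y, mu, sd):
    """Negative log predictive density of observation y under N(mu, sd^2)."""
    return 0.5 * math.log(2 * math.pi * sd**2) + (y - mu)**2 / (2 * sd**2)

def ds_score(y, mu, sd):
    """Dawid-Sebastiani score: squared standardized error plus log variance term."""
    return ((y - mu) / sd)**2 + 2 * math.log(sd)

def kl_normal(mu1, sd1, mu2, sd2):
    """KL divergence KL( N(mu1, sd1^2) || N(mu2, sd2^2) )."""
    return (math.log(sd2 / sd1)
            + (sd1**2 + (mu1 - mu2)**2) / (2 * sd2**2)
            - 0.5)
```

One way to see why the metrics need not agree: LS and DS score predictive distributions against held-out observations, whereas a KL divergence compares two distributions directly (in the leave-group-out setting, roughly the full posterior against the cross-validated one), so the two kinds of criteria measure different discrepancies and can rank models differently.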

The results are provided in the attached file. My question concerns an apparent inconsistency:

  • The KLD suggests that the data fusion model performs better across all months.

  • However, the LS and DS metrics indicate either similar performance between the two models or a slightly better performance of the calibration model in certain months.

I would have expected these metrics to show more coherent behavior. Additionally, other metrics I computed (RMSE and MAE, not shown here) align more closely with LS and DS, suggesting comparable or slightly better performance for the calibration model.

How can I reconcile the KLD results with the other metrics? Is there an interpretation or methodological consideration I might be missing?

I would greatly appreciate any insights or suggestions you might have.

Thanks for your help,

Guido



[Attachment: metrics.png]

Håvard Rue

May 12, 2025, 5:14:54 PM
to Guido Fioravanti, R-inla discussion group

In general, there is no answer as to which one is the 'best', as this is defined by
what you choose to use.

I would also guess that the KLD one is the most "non-robust" one, as shown in
the plot, so I would likely avoid that one; then it's less of an issue which
one you choose.

Best
H




--
Håvard Rue
hr...@r-inla.org