Re: [r-inla] Variability explained

Finn Lindgren

Jun 3, 2024, 10:01:09 AM
to R-inla discussion group
Hi,

this question has multiple potential answers, depending on the reason
for asking it, and no one true correct answer. For instance, since the
spatial field component has posterior correlation with the "fixed
effects", the variation contribution of each individual effect can add
up to more than the combined variability, so computing a "percentage
of explained variance" doesn't necessarily mean what one thinks it
means.
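
A small numeric illustration of that point, as a sketch only: the covariance values below are hypothetical, not taken from any fitted model. When two components are negatively correlated in the posterior, Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b) is smaller than the sum of the individual variances, so naive "shares" exceed 100%.

```r
# Hypothetical posterior covariance between two model components at one
# location (e.g. a covariate effect and the spatial field); values made up.
Sigma <- matrix(c( 1.0, -0.6,
                  -0.6,  1.0),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("fixed", "field"), c("fixed", "field")))

var_individual <- diag(Sigma)   # Var(a), Var(b)
var_combined   <- sum(Sigma)    # Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b)

# Naive "share of variance" per component; these do not sum to 1.
share <- var_individual / var_combined
print(share)
print(sum(share))
```

Here the combined variance is 0.8 while each component alone has variance 1, so the two "shares" add up to 2.5 rather than 1.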

But you may be on the right track in thinking about what happens to
the model when removing one model component, and comparing the
results. We did something along those lines in
Yuan Yuan, Fabian E. Bachl, Finn Lindgren, David L. Borchers, Janine
B. Illian, Stephen T. Buckland, Håvard Rue, and Tim Gerrodette (2017),
"Point process models for spatio-temporal distance sampling data from
a large-scale survey of blue whales", Annals of Applied Statistics,
11, 2270--2297, doi:10.1214/17-AOAS1078
There, we compared the spatial model predictions with and without the
random field component present in the estimation, and computed the
overall spatial variability (in the spatial context, overall spatial
variability is arguably the relevant quantity, and not raw statistical
variance at the observation locations).
What you describe sounds similar, but looking only at the
locations where you have observations.
Both versions require you to run the model twice: once with the f(s,
...) component, and once without it. There's no special magic to
this.
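
A minimal sketch of the "run it twice" comparison at the observation locations, under stated assumptions: the data are simulated, and the mesh settings, prior values, and object names (res_with, res_without, eta_with, eta_without) are illustrative choices, not anything from the original model. It compares the combined linear predictor of the two fits rather than individual component estimates.

```r
library(INLA)

set.seed(1)
n   <- 200
loc <- cbind(runif(n), runif(n))          # simulated locations in the unit square
x1  <- rnorm(n)                           # simulated covariate
y   <- 1 + 0.5 * x1 + sin(2 * pi * loc[, 1]) + rnorm(n, sd = 0.3)

# Mesh and SPDE model (illustrative settings and priors)
mesh <- inla.mesh.2d(loc = loc, max.edge = c(0.1, 0.3), cutoff = 0.05)
spde <- inla.spde2.pcmatern(mesh,
                            prior.range = c(0.2, 0.5),
                            prior.sigma = c(1, 0.5))
A    <- inla.spde.make.A(mesh, loc = loc)
stk  <- inla.stack(data    = list(y = y),
                   A       = list(A, 1),
                   effects = list(s = 1:spde$n.spde,
                                  data.frame(intercept = 1, x1 = x1)),
                   tag     = "est")

# Fit 1: with the spatial field
res_with <- inla(y ~ 0 + intercept + x1 + f(s, model = spde),
                 data = inla.stack.data(stk, spde = spde),
                 control.predictor = list(A = inla.stack.A(stk), compute = TRUE))

# Fit 2: without the spatial field (no projection matrix needed)
res_without <- inla(y ~ 1 + x1,
                    data = data.frame(y = y, x1 = x1),
                    control.predictor = list(compute = TRUE))

# Posterior mean of the combined predictor at the observation locations
idx         <- inla.stack.index(stk, tag = "est")$data
eta_with    <- res_with$summary.linear.predictor$mean[idx]
eta_without <- res_without$summary.linear.predictor$mean

# Compare the variability of the combined predictor in the two fits
print(c(with_field    = var(eta_with),
        without_field = var(eta_without),
        mean_sq_diff  = mean((eta_with - eta_without)^2)))
```

This is the observation-location version. For the domain-wide version, one would evaluate both predictors on a prediction grid covering the region instead.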

Note:
Due to the posterior dependence, I prefer looking at the combined
component effects when comparing models; i.e. I would mainly compare
combined predictor values, and not the individual component estimates,
as those have a different interpretation in the presence of other model
components (spatial confounding is virtually ubiquitous, and one
should not expect "spatial fixed effect" estimates to remain unchanged
when adding other spatial components to the model).

Finn

On Mon, 3 Jun 2024 at 03:26, Richard <richard...@gmail.com> wrote:
>
> Dear INLA team
>
> I have created a spatial model and I am interested in how much of the variability is explained by each term, especially by the spatial effect. Is there any way to do this? Let's say I have this model:
>
> y ~ x1 + x2 + x3 + f(s, model = spde)
>
> Is there any way to check how much each term helps to explain the variability?
>
> One thing I did, although it is only graphical and there may be a better option, is to check how well the posterior follows the sample: I use inla.posterior.sample and take the first n predictors, which correspond to the sample of size n. The idea is that those predictors should be similar to the observed data.
>
> I am not sure whether removing each term in turn and seeing how much the posterior changes relative to the data would show how much each term contributes to the posterior. That is not exactly an explanation of the variability, and I am more interested in something numeric than graphical.
>
> Is there any way to do this?
>
> Thank you



--
Finn Lindgren
email: finn.l...@gmail.com

Richard

Jun 3, 2024, 4:02:19 PM
to R-inla discussion group
Hi Finn, thank you for your answer. I read the paper you mentioned and I think I understand what you meant.
In one part of the paper you did this:

[Attached image: imagen.png]

So I guess I should do what you mentioned: run the model once with the spatial effect and see which covariates do not have 0 in their posterior credible interval, then run it a second time without the spatial effect and see which ones do have 0, to see how, in the presence of f(s, model = spde), some variables can or cannot explain the spatial effect?

And to compute the overall spatial variability I think that with INLA one can do that using this:

mod.field = inla.spde2.result(res, name="s", spde)
mod.field$summary.hyperpar

So I get theta1 and theta2 with their mean, sd, and posterior credible interval. If theta2 is significant (I guess it is sigma, and theta1 is the range), would its mean be the overall spatial variability?

And lastly, a question that came up while reading this: I can use the significance of the variables to see how it changes with and without the spatial effect, as mentioned above. But even if a variable is not significant, that is not enough reason to remove it, even in the presence of a spatial effect, right? I ask because I did model selection using the DIC criterion, but when I check significance to see how much of the spatial effect the covariates explain, one of my two covariates is not significant, so it cannot explain the log-spatial effect (I have a lognormal model). However, the predictions are better with this term, and the predictions are what I am mostly interested in. In the paper you mentioned, variables are removed because of that significance, so what would be the correct procedure here?

Thank you!!

Richard

Jun 3, 2024, 4:34:27 PM
to R-inla discussion group
Just to add something about theta1 and theta2: I see that theta1 is negative, so I guess it is not the range of the SPDE, but then I am not sure what theta1 and theta2 mean:

mod.field = inla.spde2.result(res, name="s", spde)
mod.field$summary.hyperpar



Finn Lindgren

Jun 4, 2024, 2:56:11 AM
to Richard, R-inla discussion group
Theta1 and theta2 are the internal parameterisation of the “humanly interpretable” parameters. The spde2.result function converts them into humanly interpretable format. But when using pcmatern models, the $summary.hyperpar output is already on the humanly interpretable scale (range and sigma).
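
As a rough sketch of why theta1 can be negative: the conversion below assumes the default inla.spde2.matern parameterisation, where theta1 = log(tau) and theta2 = log(kappa), with alpha = 2 on a 2D domain. The theta values are hypothetical, and plugging in point values like this is only an illustration; inla.spde2.result transforms the full posterior marginals rather than point estimates.

```r
# Hypothetical internal hyperparameter values, for illustration only.
theta1 <- -1.5   # assumed to be log(tau)
theta2 <-  0.8   # assumed to be log(kappa)

# Assumptions: d = 2 and alpha = 2, hence smoothness nu = alpha - d/2 = 1.
nu    <- 1
tau   <- exp(theta1)
kappa <- exp(theta2)

# Nominal range and standard deviation of the Matern field:
#   range   = sqrt(8 * nu) / kappa
#   sigma^2 = 1 / (4 * pi * kappa^2 * tau^2)   (for nu = 1, d = 2)
range_nominal <- sqrt(8 * nu) / kappa
sigma_nominal <- 1 / (sqrt(4 * pi) * kappa * tau)

print(c(range = range_nominal, sigma = sigma_nominal))
```

Because the internal parameters live on a log scale, a negative theta value is perfectly compatible with a positive range and sigma.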

However, when I said spatial variability I meant just that, and _not_ the model _parameter_ sigma. The distinction is subtle, and if one had an iid model component they would be almost the same as each other. But in the presence of strong spatial correlation, the actual variability within the domain can be _smaller_ than the model parameter value.
I’m referring to the difference between
  Var(u(s)|Y) = \sigma^2,
which is the posterior variance of the random field u(.), and
  V(D) = \frac{1}{|D|} \int_D [u(s) - M(D)]^2 ds,
where
  M(D) = \frac{1}{|D|} \int_D u(s) ds

The posterior expectation of V(D) is the “posterior expected mean square deviation from the spatial average”, and this is more directly measuring the amount of spatial variability in u(s) than \sigma^2.
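
A minimal base-R sketch of the gap between V(D) and \sigma^2, under simplifying assumptions: a 1D domain D = [0, 1], an exponential covariance with a hypothetical correlation length, and simulation from the field's own distribution rather than from a fitted posterior. It only illustrates that strong spatial correlation makes the expected V(D) much smaller than \sigma^2.

```r
set.seed(42)

s        <- seq(0, 1, length.out = 200)   # grid over the domain D = [0, 1]
sigma2   <- 1                             # pointwise variance parameter
corr_len <- 2                             # hypothetical correlation length (long relative to D)

# Exponential covariance matrix on the grid
Sigma <- sigma2 * exp(-abs(outer(s, s, "-")) / corr_len)

# Simulate fields: u = t(R) %*% z has covariance Sigma when t(R) %*% R = Sigma
R     <- chol(Sigma)
n_sim <- 500
u     <- t(R) %*% matrix(rnorm(length(s) * n_sim), nrow = length(s))

# V(D) per simulated field: mean squared deviation from the spatial average
V_D      <- apply(u, 2, function(ui) mean((ui - mean(ui))^2))
mean_V_D <- mean(V_D)

# Exact expectation on the grid: E[V(D)] = sigma2 - (average of all covariances)
expected_V_D <- sigma2 - mean(Sigma)

print(c(sigma2 = sigma2, simulated = mean_V_D, exact = expected_V_D))
```

With a short correlation length the two quantities get close to each other, which matches the iid remark above.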

That said, people tend to ignore that and just talk about sigma anyway…

“Statistical insignificance” of a variable is usually not a reason to remove it from the model; I would only remove it if keeping it in the model causes numerical problems or excessive runtime. But this is a large complicated philosophical issue that one can spend weeks on in a modern class on statistical modelling…

Finn

Richard

Jun 5, 2024, 6:07:02 PM
to R-inla discussion group
Thank you, Finn. Everything you wrote was really useful.