Understanding "significance" of a spatial effect

Paul Lantos

Feb 12, 2017, 9:14:51 AM

to R-inla discussion group

If my statistical hypothesis is that y is spatially heterogeneous, then what posterior parameter will tell me if this is true or false (or likely vs unlikely)?

More importantly, I'd like to compare models to see if this spatial effect persists after controlling for confounders.

This is a bit simpler to figure out in geoadditive modeling where I can get a p value for an s(x,y) term.

Thanks,
Paul

Haakon Bakka

Feb 13, 2017, 3:10:17 AM

to Paul Lantos, R-inla discussion group

Hi Paul,

First and foremost, I would plot the sigma (SD) parameter for the spatial field, both prior and posterior.

If the posterior is "larger sigma" than the prior, I would consider the data spatially heterogeneous.

You may also want to plot the prior/posterior for the inverse spatial range (to see that the posterior is choosing a shorter spatial range than the prior).

A flat spatial field is one with a very large range parameter.

Kind regards,

Haakon

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion-group+unsub...@googlegroups.com.
To post to this group, send email to r-inla-discussion-group@googlegroups.com.
Visit this group at https://groups.google.com/group/r-inla-discussion-group.
For more options, visit https://groups.google.com/d/optout.

Finn Lindgren

Feb 13, 2017, 3:34:03 AM

to Haakon Bakka, Paul Lantos, R-inla discussion group

On 13 Feb 2017, at 08:10, Haakon Bakka <ba...@r-inla.org> wrote:

> First and foremost, I would plot the sigma (SD) parameter for the spatial field, both prior and posterior.
>
> If the posterior is "larger sigma" than the prior, I would consider the data spatially heterogeneous.

If the prior includes large variances, this will not be the case, so it's only an "if" condition, not an "if and only if".
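One way to make the prior-versus-posterior comparison of sigma concrete is to compare the probability that the field SD exceeds some value under the prior and under the posterior. This is a minimal sketch that assumes a PC prior on the SD of the form P(sigma > U) = alpha (the form taken by the `prior.sigma` argument of `inla.spde2.pcmatern`), under which sigma is exponential. The values of U and alpha and the log-normal "posterior" draws are hypothetical stand-ins, not output from a fitted model.

```r
# Hypothetical PC prior on the field SD: P(sigma > U) = alpha,
# which makes sigma exponential with rate -log(alpha) / U.
U <- 1
alpha <- 0.01
rate <- -log(alpha) / U
prior_tail <- function(s) pexp(s, rate = rate, lower.tail = FALSE)  # P(sigma > s) under the prior

# Stand-in posterior draws of sigma (in practice these come from the fitted model)
set.seed(1)
post_sigma <- rlnorm(4000, meanlog = log(1.5), sdlog = 0.2)

s0 <- 1
tails <- c(prior = prior_tail(s0), posterior = mean(post_sigma > s0))
print(round(tails, 3))
```

With these made-up numbers, the posterior puts far more mass above `s0` than the prior does. The caveat still applies: if the prior already allows large SDs, this comparison can show nothing even when a spatial effect is present.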

> You may also want to plot the prior/posterior for the inverse spatial range (to see that the posterior is choosing a shorter spatial range than the prior).
> A flat spatial field is one with a very large range parameter.

No, the commonly used intrinsic random fields have infinite range, and are not at all flat.

Furthermore, if no correlated spatial effect is needed, fields with range close to zero are indistinguishable from independent measurement noise.
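This sketch shows why very short ranges are hard to tell apart from noise. It evaluates the Matérn correlation between two observations at a fixed distance for a short, a moderate and a long range. It assumes INLA's usual SPDE conventions: smoothness nu = alpha - d/2, which gives nu = 1 for the default alpha = 2 in two dimensions, and range = sqrt(8 nu) / kappa, at which distance the correlation is roughly 0.14. The distance and the three ranges are hypothetical.

```r
# Matérn correlation with smoothness nu, parameterised by the range rho = sqrt(8 * nu) / kappa
matern_cor <- function(h, rho, nu = 1) {
  x <- sqrt(8 * nu) / rho * h
  r <- rep(1, length(x))                        # correlation is 1 at distance 0
  pos <- x > 0
  r[pos] <- 2^(1 - nu) / gamma(nu) * x[pos]^nu * besselK(x[pos], nu)
  r
}

h <- 1                                          # hypothetical distance between two neighbouring observations
ranges <- c(short = 0.05, moderate = 1, long = 50)
print(round(sapply(ranges, function(rho) matern_cor(h, rho)), 3))
```

With the short range, neighbouring observations are essentially uncorrelated, so the field behaves like iid measurement noise. With the long range, the field barely changes between them.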

Goodness-of-fit techniques for these situations that are "properly Bayesian" need more development, but some techniques exist, and diagnostic plots are useful. I'm fairly sure we discussed posterior pointwise p-value-like calculations with you earlier, Paul? The excursions package formalizes this by calculating joint credible regions for where the field crosses zero; if that set covers the entire space, that is a strong indication that the random field might not be required.

Finn

Haakon Bakka

Feb 13, 2017, 5:53:06 AM

to Finn Lindgren, Paul Lantos, R-inla discussion group

See comments below.

As for whether the spatial effect is still significant after adding covariates/confounders, that depends on what you mean by the question. When you conclude "no significant spatial effect", do you want this to mean that the data "does not need" a spatial effect in the model, or that the data "disproves" a spatial effect?

(In the common hypothesis-testing framework, you are only allowed to "reject" H0, not "accept" it. That is why I ask for this clarification.)

In my experience, if you add some spatial covariates and iid noise, the model often "does not need" a spatial effect. That does not mean that there is no spatial effect, only that you do not need one to model your specific data. On the other hand, if you consider spatial area predictions and use plug-in estimates for your covariate effects, all the variance in the predictions comes from the spatial effect!

PS. I know this was a bit vague, but I hope it helps anyway.

PPS. A different way to think about these questions is model comparison.

Haakon

On 13 February 2017 at 09:33, Finn Lindgren <finn.l...@gmail.com> wrote:

> On 13 Feb 2017, at 08:10, Haakon Bakka <ba...@r-inla.org> wrote:
>> First and foremost, I would plot the sigma (SD) parameter for the spatial field, both prior and posterior.
>>
>> If the posterior is "larger sigma" than the prior, I would consider the data spatially heterogeneous.
>
> If the prior includes large variances, this will not be the case, so it's only an "if" condition, not an "if and only if".

Yes. On the other hand, if the posterior concentrates strongly around a value greater than zero, that would also show significance of the spatial effect, more-or-less independent of prior, would it not?

>> You may also want to plot the prior/posterior for the inverse spatial range (to see that the posterior is choosing a shorter spatial range than the prior).
>> A flat spatial field is one with a very large range parameter.
>
> No, the commonly used intrinsic random fields have infinite range, and are not at all flat.
> Furthermore, if no correlated spatial effect is needed, fields with range close to zero are indistinguishable from independent measurement noise.

Hmm, I was referring to INLA results with the Matérn model in particular. Sometimes the inla function returns a large range together with a nonzero sigma, because of identifiability issues combined with problems in the computations. Therefore, even if the posterior of the SD is far from zero, I would be careful if the spatial range is very long.

I agree: if the spatial range is very short, you may want iid noise instead.

Haakon Bakka

Feb 13, 2017, 10:35:21 AM

to Finn Lindgren, Paul Lantos, R-inla discussion group

Please ignore my earlier comment, as it assumes you use the same prior for the range as I do:

> You may also want to plot the prior/posterior for the inverse spatial range (to see that the posterior is choosing a shorter spatial range than the prior).
>
> A flat spatial field is one with a very large range parameter.

Restated: I recommend plotting the spatial field, and plotting the prior/posterior for range, and thinking carefully about what is going on if the range is long (compared to the size of your study area).
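To put "long compared to the size of your study area" into numbers, you can compare the probability that the range exceeds the study-area extent under the prior and under the posterior. This is a minimal sketch that assumes a 2-D PC prior on the range of the form P(rho < rho0) = alpha_r (the form taken by `prior.range` in `inla.spde2.pcmatern`). That prior is equivalent to 1/rho being exponential with rate -log(alpha_r) * rho0. The extent, prior settings and posterior draws are all hypothetical.

```r
# Hypothetical PC prior on the range (2-D): P(rho < rho0) = alpha_r, equivalent to
# 1 / rho ~ Exponential(lambda) with lambda = -log(alpha_r) * rho0.
rho0 <- 10
alpha_r <- 0.05
lambda <- -log(alpha_r) * rho0
prior_range_cdf <- function(r) exp(-lambda / r)              # P(rho < r) under the prior

extent <- 100                                                # hypothetical study-area diameter
set.seed(2)
post_range <- rlnorm(4000, meanlog = log(150), sdlog = 0.3)  # stand-in posterior draws

# Probability that the range exceeds the study-area extent, prior vs posterior
tail_probs <- c(prior = 1 - prior_range_cdf(extent),
                posterior = mean(post_range > extent))
print(round(tail_probs, 2))
```

A posterior that piles up beyond the study-area extent, as in this made-up example, describes a fitted field that is nearly a constant offset over the data. That is the situation that calls for careful thought.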

Kind regards,

Haakon

Paul Lantos

Feb 13, 2017, 10:47:45 AM

to Finn Lindgren, Haakon Bakka, R-inla discussion group

Yes, thanks, you did help me calculate the pointwise p-value.

For GAM modeling, however, I can get a global p-value for the overall significance of coordinate space in the model. I can infer it in two ways, actually:

1 - look at the spatial effect in the model summary, so if the model is gam( y ~ s(long, lat) ), then I can look at the p for s(long, lat).

2 - compare spatial to aspatial models, i.e. gam( y ~ s(long, lat) ) vs gam( y ~ 1 ) using an ANOVA.
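The two GAM checks can be sketched with mgcv, a recommended package that ships with R. The simulated surface and the names `long`, `lat` and `y` are hypothetical.

```r
library(mgcv)   # recommended package shipped with R

set.seed(4)
n <- 300
long <- runif(n)                                # hypothetical coordinates
lat <- runif(n)
y <- sin(2 * pi * long) * cos(2 * pi * lat) + rnorm(n, sd = 0.5)

m1 <- gam(y ~ s(long, lat))                     # spatial model
m0 <- gam(y ~ 1)                                # aspatial model

# 1. Approximate p-value for the smooth of the coordinates
print(summary(m1)$s.table)
# 2. Compare the spatial and aspatial fits
print(anova(m0, m1, test = "F"))
```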

I suppose I can do the latter with INLA models by running spatial and aspatial versions (as you'd advised taking out the A.object and the SPDE parameter) and seeing if there is a large difference in DIC?
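In INLA, DIC is requested with `control.compute = list(dic = TRUE)` and is reported in `result$dic$dic`. The base-R sketch below shows what the number measures: DIC = mean posterior deviance + pD, where pD is the mean deviance minus the deviance at the posterior mean. It compares two toy Gaussian models with known noise SD. The first has one common mean (the "aspatial" model). The second has a separate mean for each of four regions, a crude stand-in for spatial structure. The data, the regions and the flat priors are all hypothetical.

```r
set.seed(3)
noise_sd <- 1                                   # known observation SD (assumption)
region <- rep(1:4, each = 25)                   # hypothetical regions, 25 observations each
y <- rnorm(100, mean = c(-1, 0, 0.5, 1.5)[region], sd = noise_sd)

dev_fun <- function(fitted) -2 * sum(dnorm(y, fitted, noise_sd, log = TRUE))

# DIC = mean posterior deviance + pD, with pD = mean deviance - deviance at the posterior mean
dic <- function(draws) {                        # draws: S x n matrix of fitted means
  D <- apply(draws, 1, dev_fun)
  pD <- mean(D) - dev_fun(colMeans(draws))
  c(DIC = mean(D) + pD, pD = pD)
}

S <- 4000
# Aspatial model: one common mean; a flat prior gives mu | y ~ N(mean(y), noise_sd^2 / n)
mu0 <- rnorm(S, mean(y), noise_sd / sqrt(length(y)))
draws0 <- matrix(mu0, S, length(y))
# Regional model: one mean per region; flat priors give independent normal posteriors
ybar <- tapply(y, region, mean)
n_g <- tabulate(region)
mu1 <- sapply(1:4, function(g) rnorm(S, ybar[g], noise_sd / sqrt(n_g[g])))
draws1 <- mu1[, region]

res <- rbind(aspatial = dic(draws0), regional = dic(draws1))
print(round(res, 1))
```

Here pD comes out close to the number of means in each model. The large DIC gap is driven by the deliberately large regional differences, so with real data a small difference would not settle the question on its own.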

Finn Lindgren

Feb 13, 2017, 10:53:33 AM

to Paul Lantos, Haakon Bakka, R-inla discussion group

Yes, both methods can be used with INLA (if the p-values in 1. are the ones we discussed previously).

Finn

Paul Lantos

Feb 13, 2017, 10:59:02 AM

to Finn Lindgren, Haakon Bakka, R-inla discussion group

Right, the p values would be the ones we've discussed using this general calculation:

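idx   <- inla.stack.index(stk.full, "prediction")$data  # rows of the prediction locations
mu    <- model1$summary.fitted.values[idx, "mean"]      # pointwise posterior means
sigma <- model1$summary.fitted.values[idx, "sd"]        # pointwise posterior SDs
m <- mean(mu)                                           # overall reference level
# Two-sided pointwise p-value-like quantity, using a Gaussian approximation to each marginal
p <- 2 * pmin(pnorm(m, mu, sigma), 1 - pnorm(m, mu, sigma))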
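The arithmetic can be checked without a fitted model. This sketch uses made-up posterior means and SDs in place of `summary.fitted.values`. It also shows that the same quantity can be written as 2 * pnorm(-abs(m - mu) / sigma), because min(Phi(z), 1 - Phi(z)) = Phi(-|z|).

```r
# Made-up pointwise posterior summaries (stand-ins for summary.fitted.values)
mu    <- c(0, 1, 2)
sigma <- c(1, 1, 1)
m <- mean(mu)                                   # reference level, here 1

p     <- 2 * pmin(pnorm(m, mu, sigma), 1 - pnorm(m, mu, sigma))
p_alt <- 2 * pnorm(-abs(m - mu) / sigma)        # equivalent symmetric form
print(round(p, 4))                              # 0.3173 1.0000 0.3173
```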

Finn Lindgren

Feb 13, 2017, 11:01:28 AM

to Paul Lantos, Haakon Bakka, R-inla discussion group

Precisely. The "excursions" package I mentioned just extends that to calculate overall probabilities based on the joint posterior distribution (which takes care of the multiple testing problem).
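Simulation shows why the joint distribution matters for multiple testing. Take the locations whose marginal probability of being above zero exceeds 0.95, then compute the probability that the field is above zero at all of them at once. This is a base-R Monte Carlo illustration, not the algorithm used by the excursions package. The 1-D locations, posterior mean and exponential posterior covariance are hypothetical.

```r
set.seed(42)
n <- 50
s <- seq(0, 1, length.out = n)                       # hypothetical 1-D locations
mu <- sin(2 * pi * s)                                # hypothetical posterior mean
Sigma <- 0.1 * exp(-abs(outer(s, s, "-")) / 0.2)     # hypothetical posterior covariance
draws <- mu + t(chol(Sigma)) %*% matrix(rnorm(n * 4000), n)  # n x 4000 joint posterior samples

p_pos <- rowMeans(draws > 0)                         # pointwise P(x(s) > 0)
A <- which(p_pos > 0.95)                             # set chosen by pointwise thresholding
joint_A <- mean(colSums(draws[A, , drop = FALSE] > 0) == length(A))  # P(x(s) > 0 for all s in A)

print(c(min_pointwise = min(p_pos[A]), joint = joint_A))
```

The joint probability can never exceed the smallest pointwise one, and it is typically lower once the set covers many locations.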

Finn

Paul Lantos

Feb 13, 2017, 11:15:38 AM

to Finn Lindgren, Haakon Bakka, R-inla discussion group

Good to know... now a question I suppose is whether pointwise significance is enough to say whether a process is spatially heterogeneous? Over how much of the coordinate space does the p value need to be low? And should it be based on the p value, or rather whether a spatial model improves upon an aspatial model?

Finn Lindgren

Feb 13, 2017, 11:27:36 AM

to Paul Lantos, Haakon Bakka, R-inla discussion group

Well, both aspects are useful to know.

But I don't think of it as p-values. For the excursions package, we essentially find "the largest pair of regions A and B such that, with probability at least 1-alpha, the field is entirely above zero in A and entirely below zero in B". The complement of the union of A and B is a level 1-alpha credible region for where the field crosses the level zero. The sets A and B are also useful in their own right: they are the regions where one can confidently say the field is not zero. This can be done either for the entire spatial prediction field or separately for the SPDE component of a model, which I think is closer to what you're aiming for here.
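The construction of A and B can be imitated with posterior samples. This is a simplified Monte Carlo analogue, not the excursions package's method, which works from the fitted posterior and computes the joint probabilities numerically rather than by brute-force sampling. The sketch assigns each location to its more likely side of zero and orders locations by that marginal probability. It then keeps the largest leading group whose joint probability of lying entirely on those sides is at least 1 - alpha. The 1-D locations, posterior mean and covariance are hypothetical.

```r
set.seed(7)
n <- 50
s <- seq(0, 1, length.out = n)                       # hypothetical 1-D locations
mu <- sin(2 * pi * s)                                # hypothetical posterior mean
Sigma <- 0.1 * exp(-abs(outer(s, s, "-")) / 0.2)     # hypothetical posterior covariance
draws <- mu + t(chol(Sigma)) %*% matrix(rnorm(n * 4000), n)  # n x 4000 joint samples

excursion_sets <- function(draws, alpha = 0.05) {
  above <- draws > 0
  side_up <- rowMeans(above) >= 0.5                  # more likely side of zero at each location
  agree <- above == side_up                          # TRUE where a sample is on that side
  ord <- order(rowMeans(agree), decreasing = TRUE)   # most confident locations first
  keep <- integer(0)
  for (k in seq_along(ord)) {
    idx <- ord[seq_len(k)]
    # Joint probability that every kept location is on its side; it can only fall as k grows
    if (mean(colSums(agree[idx, , drop = FALSE]) == k) < 1 - alpha) break
    keep <- idx
  }
  list(A = sort(keep[side_up[keep]]), B = sort(keep[!side_up[keep]]))
}

sets <- excursion_sets(draws)
crossing <- setdiff(seq_len(n), c(sets$A, sets$B))   # where the field may cross zero
print(lapply(list(A = s[sets$A], B = s[sets$B], crossing = s[crossing]), range))
```

Locations left out of both sets form the region where the field may cross zero. If that region covered the whole domain, it would be the strong hint, mentioned earlier in the thread, that the random field might not be needed.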

These two papers discuss it in more detail:

http://onlinelibrary.wiley.com/doi/10.1111/rssb.12055/full

http://amstat.tandfonline.com/doi/abs/10.1080/10618600.2016.1228537

The second paper has an example of what can happen when some fixed effects are included/excluded.

Finn