Allowing for spatially varying coefficients - model specification and interpretation

1,274 views
Skip to first unread message

TDM

unread,
Dec 10, 2015, 12:27:32 PM12/10/15
to R-inla discussion group

Hi all,

 

I am trying to code an INLA model and need some help with specification and interpretation. The data are on insecticide use across 2500 counties in the US.  The relationship I am most interested in is the overall, general relationship between insectHarv (insecticide use) and cHarv (landscape characteristics).  The rest of the model is made up of covariates of lesser interest.  I am struggling with the issue of how to allow for spatial variation in the insectHarv-cHarv relationship (acknowledging some non-stationarity).  I have been reading about SVC models.  Here is the formula I am working with:

 

formula1 <- insectHarv ~

1 + year + cCFV + cCorn + cSW + cInc + cGDD + cHF + cHarv +

f(id_area, model="besag", graph=counties1.adj) +

f(id_area1, year, model="ar1") +

f(id_area_year, model="iid") +

f(id_area2, cHarv, model="besag", graph=counties1.adj)

 

Here is my, quite possibly incorrect, interpretation:

 

The second line in the formula specifies the fixed effects.  These include a fixed effect for year (there are four years of repeated measurements per study site), a bunch of fixed covariates, and a fixed effect for cHarv.  Given the rest of the formula, I am thinking that this gives the overall weighted average effect of cHarv on insectHarv after accounting for a bunch of other stuff.

 

The third line accounts for the strong spatial autocorrelation within a year across sites not accounted for by the fixed effects, using a CAR model, where area_id is the identifier for each study site.

 

The fourth line accounts for the strong temporal autocorrelation across years within a site, using an AR1 model.

 

I don’t know what the fifth line does.  I copied it from other INLA examples.  But I imagine it allows for a space-time interaction of some sort.  Any additional insight?

 

It is not clear to me what the final line does.  I hope that it is related to spatially varying coefficients, in that it allows an estimate of local random variation in the insectHarv-cHarv relationship. The output of the model is:

 

Fixed effects:

               mean     sd  

(Intercept)  0.2065 0.0039    

year         0.0275 0.0039    

cCFV         0.5758 0.0160    

cCorn        0.4561 0.0163    

cSW         -0.0076 0.0112   

cInc         0.0077 0.0015    

cGDD         0.0153 0.0068    

cHF          0.0192 0.0029    

cHarv        0.0638 0.0176    

 

Random effects:
Name             Model
id_area          Besags ICAR model 
id_area1         AR1 model 
id_area_year     IID model 
id_area2         Besags ICAR model 
 
Model hyperparameters:
                                             mean        sd 
Precision for the Gaussian observations  321.2942   21.3716   
Precision for id_area                     78.3295    4.1063    
Precision for id_area1                  5366.5665 1394.9801  
Rho for id_area1                           0.9898    0.0033     
Precision for id_area_year               321.2942   21.3716   
Precision for id_area2                    73.9471   31.4321 

 

 

I interpret the last line in the fixed effects section to mean that there is an overall positive effect of cHarv on insectHarv.  I interpret the last line in the hyperparameters section to mean that the overall positive effect of cHarv on insectHarv occurs despite some local variation in the relationship.  I am thinking about these results in the same way that one would think about a random slopes model in a simpler mixed-model context.  I wonder how far off I am.

 

Is there alignment between my model specification, results, and interpretation?  THANKS A TON for any insight you have.

 

Best,

Tim

Elias T Krainski

unread,
Dec 10, 2015, 3:46:17 PM12/10/15
to r-inla-disc...@googlegroups.com

On 10/12/15 18:27, TDM wrote:

f(id_area1, year, model="ar1") +

If 'id_area1' is index for the areas, it does not makes sense to me since 'ar1' model is adequate for one dimensional index.

f(id_area_year, model="iid") +

If 'id_area_year' is an index that goes from 1 up to the number of areas times the number of years, then you have an intercept for each area and each time, as the type I interaction model in [1].

f(id_area2, cHarv, model="besag", graph=counties1.adj)

here you have a spatially varying coefficient model. Since you have 'cHarv' as fixed effect then you have
  (beta_cHarv + beta_cHarv_i ) cHarv_it
by default the 'besag' effects are constrained to sum up to zero and you need it when having the overall slope in the fixed effect. However, you can take out this fixed effect and then include constr=FALSE in the spatially (besag) varying coefficents.

Elias


TDM

unread,
Dec 11, 2015, 2:14:48 PM12/11/15
to R-inla discussion group

Thanks for your response, Elias!  After reading your comments, I went back and reread chapters 6 and 7 in the INLA book, and modified the model.  I think I now have something more sensible.  Do you see any technical errors in the model below?

 

formula1 <- insectHarv ~ 1 +                                # y and b0

  year + cCFV + cCorn + cSW + cInc + cGDD + cHF + cHarv +   # overall fixed effects

  f(id_area, model="bym", graph=counties1.adj) +            # for spatial autocorr

  f(id_area1, model="iid") +                                # for repeated measures

  f(id_area2, cHarv, model="bym", graph=counties1.adj) +    # spatial variation in cHarv

  f(id_year, cHarv, model="iid")                            # temporal variation in cHarv

 

This model yields a posterior mean fixed effect estimate for cHarv of 0.0814.  If I create a histogram (stem-and-leaf plot) like this:

 

stem(model1$summary.random$id_area2[,2] + 0.0814, scale=2)

 

I get this:

 

   The decimal point is 2 digit(s) to the left of the |

 

   -6 | 38661100

   -4 | 8877443332211109999888777766555555555544444432222221110000

   -2 | 99999998887777777776666666666666665555555555555544444444444444444444+187

   -0 | 99999999999999998888888888888888888888888888888888888888777777777777+426

    0 | 00000001111111111111111111111111111111222222222222222222222222222222+517

    2 | 00000000000000000000000000001111111111111111111111222222222222222222+474

    4 | 00000000000000000000000011111111111111111111111122222222222222222222+429

    6 | 00000000000000000000000001111111111111111122222222222222222222333333+349

    8 | 00000000000000000000000000000000011111111111111111111111111111111111+511

   10 | 00000000000000000000000000000000000111111111111111111111111222222222+504

   12 | 00000000000000000000000000111111111111111111111111111111111222222222+595

   14 | 00000000000000000000000000000000000000000011111111111111111111111111+510

   16 | 00000000000000000000000000011111111111111112222222222222222222223333+372

   18 | 00000000000000000000111111111111112222222222222222223333333333333333+191

   20 | 00000001111112222222222233333333333333333444445555667777778888999990+13

   22 | 01222333344455678889908

   24 | 03569

 

For the moment, ignore the fact that I added means to means.  Does this represent the local variation in the cHarv slopes across space?  Also, is it legitimate to map these values?

 

Similarly:

 

stem(model1$summary.random$id_year[,2] + 0.0814, scale=2)

 

  The decimal point is 2 digit(s) to the left of the |

 

   0 | 6

   2 | 6

   4 |

   6 |

   8 |

  10 | 2

  12 |

  14 |

  16 | 1

 

Does this represent the variation in the cHarv slopes over time?

 

Thanks so much for your help!

 

Best,

Tim

INLA help

unread,
Dec 14, 2015, 3:51:44 AM12/14/15
to TDM, R-inla discussion group
On Fri, 2015-12-11 at 11:14 -0800, TDM wrote:

  f(id_area2, cHarv, model="bym", graph=counties1.adj) +    # spatial

since you have a space-varying regression term, I guess you want to add

constr=FALSE

so that the coofs does not sum to zero (sum to zero does not make
sense)

H
--
Håvard Rue
he...@r-inla.org


Elias T Krainski

unread,
Dec 14, 2015, 3:58:45 AM12/14/15
to r-inla-disc...@googlegroups.com
'bym' includes already an 'iid' term. So, don't you think that it is too
much to have 'bym'+'iid'? Also, instead 'bym' you may consider 'bym2'
which has better parametrization.

may you use maps and time series plot stead stem(). Or just
plot(model11)
as it adds credibility intervals.

Elias

INLA help

unread,
Dec 14, 2015, 4:13:33 AM12/14/15
to TDM, R-inla discussion group
On Mon, 2015-12-14 at 09:51 +0100, INLA help wrote:
> On Fri, 2015-12-11 at 11:14 -0800, TDM wrote:
> >  
>   f(id_area2, cHarv, model="bym", graph=counties1.adj) +    # spatial
>
> since you have a space-varying regression term, I guess you want to
> add
>
> constr=FALSE
>
> so that the coofs does not sum to zero (sum to zero does not make
> sense)
>
> H

sorry, my answer is wrong. I think model 'besag' make much more sense
here, and then one should use constr=F




--
Håvard Rue
he...@r-inla.org


Tim Meehan

unread,
Dec 14, 2015, 5:06:32 AM12/14/15
to he...@r-inla.org, R-inla discussion group
Thanks for the input!  Here is the model I've been working with:

# model1
formula1<- insectHarv ~
  # fixed
  1 + year + cCFV + cCorn + cSW + cInc + cHF + cGDD + cHarv +
  # random spatial
  f(id_area_it, model="iid") +
  f(id_area_is, model="besag", graph=counties1.adj) +
  # space time Harv
  f(id_area_sHarv, cHarv, model="besag", graph=counties1.adj) +
  f(id_year_tHarv, cHarv, model="iid")

Am I right to think that the the first random term will help account for repeated measures at a site?

Am I right to think that the third random term will allow an estimate for variation in cHarv over space, independent of time?

Am I right to think that the fourth random term will allow an estimate for variation in cHarv over time, independent of space?

# run model
model1 <- inla(formula1,family="gaussian",data=df2,
                       control.predictor=list(compute=T),
                       control.compute=list(dic=T,cpo=F),
               verbose=T)

The model runs fine without 'constr=FALSE'.  After completion, I run this code (note that I add the fixed effect) and get:

z <- model1$summary.random$id_area_sHarv[,2] + 0.081
hist(z[z>-0.15], breaks=50)
d <- data.frame(x=df2$x, y=df2$y, z=z)
ggplot2::ggplot(d,aes(x=x, y=y, colour=z)) + geom_point() +
  scale_colour_gradient(limits=c(0, .2), high="green", low="red") +
  theme_bw()


Inline image 3
Inline image 2

Is this a reasonable way to look at variation in the cHarv effect?  Note that the plot only shows county centroids with data, but the graph for computation has all counties included.

But when I run with constr=FALSE:

# model1
formula1<- insectHarv ~
  # fixed
  1 + year + cCFV + cCorn + cSW + cInc + cHF + cGDD + cHarv +
  # random spatial
  f(id_area_it, model="iid") +
  f(id_area_is, model="besag", graph=counties1.adj) +
  # space time Harv
  f(id_area_sHarv, cHarv, model="besag", graph=counties1.adj, constr=FALSE) +
  f(id_year_tHarv, cHarv, model="iid", constr=FALSE)

INLA crashes:

file: smtp-taucs.c  hgid: d246233942e4  date: Wed May 27 20:53:00 2015 +0200
Function: GMRFLib_factorise_sparse_matrix_TAUCS(), Line: 829, Thread: 0
Fail to factorize Q. I will try to fix it...

GMRFLib version 3.0-0-snapshot, has recived error no [12]
Reason    : The Newton-Raphson optimizer did not converge
Message   : Condition `lambda < 1.0 / lambda_lim' is not TRUE
Function  : GMRFLib_init_GMRF_approximation_store__intern
File      : approx-inference.c
Line      : 2842
RCSId     : file: approx-inference.c  hgid: d246233942e4  date: Wed May 27 20:53:00 2015 +0200



Any ideas?

Thanks again!!!!


Elias T Krainski

unread,
Dec 14, 2015, 5:23:38 AM12/14/15
to r-inla-disc...@googlegroups.com
Hi Tim,

Am I right to think that the third random term will allow an estimate for variation in cHarv over space, independent of time?
Am I right to think that the fourth random term will allow an estimate for variation in cHarv over time, independent of space?

variation in the *effect* (the regression coefficient) of cHarv

But when I run with constr=FALSE:
With constr=FALSE in that term you need to take out the + cHarv term. So, you can have
 1: the + cHarv and constr=TRUE
 2: only the f(idxxx, cHarv, model='besag', constr=FALSE) term.
In the first case you have (\beta + \beta_i) * cHarv (constraint needed) and in the second option you have only \beta_i * cHarv (no constraint need).

Elias

Håvard Rue

unread,
Dec 14, 2015, 5:32:31 AM12/14/15
to Elias T Krainski, r-inla-disc...@googlegroups.com
Elias is correct, I didn't see the +cHarv term in the formula

H
--
Håvard Rue
Department of Mathematical Sciences
Norwegian University of Science and Technology
N-7491 Trondheim, Norway
Voice: +47-7359-3533 URL : http://www.math.ntnu.no/~hrue
Mobile: +47-9260-0021 Email: havar...@math.ntnu.no

R-INLA: www.r-inla.org


Tim Meehan

unread,
Dec 14, 2015, 5:45:54 AM12/14/15
to hr...@r-inla.org, Elias T Krainski, R-inla discussion group
I'm starting to catch on.  Thanks again!



--
You received this message because you are subscribed to a topic in the Google Groups "R-inla discussion group" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/r-inla-discussion-group/X_g4NuqqHfY/unsubscribe.
To unsubscribe from this group and all its topics, send an email to r-inla-discussion...@googlegroups.com.
To post to this group, send an email to r-inla-disc...@googlegroups.com.

Reply all
Reply to author
Forward
0 new messages