SPDE overfitting?

1,378 views
Skip to first unread message

mike.bl...@gmail.com

unread,
Jun 26, 2018, 2:48:27 PM6/26/18
to R-inla discussion group
Hello all,

I have developed a binomial GLM (log link) with a PC matern spatial random effect to predict the distribution of a species of insect at a global scale. 

At the moment my model seems to be over-fitting rather drastically. I think this is due to the spatial random effect taking over; the scale 
of the random effect (posterior sigma u) is high compared to the environmental covariates, the number of number of effective parameters 
is high and the number of equivalent replicates is low. The map of the fitted values (probability of presence) shows quite drastic over-fitting.

I have been trying to address this by reducing the scale of the spatial effect with better prior choices on the spde, but I have so far been unable to alter its behaviour.
I have found this other thread helpful as we appear to have experienced a similar scenario, but I appear to be a bit stuck with this. I would like to ask for some advice. 

I have taken the approach described in Illian et al (2012) to choose a prior range that offers a similar level of smoothing to that of the environmental covariates. This 
was to try to make sure that the spatial effect would operate at a longer range and not explain variation that could be explained by the covariates. From comparing 
plots of the spatial field computed using a range of priors, a prior range of 10 degrees appears to be reasonable. Considering this I set the mesh max edge length 
to ~1 degree, following the advice that estimation should be unbiased if the mesh resolution is ~1/10 of the spatial range. (The values are all converted from degrees to 
radians for the spherical mesh). 
Regarding the prior on the standard deviation, because in the binomial (logit) model the spatial effect is multiplicative with the mean, I set the prior sigma to log(1.1), 
assuming an average of 10% +/- effect on the predicted mean spatial effect.
I have applied the default N(0, 1e4) to the priors on the regression coefficients.

These hyperpriors result in posterior estimates for the range of ~13 degrees, which I think makes sense, but posterior sigma u of ~2.5, despite a prior of log(1.1) ~= 0.04. This is even
after heavily constraining the spatial effect with the tail probability on the prior sigma [i.e. prior.sigma = c(log(1.1), 0.01)].

Altering these priors (much larger/smaller range etc) seems to have little effect on the results, the posterior fitted estimates follow almost exactly the data.

Below I copy some of the code I have used to create the estimation model and I have attached the verbose output. I use a prediction stack for the predictions, but I didn't include that yet to 
avoid too long a post, I can provide the full code and the data if that would help.

Any help or insight as to where I am going wrong is greatly appreciated!

Thanks for all the invaluable help available through this forum!

Mike 


#----------------------------------------------------------------------Data ####
world_map <- _a simplified SpatialPolygonsDataframe_
data <- _a SpatialPointsDataframe of presence points and corresponding covariates values, nrow ~20,000_
cov_rst <- _a raster stack of covariates_
## Format for model
cov_names <-  c("tsi", "BIO16", "BIO17", "ghf")  # covariates of interest (continuous variables)
data <- data[c("presence", cov_names)]
cov_rst <- raster::subset(cov_rst, cov_names, drop = T)
cov_est <- data@data[cov_names]
loc_est <- inla.mesh.map(data@coords, projection = "longlat")  # convert lat/lon to 3D xyz
y <- data$presence

#----------------------------------------------------------------------Mesh ####
## Domain boundary
boundary <- inla.sp2segment(world_map,
                            crs = inla.CRS("longlat"))
## Mesh parameters
range <- 10
max_edge <- range/10 * pi/180
kCut_off <- 0.5
min_angle <- 26
## Build mesh
(mesh <- inla.mesh.2d(
  boundary = boundary,
  cutoff = max_edge * kCut_off,
  max.edge = c(1, 5) * max_edge,  # (inner, outer)
  min.angle = min_angle,
  crs = inla.CRS("sphere")))$n
[1] 32795
## Plot mesh
plot(mesh$mesh, rgl = T)


#----------------------------------------------------------Projector matrix ####
A_est <- inla.spde.make.A(mesh, loc = as.matrix(loc_est)) 


#----------------------------------------------------------------SPDE model ####
## Hyperparameter priors 
prior_range0 <- 10   
prior_sigmau0 <- log(1.1) 
alpha <- 2
## SPDE model 
spde = inla.spde2.pcmatern(
  mesh = mesh,
  alpha = alpha, 
  prior.range = c(prior_range0 * (pi/180), 0.1),  # p(spatial range < prior.range0)
  prior.sigma = c(prior_sigmau0, 0.01),  # p(sigma > prior.sigma.u0)
  )


#----------------------------------------------------------Estimation stack ####
stack_est <- inla.stack(
  data = list(y = y),
  effects = list(s = 1:spde$n.spde,  # spatial random effect index
                 list(b0 = 1,  # intercept
                      cov_est)),  # covariates
  A = list(A_est, 1),
  tag = 'est')  


#------------------------------------------------------------------Formulae ####
coef_names <- stack_est$effects$names[2:length(stack_est$effects$names)]
## Fixed effects only
form_fx <- formula(paste("y ~ -1 +", paste(coef_names, collapse = "+")))
## Fixed effects and spatial random effect
form_full <- update.formula(
  form_fx, ~. +  
  f(s, model = spde))
> form_full
y ~ b0 + tsi + BIO16 + BIO17 + ghf + f(s, model = spde) - 1 


#----------------------------------------------------------Estimation model ####
mdl_est = inla(
  formula = form_full,
  family = "binomial",
  data = inla.stack.data(stack_est),
  control.predictor = list(A = inla.stack.A(stack_est),
                           compute = F),  
  control.family = list(control.link = list(model = "logit")),
  control.inla = list(int.strategy = "eb",  # empirical bayes (cheap computation for testing)
                      strategy = "gaussian",   # (cheap computation for testing)
                      correct = F,  # correction to improve laplace approximations for binary data
                      correct.factor = 1),  # default = 1, report uses 10 
  control.compute = list(dic = F,
                         waic = F,
                         config = T),
  verbose = T)

> summary(mdl_est) 

Call:
c("inla(formula = form_full, family = \"binomial\", data = inla.stack.data(stack_est), ",  "    verbose = T, control.compute = list(dic = F, waic = F, config = T), ",  "    control.predictor = list(A = inla.stack.A(stack_est), compute = F), ",  "    control.family = list(control.link = list(model = \"logit\")), ",  "    control.inla = list(int.strategy = \"eb\", strategy = \"gaussian\", ",  "        correct = F, correct.factor = 1))")

Time used:
 Pre-processing    Running inla Post-processing           Total 
         0.6667         89.6467          0.4205         90.7340 

Fixed effects:
             mean     sd 0.025quant 0.5quant 0.975quant    mode kld
b0        -4.6861 0.3013    -5.2777  -4.6861    -4.0950 -4.6861   0
tsi 0.4892 0.1048 0.2834 0.4892 0.6949 0.4892 0 BIO16 0.1493 0.0506 0.0500 0.1493 0.2486 0.1493 0 BIO17 0.0627 0.0604 -0.0560 0.0627 0.1812 0.0627 0 ghf 1.4381 0.0398 1.3601 1.4381 1.5161 1.4381 0 Random effects: Name Model s SPDE2 model Model hyperparameters: mean sd 0.025quant 0.5quant 0.975quant mode Range for s 0.2259 0.0182 0.1923 0.2251 0.2639 0.2236 Stdev for s 2.2012 0.1141 1.9881 2.1971 2.4360 2.1875 Expected number of effective parameters(std dev): 428.72(0.00) Number of equivalent replicates : 51.96 Marginal log-Likelihood: -3683.86
> (max(mdl_est[["summary.random"]][["s"]]$mean)) [1] 7.208389 > (mean(mdl_est[["summary.random"]][["s"]]$mean)) [1] -0.3690322 > (min(mdl_est[["summary.random"]][["s"]]$mean)) [1] -4.646531





global_mesh_sphere.png
inlaGG_GLM_verbose_output

mike.bl...@gmail.com

unread,
Jun 26, 2018, 2:51:08 PM6/26/18
to R-inla discussion group
Oops, this is the Illian reference, I forgot to link. 

Haakon Bakka

unread,
Jul 3, 2018, 4:32:01 AM7/3/18
to mike.bl...@gmail.com, R-inla discussion group
Hi Mike,

This problem is not uncommon. In some cases the most useful thing for the spatial random effect to do to model the data is to select a short range and large sigma. The way I interpret this is that the data needs a very local "overdispersion" and that the spatial random effect is the only candidate to model it.

To compensate then, I suggest adding a local dispersion effect, as an iid effect in the linear predictor (f( index, model="iid")), and to consider this as part of the noise model (observation likelihood).

For a code example with poisson likelihood:


Kind regards,
Haakon Bakka




On 26 June 2018 at 19:51, <mike.bl...@gmail.com> wrote:
Oops, this is the Illian reference, I forgot to link. 

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion-group+unsub...@googlegroups.com.
To post to this group, send email to r-inla-discussion-group@googlegroups.com.
Visit this group at https://groups.google.com/group/r-inla-discussion-group.
For more options, visit https://groups.google.com/d/optout.

mike.bl...@gmail.com

unread,
Jul 4, 2018, 9:01:10 AM7/4/18
to R-inla discussion group
Hi Haakon,

Thanks very much for your reply and thanks for all the fantastic resources on your website, they were instrumental to me to learning how to use INLA! 

I have tried adding the iid random effect as per your BT topic, but I cant seem to fix the over-fitting. Basically, I can get the model to flip between either the spde completely taking over and over-fitting or constraining it so much that it is essentially removed (very small posterior sigma_u and s_[i]). 
I have tried adding a sum to zero constraint, because I thought that perhaps the spde was correlated with the intercept. Then I tried an extra constraint to make the spde orthogonal to the covariates, but I still experience the drastic over-fitting.

After struggling with this for a while, I decided to test the model at a much smaller scale --- the African continent. At this scale the model behaves exactly as I would expect; there is a substantial spatial effect (due to preferential sampling and clustering the presence points etc), but it does not take over. 

This has made me come to the realisation that perhaps there are several spatial structures in this global data that are too hard to capture with the single spde. The global data are extremely non-randomly distributed with some very strong clustering in certain areas. Does that sound like a possible explanation to you?

I think for the moment I will proceed with my model at the African continental scale and hopefully revisit the global model in the future with some new ideas!

Thanks again for your time spent reading my post.

Cheers,

Mike


On Tuesday, July 3, 2018 at 9:32:01 AM UTC+1, Haakon Bakka wrote:
Hi Mike,

This problem is not uncommon. In some cases the most useful thing for the spatial random effect to do to model the data is to select a short range and large sigma. The way I interpret this is that the data needs a very local "overdispersion" and that the spatial random effect is the only candidate to model it.

To compensate then, I suggest adding a local dispersion effect, as an iid effect in the linear predictor (f( index, model="iid")), and to consider this as part of the noise model (observation likelihood).

For a code example with poisson likelihood:


Kind regards,
Haakon Bakka



On 26 June 2018 at 19:51, <mike.bl...@gmail.com> wrote:
Oops, this is the Illian reference, I forgot to link. 

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion-group+unsub...@googlegroups.com.
To post to this group, send email to r-inla-disc...@googlegroups.com.

Mark Myer

unread,
Jul 9, 2018, 11:59:55 AM7/9/18
to R-inla discussion group
Mike, 

When you fit the iid random effect, what was the magnitude of that effect vs. the spde? I'm having a very similar problem, and when I fit an iid for each observation the posterior precision is extraordinarily high (mean~3500), which I interpret (possibly incorrectly?) that the SPDE is explaining almost all the variability in my data. Did you see the same issue? 

mike.bl...@gmail.com

unread,
Jul 14, 2018, 12:14:43 PM7/14/18
to R-inla discussion group
Hi Mark,

Yes, the posterior precision of the iid was large, not quite as large as yours, mean ~300-400. For me the spde completely takes over with or without the iid and for what seems like all reasonable prior choices. 

However, I think I may have worked out why it is not working for my data set. In this paper by David Redding https://doi.org/10.1371/journal.pone.0187602 they found that for simulated (species distribution) data with a clumped and restricted distribution, non-spatial inla models had higher predictive accuracy than spatial (spde) inla models. They state that this is due to the models being built on data sampled from only a part of the potential distribution, which can risk overfitting. This seems like the problem I am experiencing, because in areas with presumably suitable conditions but no observations the spatial effect is highly negative, resulting in overfitting.  My data is very clumped and restricted.

So, for my data, I think that the spde is not appropriate. But I may well be wrong! Just my thoughts after quite a lot of testing, reading and head scratching!

Mike

Finn Lindgren

unread,
Jul 14, 2018, 12:29:32 PM7/14/18
to mike.bl...@gmail.com, R-inla discussion group
Hi,
It sounds like you’re saying your data exhibits more clustering than the chosen random field model allows. You can try changing the smoothness parameter to alpha=3/2 (which gives matern smoothness 1/2, I.e. exponential covariance, instead of the default alpha=2, which gives smoothness 1).

You say the spde becomes large&negative in regions with no observations, which can result from smooth extrapolation of spatial trends in the regions that do have observations. This effect is expected to be smaller for a smaller smoothness parameter.


Finn
--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.

Sylvan Benaksas

unread,
Jun 7, 2022, 7:49:43 AM6/7/22
to R-inla discussion group
Hi,

I am having a similar situation where I think the spde random effect is capturing most of the variation. Maybe adding a local dispersion iid effect as haakon suggests, but I am not sure how this would work with predictions, where I have points at unobserved locations.

Cheers,
Sylvan

Thierry Onkelinx

unread,
Jun 7, 2022, 8:16:42 AM6/7/22
to Sylvan Benaksas, R-inla discussion group
Dear Sylvan,

You can assume that the idd models some noise at the locations (the nugget of the semi-variogram). It only affects the point estimate of a specific measurement at a given location. Another observation at the same location is likely to get a different idd effect. Hence for the prediction you don't use the individual estimates of the idd random effect. Instead you need to add the variance of the idd effect.

model for location i: \eta_i = intercept + spde_i + iid_i

predict while ignoring the iid effect \eta_i = intercept + spde_i This prediction has variance \sigma^2_e. You need to add the variance of the iid effect to get the correct posterior distribution.
\eta_i = N(mean = intercept + spde_i, variance = \sigma^2_e + \sigma^2_iid)

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry....@inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////




Op di 7 jun. 2022 om 13:49 schreef Sylvan Benaksas <sylva...@gmail.com>:
Message has been deleted

Abi Riley

unread,
Jul 10, 2024, 12:00:24 PM (11 days ago) Jul 10
to R-inla discussion group
Dear Thierry

I know this post has been dormant for awhile, but I'm having a similar issue. I'm fitting my model to air pollution ground measurement sites and predicting on the grid. 

I get how to add the IID site term, but how do you practically go about adding the additional variance to the posteriors? 

Would this be within the INLA fit and predict together call, added after this, or where you constructed the posterior predictive distributions manually using inla.posterior.sample?

Any help on this would be great!

Best Wishes,
Abi Riley

PhD Student
Department of Epidemiology and Biostatistics
Imperial College London 
Reply all
Reply to author
Forward
0 new messages