Fitting Spatial GLMs to proportion data using INLA-SPDE approach -Error in building stack object for prediction

511 views
Skip to first unread message

SODE Idelphonse

unread,
Mar 2, 2021, 6:03:52 AM3/2/21
to R-inla discussion group

Hello Dear all,

I am new to INLA. I 'am trying to fit a spatial binomial model to proportion data (y=count of postitive test, Ntrials= number of individuals tested at location s). This is an aggregated data at community level (each village is a point s and I have 31 points for  the 1st region in which I am trying the model for now). I followed the steps presented on "gambia" data in these two practical books:
- Blangiardo and Cameletti (2015) -Spatial and spatio-temporal models with R-INLA (chp.6) and 
- Moraga (2019) -Geospatial Health Data: Modeling and vizualisation with R-INLA and Shiny (chap. 9).

I fited the model in R-INLA with success using SPDE approach (''s.index'' are spatial random effects indexes and "ID'' are specific-community variations indexes) with all covariates scaled.
Now I am trying to make prediction at grid locations to get a finer resolution map of prevalence (proportion of positive) over each suty area (1332 grids of size ~1km² for the current region). As such, I have to build an INLA stack object for making a joint estimation/prediction as defined bellow, but my code doesn't work.

> stack.pred.resp <- inla.stack(data = list(y = NA, Ntrials = NA),
            +         A  = list(A.pred, 1, 1, 1, 1, 1, 1, 1),
            +                                effects = list(c(s.index, list(Intercept=1)),
            +                                         list(ben_ppp = data.sc$ben_ppp),
            +                                         list(bio1_wc30s = data.sc$bio1_wc30s),
            +                                         list(bio12_wc30s = data.sc$bio12_wc30s),
            +                                         list(bio4_wc30s = data.sc$bio4_wc30s),
            +                                         list(mimq_wc30s = data.sc$mimq_wc30s),
            +                                         list(dst_coastlin = data.sc$dst_coastlin),
            +                                         list(ID = data.sc$ID)),
            +                                   tag ="pred.resp")

Error in (function (data, A, effects, tag = "", compress = TRUE, remove.unused = TRUE)  :
            Row count mismatch for A: 1332,31,31,31,31,31,31,31


 The rest of my code for specifying my grids locations and other ''statck object ''' is as follows:

x.lim <- range(coords[,1])     # range of x coordinate of villages
y.lim <- range(coords[,2])     # y range
grid.x <- 37    # row number
grid.y <- 36    # col number
pred.grid <- expand.grid(x = seq(x.lim[1], x.lim[2], length.out = grid.x),
                         y = seq(y.lim[1], y.lim[2], length.out = grid.y))  
pred.grid <- as.matrix(pred.grid)
dim(pred.grid)   # 1332 cells   

A.pred <- inla.spde.make.A(mesh = mesh, loc = pred.grid)

inf.stack.est <- inla.stack(data = list(y = data.sc$N_infec, Ntrials = data.sc$N_tested),   
                              A  = list(A.est, 1, 1, 1, 1, 1, 1, 1),
                              effects = list(c(s.index, list(Intercept =1)),
                                            list(ben_ppp = data.sc$ben_ppp),
                                            list(bio1_wc30s = data.sc$bio1_wc30s),
                                            list(bio12_wc30s = data.sc$bio12_wc30s),
                                            list(bio4_wc30s = data.sc$bio4_wc30s),
                                            list(mimq_wc30s = data.sc$mimq_wc30s),
                                            list(dst_coastlin = data.sc$dst_coastlin),
                                            list(ID = data.sc$ID)),
                             tag ="est")

stack.pred.latent <- inla.stack(data = list(xi = NA),
                                    A = list(A.pred),
                                    effects =list(s.index),
                                    tag ="pred.latent")

stack.pred.rsp <-   inla.stack (... )      # as defined above

join.stack.inf  <- inla.stack(inf.stack.est, stack.pred.latent, stack.pred.resp)
join.data.inf  <- inla.stack.data(join.stack.inf)
index.pred.resp   <- inla.stack.index(join.stack.inf, tag ="pred.resp")$data

join.inla.inf <- inla(formula, data =inla.stack.data(join.stack.inf),
                Ntrials = join.data.inf$Ntrials,
                control.family =list(link ="logit"),
                control.predictor =list(A =inla.stack.A(join.stack.inf, spde=spde),
                                      compute =TRUE, link=1))

# Estimates of prevalence
prev.mean.inf  <- join.inla.inf$summary.fitted.values[index.pred.resp, "mean"]
prev.sd.inf    <- join.inla.inf$summary.fitted.values[index.pred.resp, "sd"]

Questions

1) How may I fix the above error ? May I reduce the number of my grids cells ?

2) Without covariates in prediction stack object, I obtained a prediction of prevalence  that is around 0.49999 over all the study region and the DIC=221.57 (close to 221.02 for model without spatial component).

Is because there is no spatial effect on my data I have such strange outcome ?

3) Is possible to fit a ''bym'' model to data in INLA as alternative using the distance between points for contructing the ''graph'' file ? How to obtain a finer map of prediction of prevalence oer the study region by doing like that ?

Any help will be very apprecieted.

Kind regards.

Finn Lindgren

unread,
Mar 2, 2021, 7:36:22 AM3/2/21
to SODE Idelphonse, R-inla discussion group
Hi,
I think I can answer the stack error (but not the other questions at the moment):

It looks like the covariate information you're providing to the prediction stack only has 31 values, but there are 1332 prediction points.
For spatial prediction that includes the covariate effects, the covariate values need to be provided for all prediction locations; in this case, for
all points on the prediction grid.

Finn

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/5b5360e3-7f1c-49c4-be3b-ac761a768cf2n%40googlegroups.com.


--
Finn Lindgren
email: finn.l...@gmail.com

Prince Michael Amegbor

unread,
Mar 2, 2021, 8:59:07 AM3/2/21
to R-inla discussion group
With regards to your final question, you can do that by converting your mesh to voronoi polygons e.g.

vor_ply <- voronoi.polygons(SpatialPoints(mesh$ pred.grid[, 1:2]))

see an example in Francesco Serafini paper on Multinomial logit model with INLA (attached)

Alternatively, you can also use this function Finn wrote in a previous correspondence on this forum to create your polygon from the mesh

################### mesh to sp function
inla.mesh2sp <- function(mesh) {
  crs <- inla.CRS(inla.CRSargs(mesh$crs))
  isgeocentric <- identical(inla.as.list.CRS(crs)[["proj"]], "geocent")
  if (isgeocentric || (mesh$manifold == "S2")) {
    stop(paste0(
      "'sp' doesn't support storing polygons in geocentric coordinates.\n",
      "Convert to a map projection with inla.spTransform() before
calling inla.mesh2sp()."))
  }
  
  triangles <- SpatialPolygonsDataFrame(
    Sr = SpatialPolygons(lapply(
      1:nrow(mesh$graph$tv),
      function(x) {
        tv <- mesh$graph$tv[x, , drop = TRUE]
        Polygons(list(Polygon(mesh$loc[tv[c(1, 3, 2, 1)],
                                       1:2,
                                       drop = FALSE])),
                 ID = x)
      }
    ),
    proj4string = crs
    ),
    data = as.data.frame(mesh$graph$tv[, c(1, 3, 2), drop = FALSE]),
    match.ID = FALSE
  )
  vertices <- SpatialPoints(mesh$loc[, 1:2, drop = FALSE], proj4string = crs)
  
  list(triangles = triangles, vertices = vertices)
}

###################################################################################
mesh_2_sp = inla.mesh2sp(mesh)

NB: remember to create a spatial index for the polygons then add the index or ID to your point data. example below

mesh_2_sp@data$ID <- 1:nrow( mesh_2_sp)

#####create and add polygon id to point data
proj4string(data.sc or your_point_data)
proj4string(mesh_2_sp) #check if the polygon and point have the same crs projection

ID <- over(data.sc, mesh_2_sp  ) ## Get the space index or ID information
data.sc$Ply_ID <- ID ## Assign the information to the point data

I hope this helps

Prince Michael Amegbor

unread,
Mar 2, 2021, 9:02:22 AM3/2/21
to R-inla discussion group
Sorry, forgot to attach Francesco Serafini paper as promised. Find attached.

Please remember to acknowledge the authors (Serafini and Finn) if you use their approach as advised.

Cheers

On Tuesday, March 2, 2021 at 12:03:52 PM UTC+1 sdidel...@gmail.com wrote:
tutorial-multinomial-inla.pdf

SODE Idelphonse

unread,
Mar 7, 2021, 4:44:18 AM3/7/21
to R-inla discussion group
Hi Prof Finn and Dear INLA users,

After providing the values of covariates for my prediction locations, the code I presented above worked.

However, the posterior mean estimate of the prevalence (proportion of positive tests) computed as the fitted values (see the last line of my code) is still constant (around 0.50) at each prediction grid (see attached).

Even though the spatial effect is not so significant in my model when comparing its DIC with the one of the non-spatatial model, there is a great specific-community variations in each village.

So, why the posterior mean of my predictions could be so strange ?

I look forward to hearing from you.

Kind regards.

--
You received this message because you are subscribed to a topic in the Google Groups "R-inla discussion group" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/r-inla-discussion-group/Ahrs9q0YU6Y/unsubscribe.
To unsubscribe from this group and all its topics, send an email to r-inla-discussion...@googlegroups.com.
Mean prevalence of infection.pdf

Elias T. Krainski

unread,
Mar 7, 2021, 8:10:55 AM3/7/21
to SODE Idelphonse, R-inla discussion group
Dear SODE,
When building the prediction stack you need to consider the same name for the outcome. Thus, it has to be data=list(y=...) and not data=list(xi=...)
best regards,
Elias

You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/CANnc31DpeOLFBYcvXaXVdyeicHgNue22cU1V6k4LrK%3DV65COhw%40mail.gmail.com.

SODE Idelphonse

unread,
Mar 8, 2021, 2:25:10 PM3/8/21
to Elias T. Krainski, R-inla discussion group
Dear Krainski,

I followed your suggestion, but I still get the same results.
Could you please tell me whether my covariates at prediction points should be necessarily without NAs ?

Kind regards
--
SODE Akoeugnigan Idelphonse,
Biostatistician and GIS data Analyst,
Université d'Abomey-Calavi (Bénin)
Tel: (+229) 95200116.




SODE Idelphonse

unread,
Mar 10, 2021, 10:45:20 AM3/10/21
to Elias T. Krainski, R-inla discussion group
Hi Dear all,

I am trying to look for what happend so that my prediction is constant over my study region.

Although I have tried all of suggestions of Prof Finn and Krainski, I still get the same results. 

Does anyone think of any other reason ?

Kind regards.

Helpdesk

unread,
Mar 10, 2021, 11:02:48 AM3/10/21
to SODE Idelphonse, Elias T. Krainski, R-inla discussion group

can you attach updated Rcode and data, so it can be recheck-ed. ?
Håvard Rue
he...@r-inla.org

SODE Idelphonse

unread,
Mar 14, 2021, 5:32:42 PM3/14/21
to Helpdesk, Elias T. Krainski, R-inla discussion group
Hi INLA devellopers,

You have attached my R code and the data that I am using. 

I would like to know what happened so that the predicted prevalence is almost constant over the study area.

Kind regards.
Cleaned_prop_data.csv
Covariates.rar
Script_SPDE_Sode.R

SODE Idelphonse

unread,
Mar 19, 2021, 5:33:25 AM3/19/21
to R-inla discussion group
Hi INLA devellopers,

Has anyone found solution to my issue after checking my code with the attached data ?
I look forward to receiving any suggestion from your side.

Kind regards.

Elias T. Krainski

unread,
Mar 19, 2021, 9:34:00 AM3/19/21
to R-inla discussion group
Dear SODE,

I've checked your code.

One thing I had to fix was the way you scaled the covariates in the prediction scenario. This has to be done with the same mean and sd as for the observation one.

Another is that you forget to add family='binomial' for the prediction fit.

See attached the predicted map and the updated R script (some folder setting changes may not apply to you).

best regards,
Elias


mean-pred.png
Script_SPDE_Sode.R

SODE Idelphonse

unread,
Mar 19, 2021, 2:32:12 PM3/19/21
to Elias T. Krainski, R-inla discussion group
Prof Krainski,

Thank you for taking your time to fix the issue and improve my code. It is very appreciated. 

I also thank all those who had initially proposed some attempts of solution to my problem.

Kind regards.

SODE Idelphonse

unread,
Jun 28, 2022, 5:48:26 AM6/28/22
to R-inla discussion group
Hi Dear all,

Hope this my message finds you well.

I am mapping a prevalence outcome using Bayesian spatial binomial model with INLA and SPDE in R.
I have attached trough this email a sample of data and code last year when I had an issue with the model's prediction.

My previous map was performed with image.plot() function. But I want now to use ggplot2 package to design predicted maps resticted to my study area, similar to the maps in the attachment (Figure 1). I have also tried it with raster package, but the results do not look very good. The maps showed a white band at their middle (see the attached PDF file - Figure 2).

I don't know if anyone can help me.

Thanks in advance.
Figure 1 - Sample maps.JPG
Figure 2_Prevalence_Mapping.pdf
Mapping_trial code.r

Finn Lindgren

unread,
Jun 28, 2022, 11:15:53 AM6/28/22
to R-inla discussion group
Hi,

there is ggplot help functions in the inlabru package; the gg() method
there supports RasterLayer objects (by converting them to
SpatialPixelsDataFrame and plotting with geom_tile()),
so it might be as simple as

library(inlabru) # First install it from cran or github
ggplot() + gg(mean.inf.mask)

The inlabru:::gg.RasterLayer code is really simple, so take a look to
see what it actually does.

Finn
> To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/94ac2966-c5cb-4a96-abb6-6636740072edn%40googlegroups.com.

SODE Idelphonse

unread,
Jun 29, 2022, 8:14:57 AM6/29/22
to R-inla discussion group
Dear Finn, 

Thanks you a lot for your suggestion.
I used the gg() function after converting my prediction into raster and clipping the result to my study region and it looks good (See Figure 1).

However, I want to customize the color palette of my map as the one of the Figure 2. I have tried  the scale_colour_gradientn() function without success.

So, how to change the default color set by the gg() function of Inlabru package ? 
I can't find any example in the package.

Regards
Figure2_north-mean-pred-inf.png
Figure1_inlabru_ggplot_north_mean_inf.png

Finn Lindgren

unread,
Jun 29, 2022, 12:03:26 PM6/29/22
to SODE Idelphonse, R-inla discussion group
I think it's the fill property you need to set the scale for, not colour, so scale_fill_gradientn is probably the one you need. Som of the github.io examples for inlabru do set the colours for similar plots, so look through those.

Finn

SODE Idelphonse

unread,
Jul 4, 2022, 10:54:48 PM7/4/22
to Finn Lindgren, R-inla discussion group
Dear Finn

Thanks a lot for your directives. I will explore later those examples from github.io
For now, I found an alternative solution in ArcGIS software after exporting my prediction into a TIF file.

Regards
--
SODE Akoeugnigan Idelphonse,
Biostatistician | Geospatial Data Scientist,
Cotonou, Republic of Benin
Tel: +229 95200116 /+229 96910543
Reply all
Reply to author
Forward
0 new messages