Can BYM INLA models be used if there are multiple points per grid cell?

Kristin Bondo

Mar 17, 2021, 11:25:41 AM
to R-inla discussion group
Hi,

I am new to INLA and to analyzing spatial data. Are the INLA BYM model and its neighborhood matrix only meant for areal data, where there is one aggregated count per grid cell or bounded area? My data consist of many point records: the numbers of mosquitoes captured in traps set out at different times and dates. I am interested in how specific environmental covariates predict mosquito abundance across a US state, so I don't want to aggregate the counts per county or per grid cell. Could an INLA BYM model be used for these data, or would an INLA-SPDE model be more appropriate?

Thanks so much,

Kristin

Finn Lindgren

Mar 17, 2021, 12:56:18 PM
to Kristin Bondo, R-inla discussion group
Hi,
it sounds like your model can be constructed in a hierarchical way, where the value in a cell of the BYM model controls the expectation for conditionally independent observations at the capture sites.
That would be a valid model for this, similar to a project I'm involved in for counting ticks, where the "background intensity" is modelled by a smooth SPDE model, with a hierarchical random effect structure on top that models differences between capture sites and multiple collections at each site. Whether BYM or SPDE is used mostly affects the type of smoothness you'll have in the estimated model; the SPDE models use piecewise linear interpolation, whereas the BYM approach I described would use piecewise constant interpolation.
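A minimal sketch of the hierarchical BYM setup described above, under several assumptions that are not from the thread: a hypothetical 3 x 3 grid, a Poisson likelihood, a made-up covariate `temp`, and simulated counts without any real spatial signal. The point is only that several trap records can carry the same cell index, so they share one BYM cell effect while remaining conditionally independent observations.

```r
# Hypothetical 3 x 3 grid; cells are numbered 1..9 in the row order of `grid`.
grid <- expand.grid(col = 1:3, row = 1:3)

# Rook adjacency: cells whose centres are exactly one step apart are neighbours.
adj <- 1 * (as.matrix(dist(grid, method = "manhattan")) == 1)

# Simulated trap records: 40 collections spread over 9 cells, so cells repeat.
set.seed(1)
traps <- data.frame(
  cell = sample(1:9, 40, replace = TRUE),  # index of the cell holding the trap
  temp = rnorm(40)                         # made-up covariate
)
traps$count <- rpois(40, exp(0.5 + 0.3 * traps$temp))

# Fit only if R-INLA is installed. The repeated `cell` values are what let
# several observations share one BYM cell effect.
if (requireNamespace("INLA", quietly = TRUE)) {
  library(INLA)
  fit <- inla(
    count ~ 1 + temp + f(cell, model = "bym", graph = adj),
    family = "poisson",
    data = traps
  )
  print(fit$summary.fixed)
}
```

Extra `f(..., model = "iid")` terms for site or collection could be added to the formula in the same way if the data support them.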

Finn


--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/7bdae56a-e92c-4861-b25a-423ed77eff2cn%40googlegroups.com.


--
Finn Lindgren
email: finn.l...@gmail.com

KB

Mar 17, 2021, 3:47:22 PM
to R-inla discussion group
Hi Finn,

Thanks for your response! Is there a way to run the hierarchical BYM models using inlabru? I just learned of inlabru yesterday, so I am currently looking at some of the tutorials. Also, are you aware of any R code using R-INLA or inlabru that generates prediction maps for BYM models with the hierarchical random effect structure?

Thanks,

Kristin

Finn Lindgren

Mar 17, 2021, 4:49:05 PM
to KB, R-inla discussion group
Yes,
all the latent models that can run in INLA can be specified in inlabru. For BYM and iid models, the specification is pretty much the same.

For prediction, it depends on precisely what quantity you want to predict. The relative intensity (or count expectation) surface is straightforward to compute predictions for; just use
predict(fit, data = predictiondata, formula = ~ ...)
where ... is an R expression using the effect names, which are evaluated for the covariate values stored in predictiondata.

For prediction of capture count distributions at new locations, you'd need to add random number generation for the likelihood and/or random effect models to the prediction formula.
The model parameters are available in the prediction formula; e.g. for Gaussian likelihood models, the formula can be
  formula = ~ rnorm(length(linearpredictor), mean = linearpredictor, sd = Precision_for_the_Gaussian_observations^-0.5)
(the hyperparameter names from the inla model summary have all whitespace and special characters converted to _ )
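To make the mechanics concrete, here is a minimal base-R sketch of what such a prediction formula does for each posterior sample, assuming a Gaussian likelihood. The objects `eta` and `prec` are hypothetical stand-ins for posterior draws of the linear predictor and the observation precision; they are simulated here, not taken from a fitted model.

```r
set.seed(42)
n_samples <- 1000  # number of posterior draws
n_sites <- 3       # number of prediction locations

# Stand-in posterior draws (hypothetical values, not from a fitted model).
# Column j of `eta` holds the draws of the linear predictor at site j.
eta <- matrix(
  rnorm(n_samples * n_sites, mean = rep(c(0, 1, 2), each = n_samples), sd = 0.1),
  nrow = n_samples
)
prec <- rgamma(n_samples, shape = 50, rate = 50 / 4)  # precision centred near 4

# One new observation per posterior draw and site: add likelihood noise whose
# standard deviation is precision^-0.5, using the precision from the same draw.
y_new <- eta + rnorm(n_samples * n_sites, sd = prec^-0.5)

# Summarise the predictive distribution per site.
pred <- t(apply(y_new, 2, quantile, probs = c(0.025, 0.5, 0.975)))
print(pred)
```

The predictive intervals in `pred` are wider than intervals for `eta` alone because they include the observation noise. For count data, the same pattern would swap the `rnorm` call for something like `rpois(length(eta), exp(eta))`, assuming a Poisson likelihood with a log link.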

For your specific case, what you need depends on the precise structure of your model, and precisely what quantities you want to generate posterior samples for.

The newest inlabru version, 2.3.0, is making its way through the CRAN system, which will make the current tutorials on inlabru.org obsolete.
If you aren't already, you should instead look at https://inlabru-org.github.io/inlabru/articles/

Finn


KB

Mar 25, 2021, 4:20:24 PM
to R-inla discussion group
Hi Finn,

Thanks so much for your help. I ran some models on the mosquito data using R-INLA with a hierarchical random effect structure, including one effect for year (model="iid") and one for grid id (model="bym"). Few points had multiple traps placed at the same location during the same year, so we excluded those points rather than creating an additional random effect for trap location.

I am now in the process of evaluating the fit of the top models. I followed the suggestions in the book "Spatial and Spatio-temporal Bayesian models with R-INLA" and used methods based on the predictive distribution, including cross-validation and posterior predictive checks. Our dataset consists of 193,651 records, and the sum of the CPO failure values was 306. Considering how large the dataset is, should we still be concerned about these failures? Would you recommend other cross-validation methods, such as leaving out 20% of the data and then using that to test the predictive ability of the model?
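For scale, a small base-R sketch of the two ideas in this paragraph. It assumes that the 306 refers to the sum of the CPO failure flags (`fit$cpo$failure` in R-INLA), and it uses a hypothetical 1,000-row response vector `y` to illustrate a 20% hold-out split; none of the simulated values come from the mosquito data.

```r
# Share of records with a flagged CPO computation.
n_records <- 193651
n_failures <- 306  # assumed to be sum(fit$cpo$failure), as reported in the post
failure_share <- n_failures / n_records
print(round(100 * failure_share, 3))  # percent of records, about 0.158

# Hold-out sketch on a hypothetical, much smaller dataset.
set.seed(1)
n <- 1000
y <- rpois(n, 2)                              # made-up counts
test_idx <- sample(n, size = round(0.2 * n))  # 20% of rows, without replacement

# Copy of the response with the held-out rows blanked. In R-INLA, rows with an
# NA response contribute nothing to the likelihood, so a model fitted to
# `y_fit` can afterwards be compared against the true values y[test_idx].
y_fit <- y
y_fit[test_idx] <- NA
```

With a spatial model, a random 20% split leaves neighbours of each held-out trap in the training set; holding out whole cells or years instead is a stricter variant of the same idea.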

Also, there were many zero values in the dataset. When I plotted a histogram of the posterior predictive p-values, it was not uniform: there were many low and high values and few in between, indicating that the model does not fit the data well. Do you have any suggestions in this case? I tried running it with negative binomial, Poisson, and zero-inflated Poisson likelihoods. However, I am unsure what the difference is between zeroinflatedpoisson0, zeroinflatedpoisson1, and zeroinflatedpoisson3.

I am also considering that the model may not fit the data well because we might be missing an important covariate, such as the weather conditions the night of trapping, which could also explain why some traps may have had 0 values, especially if there was high wind, rain, and colder temperatures the night the trap was set out. 

It was also suggested that we could use the median of the abundance estimates instead of the mean to assess model validity. However, I am unsure of the code to do this in INLA. Would this involve transforming the posterior means into medians after they are calculated in R-INLA?

Sincerely,

Kristin

Helpdesk

Mar 27, 2021, 8:58:43 AM
to Kristin Bondo, R-inla discussion group

Sorry for the late reply.

The BYM model is usually for aggregated data, or to represent regional effects
that are constant within each region.

Maybe you're after an LGCP (log-Gaussian Cox process) in which your
covariates influence the marks.

A good source here would be the SPDE book (see r-inla.org), which should
have a discussion of this.

You might also check out papers by Janine Illian and friends, who should
have written about this in a different context.

Best
H

--
Håvard Rue
he...@r-inla.org

KB

Apr 17, 2021, 12:15:33 PM
to R-inla discussion group
Hi Håvard,

Thanks so much! That was very helpful.

- Kristin
