Re: [r-inla] Discrete or continuous space model?

Elias T Krainski

Jul 28, 2015, 5:00:00 PM

to r-inla-disc...@googlegroups.com

Hi Tim,

Using the continuous field approach would be a good solution, especially if you have several disconnected components in the map. Could you provide the maps? Then I can provide an example with code.

Elias

On 21/07/15 01:36, Tim wrote:

Hi All,

I am using R-INLA (thanks for the fantastic tool!) to build a spatio-temporal model of insecticide use. The response is the proportion of farmland treated with insecticide, summarized at the county scale (2500 counties across the contiguous USA). The fixed effects are cropping system, climate, and landscape characteristics, also summarized at the county scale. There is clear temporal (four annual estimates per county) and spatial structure in simple model residuals that I want to account for in a spatio-temporal model. My question regards which type of modeling approach I should use.

I know that it is typical to treat this type of data as areal data, to describe spatial relationships with graphs, and employ CAR terms to deal with spatial structure. But this data set is unusual in that I don't have data for all counties in the contiguous US. There are a lot of holes and islands in the map that are causing conceptual and computational issues for me. Conceptually, assigning neighbors seems odd when there are gaps between counties, especially when I suspect that the spatial pattern in the data is due to continuous spatial processes related to climate, landscape structure, and crop pest movement (they don't really care what county they are in).

I would prefer to model these data using a geostatistical approach, using county centroids, and the SPDE tools you have developed. I have tried both approaches in R-INLA. They yield similar fixed effect estimates (the ones I am most interested in), but the SPDE approach produces a better fitting model than the CAR approach, eliminating nearly all residual autocorrelation. So, finally, the question. Is it reasonable to model this data using a geostatistical approach given (1) there are lots of holes and islands in the areal data, (2) I suspect the spatial patterning to be due to continuous spatial processes, (3) there are nearly 2500 counties in the analysis, where variation in county size and shape is small compared to the analysis extent, (4) it is not too absurd, in this case, to think of the county as a point given my interest in landscape characteristics (landscapes are really big points) and (5) a geostatistical model does a better job of removing residual autocorrelation?

Sorry for the long question, and for being a statistical ecologist, as opposed to an ecological statistician (took a wrong turn in graduate school). Thanks for any advice you can offer.

Best,

Tim Meehan

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To post to this group, send email to r-inla-disc...@googlegroups.com.
Visit this group at http://groups.google.com/group/r-inla-discussion-group.
For more options, visit https://groups.google.com/d/optout.

jpkr...@gmail.com

Jul 29, 2015, 4:16:05 AM

to R-inla discussion group, eliask...@gmail.com

Hi Tim, Elias,

Purely theoretically, I'm not sure I agree with using the continuous field approach. Why? Because land use / insecticide use is typically not a continuous phenomenon. You could argue that something like population density is continuous, as all points have a population density and it tends to change smoothly from low density to high density as you go towards a city centre, for example. But land use / insecticide use is typically patchwork - the land use of neighboring fields might be completely different in a random way - and I don't think that can be described as a continuous smooth field.

I may be wrong of course, but I've thought a little about these issues as I'm also thinking about land use currently. I'm open to other opinions.

James

Finn Lindgren

Jul 29, 2015, 5:00:37 AM

to jpkr...@gmail.com, Tim Meehan, R-inla discussion group, Elias T. Krainski

On 29 July 2015 at 09:16, <jpkr...@gmail.com> wrote:
> Just purely theoretically I'm not sure I agree with using the continuous
> field approach. Why ? Because land use / insecticide use is typically not a
> continuous phenomenon. You could argue something like population density is
> continuous as all points have a population density and it tends to smoothly
> change from low density to high density as you go towards a city centre for
> example, but land use/insecticide use is typically patchwork - the land use
> of neighboring fields might be completely different in a random way - I
> don't think that can be described as a continuous smooth field.

Hi James, Tim, and all,

some comments on Tim's original question and James' comment:

"Discrete or continuous" is not necessarily an either/or proposition.
The random field part of the model
can for example be needed to model spatially smooth behaviour that is
not due to land use and/or other
spatially discrete phenomena. This type of model is very common, where
land use can enter as a discrete
factor, with a smooth spde process added on top of it, and on top of
that a random effects model with
independent random effects within each patch/subregion to model
discrete effects that are explained neither
by the covariates nor by the smooth field.
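
One way to write down the layered model described above, as a sketch only (the notation is an assumption for illustration and does not come from the thread):

```latex
\eta_i = \beta_0 + \sum_k \beta_k x_{ik} + u(s_i) + v_{r(i)}, \qquad
v_r \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma_v^2)
```

Here $x_{ik}$ are the covariates (with land use entering as a discrete factor), $u(\cdot)$ is the smooth SPDE field evaluated at location $s_i$, and $v_{r(i)}$ is the independent random effect of the patch/subregion $r(i)$ containing observation $i$.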

In Tim's case it also depends on the scale of the observations
compared with the scale of the spatially
discrete variables. If the data is obtained over large enough
counties, he can only hope to model the
spatially smoothed average behaviour, and in such cases a continuous
smooth model is still useful.

In Bayesian hierarchical modelling, it's very common (I hesitate to
say mandatory) to model the underlying phenomenon as a whole. This
also gives an answer to Tim's question about "holes" in the data.
My preferred method is to model the entire domain of interest, so that
the model is well defined even if there is _no_ data. The available
data is then added to the model. With this way of thinking, there is
no issue with "missing data" (except for cases of
not-missing-at-random/preferential sampling); the spatial model graph
simply includes all counties, regardless of which ones have
observations and which do not.
The INLA output will simply generate posterior predictions for all of
the counties.
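
A minimal Python sketch of this idea, not R-INLA code: the graph is built over all counties, and counties without data simply carry a missing response (NA in R, `None` here). The five-county map, the observed values, and the function names are all hypothetical, and plugging the observed values in as if they were the latent field ignores observation noise.

```python
# Toy map: five counties in a row (0-1-2-3-4). County 2 has no data.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
observed = {0: 0.10, 1: 0.20, 3: 0.40, 4: 0.50}  # hypothetical proportions


def icar_structure(neighbours):
    """Structure matrix Q = D - W of a first-order intrinsic CAR model."""
    n = len(neighbours)
    Q = [[0.0] * n for _ in range(n)]
    for i, nbrs in neighbours.items():
        Q[i][i] = float(len(nbrs))  # D: number of neighbours
        for j in nbrs:
            Q[i][j] = -1.0          # -W: minus the adjacency weights
    return Q


def conditional_mean(i, x, Q):
    """GMRF conditional mean E[x_i | x_-i] = -sum_{j != i} Q_ij x_j / Q_ii."""
    total = sum(Q[i][j] * x[j]
                for j in range(len(x))
                if j != i and Q[i][j] != 0.0)
    return -total / Q[i][i]


# The graph covers ALL counties, whether or not they were observed.
Q = icar_structure(neighbours)

# Response over the full domain; None marks "no observation".
y = [observed.get(i) for i in range(len(neighbours))]

print(y)
# Prior-implied value for the unobserved county: the mean of its neighbours.
print(conditional_mean(2, y, Q))
```

The point of the design is that the hole in the data never touches the model definition: county 2 keeps its place in the graph and gets a prediction from its neighbours.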

There is a simple likely reason that Tim seemed to get better results
with an SPDE model: The
commonly discretised SPDE is a second order, CAR(2) model on a similar
graph to the first order,
CAR(1), model commonly used on neighbourhood graphs. If the CAR(1)
model is too spatially irregular
compared with the data, the second order model should do a better job.
It's also entirely justifiable to use
the GMRF for the discretised SPDE basis weights as a _definition_
of a discrete domain graph model;
the neighbour weights in the ordinary CAR(1) model are fairly ad hoc
as it is, so the fact that a CAR(2)
model constructed like this happens to have a continuous well-defined
limit can be seen as a benefit, not
a problem.
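
The CAR(1)/CAR(2) relationship can be illustrated with a small Python sketch. It assumes a regular one-dimensional lattice with unit spacing and a lumped (identity) mass matrix, so that the first-order precision is Q1 = kappa^2 I + G with G the graph Laplacian, and the second-order precision reduces to Q2 = Q1 Q1. The value of kappa^2 and all function names are hypothetical; a real triangulated mesh would use the finite element matrices instead.

```python
def laplacian_1d(n):
    """Graph Laplacian G = D - W for n nodes on a line."""
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                G[i][j] = -1.0
                G[i][i] += 1.0
    return G


def matmul(A, B):
    """Plain dense matrix product, adequate for a toy example."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def bandwidth(Q, tol=1e-12):
    """Largest |i - j| with a non-zero entry: the neighbourhood order."""
    return max(abs(i - j)
               for i, row in enumerate(Q)
               for j, value in enumerate(row)
               if abs(value) > tol)


kappa2 = 0.5  # hypothetical value of kappa^2
n = 6
G = laplacian_1d(n)

# First-order, CAR(1)-type precision: direct neighbours only.
Q1 = [[kappa2 * (i == j) + G[i][j] for j in range(n)] for i in range(n)]

# Second-order, CAR(2)-type precision under the assumptions above.
Q2 = matmul(Q1, Q1)

print(bandwidth(Q1), bandwidth(Q2))
```

Squaring the first-order precision extends the conditional dependence to neighbours of neighbours, which is what gives the second-order model its smoother behaviour.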

When the weights of the CAR(1) model are chosen in a particular way
depending on the angles between
neighbouring centres, that model happens to coincide with a
discretisation of a fractional SPDE
(alpha=1); see Besag (1981), JRSSB 43(3):302-309 and Besag and Mondal
(2005), Biometrika
92(4):909-920 for an analysis of the continuous limit of the lattice
version of this, and Lindgren et al (2011),
JRSSB 73(4):423-498 for the general triangulation graph limits; the
continuous limits of the first-order CAR
models don't have point-wise meaning, so they are only appropriate to
use for spatially averaged data
(and for that they are well-defined).

Finn L

jpkr...@gmail.com

Jul 31, 2015, 5:48:25 AM

to R-inla discussion group, jpkr...@gmail.com, tme...@gmail.com, eliask...@gmail.com, finn.l...@gmail.com

Hi Finn,

Thanks for the detailed reply - although I don't follow everything.

So can I ask about my own work? I have a CAR model of disease incidence: about 18,000 polygons with observed cases and population-based expected cases, for which I have made a BYM model in INLA. I'm now looking to build land use into this model, as I have a land use map of about 20,000 polygons. How would I set about building a joint model in this situation? I did a polygon overlay in QGIS, which took 5 days and resulted in about 70,000 polygons, and I can do some fairly basic stuff with that, but it seems quite crude and I'm sure there is a smarter way to model it.

James

TDM

Jul 31, 2015, 12:16:05 PM

to R-inla discussion group, eliask...@gmail.com

Hi all,

Elias, Finn, James, thank you for helping me better understand how to think about discrete and continuous space in an INLA modeling context. It really helps to see how you think about it.

Elias, thanks for offering to consider how I can use the continuous field approach. I have attached my code and a resulting mesh. The resulting model looks very good. I built the mesh using ideas from your tutorial and from the new INLA book. I hope I employed your suggestions correctly. From what I gathered, the objectives are to have smaller, similar-sized triangles across the inner domain, and larger triangles in the outer domain; with similar, relatively large internal angles; where sample points can land on vertices but this is not required; where it is OK when occasional densely clustered points fall within triangles. Sound reasonable?

Thanks again, all!
Tim
