On 29 July 2015 at 09:16, <jpkr...@gmail.com> wrote:
> Just purely theoretically I'm not sure I agree with using the continuous
> field approach. Why ? Because land use / insecticide use is typically not a
> continuous phenomenon. You could argue something like population density is
> continuous as all points have a population density and it tends to smoothly
> change from low density to high density as you go towards a city centre for
> example, but land use/insecticide use is typically patchwork - the land use
> of neighboring fields might be completely different in a random way - I
> don't think that can be described as a continuous smooth field.
Hi James, Tim, and all,
some comments on Tim's original question and James' comment:
"Discrete or continuous" is not necessarily an either/or proposition.
The random field part of the model can, for example, be needed to model
spatially smooth behaviour that is not due to land use and/or other
spatially discrete phenomena. This type of model is very common: land
use enters as a discrete factor, with a smooth SPDE process added on
top of it, and on top of that a random effects model with independent
random effects within each patch/subregion to model discrete effects
that are explained neither by the covariates nor by the smooth field.
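As a rough illustration of the layered structure described above, the linear predictor could be written as below. The notation is an assumption made for this sketch and is not taken from the thread: c(s) is the land-use class at location s, u is the smooth field, and r(s) is the patch/subregion containing s.

```latex
\eta(s) = \beta_0 + \beta_{c(s)} + u(s) + v_{r(s)},
\qquad
u(\cdot) \sim \text{smooth SPDE (Mat\'ern) field},
\qquad
v_r \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma_v^2)
```

Here the discrete factor carries the patchwork land-use effect, u carries whatever smooth spatial variation remains, and the independent v_r absorb patch-level effects that neither of the other two terms explains.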
In Tim's case it also depends on the scale of the observations compared
with the scale of the spatially discrete variables. If the data is
obtained over large enough counties, he can only hope to model the
spatially smoothed average behaviour, and in such cases a continuous
smooth model is still useful.
In Bayesian hierarchical modelling, it's very common (I hesitate to
say mandatory) to model the underlying phenomenon as a whole. This
also gives an answer to Tim's questions about "holes" in the data.
My preferred method is to model the entire domain of interest, so that
the model is well defined even if there is _no_ data. The available
data is then added to the model. With this way of thinking, there is
no issue with "missing data" (except for cases of
not-missing-at-random/preferential sampling); the spatial model graph
simply includes all counties, regardless of which ones have
observations and which do not.
The INLA output will simply generate posterior predictions for all of
the counties.
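The "model the whole domain, then add the data" idea can be sketched with a purely Gaussian toy model in plain NumPy. This is not INLA and not Tim's model: the 5x5 lattice of "counties", the values of `kappa2` and `tau_obs`, the choice of missing counties, and the simulated observations are all assumptions made up for illustration.

```python
import numpy as np

def path_laplacian(n):
    """Graph Laplacian of a path with n nodes."""
    L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = 1.0
    return L

def lattice_laplacian(nrow, ncol):
    """Graph Laplacian of an nrow x ncol lattice, node index = row * ncol + col."""
    return (np.kron(path_laplacian(nrow), np.eye(ncol))
            + np.kron(np.eye(nrow), path_laplacian(ncol)))

# Prior: a proper first-order CAR model defined on ALL counties.
nrow, ncol = 5, 5
n = nrow * ncol
kappa2 = 0.1      # hypothetical value, makes the prior precision proper
tau_obs = 100.0   # hypothetical observation precision
Q_prior = kappa2 * np.eye(n) + lattice_laplacian(nrow, ncol)

# Data: only some counties are observed; the rest are "holes".
missing = np.array([6, 7, 12, 17, 18])
observed = np.setdiff1d(np.arange(n), missing)
rng = np.random.default_rng(0)
y = rng.normal(size=observed.size)  # made-up observations

# Observation matrix picks out the observed counties.
A = np.zeros((observed.size, n))
A[np.arange(observed.size), observed] = 1.0

# Gaussian conditioning: the posterior is defined for every county.
Q_post = Q_prior + tau_obs * A.T @ A
post_mean = np.linalg.solve(Q_post, tau_obs * A.T @ y)
post_var = np.diag(np.linalg.inv(Q_post))

print("posterior mean at unobserved counties:", post_mean[missing])
print("posterior variance at unobserved counties:", post_var[missing])
```

The unobserved counties need no special handling: they are part of the graph, they borrow strength from their neighbours, and they simply end up with larger posterior variance than the observed ones.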
There is a simple likely reason that Tim seemed to get better results
with an SPDE model: the commonly used discretised SPDE is a second-order,
CAR(2), model on a similar graph to the first-order, CAR(1), model
commonly used on neighbourhood graphs. If the CAR(1) model is too
spatially irregular compared with the data, the second-order model
should do a better job. It's also entirely justifiable to use the GMRF
for the discretised SPDE basis weights as a _definition_ of a discrete
domain graph model; the neighbour weights in the ordinary CAR(1) model
are fairly ad hoc as it is, so the fact that a CAR(2) model constructed
like this happens to have a continuous well-defined limit can be seen
as a benefit, not a problem.
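The difference between the two orders can be sketched on a regular lattice. The construction below is a simplification and rests on assumptions: the lattice size and `kappa2` are arbitrary, and the second-order precision is taken as the square of the first-order one, which corresponds to a unit lumped mass matrix rather than a real triangulated mesh.

```python
import numpy as np

def path_laplacian(n):
    """Graph Laplacian of a path with n nodes."""
    L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = 1.0
    return L

def lattice_laplacian(nrow, ncol):
    """Graph Laplacian of an nrow x ncol lattice, node index = row * ncol + col."""
    return (np.kron(path_laplacian(nrow), np.eye(ncol))
            + np.kron(np.eye(nrow), path_laplacian(ncol)))

def roughness(Q, G):
    """E[sum over edges (x_i - x_j)^2] / E[sum_i x_i^2] for x ~ N(0, Q^{-1})."""
    S = np.linalg.inv(Q)
    return np.trace(G @ S) / np.trace(S)

nrow = ncol = 6
kappa2 = 0.5  # hypothetical value
G = lattice_laplacian(nrow, ncol)
K = kappa2 * np.eye(nrow * ncol) + G

Q_car1 = K       # first-order model: direct neighbours only
Q_car2 = K @ K   # second-order model, assuming a unit lumped mass matrix

i = 2 * ncol + 2  # interior node at (row 2, col 2)
j = i + 2         # node two steps to the right

r1 = roughness(Q_car1, G)
r2 = roughness(Q_car2, G)

print("CAR(1) entry for a neighbour-of-neighbour:", Q_car1[i, j])
print("CAR(2) entry for a neighbour-of-neighbour:", Q_car2[i, j])
print("relative roughness, CAR(1):", r1)
print("relative roughness, CAR(2):", r2)
```

The second-order precision couples each node to its neighbours' neighbours, and its realisations have smaller squared neighbour differences relative to their overall size, which is the sense in which it is spatially more regular.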
When the weights of the CAR(1) model are chosen in a particular way
depending on the angles between neighbouring centres, that model
happens to coincide with a discretisation of a fractional SPDE
(alpha=1); see Besag (1981), JRSSB 43(3):302-309 and Besag and Mondal
(2005), Biometrika 92(4):909-920 for an analysis of the continuous
limit of the lattice version of this, and Lindgren et al (2011), JRSSB
73(4):423-498 for the general triangulation graph limits. The
continuous limits of the first-order CAR models don't have point-wise
meaning, so they are only appropriate to use for spatially averaged
data (and for that they are well-defined).
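One standard way in which angle-dependent neighbour weights arise is the cotangent formula for piecewise-linear finite elements on a triangulation; whether this is exactly the weighting meant above is an assumption of this sketch. The helper `stiffness_matrix` and the small 3x3 test mesh are made up for illustration. On a regular lattice split into right triangles, the weights reduce to the familiar lattice CAR(1) structure in the interior.

```python
import numpy as np

def stiffness_matrix(points, triangles):
    """Piecewise-linear FEM stiffness matrix built with the cotangent formula."""
    n = len(points)
    G = np.zeros((n, n))
    for tri in triangles:
        for k in range(3):
            # Edge (i, j) lies opposite vertex o in this triangle.
            o, i, j = tri[k], tri[(k + 1) % 3], tri[(k + 2) % 3]
            a = points[i] - points[o]
            b = points[j] - points[o]
            cross = a[0] * b[1] - a[1] * b[0]
            cot = np.dot(a, b) / abs(cross)  # cotangent of the angle at o
            G[i, j] -= 0.5 * cot
            G[j, i] -= 0.5 * cot
            G[i, i] += 0.5 * cot
            G[j, j] += 0.5 * cot
    return G

# 3x3 grid of unit-spaced points, node index = row * 3 + col.
points = np.array([[c, r] for r in range(3) for c in range(3)], dtype=float)

# Split every unit square along the same diagonal into two right triangles.
triangles = []
for r in range(2):
    for c in range(2):
        v00 = r * 3 + c
        v01, v10, v11 = v00 + 1, v00 + 3, v00 + 4
        triangles.append((v00, v01, v11))
        triangles.append((v00, v11, v10))

G = stiffness_matrix(points, triangles)

centre = 4
print("centre row of G:", G[centre])
```

For the centre node, the four axis-aligned neighbours get weight -1 and the diagonal edges get weight 0, because each diagonal lies opposite a right angle. In the notation of Lindgren et al (2011), the alpha=1 precision is then kappa^2 C + G, with C the mass matrix of the mesh.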
Finn L