Examples of LGCP for disease surveillance in R-INLA

63 views
Skip to first unread message

Julianne Meisner

unread,
May 10, 2022, 6:02:51 PM5/10/22
to R-inla discussion group
Hello, 

Can anyone point me to a good example (ideally with code) implementing an LGCP model in R-INLA for disease surveillance? I'm planning to use population rather than polygon area (https://becarioprecario.bitbucket.io/spde-gitbook/ch-lcox.html#introduction-1) for deriving expected values, but an example building in reporting effort would be fantastic. 

Shearer et al. used reports of other diseases as integration points for yellow fever mapping (in a BRT implementation of a point process model), which is compelling, but it'd be great to see something similar (or a better alternative) in R-INLA: https://www.thelancet.com/cms/10.1016/S2214-109X(18)30024-X/attachment/dad09729-13cf-447f-b3bc-2558168d46d9/mmc1.pdf

Many thanks,
Julianne

Finn Lindgren

unread,
May 10, 2022, 6:38:28 PM5/10/22
to Julianne Meisner, R-inla discussion group
Hi,

One starting point is to look at the ecology examples for LGCP models at https://inlabru-org.github.io/inlabru/articles/
These are based on directly working with the point process aspect of the data and observation models (area surveys, transect surveys, etc). Any aggregation to counts is done internally as part of the likelihood construction.

Approaches based on explicitly aggregating the points into counts on polygons to approximate the point process with e.g. Poisson count regression are a bit different. Which approach is most appropriate depends on the details of your data gathering system, and what aspects you wish to include in your model.

In both cases, the model estimation is tied to the data gathering mechanism. The output, e.g. predictions of total counts, can include other aspects as well, that are not necessarily the same as the input data (e.g. whether population is used to inform the “exposure” in the model estimation and/or posterior predictions, depending on precisely what quantities are needed).

Finn

On 10 May 2022, at 23:02, Julianne Meisner <meisner....@gmail.com> wrote:

Hello, 
--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/2e2ca71c-9cd9-4140-9183-71fc9c846765n%40googlegroups.com.

Julianne Meisner

unread,
May 10, 2022, 7:44:29 PM5/10/22
to Finn Lindgren, R-inla discussion group
Thanks very much, Finn, I will check these out. 

I was also interested by Simpson et al. (2016)'s example 7.3 (figure 7.3, https://academic.oup.com/biomet/article-abstract/103/1/49/2389990?redirectedFrom=fulltext), in which the mesh was coraser in areas with lower sampling effort. Do you have examples of how to implement this?

Many thanks,
Julianne
--
Julianne Meisner, BVM&S(Vet) MS PhD
Clinical Assistant Professor | DEOHS, University of Washington SPH

Julianne Meisner

unread,
May 10, 2022, 7:45:00 PM5/10/22
to Finn Lindgren, R-inla discussion group
My apologies, that should have said Figure 4(b)

Finn Lindgren

unread,
May 11, 2022, 6:20:50 AM5/11/22
to Julianne Meisner, R-inla discussion group
Hi,

the "lower sampling effort" aspect of Figure 4(b) is mostly relevant
when there is complete absence of sampling (as in the example
constructed in the paper). When the sampling effort is just "lower"
(and not "zero"), the mesh resolution needed is more related to the
spatial variability, or lack thereof, of the intensity field, so a
uniform mesh is usually sensible.
But for the case of a polygon defining the sampling domain (with or
without holes), there are two places to use it:
1. As "samplers" argument to lgcp() or ipoints(), to define the region
of the point observations, and
2. As the first "boundary" polygon when defining the mesh. If you have
the sampling domain as a SpatialPolygons object "my_sampler", then
inla.mesh.2d(..., boundary = list(inla.sp2segment(my_sampler)),
max.edge = c(inner_edges, outer_edges))
should allow you to make smaller triangles inside than outside.

Finn
--
Finn Lindgren
email: finn.l...@gmail.com

Finn Lindgren

unread,
May 11, 2022, 6:39:16 AM5/11/22
to Julianne Meisner, R-inla discussion group
And to further clarify, for point process likelihoods, the sampling
domain should _always_ be used when defining the likelihood; the mesh
will usually cover a larger region (e.g to move spde boundary effects
away from the region of interest), and a "point process observation"
is the combination of "where did we look for points" and "where were
the points we found"; both are needed to define the likelihood.

(Also note that the "dual mesh" graph in the Going off grid paper was
meant only to illustrate a geometric fact about integrating basis
functions. Later examples unfortunately often conflate this with
"aggregating points into counts on the dual mesh cells", which is
_not_ the best way to implement the point process likelihood; inlabru
uses a different approach which is closer to the continuous spatial
model.)

Finn

Julianne Meisner

unread,
May 11, 2022, 11:36:59 AM5/11/22
to Finn Lindgren, R-inla discussion group
Got it, thank you. In my use case (disease surveillance) I have a proxy for the sampler, which is reporting of other diseases, but not the sampler itself. The sampling domain is effectively unbounded as this is passive surveillance (anywhere there is a population there could be a report). I’m still wrapping my head around exactly how the sampler is implemented in inlabru— would using a proxy for the sampler, when the exact sampler is unknown, introduce significant bias? 

Many thanks,
Julianne

Finn Lindgren

unread,
May 11, 2022, 11:44:56 AM5/11/22
to Julianne Meisner, R-inla discussion group
I'm sure I saw someone else discuss a similar type of model recently
(but it may have been a private communication and not on this list)...

I'd say that the "proxy sampler" aspect is likely best handled by
using it to define a detection model; it's similar but not quite the
same as in the animal transect setting, where one models the detection
probability conditionally on there being an animal, which turns the
combined intensity to
log(observation intensity) = log(actual intensity) +
log(P(detection|presence))
and the detection probability is allowed to depend on location.
In your case, the "log(detection probability)" term would be the one
involving the sampling proxy in some way.

Whether you get a biased estimate of the "actual intensity" depends on
how well your sampling proxy model corresponds to reality.

Finn
> To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/CAJ9V2oemR_RtYvSf8d5f-JCmD25nLtZ_ZsX9sXZwc4G75u%2Bm5Q%40mail.gmail.com.

Julianne Meisner

unread,
May 11, 2022, 11:47:43 AM5/11/22
to Finn Lindgren, R-inla discussion group
This is great, thank you Finn. Do you happen to have any examples you can point me to that have implemented this model?

Many thanks,
Julianne

Finn Lindgren

unread,
May 11, 2022, 11:56:24 AM5/11/22
to Julianne Meisner, R-inla discussion group
inlabru implements the line transect version, see example in
https://inlabru-org.github.io/inlabru/articles/web/2d_lgcp_distancesampling.html
The principle would be the same, though you'd just need an ordinary
spatial integration scheme and not a space-times-distance scheme,
and how the sampling proxy should enter might need some thinking. If
it's just a "proportional to population" thing, then
+ log(population) would be appropriate, but more complex depending on
the precise form of the model and what you assume known about the
proportionality constant, etc.

If you define a function f_log_population() that evaluates the
population information at given locations (see the 2d covars example
on the inlabru site) then
the detection part of the model could be something like this:

components = ~ ... + beta(1) +
log_pop(f_log_population(coordinates(.data.)), model = "offset")
formula = coordinates ~ ... + beta * log_pop

(using "beta" here would likely make the model non-identifiable, so
just leave that out to just use the log-population as a fixed offset)

Finn
Reply all
Reply to author
Forward
0 new messages