Preferential Sampling model results different for inlabru and plain INLA

David Dayi Li

Feb 17, 2022, 7:56:42 PM
to R-inla discussion group
Hi all,

I was running the preferential sampling model illustrated here: https://becarioprecario.bitbucket.io/spde-gitbook/ch-lcox.html#sec:prefsampl

I also ran the inlabru version of the model on the same data (the data are exactly the same as in the example provided in the link). The results produced by inlabru are way off. Below is my code in plain INLA (copied from the link example):

# packages used by the book example (simulation, geometry and model fitting)
library(spatstat)
library(sp)
library(rgeos)
library(INLA)

win <- owin(c(0, 3), c(0, 3))
npix <- 300
spatstat.options(npixel = npix)
beta0 <- 3
exp(beta0) * diff(range(win$x)) * diff(range(win$y))
sigma2x <- 0.2
range <- 1.2

set.seed(1)
lg.s <- rLGCP('matern', beta0, var = sigma2x,
              scale = range / sqrt(8), nu = 1, win = win)

xy <- cbind(lg.s$x, lg.s$y)[, 2:1]

n <- nrow(xy)

Lam <- attr(lg.s, 'Lambda')
rf.s <- log(Lam$v)
summary(as.vector(rf.s))

loc.d <- 3 * cbind(c(0, 1, 1, 0, 0), c(0, 0, 1, 1, 0))
v_mesh <- inla.mesh.2d(loc.domain = loc.d, offset = c(0.3, 1),
                     max.edge = c(0.3, 0.7), cutoff = 0.05)
nv <- v_mesh$n

spv <- SpatialPoints(cbind(lg.s$x, lg.s$y))

dmesh <- inla.mesh.dual(v_mesh) # this function is the book.mesh.dual() function provided in the link for computing the dual mesh

plot(dmesh)

domain.polys <- Polygons(list(Polygon(loc.d)), '0')
domainSP <- SpatialPolygons(list(domain.polys))

plot(domainSP)

w <- sapply(1:length(dmesh), function(i) {
  if (gIntersects(dmesh[i, ], domainSP))
    return(gArea(gIntersection(dmesh[i, ], domainSP)))
  else return(0)
})

y.pp <- rep(0:1, c(nv, nrow(spv@coords)))
e.pp <- c(w, rep(0, nrow(spv@coords)))

imat <- Diagonal(nv, rep(1, nv))

lmat <- inla.spde.make.A(v_mesh, loc = spv)

A.pp <- rbind(imat, lmat)


z <- log(t(Lam$v)[do.call('cbind',
                          nearest.pixel(xy[, 1], xy[, 2], Lam))])

beta0.y <- 10
beta <- -2
prec.y <- 16

set.seed(2)
resp <- beta0.y + (z - beta0) / beta +
  rnorm(length(z), 0, sqrt(1 / prec.y))

summary(resp)

#plain INLA

spde <- inla.spde2.pcmatern(mesh = v_mesh,
                            # PC-prior on range: P(practic.range < 0.05) = 0.01
                            prior.range = c(0.05, 0.01),
                            # PC-prior on sigma: P(sigma > 1) = 0.01
                            prior.sigma = c(1, 0.01))

stk.u <- inla.stack(
  data = list(y = resp),
  A = list(lmat, 1),
  effects = list(i = 1:nv, b0 = rep(1, length(resp))))

u.res <- inla(y ~ 0 + b0 + f(i, model = spde),
              data = inla.stack.data(stk.u),
              control.predictor = list(A = inla.stack.A(stk.u)))

stk2.y <- inla.stack(
  data = list(y = cbind(resp, NA), e = rep(0, n)),
  A = list(lmat, 1),
  effects = list(i = 1:nv, b0.y = rep(1, n)),
  tag = 'resp2')

stk2.pp <- inla.stack(data = list(y = cbind(NA, y.pp), e = e.pp),
                      A = list(A.pp, 1),
                      effects = list(j = 1:nv, b0.pp = rep(1, nv + n)),
                      tag = 'pp2')

j.stk <- inla.stack(stk2.y, stk2.pp)


gaus.prior <- list(prior = 'gaussian', param = c(0, 2))
# Model formula
jform <- y ~ 0 + b0.pp + b0.y + f(i, model = spde) +
  f(j, copy = 'i', fixed = FALSE,
    hyper = list(beta = gaus.prior))
# Fit model
j.res <- inla(jform, family = c('gaussian', 'poisson'),
              data = inla.stack.data(j.stk),
              E = inla.stack.data(j.stk)$e,
              control.predictor = list(A = inla.stack.A(j.stk)))

summary(j.res)

Below are the results produced by plain INLA:

Call:
   c("inla(formula = jform, family = c(\"gaussian\", \"poisson\"), data = inla.stack.data(j.stk), ", " E =
   inla.stack.data(j.stk)$e, control.predictor = list(A = inla.stack.A(j.stk)))" )
Time used:
    Pre = 1.45, Running = 4.06, Post = 0.558, Total = 6.07
Fixed effects:
       mean    sd 0.025quant 0.5quant 0.975quant  mode kld
b0.pp 3.203 0.155      2.912    3.198      3.542 3.194   0
b0.y  9.885 0.120      9.620    9.891     10.114 9.896   0

Random effects:
  Name          Model
    i SPDE2 model
   j Copy

Model hyperparameters:
                                          mean    sd 0.025quant 0.5quant 0.975quant   mode
Precision for the Gaussian observations 12.759 1.358     10.250   12.706     15.577 12.623
Range for i                              2.071 1.356      0.686    1.690      5.652  1.215
Stdev for i                              0.184 0.047      0.112    0.176      0.296  0.162
Beta for j                              -1.113 0.421     -1.941   -1.112     -0.286 -1.109

Expected number of effective parameters(stdev): 23.83(8.09)
Number of equivalent replicates : 38.56

Marginal log-Likelihood:  404.09 

As you can see, plain INLA produced the correct results. I then fitted the model using inlabru; the code is given below:

#inlabru

spv$resp <- resp

gaus.prior <- list(prior = 'gaussian', param = c(0, 2))

cmp <- ~ i(coordinates, model = spde) + j(coordinates, copy = 'i', fixed = F, hyper = list(beta = gaus.prior)) +
  b0.y(1) + b0.pp(1) -1

form_lgcp <- coordinates ~ b0.pp + i

form_col <- resp ~ b0.y + j


lik_lgcp <- like(family = 'cp', data = spv,
                 domain = list(coordinates = v_mesh),
                 formula = form_lgcp)

lik_col <- like(family = 'gaussian', data = spv,
                formula = form_col)

fit <- bru(components = cmp, lik_lgcp, lik_col,
           options = list(control.inla=list(int.strategy = 'ccd'),
                          control.compute=list(config=TRUE),
                          control.results=list(return.marginals.random = TRUE,
                                               return.marginals.predictor = TRUE),
                          control.predictor = list(compute = TRUE)))

summary(fit)

The results from inlabru are the following:
inlabru version: 2.4.0
INLA version: 21.02.23
Components:
  i: Model types main='spde', group='exchangeable', replicate='iid'
  j: Copy of 'i' (types main='spde', group='exchangeable', replicate='iid)
  b0.y: Model types main='linear', group='exchangeable', replicate='iid'
  b0.pp: Model types main='linear', group='exchangeable', replicate='iid'
Likelihoods:
  Family: 'cp'
    Data class: 'SpatialPointsDataFrame'
    Predictor: coordinates ~ b0.pp + i
  Family: 'gaussian'
    Data class: 'SpatialPointsDataFrame'
    Predictor: resp ~ b0.y + j
Time used:
    Pre = 1.65, Running = 10.7, Post = 1, Total = 13.3
Fixed effects:
        mean    sd 0.025quant 0.5quant 0.975quant   mode kld
b0.y  10.333 0.271      9.875   10.304     10.955 10.233   0
b0.pp  0.293 1.365     -2.553    0.316      3.014  0.361   0

Random effects:
  Name          Model
    i SPDE2 model
   j Copy

Model hyperparameters:
                                            mean    sd 0.025quant 0.5quant 0.975quant   mode
Precision for the Gaussian observations[2] 11.48 1.221      9.254    11.42     14.037 11.317
Range for i                                 3.06 0.623      2.073     2.98      4.507  2.801
Stdev for i                                 1.82 0.324      1.294     1.78      2.560  1.697
Beta for j                                 -0.17 0.049     -0.267    -0.17     -0.075 -0.169

Expected number of effective parameters(stdev): 60.52(7.47)
Number of equivalent replicates : 11.62

Deviance Information Criterion (DIC) ...............: 685.09
Deviance Information Criterion (DIC, saturated) ....: NaN
Effective number of parameters .....................: -204.00

Watanabe-Akaike information criterion (WAIC) ...: 1511.27
Effective number of parameters .................: 294.45

Marginal log-Likelihood:  -658.83
Posterior marginals for the linear predictor and
 the fitted values are computed

As you can see, the results from inlabru are way off. I have checked every single prior specification of all the parameters; they are exactly the same as in the plain INLA model. What is going on here with inlabru? Thank you very much!

Best,
David

Finn Lindgren

Feb 17, 2022, 8:36:16 PM
to David Dayi Li, R-inla discussion group
Two things to check first (a quick sketch of both is below):
1. Does inla.rerun() on the plain inla version change the results?
2. Does the GitHub devel version of inlabru give a different result?
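
For reference, a rough R sketch of both checks (assuming the usual inlabru-org/inlabru GitHub repository and the remotes package; adjust if yours differ):

# 1. rerun the plain INLA fit to check that the estimates are stable
j.res2 <- inla.rerun(j.res)
summary(j.res2)

# 2. install the development version of inlabru from GitHub, then rerun bru()
remotes::install_github("inlabru-org/inlabru")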

The two approaches use different LGCP likelihood approximations, but the inlabru version is more accurate (the dual mesh construction gives a zeroth-order approximation of the integral in the likelihood; inlabru uses a first-order approximation), so normally it should be ok.

Finn

David Dayi Li

Feb 17, 2022, 8:56:09 PM
to R-inla discussion group
Hi Finn,

I just ran inla.rerun() on the plain INLA version. There is a slight change, but the results are very much the same.

But when I installed the GitHub devel version of inlabru, I got the following error when running bru() with my previous code:

Error in ...names() : could not find function "...names"

How do I fix this? Thanks!

David

David Dayi Li

Feb 17, 2022, 9:54:33 PM
to R-inla discussion group
Is there some kind of internal reparameterization in inlabru compared to plain INLA? Or some behind-the-scenes operation that differs from plain INLA? It does not make sense that inlabru is supposed to be more accurate than plain INLA but is now the one not giving the correct results. Thanks!

David

Finn Lindgren

Feb 18, 2022, 2:55:45 AM
to David Dayi Li, R-inla discussion group
What R version do you have? I switched from names(list(...)) to ...names(), but perhaps I should have checked whether ...names() was too new to rely on at this point. You can try installing from a specific commit; anything before yesterday won't have ...names() in it.
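
A small sketch of those two checks (the R 4.1.0 cutoff for ...names() is my recollection, so verify it against the R NEWS; the commit reference is a placeholder, not a real hash):

# ...names() only exists in fairly recent R versions (4.1.0 onwards, as far as I recall)
getRversion() >= "4.1.0"

# install inlabru from a specific commit instead of the current devel head
# ("<commit-sha>" is a placeholder to replace with an actual commit hash)
remotes::install_github("inlabru-org/inlabru", ref = "<commit-sha>")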

The lgcp accuracy is in the likelihood implementation, where inlabru uses a more accurate construction for the likelihood. Your inla code uses a different construction. It’s possible that the inlabru version is more sensitive to clustering, which increases the needed variance. But I’ll need to run the code here to check.

Finn

Finn Lindgren

Feb 18, 2022, 5:00:43 AM
to David Dayi Li, R-inla discussion group
The main issue is that you're not specifying the LGCP domain in the inlabru model, so you're estimating a process on the entire mesh, instead of on the correct domain that you use for the inla call.
You should have something like
lik_lgcp <- like(family = 'cp', data = spv,
                 samplers = domainSP,
                 domain = list(coordinates = v_mesh),
                 formula = form_lgcp)

But I think there are some bugs introduced by the past few days' work on environments and other code, which cause an error in the latest devel version. I'll check more later.
Finn

Finn Lindgren

Feb 18, 2022, 5:31:57 AM
to David Dayi Li, R-inla discussion group
You were running with an old INLA version, and some of the options don't exist anymore (control.results).
I also turned off all the other inla options in the call, since you hadn't set any of those in the inla() call.
In particular int.strategy="ccd" should be expected to produce a different result than other int.strategies.
When comparing models and methods, make sure you keep the settings you are _not_ specifically comparing the same, since, in particular for weakly identifiable models, those choices may have a noticeable effect on the results.
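
For example, a minimal sketch that leaves everything except the model specification at its defaults, so the bru() and inla() calls are on an equal footing:

# no int.strategy, config or extra control options set here, matching the
# defaults used by the plain inla() call earlier in the thread
fit <- bru(components = cmp, lik_lgcp, lik_col)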

I haven't been able to compare with the plain inla() results yet, as I don't have the special functions installed (the book.mesh.dual() function; I'm not sure why there is no direct link from the book! Where did you find it?)

Finn

Finn Lindgren

Feb 18, 2022, 6:27:23 AM
to David Dayi Li, R-inla discussion group
I found my local copy of book.mesh.dual().

With the latest INLA and inlabru, the results are more comparable, except for the copy model beta parameter, which gets a positive sign instead of a negative one.
That is strange, but looking at the constructed data, I wonder how identifiable the model really is, and whether the copy feature ends up being treated the same or differently internally; it's meant to be the same.

Finn


Elias T. Krainski

Feb 18, 2022, 8:59:06 AM
to R-inla discussion group
In the book example, 'i' is defined in the linear predictor for the 'mark', while 'j' is in the linear predictor for the point pattern. I usually like to define the field in the predictor of the more informative data and copy it to the other.

If the purpose is to fit the same model as in the book using inlabru, the two formulae just need to be changed accordingly:

form_lgcp <- coordinates ~ b0.pp + j
form_col <- resp ~ b0.y + i

Thus, coding in inlabru is easier, and the method is more accurate, as explained by Finn.
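
Putting this formula swap together with the samplers fix from Finn's earlier message, a full corrected setup might look like the sketch below (my consolidation of the suggestions in this thread, not verbatim from any single message):

cmp <- ~ i(coordinates, model = spde) +
  j(coordinates, copy = 'i', fixed = FALSE, hyper = list(beta = gaus.prior)) +
  b0.y(1) + b0.pp(1) - 1

form_lgcp <- coordinates ~ b0.pp + j  # copied field in the point-pattern predictor
form_col <- resp ~ b0.y + i           # original field in the mark predictor

lik_lgcp <- like(family = 'cp', data = spv,
                 samplers = domainSP,                 # restrict the LGCP to the domain
                 domain = list(coordinates = v_mesh),
                 formula = form_lgcp)

lik_col <- like(family = 'gaussian', data = spv, formula = form_col)

fit <- bru(components = cmp, lik_lgcp, lik_col)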

Elias


David Dayi Li

Feb 18, 2022, 1:48:31 PM
to R-inla discussion group
Thanks so much! I just added samplers = domainSP and switched to int.strategy = 'auto'; now the results are similar.

Best,
David 

David Dayi Li

Feb 18, 2022, 3:48:27 PM
to R-inla discussion group
Also, out of curiosity, what is the LGCP likelihood approximation used by inlabru? I know the dual mesh construction is based on Simpson et al. (2016), but I can't find anything about the likelihood approximation used in inlabru. Are there any sources available for this? Thanks a lot!

David

Finn Lindgren

Feb 20, 2022, 3:20:10 PM
to R-inla discussion group
Hi David,

I had to double check that I remembered correctly what we actually wrote in the "Going off grid" paper
(Simpson et al, 2016), but inlabru actually does implement the approximation as written in equation (3) of
that paper, with (by default) integration points at the vertex nodes and integration weights equal to
the basis function integrals (which happen to be equal to the areas of the dual mesh cells, for the
piecewise linear basis functions on the triangles).

The last term of the equation evaluates the linear predictor at the observed point locations, and does
not aggregate to the mesh nodes. This is the difference from the examples that construct aggregated
counts on the dual mesh cells and then evaluate the linear predictor at the mesh nodes instead
of at the actual point locations.
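
In symbols (my notation, a sketch of how I read it rather than a quote from the paper): up to an additive constant, the LGCP log-likelihood is

  log L(\lambda) = \sum_i \log(\lambda(y_i)) - \int_\Omega \lambda(s) ds

and the eq. (3) / inlabru construction keeps the first sum at the observed locations y_i, only approximating the integral by

  \int_\Omega \lambda(s) ds \approx \sum_j \tilde{w}_j \lambda(s_j),  with  \tilde{w}_j = \int_\Omega \psi_j(s) ds,

where the s_j are the mesh nodes and the \psi_j the piecewise linear basis functions (so \tilde{w}_j is the dual-cell area). The aggregated-count construction additionally replaces the first sum by \sum_j n_j \log(\lambda(s_j)), with n_j the number of observed points assigned to node j.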

The aggregation method was common for gridded modelling, and the mesh&basis function
approach was explicitly intended to avoid the need for such aggregation (hence the title of the paper!).
In cases where the mesh is constructed to include the observed points as nodes, the expression
for the aggregated version is identical to that of eq (3), but such mesh choices can lead to numerical
problems in the case of clustered points, and can also lead to some bias in the random field parameter estimates.

Figure 1 of the paper illustrates the dual mesh that gives the "local area" interpretation of the
basis function integrals, and was not intended to give the impression that the points need to
be aggregated to counts.

So in summary, inlabru implements the Simpson et al 2016 method, and examples that aggregate
points to mesh node counts do not.  (At some point we really need to make a new edition of the
SPDE examples book, with inlabru; that will fix this issue and also remove many pages of user
side code that's no longer needed.)

Finn


David Dayi Li

Feb 20, 2022, 3:53:35 PM
to R-inla discussion group
Hi Finn,

Thanks for the clarification! I bumped into this paper yesterday and found that the author says inlabru has implemented the "Barycentric point spreading approach" since version 2.1.8: https://onlinelibrary.wiley.com/doi/10.1002/sta4.285

The author says this method is different from the one in Simpson et al. If it is indeed different, how would one implement it from scratch? The furthest I have gotten was obtaining a bunch of extra integration points in each mesh triangle using the ipoints function from inlabru. But the author's statement that "all (extra) points within Ω are then mapped to the mesh nodes using the basis functions" got me a bit confused. How do you map these points to the mesh nodes? I just want to do this from scratch so that I can get a deeper understanding of how things work.

Thanks!
David

David Dayi Li

Feb 20, 2022, 4:10:22 PM
to R-inla discussion group
I think I may understand what the author means by mapping. Is it just that the points associated with a mesh node are grouped together and their weights summed up as the weight for that mesh node? If that is the case, isn't the weight simply the area of all the triangles associated with a mesh node? But this does not feel like the correct way to get the weight.

David

Finn Lindgren

Feb 20, 2022, 4:39:25 PM
to David Dayi Li, R-inla discussion group
The first three methods listed in that paper are identical for triangles on the interior of the domain, and are just different ways of computing the same thing; away from the boundary, the
weight for each node is the integral of the basis function, which is equal to the area of the associated dual mesh cell, which is in turn equal to the sum of the adjacent triangle areas divided by 3.
This is just pure linear geometry, and explains essentially all the results that compare the first three methods.
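
As a small illustration of that interior identity, here is a sketch in R (my own code, assuming the usual inla.mesh structure with $loc and $graph$tv; not taken from the inlabru internals):

# weight of a vertex = sum of the areas of its adjacent triangles, divided by 3
tri_area <- function(p1, p2, p3) {
  abs((p2[1] - p1[1]) * (p3[2] - p1[2]) -
      (p3[1] - p1[1]) * (p2[2] - p1[2])) / 2
}
w_vert <- numeric(v_mesh$n)
for (t in seq_len(nrow(v_mesh$graph$tv))) {
  idx <- v_mesh$graph$tv[t, ]
  a <- tri_area(v_mesh$loc[idx[1], 1:2],
                v_mesh$loc[idx[2], 1:2],
                v_mesh$loc[idx[3], 1:2])
  w_vert[idx] <- w_vert[idx] + a / 3
}
# away from the boundary, w_vert matches the dual-mesh cell areas 'w'
# computed with gArea() in the inla() code earlier in the thread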

Near the boundary, the listed methods are just different ways to numerically construct the integrals, which are more or less accurate depending on how the mesh edges relate to the domain boundary.
Instead of explicitly finding the intersection between the domain and the triangles, the method in inlabru places a medium to large number of points on each triangle, and keeps those falling inside the integration domain. This is a technical integration scheme construction, and doesn't change the integration concept.  The fourth method listed in the paper, Voronoi, does change the area weighting in the entire space, so that's a more fundamentally different method, which is also clear from the numerical results.
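
To see just the "keep the points inside the domain and push their weights to the vertices" step in isolation, here is a rough sketch (my own illustration using a plain grid over the domain rather than the per-triangle subdivision inlabru actually uses):

# candidate integration points over the bounding box, restricted to the domain,
# with their weights mapped to the mesh vertices via the basis-function matrix
h <- 0.05
pts <- as.matrix(expand.grid(x = seq(0, 3, by = h), y = seq(0, 3, by = h)))
keep <- !is.na(over(SpatialPoints(pts), domainSP))
pts <- pts[keep, , drop = FALSE]
A_int <- inla.spde.make.A(v_mesh, loc = pts)
w_int <- as.vector(t(A_int) %*% rep(h^2, nrow(pts)))
# w_int approximates the basis-function integrals restricted to the domain,
# i.e. the same quantities as the dual-cell areas 'w' used in the inla() code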

The "barycentric point spreading approach" is _identical_ to the exact integrals on the interior, and a close approximation on the boundary, so all the differences between the "plain dual mesh" and "barycentric" methods should be due to how closely they approximate the exact integrals near the boundary, since I believe his dual mesh code uses exact polygon intersections. That's not currently implemented in the inlabru code, since the numerical integration scheme is easier to implement for all the different meshes and spaces we need to work on (e.g. point processes on cortical surfaces, where there are no "geographical polygon" intersection tools available).

For cases where the triangles are large relative to the domain, one can increase the number of subsampling points for the integration to get closer to the exact integrals.
inlabru does also have an option to use the fine-grained integration scheme directly instead of placing the weights at the vertices, but that's for special cases where that's needed, and in those cases other integration schemes might be needed. There are strong reasons for why having integration points at the vertices (at least) is essential for a stable likelihood approximation. Not having that can lead to a degenerate likelihood. But having integration points both on the vertices and on the triangle insides would be even better. The reason not to do that is that it's usually not worth the additional cost, and hence not high on the TODO list.

But this is essentially unrelated to the issue of whether the \sum_i \log(\lambda(y_i)) term uses the actually observed locations y_i, or instead approximates that with \sum_j n_j \log(\lambda(s_j)), where the s_j are mesh nodes and the n_j are aggregated counts. That's the main difference between the inlabru / Simpson et al. method and the aggregated-count methods.

Finn

Finn Lindgren

Feb 20, 2022, 5:31:36 PM
to David Dayi Li, R-inla discussion group
Following up on this once more, I suspect that the comparison paper didn't take into account the differences in approach for the sum-log-lambda term between the inlabru implementations and the other implementations, so the comparisons confound the integration scheme differences with that difference, which I think explains some of the results.  Making proper comparisons between methods is hard.

Finn

Finn Lindgren

Feb 20, 2022, 5:42:56 PM
to David Dayi Li, R-inla discussion group
The "spread many points" method in inlabru is quite a bit faster now than it was when that paper was written (or rather, a version of the extension mentioned in the paper is in the package, and has since changed some of the slow details), so the drawback mentioned should be much less of an issue now.
To use it, use the option
  bru_int_args = list(nsub2 = ???)
where ??? controls the number of points. Setting nsub2 = 29 gives (29+1)^2 = 900 points per triangle. The default is 9, giving (9+1)^2 = 100 points per triangle.
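
For example (a sketch of where the option goes; the components and likelihoods are as in the earlier messages):

fit <- bru(components = cmp, lik_lgcp, lik_col,
           options = list(bru_int_args = list(nsub2 = 29)))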

Finn

David Dayi Li

Feb 20, 2022, 7:30:48 PM
to R-inla discussion group
Ahh, that makes total sense now. In the paper he did not clarify that the weights of the barycentric spread points are summed up and divided by 3, which got me confused. Thank you very much!

David

Finn Lindgren

Feb 20, 2022, 7:35:39 PM
to David Dayi Li, R-inla discussion group
Since both options are possible, I’d need to look again, but I think it says in the definitions which version was used. When not aggregating the integration weights, having many integration points gets very expensive, so I don’t think that’s what he did; in fact I recall it states somewhere towards the end that the version with 900 points per triangle only gives an additional one-time preprocessing cost.

Finn

Elias T. Krainski

Feb 21, 2022, 1:17:19 AM
to R-inla discussion group
Dear all,

I've enjoyed this discussion and the clarifying details. In summary, 5 out of 6 log-Cox point process examples in the book use the same likelihood approximation as in the "Going off grid ..." paper. We considered rgeos::gIntersection() to compute the intersection between the domain and the polygon set made of the dual mesh. And in the spatio-temporal examples we considered C0 from the temporal mesh.

The example in Section 8.4, Large point process dataset, where we bin the points into the integration ones, was inspired by a practical example where we had millions of points. The binning was considered because the 'classic' inla mode includes the linear predictor in the latent field. So, instead of adding 'n' latent variables plus the integration points, we only have the integration points added. However, we no longer need to do this, because a new strategy was implemented in INLA that avoids including the linear predictor in the latent field: the new "experimental" mode option. Therefore, example 8.4 in the book could be fitted like example 8.3 with inla(..., inla.mode = 'experimental'). I just did that, and it turns out that the 'n' simulated in the book is not big enough to make a big difference in the computation time on my laptop...
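
In code, applying that option to e.g. the joint call from earlier in this thread would just be (a sketch):

j.res.exp <- inla(jform, family = c('gaussian', 'poisson'),
                  data = inla.stack.data(j.stk),
                  E = inla.stack.data(j.stk)$e,
                  control.predictor = list(A = inla.stack.A(j.stk)),
                  inla.mode = 'experimental')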

Elias



Finn Lindgren

Feb 21, 2022, 3:39:57 AM
to Elias T. Krainski, R-inla discussion group
Even without inla.mode = 'experimental', it turns out there is no need to have individual predictor entries for each observed point. They can be combined into a single line in the data frame, with the total count as response value and
eta = sum_i log(lambda(y_i))
This gives exactly the same effect on the posterior, and is constructed by taking the column sums of the A-matrix for the individual points.
This has now (sometime last year) been implemented in inlabru. Previously, we had only looked at the version where the double sum in that term (where the log-lambda part is a weighted sum over mesh node values) is reversed, so that one gets fractional counts at mesh nodes, which can be used with the "xpoisson" likelihood, but the more drastic "all in one" approach is even more efficient, and still works for more general models like the nonlinear predictors inlabru supports.

Finn
