marked LGCP model and area level model


Soraia Pereira

unread,
Dec 12, 2016, 6:02:40 PM12/12/16
to R-inla discussion group

Hi,


I'm using a marked LGCP model with SPDE effects, and I would like to compare the results for a specific geographical area level with the results obtained from an area-level model. The area-level model uses the same distribution as the LGCP model, but with Besag effects. When I compare the results in terms of coefficients of variation, relative bias and RRMSE, the area-level model produces better results, contrary to what I expected. I have already tried to improve the LGCP model, but the results remain worse than those of the area-level model. Is there any reason for this?


Thank you very much.



Haakon Bakka

unread,
Dec 15, 2016, 2:30:08 PM12/15/16
to Soraia Pereira, R-inla discussion group
Hi Soraia,

The answer to this question would depend very much on the details. (You may want to share the data and the code with the discussion group, or only me.)

I would also expect the LGCP SPDE model to perform better than the Besag model.

What I would need to check first is 
1. The result$logfile for both the models (please save as .txt files and attach to an email)
2. summary(result) for both models
3. Can you plot the mesh, with axes?

One thing I like to try is to fix the range parameter in the LGCP model, as my experience is that this parameter can be unstable (leading to over- or under-fitting). So look at your spatial scale, plot the data, think about the longest range at which interaction occurs, and fix the spatial range to that value.
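In code, fixing the range might look like this (a hedged sketch, not from the original thread: `mesh` is your existing mesh object, and the value of `range0` is a placeholder you would choose from your own data; in `inla.spde2.pcmatern`, setting the second element of `prior.range` to NA fixes the range at the first element):

```r
## Sketch: fix the spatial range of the SPDE model at a chosen value.
library(INLA)

range0 <- 100  # placeholder: the longest range at which interaction occurs
spde.fixed <- inla.spde2.pcmatern(mesh,
                                  prior.range = c(range0, NA),  # NA fixes the range at range0
                                  prior.sigma = c(1, 0.5))      # P(sigma > 1) = 0.5
```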

Kind regards,
Haakon Bakka






Soraia Pereira

unread,
Dec 16, 2016, 6:22:55 AM12/16/16
to R-inla discussion group, soraia....@gmail.com

Hi Haakon,


Thank you for your kind answer. 

I will send the logfiles by email.

Please find attached the mesh used in the marked LGCP model.


### The marked LGCP model used:


jform68 <- y ~ 0 + b0.pp + b0.y + offset(log(dens.pp)) +
  f(i, model = spde) + f(j, copy = 'i', fixed = FALSE) + f(k, model = spde)

j.res68 <- inla(jform68, family = c('poisson', 'poisson'),
                data = inla.stack.data(j.stk2), E = inla.stack.data(j.stk2)$e,
                control.predictor = list(A = inla.stack.A(j.stk2), compute = TRUE, link = 1),
                control.compute = list(config = TRUE, dic = TRUE, cpo = TRUE, waic = TRUE),
                quantiles = c(0.025, 0.5, 0.975),
                control.results = list(return.marginals.random = F, return.marginals.predictor = F),
                control.inla = list(int.strategy = "eb"))



### The summary for marked LGCP model:


> summary(j.res68)

Call:
c("inla(formula = jform68, family = c(\"poisson\", \"poisson\"), data = inla.stack.data(j.stk2), ",  "    quantiles = c(0.025, 0.5, 0.975), E = inla.stack.data(j.stk2)$e, ",  "    control.compute = list(config = TRUE, dic = TRUE, cpo = TRUE, ",  "        waic = TRUE), control.predictor = list(A = inla.stack.A(j.stk2), ",  "        compute = TRUE, link = 1), control.inla = list(int.strategy = \"eb\"), ",  "    control.results = list(return.marginals.random = F, return.marginals.predictor = F))" )

Time used:
 Pre-processing    Running inla Post-processing           Total 
         3.6422       3093.8768          6.6850       3104.2040 

Fixed effects:
         mean     sd 0.025quant 0.5quant 0.975quant    mode kld
b0.pp -8.4515 0.0709    -8.5910  -8.4514    -8.3126 -8.4512   0
b0.y  -1.9791 0.0756    -2.1278  -1.9791    -1.8309 -1.9790   0

Random effects:
Name  Model
 i   SPDE2 model 
 k   SPDE2 model 
 j   Copy 

Model hyperparameters:
                mean     sd 0.025quant 0.5quant 0.975quant    mode
Theta1 for i -1.4276 0.0756    -1.6008   -1.416    -1.3116 -1.3812
Theta2 for i -0.7983 0.0575    -0.8874   -0.807    -0.6665 -0.8332
Theta1 for k  2.7517 0.3195     2.1228    2.753     3.3774  2.7558
Theta2 for k -2.9486 0.3377    -3.6046   -2.952    -2.2775 -2.9625
Beta for j    0.0032 0.0149    -0.0228    0.002     0.0351 -0.0014

Expected number of effective parameters(std dev): 1947.26(0.00)
Number of equivalent replicates : 16.69 

Deviance Information Criterion (DIC) ...: 48542.68
Effective number of parameters .........: 1907.54

Watanabe-Akaike information criterion (WAIC) ...: 68050.93
Effective number of parameters .................: 15486.26

Marginal log-Likelihood:  -62685.19 
CPO and PIT are computed
Posterior marginals for linear predictor and fitted values computed



### The area level model used:


formula.ST2_poisson <- obs ~ f(ID.area, model = "besag", graph = portugal.adj)

model.inla.ST2_poisson <- inla(formula.ST2_poisson, family = "poisson",
                               data = data_paper2, offset = log(n),
                               control.predictor = list(compute = TRUE),
                               control.compute = list(config = TRUE, waic = TRUE, dic = TRUE, cpo = TRUE))



### the summary for the area level model:


> summary(model.inla.ST2_poisson)

Call:
c("inla(formula = formula.ST2_poisson, family = \"poisson\", data = data_paper2, ",  "    offset = log(n), control.compute = list(config = TRUE, waic = TRUE, ",  "        dic = TRUE, cpo = TRUE), control.predictor = list(compute = TRUE))" )

Time used:
 Pre-processing    Running inla Post-processing           Total 
         3.0668          0.3126          0.2059          3.5853 

Fixed effects:
               mean     sd 0.025quant 0.5quant 0.975quant    mode kld
(Intercept) -2.8027 0.0297    -2.8621  -2.8024    -2.7454 -2.8016   0

Random effects:
Name  Model
 ID.area   Besags ICAR model 

Model hyperparameters:
                       mean    sd 0.025quant 0.5quant 0.975quant  mode
Precision for ID.area 16.94 9.189      5.757    14.80      40.53 11.46

Expected number of effective parameters(std dev): 14.28(1.922)
Number of equivalent replicates : 1.96 

Deviance Information Criterion (DIC) ...: 206.20
Effective number of parameters .........: 14.74

Watanabe-Akaike information criterion (WAIC) ...: 205.82
Effective number of parameters .................: 10.92

Marginal log-Likelihood:  -135.15 
CPO and PIT are computed
Posterior marginals for linear predictor and fitted values computed




Thank you.


Soraia



Rplot.png

Haakon Bakka

unread,
Dec 16, 2016, 6:47:33 AM12/16/16
to Soraia Pereira, R-inla discussion group
Hi,

The model you have in "summary(j.res68)" does not seem right:
1. Do you actually want 3 spatial terms and 5 hyperparameters?
2. According to the logfile, the model did not converge.
3. Why do you have "+ f(k, model=spde)"?

Suggestions:
1. Remove the "+ f(k, model=spde)" term (unless you need it for a specific reason).
2. Use PC priors for any spatial effects:

## PC prior for the spatial effect
domain.size = 400
spde = inla.spde2.pcmatern(mesh,
                           prior.range = c(domain.size, 0.5),
                           prior.sigma = c(1, 0.5))


Let me know how it goes!
(That is, send me the model summaries and the logfiles.)

Kind regards,
Haakon Bakka





Soraia Pereira

unread,
Dec 19, 2016, 4:43:52 PM12/19/16
to R-inla discussion group, soraia....@gmail.com

Hi Haakon,


I followed your suggestions, but the result of model$mode$mode.status is still greater than zero. I've attached the log file. The summary is:


> summary(j.res75)

Call:
c("inla(formula = jform75, family = c(\"poisson\", \"poisson\"), data = inla.stack.data(j.stk), ",  "    quantiles = c(0.025, 0.5, 0.975), E = inla.stack.data(j.stk)$e, ",  "    control.compute = list(config = TRUE, dic = TRUE, cpo = TRUE, ",  "        waic = TRUE), control.predictor = list(A = inla.stack.A(j.stk), ",  "        compute = TRUE, link = 1), control.inla = list(int.strategy = \"eb\"), ",  "    control.results = list(return.marginals.random = F, return.marginals.predictor = F))" )

Time used:
 Pre-processing    Running inla Post-processing           Total 
         3.4717      25325.2797          4.6753      25333.4267 

Fixed effects:
         mean     sd 0.025quant 0.5quant 0.975quant    mode kld
b0.pp -8.2997 0.0651    -8.4278  -8.2996    -8.1722 -8.2995   0
b0.y  -1.7028 0.0210    -1.7443  -1.7027    -1.6618 -1.7026   0

Random effects:
Name  Model
 i   SPDE2 model 
 j   Copy 

Model hyperparameters:
               mean     sd 0.025quant 0.5quant 0.975quant    mode
Range for i  6.4807 0.2140      6.071   6.4771     6.9115  6.4701
Stdev for i  2.3760 0.0037      2.369   2.3760     2.3833  2.3760
Beta for j  -0.0428 0.0128     -0.068  -0.0428    -0.0176 -0.0428

Expected number of effective parameters(std dev): 1848.26(0.00)
Number of equivalent replicates : 17.59 

Deviance Information Criterion (DIC) ...: 48686.36
Effective number of parameters .........: 1805.29

Watanabe-Akaike information criterion (WAIC) ...: 67885.37
Effective number of parameters .................: 15214.72

Marginal log-Likelihood:  -25739.38 
CPO and PIT are computed
Posterior marginals for linear predictor and fitted values computed


###


If I remove the offset term, model$mode$mode.status is zero, but the fitted values for the area level that I need are similar to those obtained from the first model. I've attached the log file. The summary of this model is:


> summary(j.res75semoffset)

Call:
c("inla(formula = jform75semoffset, family = c(\"poisson\", \"poisson\"), ",  "    data = inla.stack.data(j.stk), quantiles = c(0.025, 0.5, ",  "        0.975), E = inla.stack.data(j.stk)$e, control.compute = list(config = TRUE, ",  "        dic = TRUE, cpo = TRUE, waic = TRUE), control.predictor = list(A = inla.stack.A(j.stk), ",  "        compute = TRUE, link = 1), control.inla = list(int.strategy = \"eb\"), ",  "    control.results = list(return.marginals.random = F, return.marginals.predictor = F))" )

Time used:
 Pre-processing    Running inla Post-processing           Total 
         3.4388       6134.3440          7.2729       6145.0557 

Fixed effects:
         mean     sd 0.025quant 0.5quant 0.975quant    mode kld
b0.pp -7.7642 0.4424    -8.6341  -7.7638    -6.8974 -7.7630   0
b0.y  -2.3786 0.0390    -2.4553  -2.3786    -2.3022 -2.3785   0

Random effects:
Name  Model
 i   SPDE2 model 
 j   Copy 

Model hyperparameters:
               mean     sd 0.025quant 0.5quant 0.975quant    mode
Range for i 34.1695 3.6547    26.9510  34.2472    41.1772 34.7230
Stdev for i  3.8088 0.3341     3.1362   3.8209     4.4359  3.8782
Beta for j   0.0811 0.0169     0.0514   0.0798     0.1172  0.0755

Expected number of effective parameters(std dev): 1696.60(0.00)
Number of equivalent replicates : 19.16 

Deviance Information Criterion (DIC) ...: 55233.21
Effective number of parameters .........: 1602.79

Watanabe-Akaike information criterion (WAIC) ...: 85199.83
Effective number of parameters .................: 20382.41

Marginal log-Likelihood:  -28982.59 
CPO and PIT are computed
Posterior marginals for linear predictor and fitted values computed


###


I think the problem in my case is over-smoothing. The estimates for the areas with higher values are lower than the observed values, and the estimates for the areas with lower values are higher than the observed values. I think this smoothing is due to the SPDE effects, but I don't know how I can change it.


Can you help me with some advice?


Thank you.

j.res75$logfile.txt
j.res75semoffset$logfile.txt

Haakon Bakka

unread,
Dec 23, 2016, 4:30:28 PM12/23/16
to Soraia Pereira, R-inla discussion group
Hi,

See inline comments.

Happy holidays!
Haakon Bakka



On 19 December 2016 at 22:43, Soraia Pereira <soraia....@gmail.com> wrote:

Hi Haakon,


I followed your suggestions but the result of model$mode$mode.status is also higher than zero. I’ve attached the log file. The summary is


I do not know what that mode.status is, unfortunately. I looked at your logfiles, and the first one (j.res75) looks very bad: it appears the algorithm diverges. Is this the model you are talking about here? This is probably because the posterior is flat or badly shaped.
 

> summary(j.res75)

Model hyperparameters:
               mean     sd 0.025quant 0.5quant 0.975quant    mode
Range for i  6.4807 0.2140      6.071   6.4771     6.9115  6.4701
Stdev for i  2.3760 0.0037      2.369   2.3760     2.3833  2.3760
Beta for j  -0.0428 0.0128     -0.068  -0.0428    -0.0176 -0.0428


###


If I remove the offset term, the result of model$mode$mode.status is zero, but the fitted values for the area level that I need are similar to the obtained values from the first model. I’v attached the log file. The summary of this model is:


The offset should not change the model if it is constant, as it just shifts the intercept around (assuming you have an intercept). I don't see your model here; did you have an intercept?

The second log-file looks OK, not great, but not bad either.

 

> summary(j.res75semoffset)

Model hyperparameters:
               mean     sd 0.025quant 0.5quant 0.975quant    mode
Range for i 34.1695 3.6547    26.9510  34.2472    41.1772 34.7230
Stdev for i  3.8088 0.3341     3.1362   3.8209     4.4359  3.8782
Beta for j   0.0811 0.0169     0.0514   0.0798     0.1172  0.0755


###


I think the problem in my case is an over-smoothing. The estimates for the areas with higher values are lower than the observed values and the estimates for the areas with lower values are higher than the observed values. I think this smoothing is due to the spde effects but I don’t know how can I change it. 


Hmm. You have a posterior range of 36, compare that to your study region. It seems like a short spatial range. On the other hand, you have a large amount of data, and just 1 spatial field?

## Other things to try 1: Rerun
Re-run the model with control.mode = list(restart = TRUE, theta = theta0) inside the inla call, where theta0 = result$internal.summary.hyperpar$mean from the previous call. This should make the algorithm more stable.

I usually keep these initial values in my code (as numbers), as this also speeds up the running time significantly!
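In code, the restart pattern might look like this (a sketch using the object names from earlier in the thread; the remaining arguments stay as in the original inla call):

```r
## Sketch: restart the optimisation from the hyperparameter mode of a previous fit.
theta0 <- j.res75$internal.summary.hyperpar$mean

j.res75.restart <- inla(jform75,
                        family = c("poisson", "poisson"),
                        data = inla.stack.data(j.stk),
                        E = inla.stack.data(j.stk)$e,
                        control.predictor = list(A = inla.stack.A(j.stk),
                                                 compute = TRUE, link = 1),
                        control.mode = list(theta = theta0, restart = TRUE))
```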

## Other things to try 2: Overdispersion
It is reasonable to think that your model is not correct, and the Poisson likelihood does not have any flexibility, so you get oversmoothing.

You can add an iid random effect to allow for overdispersion (or to account for other errors and some model misspecification): add f(iidx, model='iid', hyper=hyper.iid) to your formula, with the shrinkage prior hyper.iid = list(prec = list(prior = 'pc.prec', param = c(3, 0.01))).


Let me know if it helps,
Haakon

 

Haakon Bakka

unread,
Dec 23, 2016, 4:34:05 PM12/23/16
to Soraia Pereira, R-inla discussion group
PS, on the overdispersion: you may want to add this to only one of the likelihood models (I recommend the abundance model). You may also want to add a separate iid effect to each likelihood model, as you have enough data for this to be sensible.
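As a sketch, separate iid effects for the two likelihoods could look like this (`iidx.pp` and `iidx.y` are hypothetical index columns; in a two-likelihood stack each column is NA on the rows belonging to the other likelihood, so each effect enters only its own model):

```r
## Sketch: one iid overdispersion term per likelihood.
hyper.iid <- list(prec = list(prior = "pc.prec", param = c(3, 0.01)))

jform.od2 <- y ~ 0 + b0.pp + b0.y +
  f(i, model = spde) +
  f(j, copy = "i", fixed = FALSE) +
  f(iidx.pp, model = "iid", hyper = hyper.iid) +  # point-process part only
  f(iidx.y,  model = "iid", hyper = hyper.iid)    # abundance/mark part only
```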

Haakon