stack set-up for semi-continuous spatiotemporal model with covariates

591 views
Skip to first unread message

Nikos Nikolioudakis

unread,
May 11, 2017, 6:48:57 AM5/11/17
to R-inla discussion group
Hi,
I am trying to set up a semi-continuous model with covariates and I want to also take into account temporal correlation as well. I have been going through the examples in the spde-tutorial, the code in chapter 8 (8.2) of the Blangiardo & Cameletti book and the code provided in the work of Godin et al. 2014, trying to combine bits and pieces to make it work for my data. I have also gone through the R-inla discussion group for a possible answer but I did not find something explicit with the keywords I search. The approach I am using is similar to that of Paradinas et al 2015 (Mar Ecol Prog Ser 528: 245–255). My data have a 6 year span and involve fish biomass locations (hence the semi/continuous approach as there can be stations with 0 catch).

I am running the models on a CentOS server and I am facing 2 problems. 
1) When I run the code, the first time I am running 

formula.y0 <- Biomass_Pos ~ -1 +  y.intercept + Ttop50.std + Stop50.std
out.y0 <- inla(formula.y0, family="gamma", data=inla.stack.data(stk.y),
               control.predictor=list(A=inla.stack.A(stk.y), compute=TRUE, link=1), 
               control.compute=list(dic=TRUE, waic=TRUE, config=T
               ), 
               control.inla=list(strategy="laplace"), verbose = T)

inla crashes. I have attached the output from verbose = T (crashLog.txt). However(!), if I just simply re-run the out.y0 <- inla(…) code it pulls through without any error message…So this is not my major problem given that I manage to solve this through re-running the code. If I run the code on my laptop (Windows10, i7, 16Gb RAM) inla crashes no matter what. 

2) My main problem however, is the proper creation of the stack when it comes to the grouping / replication part of the spatial field. I manage to make the code work when a common field is assumed for all years (pooled years NOT replicated spatial field in each year), but I get an error message when I try either to have a replicated spatial random effect or a correlated among years one (‘ar1’ or ‘exchangeable’). To my understanding the problem is in the stack formulation, but I can’t get my head around it…I would also like to maintain the ability to have the stacks set up to run both the separate models and the joint model (i.e. I don’t prefer a joint stack that will only include the ‘alldata’ and not the ‘y’ and ‘z’ data as well (I hope I am making sense here…)). I am probably missing some kind of replication but I can’t figure it out…The error I am getting is 

‘Error in parse.input.list(effects[[k]], input.ncol(A[[k]]), paste("Effect block ",  : 
  Effect block 2:
All variables must have unique names
Names: ('')’

But I am sure that it is not the only error there is in the stacks formulations

Any help in the formulation of the stacks, the model formula and the call to inla, would be greatly appreciated. I am attaching the code I am using so far, plus the data required to make it run and the crashLog.txt in case something makes sense. For the moment, I am using a simple mesh formulation, but in the end, given that the data are for fish and land is involved as a barrier, I plan to use a more sophisticated mesh (probably using the ‘Barrier Models’ approach).  

Thank you,
Nikos
 

crashLog.txt
question.RData
Rcode_question.R

Håvard Rue

unread,
May 11, 2017, 7:16:48 AM5/11/17
to Nikos Nikolioudakis, R-inla discussion group
On Thu, 2017-05-11 at 03:48 -0700, Nikos Nikolioudakis wrote:
> Hi,
>
>                control.inla=list(strategy="laplace"), verbose = T)
>

can you remove strategy="laplace"

it is not needed.

if you do, does it help?

--
Håvard Rue
Department of Mathematical Sciences
Norwegian University of Science and Technology
N-7491 Trondheim, Norway
Voice: +47-7359-3533 URL : http://www.math.ntnu.no/~hrue
Mobile: +47-9260-0021 Email: havar...@math.ntnu.no

R-INLA: www.r-inla.org

Finn Lindgren

unread,
May 11, 2017, 7:22:14 AM5/11/17
to Nikos Nikolioudakis, R-inla discussion group
On the second issue, you have many inla.stack calls. Which one gives the error?
Take a look at the individual pieces of information you provide to that specific call, and make sure that at each step of the construction you have lists or data.frames with unique (and nonempty) column names).

Sensible inla.stack calls are closely tied to the model formula; using a common stack across different formulas sometimes works, but will typically results in needlessly large models, as well as undefined behaviour if the stack has multicolumn data but the inla calls just uses one likelihood (I would expect it to complain and/or crash in such situations).

Finn
--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To post to this group, send email to r-inla-disc...@googlegroups.com.
Visit this group at https://groups.google.com/group/r-inla-discussion-group.
For more options, visit https://groups.google.com/d/optout.
<crashLog.txt>
<question.RData>
<Rcode_question.R>

INLA help

unread,
May 11, 2017, 7:26:37 AM5/11/17
to Nikos Nikolioudakis, R-inla discussion group
On Thu, 2017-05-11 at 03:48 -0700, Nikos Nikolioudakis wrote:
> Hi,
> I am trying to set up a semi-continuous model with covariates and I
> want to also take into account temporal correlation as well. I have
> been going through the examples in the spde-tutorial, the code in
> chapter 8 (8.2) of the Blangiardo & Cameletti book and the code
> provided in the work of Godin et al. 2014, trying to combine bits and
> pieces to make it work for my data. I have also gone through the R-
> inla discussion group for a possible answer but I did not find
> something explicit with the keywords I search. The approach I am
> using is similar to that of Paradinas et al 2015 (Mar Ecol Prog Ser
> 528: 245–255). My data have a 6 year span and involve fish biomass
> locations (hence the semi/continuous approach as there can be
> stations with 0 catch).
>


you have some issues with 'out.z1', you plot it, and the component
'z.field' you'll see the huge marginal variability, which seems to make
the DIC hit the roof and get infinity....

--
Håvard Rue
he...@r-inla.org

Nikos Nikolioudakis

unread,
May 11, 2017, 11:06:25 AM5/11/17
to he...@r-inla.org, R-inla discussion group
Hi again and thank you both for your prompt responses.

Regarding the strategy="laplace" I have removed but I experience the same behavior. It takes a second run to the inla() call to pull through. The reason I initially used it was because I was following the guidelines of the book where it is stated:
‘We choose to use the strategy="laplace" because we have asymmetric non-Gaussian data and we need a more accurate approximation than the one provided by R-INLA as default …’

Regarding the DIC issue with ‘out.z1’ is this something that could be related to the mesh I am using, or the reason is something else (e.g. data-related)? I must point out here that I have uploaded only a subset of the data I am using.

Regarding the error: 


‘Error in parse.input.list(effects[[k]], input.ncol(A[[k]]), paste("Effect block ",  :
Effect block 2:
All variables must have unique names
Names: ('')’


it was encountered when I try to run:

stk.y2 <- inla.stack(data=list(Biomass_Pos=y, #for the single model
                                           alldata=cbind(y, NA), link=1), #for the joint model
                             A=list(A.2, 1),
                             effects=list(list(y.field=ind.2),
                                         list(rep(data.frame(y.intercept=1,covars),nrepl.2))),
                             tag="est.y")

 (Please use the RCode uploaded in this post as in the previous version this object was names stk.y instead of stk.y2 (sorry about that). However, this didn’t of course affect the call to inla.stack() as I could not move on with the code anyway. But I have corrected it for using it in the joint stack ‘stk.yz2’)

When I built new separate stacks with only amount or occurrence (i.e., not combined c(y,NA) or c(NA,z) in each of the separate stacks) and no joint stack of course in the end, these work well throwing no errors and calls to inla execute without any problems.

stk.y2 <- inla.stack(data=list(Biomass_Pos=y),
                             A=list(A.2, 1),
                             effects=list(ind.2,
                                         list(data.frame(y.intercept=1,covars))),
                             tag="est.y")

stk.z2 <- inla.stack(data=list(Biomass_PrAb=z),
                             A=list(A.2, 1),
                             effects=list(ind.2,
                                         list(data.frame(z.intercept=1,covars))),
                             tag="est.z")

 

However, when I try to create stacks like the ones below, I cannot understand what needs to be placed in the respective y and z fields. Moreover, I am not sure if in the list of effects the intercept+covariates data frame should have only one instance or it should be replicated (x times the n.repl) so as the number of rows to match…

stk.y3 <- inla.stack(data=list(Biomass_Pos=y, #for the single model
                                     alldata=cbind(y, NA), link=1), #for the joint model
                             A=list(A.2, 1),
                             effects=list(list(y.field=.........),
                                         list(data.frame(y.intercept=1,covars))),
                             tag="est.y")

stk.z3 <- inla.stack(data=list(Biomass_PrAb=z, #for the single model
                                     alldata=cbind(NA, z) , link=2), #for the joint model
                             A=list(A.2, 1),
                             effects=list(list(z.field=..........,
                                                    zc.field=..........),
                                         list(data.frame(z.intercept=1,covars))),
                             tag="est.z")

For stk.y3, if I use the following formulation I get no error  

stk.y3 <- inla.stack(data=list(Biomass_Pos=y, #for the single model
                                     alldata=cbind(y, NA), link=1), #for the joint model
                             A=list(A.2, 1),
                             effects=list(ind.2,
                                         list(data.frame(y.intercept=1,covars))),
                             tag="est.y")

and the use of 
formula.y3 <- Biomass_Pos ~ -1 +  y.intercept + Ttop50.std + Stop50.std + 
  f(s, model=spde.a, replicate=s.repl)

out.y3 <- inla(formula.y3, family="gamma", data=inla.stack.data(stk.y3),
               control.predictor=list(A=inla.stack.A(stk.y3), compute=TRUE, link=1), 
               control.compute=list(dic=TRUE, waic=TRUE, config=T))

works fine (meaning it does not throw errors)

BUT doing the same for 
stk.z3 <- inla.stack(data=list(Biomass_PrAb=z, #for the single model
                                     alldata=cbind(NA, z), link=2), #for the joint model
                             A=list(A.2, 1),
                             effects=list(list(z.field=ind.2,
                                                   zc.field=ind.2),
                                         list(data.frame(z.intercept=1,covars))),
                    tag="est.z")
throws:
Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match

so I cannot move on to creating a joint stack and running the joint model. 

Cheers,
Nikos
crashLog_removed_Laplace_option.txt
Rcode_question_updated.R

Finn Lindgren

unread,
May 11, 2017, 11:09:49 AM5/11/17
to Nikos Nikolioudakis, he...@r-inla.org, R-inla discussion group
Your call to
 rep(data.frame(...),...)
produces a list where each column of the data.frame occurs multiple times, with the same name each time.
Was the intent to repeat the values of each original column, within each column?
Then you need to do something else, such as repeated calls to rbind.
In any case, look at what your code produces as input to the stack call, and change it until it produces the _input_ that is needed.

Finn

On 11 May 2017, at 16:03, Nikos Nikolioudakis <ninik...@gmail.com> wrote:

Hi again and thank you both for your prompt responses.

Regarding the strategy="laplace" I have removed but I experience the same behavior. It takes a second run to the inla() call to pull through. The reason I initially used it was because I was following the guidelines of the book where it is stated:
‘We choose to use the strategy="laplace" because we have asymmetric non-Gaussian data and we need a more accurate approximation than the one provided by R-INLA as default …’

Regarding the DIC issue with ‘out.z1’ is this something that could be related to the mesh I am using, or the reason is something else (e.g. data-related)? I must point out here that I have uploaded only a subset of the data I am using.

Regarding the error:

‘Error in parse.input.list(effects[[k]], input.ncol(A[[k]]), paste("Effect block ",  :
Effect block 2:
All variables must have unique names
Names: ('')’



<crashLog_removed_Laplace_option.txt>
<Rcode_question_updated.R>

Nikos Nikolioudakis

unread,
May 16, 2017, 12:41:52 PM5/16/17
to Finn Lindgren, he...@r-inla.org, R-inla discussion group
Hi, and thank you again for your helpful comments. No that was not my intention :). I think my main problems were stemming from the wrong names I was passing to the weight indices and their structure (whether I was passing it as a list/data.frame or not. I think I now have sorted that out and what I needed to produce is the following:

1. For the model with just spatial effect (no temporal component) the following combined stack for the Gamma and the Bernoulli distribution

stk.y <- inla.stack(data=list(Biomass_Pos=y, #for the single model
                              alldata=cbind(y, NA), link=1), #for the joint model
                    A=list(A.1, 1),
                    effects=list(list(y.field=1:spde.a$n.spde),
                                 list(data.frame(y.intercept=1,covars))),
                    tag="est.y")

stk.z <- inla.stack(data=list(Biomass_PrAb=z, #for the single model
                              alldata=cbind(NA, z), link=2), #for the joint model
                    A=list(A.1, 1),
                    effects=list(list(z.field=1:spde.a$n.spde, zc.field=1:spde.a$n.spde),
                                 list(data.frame(z.intercept=1,covars))),
                    tag="est.z")

stk.yz <- inla.stack(stk.y, stk.z)

with call to inla 

formula.y1 <- Biomass_Pos ~ -1 +  y.intercept + 'name of each column in the covars data frame' + f(y.field, model=spde.a)

out.y1 <- inla(formula.y1, family="gamma", data=inla.stack.data(stk.y),
               control.predictor=list(A=inla.stack.A(stk.y), compute=TRUE, link=1), 
               control.compute=list(dic=TRUE, waic=TRUE, config=T)) 

formula.z1 <- Biomass_PrAb ~ -1 + z.intercept + 'name of each column in the covars data frame' + f(y.field, model=spde.a)

out.z1 <- inla(formula.z1, family="binomial", data=inla.stack.data(stk.z),
               control.predictor=list(A=inla.stack.A(stk.z), compute=TRUE, link=1),
               control.compute=list(dic=TRUE, waic=TRUE, config=T)) 

formula.yz1 <- alldata ~ -1 + y.intercept + z.intercept + covars + 
  f(y.field, model=spde.a) + f(z.field, model=spde.a) + f(zc.field, copy="y.field", fixed=FALSE)

out.yz1 <- inla(formula.yz1, family=c("gamma", "binomial"),
                data=inla.stack.data(stk.yz),
                control.predictor=list(A=inla.stack.A(stk.yz), compute=TRUE, link=inla.stack.data(stk.yz)$link),
                control.compute=list(dic=TRUE, waic=TRUE, config=TRUE))


2. For the model with a replicated spatial effect within years the following combined stack for the Gamma and the Bernoulli distribution

# spatial random effect replicated each of the years, constant within year

table(repl.2 <-  mac2011in$YEAR-(min(mac2011in$YEAR)-1))
nrepl.2 <- length(unique(mac2011in$YEAR))

dim(A.2 <- inla.spde.make.A(mesh.a, repl=repl.2, loc=Loc))
ind.2 <- inla.spde.make.index(name='s', n.spde=spde.a$n.spde, n.repl=nrepl.2)

stk.y2 <- inla.stack(data=list(Biomass_Pos=y, #for the single model
                               alldata=cbind(y, NA), link=1), #for the joint model
                     A=list(A.2, 1),
                     effects=list(ind.2,
                                  list(data.frame(y.intercept=1,covars))),
                     tag="est.y")

# I am skipping the separate model

stk.z2 <- inla.stack(data=list(Biomass_PrAb=z, #for the single model
                               alldata=cbind(NA, z), link=2), #for the joint model
                     A=list(A.2, 1),
                     effects=list(list(data.frame(z.field=ind.2,
                                                  zc.field=ind.2)),
                                  list(data.frame(z.intercept=1,covars))),
                     tag="est.z")
# I am skipping the separate model

stk.yz2 <- inla.stack(stk.y2, stk.z2)

formula.yz2 <- alldata ~ -1 + y.intercept + z.intercept + Ttop50.std + Stop50.std +
  f(s, model=spde.a, replicate=s.repl) + f(z.field.s, model=spde.a, replicate=z.field.s.repl) + 
  f(zc.field.s, copy="s", fixed = FALSE, replicate=zc.field.s.repl)

out.yz2 <- inla(formula.yz2, family=c("gamma", "binomial"),
                data=inla.stack.data(stk.yz2),
                control.predictor=list(A=inla.stack.A(stk.yz2), compute=TRUE, link=inla.stack.data(stk.yz2)$link), 
                control.compute=list(dic=TRUE, waic=TRUE, config=TRUE))


3. For the model with autoregressive correlation 
#the spatial random effect is correlated on consecutive years, constant within each year (autoregressive correlation)

table(Groups <-  mac2011in$YEAR-(min(mac2011in$YEAR)-1))
NGroups <- length(unique(mac2011in$YEAR))

dim(A.3 <- inla.spde.make.A(mesh.a, group = Groups, loc=Loc))
ind.3 <- inla.spde.make.index(name='s', n.spde=spde.a$n.spde, n.group=NGroups)

stk.y3 <- inla.stack(data=list(Biomass_Pos=y, #for the single model
                               alldata=cbind(y, NA), link=1), #for the joint model
                     A=list(A.3, 1),
                     effects=list(ind.3,
                                  list(data.frame(y.intercept=1,covars))),
                     tag="est.y")
# I am skipping the separate model

stk.z3 <- inla.stack(data=list(Biomass_PrAb=z, #for the single model
                               alldata=cbind(NA, z), link=2), #for the joint model
                     A=list(A.3, 1),
                     effects=list(data.frame(z.field=ind.3,
                                             zc.field=ind.3),
                                  list(data.frame(z.intercept=1,covars))),
                     tag="est.z")
# I am skipping the separate model

stk.yz3 <- inla.stack(stk.y3, stk.z3)

formula.yz3 <- alldata ~ -1 + y.intercept + z.intercept + Ttop50.std + Stop50.std +
  f(s, model=spde.a, group=s.group, control.group=list(model='ar1')) + 
  f(z.field.s, model=spde.a, group=z.field.s.group, control.group=list(model='ar1')) + 
  f(zc.field.s, copy="s", fixed = FALSE, group=zc.field.s.group, control.group=list(model='ar1'))

out.yz3 <- inla(formula.yz3, family=c("gamma", "binomial"),
                data=inla.stack.data(stk.yz3),
                control.predictor=list(A=inla.stack.A(stk.yz3), compute=TRUE, link=inla.stack.data(stk.yz3)$link), 
                control.compute=list(dic=TRUE, waic=TRUE, config=TRUE))



The issue with the huge huge marginal variability in out.z1 is still there when a common spatial field is used, but improves when the spatial field is replicated in each year, or when temporal auto-correlation is included. I am looking into what might be the cause of that, but otherwise everything else is performing with no errors or warning messages.

Do you see anything that might look strange in the new stacks and calls to inla? I am sure there is a lot of space for improvement, such as pc priors, a better mesh and I am experimenting with these and the complete dataset at the moment to see if the models improve, but it would be extremely useful to know whether the code up to now is correct. I am attaching the updated code along with the .RData in case someone wants to play around with it.

The issue with inla crashing when out.y0 is called still persists so I have attached the log for when it crashes and the log for when it passes executes successfully in case something could be figured out. 

Thanks again,
Nikos


--
You received this message because you are subscribed to a topic in the Google Groups "R-inla discussion group" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/r-inla-discussion-group/6C_TQ85mtPg/unsubscribe.
To unsubscribe from this group and all its topics, send an email to r-inla-discussion-group+unsub...@googlegroups.com.
To post to this group, send email to r-inla-discussion-group@googlegroups.com.
out_y0_pass_log.txt
out_y0_crash_log.txt
Rcode_question_updated.R
question.RData
Reply all
Reply to author
Forward
0 new messages