Spatio-temporal jointy (hurdle) model with covariates

Erin Urquhart

unread,

Oct 27, 2016, 11:43:56 AM10/27/16

to R-inla discussion group

Hello all,

I am trying to use the hurdle/jointly model frame work to estimate continuous data. I have followed the tutorial successfully for 1 discrete time step with covariates. My example is 532 sampling locations and 4 time steps with two predictors (temp and precitation.) also at each time step.

My question is how to build the joint model WITH covariates AND through time if I am trying to predict for the last time step? Thanks!

Attached is the dataset and the border shapefile, and below is my code:

FL_data6<-read.csv("small4.csv",header=TRUE)

FL_border=readOGR(dsn="FL_crop.shp",layer="FL_crop") ## spatial polygon of boundary

loc<-cbind(FL_data6$LONG,FL_data6$LAT) #Stores the lat/long of each observation site

ob<-unionSpatialPolygons(FL_border,rep(1,nrow(FL_border))) #This simplifies and unifies the boundary for faster processing

bound<-inla.sp2segment(ob) #Converts the polygon boundary to a inla.mesh.segment object, which is a series of line segments

bound$loc[,1]<-sapply(bound$loc[,1], function (x) x/1000) #Convert meters to kilometers

bound$loc[,2]<-sapply(bound$loc[,2], function (x) x/1000)

##### MAKE MESH

bnd<-inla.nonconvex.hull(loc,convex=8,resolution=(35)) #Makes a nonconvex hull around the points rather than the spatial extent of Florida

mesh=inla.mesh.2d(boundary=bnd,cutoff=2,max.edge=c(2,8))

spde = inla.spde2.matern(mesh=mesh, alpha=2)

A <- inla.spde.make.A(mesh, loc=as.matrix(FL_data6[,2:3]))

#set up two datasets! z is binomial occurence dataset, y is concentration over 10232.93!

day<-4

z<-(FL_data6[,day]>10232.93)+0

y<-ifelse(FL_data6[,day]>10232.93,FL_data6[,day],NA)

stk.z <- inla.stack(tag='est.z', data=list(z=z, ### occurrence for separate model y=cbind(z, NA)), ### z at first column of y A=list(A, 1), effects=list( list(i.z=1:spde$n.spde), list(z.b0=rep(1,length(z)))))

stk.y <- inla.stack(tag='est.y', data=list(r=y, ### rainfall for separate model y=cbind(NA, y)), ### at second column A=list(A, 1), effects=list( list(i.y=1:spde$n.spde), list(y.b0=rep(1,length(y)))))

res.z <- inla(z ~ 0 + z.b0 + f(i.z, model=spde), family='binomial', data=inla.stack.data(stk.z), control.compute=list(dic=TRUE), control.predictor=list(A=inla.stack.A(stk.z), compute=TRUE)) res.y <- inla(r ~ 0 + y.b0 + f(i.y, model=spde), family='gamma', data=inla.stack.data(stk.y), control.compute=list(dic=TRUE), control.predictor=list(A=inla.stack.A(stk.y), compute=TRUE))

stk.zy <- inla.stack(stk.z, stk.y)

res.yz <- inla(y ~ 0 + z.b0 + y.b0 + f(i.y, model=spde) + f(i.z, copy='i.y', fixed=FALSE), family=c('binomial', 'gamma'), data=inla.stack.data(stk.zy), control.compute=list(dic=TRUE), control.predictor=list(A=inla.stack.A(stk.zy), compute=TRUE))

small4.csv

FL_crop.cpg

FL_crop.dbf

FL_crop.sbn

FL_crop.sbx

FL_crop.shp

FL_crop.shx

Elias T. Krainski

unread,

Oct 28, 2016, 4:13:46 PM10/28/16

to r-inla-disc...@googlegroups.com

You can try from the example in the attached code.
Elias

spde-tutorial-st-semicontinuous.R

Erin Urquhart

unread,

Oct 31, 2016, 12:40:17 PM10/31/16

to R-inla discussion group

Thanks Elias! I am running this now. My question of how to make predictions for the last time step still remains. Would I add that into the stack? Something like this?

### observation structure, predictions

A.pred<-inla.spde.make.A(mesh, group=4)

## pred for stack

stk.p <- inla.stack(tag='pred',

data=list(y=NA),

A=list(A.pred),

effects=

list(c(ix1,ix2,iu),

data.frame(y.b0=1, z.b0=1,

Temp.y=FL_data4$Atemp)))

Thank you for your help!

-Erin

Erin Urquhart

unread,

Nov 1, 2016, 10:26:24 AM11/1/16

to R-inla discussion group

Hey all, I have successfully ran the binomia/gamma hurdle model with independent covariates and an ar1 time step. In terms of getting a prediction I have simply added NAs to the datapoints at the last time step that I am trying to predict. I have 8 weeks that the model is being trained on. I NA out the last week and let the model fill those in.

Is this correct for predictions?

field_pred_mean<-exp(res$summary.linear.predictor[inla.stack.index(jdat,"est.y")$data, "mean"])

field_pred_sd<-exp(res$summary.linear.predictor[inla.stack.index(jdat,"est.y")$data, "sd"])

-Erin

Elias T. Krainski

unread,

Nov 1, 2016, 10:31:19 AM11/1/16

to r-inla-disc...@googlegroups.com

You have to consider

field_pred_mean<-exp(res$summary.linear.predictor[inla.stack.index(jdat,"pred")$data,
"mean"])
field_pred_sd<-exp(res$summary.linear.predictor[inla.stack.index(jdat,"pred")$data,
"sd"])

where the the tag for the prediction data stack is considered.

Elias

Finn Lindgren

unread,

Nov 1, 2016, 10:37:45 AM11/1/16

to Erin Urquhart, R-inla discussion group

No, the standard deviations definitely don't transform in that way!

You'd be be better off using fitted.values to properly apply the link function, and that just requires that you set the "link" parameter appropriately. Add link=1 to all the data lists in the stack calls, e.g.

data=list(y=NA, link=1)

for the prediction stack, and add

link=inla.stack.data(stack)$link

to the control.predictor parameter list in the inla() call, I believe.

Finn

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To post to this group, send email to r-inla-disc...@googlegroups.com.
Visit this group at https://groups.google.com/group/r-inla-discussion-group.
For more options, visit https://groups.google.com/d/optout.

Elias T. Krainski

unread,

Nov 1, 2016, 10:40:30 AM11/1/16

to r-inla-disc...@googlegroups.com

On 11/01/2016 12:37 PM, Finn Lindgren wrote:

No, the standard deviations definitely don't transform in that way!

Oops... The correct code to extract the predictions is then

field_pred_mean <- res$summary.linear.predictor[inla.stack.index(jdat,"pred")$data, "mean"]

field_pred_sd <- res$summary.linear.predictor[inla.stack.index(jdat,"pred")$data, "sd"]

Elias

Elias T. Krainski

unread,

Nov 1, 2016, 10:43:14 AM11/1/16

to r-inla-disc...@googlegroups.com

On 11/01/2016 12:37 PM, Finn Lindgren wrote:

No, the standard deviations definitely don't transform in that way!

Still wrong in my last email, now it goes correct, that is consider fitted.values and the correct tag

field_pred_mean <- res$summary.fitted.values[inla.stack.index(jdat,"pred")$data, "mean"]

field_pred_sd <- res$summary.fitted.values[inla.stack.index(jdat,"pred")$data, "sd"]

Elias T. Krainski

unread,

Nov 1, 2016, 10:43:37 AM11/1/16

to r-inla-disc...@googlegroups.com

On 11/01/2016 12:37 PM, Finn Lindgren wrote:

No, the standard deviations definitely don't transform in that way!

Still wrong in my last email, now it goes correct, that is consider fitted.values, the correct tag and without exp, as it is already fitted.values

Finn Lindgren

unread,

Nov 1, 2016, 10:44:11 AM11/1/16

to Elias T. Krainski, r-inla-disc...@googlegroups.com

For predictions using the link function, the indexing needs to be into

summary.fitted.values, not summary.linear.predictor.

Finn

Erin Urquhart

unread,

Nov 1, 2016, 10:47:40 AM11/1/16

to R-inla discussion group

Thanks Finn,

Ok, updated code is attached here. Please feel free to edit.. Attention to the pred. stack and whether or not I can run the model with the entire time series (even for the last time step that I'm trying to predict), or if I should NA out the response variable for the last time step.

-Erin

On Tuesday, November 1, 2016 at 10:44:11 AM UTC-4, Finn Lindgren wrote:

For predictions using the link function, the indexing needs to be into
summary.fitted.values, not summary.linear.predictor.

Finn

On 1 Nov 2016, at 14:40, Elias T. Krainski <eliask...@gmail.com> wrote:

On 11/01/2016 12:37 PM, Finn Lindgren wrote:

No, the standard deviations definitely don't transform in that way!

Oops... The correct code to extract the predictions is then

field_pred_mean <- res$summary.linear.predictor[inla.stack.index(jdat,"pred")$data, "mean"]

field_pred_sd <- res$summary.linear.predictor[inla.stack.index(jdat,"pred")$data, "sd"]

Elias

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.

To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion-group+unsub...@googlegroups.com.

INLA-hurdle2.R

Finn Lindgren

unread,

Nov 1, 2016, 10:59:00 AM11/1/16

to Erin Urquhart, R-inla discussion group

Hi,

I haven't followed the entire thread, but I think you should add n.group=k to all the inla.spde.make.A() calls so that they are setup to connect to the whole k-step spde model; otherwise the A-matrix for the observations will think there are at most 7 timesteps in the model, since it doesn't magically know that you actually want to build a model that extends beyond that, to k=8.

Predictions are setup exactly as ordinary observations, except that all the response values are set to NA, so that seems to be ok in your code now.

Finn

To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.

To post to this group, send email to r-inla-disc...@googlegroups.com.
Visit this group at https://groups.google.com/group/r-inla-discussion-group.
For more options, visit https://groups.google.com/d/optout.

<INLA-hurdle2.R>

INLA help

unread,

Nov 1, 2016, 11:00:07 AM11/1/16

to Finn Lindgren, Erin Urquhart, R-inla discussion group

On Tue, 2016-11-01 at 14:58 +0000, Finn Lindgren wrote:
> Hi,
> I haven't followed the entire thread, but I think you should add
> n.group=k to all the inla.spde.make.A() calls so that they are setup
>

ngroup=k

--
Håvard Rue
he...@r-inla.org

Finn Lindgren

unread,

Nov 1, 2016, 11:01:07 AM11/1/16

to Erin Urquhart, R-inla discussion group

Further comment:

posterior expectations and std.dev.s can be a bit non-robust when using exponential link functions, so I prefer looking at the median and quantiles instead.

Finn

On 1 Nov 2016, at 14:47, Erin Urquhart <erin.u...@gmail.com> wrote:

To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.

To post to this group, send email to r-inla-disc...@googlegroups.com.
Visit this group at https://groups.google.com/group/r-inla-discussion-group.
For more options, visit https://groups.google.com/d/optout.

<INLA-hurdle2.R>

Erin Urquhart

unread,

Nov 1, 2016, 11:01:51 AM11/1/16

to R-inla discussion group

OK, when I try to stack the pred tag I get this error. I think it has to do with my effect list.

stk.p <- inla.stack(tag='pred',

data=list(y=NA,link=1),

A=list(A.pred,1),

effects=

list(

c(ix1,ix2,iu),

data.frame(y.b0=1, z.b0=1,

Temp.y=FL_dataPRED$Atemp,

Prec.y=FL_dataPRED$Precip)))

Here is the error:

Error in (function (data, A, effects, tag = "", compress = TRUE, remove.unused = TRUE) :

Row count mismatch for A: 423,5560

Thanks!

Finn Lindgren

unread,

Nov 1, 2016, 11:06:06 AM11/1/16

to he...@r-inla.org, Erin Urquhart, R-inla discussion group

n.group, not ngroup, for inla.spde.make.A

Finn

INLA help

unread,

Nov 1, 2016, 11:07:42 AM11/1/16

to Finn Lindgren, Erin Urquhart, R-inla discussion group

On Tue, 2016-11-01 at 15:06 +0000, Finn Lindgren wrote:
> n.group, not ngroup, for inla.spde.make.A
>

aaah, my fault. an inconsistency in our coding ;-)

--
Håvard Rue
he...@r-inla.org

Finn Lindgren

unread,

Nov 1, 2016, 11:07:50 AM11/1/16

to Erin Urquhart, R-inla discussion group

What's mesh$n ? What's the dim() of A.pred? What's the dim() of the Atemp matrix, and length() of Precip?

Finn

To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.

Finn Lindgren

unread,

Nov 1, 2016, 11:12:56 AM11/1/16

to Erin Urquhart, R-inla discussion group

You need the prediction covariate vectors to have length matching the number of prediction points you have (just as the covariate values are given as one value for each data value in the "est" stacks). When doing prediction, just pretend that you actually have data at the prediction points and set up everything up for that situation, except setting y=NA (since you don't actually have such data).

Finn

On 1 Nov 2016, at 15:01, Erin Urquhart <erin.u...@gmail.com> wrote:

To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.

Erin Urquhart

unread,

Nov 1, 2016, 1:11:01 PM11/1/16

to R-inla discussion group, erin.u...@gmail.com

Thanks Finn,

Ok, I fixed that error. Where I'm confused is specifically in which effects to list in the prediction stack. For the z (binomial) and y (gamma) stacks there are different effects (ie.. ix1, ix2, iu & z.b0, y.b0). So for the prediction stack wouldn't I have to take into account both of those stacks? That then causes an issue in stacking them as there is a name mismatch. Do I make a prediction stack for z and y separately? The hurdle approach is confusing me. See below code. Red is where I'm getting caught up.

Also, one last question just for clarification. Should the dataset for training have values in the last time step? Or do those need to be NAed out? I have 8 weeks of data. I am trying to predict the 8th week. Does INLA know based off of n.group=8 not to include the last week in the model training?

ix1 <- inla.spde.make.index('i', spde$n.spde, n.group=k)

ix2 <- inla.spde.make.index('ic', spde$n.spde, n.group=k)

iu <- inla.spde.make.index('j', spde$n.spde, n.group=k)

#####

A <- inla.spde.make.A(

mesh=mesh,

loc= as.matrix(FL_data7[,3:4]),

n.group=k)

A.pred<-inla.spde.make.A(mesh=mesh,loc= as.matrix(FL_data7[,3:4]),

n.group=k)

#######################EXAMPLE TO TRY with time component (ar1) and predictor (ATEMP, PRECIP)

stk.z <- inla.stack(tag='est.z',

data=list(y=cbind(z, NA), link=1),

A=list(A, 1),

effects=list(

ix1,

data.frame(z.b0=1,

Temp.z=FL_data7$Atemp,Prec.z=FL_data7$Precip)))

stk.y <- inla.stack(tag='est.y',

data=list(y=cbind(NA, y),

link=1),

A=list(A, 1),

effects=list(

c(ix2, iu),

data.frame(y.b0=1,

Temp.y=FL_data7$Atemp,Prec.y=FL_data7$Precip)))

## NEED HELP HERE!!!

### PRED STACK... but I'm not sure how to do the effects here...

stk.p <- inla.stack(tag='pred',

data=list(y=NA,link=1),

A=list(A.pred,1),

effects=

list(

c(ix1,ix2,iu),

data.frame(y.b0=1, z.b0=1,

Temp.y=FL_data7$Atemp,

Prec.y=FL_data7$Precip)))

jdat <- inla.stack(stk.z, stk.y, stk.p)

Thanks!

-Erin

Erin Urquhart

unread,

Nov 1, 2016, 1:12:43 PM11/1/16

to R-inla discussion group, erin.u...@gmail.com

Now that I'm looking at it, I probably need to include the Temp.z and Prec.z field too in the prediction effects? Part in blue?

stk.p <- inla.stack(tag='pred',

data=list(y=NA,link=1),

A=list(A.pred,1),

effects=

list(

c(ix1,ix2,iu),

data.frame(y.b0=1, z.b0=1,

Temp.y=FL_data7$Atemp,

Prec.y=FL_data7$Precip)))

Finn Lindgren

unread,

Nov 1, 2016, 1:34:29 PM11/1/16

to Erin Urquhart, R-inla discussion group

On 1 Nov 2016, at 17:11, Erin Urquhart <erin.u...@gmail.com> wrote:

Ok, I fixed that error. Where I'm confused is specifically in which effects to list in the prediction stack. For the z (binomial) and y (gamma) stacks there are different effects (ie.. ix1, ix2, iu & z.b0, y.b0). So for the prediction stack wouldn't I have to take into account both of those stacks? That then causes an issue in stacking them as there is a name mismatch.

Ah, I missed that you have two likelihoods. You should set link=1 for stk.z and link=2 for stk.y (but I think they both have the same link, so it won't actually make a difference in this case).

Do I make a prediction stack for z and y separately?

Yes, that would be easiest. For z-prediction, just include the covariates etc that make sense for z, and for y-prediction, just include the ones that make sense for y. The stack functions will join the stacks together as needed by adding NA vectors for "missing" effects. (NA in an effect vector means "no effect")

Also, one last question just for clarification. Should the dataset for training have values in the last time step? Or do those need to be NAed out? I have 8 weeks of data. I am trying to predict the 8th week.

When you say "data", do you mean just covariate, or also response variables?

The way to think about these models is that they exist wether you observe the responses or not. So build the model to include any time steps that need to exist to make a coherent universe.

Does INLA know based off of n.group=8 not to include the last week in the model training?

If you only have non-NA response variables for time steps 2 through 7, there is nothing to "train" for time step 8. The NA response "values" in the prediction stacks mean "these were not observed".

Finn

To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.

Erin Urquhart

unread,

Nov 1, 2016, 2:38:06 PM11/1/16

to R-inla discussion group, erin.u...@gmail.com

On Tuesday, November 1, 2016 at 1:34:29 PM UTC-4, Finn Lindgren wrote:

On 1 Nov 2016, at 17:11, Erin Urquhart <erin.u...@gmail.com> wrote:

Ok, I fixed that error. Where I'm confused is specifically in which effects to list in the prediction stack. For the z (binomial) and y (gamma) stacks there are different effects (ie.. ix1, ix2, iu & z.b0, y.b0). So for the prediction stack wouldn't I have to take into account both of those stacks? That then causes an issue in stacking them as there is a name mismatch.

Ah, I missed that you have two likelihoods. You should set link=1 for stk.z and link=2 for stk.y (but I think they both have the same link, so it won't actually make a difference in this case).

Do I make a prediction stack for z and y separately?

Yes, that would be easiest. For z-prediction, just include the covariates etc that make sense for z, and for y-prediction, just include the ones that make sense for y. The stack functions will join the stacks together as needed by adding NA vectors for "missing" effects. (NA in an effect vector means "no effect")

OK, but I fit the models jointly right, thus a "joint" model? In the precipitation example in the spde tutorial they fit the models separately then jointly. I am not looking for separately predictions of z and y, but rather 1 overall prediction based on the z and y likelihoods. So for the 1 prediction stack and 1 joint model how do I include the different inla.spde.make.index from z and y? That is where I am getting the name mismatch error. This is highlighted in the red part of my previous included code.

Also, one last question just for clarification. Should the dataset for training have values in the last time step? Or do those need to be NAed out? I have 8 weeks of data. I am trying to predict the 8th week.

When you say "data", do you mean just covariate, or also response variables?
The way to think about these models is that they exist wether you observe the responses or not. So build the model to include any time steps that need to exist to make a coherent universe.

By data I mean both covariate values and response variables. I have values for all 8 weeks. Ideally this would eventually be an operational model where I would not have response variables or covariate models. As I am doing hindcast prediction right now, I have the data for the "next" week.

Does INLA know based off of n.group=8 not to include the last week in the model training?

If you only have non-NA response variables for time steps 2 through 7, there is nothing to "train" for time step 8. The NA response "values" in the prediction stacks mean "these were not observed".

Finn

Erin

Finn Lindgren

unread,

Nov 1, 2016, 2:54:53 PM11/1/16

to Erin Urquhart, R-inla discussion group

> OK, but I fit the models jointly right, thus a "joint" model? In the precipitation example in the spde tutorial they fit the models separately then jointly. I am not looking for separately predictions of z and y, but rather 1 overall prediction based on the z and y likelihoods.

Well, the predictions are not on the data level, but rather on the linear model level, but apart from that, yes, the modeling and estimation is done jointly. However, the predictions are made for _something_, and if I understand correctly that would be for your GLM predictor models, of which there is one for z and one for y. Or I'm misunderstanding what you are trying to predict. You may need to spell out the actual mathematics of what your model is; R code only goes so far, and if it doesn't already describe what you want, then there is no way to figure it out without a proper model definition.

> So for the 1 prediction stack and 1 joint model how do I include the different inla.spde.make.index from z and y? That is where I am getting the name mismatch error. This is highlighted in the red part of my previous included code.

Please provide the exact error message.

Can you write down the two separate formula definitions you would have used if you were modeling z and y separately, as well as the joint model formula? The terms in those first 2 subformulas should be what you need to put in the respective stacks.
The joint model formula is just the union of those two formulas. This means that any effects that should be in both submodels must use the same names, and any effect that only should occur in one of the submodels must use a name that does not occur in the other submodel. Each substack should then only include names used corresponding subformula.

> By data I mean both covariate values and response variables. I have values for all 8 weeks.

As I said, the mathematical model exists independently of any response variables. Prediction just means "for these values of the covariates (including spatial locations etc), what would I expect the observances to be?".

> Ideally this would eventually be an operational model where I would not have response variables or covariate models. As I am doing hindcast prediction right now, I have the data for the "next" week.

You can't do prediction (in time) without a model. In particular, if your model includes covariates, you must have them for the time/locations you want to do a prediction. If not, you should also have to build a prediction model for the covariates themselves.
(As I said, I haven't followed the entire thread, so I may be missing something obvious)

Finn

Erin Urquhart

unread,

Nov 2, 2016, 3:25:33 PM11/2/16

to R-inla discussion group, erin.u...@gmail.com

Ok, for a joint logistic(binomial)/gamma regression model does this look right? I'm a bit unfamiliar with formulas in this sense with gamma likelihoods.

z=occurence

z(s)=beta1*Temp(s)+beta2*Precip(s) +spde(s)

Pred_z~exp(z(s))/(1+exp(z(s))

y=concentraton

y(s)=gamma * spde(s) +beta1*Temp(s) + beta2*Precip(s)

Pred_y=exp(y(s))

Will I use these to get the simulated predictions? Or do I have to take the exponent of them?

field_pred_mean <- res$summary.fitted.values[inla.stack.index(jdat,"pred.y")$data, "mean"]

field_pred_sd <- res$summary.fitted.values[inla.stack.index(jdat,"pred.y")$data, "sd"]

Thanks!

-Erin

Finn Lindgren

unread,

Nov 2, 2016, 5:04:58 PM11/2/16

to Erin Urquhart, R-inla discussion group

On 2 Nov 2016, at 19:25, Erin Urquhart <erin.u...@gmail.com> wrote:

Ok, for a joint logistic(binomial)/gamma regression model does this look right? I'm a bit unfamiliar with formulas in this sense with gamma likelihoods.

z=occurence
z(s)=beta1*Temp(s)+beta2*Precip(s) +spde(s)
Pred_z~exp(z(s))/(1+exp(z(s))

y=concentraton
y(s)=gamma * spde(s) +beta1*Temp(s) + beta2*Precip(s)
Pred_y=exp(y(s))

In the usual inla math notation z(s) above would be etaz(s), and the observations would be z(s) ~ Bin(1, Pred_z(s)), and similarly for y with a gamma distribution, but yes, that looks ok to me.

Will I use these to get the simulated predictions? Or do I have to take the exponent of them?

field_pred_mean <- res$summary.fitted.values[inla.stack.index(jdat,"pred.y")$data, "mean"]
field_pred_sd <- res$summary.fitted.values[inla.stack.index(jdat,"pred.y")$data, "sd"]

Since your prediction stacks are now setup with the proper link= information, fitted.values will already have the exponential applied, so yes, the above will give you the posterior mean and sd for Pred_y.

Finn

Thanks!
-Erin

On Tuesday, November 1, 2016 at 2:54:53 PM UTC-4, Finn Lindgren wrote:

> OK, but I fit the models jointly right, thus a "joint" model? In the precipitation example in the spde tutorial they fit the models separately then jointly. I am not looking for separately predictions of z and y, but rather 1 overall prediction based on the z and y likelihoods.

Well, the predictions are not on the data level, but rather on the linear model level, but apart from that, yes, the modeling and estimation is done jointly. However, the predictions are made for _something_, and if I understand correctly that would be for your GLM predictor models, of which there is one for z and one for y. Or I'm misunderstanding what you are trying to predict. You may need to spell out the actual mathematics of what your model is; R code only goes so far, and if it doesn't already describe what you want, then there is no way to figure it out without a proper model definition.

> So for the 1 prediction stack and 1 joint model how do I include the different inla.spde.make.index from z and y? That is where I am getting the name mismatch error. This is highlighted in the red part of my previous included code.

Please provide the exact error message.

Can you write down the two separate formula definitions you would have used if you were modeling z and y separately, as well as the joint model formula? The terms in those first 2 subformulas should be what you need to put in the respective stacks.
The joint model formula is just the union of those two formulas. This means that any effects that should be in both submodels must use the same names, and any effect that only should occur in one of the submodels must use a name that does not occur in the other submodel. Each substack should then only include names used corresponding subformula.

> By data I mean both covariate values and response variables. I have values for all 8 weeks.

As I said, the mathematical model exists independently of any response variables. Prediction just means "for these values of the covariates (including spatial locations etc), what would I expect the observances to be?".

> Ideally this would eventually be an operational model where I would not have response variables or covariate models. As I am doing hindcast prediction right now, I have the data for the "next" week.

You can't do prediction (in time) without a model. In particular, if your model includes covariates, you must have them for the time/locations you want to do a prediction. If not, you should also have to build a prediction model for the covariates themselves.
(As I said, I haven't followed the entire thread, so I may be missing something obvious)

Finn

--

You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.

To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.

Erin Urquhart

unread,

Nov 9, 2016, 9:26:24 AM11/9/16

to R-inla discussion group

Hi Elias,

When I follow the provided code I get this error when running inla: Any ideas? Can you explain what the difference between the three indexes are? ix1, ix2, and iu?

Error in inla(form, family = c("binomial", "gamma"), data = inla.stack.data(jdat), :

Model component 'f(j, ...)' have only NA values in 'j'. In this case, either argument 'n' or 'values' must be spesified.

Here is my code:

library(sp) # classes for spatial data

###########

FL_data1<-read.csv("FL_data22.csv",header=TRUE)

FL_data1<-FL_data1[,2:9] #correct data file removes leading columns

## define coordinates

FL_data4<-FL_data1[which(FL_data1$LONG<1462.861),]

FL_data5<-FL_data4[which(FL_data4$LONG>1451.527),]

FL_data6<-FL_data5[which(FL_data5$LAT<599.311),]

FL_data7<-FL_data6[which(FL_data6$LAT>584.225),]

rm(FL_data4)

rm(FL_data5)

rm(FL_data6)

coords<-FL_data7[1:695,4:5]

FL_data8<-FL_data7[which(FL_data7$time>14),]

FL_data8<-FL_data8[1:4865,]

#set up two datasets for Binomial/Gamma Hurdel model

z<-(FL_data8$Cyano>10232.93)+0

y<-ifelse(FL_data8$Cyano<10232.93,NA,1)*FL_data8$Cyano

###########################

mean_covariates2 = apply(FL_data8[,6:7],2,mean)

sd_covariates2 = apply(FL_data8[,6:7],2,sd)

FL_data8[,6:7] =

scale(FL_data8[,6:7],

mean_covariates2, sd_covariates2)

########### SET UP INLA...

loc<-cbind(coords$LONG,coords$LAT) #Stores the lat/long of each observation site

##### MAKE MESH

bnd<-inla.nonconvex.hull(loc,convex=1,resolution=(25)) #Makes a nonconvex hull around the points rather than the spatial extent of Florida

mesh=inla.mesh.2d(boundary=bnd,cutoff=.51,max.edge=c(1,8))

mesh$n

n_stations <- length(coords$LAT)

n_data <- length(FL_data8$Station.ID)

n_days <- as.integer(n_data/n_stations)

FL_data8$time = rep(1:n_days,each = n_stations)

i_day = 8

#Setting priors on the SPDE for variance and range

A <-inla.spde.make.A(mesh, loc=as.matrix(coords[,1:2]))

sigma0 = 1

size = min(c(diff(range(mesh$loc[, 1])), diff(range(mesh$loc[, 2]))))

range0 = size / 5

kappa0 = sqrt(8) / range0

tau0 = 1 / (sqrt(4 * pi) * kappa0 * sigma0)

spde = inla.spde2.matern(mesh,alpha=2,

B.tau = cbind(log(tau0), -1, +1),

B.kappa = cbind(log(kappa0), 0, -1),

theta.prior.mean = c(0, 0),

theta.prior.prec = c(0.1, 1) )

#####

A.est <- inla.spde.make.A(

mesh=mesh,

loc=

as.matrix(coords[FL_data8$Station.ID,

c("LONG","LAT")]),

group=FL_data8$time,

n.group=n_days

)

#### make index

ix1 <- inla.spde.make.index('i', spde$n.spde, n.group=n_days)

ix2 <- inla.spde.make.index('ic', spde$n.spde, n.group=n_days)

iu <- inla.spde.make.index('j', spde$n.spde, n.group=n_days)

#####

#######################EXAMPLE TO TRY with time component (ar1) and predictor (ATEMP, PRECIP) ############################

stk.z <- inla.stack(tag='est.z',

data=list(y=cbind(z, NA), link=1),

A=list(A.est, 1),

effects=list(

ix1,

data.frame(z.b0=1,Temp.z=FL_data8$Atemp,Prec.z=FL_data8$Precip)))

stk.y <- inla.stack(tag='est.y',

data=list(y=cbind(NA, y),

link=2),

A=list(A.est, 1),

effects=list(

c(ix2, iu),

data.frame(y.b0=1,Temp.y=FL_data8$Atemp,Prec.y=FL_data8$Precip)))

jdat <- inla.stack(stk.z, stk.y)#, pred.z,pred.y)

## set up model

form <- y ~ 0 + z.b0 + y.b0 + Temp.z + Prec.z + Temp.y+Prec.y+

f(i, model=spde, ##'generic0', Cmatrix=spde$param.inla$M2,

group=i.group, control.group=list(model='ar1')) +

f(ic, copy='i', group=ic.group, fixed=FALSE) +

f(j, model=spde, group=j.group, control.group=list(model='ar1'))

### run INLA model

res <- inla(form, family=c('binomial', 'gamma'),

data=inla.stack.data(jdat),

verbose=TRUE,

control.inla=list(strategy='gaussian'),

control.predictor=list(A=inla.stack.A(jdat),

compute=TRUE,

link=inla.stack.data(jdat)$link))

print(summary(res))

On Friday, October 28, 2016 at 4:13:46 PM UTC-4, Elias T. Krainski wrote:

Finn Lindgren

unread,

Nov 9, 2016, 10:06:56 AM11/9/16

to Erin Urquhart, R-inla discussion group

Hi,

Can you provide

summary(iu$j)

and

summary(inla.stack.data(jdat)$j)

?

Finn

--

Erin Urquhart

unread,

Nov 9, 2016, 10:10:59 AM11/9/16

to R-inla discussion group, erin.u...@gmail.com

> summary(inla.stack.data(jdat)$j)

Min. 1st Qu. Median Mean 3rd Qu. Max. NA's

NA NA NA NaN NA NA 9480

> summary(iu$j)

Min. 1st Qu. Median Mean 3rd Qu. Max.

1.0 107.0 213.5 213.5 320.0 426.0

Thanks Finn.

To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion-group+unsub...@googlegroups.com.

Nikos Nikolioudakis

unread,

Sep 16, 2017, 7:21:36 AM9/16/17

to R-inla discussion group

I am facing a problem related to what's previously discussed in this thread.
In short, I am trying to set up a spatio-temporal hurdle model, where I have 6 years of data, but a varying number and location of stations within each year.
I followed Elias' code (supplied further up this thread) and tried to adjust it to my data (given the varying number and location of stations).
When running the model I get the following error, that I fail to resolve:

> out.zy3 <- inla(formula.zy3, family=c("binomial","Gamma"),

+ data=zy3.data,

+ control.predictor=list(A=inla.stack.A(stk.zy3), compute=TRUE, link=zy3.data$link),

+ control.compute=list(dic=TRUE, waic=TRUE, config=TRUE))

Error in inla(formula.zy3, family = c("binomial", "Gamma"), data = zy3.data, :

Error in the values of `group'; not in [1,...,6]

I have attached for ease the RData and the script I am using.

When creating
A3 <- inla.spde.make.A(mesh, group = Groups, loc=plotLoc), where Groups are the row indices of annual locations for the group model (i.e a vector of values from 1 to 6 with each value having a length equal to the annual number of stations sampled), I get the above error.

Changing to:
A3 <- inla.spde.make.A(mesh, group = Groups, loc=plotLoc, n.group = NGroups) , with NGroups as the number of years (6), also throws the same error.

To my surprise, when I tried this:

A3 <- inla.spde.make.A(mesh, n.group = NGroups, loc=plotLoc) in which the index for each row is not explicitly provided,
the model doesn't throw any errors and I get results. I was under the impression that in the inla.spde.make.A function the row indices (i.e group) should be supplied.

I would greatly appreciate some clarification on this, as I am unsure if the output of the model is actually what it would be if the row indexing was used (and not throwing errors).

Nikos

ar1.R

ar1.RData

Elias T. Krainski

unread,

Sep 16, 2017, 9:25:24 AM9/16/17

to r-inla-disc...@googlegroups.com

There is a tiny detail in making the formula. The coply argument should
be a character, as you did. However, the group argument not. So, just
having

" + f(xi.c.AR1, copy='xi.AR1', group=xi.c.AR1.group, fixed=FALSE)",

make it to work with the correct specification of the A3, which is

A3 <- inla.spde.make.A(mesh, loc=plotLoc, group = Groups, n.group=NGroups)

best,
Elias

Nikos Nikolioudakis

unread,

Sep 17, 2017, 7:13:25 AM9/17/17

to R-inla discussion group

aaaahhhh...these tiny mistakes...
Thanks a million Elias, it works perfect now...
Best,
Nikos

Kimberly

unread,

Oct 25, 2017, 10:20:38 PM10/25/17

to R-inla discussion group

Hi, I had a question relating to the earlier questions in this thread about predictions from hurdle models.

In JAGS, to combine the predictions from binomial and gamma models, I would define a presence/ absence dummy variable conditional on the probability from the binomial prediction (w[i] ~ Bernoulli(p[i])) and define the likelihood as (1- w[i]) * gamma likelihood.

Is it possible to do something similar in INLA? I assume would need to be through some application of make.lincomb?

Many thanks, Kimberly

Helpdesk

unread,

Oct 26, 2017, 6:54:54 AM10/26/17

to Kimberly, R-inla discussion group

On Wed, 2017-10-25 at 19:20 -0700, Kimberly wrote:
> Hi, I had a question relating to the earlier questions in this thread
> about predictions from hurdle models.
>
> In JAGS, to combine the predictions from binomial and gamma models, I
> would define a presence/ absence dummy variable conditional on the
> probability from the binomial prediction (w[i] ~ Bernoulli(p[i])) and
> define the likelihood as (1- w[i]) * gamma likelihood.
>
> Is it possible to do something similar in INLA? I assume would need
> to be through some application of make.lincomb?

sure, this is indeed possible. you'll have two likelihoods, Bern and Gamma.
if the response=y > 0, then you know that w=0 and Gamma(y), if y=0, then you know that w=1.

there is an example of this in the SPDE_manual about rain model, using this trick.

H

--
Håvard Rue
Helpdesk
he...@r-inla.org

Elias T Krainski

unread,

Oct 26, 2017, 7:14:18 AM10/26/17

to r-inla-disc...@googlegroups.com

The Bernoulli, Gamma example is now in the Chapter 8 of the INLA book

https://sites.google.com/a/r-inla.org/stbook/

Elias

Kimberly

unread,

Nov 7, 2017, 10:50:00 PM11/7/17

to R-inla discussion group

Thanks so much for pointing this out to me - I had missed this example in the book and it was very useful.

I've managed to run the full hurdle model with part of my dataset but am still having trouble running with the full dataset (~ 400,000 observations). The error I get when I try to run the full dataset is status 253 so I think this is an issue with memory?

From the other threads in this group, I've specified the strategy="gaussian", int.strategy = "eb" and added starting values for hyperparameters based on the results from the model run with the subset of data. Would you have any other suggestions on how to optimise this model for a large dataset?

Thanks again, Kimberly

INLA help

unread,

Nov 8, 2017, 12:11:38 AM11/8/17

to Kimberly, R-inla discussion group

You can additionally try to run with fewer threads, like num.threads=2. That will use less memory. Also use verbose=TRUE, and you’ll see what is going on.

H

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.

To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.

Kimberly

unread,

Nov 8, 2017, 12:18:31 AM11/8/17

to R-inla discussion group

Thanks very much - I've just tried your suggestions and I'm still getting the same error - it looks like it's crashing at this point:

Strategy: Use the Gaussian approximation

Fast mode: On

Use linear approximation to log(|Q +c|)? Yes

Method: Compute the derivative exact

Parameters for improved approximations

Number of points evaluate: 9

Step length to compute derivatives numerically: 0.000100002

Stencil to compute derivatives numerically: 5

Cutoff value to construct local neigborhood: 0.0001

Log calculations: On

Log calculated marginal for the hyperparameters: On

Integration strategy: Use only the modal configuration (EMPIRICAL_BAYES)

f0 (CCD only): 1.100000

dz (GRID only): 0.750000

Adjust weights (GRID only): On

Difference in log-density limit (GRID only): 6.000000

Skip configurations with (presumed) small density (GRID only): On

Gradient is computed using Central difference ..done

Error in inla.inlaprogram.has.crashed() :

The inla-program exited with an error. Unless you interupted it yourself, please rerun with verbose=TRUE and check the output carefully.

If this does not help, please contact the developers at <he...@r-inla.org>.

In addition: Warning message:

running command had status 253

On Wednesday, 8 November 2017 13:11:38 UTC+8, help help wrote:

You can additionally try to run with fewer threads, like num.threads=2. That will use less memory. Also use verbose=TRUE, and you’ll see what is going on.

H

On 8 Nov 2017, at 06:50, Kimberly <kfor...@gmail.com> wrote:

Thanks so much for pointing this out to me - I had missed this example in the book and it was very useful.

I've managed to run the full hurdle model with part of my dataset but am still having trouble running with the full dataset (~ 400,000 observations). The error I get when I try to run the full dataset is status 253 so I think this is an issue with memory?

From the other threads in this group, I've specified the strategy="gaussian", int.strategy = "eb" and added starting values for hyperparameters based on the results from the model run with the subset of data. Would you have any other suggestions on how to optimise this model for a large dataset?

Thanks again, Kimberly

On Thursday, 26 October 2017 19:14:18 UTC+8, Elias T. Krainski wrote:
The Bernoulli, Gamma example is now in the Chapter 8 of the INLA book

https://sites.google.com/a/r-inla.org/stbook/

Elias

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.

To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion-group+unsub...@googlegroups.com.

Helpdesk

unread,

Nov 8, 2017, 12:48:31 AM11/8/17

to Kimberly, R-inla discussion group

On Tue, 2017-11-07 at 21:18 -0800, Kimberly wrote:
> Thanks very much - I've just tried your suggestions and I'm still
> getting the same error - it looks like it's crashing at this point:
>
>

you have access to a proper machine to run this on?

Kimberly

unread,

Nov 8, 2017, 12:52:12 AM11/8/17

to R-inla discussion group

I'm running it on a machine with 4 cores and 32GB RAM but maybe should look into other options if you think it's just an issue with the hardware. Thanks again

Helpdesk

unread,

Nov 8, 2017, 12:57:36 AM11/8/17

to Kimberly, R-inla discussion group

On Tue, 2017-11-07 at 21:52 -0800, Kimberly wrote:
> I'm running it on a machine with 4 cores and 32GB RAM but maybe
> should look into other options if you think it's just an issue with
> the hardware. Thanks again

yes, you need to do this. let us know how this goes.

Reply all

Reply to author

Forward