Ntrials in SPDE model

391 views
Skip to first unread message

Helen Sofaer

unread,
Sep 5, 2013, 1:24:08 PM9/5/13
to r-inla-disc...@googlegroups.com

Dear INLA team,

I have a question about how to combine the SPDE approach with a binomial model of proportion data (# successes and # of trials).

The non-spatial model works fine, where I specify the number of trials using Ntrials in the inla() call – I’ve simplified my variable names to make them more obvious:

nonspatial_formula = NumSucc ~ cov1 + cov2

mod.nonspatial = inla(nonspatial_formula, data = PondData, family = "binomial", Ntrials = NumTrials)

I’ve been trying to specify Ntrials in the INLA call of an SPDE model, with the number of trials included in my data stack:

stack.est = inla.stack(data = list(NumSucc = PondData_e$NumSucc),

  A = list(A.est, 1),

  effects = list(c(ind1, list(Intercept = 1)),

     list(PondData_e[, c("cov1", "cov2", "NumTrials")])), tag = "est")

stack.val = inla.stack(data = list(NumSucc = NA),

  A = list(A.val, 1),

  effects = list(c(ind1, list(Intercept = 1)),

    list(PondData_v[, c("cov1", "cov2", "NumTrials")])), tag = "val")

stack.two = inla.stack(stack.est, stack.val)

 

formula_spatial = NumSucc ~ -1 + Intercept + cov1 + cov2 + f(i, model = spde)

 

spde1 = inla.spde2.matern(m2, alpha = 2)

data_propwet = inla.stack.data(stack.two, spde = spde1)

result_propwet = inla(formula_spatial,

      data = data_propwet,

      family = "binomial",

      Ntrials = data_propwet$NumTrials,

      control.predictor = list(A = inla.stack.A(stack.two), compute = TRUE),

      control.compute = list(dic = TRUE)

      )

 

The error I get is:

Error in data.frame(..., check.names = FALSE) :

  arguments imply differing number of rows: 20045, 35655

 

My data have 20045 rows; 35655 is the number in the stack after NAs are added. The bigger number comes from the Ntrials piece (if I mess with it and say, for example, Ntrials = data_propwet$NumTrials[15000:16000], then the second number becomes however many rows I’ve specified there).

Because I don’t fully understand the stack functionality, I want to be sure I don’t specify the wrong rows for the weights, since that would make my analysis nonsensical.

 

One potential solution is to specify the number of trials outside the stack by doing:

Ntrials = c(PondData_e$NumTrials, PondData_v$NumTrials)

The model does run this way, but I want to be sure that I’m not making the data line up in the wrong way. Is this correct?

 

Also, I have one other question:

I have spatiotemporal data and I have been comparing a model with a separable spatial and AR1 structure (i.e. f(i, model = spde) + f(i.group, model = "ar1")) with a spatial model with lagged covariates which make biological sense (so the second model doesn’t have an AR1 component). In all the models, I’ve been setting up A as though I will run the AR1 structure (here Yr_group is 1 for the first year of data, and so on):

 

A.est = inla.spde.make.A(m2, loc = as.matrix(PondData_e[, c("POINT_X", "POINT_Y")]), group = PondData_e$Yr_group, n.group = n_years)

A.val = inla.spde.make.A(m2, loc = as.matrix(PondData_v[, c("POINT_X", "POINT_Y")]), group = PondData_v$Yr_group, n.group = n_years)

ind1 = inla.spde.make.index(name = 'i', n.spde = m2$n, n.group = n_years)

 

Is this ok to retain in the model without the AR1 piece, or should I create A without the groups?

 

Thank you for all the work you have put into creating this program, and thanks in advance for taking the time to answer my questions.

Helen


Finn Lindgren

unread,
Sep 5, 2013, 1:53:15 PM9/5/13
to Helen Sofaer, r-inla-disc...@googlegroups.com
The Ntrials information should go in the data list, not the effects list, in the inla.stack calls.
Finn

Sent from my iP
--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To post to this group, send an email to r-inla-disc...@googlegroups.com.
Visit this group at http://groups.google.com/group/r-inla-discussion-group.
For more options, visit https://groups.google.com/groups/opt_out.

Finn Lindgren

unread,
Sep 5, 2013, 3:38:54 PM9/5/13
to Helen Sofaer, r-inla-disc...@googlegroups.com
A more complete reply than my previous one-liner:

On 5 September 2013 18:24, Helen Sofaer <hso...@gmail.com> wrote:
> stack.est = inla.stack(data = list(NumSucc = PondData_e$NumSucc),
> A = list(A.est, 1),
> effects = list(c(ind1, list(Intercept = 1)),
> list(PondData_e[, c("cov1", "cov2", "NumTrials")])), tag = "est")
<cut>
> Ntrials = data_propwet$NumTrials,

The Ntrials information should go in the data list, not the effects
list, in the inla.stack calls,
i.e. the first line of the first inla.stack() call should be
stack.est = inla.stack(data = list(NumSucc = PondData_e$NumSucc,
Ntrials=PondData_e$NumTrials),
and in the inla() call you should then use
Ntrials= data_propwet$Ntrials

> Because I don’t fully understand the stack functionality, I want to be sure
> I don’t specify the wrong rows for the weights, since that would make my
> analysis nonsensical.

There is an example using inla.stack and Ntrials in Section 3.1 of
this tutorial:
http://www.r-inla.org/examples/tutorials/spde-tutorial-from-jss

> Ntrials = c(PondData_e$NumTrials, PondData_v$NumTrials)
>
> The model does run this way, but I want to be sure that I’m not making the
> data line up in the wrong way. Is this correct?

In this case, yes, this should give you the right thing, but the other
way described keeps everything together in the stacks, making it
easier to use only one of the stacks when calling inla(), for example.

> I have spatiotemporal data and I have been comparing a model with a
> separable spatial and AR1 structure (i.e. f(i, model = spde) + f(i.group,
> model = "ar1")) with a spatial model with lagged covariates which make
> biological sense (so the second model doesn’t have an AR1 component). In all
> the models, I’ve been setting up A as though I will run the AR1 structure
> (here Yr_group is 1 for the first year of data, and so on):

The A matrix you use in the model that doesn't use a group/AR1 model
must be setup with n.group=1.
If the alternative model is simply independent replicates for each
year instead of AR1, then you need to replace "group" with "repl" in
the inla.spde.make.A() call, and specify f(..., replicates=i.repl)
instead of f(group=i.group) in the formula.

For example, the A.est of the AR1 model,

> A.est = inla.spde.make.A(m2, loc = as.matrix(PondData_e[, c("POINT_X",
> "POINT_Y")]), group = PondData_e$Yr_group, n.group = n_years)

turns into this for independent replicates:

A.est = inla.spde.make.A(m2, loc = as.matrix(PondData_e[, c("POINT_X",
"POINT_Y")]), repl = PondData_e$Yr_group, n.repl = n_years)

If the alternative model is based on a single replicate of the spatial
effect then you need both n.repl=1 and n.group=1.

In essence, the number of columns in the A-matrix must be equal to
n.spde*n.repl*n.group, where the n.* values match whatever you specify
in the f()-formula and in inla.spde.make.index().

Finn

Finn Lindgren

unread,
Sep 5, 2013, 3:40:39 PM9/5/13
to Helen Sofaer, r-inla-disc...@googlegroups.com
On 5 September 2013 20:38, Finn Lindgren <finn.l...@gmail.com> wrote:
> the inla.spde.make.A() call, and specify f(..., replicates=i.repl)

Correction:
It should be f(..., replicate=i.repl)

Finn

Helen Sofaer

unread,
Sep 5, 2013, 4:03:16 PM9/5/13
to Finn Lindgren, r-inla-disc...@googlegroups.com
Thank you - that is very helpful.
I really appreciate the detailed answer.

Helen
Reply all
Reply to author
Forward
0 new messages