Re: [r-inla] INLA crashing with spatial effects

313 views
Skip to first unread message

Håvard Rue

unread,
May 17, 2014, 2:32:45 AM5/17/14
to Víctor Hugo Gutiérrez Vélez, r-inla-disc...@googlegroups.com
On Fri, 2014-05-16 at 20:12 -0700, Víctor Hugo Gutiérrez Vélez wrote:
> Hello,
>
> I am running the model below as a function of two variables and
> structured and unstructured spatial effects. The model consists of a
> sub-sample of about 352.000 observations. Observations are repeated
> measures of around 44000 zones.
>
> function=a + b + f(id, model="bym", graph=g).
>
> cheap.mod= inla(formula, data=DataSubset,family="binomial,
> verbose=TRUE,
> control.fixed=list(prec.intercept=0.1,
> expand.factor.strategy='inla'),
> control.inla=list(diagonal = 10000,
> strategy="gaussian", int.strategy="eb"),
> control.predictor = list(compute=TRUE),
> control.compute=list(dic=TRUE, cpo=TRUE))
>
> I want to start with a very simple specification using the cheap mode
> in INLA so I can get quick rough results from different model
> specifications before running a definite model. However I am starting
> to run into the errors that appear at the end of this message.

Hi,

your problem is that you graph has many disconnected subgraphs, 6696, I
guess, and that this is transferred into a linear constraint for each of
those, and this require to much storage and the program crash.

try to add f(id,..., adjust.for.con.comp=FALSE), remove diagonal=10000,
and let me know if this goes through. (you have a huge model...).



> 2. Eventually I am hoping to run a way more complex model given the
> nature of my data. The model would include nested random effects of
> zones within regions, temporal effects and possibly non-linear
> relationships with one of the covariates. I also would like to
> evetually expand the model to the whole dataset (1'200.000
> observations, 150,000 zones). Am I being naive by hoping that I can
> run such a large model? I am working on an imac with 32 Gb memory. The
> function would be something like:
>
> function=f(a, model="rw2") + b + f(id, model="bym", graph=g,
> group=regions, control.group=list(model="iid")) + f(year.struct,
> model= "ar1") + f(year, model= "iid").

the main problem here would be the size. is there any way to try to
aggregate some of the data?

Best
H


--
Håvard Rue
Department of Mathematical Sciences
Norwegian University of Science and Technology
N-7491 Trondheim, Norway
Voice: +47-7359-3533 URL : http://www.math.ntnu.no/~hrue
Mobile: +47-9260-0021 Email: havar...@math.ntnu.no

R-INLA: www.r-inla.org

Elias T Krainski

unread,
May 17, 2014, 2:52:02 AM5/17/14
to r-inla-disc...@googlegroups.com
I'm wondering if: is 'id' nested into 'regions'?

Víctor Hugo Gutiérrez Vélez

unread,
May 17, 2014, 11:38:37 AM5/17/14
to r-inla-disc...@googlegroups.com, Víctor Hugo Gutiérrez Vélez, hr...@math.ntnu.no

Thanks Håvard and Elias for your useful replies.

I modified the model the way Håvard suggested and it happily goes through!

Regarding the suggestion by Håvard, the data used for the analysis comes from spatial gridded information and I am considering each cell as a zone_id. The only way I can think of aggregating the data would be to consolidate the information in cells in larger cell sizes (from 1 km to 2 or 4 km for example). However that would sacrifice variability so I would try that if that was my ultimate resource. Is there any alternative?

To answer Elias question, cells representing zone_ids are contained within administrative units that I call regions  so  yes, I am hoping to nest the zones within regions. Would nesting help somehow dealing with a large model when I eventually run the definite one with the full dataset? Here is the formula I would like to eventually run (including the suggestion by Håvard):

formula = z~ f(x, model="rw2") + y + f(zone_id_sub, model="bym", graph=g, group=region, control.group=list(model="iid", adjust.for.con.comp=FALSE) + f(year.struct, model= "ar1") +   f(year, model= "iid").

Thanks for all your help.

Victor.

Elias T Krainski

unread,
May 17, 2014, 1:15:04 PM5/17/14
to r-inla-disc...@googlegroups.com

On 17/05/14 17:38, Víctor Hugo Gutiérrez Vélez wrote:
>
> To answer Elias question, cells representing zone_ids are contained
> within administrative units that I call regions so yes, I am hoping
> to nest the zones within regions. Would nesting help somehow dealing
> with a large model when I eventually run the definite one with the
> full dataset?

I'm not sure what you want combining 'bym' and group in this way. When
you have
i j
1 1
2 1
1 2
2 2
by f('i', ..., group='j', ...) it means that effects with same index 'i'
are correlated for different values of 'j', in accord to the model
('iid' is not valid) in control.group(). You can see the specification
and examples in
https://www.math.uzh.ch/fileadmin/math/events/bilgm11/andreaRiebler.pdf

Víctor Hugo Gutiérrez Vélez

unread,
May 17, 2014, 10:37:10 PM5/17/14
to r-inla-disc...@googlegroups.com
Thank you Elias. You are right. I changed the specification of the nested effects for the one below.

f(zone_id_sub, model="bym", graph=g, adjust.for.con.comp=FALSE, replicate=region)

Now the main challenge is how to deal with such a large dataset. I was wondering if running several iterations with random subsets would be a robust approach or if it's better to aggregate cells in coarser units to reduce the number of zones and data entries at the expense of spatial variability.

Thanks again,

Victor.

INLA help

unread,
May 18, 2014, 3:27:14 AM5/18/14
to Víctor Hugo Gutiérrez Vélez, r-inla-disc...@googlegroups.com
On Sat, 2014-05-17 at 19:37 -0700, Víctor Hugo Gutiérrez Vélez wrote:
> Thank you Elias. You are right. I changed the specification of the
> nested effects for the one below.
>
> f(zone_id_sub, model="bym", graph=g, adjust.for.con.comp=FALSE,
> replicate=region)
>
> Now the main challenge is how to deal with such a large dataset. I was
> wondering if running several iterations with random subsets would be a
> robust approach or if it's better to aggregate cells in coarser units
> to reduce the number of zones and data entries at the expense of
> spatial variability.

I would suggest to reduce the spatial resolution and then check how fine
it could be for your (huge) model to still run.

H


--
Håvard Rue
he...@r-inla.org

Reply all
Reply to author
Forward
0 new messages