Optimal hardware for large INLA analysis


Colin Beale

Jun 23, 2017, 5:22:26 AM
to R-inla discussion group
Hi everyone,

I've a spatial besag model I'm running on a largish dataset (120k rows). On my desktop I can run the model with a poisson family in about 24 hrs, not a problem. I'm now trying exactly the same model but with a zeroinflatedpoisson0 family, and it has so far run for 2 weeks with no sign of completion... I have access to a range of high-performance research computing platforms, but I'm not really sure what the optimal setup would be, because I'm not quite sure why it is so much slower. On my desktop the inla process is using 10.5 GB of memory. That's not all that I have available (16 GB), but it might be close to some working limit; I have access to significantly more on the HPC platform. Does anyone have any insight into whether allocating more memory might reduce the processing time? If it's unlikely to, there's little point moving: jobs on the HPC are killed after 2 weeks, and I'm already running at that level locally. The alternative option would be to run on a GPU, but that might involve a slightly trickier setup. So I'm wondering if anyone has any useful insights? Even understanding why the zeroinflatedpoisson0 family takes so much longer to fit seems worthwhile!

Thanks,
Colin

Finn Lindgren

Jun 23, 2017, 5:58:41 AM
to Colin Beale, R-inla discussion group
Did you run with verbose=TRUE, so you could see whether each step in the optimisation was slow and/or whether it took many steps?
(Run inside a "screen" terminal so you don't have to keep a session open the whole time.)

Finn
--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To post to this group, send email to r-inla-disc...@googlegroups.com.
Visit this group at https://groups.google.com/group/r-inla-discussion-group.
For more options, visit https://groups.google.com/d/optout.

Haakon Bakka

Jun 23, 2017, 7:09:27 AM
to Finn Lindgren, Colin Beale, R-inla discussion group
For why zeroinflated takes longer: it might be because you have an additional hyper-parameter, the zero-inflation probability. To check this, fix that parameter to some value and compare that run time to the poisson run time.
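As a minimal sketch of that comparison (assuming the formula and data names from Colin's model later in the thread; for zeroinflatedpoisson0 the zero-probability is parameterised internally as p = exp(theta)/(1 + exp(theta)), so initial = 0 fixes p at 0.5):

```r
library(INLA)

## Fix the zero-inflation probability so the Besag precision is the only
## free hyperparameter; compare this run time to the plain Poisson run.
## "formula" and "all.data" stand in for the model earlier in the thread.
fit.fixed <- inla(formula, data = all.data,
                  family = "zeroinflatedpoisson0",
                  control.family = list(hyper = list(
                    theta = list(initial = 0, fixed = TRUE))))
```

If this run is comparably fast to the Poisson fit, the extra hyperparameter is the culprit.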

In general: You want to use good starting values for the optimizer, for example from the poisson result or running on a reduced dataset.

For your specific likelihood, zeroinflatedpoisson0, I think you can remove all the zeros from your dataset, and then just do a separate "ad-hoc" estimate of the zero-probability. 
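A rough sketch of that split (assuming the response column "snare" from the model later in the thread). Under the type-0 model the zeros come only from the zero-inflation component, so the zero-probability can be estimated directly from the zero fraction:

```r
## Ad-hoc estimate of the zero-probability, then fit only the positives.
p.hat    <- mean(all.data$snare == 0)        # fraction of zeros
pos.data <- all.data[all.data$snare > 0, ]   # positive counts only

## Note: fitting the positives with family = "poisson" ignores the
## zero-truncation of the type-0 count component, so treat the result
## as an approximation.
```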

Kind regards,
Haakon Bakka



Colin Beale

Jun 23, 2017, 8:50:00 AM
to R-inla discussion group, colin...@york.ac.uk
Thanks for this suggestion. I've run it for a couple of hours with verbose = TRUE, and I'm seeing the sort of output below. It looks like each iteration isn't too slow, but (a) there are lots of them and (b) it is tripping over something from time to time, which probably slows it down further. It may well be, as Haakon comments, that providing intelligent starting values would solve it, but I'm not entirely certain how to do that... My model looks like this:

interp.modSnr.final <- inla(snare ~ offset(log(trans.length)) + tc.s + tc.s2 +
                              villages.s + sampleyear + ranger.zone + rivers.s + rangers.s +
                              f(ID, model = "besag", graph = "adj.txt",
                                hyper = list(prec = list(prior = "loggamma",
                                                         param = c(0.1, 1), initial = 0.01))),
                            data = all.data, family = "zeroinflatedpoisson0",
                            control.predictor = list(compute = TRUE, link = "log"),
                            control.compute = list(dic = TRUE, cpo = TRUE), verbose = TRUE)




max.logdens= -184568.024812 fn= 5 theta= -0.000012 0.005021  range=[-0.107 4.367]

file: smtp-taucs.c  hgid: 7e45fc2b7a2f  date: Tue Jan 31 09:14:12 2017 +0300
Function: GMRFLib_factorise_sparse_matrix_TAUCS(), Line: 829, Thread: 1
Fail to factorize Q. I will try to fix it...


file: smtp-taucs.c  hgid: 7e45fc2b7a2f  date: Tue Jan 31 09:14:12 2017 +0300
Function: GMRFLib_factorise_sparse_matrix_TAUCS(), Line: 829, Thread: 1
Fail to factorize Q. I will try to fix it...


file: smtp-taucs.c  hgid: 7e45fc2b7a2f  date: Tue Jan 31 09:14:12 2017 +0300
Function: GMRFLib_factorise_sparse_matrix_TAUCS(), Line: 829, Thread: 1
Fail to factorize Q. I will try to fix it...

max.logdens= -184567.968301 fn= 7 theta= -0.000012 -0.004979  range=[-0.107 4.370]
max.logdens= -184557.584171 fn= 8 theta= 0.009988 0.005021  range=[-0.107 4.367]

file: smtp-taucs.c  hgid: 7e45fc2b7a2f  date: Tue Jan 31 09:14:12 2017 +0300
Function: GMRFLib_factorise_sparse_matrix_TAUCS(), Line: 829, Thread: 0
Fail to factorize Q. I will try to fix it...


file: smtp-taucs.c  hgid: 7e45fc2b7a2f  date: Tue Jan 31 09:14:12 2017 +0300
Function: GMRFLib_factorise_sparse_matrix_TAUCS(), Line: 829, Thread: 2
Fail to factorize Q. I will try to fix it...

max.logdens= -183577.978482 fn= 10 theta= 1.527780 -0.002586  range=[-0.107 4.369]

file: smtp-taucs.c  hgid: 7e45fc2b7a2f  date: Tue Jan 31 09:14:12 2017 +0300
Function: GMRFLib_factorise_sparse_matrix_TAUCS(), Line: 829, Thread: 0
Fail to factorize Q. I will try to fix it...


file: smtp-taucs.c  hgid: 7e45fc2b7a2f  date: Tue Jan 31 09:14:12 2017 +0300
Function: GMRFLib_factorise_sparse_matrix_TAUCS(), Line: 829, Thread: 2
Fail to factorize Q. I will try to fix it...


file: smtp-taucs.c  hgid: 7e45fc2b7a2f  date: Tue Jan 31 09:14:12 2017 +0300
Function: GMRFLib_factorise_sparse_matrix_TAUCS(), Line: 829, Thread: 2
Fail to factorize Q. I will try to fix it...

max.logdens= -183574.859875 fn= 11 theta= 1.537780 -0.002586  range=[-0.107 4.369]


Finn Lindgren

Jun 23, 2017, 9:01:17 AM
to Colin Beale, R-inla discussion group
Yes, that looks like it would be helped by improved starting values.
Unfortunately I haven't worked with ZIP models, so I'm not sure how to do that either.

Finn

Haakon Bakka

Jun 23, 2017, 9:49:44 AM
to Finn Lindgren, Colin Beale, R-inla discussion group
Setting the starting values is slightly complicated, as you must figure out which internal hyperparameter corresponds to which "interpretable" hyperparameter.

You can read from the logfile which hyperparameter is which, and from the documentation how to transform them. I would guess the first hyper is the zero-probability. Internally, it uses a logit transform, see http://www.math.ntnu.no/inla/r-inla.org/doc/likelihood/zeroinflated.pdf.

The code for doing this can be found in many scripts on spatial models; look for the line "M[[1]]$init" and the "inla(" function call.

PS. Note also the use of "control.inla= list(int.strategy = "eb")" to speed up the last stage of the computation. This is an approximation.
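Putting these two suggestions together, a sketch under the assumptions above (that the first internal theta is the logit of the zero-probability and the second is the log Besag precision — check the order against the verbose log and the documentation before relying on it):

```r
## Starting values on the internal scale, plus the "eb" speed-up.
p0 <- mean(all.data$snare == 0)          # crude guess at the zero-probability
theta.start <- c(log(p0 / (1 - p0)),     # logit(p), the likelihood hyper
                 4)                      # log precision, e.g. taken from a poisson fit

fit <- inla(formula, data = all.data, family = "zeroinflatedpoisson0",
            control.mode = list(theta = theta.start, restart = TRUE),
            control.inla = list(int.strategy = "eb"))   # empirical Bayes integration
```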

Kind regards,
Haakon Bakka



INLA help

Jun 23, 2017, 8:20:18 PM
to Colin Beale, R-inla discussion group
On Fri, 2017-06-23 at 05:50 -0700, Colin Beale wrote:
>
> file: smtp-taucs.c  hgid: 7e45fc2b7a2f  date: Tue Jan 31
> 09:14:12 2017 +0300
> Function: GMRFLib_factorise_sparse_matrix_TAUCS(), Line: 829,
> Thread: 1
> Fail to factorize Q. I will try to fix it...
>

yes, a steady flow of these will take time... (it may also be that there is a deliberate memory leak each time this happens, as this should not happen).

the precision matrix is numerically singular, meaning that something in your model makes it so. try removing the besag model. if that runs fine (ie, these warnings do not appear), we know it's something with that one. assuming it is, then add scale.model=TRUE, and a proper prior for the intercept, like control.fixed=list(prec.intercept=1) or something, and let us know how this goes
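A sketch of both suggested changes together, reusing the covariates from Colin's model earlier in the thread:

```r
## Scaled Besag model plus a proper intercept prior, as suggested above.
formula <- snare ~ offset(log(trans.length)) + tc.s + tc.s2 +
  villages.s + sampleyear + ranger.zone + rivers.s + rangers.s +
  f(ID, model = "besag", graph = "adj.txt", scale.model = TRUE,
    hyper = list(prec = list(prior = "loggamma", param = c(0.1, 1))))

fit <- inla(formula, data = all.data, family = "zeroinflatedpoisson0",
            control.fixed = list(prec.intercept = 1))   # proper intercept prior
```

Note that scale.model = TRUE rescales the Besag structure matrix, so the loggamma prior parameters may need revisiting.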

Best
H


--
Håvard Rue
he...@r-inla.org

Colin Beale

Jul 7, 2017, 11:38:59 AM
to R-inla discussion group, colin...@york.ac.uk, he...@r-inla.org
So, I've been (slowly) following up on these suggestions. I managed to kill the singular precision matrix OK - thanks for pointing that out. I'm still struggling with implementing appropriate initial values, but am having a bit more luck with the manual construction of the likelihood. Once I've solved everything I will report back!

Colin Beale

Jul 20, 2017, 6:59:44 AM
to R-inla discussion group, colin...@york.ac.uk, he...@r-inla.org
Just to report back. Running the single large model on my 8-core, 16 GB desktop took nearly 2 weeks. Running a non-optimised INLA on a 24-core, 128 GB server, using all cores and 101 GB of memory, did the job in 23 hrs. By splitting the model into a Poisson model and a binomial model and then combining the outputs I could run it on my desktop in 4 days - with the disadvantage that 95% confidence intervals don't combine (obviously the joint probability of being at the 95% upper end of the Poisson model AND the 95% upper end of the binomial model is not the 95% quantile of the combined model), which complicates things. In this example (and as expected), the single zip model and the two models combined were more or less equivalent.
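The decomposition described above can be sketched roughly as follows (variable names assumed from the model earlier in the thread):

```r
## Binomial model for the zero/positive split, Poisson for the positives.
all.data$any.snare <- as.numeric(all.data$snare > 0)

fit.bin  <- inla(update(formula, any.snare ~ .), data = all.data,
                 family = "binomial")
fit.pois <- inla(formula, data = all.data[all.data$snare > 0, ],
                 family = "poisson")

## Point predictions combine as P(positive) * E[count | positive], but -
## as noted above - the 95% quantiles of the two fits do not combine
## componentwise.
```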

Thanks for assistance - notching this up as a success!

Colin

PS In case anyone keeps note, a couple of recent papers using INLA from my colleagues: