Failure when using doParallel and INLA on AWS


lisa C

Sep 27, 2018, 6:29:26 AM
to R-inla discussion group
Dear all,

I am trying to compute posterior predictions for a very large data set. I ran into a few technical questions and hope the experts here can help me clarify my thinking.

To speed up the computation, I use parallel computing with the doParallel package. Here is an outline of the code:


# 1. First I separated my data into 10 subsets (mainly because of the high memory cost); for each subset I run the computation in parallel, as below:

library(doParallel)
ncores <- 60 
#cl     <- makeCluster(ncores, type="PSOCK")
registerDoParallel(ncores)


foreach(i = 1:ncores, .combine = rbind) %dopar% {

  # 2.1 take 1/60 of the subset

  ......
  # 2.2 simulate from the posterior distribution

  N_sim <- 1000
  set.seed(320)
  sim <- inla.posterior.sample(n = N_sim, result = myfit)
  # 2.3 use the simulated parameters to simulate the posterior prediction
  # 2.4 write to result
  res <- lapply(sim, FUN = myfun, n = n, n1 = n1, n_w = n_w, Xm = Xm, Am = Am)
  res <- do.call(cbind, res)

  # 2.5 save result
}
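
Steps 2.3 and 2.4 could also be written as a loop that fills a preallocated matrix one sample at a time, instead of building a list and cbind-ing it. The sketch below only illustrates that pattern: sim and myfun are toy stand-ins (hypothetical), not the real posterior samples or prediction function, and whether dropping each sample after use lowers peak memory in the real job is an assumption that would need to be measured.

```r
# Toy stand-ins (hypothetical): 'sim' mimics a list of posterior samples,
# 'myfun' maps one sample to a vector of n predictions.
n <- 5
sim <- lapply(1:3, function(k) list(latent = rep(k, n)))
myfun <- function(s, n) s$latent[seq_len(n)] * 2

# Preallocate the result once: one column per posterior sample.
res <- matrix(NA_real_, nrow = n, ncol = length(sim))
for (j in seq_along(sim)) {
  res[, j] <- myfun(sim[[j]], n = n)
  # Drop the sample that is no longer needed; the list keeps its length.
  sim[j] <- list(NULL)
}
```

The result has the same shape as do.call(cbind, lapply(...)), so the rest of the worker body would not need to change.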

my questions:

1. First, I ran the script on our institute's large machine (300 GB memory, 70 cores). For a Gaussian likelihood model (myfit), the code above ran with 10 subsets, 60 cores per subset, and the lapply call. However, when I use a binomial likelihood model (myfit), the memory usage becomes extremely large (why?). I had to turn the lapply into a for loop that deletes the unused part of A.prediction in each iteration, and split the data into smaller pieces run on fewer cores (20 subsets, 10 cores per subset, for loop). Does anyone have experience with such technical issues? Any suggestions or comments?

2. Afterwards, I tried to run my scripts on an AWS machine (380 GB memory, 96 cores), which has a newer Linux version than our in-house machine. Some Linux components are missing there, so I have to use INLA:::inla.dynload.workaround() to get INLA working. Unfortunately, the performance is not consistent. Some runs finish, but seem slower than on our in-house machine. Other runs fail with the error below, which seems to come mainly from fmesher. The same script runs without error on our in-house machine. What could be the problem: INLA, doParallel, or the incomplete Linux installation on AWS?


 Message   : Condition `1==0' is not TRUE
        Message   : Condition `1==0' is not TRUE
        Message   : Condition `1==0' is not TRUE
        Message   : Condition `1==0' is not TRUE
        Message   : Condition `1==0' is not TRUE
        Message   : Condition `1==0' is not TRUE
        Function  : GMRFLib_write_fmesher_file
        File      : fmesher-io.c
        File      : fmesher-io.c
        Function  : GMRFLib_write_fmesher_file
        Function  : GMRFLib_write_fmesher_file
        Function  : GMRFLib_write_fmesher_file
        Function  : GMRFLib_write_fmesher_file
        Function  : GMRFLib_write_fmesher_file
        Function  : GMRFLib_write_fmesher_file
        File      : fmesher-io.c
        Line      : 466
        Line      : 466
        File      : fmesher-io.c
        File      : fmesher-io.c
        Line      : 466
        File      : fmesher-io.c
        File      : fmesher-io.c
        File      : fmesher-io.c
        File      : fmesher-io.c
        Line      : 466
        RCSId     : file: fmesher-io.c  hgid: b84cb08f11c9  date: Thu Jul 12 13:38:25 2018 +0300

GMRFLib version 3.0-0-snapshot, has recived error no [23]
        Reason    : Misc error
        Message   : Condition `1==0' is not TRUE
        Function  : GMRFLib_write_fmesher_file
        File      : fmesher-io.c
        Line      : 466
        RCSId     : file: fmesher-io.c  hgid: b84cb08f11c9  date: Thu Jul 12 13:38:25 2018 +0300



Thanks very much

Chun

Helpdesk

Sep 27, 2018, 7:42:21 AM
to lisa C, R-inla discussion group
Hi & thanks for your email.

some issues

- since you're running in parallel on a higher level, I would control
the num.threads within each run, like

library(INLA)
inla.setOption(num.threads=1)

If you run 60 processes in parallel, for example, then each one
should run serially.
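
A rough way to pick the value is to divide the available cores by the number of outer workers. The sketch below is only an illustration; threads_per_worker is a hypothetical helper, not part of INLA, and the INLA call is skipped when the package is not installed.

```r
# Hypothetical helper: threads each worker may use so that
# n_workers * threads never exceeds the available cores.
threads_per_worker <- function(n_workers, n_cores = parallel::detectCores()) {
  max(1L, as.integer(floor(n_cores / n_workers)))
}

threads_per_worker(60, 96)  # 1: with 60 workers on 96 cores, run each serially
threads_per_worker(10, 96)  # 9

# Call this at the top of each worker's body (only if INLA is installed).
if (requireNamespace("INLA", quietly = TRUE)) {
  INLA::inla.setOption(num.threads = threads_per_worker(60, 96))
}
```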

- The fmesher issue. It can be an issue with the temp directory, which is
kind of system-controlled. I would try to create and use your own, like

$ mkdir -p ~/tmp/inla.tmp

then in R

inla.setOption(working.directory=normalizePath("~/tmp/inla.tmp"))
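
The same two steps can be done from inside the R script, which may be convenient on a fresh AWS instance. This is a minimal sketch: make_inla_workdir is a hypothetical helper, and the writability check is an added assumption about what might go wrong, not a confirmed cause of the fmesher error.

```r
# Hypothetical helper: create the directory if needed, check that it is
# writable, and return its normalized path.
make_inla_workdir <- function(path = "~/tmp/inla.tmp") {
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  path <- normalizePath(path, mustWork = TRUE)
  # file.access() returns 0 when the requested access (2 = write) is allowed.
  if (file.access(path, mode = 2) != 0) {
    stop("directory is not writable: ", path)
  }
  path
}

# Only touch INLA (and the home directory) if the package is installed.
if (requireNamespace("INLA", quietly = TRUE)) {
  INLA::inla.setOption(working.directory = make_inla_workdir())
}
```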


If you have issues with the built-in version, there are other Linux
builds at

http://inla.r-inla-download.org/Linux-builds/

I would guess the CentOS ones would be fine.

Let us know.

Best
H

The PARDISO library would probably also offer some speed-up, see

inla.pardiso()

and then use the version linked with MKL

inla.setOption(inla.call=paste0(dirname(INLA:::inla.call.builtin()),
"/inla.mkl.run"))

You would have to try this out.
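
Since the MKL binary may not be shipped with every build, it can help to check that the file exists before switching to it. A minimal sketch, where pick_inla_binary is a hypothetical helper and the file name inla.mkl.run is taken from the call above:

```r
# Hypothetical helper: prefer the MKL-linked binary that sits next to the
# built-in one, and fall back to the built-in binary if it is missing.
pick_inla_binary <- function(builtin) {
  mkl <- file.path(dirname(builtin), "inla.mkl.run")
  if (file.exists(mkl)) mkl else builtin
}

# Only switch binaries if INLA is installed.
if (requireNamespace("INLA", quietly = TRUE)) {
  INLA::inla.setOption(inla.call = pick_inla_binary(INLA:::inla.call.builtin()))
}
```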

--
Håvard Rue
Helpdesk
he...@r-inla.org
