Failure when using doParallel and INLA on AWS

lisa C

Sep 27, 2018, 6:29:26 AM

to R-inla discussion group

Dear all,

I am trying to calculate posterior predictions for a very large data set. Below are a few technical questions I ran into, and I hope the experts here can help me clarify my thinking.

To speed up the computation, I used parallel computing from the doParallel package. Here is an outline of the code:

# 1. First I split my data into 10 subsets (mainly because of the high memory cost), and for each subset I run the parallel computation below:

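library(doParallel)
library(INLA)

ncores <- 60
#cl <- makeCluster(ncores, type="PSOCK")
# given a number instead of a cluster object, registerDoParallel() uses
# forked (multicore) workers on Linux, so objects such as myfit are visible in them
registerDoParallel(ncores)

foreach(i = 1:ncores, .combine = rbind) %dopar% {
  # 2.1 take 1/60 of the subset
  ......
  # 2.2 simulate from the posterior distribution
  N_sim <- 1000
  set.seed(320)
  sim <- inla.posterior.sample(n = N_sim, result = myfit)
  # 2.3 use the simulated parameters to simulate the posterior predictive distribution
  # 2.4 write to result
  res <- lapply(sim, FUN = myfun, n = n, n1 = n1, n_w = n_w, Xm = Xm, Am = Am)
  res <- do.call(cbind, res)
  # 2.5 save result
  res  # returned to foreach and combined across workers with rbind
}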
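The splitting in step 2.1 (each of the `ncores` workers taking its share of a subset, with results stacked by `rbind`) can be sketched in base R alone. This is a minimal sketch, not the code above: `predict_chunk()` is a hypothetical stand-in for steps 2.2 to 2.4, and `parallel::mclapply` (fork-based, so Unix-alikes only) is used instead of foreach just to keep the example dependency-free.

```r
library(parallel)

# Split row indices 1..n into n_chunks contiguous blocks of (nearly) equal size.
split_rows <- function(n, n_chunks) {
  split(seq_len(n), cut(seq_len(n), breaks = n_chunks, labels = FALSE))
}

# Hypothetical stand-in for steps 2.2-2.4: one row per prediction location,
# one column per posterior sample.
predict_chunk <- function(rows, n_sim = 5) {
  matrix(rows, nrow = length(rows), ncol = n_sim)
}

run_in_chunks <- function(n, n_workers) {
  chunks <- split_rows(n, n_workers)
  # mclapply forks worker processes and returns results in input order.
  parts <- mclapply(chunks, predict_chunk, mc.cores = n_workers)
  # Same role as .combine = rbind in the foreach call.
  do.call(rbind, parts)
}

res <- run_in_chunks(n = 1000, n_workers = 4)
dim(res)  # 1000 rows, 5 columns
```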

My questions:

1. First, I ran the script on our institute's large server (300 GB memory, 70 cores). For a Gaussian likelihood model (myfit), I was able to run the code above with 10 subsets, 60 cores per subset and lapply. However, with a binomial likelihood model (myfit), the memory usage becomes extremely large (why?). So I had to replace lapply with a for loop that deletes the unused A.prediction part in each iteration, and split the data into even smaller parts run on fewer cores (20 subsets, 10 cores per subset, for loop). Does anyone have experience with this kind of technical issue? Any suggestions or comments?

2. Afterwards, I tried to run my scripts on an AWS machine (380 GB memory, 96 cores), which runs a newer Linux version than our in-house machine. However, some Linux components were missing, so I had to use INLA:::inla.dynload.workaround() to get it working. Unfortunately, the behaviour is not consistent. Some runs produced results but seemed slower than on our in-house machine; others failed with the error below, which seems to come mainly from fmesher. The same script runs without error on our in-house machine. What could be the problem: INLA, doParallel, or the incomplete Linux installation on AWS?

Message : Condition `1==0' is not TRUE
Function : GMRFLib_write_fmesher_file
File : fmesher-io.c
Function : GMRFLib_write_fmesher_file
File : fmesher-io.c
Line : 466
File : fmesher-io.c
Line : 466
File : fmesher-io.c
Line : 466
RCSId : file: fmesher-io.c hgid: b84cb08f11c9 date: Thu Jul 12 13:38:25 2018 +0300

GMRFLib version 3.0-0-snapshot, has recived error no [23]
Reason : Misc error
Message : Condition `1==0' is not TRUE
Function : GMRFLib_write_fmesher_file
File : fmesher-io.c
Line : 466
RCSId : file: fmesher-io.c hgid: b84cb08f11c9 date: Thu Jul 12 13:38:25 2018 +0300
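Because the error above only shows up in some runs, one stopgap while the cause is tracked down is to retry a failed chunk rather than lose the whole foreach job. This is a minimal base-R sketch, assuming the failure surfaces in R as an ordinary error condition. `with_retry()` is a hypothetical helper and does nothing about the underlying problem. foreach's own `.errorhandling = "pass"` argument is another way to keep going and collect the errors instead.

```r
# Call f(); if it signals an error, try again, up to `tries` attempts in total.
with_retry <- function(f, tries = 3, wait = 0) {
  for (attempt in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message(sprintf("attempt %d failed: %s", attempt, conditionMessage(result)))
    if (wait > 0) Sys.sleep(wait)
  }
  stop(sprintf("all %d attempts failed; last error: %s",
               tries, conditionMessage(result)))
}

# Usage: a flaky function that fails on its first two calls.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated transient failure")
  "ok"
}

out <- with_retry(flaky, tries = 5)
```

Inside the foreach body this could wrap the sampling step, e.g. `sim <- with_retry(function() inla.posterior.sample(n = N_sim, result = myfit))`.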

Thanks very much

Chun
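On question 1: whatever the likelihood, one generic reason the `lapply` + `do.call(cbind, ...)` step (2.4) is memory-hungry is that the list of per-sample results and the combined matrix are both in memory when `do.call()` returns. Peak usage is therefore roughly twice the size of the final result. Filling a preallocated matrix in a loop avoids that copy. This is a minimal sketch, assuming a hypothetical `sample_fun()` standing in for `myfun`. It does not explain why the binomial model needs more memory than the Gaussian one.

```r
# Hypothetical stand-in for `myfun`: one value per prediction location
# for posterior sample number s.
sample_fun <- function(s, n_pred) {
  rep(as.numeric(s), n_pred)
}

# Pattern from step 2.4: the list `pieces` and the cbind result
# coexist in memory when do.call() returns.
via_lapply <- function(n_sim, n_pred) {
  pieces <- lapply(seq_len(n_sim), sample_fun, n_pred = n_pred)
  do.call(cbind, pieces)
}

# Alternative: allocate the result once and fill it column by column,
# so no list of per-sample vectors is kept around.
via_prealloc <- function(n_sim, n_pred) {
  out <- matrix(NA_real_, nrow = n_pred, ncol = n_sim)
  for (s in seq_len(n_sim)) {
    out[, s] <- sample_fun(s, n_pred)
  }
  out
}

a <- via_lapply(n_sim = 100, n_pred = 50)
b <- via_prealloc(n_sim = 100, n_pred = 50)
```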

Helpdesk

Sep 27, 2018, 7:42:21 AM

to lisa C, R-inla discussion group

Hi & thanks for your email.

Some issues:

- Since you're already running in parallel at a higher level, I would
control num.threads within each run, like

library(INLA)
inla.setOption(num.threads=1)

If you run 60 parallel jobs, then each one should run serially, for
example.

- The fmesher issue: it can be a problem with the temporary directory,
which is more or less system-controlled. I would try creating and using your own, like

$ mkdir -p ~/tmp/inla.tmp

Then, in R:

inla.setOption(working.directory=normalizePath("~/tmp/inla.tmp"))
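Both points above can be combined into a small per-worker setup. The arithmetic behind the first point: with 60 workers on either the 70-core or the 96-core machine, there is room for only one thread per worker. This is a sketch with hypothetical helpers `threads_per_worker()` and `setup_worker()`. Giving each worker its own subdirectory of `~/tmp/inla.tmp` is an assumption that goes beyond the reply. The INLA call only runs if INLA is installed.

```r
# Threads each worker can use without running more threads than cores.
threads_per_worker <- function(total_cores, n_workers) {
  max(1L, as.integer(total_cores %/% n_workers))
}

# Per-worker setup, meant for the top of the foreach body.
# Assumption: each worker gets its own subdirectory of the directory created
# with mkdir above, so concurrent workers do not share one directory.
setup_worker <- function(i, n_threads, base = "~/tmp/inla.tmp") {
  wd <- file.path(base, sprintf("worker-%03d", i))
  dir.create(wd, recursive = TRUE, showWarnings = FALSE)
  wd <- normalizePath(wd)
  # Only touches INLA when it is installed in the worker's R library.
  if (requireNamespace("INLA", quietly = TRUE)) {
    INLA::inla.setOption(num.threads = n_threads, working.directory = wd)
  }
  invisible(wd)
}

threads_per_worker(70, 60)  # in-house machine, 60 workers: 1
threads_per_worker(96, 60)  # AWS machine, 60 workers: 1
threads_per_worker(96, 10)  # 10 workers leave room for 9 threads each
```

Inside the foreach body, this would be something like `setup_worker(i, threads_per_worker(parallel::detectCores(), ncores))`.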

If you have issues with the built-in version, there are other Linux
builds at

http://inla.r-inla-download.org/Linux-builds/

I would guess the CentOS ones would be fine.

Let us know.

Best
H

The PARDISO library would probably also offer some speed-up; see

inla.pardiso()

and then use the version linked with MKL

inla.setOption(inla.call=paste0(dirname(INLA:::inla.call.builtin()),
"/inla.mkl.run"))

You would have to try it and see whether it helps.
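The MKL-linked binary is not necessarily present in every build. A cautious variant of the snippet above checks for the file first and otherwise keeps the built-in binary. This is a sketch with a hypothetical helper `pick_binary()`. The INLA lines, shown as comments, follow the reply.

```r
# Return `candidate` if that file exists, otherwise `fallback`.
pick_binary <- function(candidate, fallback) {
  if (file.exists(candidate)) candidate else fallback
}

# With INLA loaded, following the reply above:
#   builtin <- INLA:::inla.call.builtin()
#   mkl <- file.path(dirname(builtin), "inla.mkl.run")
#   inla.setOption(inla.call = pick_binary(mkl, builtin))

# Standalone check with temporary files:
existing <- tempfile()
invisible(file.create(existing))
missing_path <- tempfile()
pick_binary(existing, "default")      # the path stored in `existing`
pick_binary(missing_path, "default")  # "default"
```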

--
Håvard Rue
Helpdesk
he...@r-inla.org