Running multiple copies of inla in parallel


Hannes Kuehnert

Nov 20, 2024, 3:51:01 AM
to R-inla discussion group
Dear all,

I'm trying to run multiple copies of inla in parallel. For some data sets everything works fine, but for others it does not.
Let me describe what I'm trying to do and then shed some light on the problem; my implementation is shared below.
In short, everything runs up to some point, at which the routine seems to get stuck.

Plan: I have 2^15 variable combinations and I want results for all of these models. For that I create a list with all the possible variable combinations.
I partition the long list into sublists of a desired length (this is done by an outer for loop). In the inner foreach loop I run a suitable number of models in parallel. Note that I have 96 cores available and around 500 GB of memory, so resources should not be the issue. I tried different values of num.threads for inla, but this does not seem to be the problem. Right now it is set to 1, i.e., num.threads = 1, and this does not solve the problem either.
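
A minimal sketch of the setup described above, using base R only. The covariate names and the chunk size are hypothetical placeholders, since the post gives neither. The sketch omits the empty model, so it yields 2^15 - 1 combinations.

```r
# Hypothetical covariate names; the real ones are not given in the post.
covariates <- paste0("x", 1:15)

# All non-empty subsets: for each size k, combn() returns a list of
# character vectors; unlist(recursive = FALSE) flattens one level.
model_list <- unlist(
  lapply(seq_along(covariates),
         function(k) combn(covariates, k, simplify = FALSE)),
  recursive = FALSE
)
length(model_list)  # 32767, i.e. 2^15 - 1

# Partition the long list into sublists of a chosen length
# (chunk_size is an assumed value, not the one used in the post).
chunk_size <- 500
chunk_id <- ceiling(seq_along(model_list) / chunk_size)
chunks <- split(model_list, chunk_id)
```

Splitting by a chunk index rather than by explicit start:end ranges means the last sublist is simply shorter when the total is not a multiple of the chunk size, so no index runs past the end of the list.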

The problem: Everything runs, and there is no issue with CPU or memory usage. Every sublist takes some time, but then suddenly you can tell that something is no longer working, because one sublist takes much longer than the previous ones. Opening the task manager reveals that only one copy of inla is still open and that its memory usage keeps growing. Usually it is around 100-400, but if the process is not ended manually it keeps piling up. Ending the process manually then yields the following message:

 *** inla.core.safe:  inla.program has crashed: rerun to get better initial values. try=1/1

*** inla.core.safe:  rerun with improved initial values

I have the feeling that inla does not finish for some models and is stuck in a loop.

Maybe that is it. I would appreciate a solution within R. Even ending this task automatically would be an option, with an error recorded in my results stating that I should check this model more carefully.

Any help is appreciated

Hannes

Code:

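library(foreach)
library(future)
library(doFuture)
library(doRNG)
library(tictoc)  # provides tic() and toc() used for timing below
doFuture::registerDoFuture()  # run %dopar% on the future backend
doRNG::registerDoRNG()        # reproducible random numbers across workers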

plan(multisession, workers = 32)

for (i in 473:n.dic) {# used from 427
  top = dic.model.list[((i - 1) * split.dic + 1):(i * split.dic)]
  tic()
 
  result = foreach(iter = top, .packages = c("INLA", "sf", "dplyr")) %dopar% {
    tryCatch({
      cat("Starting iteration:", iter, "\n")
      inla_output = inla_cv_bin_MM3(data_train = data_train_z_scaled,
                                    coords_train = coords_cv1,
                                    cov = iter)
      gc()
      list(values = data.frame(CPO = -mean(log(inla_output$fit$cpo$cpo)),
                               N.Failures = sum(inla_output$fit$cpo$failure),
                               WAIC = inla_output$fit$waic$waic,
                               DIC = inla_output$fit$dic$dic,
                               cov = paste0(iter, collapse = ","),
                               N.cov = length(iter),
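                               # count effects whose interval excludes 0 (columns 3 and 5 of
                               # summary.fixed are the lower and upper quantiles, assuming defaults)
                               N.cov.sig = sum(inla_output$fit$summary.fixed[, 3] > 0 |
                                                 inla_output$fit$summary.fixed[, 5] < 0)),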
           fixed = inla_output$fit$summary.fixed,
           hyper = inla_output$fit$summary.hyperpar)
    }, error = function(e) {
      cat("Error in iteration:", iter, "Message:", e$message, "\n")
      NULL
    })
  }
 
  saveRDS(result, paste0("C:/Users/results.", i, ".rds"))
  toc()
  print(i)
}
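
The error handler in the loop above returns NULL, so a failed model leaves no record of which covariate set caused it. Below is a minimal sketch of returning a failure record instead, so that such models can be rechecked later. `fit_one()` and `safe_fit()` are hypothetical names, and `fit_one()` is only a stand-in for the real fitting call, which is not shown in the post. This does not by itself stop a hung inla process; a time limit on the fit would still be needed for that.

```r
# Hypothetical stand-in for the real model fit; it fails for any
# covariate set containing "x3" to simulate a crash.
fit_one <- function(cov) {
  if ("x3" %in% cov) stop("simulated crash")
  list(values = data.frame(cov = paste0(cov, collapse = ","),
                           N.cov = length(cov),
                           stringsAsFactors = FALSE))
}

# Wrap the fit so that an error yields a labelled record, not NULL.
safe_fit <- function(cov) {
  tryCatch(
    c(fit_one(cov), list(status = "ok", message = NA_character_)),
    error = function(e) {
      list(values = data.frame(cov = paste0(cov, collapse = ","),
                               N.cov = length(cov),
                               stringsAsFactors = FALSE),
           status = "failed",
           message = conditionMessage(e))
    }
  )
}

results <- lapply(list(c("x1", "x2"), c("x1", "x3")), safe_fit)

# Covariate sets that need a closer look
failed <- Filter(function(r) r$status == "failed", results)
vapply(failed, function(r) r$values$cov, character(1))
```

Because every element of the result has the same fields, the saved .rds files can be filtered on `status` afterwards without special handling for NULL entries.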


INLA help

Nov 20, 2024, 4:08:55 AM
to R-inla discussion group, Hannes Kuehnert
Can you share it with help@… so I can rerun it here?

Haavard Rue