Using wolfAlps in parallel with foreach on HPC


claram...@googlemail.com

Aug 22, 2019, 10:39:59 AM8/22/19
to SpaDES Users
Hi everyone,
I am now trying to use foreach to compute multiple repetitions of the wolfAlps simulation in parallel on an HPC cluster. This combination does not work at all, even though the same thing works perfectly when I run it locally. wolfAlps without foreach works on the HPC, and foreach on the HPC also works as long as I leave out the wolfAlps initialization. But if I do the wolfAlps initialization and then even a very simple foreach with just a print command, it does not work. Any ideas why that could be? I am completely confused..
Thanks, Clara

Alex Chubaty

Aug 22, 2019, 11:04:38 AM8/22/19
to SpaDES Users
Hi Clara,

You probably don't need to do the simInit a bunch of times - do it once, and then run the simulations using foreach, ensuring each one gets a unique random seed.
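A minimal sketch of this pattern (untested here; `sim`, the worker count, and the `simInit()` arguments are placeholders to be filled in with your own setup):

```r
library(SpaDES)
library(doParallel)
library(foreach)

sim <- simInit(...)                 # initialize ONCE, outside the loop

registerDoParallel(cores = 3)
results <- foreach(i = 1:3, .packages = "SpaDES") %dopar% {
  set.seed(i)                       # unique random seed per replicate
  spades(Copy(sim))                 # Copy() so workers don't mutate one simList
}
```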

Alex

Louis-Etienne Robert

Aug 22, 2019, 11:34:48 AM8/22/19
to SpaDES Users
Hello Clara,

Alex's comment is spot on.

My experience with parallel computing is limited, but one possible line of inquiry is how you are handling the .combine argument of foreach. Without seeing your code or your error message, it is hard to say what the problem could be.
There is some really good conceptual discussion here:

Alex Chubaty

Aug 22, 2019, 12:06:25 PM8/22/19
to spades...@googlegroups.com
> I think I do it only once - the simInit is before the foreach, of course. Or what did you mean? Sorry if what I wrote is a bit confusing - I described several things that I tested individually.

You said "Using foreach on hpc works, too, when I leave out wolfalps initialization", which suggests you are referring to the simInit() call for initialization. Can you please clarify? It would greatly help if you could make your code available. It doesn't need to be the full code, just enough to reproduce the problem you're having (see https://stackoverflow.com/q/5963269/1380598) so we can see how you're calling simInit(), foreach, and spades().

Also, please include specific error messages or other descriptions of what you mean when you say "it's not working".

Parallel computing requires that you make copies of all objects/functions available to the additional workers. Depending on the cluster type (FORK or SOCK) you may need to explicitly load packages and export objects to the cluster. Note that you will likely need to export a Copy() of your simList object after creating it using simInit(), because you don't want each worker modifying the same master object (simList objects use pass-by-reference). Note that the experiment() function handles all of this stuff for you. If you are simply using HPC for replication, experiment() can do that. See the examples in ?experiment.
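For a SOCK-type cluster, the setup could look something like the sketch below (untested; it assumes `wolfModuleStart` is the simList returned by simInit(), and uses base `parallel` rather than foreach to make the export steps explicit):

```r
library(parallel)

cl <- makeCluster(3, type = "PSOCK")     # SOCK workers start as empty R sessions
clusterEvalQ(cl, library(SpaDES))        # load required packages on every worker
clusterExport(cl, "wolfModuleStart")     # ship the simList to every worker

# each worker runs its own replicate on a Copy() of the simList,
# so no worker modifies the shared master object
out <- parLapply(cl, 1:3, function(i) {
  set.seed(i)
  spades(Copy(wolfModuleStart))
})

stopCluster(cl)
```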

Thanks,
Alex

claram...@googlemail.com

Aug 22, 2019, 12:45:00 PM8/22/19
to SpaDES Users
Sorry for the late reply, I am on a train and have very spotty reception. I wrote this a while ago, but it didn't show up:

####
Hi,
Thank you! I don't specify anything for combine, my code looks something like

# load packages
# set paths, module, etc.
# simInit() for wolfAlps ...
getDoParWorkers()
registerDoParallel(...)
print("hi")
foreach(i = 1:3) %dopar% print("hello")

I will post more details when I am home. As written above it works, but if I remove the # before simInit it only gets as far as the "hi". Unfortunately there is no error message; it just runs until the timeout. I don't really care how things are combined, only that it runs somehow.
Thanks a lot for your fast answers :)

####

I decided against experiment() because once this works I also want to change parameters. But if I can't get foreach to work I will probably come back to it, so thank you for the reminder.

When I tried to actually run the simulation I did include the packages in the %dopar% call; if I hadn't, it would not have worked locally.

Could it be some kind of R version clash? I am only guessing, because I don't get any error messages; it just reaches the foreach command and stays there forever without executing anything inside it.

I will send you the exact code I used as soon as possible, so maybe don't bother guessing if the above is too confusing.

Ciao,
Clara

claram...@googlemail.com

Aug 26, 2019, 6:09:12 AM8/26/19
to SpaDES Users
Hi again,
I couldn't get to my files these last days, but now I am on it again. So finally I can show you my code:

rm(list = ls(all.names = TRUE))

.libPaths("/home/mypath")  # set the library path *before* loading packages

library(devtools)
library(SpaDES)
library(NetLogoR)
library(doParallel)
library(tictoc)
library(plyr)
library(data.table)



#tic("start initialization")

moduleName <- "wolfAlps"


workingDirectory <- file.path("/home/mypath/wolves_firstpackfuerPALMA", "wolfAlps")


times <- list(start = 0, end = 20, timeunit = "year") #start: 2000.

# These are the default parameters, none is being changed at the moment.
wolfiparameters <- list(
  # .plotInitialTime = start(sim),
  # .plotInterval = 1,
  # .saveInitialTime = start(sim),
  # .saveInterval = 1,
  StartingNumberOfWolves = 2+4+2, #34.0+60+52, #Alps = 61, Germany = 34
  MeanNpups = 3.387, #4.314
  SdNpups = 1.210, #2.069
  AdultMortalityRate  = 0.18, JuvMortalityRate = 0.449, DispMortRatePerMove = 0.0353, #0.0353.
  MeanPackSize = 4.405, SdPackSize = 1.251,
  EndDispersal = 0.98, #0.98
  CellWidth = 1.00, MoveStep = 10.929 * 1.25, #Alps = 1.25, Germany = 1.00
  sigma = 21.802,
  MeanPixelQuality = 0.84, MinPixelQuality = 0.376, MinPackQuality = 89.288 * 1.25*1.25,
  PackArea = 256 * 1.25 * 1.25, PhaseTransitionLower = 0.198, #original 0.198, last run: 0.08
  run.tests = FALSE
)


modules <- list(moduleName)
moduleDir <- file.path(workingDirectory)


paths <- list(
  modulePath = moduleDir,
 # modulePath = file.path(moduleDir, "wolfAlps"),
  inputPath = file.path(moduleDir, "wolfAlps", "data_Germany_firstpack"),
  outputPath =  file.path(moduleDir, "outputR")
)


# easier and more direct than via objects loading
inputs <- data.frame(file = c("wolves2008.asc", "packs2008.asc", ##### Germany: file ..2008.. has 2000/1 data.
                              "CMR.asc", "HabitatSuitability.asc"))


# accept default parameters: put comment behind "times,"
wolfModuleStart <- simInit(times = times, params = list(wolfAlps = wolfiparameters),
                           modules = modules, inputs = inputs, paths = paths)

#toc()



registerDoParallel(cores = 3) # 32 per node
getDoParWorkers()



#tic("singlewolfoutput")

#singlewolfoutput <- spades(wolfModuleStart, progress = 20, debug = TRUE)
################ this works, runs through, everything ok.

#toc()
tic("combinedWolfAlpsOutput_2001")

combinedWolfAlpsOutput_2001 <- foreach(i = 1:3,
    .packages = c("devtools", "NetLogoR", "SpaDES", "plyr", "data.table")) %dopar%
  spades(wolfModuleStart, progress = 10, debug = TRUE)
############ this does not run through. Starts (maybe?), but never stops, and
############ does not produce output or error or anything. But it works exactly
############ like this, only with changed paths, on my local computer.


#simpletest <- foreach(i = 1:3) %dopar%
#  print("gehtdas?")
############ does not work either. Does not seem to start. But it works if I
############ run it without the simInit call.

toc()
combinedWolfAlpsOutput_2001


save.image("/scratch/tmp/mypath/firstpack_3rep.Rdata")




This code contains the three things I tried. Without foreach it works really well - see singlewolfoutput.
What I want is combinedWolfAlpsOutput_2001, but it does not run at all.
And simpletest does not run if I do it after the simInit call, but works if I leave the simInit out.

I hope this is better to understand now. Thank you for your help!
Greetings from Germany,
Clara


Alex Chubaty

Aug 27, 2019, 12:51:07 PM8/27/19
to SpaDES Users
I obviously can't run this to test since you're using modified code, but as I mentioned in an earlier message, you need to provide a `Copy()` of the simList (i.e., `Copy(wolfModuleStart)`) to each of the workers. I also can't tell from your code what type of cluster you are creating. SOCK vs. FORK matters for which packages and objects each worker has access to - that is, you may need to load the packages explicitly on the workers. Try `reproducible::Require(packages(wolfModuleStart))`.

I suggest you follow the parallel example for the module and use experiment() with a PSOCK cluster, since that has been tested, and as far as I can tell you aren't trying to do anything different.
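Something along these lines (a sketch, untested; the `replicates` and `cl` argument names are recalled from the experiment() documentation, so please verify them against ?experiment for your installed version):

```r
library(parallel)

cl <- makeCluster(3, type = "PSOCK")
# experiment() handles copying the simList and distributing replicates
out <- experiment(wolfModuleStart, replicates = 3, cl = cl)
stopCluster(cl)
```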

Alex

claram...@googlemail.com

Aug 28, 2019, 8:09:39 AM8/28/19
to SpaDES Users
I think running the above should work with the original wolfAlps setup if StartingNumberOfWolves is commented out. That is the only important thing I changed, to make it a bit more flexible.

Do you know why even simpletest does not work if I run simInit first, but does work if I don't? The print() shouldn't need any information from the simList, I think. Unfortunately I don't know what type of cluster I am creating either; I will try to find out.

claram...@googlemail.com

Sep 3, 2019, 5:55:15 AM9/3/19
to SpaDES Users
I had the opportunity to get some help from our HPC coordinator, and it turned out that the problem was due to compiler issues. Switching from Intel- to foss-compiled R packages seemed to solve the problem, at least for computation on a single node. Thank you for your input, though; I think it will be helpful if I need more nodes, and I will definitely come back to it.