improve on a script to parallelize a loop for ctmm

K Nicholson

Nov 1, 2021, 3:16:09 PM
to ctmm R user group
I am a firm believer in sharing R scripts. I am a wildlife biologist, not a computer programmer. I was trained to understand and interpret wildlife behavior and how people can use that knowledge to conserve and better co-exist with wildlife, so the analysis shouldn't be what holds up the final result. I am NOT an expert and am constantly looking for assistance myself, so I probably cannot help you beyond getting you started here with some code (I hope not to confuse you more than needed). Parallelizing took my run from 29 hrs down to 11 hrs while I was still able to work my day job on the same machine.

In my limited understanding, all this does is assign each task to a specific core instead of letting the computer spread every task across all the cores, where several cores may share one task, so it eliminates the interruptions and lags caused by shifting task loads between cores (others can correct me if I'm wrong; I just know it cut my times in half, if not more). I thought those of you who are better at coding might be able to incorporate something like this into the work, speed up processing ctmm loops, and then hopefully share with the rest of us :). A different apply method than parLapply may be needed for moving-window loops; see the sketch after my code below.

This is code I use before running a JAGS model for stable isotope analysis with the MixSIAR library. I'm running on a Windows Dell Latitude 7490, dual quad-core (8 cores).

library(parallel) 
       
ncores <- detectCores() - 1  # set how many cores you will use. I've 8 cores and still want to use the computer, so I recommend leaving at least 1 or 2 cores free. If you already know your number, you can skip detectCores() and just say ncores <- 7

start <- Sys.time()  # just to keep track of how long things take; unnecessary
cl <- makeCluster(ncores, type = "PSOCK")  # "PSOCK" is the socket cluster type built into the parallel package (I had "SOCK" here, which needs the snow package); look at the different cluster types for your system in the vignette and ?makeCluster.

clusterExport(cl, c("source", "discr", "mix", "source.filename",
                    "mix.filename", "discr.filename", "output_JAGS"), envir = environment())
# Look in the vignette for what the cluster needs to read and export. The names in c() are the objects MixSIAR calls on ("source.filename", "mix.filename", "discr.filename"), what will be produced in the end (the jags object "output_JAGS"), and the lists that are written to and used by MixSIAR ("source", "discr", "mix"). Within my code I also specify graphs to be produced and saved (for example plot_save_png = TRUE, return_obj = TRUE), and they are still written/saved without being listed here. So again, look at the vignette for a better explanation of what goes here. For ctmm it will likely be the telemetry data at minimum.

jags.mod <- parLapply(cl, seq_len(n.mod), function(mod) {  # this runs the loop body in parallel across the cluster. There are a BUNCH of different ways to parcel out the load of data to your cores; look at the vignette to see which method may work better for you.
  library(MixSIAR)  # because you are sending the loop to different (isolated) worker nodes, you have to load the needed libraries on each node.
  library(tidyr)
  library(R2jags)
  library(ggplot2)
#######....the rest of your loop
})
stopCluster(cl)  # shut down the worker processes when you are done; read the vignette for why you need to do this
elapsed <- Sys.time() - start  #just to figure out how long it took
elapsed
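
For ctmm, I have not tested this myself, but a rough sketch of the same pattern might look like the code below. I'm assuming you already have a named list of telemetry objects (here called tel_list, which is just my made-up name) and want to fit movement models to each animal on its own core:

## Untested sketch for ctmm -- adapt the object names to your own workflow.
## "tel_list" is a hypothetical named list of telemetry objects,
## e.g. built from as.telemetry() and split by animal.
library(parallel)

ncores <- detectCores() - 1
cl <- makeCluster(ncores, type = "PSOCK")

# each worker only needs the telemetry list (plus anything else your loop uses)
clusterExport(cl, c("tel_list"), envir = environment())

fits <- parLapply(cl, seq_along(tel_list), function(i) {
  library(ctmm)                                   # load ctmm on each worker
  dat   <- tel_list[[i]]
  guess <- ctmm.guess(dat, interactive = FALSE)   # automated starting values
  ctmm.select(dat, guess)                         # fit and select movement models
})
names(fits) <- names(tel_list)

stopCluster(cl)

If your animals have very different numbers of locations, the load-balanced version parLapplyLB() (same arguments as parLapply) may keep all the cores busy instead of waiting on the slowest one, and that might also be worth trying for moving-window loops. Again, I am not an expert, so treat this as a starting point and correct me where I'm wrong.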


