Speed-up things for multiple animals


Ingo Miller

Mar 28, 2021, 4:18:31 AM
to ctmm R user group
Hi Chris (or anyone else who can help),

I have a large data set of 40+ sea turtles for which I want to calculate AKDEs. Currently I'm using lapply to calculate all of them in one go. As this takes forever, I would like to put my code into a 'foreach' loop, which you mentioned in your last webinar would speed up the process, as this function would use all cores (?).
I have never used foreach before and would appreciate if someone could help me to set it up.

Here's my current code:

# fit models for all animals
GUESS <- lapply(turtle.trj, function(b) ctmm.guess(b, CTMM = ctmm(error = 10),  interactive=FALSE) )
FITS <- lapply(1:length(turtle.trj), function(i) ctmm.select(turtle.trj[[i]],GUESS[[i]], verbose=F,  trace = TRUE, cores = 2)) 
names(FITS) <- names(turtle.trj)
UDS <- akde(turtle.trj, FITS, grid=list(dr=500, align.to.origin=T), weights=F)

Cheers,
Ingo

Christen Fleming

Mar 29, 2021, 10:49:17 AM
to ctmm R user group
Hi Ingo,

The first step would be to turn this into a regular for() loop and to make sure that works. I would also remove the cores argument from ctmm.select() to do all parallelization in the foreach loop.

Best,
Chris

Ingo Miller

Mar 29, 2021, 1:57:49 PM
to ctmm R user group
Thanks Chris for your reply!

A loop that seems to work is the following, except for the names() assignment, which interrupts the code. I would still like to have names instead of numbers for the turtle IDs:

T.FITS <- list()
T.AKDE <- list()
for(i in 1:length(turtle.trj)){
  print(i)
  T.GUESS <- ctmm.guess(turtle.trj[[i]], CTMM = ctmm(error = 10), interactive=F)
  T.FITS[[i]] <- ctmm.select(turtle.trj[[i]], T.GUESS, verbose = T, trace=2)
  #names(T.FITS) <- names(turtle.trj) # this results in error: Error in names(T.FITS) <- names(turtle.trj) : 'names' attribute [44] must be the same length as the vector [1]
  T.AKDE[[i]] <- akde(turtle.trj[[i]], T.FITS[[i]], grid=list(dr=500, align.to.origin=T), weights=F, trace=T)
  }

It would be great if you could help me transform this into a foreach loop, as I'm very new to loops in general and I can't find any examples that help me get it to work.

Cheers,
Ingo 

Christen Fleming

Mar 29, 2021, 4:56:33 PM
to ctmm R user group
Hi Ingo,

You need to move your names() assignment to after the loop, for that to work as is.
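As a minimal sketch of why this matters: inside the loop, at i=1, the result list has only one element while names(turtle.trj) has 44, hence the length error. Once the loop has finished, the lengths match. The toy list toy.trj and the sum() call below are hypothetical stand-ins for turtle.trj and ctmm.select(), so that the snippet runs without any data:

```r
# Hypothetical stand-in for turtle.trj: a named list with one entry per animal
toy.trj <- list(turtleA = 1:3, turtleB = 4:6, turtleC = 7:9)

T.FITS <- list()
for(i in seq_along(toy.trj))
{
  # sum() stands in for the ctmm.guess() + ctmm.select() calls
  T.FITS[[i]] <- sum(toy.trj[[i]])
}

# After the loop, T.FITS and toy.trj have the same length, so this works
names(T.FITS) <- names(toy.trj)
```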

The next step to making a foreach() loop is to make the body of the loop a function that returns the element of a return list. It's actually a lot more like lapply() than for(). So you will probably want to break your for() loop up into two loops - one that fills the FITS list and one that fills the AKDE list. So the first loop will look like

fitting_function <- function(i)
{
  T.GUESS <- ctmm.guess(turtle.trj[[i]], CTMM = ctmm(error = 10), interactive=F)
  ctmm.select(turtle.trj[[i]], T.GUESS, verbose = T, trace=2)
}

for(i in 1:length(turtle.trj)) { T.FITS[[i]] <- fitting_function(i) }

then the not-yet-parallelized foreach() loop looks like

T.FITS <- foreach(i=1:length(turtle.trj)) %do% { fitting_function(i) }
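To illustrate the point that foreach() behaves more like lapply() than for(), here is a small sketch with a hypothetical toy list and toy function (toy.trj and toy_fit are not part of ctmm, they only stand in for the real data and fitting function). Both calls return a list with one element per index:

```r
library(foreach)

# Hypothetical stand-ins for turtle.trj and fitting_function()
toy.trj <- list(turtleA = 1:3, turtleB = 4:6, turtleC = 7:9)
toy_fit <- function(i) sum(toy.trj[[i]])

# lapply() collects the return value of the function for each index
via.lapply <- lapply(seq_along(toy.trj), toy_fit)

# foreach() with %do% runs sequentially and, by default, also returns a list
via.foreach <- foreach(i = seq_along(toy.trj)) %do% { toy_fit(i) }
```

Because the result is returned rather than filled in element by element, there is no need to create an empty list before the loop.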

Then, when you have this working, you make it parallel by using %dopar% instead of %do% and by declaring the necessary packages in the foreach() function.

T.FITS <- foreach(i=1:length(turtle.trj),.packages='ctmm') %dopar% { fitting_function(i) }

Before this, however, you have to register a parallelization backend to get it to parallelize over the specified number of cores. If you are running a UNIX-like system (Linux or macOS), then I recommend using forks, because it's fast and easy: https://privefl.github.io/blog/a-guide-to-parallelism-in-r/
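A minimal sketch of registering a backend, assuming the doParallel package is used (one common choice; the linked guide covers the options in more detail). To my understanding, registerDoParallel(cores=...) uses forking on UNIX-like systems and falls back to an implicit cluster on Windows. The toy list, the toy function, and the core count of 2 are hypothetical stand-ins so that the snippet runs without ctmm or any tracking data:

```r
library(foreach)
library(doParallel)

# Hypothetical stand-ins for turtle.trj and the ctmm fitting calls
toy.trj <- list(turtleA = c(1, 2, 3), turtleB = c(4, 5, 6), turtleC = c(7, 8, 9))
toy_fit <- function(x) mean(x)

# Register the backend once, before any %dopar% loop (2 cores is an arbitrary choice)
registerDoParallel(cores = 2)

# With the real code you would also pass .packages='ctmm' to foreach()
T.FITS <- foreach(i = seq_along(toy.trj)) %dopar% { toy_fit(toy.trj[[i]]) }

# Names are assigned after the loop, as with the plain for() loop
names(T.FITS) <- names(toy.trj)

# Release the workers if an implicit cluster was created (harmless otherwise)
stopImplicitCluster()
```

The AKDE step could then go into a second loop of the same shape, with a function that takes the index and returns one akde() result.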

I think that's everything, but if I've forgotten something, then hopefully someone will chime in.

Best,
Chris
