Questions regarding calculating akde(), mean() and interpreting results


abern...@gmail.com

Aug 3, 2022, 2:35:00 PM8/3/22
to ctmm R user group
Hi Chris, 

Making progress on my analysis of caribou herd AKDEs, but I have a few questions about how to interpret some of the results.  Bit of a rambling list, but hopefully others find the answers helpful!  I have tried to stay up to date on the various manuscripts, but if there is one I'm missing that would answer these technical questions, please point it out to me!

Thanks, Robin
~~~~~~~
First of all, I ran the following to get my AKDE UD objects in list form:

# create AKDE home ranges
akde_function <- function(i) {
  akde(DATA.trj[[i]], T.FITS[[i]], weights = FALSE, trace = TRUE,
       grid = list(dr = 500, align.to.origin = TRUE))
}

# populate a list with the AKDE UD objects
T.AKDE <- list()
for (i in 1:length(T.FITS)) {
  print(i)
  T.AKDE[[i]] <- akde_function(i)
}

I'm working with a small set of data, 43 individuals, for one month, roughly 3 locations a day.  I'm fully aware this may not be enough data, but I'm starting small before working with the larger dataset.

Question 1: As the for loop proceeds through the list, each iteration prints
"Default grid size of XX.xxxxx minutes chosen for bandwidth(...,fast=TRUE).
Bandwidth optimization complete."  

The XX.xxxxx is not consistent across iterations.  As each of these UDs needs to be on the same grid, and I specified the grid, why am I getting this printout?  What does it mean?

Question 2:  While the DOF in the resulting UD objects varies, with many well below the minimum of 4, all but one Fit object leads to a UD object.  In the one case where a UD object is not returned, the summary of the returned object (not a ctmm UD object) lists DOF[area] = 1.00532482180468e-08 - so, very bad.  Is there a cutoff in the function such that, if the DOF falls below a certain level, a ctmm UD object is not returned?

Question 3: If I remove the one case where a UD object wasn't returned, mean() works.  I'm not sure how to interpret the results.  Does the rule that DOF should be above 4 still apply at the population level (on this subset I get DOF 5.4)?  Or is that inadequate, and it would need to be higher?  The wide CI on the area shows it's not really meaningful as an area estimate.

pop.ud <- mean(T.AKDE)
> summary(pop.ud)
$DOF
     area bandwidth
  5.42555        NA

$CI
                              low      est     high
area (square kilometers) 2706.349 7873.631 15751.07

attr(,"class")
[1] "area"

Question 4: Finally, if I subset this small dataset further (looking to see how it impacts the DOF) and try to mean() only a subset of the AKDE list, the code fails with the following error.  I can't wrap my head around why, if it works with the entire group, it would fail on a subset.

 Error in if (any(parscale==0)) { :
 missing value where TRUE/FALSE needed
in addition: Warning message:
In sqrt(sigma["xx"] * sigma["yy"]) : NaNs produced

Christen Fleming

Aug 3, 2022, 5:02:59 PM8/3/22
to ctmm R user group
Hi Robin,

  1. There is an FFT algorithm used in the bandwidth optimization for computational speed, and this was the default temporal grid size chosen. This choice doesn't matter very much when weights=FALSE. This temporal grid doesn't need to be the same across individuals in the way that the spatial grid needs to be compatible for mean(), and it should reflect the individual's sampling schedule.
  2. Yes, and there should be a warning issued when this happens. The alternative is that the confidence intervals are so wide that the UD raster is absurdly large and you get an out-of-memory error. The KDE isn't meaningfully different from the Gaussian distribution in these cases, and it's not really worth anything.
  3. I'll make a note to automatically remove these bad UD returns from functions like mean(). I don't know that there's a cutoff at which the log-normal model in mean() starts to degrade badly, because it's a comparatively simple model, but I am surprised that you only get DOF=5 out of 42 individuals. I would expect a number somewhat less than 42, because of the loss of information from the uncertainty, but 5<<42. I would recheck with a recent development version of the package, via devtools::install_github("ctmm-initiative/ctmm"), to see if the low DOF was from a bad model being selected - there are two model selection routines underlying mean() and I made some improvements to them a while back.
  4. Is this happening in a recent development version of the package? If so, please send me a minimal working example (data + script) and I will make sure this is fixed and working.

Best,
Chris

abern...@gmail.com

Aug 3, 2022, 6:36:15 PM8/3/22
to ctmm R user group
Thanks for your rapid response, Chris!  Follow-up comments:

1 - Good to know, thanks!
2 - Great, an error actually was issued at the end of the for loop.  I just forgot I saw it.
3 - That'd be cool. In the meantime, I just filtered out the cases where a UD object wasn't returned, or where the DOF was below 1.  I'm running the ctmm dev package version 1.0.1, downloaded from GitHub in the last couple of weeks.  Worth redownloading?  The model fitting (to produce T.FITS) took almost 3 days to run, so I'm hoping to avoid that, although I'm going to have to tackle parallel processing fairly quickly, as we are looking at running these AKDEs often.  If these low DOFs are true, and not a model selection issue, could it just be that I don't have enough data?  Of the 42 UDs I feed into mean(), the vast majority had very low DOF (roughly averaging 1).  I have an entire year of data, but I was starting small to get my workflow sorted and to see if we could come up with a meaningful herd UD on a monthly basis.  Perhaps we need 6 weeks, or 2 months, of data before we can say something.
4 - This was running on the recent version (dev 1.0.1).  I will send the data/script to your Gmail.
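As a footnote to point 3 above, here's a minimal sketch of the filtering I mean (my own ad hoc code, not a ctmm built-in; it assumes T.AKDE is the list of UD objects, that summary() reports $DOF["area"] for each as in the output I pasted earlier, and the threshold of 1 is arbitrary):

```r
# Extract DOF[area] from each list entry, returning NA where summary() fails
# (e.g. for entries that are not proper UD objects).
dof <- sapply(T.AKDE, function(ud) {
  tryCatch(summary(ud)$DOF["area"], error = function(e) NA)
})

# Keep only the entries with a usable effective sample size.
keep <- !is.na(dof) & dof >= 1   # threshold of 1 is arbitrary
GOOD <- T.AKDE[keep]

pop.ud <- mean(GOOD)
```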

Thanks again!  I'll send the data this evening.  R

Ingo Miller

Aug 3, 2022, 9:28:03 PM8/3/22
to ctmm R user group
Hi Robin,

Since I recently had the same issue of model fitting taking days, and because I'm quite new to loops, foreach and such, I thought I'd share my approach to speeding up model fits by parallelizing with foreach:

library(doParallel); library(foreach)  # packages you'll need for this

## backend for foreach (this works on Windows; on a Mac you may have to use another backend):
cl <- parallel::makeCluster(parallel::detectCores(), outfile = "")
doParallel::registerDoParallel(cl)

# create a 'fitting' function
fitting_function <- function(i) {
  T.GUESS <- ctmm.guess(turtle.trj[[i]], CTMM = ctmm(error = TRUE), interactive = FALSE)
  ctmm.select(turtle.trj[[i]], T.GUESS, verbose = TRUE, trace = 2)
}

# run the fitting function in a foreach loop on the parallel backend via %dopar%
FITS <- foreach(i = 1:length(turtle.trj), .packages = 'ctmm') %dopar% { fitting_function(i) }

parallel::stopCluster(cl)

For some reason, the foreach loop still took forever to run for akde(), so I did that with lapply() instead and it was fast:

UDS <- lapply(1:length(turtle.trj), function(j)
  akde(turtle.trj[[j]], FITS[[j]], grid = list(dr = 500, align.to.origin = TRUE),
       weights = TRUE, debias = TRUE, smooth = TRUE, trace = 3, fast = FALSE, PC = 'direct'))
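On macOS/Linux there's also a forked alternative that skips the cluster setup entirely: parallel::mclapply from base R. A sketch under the same assumptions as above (same turtle.trj list and fitting_function; the mc.cores choice is just a guess at a sensible core count):

```r
library(parallel)  # base R; note that forking is not available on Windows

# Run the same fitting_function over all individuals using forked workers.
FITS <- mclapply(seq_along(turtle.trj),
                 function(i) fitting_function(i),
                 mc.cores = max(1, detectCores() - 1))
```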


Hope this helps! :)

Cheers,
Ingo 

Christen Fleming

Aug 4, 2022, 3:15:03 AM8/4/22
to ctmm R user group
Hi Robin,

Okay, that makes more sense with the population DOF if the individual DOFs are all very small. The population DOF only equals the number of individuals in the limit that the individual DOFs are infinite.
The computation time is not an issue here, though, as I'm talking about the model selection within mean() and not within ctmm.select(). ctmm.select() considers a number of individual autocorrelation models, each of which takes some time to fit. mean() currently considers whether or not there is isotropy in the distribution of individual means and in the distribution of individual location covariance matrices, and this calculation should be relatively fast.

Yes, if the monthly home-range estimates tend to have small DOFs, then you might consider expanding your observation window to seasons: https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13270

I have your data and will make sure it works ASAP.

Best,
Chris

abern...@gmail.com

Aug 4, 2022, 1:21:21 PM8/4/22
to ctmm R user group
Thanks, Ingo!  Working these suggestions in now!  How much time did the parallel processing take for your model fitting?  R

abern...@gmail.com

Aug 6, 2022, 3:27:14 PM8/6/22
to ctmm R user group
Hey hey!  Just for those looking to implement the parallel processing shared above - it worked!  It cut my processing time from roughly 3 days to 1.5 days!  R

Ingo Miller

Aug 6, 2022, 6:02:55 PM8/6/22
to ctmm R user group
Awesome! Glad it worked for you :) I cut my processing time down from about 3 days to roughly half a day, as I could run 11 cores at the same time instead of just 2.
Cheers, Ingo 

