Data with highly irregular sampling rate

Jillian Rutherford
Aug 26, 2021, 6:31:11 PM
to ctmm R user group
Hello Chris,

I work with very large (4,000 to 250,000+ observations) chimpanzee datasets that have essentially no consistent sampling rate. These data were collected long ago, and there is no record of why the sampling is almost random, but I would like to estimate home ranges using AKDEc if at all possible.

Figure 1 below shows the telemetry object produced from one example dataset.
[Figure 1: plot of the telemetry object]
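For reference, the telemetry object is built with something along these lines (file and object names are simplified placeholders; as.telemetry() is assumed to pick up Movebank-style columns):

library(ctmm)

chimp_df <- read.csv("chimp_locations.csv")        # placeholder file name
chimp <- as.telemetry(chimp_df, timezone = "UTC")  # convert to a telemetry object
plot(chimp)                                        # map of the locations, as in Figure 1
summary(chimp)                                     # sampling period and interval overview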

Here are some details about the sampling rate for this dataset:
- Minimum lag between observations: sub-second
- Maximum lag: 30.8 days
- Median lag: 49 seconds
- Mode lag: 17 seconds
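(For anyone wanting to reproduce these numbers, they can be computed from the telemetry object's time column roughly as follows; the mode here is taken over lags rounded to the nearest second.)

lags <- diff(chimp$t)                              # seconds between successive fixes
min(lags)                                          # minimum lag
max(lags) / (60 * 60 * 24)                         # maximum lag, in days
median(lags)                                       # median lag
as.numeric(names(which.max(table(round(lags)))))   # modal lag, to the nearest second
hist(log10(lags[lags > 0]))                        # lag distribution on a log scale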

Figures 2 and 3 below are visual representations of the lag times between observations in this dataset.
[Figure 2: distribution of lag times]
[Figure 3: distribution of lag times]
I think this extreme irregularity is causing a number of challenges in the later steps of home-range estimation, so my aim is to start again from the beginning and make sure I understand whether my approach is appropriate and, if not, how to fix it.

First, the variograms produced by my datasets tend to be very jagged and noisy compared with those I see here and in the vignettes.

Figure 4 and Figure 5 below show a variogram produced with the default dt and one produced by altering dt using the multi-scale method outlined in the variogram vignette, to try to see through some of the noise.
[Figure 4: variogram with the default dt]
[Figure 5: variogram with a multi-scale dt]
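The two calls being compared are roughly of this form (the multi-scale time steps below are illustrative guesses rather than tuned values):

SVF_default <- variogram(chimp)                          # default dt (Figure 4)

dt_multi <- c(5 %#% "min", 1 %#% "hour", 1 %#% "day")    # coarser lag bins for irregular sampling
SVF_multi <- variogram(chimp, dt = dt_multi)             # multi-scale dt (Figure 5)

plot(SVF_default)
plot(SVF_multi)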

Figure 6 and Figure 7 below show the model fit for this variogram, generated by ctmm.select(); I am not sure whether it looks acceptable, especially at small time lags.
[Figure 6: fitted model over the variogram]
[Figure 7: fitted model over the variogram]
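The fitting and plotting steps behind these figures look roughly like this (ctmm.guess() is run non-interactively for brevity, and the zoom fraction is only an example):

GUESS <- ctmm.guess(chimp, interactive = FALSE)      # automated starting values
FITS  <- ctmm.select(chimp, GUESS, verbose = TRUE)   # ranked candidate models
summary(FITS)

plot(SVF_multi, CTMM = FITS[[1]])                    # fitted model over the variogram
plot(SVF_multi, CTMM = FITS[[1]], fraction = 0.001)  # zoomed in on the small lags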

Do you have any recommendations about processing or adjustments that might need to be applied to this kind of dataset at this stage to ensure that ctmm.guess() and ctmm.select() produce the best model fit? 

Later in the process I have had difficulty getting akde() to run without hitting memory limits (especially when weights=TRUE), and when I try to adjust the dt parameter of akde() I get wildly different-looking home-range estimates, so I want to make sure I am catching problems early if they exist. I am hoping to follow up in this thread with details of those problems once I have this stage sorted out.
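The akde() calls in question are roughly of this form (the dt value is only an example):

UD_free   <- akde(chimp, FITS[[1]], weights = TRUE)                     # memory-hungry on large datasets
UD_coarse <- akde(chimp, FITS[[1]], weights = TRUE, dt = 10 %#% "min")  # coarser time discretization
summary(UD_free)
summary(UD_coarse)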


Thank you very much in advance! I would be happy to provide the data/code I am using if it would help.
Cheers,
Jillian 

Jesse Alston
Aug 27, 2021, 5:03:31 AM
to ctmm R user group
Hi Jillian,

This kind of data seems somewhat common with large primates, where someone follows an individual or group around with a handheld GPS and takes locations as opportunity allows. I'm guessing that's the case here?

I'll let Chris answer your problem about model selection, but to address your out-of-memory problems, if you're only interested in estimating the area of home ranges, you can subsample your data to exclude the closest locations in time. Locations that are just a few seconds apart are not nearly as useful for estimating the area of home ranges as they are for estimating things like location error and short-term movement speeds. I'm guessing that if you subsample the data so that all locations are at least a few minutes or so apart, you'll have a lot fewer data points to iterate through but not lose much information relevant to estimating an accurate home range.
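Something along these lines (a quick sketch, not built-in ctmm functionality; the function name and the five-minute cutoff are just examples) would do that kind of thinning:

# keep a location only if it is at least min_gap seconds after the last kept one
thin_telemetry <- function(data, min_gap = 5 %#% "min")
{
  keep <- logical(nrow(data))
  last <- -Inf
  for(i in seq_len(nrow(data)))
  {
    if(data$t[i] - last >= min_gap)
    {
      keep[i] <- TRUE
      last <- data$t[i]
    }
  }
  data[keep, ]
}

chimp_thin <- thin_telemetry(chimp, min_gap = 5 %#% "min")
nrow(chimp_thin) / nrow(chimp)   # fraction of locations retained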

Do you mind telling us how much RAM you have on your machine? It's helpful for us to know where desktop users are hitting computational limits.

Jesse

Christen Fleming
Aug 27, 2021, 4:39:02 PM
to ctmm R user group
Hi Jillian,

The last variogram plot actually looks pretty good for home-range estimation. For finer-scale inference, you definitely want an error model, preferably one derived from calibration data (see vignette('error')). Not having an appropriate error model can sometimes throw off the estimates at larger scales as well, so I would do something about error just to avoid the risk of location errors throwing things off. Another solution is to coarsen the data a bit. If you only care about home ranges, then it shouldn't make much difference which approach you take.
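Roughly, the calibration route looks like this (placeholder file names; it assumes you have, or can collect, fixes from stationary units of the same type):

calib <- as.telemetry("calibration_fixes.csv")   # stationary calibration fixes
UERE  <- uere.fit(calib)                         # fit the device error model
summary(UERE)

uere(chimp) <- UERE                              # attach the calibration to the chimp data

GUESS <- ctmm.guess(chimp, CTMM = ctmm(error = TRUE), interactive = FALSE)
FIT   <- ctmm.select(chimp, GUESS)               # refit with location error included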

Best,
Chris

Jillian Rutherford
Sep 3, 2021, 6:38:13 PM
to ctmm R user group
Hello Jesse,

Thank you for your helpful reply! You are correct in thinking that the data were collected by humans with handheld GPS units, opportunistically following one or more wild chimpanzees. I appreciate your subsampling recommendation, and I have written some code to remove locations that are less than X seconds or minutes apart. I was initially hesitant to do much subsampling for fear of removing biologically meaningful information, but I can appreciate that the closest of these points probably don't add much extra information, and I'm hopeful this will help with the memory problems down the road (stay tuned)!

For reference, I have been working on a desktop with 12 GB of RAM, though I was having trouble estimating more than 6 or 7 home ranges (from my large datasets) consecutively within the same RStudio session even on a server with 256 GB of RAM. I am now working on incorporating an error model as Chris suggested in his reply, and then hope to report back in this same thread on whether the subsampling approach has solved the memory issue.

Thank you again!
Cheers,
Jillian