Balancing non-stationarity and sample size


Genevieve Finerty

Sep 3, 2020, 9:56:00 AM
to ctmm R user group
Hi Chris, 

Apologies for yet another message! I am trying to work out how tau[velocity] and speed/distance change over time within a stationary period of ranging behaviour. As an example, I have a lion here tracked for approximately 700 days. I have fit a ctmm to calibrated data for the full dataset, which seems to work well: OUF is selected, with tau[position] in the region of 7 days and tau[velocity] of just over an hour.

[Screenshot: summary of the fitted model]
With these kinds of estimates, it seems like I would need a minimum of something like 150 days of data for home-range estimates, and 2+ days for speed/distance estimates to reach DOF ~20.

My first question is about non-stationarity. Lions likely exhibit non-stationarity in their velocity autocorrelation: they are largely inactive for most of the day and active/foraging between about 7 pm and 6 am (which I can see from plotting the instantaneous speeds).
[Screenshot: instantaneous speed estimates over the daily cycle]
This is pretty similar in some ways to the simulated central-place-foraging (CPF) example from Mike's paper, which showed biased estimates of e.g. daily distances. The paper suggests subsetting into stationary periods and estimating these separately; however, that would give me very few DOF for speed. I have used calibrated data to try to mitigate error bias during the day, when the lions are moving little, but I am wondering how much of a concern this should be?

My second question is about estimating speed/distance values for subsets of the dataset, and the relative merits of estimating these based on guesstimates/fits from the full dataset vs the subset.

I first tried using the subsets of data to fit ctmms with ctmm.select(), then using these to generate speed estimates. I found that once I clipped the data down to 48 hours, the models had trouble resolving tau[velocity], but by about 7 days they were producing tau[velocity] estimates similar to the full dataset (albeit with wider CIs), so I also tried generating speed estimates from these. Here is what the two options looked like compared to the variogram of the first 7 days of data (where red is the fit from the full dataset, and purple is the fit from the subset):
[Screenshots: 7-day variogram with the full-dataset fit (red) and the subset fit (purple)]

I guess I have two main questions here. First, if tau[velocity] is just over an hour and the sampling interval is an hour, is the reason that 48-hour subsets cannot resolve the velocity parameter simply a sample-size issue? Second, in this case, is there much benefit to fitting a separate ctmm per subset, rather than just using the fit from the full dataset? The latter would be computationally more efficient and would mean I only have to check once per trajectory whether a model with velocity autocorrelation is selected, rather than for each data subset.
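In code, the two options I'm comparing look roughly like this (a sketch, not my exact script; DATA and FIT.full are hypothetical object names, the ctmm package is required, and %#% is ctmm's unit-conversion operator):

```r
library(ctmm)

# Option 1: fit a separate model to each subset (here, the first 7 days)
t0 <- DATA$t[1]
week1 <- DATA[DATA$t - t0 < 7 %#% 'day', ]
GUESS <- ctmm.guess(week1, interactive = FALSE)
FIT.sub <- ctmm.select(week1, GUESS)
speed(week1, FIT.sub)    # speed/distance estimate from the subset's own fit

# Option 2: reuse the fit from the full dataset for the same subset
speed(week1, FIT.full)   # FIT.full from ctmm.select() on all ~700 days
```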

Thank you!

Gen 

Christen Fleming

Sep 3, 2020, 8:01:46 PM
to ctmm R user group
Hi Gen,

In the future, I plan to implement a state-switching model, like in the papers from Blackwell's group (e.g., https://besjournals.onlinelibrary.wiley.com/doi/abs/10.1111/2041-210X.13154 ). But for the moment, if you fit one stationary model to resting and foraging data together, then you get an average-behavior model that will slightly underestimate the active speed, while sometimes grossly overestimating the resting speed. How bad that is will depend on how extremely different those two states are. For perching CPF birds, it is very bad. Have you tried segmenting by day/night and then pooling some number of days/nights together? There are some gap-skipping arguments in speed()/speeds(), but the default arguments should readily skip the large gaps in between. [Tangentially, annotate() will estimate the local solar flux, which I find useful for this.]

If DOF[speed] dips too low, then you will eventually fail to select a continuous-velocity model. DOF[speed] should be roughly proportional to the sampling period, so decreasing that will decrease DOF[speed].

Regarding using the full-track fit for the windowed speed estimates, this approximation is good if the behavior isn't changing much. An example of when this would fail would be a big migration. For your lions, doing that with segmented day/night data would probably be a much better approximation than using windowed estimates but not segmenting by day/night, because the differences in behavior are probably more extreme between day/night than day-to-day and night-to-night.

Best,
Chris

Genevieve Finerty

Sep 17, 2020, 2:18:14 AM
to ctmm R user group
Thanks for this, Chris! I just realised my response to this hadn't posted! I tried segmenting by sunlight: >0.6 = middle of the day, <0.6 = (roughly) the active period, and that seemed to work well (splitting by actual night/day produced some weird-looking jumps in the variograms). Here is a plot of the instantaneous speeds averaged over each hour for the segmented models (red) and an average model (black); it looks like the average model does over-estimate resting speeds quite substantially. The same pattern emerges if I run speed() on day/night-segmented data using the average fit vs the night/day fits.
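For reference, the segmentation step looks roughly like this (a sketch; DATA is a hypothetical object name, and I'm assuming annotate() appends a `sunlight` column to the telemetry object, which is the column my 0.6 threshold is applied to):

```r
library(ctmm)

DATA <- annotate(DATA)                 # adds local solar data, incl. sunlight
day   <- DATA[DATA$sunlight > 0.6, ]   # midday resting segment
night <- DATA[DATA$sunlight < 0.6, ]   # (roughly) active segment

FIT.day   <- ctmm.select(day,   ctmm.guess(day,   interactive = FALSE))
FIT.night <- ctmm.select(night, ctmm.guess(night, interactive = FALSE))

speeds(night, FIT.night)               # instantaneous speeds for the active segment
```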

[Screenshot: hourly mean instantaneous speeds, segmented models (red) vs average model (black)]
Tangentially, I have just plotted here the mean (± SE) of the point estimates, which ignores the fact that each point already has low/high CIs. I'm not quite sure how appropriate that is, as the plotted uncertainty is therefore (I guess?) artificially small. Although I guess that is more of a statistics question than a ctmm one!

Some of my ctmms weren't a great fit in terms of capturing all elements of the empirical variogram at all lags (although at short lags they all look pretty good), I think because lions probably exhibit some bursts of periodic/diffusive movement, even within a broadly stationary mean location. I'm not sure how much of an impact this would have on speed estimation (I'm not running AKDE on anything that doesn't have a clear asymptote)? Would I be right in thinking that, where AKDE is appropriate, I should also run it on the segmented night/day datasets, and then use mean(), with weighting, to merge them? The values for tau[position] don't change as dramatically between night/day as tau[velocity], but I'm not sure intuitively how much of an impact non-stationarity in tau[velocity] would have on AKDE...

I also had a go at using the distance function you put together to discriminate between animals that are shifting a lot month to month, vs those who remain in the same area, and that seems to be working nicely, thank you!

All the best,

Gen

Christen Fleming

Sep 17, 2020, 4:44:29 PM
to ctmm R user group
Hi Gen,

You can use the metafor package to average estimates (with uncertainties). I'm starting to put custom meta-analysis features into ctmm, but for the moment it's just for home ranges.
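A minimal sketch of that with metafor (hypothetical numbers standing in for your point estimates and standard errors):

```r
library(metafor)

est <- c(5.2, 4.8, 6.1)   # hypothetical speed point estimates
se  <- c(0.4, 0.5, 0.6)   # their standard errors

META <- rma(yi = est, sei = se)  # random-effects meta-analytic average
summary(META)                    # pooled estimate with propagated uncertainty
```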

Yes, for speed estimation, the asymptotic features are not important. For home-range estimation, if the non-stationarity isn't changing tau[position] & the asymptote, then I doubt that extra work would make much of a difference, and I haven't yet coded the UD mean() function to propagate uncertainties.

Best,
Chris

Genevieve Finerty

Sep 18, 2020, 9:45:41 AM
to ctmm R user group

That makes sense. Thanks, Chris, you've been a huge help!

Genevieve Finerty

Sep 25, 2020, 9:14:37 AM
to ctmm R user group
Sorry, me again, 

I re-ran this over a larger subset of data and have a couple of instances where the selected models surprised me; I was wondering if you might be able to shed some light on what might be going on!

For example, I have a variogram and GUESS that look like the following (this is only day-time data; I've used dt of 1/17 hours to smooth, but there are still a few odd bumps here and there):
And a fitted model (OUF anisotropic) that looks like the following

If I re-run ctmm.select() with verbose = TRUE, I can see that the third-ranked model by AICc (OUF isotropic) has the lowest delta RMSPE, and looks like the following, which looks like a much better fit to me. I'm wondering why it performed so poorly on the AICc criterion?
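For reference, this is roughly what I'm running (a sketch with hypothetical object names; FITS[[3]] is the third-ranked candidate):

```r
library(ctmm)

FITS <- ctmm.select(DATA, GUESS, verbose = TRUE)  # keep all attempted models
summary(FITS)                                     # table ranked by dAICc (with dRMSPE)
plot(variogram(DATA), CTMM = FITS[[3]])           # overlay the third-ranked model
```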
[Screenshot: ranked model-selection table]

Thanks! Gen


Genevieve Finerty

Sep 25, 2020, 9:41:49 AM
to ctmm R user group
My internet seems to hate me today, attaching two images here.
[Screenshots: variogram with the OUF anisotropic fit, and with the OUF isotropic fit]

Christen Fleming

Sep 25, 2020, 5:37:42 PM
to ctmm R user group
Hi Gen,

RMSPE doesn't really mean anything for these comparisons, because their trend models are all the same (fixed centroid). However, it is very strange that the isotropic model looks so much better for the variogram. If you zoom out, is there an upwards trend at the end of the variogram that matches better with the anisotropic model?

Best,
Chris

Genevieve Finerty

Sep 26, 2020, 2:52:46 PM
to ctmm R user group
That's what I thought, as I've only used RMSPE during periodicity model selection. And you're completely right about the upward trend at the end of the variogram!

Genevieve Finerty

Sep 26, 2020, 3:10:44 PM
to ctmm R user group
Scrap that, I am an idiot. There is an upwards trend, but not enough to affect the model selection. However, I stored all my fits in an unnecessarily complicated nested list structure and messed up the matching process. Mystery solved.