issues estimating speed and distance moved from an irregular dataset


Anni

Aug 7, 2020, 12:47:14 PM
to ctmm R user group

Hi Chris,

I have a dataset from a turtle with 270 locations spanning almost 3 months. The sampling intervals are very irregular and infrequent at times (sometimes I get >15 locations per day, sometimes nothing for 3 days). This is because the animals spend most of their time in the water, where I don't get a GPS signal.

I am trying to estimate daily distance moved (at least for the days I have locations for) using the ctmm package. I am not sure if this is even feasible given the infrequent sampling.

I haven't calibrated the telemetry error yet, but rather have it estimated simultaneously with the movement model.
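Roughly, the setup is (a minimal sketch of what I did; GUESS is the initial guess object):

GUESS <- ctmm.guess(turtle, interactive = FALSE) # initial parameter guess
GUESS$error <- TRUE # estimate the telemetry error simultaneously with the movement model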

Doing this, an OU (anisotropic) model is fitted with FITS <- ctmm.select(turtle, CTMM = GUESS), but when I call speed(turtle, FITS) I get the following warning message and Inf estimates for the speeds throughout:

> speed(turtle, FITS)
                      low est high
speed (meters/second)   0 Inf  Inf
Warning message:
In speed.ctmm(CTMM, data = object, level = level, robust = robust,  :
  Movement model is fractal.

 

I suspect this may have to do with either a small sample (although this is one of my larger samples for a single turtle, with 270 locations) or the variability in sampling frequency? Is there a workaround for this?

Alternatively, I have tried assigning a telemetry error with uere(turtle) <- 10. Doing this, ctmm.select fits an OUF anisotropic model, but the plot looks like this.

plot_ctmm.png

Doing it this way I can estimate speed without any warning messages or Inf estimates. But the fit just does not seem right. What is going on here?

Further, when I go on to estimate daily distance moved it seems to be OK for the first 8 days, but then I get this error (and Inf estimates for the distances):


0% Error in emulate.ctmm(CTMM, data = data, fast = fast, ...) :
  fast=TRUE (CLT) not possible when minor = 0
In addition: There were 31 warnings (use warnings() to see them)

 

I assume the second approach is still better than estimating the error simultaneously with the model. Would I benefit from calibrating the telemetry error for my devices using uere.fit()?

Sorry for the many questions. 

Thanks in advance.

Best,

Anni


Christen Fleming

Aug 7, 2020, 2:43:19 PM
to ctmm R user group
Hi Anni,

If you select an OU model, then (ignoring all other complications for the moment) the data are too coarse to estimate speed. The OU process is non-differentiable, so it has no finite speed, which is what the "fractal" warning is about.

uere(turtle) <- 10 may be better or worse depending on whether or not 10 is closer to the truth. In practice, calibrations tend to range between 3--30 meters and simultaneous estimation tends to overshoot that---sometimes by a ton. What was the result of the simultaneous fit when you ran it through summary()?

Yeah, the variograms don't look awesome there. Were both calculated with the same calibrated data? That impacts how the error is modeled in the variogram. Also, does the empirical variogram shoot up if you zoom out? I can't tell if it's error or misfitting that's causing this. The fact that OUF was selected is promising, though.

The speed() error is suggesting fast=FALSE. There was a realization in the sampling distribution that looked indistinguishable from linear motion, which estimates a parameter on its boundary (minor=0), which invalidates the central limit theorem (CLT). You might also need robust=TRUE with problematic cases like this.
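In code, that would be something like (a sketch with your object names):

# fast=FALSE avoids the CLT approximation that breaks down when minor=0,
# and robust=TRUE tolerates the occasional degenerate realization (slower, though)
speed(turtle, FITS, fast=FALSE, robust=TRUE)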

With a turtle species that moves on the order of 200 meters, I strongly suggest collecting some calibration data for the tags and carefully going through error model selection.
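The calibration workflow is roughly (a sketch; the calibration file and object names are placeholders):

calib <- as.telemetry("calibration.csv") # data from tags left at fixed, known locations
UERE <- uere.fit(calib) # fit the device error model to the calibration data
summary(UERE) # check the estimated RMS error (meters)
uere(turtle) <- UERE # apply the calibration to the tracking data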

Best,
Chris

Anni

Aug 10, 2020, 5:43:17 PM
to ctmm R user group
Thank you for the quick response, Chris! 

When I do the simultaneous error and model fit, summary() gives the following results:
$name
[1] "OU anisotropic error"

$DOF
    mean     area    speed
1.247121 1.352211 0.000000

$CI
                               low      est      high
area (square kilometers) 0.1618482 2.824010  9.177807
τ[position] (months)     0.0000000 5.766410 15.498323
error (meters)           2.8869724 3.353047  3.818349





Yes, the variograms were calculated with the same calibrated error. 
When I zoom out, the variogram looks like this: variogram.JPG

I have now collected calibration data, estimated my devices' error using uere.fit(), and assigned the UERE to my dataset. The telemetry error seems to be close to what I had guessed earlier (~10 m), and I am running into the same problems as before: a strange variogram and an inability to calculate distances moved.

I am not sure I completely follow your second-to-last comment, but I tried running speed() with both fast=TRUE and robust=TRUE, and then went on to get daily distance moved from those speeds. It seemed like I was getting a little further this way, i.e. it looked like it tried to estimate distances for more days than it did before, but I am getting this error at the end (after 32 of the 49 days that I have GPS data for):
Error in names(tau) <- tau.names[1:K] :
  'names' attribute [2] must be the same length as the vector [0]
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Looking at the resulting table, there are still only 0 and Inf in the distance estimates. 

I'm really not sure what is going on. 

Christen Fleming

Aug 10, 2020, 6:37:51 PM
to ctmm R user group
Hi Anni,

The erratic behavior of the variogram comes from the irregular gaps. You can try to fix this with the dt argument of variogram(). An example is given in the variogram vignette. This only impacts the visualization, though.
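Schematically, it's something like this (a sketch; the dt values should be chosen to match your sampling schedule):

SVF <- variogram(turtle, dt = c(1 %#% 'hr', 6 %#% 'hr')) # coarser lag bins smooth over the gaps
plot(SVF)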

The last error about names(tau) shouldn't be happening. Do you mind messaging/emailing me a copy of the data and script you are using, so that I can take a look at that error and the mismatched variograms?

Best,
Chris

Anni

Aug 10, 2020, 6:59:41 PM
to ctmm R user group
Absolutely, Chris. I sent it to your University of Maryland Email. 
Thank you! The help is much appreciated!

Best, 
Anni

Christen Fleming

Aug 11, 2020, 1:54:16 PM
to ctmm...@googlegroups.com
Hi Anni,

Looking at your data and the distribution of time intervals, this is what I used to clean up the variogram:

uere(turtle) <- UERE
turtle.vg <- variogram(turtle, fast=FALSE, CI="Gauss", dt=c(1 %#% 'hr', 2 %#% 'hr', 20 %#% 'hr'))

In your code, you calibrated your data after you calculated the variogram, which means that the empirical and theoretical variograms weren't calculated with the same data, and so they don't have to match up, as they would have different error models.

Then, this is what I get from ctmm.select:

SVF.png



which matches very well.

Then, getting to your loop, I think there are some issues:
  1. You're trying to get a daily distance traveled, but you have 85 days and yet for the full track DOF[speed] is 31.5, which, if this number is valid, means that there isn't enough data/information there to estimate daily speeds, as there would be <1 DOF per day. Typically, with maximum-likelihood estimates or any other asymptotically efficient estimates, if you want on the order of <5% bias, then you want DOF>20. You might consider monthly estimates, instead, though I don't know how the behavior of this species changes from month to month.
  2. In your loop, you have ctmm.fit instead of ctmm.select. I would stick with ctmm.select with lower data quality like this. If you have better data quality and know that OUF will fit every time, then you could use ctmm.fit to save time. For instance, on the very first day I got that IID was selected by AICc.
  3. The loop crashes on day 33 because that day only has one location sampled. I haven't gotten around to coding functions to give some kind of error/warning for cases like that.
Also, I would use the fit from the entire track as the guess for the individual windows.
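That is, inside the loop, something like (sketch):

FITS.day <- ctmm.select(DATA.SUBSET, CTMM = FITS) # whole-track fit as the candidate model/guess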

Best,
Chris

Anni

Aug 11, 2020, 7:19:14 PM
to ctmm R user group
Hi Chris, 

thank you for looking into my data and script, and helping me understand the ctmm package better!

Makes sense about the variogram.

It's discouraging that I can't seem to estimate daily distances moved with the day(s)-long gaps I have in my data. Would it make sense to use chunks of 3-5 consecutive days for which I have >1 location per day, or is this too small a sample to fit a model and estimate speeds/distances?

Monthly distance moved is unfortunately too coarse a scale for my research question. Do you think weekly distance moved would be a reliable estimate?

Just to help me understand: the output from speed() is the average speed over the sampling period?
When looking at the output from SPEEDS <- speeds(turtle, FITS) I get an estimated speed for each timestamp. Is this the speed the animal travels at from that time to the next?  Are these speeds just not reliable based on my sparse data? 
This is probably a dumb question, but why can't I use this speed to calculate the distances moved from each timestamp to the next?

Again, thanks for your time and thoughts!

Anni

Christen Fleming

Aug 11, 2020, 8:02:05 PM
to ctmm R user group
Hi Anni,

You can try your luck with a smaller window, but if DOF[speed] is very low, then the estimates will have very wide confidence intervals, will eventually become quite biased, and at some point you will get nothing but (0,Inf) for your speed CIs. If you don't think the general behavior is changing much over the full observation period, another thing you can try is to use the model fit from the whole track instead of window-specific fits.

Yes, speed() is the time-average speed over the sample period, such that distance = speed * time. speeds() are instantaneous speeds at the specified times. They should come with CI information to check how reliable they are.

If you calculated speeds() at many tiny time steps, then you could, by averaging them, recreate the point estimate from speed(), but you wouldn't be able to get the speed() CIs that way.
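Something like this, roughly (a sketch; I'm assuming the est column of the speeds() output here):

tt <- seq(turtle$t[1], turtle$t[nrow(turtle)], by = 5 %#% 'min') # a fine time grid
INST <- speeds(turtle, FITS, t = tt) # instantaneous speed estimates on that grid
mean(INST$est) # roughly the speed() point estimate, but without valid CIs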

Best,
Chris

Anni

Aug 13, 2020, 9:37:07 AM
to ctmm R user group
Thank you, Chris!

Do you mean try the model fit from the whole track in ctmm.select() or in speed() within the loop?

Best, 
Anni

Christen Fleming

Aug 13, 2020, 2:07:33 PM
to ctmm R user group
For the individual guesses in the loop that are used in ctmm.select(), I would use the fit object from the entire track. Calculating a guess object from a tiny amount of data could have a lot of variability. But I wouldn't recommend this universally, as the change in behavior from window to window could be larger than that variability in some animals.

If you believe the general behavior of the animal is consistent from window-to-window, then as an approximation, you could use the full-track fit in speed() in the loop. That might let you get away with less data per window, but you are making an approximation there.

Best,
Chris

Anni

Aug 14, 2020, 10:19:58 AM
to ctmm R user group

Thank you, Chris. 

I seem to be particularly slow to understand. 

In the loop I get the guesses with GUESS <- ctmm.guess(turtle, variogram = variogram(turtle), interactive = FALSE), where turtle refers to the entire telemetry object. Are you saying I should use the fit for the entire track (i.e. FITS <- ctmm.select(turtle, CTMM = GUESS)) here instead of turtle?

 Or do you mean I should use FITS (calculated from the entire dataset) in ctmm.select() in the loop: ctmm.select(FITS, CTMM = GUESS)?

 They both seem kind of circular.

 

The code below (using the fit for the entire track in speed()) is what I have used now, and I am getting daily distances moved from it (with relatively wide CIs, though).

#where turtle is my telemetry object
uere(turtle) <- UERE

turtle.vg <- variogram(turtle, fast=FALSE, CI="Gauss", dt=c(1 %#% 'hr',2 %#% 'hr',20 %#% 'hr'))
GUESS <- ctmm.guess(turtle, variogram = turtle.vg, interactive = FALSE)
GUESS$error <- TRUE

FITS <- ctmm.select(turtle, CTMM = GUESS) # fit from the entire track
summary(FITS)
plot(turtle.vg, CTMM = FITS)

speed(turtle, FITS, fast=TRUE, robust=TRUE)

SPEEDS <- speeds(turtle, FITS)

# estimating daily movement distance over the study period
turtle$day <- cut(turtle$timestamp, breaks = "day")
days <- unique(turtle$day)
res <- list()

#loop over the number of days
for(i in 1:length(days)){
  message("Estimating distance travelled on day ", i, ": ", days[i])

  #select data for the day in question
  DATA.SUBSET <- turtle[which(turtle$day == days[i]),]

  #calculate the duration of the sampling period (in seconds)
  SAMP.TIME <- diff(c(DATA.SUBSET$t[1], DATA.SUBSET$t[nrow(DATA.SUBSET)]))

  #guesstimate the model for the initial parameter values
  GUESS <- ctmm.guess(turtle, variogram = variogram(turtle), interactive = FALSE)

  #turn error on
  GUESS$error <- TRUE

  #fit movement model to the day's data
  FITS.day <- ctmm.select(DATA.SUBSET, CTMM = GUESS)

  #calculate speed in m/s
  ctmm_speed <- speed(object = DATA.SUBSET, CTMM = FITS, units = FALSE) #using the fit for the entire track

  #multiply speed (in m/s) by sample time (in s) to get estimated distance travelled
  ctmm_dist <- ctmm_speed*SAMP.TIME

  #re-name the variable
  rownames(ctmm_dist) <- "distance (meters)"

  #store results in list
  x <- c(i, #the day
         ctmm_dist[2], #the ML distance estimate
         ctmm_dist[1], #Min CI
         ctmm_dist[3]) #Max CI
  names(x) <- c("date", "dist.ML", "dist.Min", "dist.Max")

  res[[i]] <- x
}

Does this make sense?

 

Thanks,

Anni

Christen Fleming

Aug 14, 2020, 3:42:34 PM
to ctmm R user group
Hi Anni,

So in the loop, your GUESS is recalculated each day, but it's the same GUESS calculation. And even if it weren't, I think it might be slightly better/faster to have

FITS.day <- ctmm.select(DATA.SUBSET, CTMM = FITS)

As a second consideration, if you're having trouble squeezing out speed estimates from small windows of data and it's safe to assume that the overall behavior is largely consistent, then, as an approximation, you might also keep, as you have written,

ctmm_speed <- speed(object = DATA.SUBSET, CTMM =  FITS, units = FALSE) #using the fit for the entire track

instead of

ctmm_speed <- speed(object = DATA.SUBSET, CTMM =  FITS.day, units = FALSE) #using the local fit

Best,
Chris

Anni

Sep 2, 2020, 8:46:14 AM
to ctmm R user group
Thank you Chris, 

I've finally gotten around to focusing on this issue again.

I ran the code with your suggested alterations, and I am getting nice results with distances for each day. However, these distances seem unreasonably short, so I suspect there is still something wrong in my code. 
In fact, when I run speed(turtle, FITS) on the entire track I get an estimate of ~97 meters/day, but after going through the loop I get estimates of 5-20 meters per day (17 meters on average).

My code is now as follows: 

turtle <- as.telemetry(df, timeformat = "%d/%m/%Y %H:%M", timezone = "America/Toronto", projection = NULL) # if no projection is specified, a two-point equidistant projection is calculated
uere(turtle) <- UERE

turtle.vg <- variogram(turtle, fast=FALSE, CI="Gauss", dt=c(1 %#% 'hr',2 %#% 'hr',20 %#% 'hr'))

GUESS <- ctmm.guess(turtle, variogram = turtle.vg, interactive = FALSE)

GUESS$error <- TRUE

FITS <- ctmm.select(turtle, CTMM = GUESS) #fit from entire track
summary(FITS)
plot(turtle.vg, CTMM = FITS)

#estimate mean speed over the duration of the study period
speed(turtle, FITS, fast=TRUE, robust=TRUE)

#estimate the instantaneous speeds
SPEEDS <- speeds(turtle, FITS)

# estimating daily movement distance over the study period 
# first identify how many days the individual was tracked for 
turtle$day <- cut(turtle$timestamp, breaks = "day")
days <- unique(turtle$day)

# empty list to fill with the results
res <- list()

#loop over the number of days
for(i in 1:length(days)){
  message("Estimating distance travelled on day ", i, ": ", days[i])
  
  #select data for the day in question
  DATA.SUBSET <- turtle[which(turtle$day == days[i]),]
  
  #calculate the duration of the sampling period (in seconds)
  SAMP.TIME <- diff(c(DATA.SUBSET$t[1], DATA.SUBSET$t[nrow(DATA.SUBSET)]))
  
  #guesstimate the model for the initial parameter values
  GUESS <- ctmm.guess(turtle, variogram = turtle.vg, interactive = FALSE)
  
  #turn error on
  GUESS$error <- TRUE
  
  #fit movement model to the day's data
  FITS.day <- ctmm.select(DATA.SUBSET, CTMM = FITS)
  
  #calculate speed in m/s
  ctmm_speed <- speed(object = DATA.SUBSET, CTMM =  FITS, units = FALSE) #using the fit for the entire track
  
  #multiply speed (in m/s) by sample time (in s) to get estimated distance travelled
  ctmm_dist <- ctmm_speed*SAMP.TIME
  
  #re-name the variable 
  rownames(ctmm_dist) <- "distance (meters)"
  
  #store results in list
  x <- c(i, #the day
         ctmm_dist[2], #the ML distance estimate
         ctmm_dist[1], #Min CI
         ctmm_dist[3]) #Max CI
  
  names(x) <- c("date", "dist.ML", "dist.Min", "dist.Max")
  
  res[[i]] <- x
}

# bind results together as data frame
res <- as.data.frame(do.call(rbind, res))
res$date <- as.Date(days)

Best, 
Anni

Christen Fleming

Sep 2, 2020, 3:49:05 PM
to ctmm R user group
Hi Anni,

Looking at your code, I would assume that you are comparing the mean speed of the whole track to distances calculated from

ctmm_dist <- ctmm_speed*SAMP.TIME

where

SAMP.TIME <- diff(c(DATA.SUBSET$t[1], DATA.SUBSET$t[nrow(DATA.SUBSET)]))

which would necessarily be less than or equal to a day, if I understand this code. In this case, I would expect your "daily distances" to be on average less than the mean track speed * 1 day, because their time intervals can be shorter than a day. I would compare speed estimates, instead, because they are normalized for comparison.
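i.e., compare quantities like (sketch):

speed(turtle, FITS, units = FALSE) # whole-track mean speed (m/s)
speed(DATA.SUBSET, CTMM = FITS, units = FALSE) # one day's mean speed (m/s)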

Additionally, I recall from looking at your data that there were some days where speed could not be estimated, which can imply more tortuosity and thus higher speeds than average on those days. So if you are not including those days in the average, then that would also tend to pull the average down.

Best,
Chris

Anni

Sep 8, 2020, 10:32:22 AM
to ctmm R user group
Thanks Chris, 

I understand. So the distance estimated per day will very much depend on when the first and last locations were recorded.

Since these are continuous-time movement models, does the model interpolate locations and/or speeds for all times? Is there a way to access those?
In other words, is there a way to get (instantaneous) speeds for times that weren't sampled?

This way I would be able to use the instantaneous speeds from speeds() to estimate the distances moved per day?! Does this make sense?

Thank you. 
Best,
Anni

Christen Fleming

Sep 8, 2020, 3:58:25 PM
to ctmm R user group
Hi Anni,

speed() estimates the mean speed between the endpoint times, with large gaps skipped according to cor.min and dt.max. I can code it to have a specified time interval, if you give me a few days. I can see the utility of that, because it would be conditioning on locations slightly outside of the window.
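For reference, the gap handling looks something like this (a sketch; the cutoff value is just a placeholder):

speed(turtle, FITS, dt.max = 1 %#% 'day') # don't average the speed across gaps longer than a day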

Using speeds() would only let you calculate the point estimate of speed(), which uses simulate() to obtain both a point estimate and CIs.

Best,
Chris

Christen Fleming

Sep 8, 2020, 8:13:40 PM
to ctmm R user group
Hi Anni,

I pushed an update to the development version of the package on Github where speed() has a t argument to specify the period of interest.

If you run speed() with the whole track, then your loop over days will be O(n^2) and very slow.
If you run speed() with only the 1-day segment of data, then you will lose information from times just outside of the window.
I would recommend running speed() with a slightly larger window (say 3 days centered on the day of interest), and setting t to the day of interest.
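Schematically, something like this (a sketch; day.start and day.end are hypothetical day boundaries in the time units of the data, and exactly how t is specified may differ from this, so check the updated documentation):

# 3-day window of data centered on the day of interest
SUB <- turtle[which(turtle$t >= day.start - 1 %#% 'day' & turtle$t <= day.end + 1 %#% 'day'),]
# mean speed restricted to the day of interest
speed(SUB, FITS, t = c(day.start, day.end), robust = TRUE)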

Best,
Chris

Anne-Christine Auge

Sep 17, 2020, 6:41:49 PM
to Christen Fleming, ctmm R user group
Hi Chris, 

thank you. I think I have a good understanding now of why I am getting these large estimates for daily distance moved. In fact, the confidence intervals are fairly large and when I estimate daily distance moved using the lower CI estimates of the instantaneous speeds it is much smaller. 

To calculate the average speeds, and from those the distances per day, I first calculated speeds() using a time vector including every hour between the start and end dates of the sampling period (not just the sampled datetimes):
speeds(turtle, FITS, t = seq.POSIXt(startdate, enddate, by = "1 hour")). 
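From those hourly speeds I then computed a daily average and a distance per day, roughly like this (a sketch of that step; the est column name is an assumption about the speeds() output):

tt <- seq.POSIXt(startdate, enddate, by = "1 hour")
INST <- speeds(turtle, FITS, t = tt) # hourly instantaneous speeds (m/s)
daily_speed <- tapply(INST$est, as.Date(tt), mean) # mean speed per calendar day
daily_dist <- daily_speed * 86400 # meters per day (point estimates only)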

The device I am collecting GPS locations with is combined with an accelerometer. I have acceleration estimates every second without gaps and am using this data to classify behaviour (walking, swimming, motionless in water and on land).  I wonder if it is worth including the acceleration data into the model, or at least into the step when I calculate average speeds per day. This way I could adjust instantaneous speeds estimated with the model based on acceleration data for at least the times when the animals are not moving. For example, I do know from the acceleration data when turtles are not moving, so I could adjust the speeds during those times to be closer to 0. 

Best, 
Anni


On Thu, 17 Sep 2020 at 16:31, Christen Fleming <chris.h...@gmail.com> wrote:
Hi Anni,

Assuming the location error is not having the larger impact, straight-line distances strictly underestimate distance travelled, and those data do not look very ballistic, which is the regime where the straight-line approximation would be expected to be accurate. Second, how wide are the confidence intervals? Third, are the instantaneous speeds high all day long or only at some times? It's the integral of the instantaneous speed that gives distance.

If the results still seem strange, I would plot the sample with error circles and some conditional simulations atop to get a sense of where the numbers are coming from.
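For example, something like (a rough sketch; the plotting details may need adjusting):

SIM <- simulate(FITS, data = turtle, t = turtle$t) # conditional simulation through the data
plot(turtle) # calibrated locations with their error circles
lines(SIM$x, SIM$y, col = "blue") # one simulated path on top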

Best,
Chris


On Wed, Sep 16, 2020 at 9:31 PM Anne-Christine Auge <achr...@gmail.com> wrote:
Hi Chris, 

thanks so much for your input. 
I am still working on the last suggestion you made, using a 3-day window to run speed() centered on the day of interest.

Meanwhile I looked into the instantaneous speeds again. I used speeds() to compute the instantaneous speed every hour, and from these computed the daily average speed, then calculated the distance per day from that. 
For some animals I found that:
  1. the mean distance travelled per day is very similar for all days, and
  2. when I compare the distances to the mapped raw locations, I find large discrepancies between the "eye-balled" straight-line distances between locations and the ones estimated from the model's instantaneous speeds.
For example, I have attached an image with the mapped raw locations from Day 1 to Day 3. All locations are fairly close together, but I only have a small sample per day (overall the whole dataset for this animal is pretty small compared to others). 
The estimated daily distances I get from instantaneous speeds are much larger than I would expect (400 to 500 m per day) and very unrealistic. Is this due to the small sample size and large gaps? 

For other animals (mainly the ones with larger datasets) I get more realistic results.  
Millie3days.JPG

Best, 
Anni




--
_________________________________________
Anne-Christine Auge




Christen Fleming

Sep 18, 2020, 12:54:26 AM
to ctmm R user group
Hi Anni,

At some point I'd like to have continuous-acceleration models in the package to accept accelerometer data natively (after behavioral state switching models), but until then you can try to use it to classify and segment behaviors, etc., sure.

Best,
Chris