Error in if (CI[3] < UPPER) { : missing value where TRUE/FALSE needed


maik.he...@gmail.com

Jan 17, 2020, 4:22:20 AM
to ctmm R user group
Dear Chris,

I have a large dataset of deer telemetry data with hourly positions and at least 22 GPS positions per day. My aim is to estimate the distance travelled (kilometres/day) for each individual and day. This already worked once with a subset of the red deer dataset of nearly 200,000 GPS positions. When I try to do the same for the rest of the dataset, I keep running into the error message Error in if (CI[3] < UPPER) { : missing value where TRUE/FALSE needed, which apparently occurs in the speed function.
With ctmm version 0.5.8 this error already occurred on the eighth day of the first individual, and I could not find any peculiarities in that part of the dataset.
I reverted to version 0.5.6, with which the above-mentioned subset of the data had already worked once. The error message then did not show up as early in the dataset, but it still occurred much later, for a different individual and a different day.
This is the function I used first:


ctmm_speed_deer4_neu = function(GPS.df)   
{
  # breakdown to datasets per deer
  id = unique(GPS.df$individual.local.identifier)
  n1 = length(id)
  GPS.ss = list()
  GPS.r=list()
 
  # This creates a date sequence at the desired sample interval
 
  for (i in 1:n1){
    id.df = GPS.df[GPS.df$individual.local.identifier== id[i],]
    ## this subsets data from one deer and date at a time
   
    id2= unique(id.df$ID_date)
    n_id=length(id2)
    # all the dates within an ID
   
    id.telemetry=as.telemetry(id.df,timeformat="",timezone="UTC", na.rm="row")
    # make a telemetry object for each individual
   
   
    #id.outlie=outlie(id.telemetry, plot=F)
    #id.telemetry=id.telemetry[which(id.outlie$distance<8400 & id.outlie$speed<0.945),]
    # remove outliers (identified by the histograms of the NPBW deer plus 5%)
   
    GUESS.id.telemetry<-ctmm.guess(id.telemetry, interactive=FALSE)
    FIT.id.telemetry<-ctmm.fit(id.telemetry,GUESS.id.telemetry)
 
    for (j in 1:n_id){
      id.date=id.df[id.df$ID_date==id2[j],]
      id_date_telemetry=as.telemetry(id.date,timeformat="",timezone="UTC", na.rm="row")
      Speeds.id.telemetry<-speed(id_date_telemetry, FIT.id.telemetry,level=0.95, cores=-1)
     
      GPS.ss[[j]] = data.frame(Animal_ID=levels(droplevels(unique(id.date$individual.local.identifier))),
                               ID_date=levels(droplevels(unique(id.date$ID_date))),
                               Day_of_year=as.numeric(strftime(unique(id.date$date),format="%j")),
                               Speed_estimate=Speeds.id.telemetry,
                               Distance=sum(id.date$steplength)/1000,
                               Forest_density_mean=mean(id.date$Forest_density),
                               Elevation_mean=mean(id.date$Elevation),
                               Aspect_mean=mean(id.date$Aspect),
                               Slope_mean=mean(id.date$Slope),
                               Ruggedness_mean=mean(id.date$Ruggedness),
                               Hunting_perc=length(id.date$hunting[id.date$hunting=="Hunting"])/length(id.date$hunting)*100,
                               Cropland_seasonal_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Cropland seasonal"])/length(id.date$habitat_type2)*100,
                               Forest_broadleaved_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Forest broadleaved"])/length(id.date$habitat_type2)*100,
                               Forest_coniferous_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Forest coniferous"])/length(id.date$habitat_type2)*100,
                               Forest_mixed_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Forest mixed"])/length(id.date$habitat_type2)*100,
                               Shrubland_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Shrubland"])/length(id.date$habitat_type2)*100,
                               Grassland_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Grassland"])/length(id.date$habitat_type2)*100,
                               Barren_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Barren"])/length(id.date$habitat_type2)*100,
                               Wetland_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Wetland"])/length(id.date$habitat_type2)*100,
                               Datapoints=length(id.date$habitat_type2),
                               Temp=unique(id.date$T.mitt..0.24Uhr),
                               #NDVI_modis_mean=mean(id.date$NDVI),
                               #NDVI_season_mean=mean(id.date$NDVI_season),
                               #NDVI_2month_mean=mean(id.date$NDVI_2month),
                               Distance_path_mean=mean(id.date$Distance_paths),
                               Distance_roads_mean=mean(id.date$Distance_roads),
                               Distance_villages_mean=mean(id.date$Distance_villages))
    }
    # bind all days for this individual (avoids the 2:n_id loop, which breaks when n_id == 1)
    GPS.r[[i]] = do.call(rbind, GPS.ss[1:n_id])
  }
  # bind all individuals (likewise safe when n1 == 1)
  GPS.t = do.call(rbind, GPS.r)
  return(GPS.t)
}


Following the paper of Noonan et al. 2019 (https://doi.org/10.1186/s40462-019-0177-1), the only thing I changed was to apply ctmm.fit to the daily subsets instead of to the whole dataset of each individual:


ctmm_speed_deer4_neu = function(GPS.df)   
{
  # breakdown to datasets per deer
  id = unique(GPS.df$individual.local.identifier)
  n1 = length(id)
  GPS.ss = list()
  GPS.r=list()
 
  # This creates a date sequence at the desired sample interval
 
  for (i in 1:n1){
    id.df = GPS.df[GPS.df$individual.local.identifier== id[i],]
    ## this subsets data from one deer and date at a time
   
    id2= unique(id.df$ID_date)
    n_id=length(id2)
    # all the dates within an ID
   
    id.telemetry=as.telemetry(id.df,timeformat="",timezone="UTC", na.rm="row")
    # make a telemetry object for each individual
   
   
    #id.outlie=outlie(id.telemetry, plot=F)
    #id.telemetry=id.telemetry[which(id.outlie$distance<8400 & id.outlie$speed<0.945),]
    # remove outliers (identified by the histograms of the NPBW deer plus 5%)
   
    GUESS.id.telemetry<-ctmm.guess(id.telemetry, interactive=FALSE)
 
    for (j in 1:n_id){
      id.date=id.df[id.df$ID_date==id2[j],]
      id_date_telemetry=as.telemetry(id.date,timeformat="",timezone="UTC", na.rm="row")
      FIT.id.telemetry<-ctmm.fit(id_date_telemetry,GUESS.id.telemetry)
      Speeds.id.telemetry<-speed(id_date_telemetry, FIT.id.telemetry,level=0.95, cores=-1)
     
      GPS.ss[[j]] = data.frame(Animal_ID=levels(droplevels(unique(id.date$individual.local.identifier))),
                               ID_date=levels(droplevels(unique(id.date$ID_date))),
                               Day_of_year=as.numeric(strftime(unique(id.date$date),format="%j")),
                               Speed_estimate=Speeds.id.telemetry,
                               Distance=sum(id.date$steplength)/1000,
                               Forest_density_mean=mean(id.date$Forest_density),
                               Elevation_mean=mean(id.date$Elevation),
                               Aspect_mean=mean(id.date$Aspect),
                               Slope_mean=mean(id.date$Slope),
                               Ruggedness_mean=mean(id.date$Ruggedness),
                               Hunting_perc=length(id.date$hunting[id.date$hunting=="Hunting"])/length(id.date$hunting)*100,
                               Cropland_seasonal_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Cropland seasonal"])/length(id.date$habitat_type2)*100,
                               Forest_broadleaved_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Forest broadleaved"])/length(id.date$habitat_type2)*100,
                               Forest_coniferous_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Forest coniferous"])/length(id.date$habitat_type2)*100,
                               Forest_mixed_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Forest mixed"])/length(id.date$habitat_type2)*100,
                               Shrubland_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Shrubland"])/length(id.date$habitat_type2)*100,
                               Grassland_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Grassland"])/length(id.date$habitat_type2)*100,
                               Barren_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Barren"])/length(id.date$habitat_type2)*100,
                               Wetland_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Wetland"])/length(id.date$habitat_type2)*100,
                               Datapoints=length(id.date$habitat_type2),
                               Temp=unique(id.date$T.mitt..0.24Uhr),
                               #NDVI_modis_mean=mean(id.date$NDVI),
                               #NDVI_season_mean=mean(id.date$NDVI_season),
                               #NDVI_2month_mean=mean(id.date$NDVI_2month),
                               Distance_path_mean=mean(id.date$Distance_paths),
                               Distance_roads_mean=mean(id.date$Distance_roads),
                               Distance_villages_mean=mean(id.date$Distance_villages))
    }
    # bind all days for this individual (avoids the 2:n_id loop, which breaks when n_id == 1)
    GPS.r[[i]] = do.call(rbind, GPS.ss[1:n_id])
  }
  # bind all individuals (likewise safe when n1 == 1)
  GPS.t = do.call(rbind, GPS.r)
  return(GPS.t)
}

I also tried the older version of the function with ctmm 0.5.8 and still got the same error message, although again later in the dataset. For now, I am all out of ideas...

Do you have any ideas or suggestions how to solve the issue?

Best,
Maik

Christen Fleming

Jan 17, 2020, 5:35:07 PM
to ctmm R user group
Hi Maik,

If you could please message or email me a copy of the "eighth day" of data from the individual that triggers this bug, I will take a look at it.

Best,
Chris

maik.he...@gmail.com

Jan 23, 2020, 8:28:54 AM
to ctmm R user group
Hi Chris,

Did you get my email? I sent it to 'hfle...@umd.edu'. 

Best,
Maik

Christen Fleming

Jan 23, 2020, 11:00:03 AM
to ctmm R user group
Hi Maik,

I have it. Thanks. I just have a backlog of work piling up. It might take me a couple of days.

Best,
Chris

maik.he...@gmail.com

Jan 23, 2020, 12:55:59 PM
to ctmm R user group
Hi Chris,

No problem, thank you for your help!

Best,
Maik

Christen Fleming

Jan 28, 2020, 4:16:26 AM
to ctmm R user group
Hi Maik,

I've fixed the bug on GitHub; you can install the updated package via devtools::install_github("ctmm-initiative/ctmm").

These data are borderline for continuous velocity, so you should probably use ctmm.select() instead of ctmm.fit(), and robust=TRUE in speed().
You might also take a look at vignette('error'), which can sometimes help with estimating speed/distance.
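A minimal sketch of this suggested workflow, assuming the same objects as in your function (names are illustrative, not prescriptive):

```r
library(ctmm)

# per-individual guess, as before
GUESS <- ctmm.guess(id.telemetry, interactive = FALSE)

# ctmm.select() compares candidate models (e.g. with and without velocity
# continuity) instead of fitting only the guessed model:
FIT <- ctmm.select(id_date_telemetry, GUESS)

# robust = TRUE uses outlier-resistant statistics when summarizing the
# sampling distribution of the speed estimate:
SPD <- speed(id_date_telemetry, FIT, level = 0.95, robust = TRUE)
```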

Best,
Chris

maik.he...@gmail.com

Jan 28, 2020, 7:03:00 AM
to ctmm R user group
Hi Chris,

Thanks a lot for solving this issue and your tips! 


I have two more issues for which your help would be highly appreciated:

- When using ctmm.select for the daily subsets instead of the whole dataset of each individual, the resulting speed estimates contain a lot of infinite values, even with robust=TRUE in the speed function. Would it be wrong to instead fit the model to the whole dataset of each individual again? There may be gaps of several days in the dataset.
- I tried to incorporate a 10 m GPS error following the guidelines in the vignette, but then I got the error: Error in eigen(COV, only.values = TRUE) : infinite or missing values in 'x'.

Best, 
Maik 

Christen Fleming

Jan 28, 2020, 7:02:58 PM
to ctmm R user group
Hi Maik,

  1. You mostly get (0,Inf) when a model with measurable velocity cannot be selected, and robust=TRUE can't help with that. This can happen from having too little data. If the movement is similar from day to day, then you can use the model fitted to the whole dataset. As a middle ground, you can also consider two windows: a small one for the data to condition on and a larger one to estimate the movement model with.
  2. I just fit the first individual that you sent me and didn't get any errors. Can you post the code you used to get this error?
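The two-window idea in point 1 could be sketched roughly like this (the column names and subsetting are assumptions based on your function, not ctmm API):

```r
library(ctmm)

# larger window (e.g. one week around the focal day) to estimate the model
week.df  <- id.df[id.df$week == w, ]
week.tel <- as.telemetry(week.df, timezone = "UTC")
GUESS    <- ctmm.guess(week.tel, interactive = FALSE)
FIT      <- ctmm.fit(week.tel, GUESS)

# smaller window (the focal day) to condition the speed estimate on
day.df  <- id.df[id.df$ID_date == d, ]
day.tel <- as.telemetry(day.df, timezone = "UTC")
SPD     <- speed(day.tel, FIT, level = 0.95)
```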

Best,
Chris

maik.he...@gmail.com

Jan 29, 2020, 6:24:54 AM
to ctmm R user group
Hi Chris,

Thanks  again for your help!

Regarding 1., I have at least 22 hourly positions per day, and many of the days with infinite speed estimates even have 24 GPS positions, so it is probably not due to having too little data.
Since my aim is to figure out how the distance travelled differs between days, based on the environmental variables included in the function, it would make little sense to assume from the beginning that movement is similar from day to day. It is probably best to stick with using ctmm.guess on the whole dataset of each individual and ctmm.fit for each day, as in the paper. I will just have to exclude days with infinite values.

Regarding 2., this is the code I used:


ctmm_speed_deer4_neu = function(GPS.df)   
{
  # breakdown to datasets per deer
  id = unique(GPS.df$individual.local.identifier)
  n1 = length(id)
  GPS.ss = list()
  GPS.r=list()
 
  # This creates a date sequence at the desired sample interval
 
  for (i in 1:n1){
    id.df = GPS.df[GPS.df$individual.local.identifier== id[i],]
    ## this subsets data from one deer and date at a time
   
    id2= unique(id.df$ID_date)
    n_id=length(id2)
    # all the dates within an ID
   
    id.telemetry=as.telemetry(id.df,timeformat="",timezone="UTC", na.rm="row")
    # make a telemetry object for each individual
   
   
    #id.outlie=outlie(id.telemetry, plot=F)
    #id.telemetry=id.telemetry[which(id.outlie$distance<8400 & id.outlie$speed<0.945),]
    # remove outliers (identified by the histograms of the NPBW deer plus 5%)
   
    GUESS.id.telemetry<-ctmm.guess(id.telemetry, interactive=FALSE)
    GUESS.id.telemetry$error<-10

   
    for (j in 1:n_id){
      id.date=id.df[id.df$ID_date==id2[j],]
      id_date_telemetry=as.telemetry(id.date,timeformat="",timezone="UTC", na.rm="row")
      FIT.id.telemetry<-ctmm.select(id_date_telemetry,GUESS.id.telemetry, trace=T, cores=20)
      Speeds.id.telemetry<-speed(id_date_telemetry, FIT.id.telemetry,level=0.95, robust=T)
     
      GPS.ss[[j]] = data.frame(Animal_ID=levels(droplevels(unique(id.date$individual.local.identifier))),
                               Name=levels(droplevels(unique(id.date$name))),
                               ID_date=levels(droplevels(unique(id.date$ID_date))),
                               Sex=levels(droplevels(unique(id.date$sex))),
                               Day_of_year=as.numeric(strftime(unique(id.date$date),format="%j")),
                               Month=levels(droplevels(unique(id.date$month))),
                               Year=unique(id.date$year),
                               Speed_estimate=Speeds.id.telemetry,
                               Forest_density_mean=mean(id.date$Forest_density),
                               Elevation_mean=mean(id.date$Elevation),
                               Aspect_mean=mean(id.date$Aspect),
                               Ruggedness_mean=mean(id.date$Ruggedness),
                               Hunting_perc=length(id.date$hunting[id.date$hunting=="Hunting"])/length(id.date$hunting)*100,
                               Cropland_seasonal_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Cropland seasonal"])/length(id.date$habitat_type2)*100,
                               Forest_broadleaved_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Forest broadleaved"])/length(id.date$habitat_type2)*100,
                               Forest_coniferous_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Forest coniferous"])/length(id.date$habitat_type2)*100,
                               Forest_mixed_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Forest mixed"])/length(id.date$habitat_type2)*100,
                               Shrubland_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Shrubland"])/length(id.date$habitat_type2)*100,
                               Grassland_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Grassland"])/length(id.date$habitat_type2)*100,
                               Barren_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Barren"])/length(id.date$habitat_type2)*100,
                               Wetland_perc=length(id.date$habitat_type2[id.date$habitat_type2=="Wetland"])/length(id.date$habitat_type2)*100,
                               Datapoints=length(id.date$habitat_type2),
                               NDVI_modis_mean=mean(id.date$NDVI_modis), # was id.df$NDVI_modis, which averages over the whole individual rather than the day
                               Distance_path_mean=mean(id.date$Distance_paths),
                               Distance_roads_mean=mean(id.date$Distance_roads),
                               Distance_villages_mean=mean(id.date$Distance_villages))
    }
    # bind all days for this individual (avoids the 2:n_id loop, which breaks when n_id == 1)
    GPS.r[[i]] = do.call(rbind, GPS.ss[1:n_id])
  }
  # bind all individuals (likewise safe when n1 == 1)
  GPS.t = do.call(rbind, GPS.r)
  return(GPS.t)
}


I tried it again on a different computer and it worked, even producing fewer infinite values than without the error term, so the error message might be connected to the version of some dependency. I will let you know if it reappears or if I find out more about why it appeared.


Best,
Maik

Christen Fleming

Jan 29, 2020, 7:45:41 AM
to ctmm R user group
Hi Maik,

I wouldn't trust simultaneous fitting of the error parameter on samples that small. You might want to check that the error-parameter estimates come out consistent and plausible across days. Alternatively, you can fix the errors on the data with the uere()<- command.

With platform-dependent bugs, try turning off the parallelization (cores=1); parallelization often creates very weird, platform-dependent problems.

Best,
Chris

maik.he...@gmail.com

Jan 30, 2020, 10:37:56 AM
to ctmm R user group
Hi Chris,

I am still stuck with this error on the fastest computer cluster that I have available, also after setting cores=1:

Error in eigen(COV, only.values = TRUE) :
  infinite or missing values in 'x'

Do you have any other ideas how to solve this?
Thanks a lot again in advance!

Best,
Maik

Christen Fleming

Feb 3, 2020, 9:41:11 AM
to ctmm R user group
Hi Maik,

Sorry this is taking longer than expected. There are multiple, interconnected issues that I have to solve.

Best,
Chris

Christen Fleming

Feb 4, 2020, 4:38:33 AM
to ctmm...@googlegroups.com
Hi Maik,

Sorry for the delay. There were a number of issues here, now addressed in the GitHub version of the package, which you can install via devtools::install_github("ctmm-initiative/ctmm"):
  1. This particular error came from the default argument fast=TRUE failing in speed(). fast=TRUE invokes the CLT and one of your parameter estimates (a location variance) was on a boundary. I have now included a stop() message to inform users when this happens, so that they can switch to fast=FALSE.
  2. The collapsed variance estimate (no motion along the short axis of the distribution) appears to be greatly exacerbated by using AIC on such small sample sizes. AIC/BIC is not really reliable on samples this small. The better alternative for speed/distance estimation is probably IC="LOOCV", but with 24 data points this is going to run 24 times slower. With a different dataset, you might also try IC=NA, MSPE='velocity', but without it always picking up velocity continuity, I suspect this will not be reliable here.
  3. The way I had things coded, with a collapsing location variance parameter estimate and bad initial guess on its orientation, the optimizer could get stuck. I restructured this code so that this can't happen. However, if you are simultaneously estimating location error (rather than using calibration data or a plausible guess), then I can foresee a scenario where the optimizer can get stuck if both location variances collapse and all variance is explained by location error.
  4. With my limited testing, the simultaneously estimated location error estimates were all over the place and not trustworthy. I would, instead, fix the error estimates and see what the sensitivity is.
Best,
Chris


maik.he...@gmail.com

Feb 4, 2020, 1:10:52 PM
to ctmm R user group
Hi Chris,

Thanks a lot for the effort! I adjusted my function, keeping your recommendations in mind.


ctmm_speed_deer4_neu = function(GPS.df)   
{
  # breakdown to datasets per deer
  id = unique(GPS.df$individual.local.identifier)
  n1 = length(id)
  GPS.ss = list()
  GPS.r=list()
 
  # This creates a date sequence at the desired sample interval
 
  for (i in 1:n1){
    id.df = GPS.df[GPS.df$individual.local.identifier== id[i],]
    ## this subsets data from one deer and date at a time
   
    id2= unique(id.df$ID_date)
    n_id=length(id2)
    # all the dates within an ID
   
    id.telemetry=as.telemetry(id.df,timeformat="",timezone="UTC", na.rm="row")
    # make a telemetry object for each individual
   
   
    #id.outlie=outlie(id.telemetry, plot=F)
    #id.telemetry=id.telemetry[which(id.outlie$distance<8400 & id.outlie$speed<0.945),]
    # remove outliers (identified by the histograms of the NPBW deer plus 5%)
   
    #GUESS.id.telemetry<-ctmm.guess(id.telemetry,  interactive=FALSE)
    uere(id.telemetry)<-10
    GUESS.id.telemetry<-ctmm.guess(id.telemetry,  CTMM=ctmm(error = TRUE), interactive=FALSE)
   
    library(ctmm)

    for (j in 1:n_id){
      id.date=id.df[id.df$ID_date==id2[j],]
      id_date_telemetry=as.telemetry(id.date,timeformat="",timezone="UTC", na.rm="row")
      FIT.id.telemetry<-ctmm.select(id_date_telemetry,GUESS.id.telemetry, method="pHREML", IC="LOOCV", control=list(method='pNewton'),cores=1)
      Speeds.id.telemetry<-speed(id_date_telemetry, FIT.id.telemetry,level=0.95, fast=FALSE, robust=TRUE, cores=1)



However, I am getting another error now:

Error in nant(t(P) %*% sForP, 0) + error[i, , ] : non-conformable arrays

The traceback shows this:
20.
kalman(z, u, dt = dt, CTMM = CTMM, error = error)
19.
ctmm.loglike(data, CTMM, REML = REML, zero = -zero, profile = profile)
18.
fn(parscale * par)
17.
func(x - (i == (1:n)) * h, ...)
16.
genD.default(func, par, method.args = method.args, ...)
15.
numDeriv::genD(func, par, method.args = method.args, ...)
14.
genD(par = pars, fn = fn, zero = -CTMM$loglike, lower = lower,
    upper = upper, parscale = parscale, Richardson = 2, mc.cores = 1)
13.
ctmm.fit(object, CTMM, method = CTMM$method, COV = FALSE, ...)
12.
emulate.telemetry(data, CTMM, fast = fast, ...)
11.
emulate.ctmm(CTMM, data = data, fast = fast, ...)
10.
emulate(CTMM, data = data, fast = fast, ...)
9.
speed.rand(object, data = data, prior = prior, fast = fast, cor.min = cor.min,
    dt.max = dt.max, error = error, precompute = precompute,
    DT = DT, ...)
8.
FUN(X[[i]], ...)
7.
lapply(X, FUN, ...)
6.
plapply(1:cores, spd.fn, cores = cores, fast = FALSE)
5.
unlist(plapply(1:cores, spd.fn, cores = cores, fast = FALSE))
4.
speed.ctmm(CTMM, data = object, level = level, robust = robust,
    units = units, prior = prior, fast = fast, cor.min = cor.min,
    dt.max = dt.max, error = error, cores = cores, ...)
3.
speed.telemetry(id_date_telemetry, FIT.id.telemetry, level = 0.95,
    fast = FALSE, robust = TRUE, cores = 1)
2.
speed(id_date_telemetry, FIT.id.telemetry, level = 0.95, fast = FALSE,
    robust = TRUE, cores = 1)
1.
ctmm_speed_deer4_neu(roe_deer_1h_fulldays_ord[1:494, ])
Best,
Maik

Christen Fleming

Feb 4, 2020, 9:11:36 PM
to ctmm R user group
Hi Maik,

You are missing a line in the inner loop like

uere(id_date_telemetry) <- 10

But I will take a look at this regardless. I don't want bugs happening under any circumstance, and that looks like something that shouldn't be happening.

Best,
Chris

Christen Fleming

Feb 7, 2020, 1:27:58 AM
to ctmm R user group
Hi Maik,

I have fixed that bug (and a couple of others) on GitHub. This dataset has been great for uncovering bugs, and I'm going to continue bug hunting with it for at least a few more days. With the tiny amount of data, the likelihood function is very ugly + initial guesses are usually way off + tau_velocity & location-variance parameters often collapse to zero = lots of numerical challenges. It's probably not good for daily speed estimation, though. You might consider expanding your window to more than one day.

Best,
Chris

maik.he...@gmail.com

Feb 10, 2020, 5:44:43 AM
to ctmm R user group
Dear Chris,

Thanks a lot, it really is a great help!
The only downside of a larger time window, for me, is that I won't be able to model the results as daily speed estimates together with the environmental variables of the GPS positions of those days. Do you have any idea whether it is still possible to achieve this somehow?

Best,
Maik




Christen Fleming

Feb 10, 2020, 10:15:12 AM
to ctmm R user group
Hi Maik,

You could extend the window of the movement model slightly or window the environmental variables commensurately.

Best,
Chris

maik.he...@gmail.com

Feb 11, 2020, 12:59:13 PM
to ctmm R user group
Hi Chris, 

I tried extending the time window for model fitting and speed estimation from one day to one week, but this did not change the large number of infinite values (6 out of 8 in the test dataset).
Any other ideas for improving the results?

Thanks a lot in advance again!


Best,
Maik

Christen Fleming

Feb 11, 2020, 1:22:41 PM
to ctmm R user group
Hi Maik,

You might try improving your error model, which would involve testing any other data columns (like DOP values or number-of-satellites) with calibration data, as in vignette('error').
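For reference, the calibration workflow from vignette('error') looks roughly like this (the file name is hypothetical, and your data would need DOP or similar columns for this to help):

```r
library(ctmm)

# calibration data collected from stationary collars (hypothetical file)
cal  <- as.telemetry("calibration_data.csv")

# fit the error model (UERE) to the calibration data
UERE <- uere.fit(cal)
summary(UERE)

# apply the calibrated error model to the deer data
uere(id.telemetry) <- UERE
```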

Aside from that, I'm not sure much more can be done until we have the hierarchical models implemented. And even then, the improvement may be marginal.

Not all datasets are appropriate for estimating speed/distance. One feature I am in the middle of putting into the package is diffusion rate estimation, which is an alternative to speed/distance for datasets that are too coarse for that.

Best,
Chris

maik.he...@gmail.com

Feb 17, 2020, 9:06:27 AM
to ctmm R user group
Hi Chris,

I tried to run the function on the whole dataset now and encountered a "new" error in the 14th individual:


Minimum sampling interval of 29.3 minutes in 14
  |==================================================================================                                                           |  58%
Error in abs(vec[c(lo, hi)]) :
  non-numeric argument to mathematical function
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Any help is again greatly appreciated!

Best,
Maik

Christen Fleming

Feb 17, 2020, 9:07:37 PM
to ctmm R user group
Hi Maik,

I am at the end of re-writing the line search algorithm in the optimizer, hoping to speed up the model fitting on your data. After that is working correctly, I will take a look at this individual.

Best,
Chris

maik.he...@gmail.com

Apr 21, 2020, 1:44:16 PM
to ctmm R user group
Dear Chris,

Do you think it would be fine to use fast=TRUE for the speed function without introducing too much bias?
It just took three weeks to do the computations for seventeen out of 170 individuals (before the run terminated early because of server problems), and I really cannot wait several months for it to finish.

Thanks a lot and best,

Maik



Christen Fleming

Apr 21, 2020, 8:30:18 PM
to ctmm R user group
Hi Maik,

Just looping through the cases where fast=TRUE works might be a good compromise.

I have some extra optimizer work that I'd like to do at some point, but I can't say when I will be able to get around to that.

Best,
Chris

maik.he...@gmail.com

Apr 24, 2020, 4:04:17 AM
to ctmm R user group
Hi Chris,

Thanks a lot!

Which condition can I use in an if statement on the telemetry object to make sure that fast=TRUE works, so that I do not get an error like: fast=TRUE (CLT) not possible when minor = 0?

Best,
Maik

Christen Fleming

unread,
Apr 24, 2020, 5:37:26 AM4/24/20
to ctmm R user group
Hi Maik,

You will want to wrap the command in try(), and will probably need to add some logic for how you want to skip the problematic cases.
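A sketch of what that could look like inside the inner loop (the fallback/skip logic is an assumption, not a prescription):

```r
# attempt the fast (CLT-based) estimator first
SPD <- try(speed(id_date_telemetry, FIT.id.telemetry, level = 0.95),
           silent = TRUE)

if (inherits(SPD, "try-error")) {
  # fast=TRUE failed (e.g. "not possible when minor = 0"):
  # either skip this day, or fall back to the slower simulation-based estimator
  SPD <- try(speed(id_date_telemetry, FIT.id.telemetry, level = 0.95,
                   fast = FALSE, robust = TRUE), silent = TRUE)
}

if (inherits(SPD, "try-error")) next  # give up on this day
```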

Best,
Chris

maik.he...@gmail.com

May 18, 2020, 5:12:25 AM
to ctmm R user group

Hi Chris,

while I am waiting for the processing of my data, another question came to my mind: 
When I have monthly speed estimates with confidence intervals from different individuals and different years, how can I compute population-level confidence intervals for each month? 

Thanks again,

Maik

Christen Fleming

May 18, 2020, 5:33:24 AM
to ctmm R user group
Hi Maik,

For the time being, you can do this with the metafor package. I'm actually working on code right now to be able to do that kind of analysis within ctmm.

Best,
Chris

maik.he...@gmail.com

May 26, 2020, 4:56:42 AM
to ctmm R user group
Thank you, Chris!

One more question about this: Since I need the standard errors instead of the confidence intervals for the metafor package, how can I obtain these from the confidence intervals which the speed function provides me?

Best,

Maik

Christen Fleming

May 26, 2020, 9:38:11 AM
to ctmm...@googlegroups.com
Hi Maik,

Let's see... I think the simplest thing to do with what you have would be to log-transform the speed CIs, approximate them with normal CIs, and do the meta-analysis on log(speed), making them a (-Inf,Inf) random variable rather than (0,Inf), as I'm pretty sure metafor relies on a normality assumption.

level <- 0.95 # default
alpha <- (1-level)/2 # 1-tail probability
CI <- log(CI) # log transformed CI: c(low, est, high)
PE <- CI[2] # point estimate of log(speed)
SE <- (CI[3]-CI[1]) / (qnorm(1-alpha)-qnorm(alpha)) # approximate standard error of log(speed)
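Assuming you have collected vectors PE and SE of such log-speed estimates and standard errors across individuals/months (names illustrative), these could then be fed into metafor's random-effects model:

```r
library(metafor)

# PE: log(speed) point estimates; SE: their approximate standard errors
meta <- rma(yi = PE, sei = SE)
summary(meta)

exp(coef(meta))  # back-transform the pooled log(speed) to a speed
```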

This will be much easier and more exact in the next big package update.

Best,
Chris

maik.he...@gmail.com

Jun 4, 2020, 6:54:24 AM
to ctmm R user group
Hi Chris,

Thanks, this helped a lot!
I just had a look at the first roe deer speed estimates computed so far. Out of 131 estimates without an Inf value, four were unrealistically high, between 346 (CI: 257-423) km/day and 622 (CI: 470-835) km/day. This happened despite setting fast=FALSE and robust=TRUE.
Can I trust the other estimates (which look fine) and what kind of solution would you suggest?
Would it be an option to just remove estimates where the range of the confidence interval is above a certain threshold (e.g. 15 km/day)?

Best,

Maik


Christen Fleming

unread,
Jun 4, 2020, 11:01:44 AM
to ctmm R user group
Hi Maik,

My leading suspicion would be that this is related to outliers. Have you run your data through outlie() to see what comes out? If it's outliers, then with this much data you might consider automating (in a loop over individuals) some kind of outlier rejection threshold, kind of like in vignette('error') but with plot=FALSE for speed.
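A minimal sketch of such a threshold filter, using a made-up data frame shaped like outlie() output (the speed and distance caps here are hypothetical and should come from the species' biology):

```r
# Hypothetical outlie()-style output for one individual:
# per-fix 'speed' (m/s) and 'distance' (m) columns
out <- data.frame(speed    = c(0.3, 0.5, 28.0, 0.4),
                  distance = c(120, 300, 42000, 150))
max_speed <- 2      # m/s, hypothetical cap on sustained speed
max_dist  <- 10000  # m, hypothetical cap on displacement from the median
keep <- out$speed < max_speed & out$distance < max_dist
# 'keep' would then subset the corresponding telemetry object, e.g. tel[keep, ]
```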

The second thing I would check is DOF[speed]. Bias in speed estimates due to coarseness is positive and on the order of 1/DOF[speed]. So you might also want to reject estimates with the estimated DOF below 10-20 to keep the bias in the 5-10% range.
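For instance, a DOF filter over an assumed results table (the column names are hypothetical) could look like:

```r
# Hypothetical results table: one row per individual-month speed estimate
est <- data.frame(id    = c("A", "A", "B"),
                  speed = c(1.8, 4.2, 346),  # km/day
                  DOF   = c(45, 8, 2))
min_DOF <- 15  # keeps coarseness bias roughly below 1/15 ~ 7%
est_ok <- est[est$DOF >= min_DOF, ]
```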

Let me know if this doesn't address the issue.

Best,
Chris

maik.he...@gmail.com

unread,
Jun 5, 2020, 2:59:44 AM
to ctmm R user group
Hi Chris,

I checked the roe deer data set again and there are no obvious outliers (the highest speed in the outlier table is 28 m/s and the highest distance is 42000 m, with continuous values before them).

Regarding point 2, are you referring to summary(fitted.model)$DOF[3]? If this is the only way to limit the bias, I will have to redo the computations, which already took weeks.


One more question regarding the previous problem with the standard errors: Alpha should be level/2 instead of (1-level/2), right? Otherwise the result is a negative number. Regardless, the results for the standard errors are really large. For example, if the range between the upper and the lower bounds of the confidence interval is only 0.33, I get a standard error of the log speed of 2.75. Is this correct?

Thanks again and best,

Maik

Christen Fleming

unread,
Jun 5, 2020, 2:09:54 PM
to ctmm R user group
Hi Maik,

Are you sure a roe deer can travel at 28 m/s (101 km/hr) for the full duration of the sampling interval (1 hour)? And assuming an animal were doing that (not a roe deer), then I don't think 300-600 km/day would be that unrealistic. For roe deer, though, that sounds like there are some outliers that need to be taken care of.

You can approximate DOF from the CIs. Doing some math... the relation would be

DOF <- PE^2/(2*SE^2)
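Plugging hypothetical numbers into that relation (with the point estimate and standard error on the natural, untransformed scale):

```r
PE <- 1.78   # hypothetical speed point estimate (km/day)
SE <- 0.28   # hypothetical standard error on the same scale
DOF <- PE^2 / (2 * SE^2)
# DOF ~ 20 here, which would pass a 10-20 rejection threshold
```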

I edited the formula for alpha, it should be

alpha <- (1-level)/2 # 1-tail probability

So for level=0.95, the total tail probability is 0.05 and one side is alpha=0.025.

Best,
Chris

Christen Fleming

unread,
Jun 5, 2020, 2:11:51 PM
to ctmm R user group
One more thing, the DOF formula there is for the estimates without the log transform.

maik.he...@gmail.com

unread,
Jun 6, 2020, 6:20:26 AM
to ctmm R user group
Thanks a lot Chris, this is very helpful!

maik.he...@gmail.com

unread,
Jun 9, 2020, 9:27:25 AM
to ctmm R user group
Hi Chris,

I had a thorough look at these outliers again. Many of them are caused by positions that were recorded out of schedule, only a few seconds after the previous position, but several metres away. Interestingly, these cases are not connected to the extreme daily movement speeds which I found in the ctmm result table. For example, the most extreme straight-line-distance speed in the roe deer dataset (266 m in 0.15 min) lies within a month with a very reasonable ctmm speed estimate (1.78 km/day, CI: 1.37-2.46). While it is clear that these positions do not make much sense, they do not seem to influence the ctmm speed estimates.
On the other hand, there are no obvious straight-line-distance outliers in the months with the extreme ctmm daily speed estimates (346-622 km/day). The maximum SLD speeds in these months are nearly always lower than 0.09 m/s and only reached 1.3 m/s in one of these months.
Filtering by DOF also does not help, since the point estimate is very large along with the confidence interval. 
What else can I check?

Thanks again!

Best,
Maik

Christen Fleming

unread,
Jun 9, 2020, 11:29:02 AM
to ctmm R user group
Hi Maik,

If you feed calibrated data into outlie(), or if the default arguments for outlie() are reasonable for the error, then locations estimated a few seconds apart but only a few meters from each other (within the range of error) should not be causing high speed estimates in outlie().

If you can send me a dataset (and MWE script) that looks clean but results in large speed() estimates, then I will take a closer look at it.

Best,
Chris

Christen Fleming

unread,
Jun 11, 2020, 6:55:12 PM
to ctmm R user group
Hi Maik,

I found the problem. You want units=FALSE in speed() so that everything is reported in the same units. The individuals with speeds like 346 (CI: 257-423) are given in meters/day, not kilometers/day; that information is stored in the row name. units=FALSE outputs SI units instead of pretty units.
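A quick sanity check of what that unit mix-up does to the numbers (the value below is hypothetical):

```r
# The row name carries the units; a value like 346 in "meters/day"
# corresponds to a very modest daily distance once converted:
speed_mday  <- 346                # hypothetical estimate in m/day
speed_kmday <- speed_mday / 1000  # 0.346 km/day, not 346 km/day
```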

Best,
Chris
