Missing PDOP/DOP/HDOP/error values/calibration data and what to do with water locations

Luke Emerson

May 11, 2023, 5:32:07 AM

to ctmm R user group

Hi Chris,

Great work on developing the ctmm package and all the useful support materials! Apologies if my questions have been answered before, but I couldn't find suitable answers in previous posts.

I am new to ctmm and to working with movement data. I have been provided with GPS collar data for 109 pumas (with more to come!). I want to calculate aKDEs and eventually RSFs or SSFs. Median fix intervals range from 1 to 12 hours, but the vast majority are 1 to 3 hours.

Q1. I don't have calibration data and I only have PDOP values rather than HDOP values. Is that a problem? A few individuals have DOP rather than PDOP values. Is it possible to create an error model without calibration data, or is an error model not even necessary in my case because of the infrequent fix intervals?

Q2. How are PDOP/DOP/etc. values used in aKDE estimates and are they required?

Q3. When converting the location data to a telemetry object using as.telemetry, for any individual that has both PDOP values and missing PDOP values, the rows with missing PDOP values are deleted from the resulting telemetry object. By contrast, if the individual has no PDOP values at all, then all rows are retained and a value of 100 is assigned. In some cases this results in a huge loss of location data. Should I try to keep all rows by assigning a mean PDOP value? E.g., for individuals with some PDOP values and some missing values, I could calculate the mean of the existing PDOP values and assign it to the missing values. For individuals with no PDOP values at all, I could determine the mean PDOP across all individuals and assign this. Is there a problem with using this approach to initially retain all locations/rows in the dataset?

For reference, the mean PDOP across the entire dataset is 3.85 (sd = 4.70, median = 2.6, range 0.6 to 50.0), whereas the mean of the individual mean PDOPs is 3.9 (sd = 0.40, median = 3.9, range 3.28 to 4.99).
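The two summaries above answer different questions: the pooled figures weight every fix equally, while the mean of individual means weights every animal equally and hides the spread of PDOP within each animal. A minimal base-R sketch of both calculations, using made-up toy values and a hypothetical data frame `pdop_df` with columns `ID` and `pdop`:

```r
# Toy PDOP values for three hypothetical individuals with unequal numbers of fixes
pdop_df <- data.frame(
  ID   = c(rep("A", 6), rep("B", 2), rep("C", 2)),
  pdop = c(1, 1, 1, 1, 1, 1, 10, 10, 4, 4)
)

# Pooled summaries: every fix counts once, so well-sampled animals dominate
pooled_mean <- mean(pdop_df$pdop, na.rm = TRUE)
pooled_sd   <- sd(pdop_df$pdop, na.rm = TRUE)

# Two-stage summaries: average within each animal first, then across animals
ind_means     <- aggregate(pdop ~ ID, data = pdop_df, FUN = mean)
mean_of_means <- mean(ind_means$pdop)
sd_of_means   <- sd(ind_means$pdop)

print(round(c(pooled_mean = pooled_mean, pooled_sd = pooled_sd,
              mean_of_means = mean_of_means, sd_of_means = sd_of_means), 2))
```

With these toy values the pooled mean (3.4) is pulled towards animal A, which has the most fixes, while the mean of the individual means is 5.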

Q4. Some locations fall within waterbodies like oceans, rivers, lakes, etc. due to fix error rather than being outliers. Should these be left in the dataset when calculating home range area but removed for RSF/SSF analyses? I will set hard boundaries for the ocean areas when estimating home ranges, but when performing RSF/SSF, should inland waterbodies just be set to NA rather than removing the animal locations in the water?

Many thanks,

Luke

Christen Fleming

May 11, 2023, 1:22:56 PM

Hi Luke,

1. Either should get imported, and it shouldn't be an issue. If you don't have calibration data, then you should specify a prior, which is covered in the vignette. One-hour puma data may not require an error model.
2. The error model influences the autocorrelation and variance estimates, which in turn influence the bandwidth optimization. By default, it is also used for smoothing the data. Within the next year we will also have optimal down-weighting of the more erroneous location estimates.
3. That shouldn't be happening. Can you send me a minimal working example? The way I originally coded (and tested) it, missing DOP values get assigned a value of 100 and a separate location class to account for that.
4. Yes, they can currently be used for akde() but not for rsf.fit(), where they need to be removed. You should additionally set the water as NA, or use offset(), to get the appropriate available area.

Best,

Chris

Luke Emerson

May 13, 2023, 1:06:18 AM

Hi Chris, thanks for the quick reply!

Below is some simulated data which shows what happens with different combinations of available and missing data per individual animal.

I have also included a second example showing what happens if PDOP and DOP columns are both included in the same data frame before conversion to a telemetry object using as.telemetry. I did this because for some individuals I have both PDOP and DOP values (never simultaneously for the same location). I wasn't sure whether the PDOP/DOP type is ever recorded alongside each location on data upload, but I assume it isn't, so I probably need to use both PDOP and DOP values if they are equivalent. Since I mostly have PDOP values, or am missing values for both PDOP and DOP, I can just copy the DOP values into the empty PDOP cells.

Test1: When using just the 'pdop' column, any rows without a pdop value are omitted when the data frame is converted to a telemetry object using as.telemetry. All 'pdop' values are assigned to the new 'HDOP' column.

Test2: Because some animals have PDOP values and others DOP values, I added a column called 'dop' before conversion to a telemetry object, so the data frame now contains a 'pdop' column and a 'dop' column, with no row having values in both. When I convert the data frame using as.telemetry, it still removes all the rows without a value in 'pdop', but it also drops the 'dop' column and again adds an 'HDOP' column, except that now every 'HDOP' cell is assigned a value of 100 and every cell of the 'class' column contains [NA-HDOP]. I guess you can't have more than one PDOP, DOP or HDOP column to begin with. Are PDOP and DOP equivalent for their use within ctmm?

# Load required packages

library(ctmm)      # provides as.telemetry()
library(dplyr)
library(lubridate)

# Create example data
# Concatenate the vectors
ID <- c("Alex", "Alex", "Alex", "Alex", "Alex", "Alex", "Alex", "Alex", "Betty", "Betty", "Betty", "Betty", "Conrad", "Conrad", "Conrad", "Conrad")
timestamp <- as.POSIXct(c("2023-01-16 19:58:53", "2023-04-14 23:16:37", "2023-05-29 03:49:20", "2023-07-11 12:31:46", "2023-10-14 05:38:01", "2023-11-17 16:02:24", "2023-11-21 02:10:21", "2023-12-08 13:55:19", "2023-06-14 07:19:37", "2023-06-15 11:59:10", "2023-07-20 00:20:10", "2023-12-14 12:53:45", "2023-02-06 17:09:00", "2023-07-27 17:31:31", "2023-09-03 22:15:25", "2023-11-23 18:52:15"), tz = "US/Pacific")
latitude <- c(41.92803, 37.46088, 38.27921, 41.40507, 35.42060, 44.54504, 44.94270, 43.89539, 40.94142, 42.08530, 41.55706, 40.44066, 44.63024, 36.47114, 37.89160, 44.02299)
longitude <- c(-115.6718, -106.1859, -119.5077, -113.6364, -104.0907, -110.4441, -115.3675, -104.8308, -112.6231, -111.7091, -117.1440, -111.7255, -115.3393, -117.2239, -116.9511, -110.6808)
pdop <- c(NA, 1.4, 0.3, NA, 4.3, 2.3, NA, NA, 4.3, 0.7, 4.0, 1.4, NA, NA, NA, NA)
dop <- c(4.5, NA, NA, 3.8, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)

# Create the dataframe
test <- data.frame(ID, timestamp, latitude, longitude, pdop, dop)

# Display the dataframe
print(test)

## ############################################## ##
## Example of various combinations of PDOP values ##
## ############################################## ##

# This is how my data currently exists, i.e. without the dop column.

# Drop dop column for this test
test1 <- test[, 1:5]

# Display the dataframe
print(test1)

# Convert to telemetry object
test1_ctmm <- as.telemetry(test1, timeformat = "%Y-%m-%d %H:%M:%S", timezone = "US/Pacific", na.rm = "row", keep = TRUE, drop = FALSE)

# View each resulting telemetry object
# Alex - Contained pdop values and missing pdop values in original dataset
alex <- test1_ctmm$Alex
print(alex) # All 4 rows with NA in the pdop column are dropped. The remaining pdop values are copied to the HDOP column and given class [HDOP]

# Betty - No missing pdop values in original dataset
betty <- test1_ctmm$Betty
print(betty) # All rows retained because all pdop cells contain values. All pdop values assigned to HDOP column and class [HDOP] assigned

# Conrad - No pdop values in original dataset
conrad <- test1_ctmm$Conrad
print(conrad) # No rows dropped: no pdop cells have a value, so every row gets an HDOP value of 100 and class [NA-HDOP]

## ################################################## ##
## Example of PDOP and DOP columns in same data frame ##
## ################################################## ##

# I have pdop and dop values for some individuals (never simultaneously for the same location); below is what happens when both pdop and dop columns are included in the data frame when converting to a telemetry object using as.telemetry.

# Rename dataframe with included dop column
test2 <- test

# Convert to telemetry object
test2_ctmm <- as.telemetry(test2, timeformat = "%Y-%m-%d %H:%M:%S", timezone = "US/Pacific", na.rm = "row", keep = TRUE, drop = FALSE)

# View each resulting telemetry object
# Alex - Contained pdop values and missing pdop values in original dataset
alex <- test2_ctmm$Alex
print(alex) # Again, all 4 NA cells in pdop column are dropped and dop column is dropped, but this time pdop cell values are not assigned and instead a HDOP value of 100 is assigned to all cells as is class [NA-HDOP].

# Betty - No missing pdop values in original dataset
betty <- test2_ctmm$Betty
print(betty) # All rows retained because all pdop cells contain values but like above, this time pdop cell values are not assigned and instead a HDOP value of 100 is assigned to all cells as is class [NA-HDOP].

# Conrad - No pdop values in original dataset
conrad <- test2_ctmm$Conrad
print(conrad) # No rows dropped: no pdop cells have a value, so every row gets an HDOP value of 100 and class [NA-HDOP]

Cheers,

Luke

Luke Emerson

May 13, 2023, 7:48:21 PM

Hi Chris,

I realise what I have done wrong. I had na.rm = "row" rather than na.rm = "col". I misunderstood what na.rm = "col" would do. I thought it would remove the entire column if it contained some missing values, but it appears to populate the HDOP column with available pdop values and a value of 100 if the pdop value is missing. I had previously tried without the na.rm argument but I guess it is "row" by default.

"na.rm - If some values are NA in the data frame, are the rows (times) deleted or are the columns (data types) deleted"

When there are both pdop and dop columns and na.rm = "col", as.telemetry ignores the 'pdop' column, uses only the 'dop' column and populates the HDOP column accordingly.
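Before importing, it can help to count, per individual, how many fixes lack a PDOP value, since in Test1 those were the rows lost under na.rm = "row" for animals like Alex whose PDOP column is only partly filled. A minimal base-R sketch (no ctmm calls) reusing the `ID` and `pdop` values from the simulated example; the `audit` table and its column names are illustrative only:

```r
# ID and pdop values copied from the simulated example earlier in the thread
ID   <- rep(c("Alex", "Betty", "Conrad"), times = c(8, 4, 4))
pdop <- c(NA, 1.4, 0.3, NA, 4.3, 2.3, NA, NA, 4.3, 0.7, 4.0, 1.4, NA, NA, NA, NA)
test1 <- data.frame(ID, pdop)

# Count fixes and missing PDOP values per individual
n_fixes   <- tapply(test1$pdop, test1$ID, length)
n_missing <- tapply(is.na(test1$pdop), test1$ID, sum)

audit <- data.frame(
  ID        = names(n_fixes),
  n_fixes   = as.vector(n_fixes),
  n_missing = as.vector(n_missing),
  stringsAsFactors = FALSE
)
# "Partial" individuals have some, but not all, PDOP values missing; in Test1
# these were the individuals that lost rows with na.rm = "row"
audit$partial <- audit$n_missing > 0 & audit$n_missing < audit$n_fixes
print(audit)
```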

Are PDOP and DOP equivalent for the purpose of using them in ctmm?

Thanks,

Luke.

Christen Fleming

May 16, 2023, 1:27:47 AM

Hi Luke,

I have run your test and as.telemetry() is working as intended for me. If you have different tag/collar models, with some recording DOP values and some recording PDOP values, then please do not mix these data together before importing. It's very hard to sort out the data for each tag/collar model, given that there is no standard format. If you used different tags/collars on the same individual, then you can merge the telemetry objects with tbind() after importing.
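If the collar model is known for each fix, the per-model split described here can be prepared in base R before import. This sketch assumes a hypothetical column `collar_model` (not in the thread's data) and toy values. It only builds one data frame per model and flags the individuals whose telemetry objects would later need merging with tbind(); it does not call ctmm itself.

```r
# Toy fixes with a hypothetical collar_model column
fixes <- data.frame(
  ID           = c("A", "A", "A", "B", "B", "C"),
  collar_model = c("Lotek", "Lotek", "Vectronic", "Lotek", "Lotek", "Vectronic"),
  pdop         = c(2.1, NA, 1.2, 3.4, 5.0, NA),
  stringsAsFactors = FALSE
)

# One data frame per collar model; each would be passed to as.telemetry() separately
by_model <- split(fixes, fixes$collar_model)
rows_per_model <- sapply(by_model, nrow)

# Individuals tracked with more than one model need their objects merged after import
models_per_id <- tapply(fixes$collar_model, fixes$ID, function(m) length(unique(m)))
multi_device  <- names(models_per_id)[models_per_id > 1]

print(rows_per_model)
print(multi_device)
```

Keeping the split explicit in the raw data also makes it easy to revisit later if calibration data for one of the models becomes available.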

If DOP and PDOP values are found in the same dataset, then it is assumed that these are two estimates from the same GPS module and they will both be present or NA. I have never run into a device that can calculate one type of DOP sometimes and another at other times - but please let me know if that happens to be the case for your tag/collar.

The PDOP value will be preferred over the DOP value, because it is a 'position' DOP value and the other is ambiguous. The ranking is detailed here: https://www.biorxiv.org/content/10.1101/2020.06.12.130195v2.abstract

The best DOP value found will be assigned to the HDOP column of the telemetry object, and NA values will be assigned a different location class and an HDOP value of 100 - a value which only impacts plotting the data without calibration. Ultimately the HDOP column of the telemetry object is what is used, and each location class gets a different calibration constant.
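To make the import rule described in this reply concrete, here is a toy base-R re-implementation of it as stated, not ctmm's actual import code. Among the DOP-type columns present, it takes the highest-ranked one (here only PDOP above DOP, a subset of the fuller ranking in the linked paper), copies it into HDOP, and gives missing values the placeholder 100 and their own location class. The function name `assign_hdop` is hypothetical.

```r
# Hypothetical helper mimicking the rule described above (NOT ctmm source code)
assign_hdop <- function(df, ranking = c("pdop", "dop")) {
  cols    <- tolower(names(df))
  present <- ranking[ranking %in% cols]
  if (length(present) == 0) {
    dop <- rep(NA_real_, nrow(df))              # no DOP-type column at all
  } else {
    dop <- df[[which(cols == present[1])[1]]]   # use only the best-ranked column
  }
  missing  <- is.na(dop)
  df$HDOP  <- ifelse(missing, 100, dop)         # placeholder value for missing DOP
  df$class <- ifelse(missing, "NA-HDOP", "HDOP") # separate location class
  df
}

toy <- data.frame(pdop = c(1.4, NA, 4.3), dop = c(NA, 3.8, NA))
out <- assign_hdop(toy)
print(out[, c("HDOP", "class")])
```

In this sketch the DOP value in the second row is ignored because the choice is made per column rather than per row, which illustrates why mixing PDOP-only and DOP-only devices in one data frame can lose information.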

Best,

Chris

Luke Emerson

May 17, 2023, 12:04:13 AM

Thanks Chris!

This is all new to me and I have been provided with this data, so I am trying to avoid doing something wrong and having to redo it all. Can I please clarify my understanding?:

1. I don't have calibration data, so I can't fit an error model using the uere functions. Therefore I need to perform 'Simultaneous fitting with uncalibrated data' and assign an error value of 10, i.e., GUESS <- ctmm.guess(data[[i]], CTMM=ctmm(error=10), interactive=FALSE) for each individual i. Is this correct?

2. The GPS collar data was collected using Lotek (vast majority), Vectronic and unknown collar brand. I did not import the collar brand information because I don't have calibration data and can't create an error model, therefore I believed this info was irrelevant. Is that correct, or is the collar model information automatically used in subsequent calculations?

3. I have assumed all DOP values to be PDOP and I assigned them to my PDOP column before conversion using as.telemetry because they are ultimately all assigned to the HDOP column with the same class. However, you said in your previous response "If you have different tags/collar models - with some having DOP values and some having PDOP values, then please do not mix these data together before importing. It's very hard to sort out the data for just model tag/collar, given that there is no standard format."

Despite all my data being mixed up together before conversion to a telemetry object, everything has been assigned correct HDOP values and correct class types (i.e. [HDOP] or [NA-HDOP]). I did not include collar model information in my data. Does this mean that everything is fine and should work as intended for aKDE calculations, etc. or is collar brand information required and hence I need to split the data?

4. Six of the 109 individuals used both Lotek and Vectronic collars during monitoring. I put all the location data from both collars together before conversion to a telemetry object and did not include a collar brand column. But you said "If you used different tags/collars on the same individual, then you can merge the telemetry objects with tbind() after importing." Again, is it necessary to split the data per collar brand per individual if not creating calibrated error models based on collar model/brand?

5. If I am meant to add a collar brand column and split my data because ctmm automatically recognises collar type, what is the column meant to be called so it is recognised by ctmm? And should I convert datasets to telemetry objects in turn by converting all individuals using Lotek collars, then all Vectronic individuals, then all unknown collars, then combine telemetry objects for individuals that used more than one collar type and then combine all datasets into one list, then start removing outliers, model fitting, etc. on an amalgamated dataset?

Sorry for all the questions; I am just a bit confused. I have already converted all the data, completed outlier removal and was about to start model fitting, but I obviously don't want to progress any further if I have done something wrong that will impact the aKDE estimates.

I really appreciate your advice.

Many thanks,

Luke. :)

Christen Fleming

May 17, 2023, 1:20:29 PM

Hi Luke,

1. I would supply a prior instead, which is also covered in the vignette. Supplying a prior for the NA-DOP values is trickier. You might consider reassigning them the maximum DOP value, or twice the maximum, or something similar. The default value of 100 is only a placeholder that would be obviated by calibration.
2. If you don't have calibration data, then you don't necessarily need it.
3. This is fine because you won't be applying calibration data.
4. You can try it with one prior covering both collar models. An alternative would be to supply one prior per model, but I doubt that would help much unless their errors are very different.
5. It's not that the model itself is recognized; rather, the structure of the data (DOP, PDOP, etc.) is parsed assuming that only one model is present in the data. If you had calibration data, then you would want to import and calibrate each model separately, then tbind() and go from there.
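Reassigning the missing DOP values before import, as suggested here, can be done in base R. This sketch assumes a hypothetical `collar_model` column and uses toy values; `fill_missing_dop` and its `multiplier` argument (1 for the maximum, 2 for twice the maximum) are illustrative names, not ctmm functions. Because the filling happens before as.telemetry() is called, the filled fixes would be imported with the ordinary HDOP class rather than NA-HDOP.

```r
# Replace missing DOP values with (multiplier x) the maximum observed DOP of the
# same collar model, pooled across all individuals wearing that model
fill_missing_dop <- function(dop, model, multiplier = 1) {
  model_max <- ave(dop, model, FUN = function(x) {
    if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)  # leave all-NA models as NA
  })
  ifelse(is.na(dop), multiplier * model_max, dop)
}

fixes <- data.frame(
  ID           = c("A", "A", "B", "C", "C"),
  collar_model = c("Lotek", "Lotek", "Lotek", "Vectronic", "Vectronic"),
  pdop         = c(2.0, NA, 7.5, 1.1, NA),
  stringsAsFactors = FALSE
)
fixes$pdop_filled <- fill_missing_dop(fixes$pdop, fixes$collar_model, multiplier = 2)
print(fixes)
```

Taking the maximum per collar model rather than per individual matches the follow-up advice in this thread to use the maximum of all datasets from that device model.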

Best,

Chris

Luke Emerson

May 18, 2023, 4:19:12 AM

Thanks Chris. That really helps!

Just a few follow-up questions, sorry.

1. If re-assigning a max PDOP/DOP value to the missing values, should the HDOP column class be changed to [HDOP] rather than [NA-HDOP] (this would happen by default if done before conversion to a telemetry object)?

2. If I have an individual that has both PDOP values and missing values, would you recommend assigning the max PDOP value from that individual's dataset to that individual's missing values, or would you just assign the same maximum value to all individuals and missing values?

3. Mean PDOP/DOP for Lotek collars is 3.8 from 200 000 datapoints and 1.6 for Vectronic collars from 10 000 datapoints. Should that be considered very different for the purposes of assigning a different prior to collar type, or is accounting for this unlikely to make any appreciable difference to the aKDE estimates?

4. I don't know if my fixes are 2D or 3D, would you recommend assigning a prior of 10, or given the uncertainty, should a value of 20 be assigned, or something in between?

5. When assigning a prior do you recommend just using a DOF value of 2 as per the vignette (i.e. UERE$DOF[] <- 2), or is there something more that should guide this value/decision?

Cheers,

Luke.

Christen Fleming

May 18, 2023, 12:44:40 PM

Hi Luke,

1. You would want to do this before importing, so that you don't have to mess with that.
2. I would use the maximum across all datasets from that device model. In fact, this is why I assign a fixed dummy value (of 100) and not the maximum of the imported dataset.
3. It's hard to say from that alone, because the DOP values by themselves don't tell you the size of the errors in meters, and the two collars might not get the same satellite reception due to different antennas, etc.
4. I would center my prior on 10 meters for a generic fix with a DOP value.
5. DOF=2 is pretty loose, but if you can find a similar model (such as in the appendix of the paper I cited earlier), then you can use a tighter prior. Ultimately, the prior should reflect your certainty or uncertainty about the correct value.
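A back-of-the-envelope check of Luke's question 3 uses the standard GPS approximation that RMS location error scales roughly as UERE (in meters) × DOP. With a single prior centered on 10 m, the mean DOPs quoted in question 3 imply errors of roughly 38 m for the Lotek fixes and 16 m for the Vectronic fixes. If one device had a larger UERE (the 20 m below is a made-up value for illustration only), the gap would shrink, which is why the DOP means alone cannot settle whether separate priors are needed. `implied_error` is a hypothetical helper, not a ctmm function.

```r
# Rule of thumb: RMS location error (m) ~ UERE (m) x DOP
implied_error <- function(uere_m, dop) uere_m * dop

# Mean PDOP/DOP per collar model, as quoted in question 3
mean_dop <- c(Lotek = 3.8, Vectronic = 1.6)

# Same prior center (10 m) for both models
same_uere <- implied_error(10, mean_dop)

# Made-up alternative: the Vectronic device has twice the UERE of the Lotek device
diff_uere <- implied_error(c(Lotek = 10, Vectronic = 20), mean_dop)

print(same_uere)
print(diff_uere)
```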