RSF with average ranges

583 views
Skip to first unread message

Michael Taylor

unread,
Apr 27, 2023, 4:41:56 AM4/27/23
to ctmm R user group

Hi Chris et al.,

I am working to calculate RSFs for a group of marine turtles (n = 20) tracked for between 4-13 months.

I aim is to understanding how changing coverage of key habitats effects suitability/distribution across a region (plus a few other variables). I have a mix of categorical (e.g. depth) and continuous (e.g. seagrass cover) predictor layers that I am planning to include in the final analysis.

 I have managed to run RSFs on a test dataset, but I have a few queries.

(1) Averaging ranges - Some individuals (~50%) in my population undergo range shifts so I segmented the data and calculated multiple AKDEs. There was an answer somewhere (maybe this one) that suggested that to return the full range of the individual range I need to run mean with weighting. So to calculate the mean AKDE for my sample population prior to running RSF I assume I have to:

a.       First run mean() for each individual that uses multiple ranges and weight this mean by the time spent in in each range

b.       Then run mean() across all individuals and weight by the tracking duration fo each individual.

(2) Rsf vs rsf.select - As I am using a mean AKDE I also think I need to use rsf.select() instead of rsf.fit (mentioned here)

(3) Viewing points used in rsf calculations – is it possible to plot the points that have been used in the rsf calculations? For my understanding of the rsf calculation (and so that I can explain ot my supervisors/collaborators) It would be great to visualise the chosen presences, available locations, and available area.

(4) Plotting responses – based on this response it seems that diagnostic and residual plots are on the cards at some point. In the meantime is there a simple way to extract the required parts of the rsf.fit to plot model outputs from the summary? I haven't done much manual plotting of models and would love to be able to create plots similar to those attached.

Cheers,

Mike

ExamplePlots.png

Christen Fleming

unread,
Apr 27, 2023, 3:26:48 PM4/27/23
to ctmm R user group
Hi Mike,

  1. I have only coded rsf.fit() to work on stationary autocorrelation models, so you would want to estimate two AKDEs, fit two RSF models, and then mean() the two RSF models. I think that should work, as coded, but let me know if there any issues.
  2. Since you would be applying mean() to RSF models, you would want to use rsf.select() or be very careful about not including unsupported model features/parameters.
  3. The integrated RSFs fit by default in ctmm have a Gaussian "availability" model that is estimated. I think you could plot it along with the data, as you would an autocorrelation model. If Monte-Carlo integration is used, then the "available" points are randomly sampled from the Gaussian. If Riemann integration is used, then the whole grid is used, but each point is weighted by the Gaussian. We have a paper in the works that explains the integrated model, and why its better than the conventional two-stage fitting.
  4. I'm currently working on some diagnostic plotting functions that would help compare models and data.
Best,
Chris

Michael Taylor

unread,
May 2, 2023, 10:21:12 PM5/2/23
to ctmm R user group
Hi Chris,

Thanks as always for your help! I have some queries to make sure I am understanding correctly.

1. I think I understand, but just to be sure on the order I should be running rsf, mean, and suitability do you mind providing comment on the example work flow below?

From what I understand, the suggestion is to run an rsf for each individual with a single range and for each individual with two or more range resident period I would run an rsf for each range and take the mean of those to provide a single rsf for that individual? For example in a population with five individuals A-E (A uses two ranges, B-D use a single range, and E uses three) we would therefore run:
  • two rsfs for individual A and run mean to get a single rsf for A
  • one rsf for each of B-D
  • three rsfs  for individual E and run mean to get  single rsf for E 
This would result in one rsf per individual and from there we could either run (a) mean across all individuals (weighted by the duration each individual was range resident) followed by suitability to get the suitability of the average animal or (b) run suitability per individual followed by mean to get the mean suitability for this population. Below is a schematic workflow I have put together to try and help my brain understand the process.
meanRSF_workflow.png
2. rsf.select sounds like the right thing to me for option a, but maybe not for option b?

3. I think visualising the "used" and "available" points that rsf selects would be useful for understanding how the models work and ensuring predictor layers are not too coarse. When you talk about the monte-carlo sampling from within the gaussian does that mean from within the AKDE range (i.e. within the 95% extent)? and then in the reimann is the grid the entire predicted landscape but weighted by the AKDE output? If the later is the case would processing speed be affected by the size of the predictor layers? For example I have 20 individuals within a landscape of ~ 8,500 km², with 95% ranges of each individual being ~2-50 km² and covering ~650 km². Would the monte-carlo method use random available points within the ~650 km² area of the AKDE outputs, and the reimann method select available points from anywhere within the ~8,500 km² area (but weighted by the gaussian)?

4. I am keen to see the diagnostic/residual plots when they are ready or work out a way to visualise models myself in the meantime. Is there a way to isolate the values myself or extract the values from rsf.fit to plot the relationships between predictors and used/available?

Do you have a pre-print explaining how the rsf models are created? I have read the alston 2023 paper, but am a little confused about the underlying models used in rsf fit/select and the methods that could be used to plot/explain results. The amt rsf methods (and most others) seem to use a wrapper of stats::glm with   binomial(link = "logit")  is this similar in ctmm? 

Thanks again and sorry for the multitude of questions!
Cheers,

Mike

Christen Fleming

unread,
May 3, 2023, 4:10:58 PM5/3/23
to ctmm R user group
Hi Mike,

When averaging over all individuals, you would not want to weight by tracking duration. The weights in the per-individual averaging are serving as estimates of behavioral frequencies, just in case the selection behaviors differ.

Otherwise, (a) is what you want. I haven't coded suitability() to accept a list of RSFs.

3. The AKDE is not used to define the available area. This would be similar to the conventional procedure, but that has many disadvantages. When the contours of an "available area" are fit to the data, then they take explanatory power from the selection parameters, and their uncertainty is not propagated forward. ctmm uses an integrated RSF model, where there are Gaussian parameters fit to the data simultaneous with the selection parameters. The Gaussian parameters define a smooth "available area".
The Riemann integration code is smart enough not to use the entire raster if it's not necessary.

4. It's complicated, because you have to estimate the probabilities with and without selection and then contrast the two to estimate what the selection actually looks like.

5. When using binomial(), you have a model of presences and absences, but you don't have absence data and substitute them with pseudo-absences from the defined available area. ctmm uses an inhomogeneous Poisson point process model (iPPP) which is a model of presences that is much more appropriate for presence-only data. In addition to Alston et al. (2023), other ecological references would be the references therein to weighted-Poisson logistic regression in the SDM literature, and various papers out of Mevin Hooten's group on iPPPs.

Best,
Chris

Michael Taylor

unread,
May 4, 2023, 9:43:11 PM5/4/23
to ctmm R user group
Thanks for the clarification on workflow and weighting Chris! Its also good to learn a little more about how the available points are chosen and the models behind your RSFs. Time for me to delve a bit deeper into the literature I think!

Michael Taylor

unread,
Jul 17, 2023, 6:05:52 AM7/17/23
to ctmm R user group

Hi Chris,
I am back at this after some time on a different project. I am trying to get my code for RSFs to run and am coming up against some problems.

 
The main issue is with memory, similar to this thread, which as well as causing errors takes a long time. I have 42 sites to iterate over and running an rsf for the first site alone takes >1 hour and then runs into memory issues. The resolution of the predictors is quite high (10m) so aggregating to ~100m might be one option, but it would be nice to get it running at this resolution if possible.
I have cropped the predictor layers to the extent of each UD to try and help speed things up (but could I just project the layers to the ctmm projection to limit the function reprojecting everything?)
Integrator = Reimann is a little bit quicker ( I think), but still pretty slow.
I think define formulas would minimise additional calculations/dredging. But, I am unsure how these should be formatted? Also, when defining formulas can I test multiple within the same function or would I have to repeat the rsf calculation for each additional formulae I want to test?

For example with the below, poorly formatted, formulas could I run rsf.select(…,formula = FORM) or would it have to be rsf.select(…,formula = FORM[[i]]) in a loop?
FORM <- c("~ H.EXTRA", “H.EXTRA + Bathy", “H.EXTRA + H.EXTRA:Bathy")

I am also interested in using an offset to prevent predictions from being on land (land should be NA in my predictor layers anyway. I have an layer called landoffset where land is 1 and water is 0, but I am unsure where or how I should add this.

I am predominately using terra and then changing to raster at the end so that may contribute towards my issues. I will try and get my code into a reproducible format soon if that helps, but in the meantime any advice on how to speed up processing would be appreciated.


Thanks again,

Mike

Christen Fleming

unread,
Jul 18, 2023, 4:14:45 AM7/18/23
to ctmm R user group
Hi Mike,
  • Riemann integration should be faster on a stationary model, and shouldn't run out of RAM if it works, but it needs the rasters to be in a consistent lattice and the resolution cannot be too coarse.
  • There is a max.mem argument that you can set for Monte-Carlo integration, so that the calculation can finish without an OOM error. The final numerical precision estimate is stored in the fit object.
  • I don't think that cropping should speed anything up, unless the full raster is too big to keep in RAM without cropping, in which case that might help some.
  • Formula objects just look like ~ trees + land_use and rsf.select() will build up and then pair down the different predictors to try and find the best combination.
  • Try adding  + offset(land) to your formula. This is not as strenuously tested, so tell me if it doesn't work.
  • I will be switching from raster to terra in the next year, but I don't think that will be making much difference, because I think the two packages are basically calling the same functions at this point.
Best,
Chris

Michael Taylor

unread,
Jul 23, 2023, 2:27:07 AM7/23/23
to ctmm R user group

Hi Chris,

 

  • Reimann appears to be working faster, but you are right that my starting raster is not being kept in RAM so that is causing delays. This might also be due to my loading the files and then re-projecting them to lat/lon forcing r to create temporary files due to there size, but I think they are just too large. If I can stick with Reimann hopefully the maxmem argument should not be an issue.

I might see if loading the files and not manipulating them works to keep them in RAM. Otherwise I could try to crop them to a smaller area covering all ranges or stick with cropping them to the extent of each UD as I calculate my loop. Would cropping to the extent of the UD cause issues if some categorical values are not in that area and I want to average across sites?


  • I am also now getting an error with Reimann " Error in NEW[[i]] : attempt to select less than one element in get1index "

It seems like the i value is Null or <1, but I am not sure where in the source code this issue is occurring or how to fix it. I am running R 4.3.1, and tested this with ctmm_1.1.1 and the most recent github version.

I have what should be a reproducible script and data in this drive folder to see if you can recreate the error. Apologies for the size of the files, when I have a moment I will recreate it with smaller pred rasters.

 

Cheers,

 

Mike

Christen Fleming

unread,
Jul 26, 2023, 5:43:03 AM7/26/23
to ctmm R user group
Hi Mike

I had to change some code for rsf.select() to work with categorical variables. It is updated on Github now. rsf.select() does not select among the different categories, but selects the entire set as a whole, as there are too many ways to nonsensically combine categories with an automated selection algorithm.

Categorical variables not in an area shouldn't matter, as by default the reference category is set automatically But I haven't yet coded mean() to work with different reference categories (but can quickly if someone needs it).

In your code, formulas are not character objects like "~ x + y" but objects created by the tilde operator like ~ x + y without quotes.

Best,
Chris

Chad Wilhite

unread,
Aug 20, 2023, 9:40:29 PM8/20/23
to ctmm R user group
Hi all,

I working on fitting an iRSF with an interaction between a categorical raster (land cover) and a time dependent covariate (sunlight from annotate) and when I run  rsf.select I encounter the same error as Mike, Error in NEW[[i]] : attempt to select less than one element in get1index. It seems that I have no trouble fitting iRSF's with interactions between continuous raster covariates and etiher continuous or categorical time dependent covariates. Anyways, I haven't been able to solve it yet so I was wondering if you Mike or Chris (or anyone) have found a solution and might be able to help me out? See below for my session info and a reproducible example.

Cheers,
Chad

###### Reprex:
require(sf)
require(terra)
require(tidyverse)
require(ctmm)

data(buffalo)
DATA <- buffalo$Cilla
GUESS <- ctmm.guess(DATA, interactive = FALSE)
FIT <- ctmm.select(DATA, GUESS)
UD <- akde(DATA, FIT)

tst_r <- UD %>%
  as.sf %>%
  st_buffer(10000) %>%
  st_bbox %>%
  st_as_sfc %>%
  st_as_sf %>%
  rast(resolution = 100)

nr <- nrow(tst_r)
nc <- ncol(tst_r)

values(tst_r) <- round(runif(nr * nc, 0, 7))
tst_r <- raster::raster(tst_r)
tst_r_c <- raster::as.factor(tst_r)
r_list <- list(tst_r = tst_r)
r_list_c <- list(tst_r_c = tst_r_c)

DATA <- annotate(DATA)

RSF_c <- rsf.select(DATA, UD, R = r_list_c, formula = ~ tst_r_c + tst_r_c:sunlight)
RSF <- rsf.select(DATA, UD, R = r_list, formula = ~tst_r + tst_r:sunlight)
###### END

# session info > sessionInfo() R version 4.3.1 (2023-06-16 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 11 x64 (build 22621) Matrix products: default locale: [1] LC_COLLATE=Spanish_Spain.utf8 LC_CTYPE=Spanish_Spain.utf8 LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C [5] LC_TIME=Spanish_Spain.utf8 time zone: America/Los_Angeles tzcode source: internal attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ctmm_1.2.0 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 [9] tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0 terra_1.7-39 sf_1.0-14 loaded via a namespace (and not attached): [1] utf8_1.2.3 generics_0.1.3 class_7.3-22 KernSmooth_2.23-21 stringi_1.7.12 lattice_0.21-8 [7] digest_0.6.31 hms_1.1.3 magrittr_2.0.3 grid_4.3.1 timechange_0.2.0 Matrix_1.5-4.1 [13] e1071_1.7-13 DBI_1.1.3 fansi_1.0.4 scales_1.2.1 numDeriv_2016.8-1.1 codetools_0.2-19 [19] cli_3.6.1 expm_0.999-7 rlang_1.1.1 units_0.8-3 munsell_0.5.0 withr_2.5.0 [25] tools_4.3.1 raster_3.6-23 tzdb_0.4.0 colorspace_2.1-0 vctrs_0.6.3 R6_2.5.1 [31] proxy_0.4-27 lifecycle_1.0.3 classInt_0.4-9 pkgconfig_2.0.3 pillar_1.9.0 gtable_0.3.3 [37] data.table_1.14.8 glue_1.6.2 Rcpp_1.0.10 tidyselect_1.2.0 rstudioapi_0.14 farver_2.1.1 [43] suncalc_0.5.1 compiler_4.3.1 sp_2.0-0

Christen Fleming

unread,
Aug 21, 2023, 4:35:00 PM8/21/23
to ctmm R user group
Hi Chad,

Could you provide me with a Dropbox folder with these objects in it to test?

Best,
Chris

Chad Wilhite

unread,
Aug 22, 2023, 1:36:38 AM8/22/23
to ctmm R user group
Hi Chris,

I wrote the reprex with the buffalo data from the package and a randomly generated raster, let me know if you still want me to send you my workspace?
Cheers,
Chad

Christen Fleming

unread,
Aug 28, 2023, 1:59:13 PM8/28/23
to ctmm R user group
Hi Chad,

Thank you for the ReprEx. This issue should be fixed on Github now.

Best,
Chris

Chad Wilhite

unread,
Aug 29, 2023, 4:22:15 PM8/29/23
to ctmm R user group
The new get.terms did the trick, thank you Chris!
~Chad
Reply all
Reply to author
Forward
0 new messages