Modelling PDF/CDF as a response


Michael Taylor

Jul 10, 2022, 10:54:33 PM
to ctmm R user group
Hi Chris,

I am looking to understand the habitat, etc., that several animals might make use of across their home ranges in a marine species where location estimates are relatively rare (e.g. 1-5 a day).

I am looking into RSFs and the rsf.fit() function, but would it be possible to model the probability of an individual being in/using a given cell as a response in relation to the habitat variables recorded in that cell and include a spatial autocorrelation factor (e.g. CDF/PDF ~ habitat.var + (1|id) + matern(1| x + y))? This could give us an estimate of which underlying variables determine the probability of presence in a given area and help identify key habitats etc.

If this were possible/advisable, would it be better to model the PDF or CDF values? From my admittedly limited knowledge, would the PDF be the probability of the individual being in a cell at a given time during the range-resident period, and the CDF be the probability of the individual using that cell at all while it is in that home range?

Cheers,

Mike

p.s. I was exporting rasters and noticed that when I made a mistake and used "df" (or any other text) rather than "DF" there was no error/warning and the function automatically outputs the "CDF". Is there supposed to be a warning if the variables are defined incorrectly?

Christen Fleming

Jul 11, 2022, 12:57:52 PM
to ctmm R user group
Hi Mike,

I know what a Matern covariance spatial field is, but I'm afraid that I don't understand the regression formula or the question. What is the spatial field in this context with a Matern covariance structure? In rsf.fit, the IID location process is taken to be an inhomogeneous Poisson point process. You want to add some correlated spatial error to this model to emulate some missing environmental covariates or something? If that's the case, it might be worthwhile to instead visualize something like a spatial distribution of errors under the assumed model.

As for unmatched arguments, it depends on how the functions are coded for the "..." argument. In general, you shouldn't rely on R to notice any bad arguments because of the way that the special "..." argument works in R. Any unmatched argument in R is lumped in with ..., which is generally used to pass additional arguments to additional function calls. An unmatched argument could get passed through multiple function calls and the last function to receive a bad argument may or may not throw an error, because it could also be coded to potentially pass ... to another function, etc.
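
A minimal base-R sketch of that behaviour, using hypothetical functions (`export_worker`, `export_wrapper` and `strict_worker` are made up for illustration and are not ctmm code):

```r
# Hypothetical low-level function: `DF` has a default and `...` accepts anything else.
export_worker <- function(x, DF = "CDF", ...) {
  DF
}

# Hypothetical wrapper that forwards whatever it does not recognise through `...`.
export_wrapper <- function(x, ...) {
  export_worker(x, ...)
}

res_ok   <- export_wrapper(1, DF = "PDF")  # name matches, so DF = "PDF" is used
res_typo <- export_wrapper(1, df = "PDF")  # `df` matches nothing, lands in `...`, DF keeps its default

# A defensive variant (also hypothetical) that refuses arguments it does not know.
strict_worker <- function(x, DF = "CDF", ...) {
  extra <- names(list(...))
  if (length(extra) > 0) {
    stop("unrecognised argument(s): ", paste(extra, collapse = ", "))
  }
  DF
}
```

R argument names are case-sensitive, so `df` is not matched to `DF` and the call quietly succeeds with the default, whereas `strict_worker(1, df = "PDF")` stops with an error.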

Best,
Chris

Michael Taylor

Jul 12, 2022, 4:06:05 AM
to ctmm R user group
Hi Chris,

Thanks for getting back to me. That makes sense for the unmatched arguments and gives me yet another reason to be careful with my code.

Apologies for the confusion caused by the original question; I am exploring a few ideas at the moment and may be struggling to explain myself properly. Ignoring rsf.fit() for now, my proposal was to extract the CDF and x/y coordinates from AKDE calculations and use them as a response to investigate if/how habitat (or other variables) at an x/y coordinate could be used to predict the probability of an animal using that space (i.e. modelling drivers of the probability of use rather than presence/absence). I think that modelling the CDF as a response would result in highly spatially autocorrelated residuals, given that high probabilities are clustered together near the centre and low probabilities are clustered at the edges. In some GLMM packages (e.g. spaMM or glmmTMB) you can account for this somewhat by including spatially correlated random effects. Above I was proposing a Matérn random effect to include pairwise correlations between pairs of x/y coordinates as part of the model. Given there are multiple animals involved, I would also include individual as a random effect.

I am not sure if this makes it clearer or not, but I was really interested in your view of how appropriate it is to model CDF or PDF as a response in a GLM or similar analysis? I am also confused about what exactly the PDF and CDF values represent.

Sorry for any further confusion.

Best,

Mike

Christen Fleming

Jul 12, 2022, 6:14:16 PM
to ctmm R user group
Hi Mike,

Okay, I think I understand the question now and it sounds a lot like RUFs, where you model KDE PDF ~ environmental predictors, and a spatial autocorrelation term is often included.


The KDE response variable (rather than the data) adds extra noise to the RUF analysis, and the sample size is harder to keep straight, because it isn't the number of KDE cells.
Since the extra noise in the RUF is from a non-local kernel, it can help with location error, but there are also better ways to deal with that by modeling it explicitly in RSF/SSFs. I've seen this done in a marine system, but the author's name escapes me at the moment.

Best,
Chris

Michael Taylor

Jul 13, 2022, 4:34:49 AM
to ctmm R user group
Thank you Chris!!! That looks almost exactly like what I was looking for, I just had not been able to find the name!

According to that paper, I would probably be better off using RSFs as I am using GPS data. I would still be interested in seeing the differences in my results from an RSF or RUF methodology if I can get the latter working.

I am still confused about what exactly the PDF and CDF values give you in an ecological sense. Wouldn't modelling each of them answer a different question?

Thanks again,

Mike

Christen Fleming

Jul 13, 2022, 4:35:35 PM
to ctmm R user group
Hi Mike,

The PDF gives you density for the individual, which is probably why it was chosen for modeling, because density ~ good stuff - bad stuff makes intuitive sense. But the density function is normalized so there needs to be a normalization constant, which is constrained by the other parameters. A Poisson regression takes care of that.
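
A minimal sketch of the normalization point, assuming a toy gridded landscape with simulated counts per cell and a single made-up covariate `habitat`. This is a plain `glm()` illustration, not what rsf.fit() does internally:

```r
set.seed(1)

# Toy landscape: one standardized habitat covariate per grid cell (simulated).
n_cells <- 2500
habitat <- rnorm(n_cells)

# Simulated location counts per cell from a log-linear intensity.
beta_true <- 0.8
counts <- rpois(n_cells, lambda = exp(-1 + beta_true * habitat))

# Poisson regression: the intercept absorbs the overall scale (normalization).
fit <- glm(counts ~ habitat, family = poisson)
beta_hat <- unname(coef(fit)["habitat"])

# Normalized density per cell, computed two ways.
dens_fitted <- fitted(fit) / sum(fitted(fit))                          # uses intercept and slope
dens_slope  <- exp(beta_hat * habitat) / sum(exp(beta_hat * habitat))  # uses the slope only
```

The two normalized surfaces agree because the intercept cancels on normalization, which is one way to see that the normalization constant is not a free parameter of the density.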

The CDF gives you the chance of finding the individual within that coverage area, which is much more abstract, but CDF values are 1:1 with PDF values, so you could do a beta regression on 1-CDF and think of it like a transformation of the response variable. That would not require care with the normalization, but you would still need to get the right sample size.
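
To make the relationship between the two surfaces concrete, here is a minimal sketch that builds a CDF surface from a PDF surface on a toy grid. It assumes the CDF of a cell is the probability mass of all cells at least as dense as that cell, which matches the "coverage area" reading above. The Gaussian surface is a stand-in, not akde() output, and the construction is not taken from ctmm's source:

```r
# Toy density surface on a regular grid (standard bivariate Gaussian).
x <- seq(-4, 4, length.out = 81)
y <- seq(-4, 4, length.out = 81)
dA <- (x[2] - x[1]) * (y[2] - y[1])   # area of one grid cell
PDF <- outer(x, y, function(a, b) dnorm(a) * dnorm(b))

# Probability mass per cell, renormalized so the grid sums to 1.
mass <- PDF * dA
mass <- mass / sum(mass)

# Accumulate mass from the densest cell outwards.
ord <- order(mass, decreasing = TRUE)
CDF <- mass
CDF[ord] <- cumsum(mass[ord])

# Cells with CDF <= 0.95 make up the 95% coverage area.
area95 <- sum(CDF <= 0.95) * dA
```

On a discrete grid, cells with tied densities receive consecutive cumulative values, so the mapping is only 1:1 up to ties. Note also that 1-CDF is essentially 0 for the least dense cell, and a beta regression needs responses strictly between 0 and 1, so boundary values would need some handling.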

Best,
Chris

Jesse Alston

Jul 14, 2022, 3:53:44 AM
to ctmm R user group
Hi Mike,

I believe the paper Chris was referring to earlier about dealing with error is: https://doi.org/10.1890/15-0472.1.

If you are doing a RUF, be sure to use occurrence() rather than akde() to calculate the UD. The UD of an occurrence distribution is the expected value of used habitat for the landscape, while the UD of a range distribution incorporates both use and availability. We will be mentioning this in a preprint that should be out by the end of August.

I'm just going to give a quick plug for rsf.fit() though. Chris has done a lot of work on this function over the past 6-8 months and so it's fairly easy to run rsf.fit() in ctmm and it has a lot of features that are appealing for marine species in particular. It automatically accounts for autocorrelation by incorporating the animal's movement model, it automatically weights each location to account for (correlated) missing data (like if an animal is mostly underwater at feeding locations but spends more time near the surface at [different] resting locations), you can account for hard boundaries by incorporating a boundary raster, you can propagate uncertainty robustly across your study population using meta-analysis, and it's very easy to generate suitability maps once you've run the RSF (including RSF-informed home ranges and occurrence distributions). We're still working on writing all this stuff up, but the tools are there and we are happy to help you use the tools since not many people have used it yet (although there should be 3-5 preprints and papers coming out about this in the next year or so).

Jesse

Michael Taylor

Jul 14, 2022, 4:38:26 AM
to Jesse Alston, ctmm R user group
Thanks for the advice Jesse (and also for the definitions Chris)! I am currently getting stuck into the literature about RSFs and RUFs, it feels like the more I learn the less I know!

That said, it does appear that, thanks to Chris's hard work coding it, rsf.fit() would be a lot easier for me to apply. I am currently reading through your pre-print, and maybe once I have a better understanding of the function and how my questions will fit with the RSFs, I could come back to you to review how I am applying the functions. Would it be better to keep asking questions like that through the forum or by contacting Chris/yourself directly?


Jesse Alston

Jul 14, 2022, 5:18:11 AM
to ctmm R user group
Hi Mike,

That's the way these models go--there are so many options with disparate benefits and drawbacks, it can be difficult to choose among them for any particular application (and then when you do, it can be extremely hard to code up from scratch).

The forum is probably best just to document things for everyone, but feel free to contact us off-forum if your question requires sharing data or something you don't want to be public.

Jesse