RSF select memory, timing and formula issues

Lorenzo Frangini

unread,

Jan 7, 2026, 6:48:37 AMJan 7

to ctmm R user group

Hi Chris and all,

I'm back with some questions about iRSF for which I didn't find answer in the group.

I have telemetry data on jackals: some of them living in areas with low anthropization, others in areas with high levels of anthopization.

My aim is to compare the selection coefficents for some covariate between jackals living in 'natural' areas and those in 'anthropic' areas. The idea is to run rsf.select() for each individual, and then to run mean() in order to have average selection coefficients for 'natural' and 'anthropic' areas.

Some details about my data:

- GPS schedules are irregular between day and night (i.e., more locations at night) for most individuals

- some jackals dispersed or were nomadic, therefore I already selected range residency periods for each individuals (i.e., some jackals have more periods)

- raster covariates depict natural and anthropic features: some of them are continuous, other are binary (1=presence, 0= absence). For the latter I did the raster::as.factor() job. (for some jackals, some of the binary rasters are complementary, i.e., their sum covers the ~100% home range of the animal, and I read in a discussion it may represent a problem here)

- I created two boolean (TRUE/FALSE) covariates depicting the day/night time of the day (animal$day and animal$night)

The example code for one animal is:

animal<-annotate(animal)

animal$day<-animal$sunlight>0

animal$night<-animal$sunlight<=0

Rlist<-list(...) #here i create a list of 7 covariates but I plan to have other 3 covariates

rsf_formula <- ~ day:scale(Cov1)+day:scale(Cov2)+day:scale(Cov3)+day:Cov4+day:Cov5+day:Cov6+day:Cov7+night:scale(Cov1)+night:scale(Cov2)+night:scale(Cov3)+night:Cov4+night:Cov5+night:Cov6+night:Cov7 #scaled covariates are those continuous, all the others are binary

RSF<-rsf.select(data=animal,UD=animal_HR,R=Rlist, formula=rsf_formula, trace=T, verbose=T, reference= 0, max.mem="10 Gb")

And now my issues:

1. Basically any time I run the model it returns some warnings about the memory allocation. Therefore, I started with default values form max.mem term, and I increased gradually up to 10 Gb. In this way the warnings decreased, but never ended, but I suppose I cannot furtherly increase due to next issue (I work on a 16Gb ram PC)

2. The computation time is becoming an issue because it takes ~11 hours for some jackals, but for some others it run for more than 3 days and then it crashes (individual with few GPS locations)

I think this happens for two main reasons:

- the formula term is too long as Chris suggested here
- I'm using Montecarlo integrator since I have time-dependent covariates (i.e., day and night)

How can I solve such problems? Should I split in separate day and night RSFs and use Riemann integrator (but if I try to do it it returns an error at some point Error in vapply(1:dim(envir)[1], function(i) { : values must be length 165, but FUN(X[[1]]) result is length 0) )?

If you need more info please tell me.

Best

Lorenzo

Christen Fleming

unread,

Jan 12, 2026, 3:15:40 PMJan 12

to ctmm R user group

Hi Lorenzo,

If multiple binary variables constitute a single categorical variable, then you need to exclude a reference category from the formula and that needs to be a variable that is observed in the data. Otherwise, I don't know what you mean by complementary.

Another speedup that you could do would be to increase the numerical error from 1% to a larger value. The numerical variance of the log-likelihood should be recorded, so if there is some ambiguity in model selection from that error, then you would know and could make further adjustments.

Best,

Chris

Lorenzo Frangini

unread,

Jan 23, 2026, 10:22:16 AMJan 23

to ctmm R user group

Hi Chris,

sorry if I answer late but I didn't receive the notification of your response. I try to summarize some questions

1. Simpler and several models instead of a single and complicated one

Since I noticed only now that you answered to me, I started to do as follow.

I'm trying to run less complicated models first grouping the covariates based on their nature (e.g., one model with 3 binary variables, one model with 3 continuous variables), and within them doing separate models for day and night. Then, I would average among individuals to have a population estimates. Just to provide a simple example:

#binary variables are forest, grassland and cropland

RSF_day_binary_animal1<-rsf.select(...,formula=~day:forest+day:grass+day:crop,...,integrator="Montecarlo")

RSF_day_binary_animal2<-rsf.select(...,formula=~day:forest+day:grass+day:crop,...,integrator="Montecarlo")

RSF_night_binary_animal1<-rsf.select(...,formula=~night:forest+night:grass+night:crop,...,integrator="Montecarlo")
RSF_night_binary_animal2<-rsf.select(...,formula=~night:forest+night:grass+night:crop,...,integrator="Montecarlo")

population_day_binary<-mean(RSF_day_binary_animal1,RSF_day_binary_animal2)

population_night_binary<-mean(RSF_night_binary_animal1, RSF_night_binary_animal2)

Then I do the same for continuous variables and at the end I compare day an night estimates.

Actually, I'm working like this to cope with memory allocation and R crashing problems. It is still time consuming (from 7 to 24 hours per animal), but in most cases it works without warnings.

Therefore, my question is: does it make sense doing a workaround like this?

2. Complementary variables

To anwer to your question, I attatch figure 1: the three binary variables do not perfectly constitute a single categorical variable. Each binary is colored with a distinct colour (the 1 value), meanwhile the 0 value is transparet for all of them. White areas are areas where none of the three binary variables is present (i.e., is 0 for all of them).

As you can see, for some animals' home ranges they almost correspond to a single categorical variable (see sergio HR in the upper side of fig 1), but for other animals this is not the case (see isa HR in the lower side of fig 1).

So, is it faster to keep all of them as separate binary variables, or is it better to combine as a single categorical variable?
If the single categorical variable is a better way, should I create a fourth level in the categorical layer representing the white areas in fig 1 (level 1= forest, level 2= grass, level 3= crop, level 4= other) and use it as a reference layer?

3. Increase numerical error

Ok, thank you.

Just to better understand: if I already run some models with 1% error, and for slower models I increase the error (e.g., 5%), are 1% and 5% models comparable? And what do you mean with "adjustments"? Sorry for such questions, they may be stupid but working with the error argument is completely new to me.

Thank you again as always for the great support

Best

Lorenzo

Fig1.jpg

Christen Fleming

unread,

Feb 9, 2026, 10:20:35 PMFeb 9

to ctmm R user group

Hi Lorenzo,

1,3) The numerical error should be recorded in the model fit object. You can see what's in there with names(FIT). If the AIC/BIC of the top models is to within numerical error, then you could decrease the numerical error back down. I will put some code to help facilitate that on the TODO list, as that's what I would recommend for speeding things up. Even increasing the numerical error from 1% to 2% should make the computation 4x faster.

2) Those models should be equivalent.

I finally understood your question about the error with splitting your data. I'm guessing you get that error because the AKDE weights come from the unsplit dataset, so there are a different number of weights than the number of locations.