Random effects in occu(): error in solve.default & error in optim


Gesa von Hirschheydt

Nov 26, 2021, 12:25:12 PM
to unma...@googlegroups.com

Dear unmarked users

 

Together with a colleague, I am simulating detection/nondetection data for a single species and within one season under various combinations of true psi (occupancy probability) and true p (detection probability) for different habitat types.

 

We then want to fit an occupancy model using the function occu() from unmarked. Habitat is a categorical covariate on occupancy, and we would like to fit it as a random-effects factor. For each combination of true psi and true p, we simulate several datasets (5 for the moment, later it will be more). In some simulation replicates the model fitting works well, but in others I get one of the following three errors:

 

1. Error in solve.default(full) :
   system is computationally singular: reciprocal condition number = 1.05845e-46

2. Error in solve.default(full) :
   Lapack routine dgesv: system is exactly singular: U[1,1] = 0

3. Error in optim(starts, fn = tmb_mod$fn, gr = tmb_mod$gr, method = method, :
   gradient in optim evaluated to length 1 not 3

 

I suspect errors 1 and 2 are somehow related and that error 3 refers to an optimization gradient used by TMB to find the MLE…?

 

I have tried to bring the initial values closer to the true values by setting the intercept inits to either 0.25 or 0.75 (depending on whether the true value is below or above 0.5), and by setting the initial value for the sd of the random effect to some value between 0.3 and 1 to avoid extremely small inits for the sd. This did not solve the problem; errors keep occurring. I nevertheless believe the problem is related to the initial values, because whenever I encounter an error and manipulate the inits manually, I eventually find some inits that work for that simulated dataset. But I have not found a single set of inits that works for the entire simulation (i.e. for all combinations of true psi and p values and for all 5-100 simulation replicates of each combination).

 

Does anybody know what it is that leads to these specific error messages and how I can avoid them?

Thank you in advance for any input. The code is below.

 

Best wishes

Gesa

 

 

Code & notes:

Function simOccFac() simulates the dataset. It produces detection histories (1/0) for M sites and J=2 visits, where sites are sampled from 5 different habitats. The simulation of the dataset is repeated if 1) there are no detections at all, 2) the species was detected at all sites during all visits, or 3) not all habitats were sampled.

 

simrep <- 5  # number of simulation replicates
M <- 200     # number of sites
pvals <- seq(0.1, 0.4, 0.1)  # levels of true psi and p to be tested

for(i in 1:length(pvals)){     # Loop over levels of psi
  for(j in 1:length(pvals)){   # Loop over levels of p
    cat(paste('\n*** Doing psi =', pvals[i], 'and p =', pvals[j], '***'))
    for(k in 1:simrep){
      repeat{
        # simulate data set
        dat <- simOccFac(M=M, J=2, mean.occupancy=pvals[i],
                         mean.detection=pvals[j],
                         nHab=5, range.HAB=4)

        # conditions to retain dataset: at least 1 detection;
        # at least 1 site with < 2 detections; all factor levels occur
        if(sum(dat$y) > 0 & sum(dat$y) < 2*M & sum(table(dat$HAB) > 0) == 5){
          break
        }
      }

      # create unmarked dataframe
      umf <- unmarkedFrameOccu(y=dat$y,
                               siteCovs=data.frame(habitat=as.factor(dat$HAB)))

      # provide starting values:
      # occupancy parameters: psi-intercept (mean), sd of random effect
      # detection parameters: p-intercept
      int <- c(qlogis(runif(1)), runif(1),
               qlogis(runif(1)))
      # int <- c(qlogis(ifelse(pvals[i]<0.5, 0.25, 0.75)), runif(1, 0.3, 1),
      #          qlogis(ifelse(pvals[j]<0.5, 0.25, 0.75)))

      # fit model
      fm <- occu(~1 ~ (1|habitat), data=umf, starts=int, control=list(maxit=1000))
    }
  }
}

Ken Kellner

Nov 27, 2021, 12:59:03 PM
to unmarked
I think you have diagnosed this correctly. I've seen all three of these types of optimization errors, and I haven't been able to sort out exactly which conditions lead to each one. I'm not sure it matters, since the solution, as you note, is to examine the data, the initial values, or both.

I've found that with sparse presence/absence datasets, unmarked's TMB engine can have a very hard time getting good estimates for random effects, especially compared with treating the covariate as a fixed effect instead. I suspect there is no single set of initial values that will solve optimization problems in all cases (if there were, we should be using them as the defaults!). Given your simulation inputs, with J=2, you are probably generating some pretty sparse datasets (e.g. when p=0.1 and psi=0.1). It looks like you have some checks to reject certain datasets, which is good. However, realistically, you are probably not going to be able to fit the model with one (or even a handful of) detections, even without the random effect. So you might want to adjust those checks to require more detections. You may also want to check not only whether all levels of HAB occur in the dataset, but also whether some HAB levels have no detections at all, as I suspect that could also cause optimization problems.
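The stricter checks described above could be sketched roughly as follows. This is only a sketch: the threshold `min_dets` is an arbitrary choice, and the structure of `dat$y` (an M x J detection matrix) and `dat$HAB` (habitat code per site) is assumed from the simulation code earlier in the thread.

```r
# Sketch of stricter dataset-retention checks (min_dets is an assumption).
# dat$y: M x J detection matrix; dat$HAB: habitat code for each site.
keep_dataset <- function(dat, M, J, nHab = 5, min_dets = 10){
  total_dets <- sum(dat$y)
  # detections summed within each habitat (NA if a habitat never occurs)
  dets_per_hab <- tapply(rowSums(dat$y), dat$HAB, sum)
  total_dets >= min_dets &&                  # enough detections overall
    total_dets < J * M &&                    # not detected everywhere
    length(unique(dat$HAB)) == nHab &&       # all habitats sampled
    !any(is.na(dets_per_hab)) &&
    all(dets_per_hab > 0)                    # >= 1 detection in every habitat
}
```

This would replace the `if(...) break` condition inside the `repeat` loop.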

Also, you mention psi/p > 0.5, but it doesn't seem like the simulation procedure currently creates any datasets like that; maybe I'm missing something (pvals <- seq(0.1, 0.4, 0.1)).

Ken

Gesa von Hirschheydt

Nov 27, 2021, 4:35:54 PM
to unma...@googlegroups.com
Dear Ken

Thank you for the good idea of imposing even stronger conditions for retaining a dataset; I will try both of your suggestions.
I wonder, however, whether this might introduce some bias into the estimates: if I only retain datasets with > n detections (whether n is 3 or 4 or whatever), this might force the average number of detections to be too high with respect to the true psi/p values (e.g. when psi=0.1 and p=0.1). Similarly, I believe forcing all habitat types to have at least one detection would remove samples where some habitat is highly unfavorable for the species, also creating some sort of bias. Couldn't this happen?
I will try your suggestions in any case. I was also told that the ubms package might be better at dealing with this type of low-detection data; that would be my next option if I cannot get it to work like this.

You are right: the code currently only includes pvals between 0.1 and 0.4, so the statement psi/p > 0.5 does not really apply here. I used these pvals as my try-out settings only. Later (i.e. once the try-out settings run without error), I will use pvals between 0.1 and 0.9 at quite small intervals of 0.02 or 0.05.

Best wishes
Gesa


Ken Kellner

Nov 27, 2021, 5:23:30 PM
to unmarked
Yes, I'd agree that it is not ideal to have to discard simulated datasets, and if you remove additional datasets you risk overestimating the ability of the model to recover parameter estimates (since you are ignoring the most challenging scenarios). Perhaps you could keep all simulated datasets and then keep track of both the accuracy of the estimates (for models you are able to successfully fit) and the proportion of simulations for which you were able to actually get the model to run, and use those two pieces of information together. I guess it depends on the goals of the simulation exercise.

Ken

Gesa von Hirschheydt

Nov 28, 2021, 4:23:30 PM
to unma...@googlegroups.com
Dear Ken

The goal is to compare how the SE decreases with increasing numbers of sites under different covariate settings. I will try to find a way to keep the loop running despite encountering an error, so that it just records the error for that simulation and then continues. I wasn't aware that was possible, but I found some help forums on this online, so I will try. Thank you for all your input!
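In base R, continuing past an error is usually done with tryCatch(). A minimal sketch of this idea, assuming the `umf` and `int` objects from the simulation code earlier in the thread (the way results are stored here is illustrative):

```r
# Sketch: catch fitting errors so the simulation loop keeps running.
# On error, store the error message instead of the fitted model.
fits <- vector("list", simrep)
for(k in 1:simrep){
  fits[[k]] <- tryCatch(
    occu(~1 ~ (1|habitat), data=umf, starts=int, control=list(maxit=1000)),
    error = function(e) conditionMessage(e)
  )
}
# Afterwards, failed replicates hold a character string, successful ones a model:
failed <- sapply(fits, is.character)
```

Tracking `failed` alongside the estimates from the successful fits would also give the "proportion of simulations that ran" information Ken suggested.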
I will get back to you if I make any interesting discoveries regarding the errors.

Best wishes
Gesa


