Singularity in Lapack routine


Asier Rodríguez Larrinaga

Apr 16, 2025, 7:12:07 AM
to distance-sampling
Hello there.
In an experimental design comparing avian abundance in tapped and control pines, I am trying to fit a detection probability model to each of my points, followed by a GLMM analysis of the number of birds detected, using the probability of detection as an offset (following suggestions by Steve Buckland in https://groups.google.com/g/distance-sampling/c/ZW4RRBC_65o/m/8nHlM7JgBQAJ).
Each point was sampled 16 times, in different months in two years. So, I am trying to fit this model:

ds(data, truncation = 80, transect = "point", dht_group = TRUE,
   formula = ~ as.factor(Year) + Month, key = "hr", adjustment = NULL,
   convert_units = 0.01)

For some of the points it works like a charm, but for others I am getting the following error:

Error in solve.default(m) : Lapack routine dgesv: system is exactly singular: U[3,3] = 0

The error appears after printing the estimated AIC, in case it helps.
I've been doing some trials and now I know the error is produced by the covariate Month. I initially thought it might be caused by a lack of samples within years and months, but checking the number of birds in one of my problematic samples gives the following table:
Year  Month  n_ind
2023      7    195
2023      8    327
2023      9    404
2023     10    100
2023     11     40
2024      3    163
2024      4     75
2024      5    172
2024      6     40
2024      7    110
2024      8    100
2024      9    165
2024     10    115
2024     11     77
The number and distribution of birds at this particular point is even better than for some of the points where I got good fits.
I've been searching for this Lapack error and did not find anything useful. Could someone give me a hint on this issue?
Thank you in advance.
Asier

Eric Rexstad

Apr 16, 2025, 7:22:57 AM
to Asier Rodríguez Larrinaga, distance-sampling
Asier

Just two quick comments as I need to rush off. Provide some additional information and perhaps I can give a more comprehensive answer.

Why are you not treating month as a factor covariate? Treating it as continuous implies the density change is monotonic with month.
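For example, something along these lines (a sketch based on the call in your message, just wrapping Month in as.factor() so each month gets its own parameter rather than a monotonic trend):

library(Distance)

# Month as a factor covariate in the detection function
fit_month_factor <- ds(data, truncation = 80, transect = "point", dht_group = TRUE,
                       formula = ~ as.factor(Year) + as.factor(Month),
                       key = "hr", adjustment = NULL, convert_units = 0.01)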

The impression I have from the comment "for some points it works like a charm" is that you are fitting a detection function model separately for each point? That doesn't seem right; have I misunderstood? Is the table of detections by year and month for your "problematic" sample just for a single point?


Asier Rodríguez Larrinaga

Apr 16, 2025, 8:15:29 AM
to distance-sampling
Thanks a lot for helping with this issue, Eric.
Well, actually I fitted some models to the whole dataset (all points) and tried a quadratic model for month (with +Month +sqMonth), as I would expect abundance to initially increase and then decrease. I thought that model would be better than treating month as categorical (which does not capture the temporal dependency among months well and requires more parameters to be fitted). To my surprise, when analyzing all the points together, the model with Month as a linear term was a little better than the model where the Month-detection relationship was quadratic.
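For reference, the comparison I made was roughly like this (a sketch; I assume here that sqMonth is simply Month squared, added to the flat data file):

data$sqMonth <- data$Month^2   # quadratic term for month

fit_linear <- ds(data, truncation = 80, transect = "point", dht_group = TRUE,
                 formula = ~ as.factor(Year) + Month,
                 key = "hr", adjustment = NULL, convert_units = 0.01)
fit_quadratic <- ds(data, truncation = 80, transect = "point", dht_group = TRUE,
                    formula = ~ as.factor(Year) + Month + sqMonth,
                    key = "hr", adjustment = NULL, convert_units = 0.01)

AIC(fit_linear)      # the linear model had the slightly lower AIC
AIC(fit_quadratic)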
Yes, I am trying to fit the model to just one point, sampled 16 times. In the post I quoted, Steve said: "The issue now is that we don’t have a good error model for the estimated densities at each point, but we do have suitable models for counts – e.g. Poisson or negative binomial. So instead of taking the estimated density as the response, we take the count, and put the terms to convert it to a density estimate onto the right-hand side of the equation, as a so-called offset. This only works if we use a GLM with a log link function - and the offset is actually the log of the terms taken onto the RHS, and is included in the exponent: E(n) = exp(linear predictor + offset)." Maybe I misunderstood something in his suggestion, but from this, my idea was to estimate the detection probability for each point and fit a GLM to the count of birds detected at each point, with the estimated probability as an offset. That does not imply any assumption about the similarity of detection functions among the different points, as a common model with a stratum layer and an additional covariate would. Is it too crazy?
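In case it clarifies what I have in mind, a rough sketch of the count model (counts, Tapped and p_hat are placeholder names; I assume the detection probability for each point has already been extracted from its ds() fit):

# counts: hypothetical data frame with one row per point visit, containing
#   n      = number of birds detected
#   Tapped = tapped vs control point
#   Year, Month
#   p_hat  = detection probability estimated from that point's ds() fit
w <- 80                                      # truncation distance (m)
counts$eff_area <- pi * w^2 * counts$p_hat   # effective area = covered area * P(detection)

# Poisson GLM with log link; the offset carries the detection correction onto the
# right-hand side, so E(n) = exp(linear predictor + log(effective area))
glm_fit <- glm(n ~ Tapped + as.factor(Year) + Month,
               family = poisson(link = "log"),
               offset = log(eff_area),
               data = counts)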
Thanks!!
Asier

Asier Rodríguez Larrinaga

Apr 16, 2025, 8:26:01 AM
to distance-sampling
Oops, I forgot to explain that, yes, the table I provided is for that particular point where the fitting gives me this error. The same model gives no error for other points... and, in fact, I can avoid the error by binning the data into 20 m bins. I'd like to understand, however, why it fails for this particular point and not for others (with frequency tables very similar to this one, or even worse).
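For completeness, the binned analysis I mean is roughly this (a sketch: the same model, adding 20 m cutpoints up to the 80 m truncation):

fit_binned <- ds(data, truncation = 80, cutpoints = seq(0, 80, by = 20),
                 transect = "point", dht_group = TRUE,
                 formula = ~ as.factor(Year) + Month,
                 key = "hr", adjustment = NULL, convert_units = 0.01)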
I read somewhere that this error can appear when the matrix to be inverted is too large for my computer... but I don't understand the mathematical intricacies of the procedure well enough to know which matrix is being inverted and why it could be too large (maybe in this case that possibility can be disregarded).
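As far as I can tell, the message itself just means that solve() was asked to invert an exactly singular matrix; this tiny example reproduces the same error (although I still don't know which matrix is being inverted inside the fitting routine):

m <- matrix(c(1, 2, 1, 2), nrow = 2)   # the two columns are linearly dependent
solve(m)
# Error in solve.default(m) : Lapack routine dgesv: system is exactly singular: U[2,2] = 0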
Best regards
Asier


Eric Rexstad

Apr 16, 2025, 12:46:09 PM
to Asier Rodríguez Larrinaga, distance...@googlegroups.com
Asier

I don't know what ecological question you are trying to answer with your analysis. Analyses based around single points reflect the environment around that small location and are hard to generalise to a broader area. Let's take a step back.

I suggest you read Prof Buckland's original paper on the subject here:
  • Buckland, S. T., Russell, R. E., Dickson, B. G., Saab, V. A., Gorman, D. N., & Block, W. M. (2009). Analysing designed experiments in distance sampling. Journal of Agricultural, Biological, and Environmental Statistics, 14, 432–442. https://doi.org/10.1198/jabes.2009.08030

You should then consult other papers with a similar ecological objective:
  • Carlisle, J., & Chalfoun, A. (2020). The abundance of Greater Sage-Grouse as a proxy for the abundance of sagebrush-associated songbirds in Wyoming, USA. Avian Conservation and Ecology, 15(2). https://doi.org/10.5751/ACE-01702-150216
  • Carlisle, J. D., Chalfoun, A. D., Smith, K. T., & Beck, J. L. (2018). Nontarget effects on songbirds from habitat manipulation for Greater Sage-Grouse: Implications for the umbrella species concept. The Condor, 120(2), 439–455. https://doi.org/10.1650/CONDOR-17-200.1

The underlying statistical challenge with the count model is that, conventionally, the offset (in this case effective area, a quantity derived from the detection probability) is assumed to be a constant. Because it is estimated, the analysis needs to carry the uncertainty in the detection probability into the GLM. That requires something like a bootstrap procedure, so some bespoke code needs to be written to do this.
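Very roughly, the structure is something like the sketch below (all the object and helper names here, obs, counts, fit_detection(), predict_p() and fit_count(), are placeholders rather than the actual code):

B <- 999                         # number of bootstrap replicates
boot_effect <- rep(NA_real_, B)  # treatment coefficient from each replicate

for (b in seq_len(B)) {
  # resample points (the sampling units) with replacement; a real implementation
  # would relabel points that are drawn more than once
  pts <- sample(unique(obs$Point), replace = TRUE)
  obs_b <- do.call(rbind, lapply(pts, function(p) obs[obs$Point == p, ]))
  cnt_b <- do.call(rbind, lapply(pts, function(p) counts[counts$Point == p, ]))

  det_b <- try(fit_detection(obs_b), silent = TRUE)   # refit the detection function
  if (inherits(det_b, "try-error")) next              # skip replicates that fail to fit
  cnt_b$p_hat <- predict_p(det_b, cnt_b)              # placeholder: detection probability per visit
  glm_b <- fit_count(cnt_b)                           # refit the count GLM with the new offset
  boot_effect[b] <- coef(glm_b)["Tapped"]             # placeholder name of the treatment coefficient
}

# percentile interval for the treatment effect, ignoring failed replicates
quantile(boot_effect, c(0.025, 0.975), na.rm = TRUE)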

After you've looked at this work, let me know if you want me to share some ugly code that performs the detection model fitting, followed by fitting the count model, followed by the bootstrap procedure to propagate the uncertainty in the detection probability estimate. The code is not pretty and is fairly unstable, susceptible to crashing (not because of singularities).



Asier Rodríguez Larrinaga

Apr 21, 2025, 2:30:39 AM
to distance-sampling
Thanks a lot for your help, Eric. Yes, I am aware of the limited generalization of the intended analysis, but our hypothesis is bounded to this small space of inference. Our working hypothesis is that resin tapping attracts herbivorous insects to the pines, and we'd like to check whether this, in turn, attracts insectivorous birds to the tapped trees. So our avian surveys were carried out from tapped and untapped groups of trees, and we only want to check bird abundance in their immediate vicinity. It is not the best design to assess the effect of tapping on bird communities (which would indeed need forest stands as replicates), but birds are not the main target of the experiment.
You are right... I should have read all the material from the beginning. I misunderstood Steve's message and left the reading about the GLM bootstrap for later. I now see there is a double bootstrap, and one part of it involves variance estimation of the point densities.
I am still studying the references you suggested, but that script would surely be very helpful. To be honest, I did not get a clear idea of how to implement the double bootstrap strategy from Buckland et al. 2001.
In the meantime, I would like to report on my Lapack issue, which seems to be somewhat independent of my mistaken approach to the analysis, just in case someone with similar issues arrives here.
I did some trials with point densities and made some figures of point distances, and I found very weird distance distributions, which I think could be impossible to fit with a monotonically decreasing detection function (not 100% sure of this, as I only made distance histograms). In most cases I was able to fit usable detection functions by changing the truncation distance; in one case I needed an extreme truncation of 30% of the sample, in others as little as 0.1%. In some cases I needed to analyze the data in bins (and am still looking for a useful threshold)... and in two cases I was not able to fit any HR function. In these trials I only worked with HR without any adjustment (as this was clearly best when analyzing the whole dataset); maybe with some adjustments, or using HN, I would have got a good fit for those two points as well.
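(By changing the truncation I mean something like the sketch below; ds() also accepts the truncation as a percentage of the farthest detections to discard, and swapping key = "hr" for key = "hn" gives the half-normal alternative.)

fit_trunc <- ds(data, truncation = "10%",   # discard the farthest 10% of detections
                transect = "point", dht_group = TRUE,
                formula = ~ as.factor(Year) + Month,
                key = "hr", adjustment = NULL, convert_units = 0.01)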
Best regards,
Asier