Question about uncertainty estimation in dsm


aschi...@gmail.com

Dec 10, 2025, 2:17:29 PM
to distance-sampling
Hi people at the list

I have a small question about how dsm estimates CVs and reports them.

When the output gives a single point estimate of the CV, it is an order of magnitude smaller than the CVs shown on the map produced by plotting the object generated by dsm_var_gam.

So my question is: how does dsm assess the point estimate of the CV, and why is there such a difference between that estimate and the map?

Eric Rexstad

Dec 11, 2025, 6:08:45 AM
to aschiavini, distance-sampling
Adrian

Rather than address the details of how CVs are calculated, I've contrasted the CV values in a data frame with the CV plot derived from that data frame. I base this on the Gulf of Mexico pantropical spotted dolphin data distributed with dsm.


Running the code for this vignette creates a data frame, cropped_grid, with a field CV. A portion of the data frame is shown below.


The values of CV are on the order of 1 to 3. The resulting plot of that data frame produces this legend:

consistent with the values in the data frame.

Note, when you examine the code, that `dsm_var_gam()` does not return cell-specific CVs; rather, it returns cell-specific variances in the `pred.var` element of the returned object, and these are converted to CVs later in the code.
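The conversion from variance to CV is the usual definition CV = SE / estimate, applied cell by cell. A minimal Python sketch, assuming you already have each cell's predicted abundance and its variance on the same scale (the arrays and the helper `cell_cv` are hypothetical, not part of dsm):

```python
import numpy as np

def cell_cv(abundance, variance):
    """Per-cell coefficient of variation: sqrt(variance) / estimate."""
    return np.sqrt(variance) / abundance

# Hypothetical per-cell predictions and variances (made-up numbers,
# assumed to be on the same scale, e.g. abundance per cell).
cell_abundance = np.array([40.0, 12.0, 3.0, 0.5])
cell_variance = np.array([400.0, 144.0, 36.0, 4.0])

cvs = cell_cv(cell_abundance, cell_variance)
print(cvs.tolist())  # [0.5, 1.0, 2.0, 4.0]
```

Cells with small predicted abundance end up with large CVs even when their variance is modest, which is the pattern seen on the CV map.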

So I cannot reproduce the problem you describe with the example data set. If you wish to send more details off-list, I can have a look.

From: distance...@googlegroups.com <distance...@googlegroups.com> on behalf of aschi...@gmail.com <aschi...@gmail.com>
Sent: 10 December 2025 19:17
To: distance-sampling <distance...@googlegroups.com>
Subject: [distance-sampling] Question about uncertainty estimation in dsm
 

aschi...@gmail.com

Dec 12, 2025, 1:45:46 PM
to distance-sampling
Hi everyone, I’m copying below Eric’s answer to my question. I think it might be useful to someone else.
A reflection upon reading Eric’s answer, and considering what we report to managers who may request results like these:
If I report an overall abundance for the entire study area, I could use the CV reported by summary.dsm.var.
But, if the question is “how many animals are in a portion of the study area?”, the answer would be different. We should not sum the number of animals across a series of cells and then use the overall reported CV. In that case, I think it would be more appropriate to report the CV maps. 
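The point about subregions can be sketched numerically. Assuming you have (or can approximate) the covariance matrix V of the per-cell abundance estimates, the variance of a subregion total is w'Vw for a 0/1 indicator vector w. Everything below (the numbers, the 0.2 correlation and the helper `region_cv`) is hypothetical and ignores detection-function uncertainty. It is generic arithmetic, not the dsm implementation:

```python
import numpy as np

def region_cv(abundance, cov, in_region):
    """CV of the summed abundance over the cells flagged in `in_region`.

    With a 0/1 indicator vector w, Var(sum) = w' V w, which includes the
    covariances between cells as well as their individual variances.
    """
    w = np.asarray(in_region, dtype=float)
    total = w @ abundance
    var_total = w @ cov @ w
    return np.sqrt(var_total) / total

# Made-up estimates for four cells and a made-up covariance matrix built
# from per-cell standard errors and a common correlation of 0.2.
abundance = np.array([40.0, 12.0, 3.0, 0.5])
se = np.array([20.0, 12.0, 6.0, 2.0])
corr = np.full((4, 4), 0.2) + 0.8 * np.eye(4)
cov = np.outer(se, se) * corr

whole_area = region_cv(abundance, cov, [1, 1, 1, 1])
subregion = region_cv(abundance, cov, [0, 0, 1, 1])
print(round(whole_area, 3), round(subregion, 3))
```

With these made-up numbers the two-cell subregion has a far larger CV than the whole area, even though both come from the same covariance matrix, so reusing the overall CV for a subregion total would understate its uncertainty.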
This is something that can be difficult to make understandable for a recipient not versed in statistics.

#############################################################################################

I now understand your original question.

What "summary.dsm.var" prints (which you included) is CV(abundance estimate in the study area) = 0.4146. As the output shows, it uses the delta method, combining the CV associated with detection function uncertainty and the CV from the GAM by summing their squares. The overall abundance estimate for the study area is the sum of the cell-specific abundance estimates, and each of the cell-specific abundance estimates has uncertainty. Summing the variances across all the cells in the prediction grid provides the variance of the estimated total abundance. That variance of the total abundance is the last thing computed by the "dsm_var_gam" function; have a look if you're curious:
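The combination step is the standard first-order delta-method rule for independent multiplicative components. A minimal sketch, assuming the detection-function and GAM components are independent; the component values are made up and are not the ones behind the 0.4146 above:

```python
import math

def delta_method_cv(*component_cvs):
    """Combine CVs of independent multiplicative components.

    First-order (delta method) approximation:
    CV_total**2 = CV_1**2 + CV_2**2 + ...
    """
    return math.sqrt(sum(cv ** 2 for cv in component_cvs))

# Hypothetical components, NOT the values from the output in this thread.
cv_detection = 0.3  # detection function uncertainty
cv_gam = 0.4        # uncertainty from the GAM (spatial model)
print(round(delta_method_cv(cv_detection, cv_gam), 4))  # 0.5
```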


The object created by "dsm_var_gam" contains the cell-specific density estimates along with the cell-specific variances of those estimates. These are eventually used to produce the uncertainty surface for the density surface map (the pretty plot).

Now to the question: why don't the cell-level CVs resemble the uncertainty in the estimated abundance over the whole prediction grid? The reason is twofold. By common sense, we should have less confidence when making a prediction for a particular piece of the ocean (or pampas) than across a much broader area. The more mathematical reason for the much, much greater CV at the cell level is the small magnitude of estimated abundance at that level. Because those small estimated abundances are in the denominator of the CV, the resulting CVs can be quite large (in fact, the CV tends to infinity as estimated abundance goes to zero).
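Both reasons can be illustrated with a toy calculation. Part (1) holds the standard error fixed and shrinks the estimate. Part (2) assumes n independent, identical cells, which is a simplification: real grid cells are spatially correlated, so the actual reduction will generally differ from 1/sqrt(n). The helpers `cv` and `aggregated_cv` are hypothetical:

```python
import math

def cv(estimate, se):
    """Coefficient of variation: standard error divided by the estimate."""
    return se / estimate

# (1) Hold the standard error fixed and shrink the estimate:
#     the CV grows without bound as the estimate approaches zero.
for n_hat in [20.0, 5.0, 1.0, 0.1]:
    print(n_hat, round(cv(n_hat, se=2.0), 2))

# (2) Aggregation, assuming n independent cells each with mean mu and SE s:
#     total = n * mu and Var(total) = n * s**2,
#     so CV(total) = CV(cell) / sqrt(n).
def aggregated_cv(n_cells, mu, s):
    return math.sqrt(n_cells * s ** 2) / (n_cells * mu)

print(aggregated_cv(1, 20.0, 20.0), aggregated_cv(100, 20.0, 20.0))
```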

Take the Gulf of Mexico example. From the output you provided, the estimated abundance in the study area is 27084. This comes from estimates made for 1374 prediction grid cells. We can therefore deduce that, on average, the abundance in each grid cell is about 20 individuals. We have more information when aggregating across the entire study area, so we are likely to have greater confidence in our estimates at the study-area level than at the grid-cell level.
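The per-cell average is a simple division of the two figures quoted above; a quick check in Python:

```python
total_abundance = 27084  # study-area estimate quoted in the thread
n_cells = 1374           # number of prediction grid cells quoted in the thread

mean_per_cell = total_abundance / n_cells
print(round(mean_per_cell, 1))  # 19.7
```

An average of roughly 20 animals per cell, with many cells well below that, is what puts small numbers in the denominator of the cell-level CVs.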