Different data/obs_table for ds() and dht()

Catarina T. Fonseca

May 8, 2024, 8:52:39 AM

to distance-sampling

Hello,

I am using distance sampling to estimate the abundance of several cetacean species using a line transect survey dataset. My goal is to obtain abundance estimates of each species for three regions (North, Centre and South).

However, I have very few sightings, so I am mostly relying on pooled detection functions.

I pooled the following data:

- sightings from another survey that was carried out under almost the same conditions (same protocol, study area and boat; a few observers participated in both surveys)

- incidental and off-transect sightings (given that the effort off-transect was the same as when we were on-transect)

- species that are expected to have similar detectability (small dolphins, beaked whales,…)

I also tested multiple covariates that may affect detectability and included them only if they improved model fit:

- environmental factors (sea state, cloud cover,…)

- observer

- cluster size

- species

- region (N, C or S)

My current approach (example below) is to fit a detection function to the pooled dataset with ds() and then pass it to dht() with an observation table containing only the on-transect sightings of a single species.

df_hn <- ds(data=jointdata, key="hn", truncation=1.1, adjustment=NULL, convert_units=conversion)

mb_trunc <- subset(mb, distance <= 1.1) # drop sightings beyond the truncation distance

N_df_hn_l <- dht(model=df_hn$ddf,
                 region.table=AreaDf,
                 sample.table=lifeEffDf,
                 obs.table=mb_trunc)

where jointdata is the pooled data of all sightings across surveys, regions and beaked whale species, and mb_trunc contains only the on-transect sightings of a single species at distances less than or equal to the truncation distance.
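
A minimal sketch of how such an observation table could be built and checked before calling dht(), using toy data in base R. The columns `species` and `on_transect` are hypothetical names introduced for illustration; `object`, `Region.Label` and `Sample.Label` are the columns dht() expects in obs.table, and, as far as I understand it, the `object` values are what link each observation back to the data the detection function was fitted to.

```r
# Toy stand-in for the pooled detections (values are made up).
# `species` and `on_transect` are hypothetical column names.
jointdata <- data.frame(
  object       = 1:6,
  distance     = c(0.2, 0.5, 1.3, 0.9, 0.1, 0.7),
  species      = c("mb", "zc", "mb", "mb", "zc", "mb"),
  on_transect  = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE),
  Region.Label = c("N", "N", "C", "C", "S", "C"),
  Sample.Label = c("t1", "t1", "t2", "t2", "t3", "t2")
)

w <- 1.1  # truncation distance used when fitting the detection function

# Keep only on-transect sightings of the target species within the truncation distance
mb_trunc <- subset(
  jointdata,
  species == "mb" & on_transect & distance <= w,
  select = c(object, Region.Label, Sample.Label)
)

# Sanity checks on the observation table
required <- c("object", "Region.Label", "Sample.Label")
stopifnot(all(required %in% names(mb_trunc)))           # columns dht() needs
stopifnot(all(mb_trunc$object %in% jointdata$object))   # every object was in the fitted data
stopifnot(!anyDuplicated(mb_trunc$object))              # no sighting counted twice

mb_trunc
```

The point of the checks is that the subset passed as obs.table should be a strict subset of the detections used in ds(), with matching object IDs and the same truncation.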

dht OUTPUT:

Abundance and density estimates from distance sampling
Variance : R2, N/L

Summary statistics

Region Area CoveredArea Effort n k ER se.ER cv.ER
1 C 7990.444 2357.1526 1071.433 2 5 0.001866659 0.001656072 0.8871851
2 N 13596.382 1991.9152 905.416 3 5 0.003313394 0.001328045 0.4008110
3 S 2800.754 427.0376 194.108 0 3 0.000000000 0.000000000 0.0000000
4 Total 24387.580 4776.1054 2170.957 5 13 0.002458858 0.000000000 0.0000000

Summary for clusters

Abundance:
Region Estimate se cv lcl ucl df
1 C 11.91980 10.62617 0.8914721 1.448883 98.06285 4.077857
2 N 36.00223 14.76858 0.4102128 12.500859 103.68572 4.388206
3 S 0.00000 0.00000 0.0000000 0.000000 0.00000 0.000000
4 Total 47.92203 18.37310 0.3833957 20.459492 112.24720 8.156874

Density:
Region Estimate se cv lcl ucl df
1 C 0.001491756 0.0013298593 0.8914721 0.0001813269 0.012272516 4.077857
2 N 0.002647927 0.0010862137 0.4102128 0.0009194254 0.007625979 4.388206
3 S 0.000000000 0.0000000000 0.0000000 0.0000000000 0.000000000 0.000000
4 Total 0.001965018 0.0007533793 0.3833957 0.0008389308 0.004602638 8.156874

Summary for individuals

Abundance:
Region Estimate se cv lcl ucl df
1 C 53.63909 47.81775 0.8914721 6.519971 441.2828 4.077857
2 N 144.00892 64.65848 0.4489894 45.333788 457.4639 4.320323
3 S 0.00000 0.00000 0.0000000 0.000000 0.0000 0.000000
4 Total 197.64801 81.14837 0.4105701 79.759109 489.7840 8.137832

Density:
Region Estimate se cv lcl ucl df
1 C 0.006712904 0.005984367 0.8914721 0.0008159711 0.05522632 4.077857
2 N 0.010591709 0.004755565 0.4489894 0.0033342538 0.03364600 4.320323
3 S 0.000000000 0.000000000 0.0000000 0.0000000000 0.00000000 0.000000
4 Total 0.008104453 0.003327446 0.4105701 0.0032704806 0.02008334 8.137832

Expected cluster size
Region Expected.S se.Expected.S cv.Expected.S
1 C 4.500000 0.000000 0.0000000
2 N 4.000000 1.054093 0.2635231
3 S 0.000000 0.000000 0.0000000
4 Total 4.124367 0.803449 0.1948054
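
As a rough sanity check, the summary statistics above can be reproduced by hand from the effort, the counts and the 1.1 truncation distance. The sketch below uses only numbers copied from the output; the data frame name `chk` is made up for illustration, and the "implied detection probability" step assumes a single average detection probability (reasonable here because the half-normal model has no covariates).

```r
w <- 1.1  # truncation distance from the ds() call

# Values copied from the dht output above (regions C, N, S)
chk <- data.frame(
  Region  = c("C", "N", "S"),
  Effort  = c(1071.433, 905.416, 194.108),
  n       = c(2, 3, 0),
  D_clust = c(0.001491756, 0.002647927, 0),  # cluster density estimates
  N_clust = c(11.91980, 36.00223, 0),        # cluster abundance estimates
  E_s     = c(4.5, 4.0, 0)                   # expected cluster size
)

# Covered area of a line transect survey: 2 * truncation distance * line length
chk$CoveredArea <- 2 * w * chk$Effort

# Encounter rate: detections per unit of effort
chk$ER <- chk$n / chk$Effort

# Individuals = clusters * expected cluster size
chk$N_ind <- chk$N_clust * chk$E_s

# Implied average detection probability: D = n / (CoveredArea * Pa)
chk$Pa <- ifelse(chk$n > 0, chk$n / (chk$CoveredArea * chk$D_clust), NA)

chk
```

Both regions with detections imply the same average detection probability (about 0.57), which is what one would expect when a single pooled detection function is applied to every stratum.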

However, I saw in other posts that, in your experience, the data used in ds() should be the same as the data used in dht2.

Therefore, my question is whether I can use this approach and, if not, what my alternatives are given my limited number of sightings.

Thank you in advance for your time!

Eric Rexstad

May 8, 2024, 11:01:35 AM

to Catarina T. Fonseca, distance-sampling

Catarina

Thanks for joining the list. You are faced with a challenging situation. From the output you have shared, you are hoping to make inference about the population size in three regions based on five detections of the "mb" species. Based solely upon information from five sightings, you are going to struggle to produce defensible estimates.

Carrying out the analysis you describe, the confidence interval around the number of animals in the central region is (6, 441) and for the northern region the interval is (45, 457). The coefficients of variation (0.89 and 0.45, respectively) also tell you there is little information from those 5 sightings to help you estimate the number of "mb" in your study area.
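
For readers who want to see where such wide intervals come from, here is a small sketch that reproduces the central-region interval from the reported estimate, standard error and degrees of freedom. It assumes a log-normal confidence interval with a t-quantile on the Satterthwaite degrees of freedom, which is my understanding of what dht does; treat it as an illustration rather than a statement of the package internals.

```r
# Individuals, region C, copied from the dht output
N_hat <- 53.63909
se    <- 47.81775
df    <- 4.077857

cv <- se / N_hat  # coefficient of variation, about 0.89

# Log-normal interval: divide and multiply the estimate by C
C   <- exp(qt(0.975, df) * sqrt(log(1 + cv^2)))
lcl <- N_hat / C
ucl <- N_hat * C

round(c(cv = cv, lcl = lcl, ucl = ucl), 2)
```

With a CV near 0.9 and only about 4 degrees of freedom, the multiplier C is larger than 8, so the upper limit ends up more than 60 times the lower limit.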

On to the finer points of your question: you are correct that I have cautioned against using the dht2 function in the manner you have used dht. This is because other writers to the list have encountered errors when doing so, and I suspect there may be something in the depths of the dht2 code that differs from the dht code.

In summary, if you are intent on estimating abundance of "mb" from this survey, I suggest you present the confidence intervals along with the point estimates. Those intervals will indicate to the consumers of your report that there is extreme uncertainty regarding the number of individuals of that species in your study area.



Catarina T. Fonseca

May 9, 2024, 12:04:43 PM

to distance-sampling

Hello Eric,

First, thank you for your response and advice.

I am thinking that maybe I should be a little less "greedy" and simplify my analysis.

I will probably obtain estimates per group of species instead (e.g., small dolphins), by pooling the on-transect sightings of the same species I used for the detection functions. I have already tried this, and it significantly lowers the coefficients of variation. I could also still present the per-species results while stating that they are less reliable.

Another option I may test is to drop the stratification and produce a single estimate for the whole study area. However, I think that even with this approach I still won't have enough sightings to produce defensible estimates for some species.
