dht2 error - zero detections in one replicate

Shanti Davis

Feb 12, 2024, 7:55:13 PM

to distance-sampling

Hello,

We are analyzing a multi-species marine bird dataset collected in 4 seasons over 3 years. We are using dht2 to estimate the abundance of a subset of the observed species in each season. Our data are formatted as a flatfile, and segments without detections are represented by rows that include Sample.Label and Effort, with NA for Species, size and distance.
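For readers unfamiliar with this layout, here is a minimal sketch of such a flatfile in base R. All values, the species code "MAMU", and the Region.Label and Area entries are made up for illustration; they are not taken from the dataset discussed in this thread.

```r
# Hypothetical flatfile: one row per detection, plus one row per surveyed
# segment without detections (NA for Species, size and distance).
qrt_data <- data.frame(
  Region.Label = "StudyArea",                    # made-up stratum name
  Area         = 100,                            # made-up area
  Year         = c(2020, 2020, 2021, 2021, 2022),
  Sample.Label = c("s1", "s2", "s3", "s3", "s4"),
  Effort       = c(2.0, 1.5, 2.0, 2.0, 1.8),
  Species      = c(NA, NA, "MAMU", "MAMU", "MAMU"),
  size         = c(NA, NA, 2, 1, 3),
  distance     = c(NA, NA, 35, 80, 12)
)

# Segments that were surveyed but produced no detections
empty_segments <- qrt_data[is.na(qrt_data$distance), ]
print(empty_segments[, c("Year", "Sample.Label", "Effort")])
```

In this toy example both segments surveyed in 2020 carry effort but no detections, which mirrors the situation described later in this message.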

This is the call to dht2:

qrt_dht2 <- dht2(ddf = df,                          # detection function using data for all seasons
                 flatfile = qrt_data,               # filtered for species and season
                 strat_formula = ~Year,
                 stratification = "replicate",
                 convert_units = conversion.factor,
                 er_est = er_est_choice,            # S2
                 sample_fraction = 0.5,
                 innes = FALSE)

We are not having problems with common species. Our issue occurs when trying to estimate abundance per season for less common species, specifically when that species has zero detections in one of the years. For example, in the “FebMar” season, we did not have any observations for Marbled Murrelets in 2020, with 34 observations in 2021, and 42 in 2022. We recognize that these sample sizes are low.

This is the error we are getting:

Error in `$<-`:
! Assigned data `diag(dm$variance)` must be compatible with existing data.
✖ Existing data has 3 rows.
✖ Assigned data has 2 rows.
ℹ Only vectors of size 1 are recycled.
Caused by error in `vectbl_recycle_rhs_rows()`:
! Can't recycle input of size 2 to size 3.
Backtrace:
  1. Distance::dht2(...)
  5. Distance::dht2(...)
  6. base::lapply(ddf, varNhat, data = res)
  7. Distance (local) FUN(X[[i]], ...)
  9. tibble:::`$<-.tbl_df`(`*tmp*`, "df_var", value = `<dbl>`)
 10. tibble:::tbl_subassign(...)
 11. tibble:::vectbl_recycle_rhs_rows(value, fast_nrow(xo), i_arg = NULL, value_arg, call)

It seems that dht2 is expecting 3 replicate years (2020, 2021, 2022), but there are no observations in 2020. There are still rows in the dataset for 2020, representing the segments (samples) surveyed.
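A quick way to confirm which replicate years have surveyed segments but no detections is to count non-NA distances per year. This is only a sketch in base R on made-up data; the column names Year, Effort and distance are assumed to match the flatfile described above.

```r
# Hypothetical data: 2020 has surveyed segments but no detections.
qrt_data <- data.frame(
  Year     = c(2020, 2020, 2021, 2021, 2022, 2022),
  Effort   = c(2.0, 1.5, 2.0, 2.0, 1.8, 1.8),
  distance = c(NA, NA, 35, 80, 12, NA)
)

# Count detections (non-NA distances) per year.
# Years whose rows are all NA still appear, with a count of 0.
n_by_year <- tapply(!is.na(qrt_data$distance), qrt_data$Year, sum)
print(n_by_year)

# Years with effort but zero detections
zero_years <- names(n_by_year)[n_by_year == 0]
print(zero_years)
```

If a year shows up here with a count of 0, it is present as a stratum level in the data even though it contributes no observations.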

An important component of this issue is that we successfully ran dht2 on this same data in early 2022, when only 2 replicate years had been completed (2020 and 2021). At that time, dht2 was able to compute average abundance over the 2 years in that season (with high uncertainty) without throwing an error, so it seems that something has changed since the update to the Distance package in late 2022.

Thanks in advance for your input, happy to share the data off-list for troubleshooting if needed!

- Shanti

Eric Rexstad

Feb 13, 2024, 5:11:23 AM

to shanti.davis, distance-sampling

Shanti

Thanks for your detailed email.

I have a general idea of what you are trying to do, but can you add further details: have you divided your data by species before doing detection function modelling, or does your detection function model pool data across species and seasons? That might be detrimental to your inference, as that modelling approach (which I understand you are performing because of few detections for some species) will only produce unbiased species/season-specific estimates if detectability does not differ by season or species.

Your use case, where you use dht2 to perform estimation with a portion of the data used in fitting the detection function, does not seem correct to me. Nevertheless, you are the second person to describe this use case. I've re-read the documentation for dht2 and I do not see this use case described. In my use of dht2, the same data frame that was sent to ds is the data frame specified by the flatfile argument in dht2. When the same data frame is not used in both cases, that gives rise to the errors you describe.

Without knowing your data set thoroughly, I can't state the following will work, but it is what I would attempt.

Subset the data by species, fit a detection function to those data, and use season as a covariate in the detection function model. Duplicate the season field as the Region.Label field before sending the data frame to ds; this way, the ds output will provide you with season-specific estimates (without the use of dht2). If you want an average abundance across seasons, do that bit manually. The average is a weighted average, using season-specific effort as the weighting factor. Variability between surveys is calculated per formula below:
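As an illustration of the effort-weighted average only (this is not the variance formula mentioned above, which is not shown here), a minimal sketch in base R follows. The season names other than "FebMar", the abundance estimates and the effort values are all invented for the example.

```r
# Hypothetical season-specific abundance estimates and survey effort
season_est <- data.frame(
  season = c("FebMar", "MayJun", "AugSep", "NovDec"),
  Nhat   = c(120, 80, 60, 100),   # made-up abundance estimates
  effort = c(50, 40, 30, 40)      # made-up effort per season
)

# Effort-weighted average abundance across seasons,
# i.e. sum(effort * Nhat) / sum(effort)
avg_N <- weighted.mean(season_est$Nhat, w = season_est$effort)
print(avg_N)
```

Seasons with more effort pull the average toward their estimate, so a season with zero detections still contributes through its effort.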


Shanti Davis

Feb 13, 2024, 6:33:23 PM

to distance-sampling

Hi Eric,

Thanks for the quick reply!

Yes, our detection function modelling was done for each species separately, using data pooled across seasons as well as geographic strata. We now want to estimate abundance per season and per geographic stratum for the data that is already divided by species. We did not include season or strata as covariates in our detection function, since we do not expect those factors to affect detectability; we used a "sightability" covariate to account for environmental conditions (e.g. sea state and wave height), which we consider to capture variability in detections better than season alone.

Our understanding was that we should be using dht2 to compute abundance estimates over the required strata (in our case, replicates of season). This approach currently appears to work for the other species we are analyzing; the error described above only happens when one of the replicate years in the data given to dht2 has no detections (i.e. rare species).

We previously ran an interim analysis (early 2022) on this same data, and dht2 generated a results table with zeroes for that year and was able to calculate a weighted average and variability. We either need to re-think the inclusion of season as a covariate as you suggested, or potentially something has changed in the dht2 function since the update that makes it no longer possible to run our analyses as planned.