need guidance in dealing with single observations and unidentified species

brother jaymar

Jun 25, 2024, 2:10:06 AM

to distance-sampling

Hello,

I am trying to generate density estimates for each species per site, but the data include single observations of some species, as well as observations identified only to genus, subfamily, or family. I need guidance on how to deal with single observations and unidentified species, using Distance in R. Context below:

I am new to distance sampling. I am tasked with analyzing data I did not collect. I would like to analyze this data in the best way, regardless of what my supervisor wants.

Across two consecutive summers, two observers went out to 15 different sites and conducted line-transect distance sampling. Sampling was done several times per month over 3-4 months each summer. This was repeated recently, so there are three years of data (2008, 2009, 2022).
The 2008-2009 data were analyzed and published by my supervisor; a comparison with the recent data is desired.
Each observation in the dataset is of a single butterfly at a given time, on a given day, at a site, at a certain distance from the transect line, along with other measurements. Most observations are identified to species, but ~15% are identified only to genus, subfamily, or family.
For some species, there are only single observations. The lowest number of observations for a higher taxon is two.
Density estimates for each species, per site are desired by my supervisor.
I am using the R package, Distance.

Due to the presence of rarer species, I understand that I can analyze the data together, and use species as a covariate. But how do I deal with the higher-taxon IDs?

I thought I might try proration, which essentially divides these higher taxon IDs amongst identified species. The relative abundance of each species remains the same after proration.
For the previously published paper, my supervisor says he looked at individual cases, and guessed what species it must have been based on the time of day, whether it looked similar to some other species, and if it made sense by site. He may have used other information - it is not clear. This is not explained in the paper, by the way.
I don't like either of these methods, but think proration introduces less bias.
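For readers unfamiliar with proration, the arithmetic can be sketched as below. This is only an illustration of the idea in Python, not code from the Distance or mads packages; the function name `prorate`, the species labels, and all numbers are hypothetical.

```python
def prorate(species_abund, unid_abund):
    """Split an unidentified group's abundance among the identified species
    in proportion to each species' identified abundance."""
    total = sum(species_abund.values())
    if total == 0:
        raise ValueError("no identified abundance to prorate against")
    return {sp: n + unid_abund * n / total for sp, n in species_abund.items()}


# Hypothetical abundance estimates for three identified species at one site,
# plus an estimate for one unidentified group that could be any of the three.
identified = {"species_a": 60.0, "species_b": 30.0, "species_c": 10.0}
prorated = prorate(identified, unid_abund=20.0)

for sp, n in prorated.items():
    print(sp, round(n, 2))
```

Each species keeps its share of the total (60%, 30%, 10% here), which is the sense in which relative abundance is unchanged after proration.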

Thank you for any guidance you can offer.

- JD

Eric Rexstad

Jun 25, 2024, 10:32:34 AM

to brother jaymar, distance-sampling

JD

Welcome to the group. You have asked a couple of questions, but let's start with the one about uncertain species identification: how do you incorporate detections of uncertain identity into your analyses and, furthermore, properly account for that uncertainty in your measures of precision for species abundance?

It is a fairly difficult problem, but one that has been encountered by other researchers. The proration approach is described in this paper:

Gerrodette, T. and J. Forcada. 2005. Non-recovery of two spotted and spinner dolphin populations in the eastern tropical Pacific Ocean. Marine Ecology Progress Series 291:1-21. DOI: 10.3354/meps291001

There also happens to be an R package that implements the proration method described in the paper above in a distance sampling context. The package description says:

Perform distance sampling analyses on a number of species at once and can account for unidentified sightings. Unidentified sightings refer to sightings which cannot be allocated to a single species but may instead be allocated to a group of species. The abundance of each unidentified group is estimated and then prorated to the species estimates. The multi-analysis engine can also incorporate model and covariate uncertainty. Variance estimation is via a non parametric bootstrap.

The package can be found in this GitHub repository:

https://github.com/DistanceDevelopment/mads


There is a simulated data set within the package that you can examine to understand the input and results.
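To give a feel for why the package description pairs proration with a nonparametric bootstrap, here is a deliberately simplified Python sketch. It is not the mads implementation: a real analysis would refit the detection functions in every replicate, whereas this only resamples transects and repeats the proration. The per-transect numbers, the record layout, and the function names are all hypothetical.

```python
import random
import statistics


def prorated_total(transects, species):
    """Prorated estimate for one species, pooled over a list of transect records."""
    ident = sum(t["identified"].get(species, 0.0) for t in transects)
    all_ident = sum(sum(t["identified"].values()) for t in transects)
    unid = sum(t["unidentified"] for t in transects)
    if all_ident == 0:
        return 0.0
    # The unidentified total is shared out in proportion to identified totals.
    return ident + unid * ident / all_ident


def bootstrap_se(transects, species, n_boot=500, seed=1):
    """Resample transects with replacement; return the SD of the replicates."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        resample = [rng.choice(transects) for _ in transects]
        reps.append(prorated_total(resample, species))
    return statistics.stdev(reps)


# Hypothetical per-transect estimates (not real data).
transects = [
    {"identified": {"sp_a": 5.0, "sp_b": 3.0}, "unidentified": 2.0},
    {"identified": {"sp_a": 2.0, "sp_b": 6.0}, "unidentified": 0.0},
    {"identified": {"sp_a": 8.0, "sp_b": 1.0}, "unidentified": 4.0},
    {"identified": {"sp_a": 3.0, "sp_b": 2.0}, "unidentified": 1.0},
]

point = prorated_total(transects, "sp_a")
se = bootstrap_se(transects, "sp_a")
print(round(point, 2), round(se, 2))
```

Because the proration is repeated inside every replicate, the spread of the replicates reflects uncertainty in the unidentified share as well as in the identified counts.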



brother...@gmail.com

Jun 25, 2024, 1:33:44 PM

to distance-sampling

Hi Eric,

Thank you for your response.

As I noted later in my message, I am aware of proration - I did not say so, but I already had tabs open for those two sources you mentioned!

I suppose I should be more specific. I want to know if the following strategy makes sense:

Subset the data by identified species, and each taxonomic level of unidentified species - four total subsets
Restructure each subset of observations into a nested list, for example:
- Site 1
  - Taxon 1
  - Taxon 4
- Site 2
  - Taxon 2
  - Taxon 4
  - Taxon 9
- Site 3
  - Taxon 2
  - Taxon 3
Now I have four sets of nested lists (identified species, genera, subfamilies, families)
For each list, use a loop to run distance models for each taxon for each site, but using taxon as a covariate because I have single observations for some taxa
Use the abundance estimates for each taxonomic level of unidentified species at each site to prorate abundance estimates for each species at corresponding sites
- For example, adjust each species abundance estimate at Site 1 with the corresponding genus abundance estimate, then with the corresponding subfamily abundance estimate, and lastly with the corresponding family abundance estimate
Compute density estimate for each species at each site
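The proration and density steps of the strategy above can be sketched as follows. This Python sketch covers only the arithmetic, assuming the per-site abundance estimates have already been produced by the distance models. The taxonomy table, site areas, taxon labels, and all numbers are hypothetical, and the rule for an unidentified taxon with no identified member at a site (set it aside as unallocated) is an assumption of the sketch, not something settled in this thread.

```python
LEVELS = ["genus", "subfamily", "family"]  # applied from finest to coarsest

# Hypothetical lookup: which higher taxa each identified species belongs to.
TAXONOMY = {
    "sp_a": {"genus": "G1", "subfamily": "S1", "family": "F1"},
    "sp_b": {"genus": "G1", "subfamily": "S1", "family": "F1"},
    "sp_c": {"genus": "G2", "subfamily": "S2", "family": "F1"},
}

# Hypothetical per-site abundance estimates, nested as site -> taxon.
SITES = {
    "Site 1": {
        "area_ha": 2.0,
        "species": {"sp_a": 40.0, "sp_b": 10.0, "sp_c": 50.0},
        "unidentified": {"genus": {"G1": 10.0}, "family": {"F1": 22.0}},
    },
    "Site 2": {
        "area_ha": 4.0,
        "species": {"sp_c": 20.0},
        "unidentified": {"genus": {"G1": 5.0}},
    },
}


def prorate_site(species_abund, unidentified, taxonomy):
    """Prorate genus, then subfamily, then family estimates onto species."""
    est = dict(species_abund)
    unallocated = {}
    for level in LEVELS:
        for taxon, extra in unidentified.get(level, {}).items():
            members = [sp for sp in est if taxonomy[sp][level] == taxon]
            total = sum(est[sp] for sp in members)
            if total == 0:
                # No identified member of this taxon at the site: set aside.
                unallocated[(level, taxon)] = extra
                continue
            for sp in members:
                # Shares use the estimates as updated by the finer levels.
                est[sp] += extra * est[sp] / total
    return est, unallocated


results = {}
for site, d in SITES.items():
    est, unallocated = prorate_site(d["species"], d["unidentified"], TAXONOMY)
    density = {sp: n / d["area_ha"] for sp, n in est.items()}
    results[site] = {"abundance": est, "density": density, "unallocated": unallocated}

for site, r in results.items():
    print(site, r["density"], r["unallocated"])
```

At Site 2 the unidentified genus has no identified species to be prorated onto, so the sketch reports it separately rather than dropping it; how to handle such cases is a decision the analysis would need to make explicitly.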
