Post stratification on a low density species

Javier García Reynaud

Feb 19, 2021, 7:54:44 PM

to distance-sampling

Hello! It is a pleasure to greet you.

A study of a bird species with a very low population (resplendent quetzal) is being conducted in a protected area in Central America.

Six transects (average length 1200 m) have been surveyed simultaneously on a monthly basis for a year. The preliminary idea for analyzing the data is to pool the observations from all transects and add a stratum column at the Observation level of the data table, then carry out a post-stratification that estimates density as the mean of the stratum estimates weighted by total effort, treating the strata as replicates.
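To make the weighting idea concrete, here is a minimal sketch in base R. The monthly density values are hypothetical placeholders, not survey results; the effort per visit assumes six transects of about 1200 m each, as described above.

```r
# Hypothetical monthly density estimates (birds per ha); placeholders only
density_by_month <- c(0.40, 0.35, 0.02)

# Effort per monthly visit in metres: 6 transects x ~1200 m (assumed equal each month)
effort_by_month <- c(7200, 7200, 7200)

# Effort-weighted mean of the stratum (month) estimates
pooled_density <- weighted.mean(density_by_month, w = effort_by_month)
print(pooled_density)
```

Because the same transects are walked on every visit, the weights are equal and the weighted mean reduces to the simple mean of the monthly estimates.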

My questions are: 1. Is this the correct approach? The length of the transects would be the same for each visit, since the same transects have been covered. 2. Is the attached example of the data table adequate for that purpose? (it should be noted that it is NOT the real data, since part of the final data is currently being tabulated, but an example loosely based on the number of observations per month and the type of rounding done by the observers in the calculation of distances).

The differences between the observations during the reproductive and non-reproductive months are DRAMATIC, and there is a non-reproductive month in which we did not have a single observation, but part of the objectives of the study is to visualize these differences in order to manage the protected area accordingly. Finally, 3. If post-stratification is performed, should all months in which the species was not observed be entered as empty cells?

Many thanks in advance to anyone who can share guidance on these issues.

J.

TableExample.csv

Eric Rexstad

Feb 20, 2021, 6:39:37 AM

to Javier García Reynaud, distance-sampling

Javier

Welcome to the email list.

I think you might be making your analysis too complicated by incorporating post-stratification. What is your objective? If you want to produce monthly estimates of abundance, that seems problematic given your comment that there are months when you have no detections.

I recognise that the example data you provide is not the real data, but you say it is indicative of the number of monthly detections and the inaccurate distance measurements. It is doubtful that you can produce defensible monthly estimates of abundance from this small number of replicate transects. It would have been preferable to have three times the number of transects, each one-third the length, to improve the estimates of encounter rate variability.

You also note the observers round to favoured distances (0, 5, 10, 15, ...), with the spike (30% of all detections) at distance 0 being most problematic.

The detections by month are also challenging:

Apr  Aug  Dec  Feb  Jan  Jun  Mar  March  May  Nov  Oct  Sep
 35    2    2    2    1   34    4      1   40    2    2    2

If detectability of birds changes dramatically between breeding and nonbreeding season, resulting in the difference in number of monthly detections, you'll be hard pressed to produce abundance estimates for the months when there are so few detections.

Recognising the data you have provided is just an example, 109 of 127 detections (35 + 40 + 34) were made in Apr/May/Jun. The remaining 18 detections will have little influence upon the fitted detection function. Perhaps best to use the 109 detections to make an estimate of the number of birds during the breeding season. As I say, there is little to say outside those three months.
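The split between breeding-season and other detections can be tallied directly from the monthly counts listed above. A minimal sketch in base R (the month labels, including the separate "Mar" and "March" entries, are copied as they appear in the example table):

```r
# Monthly detection counts from the example table
detections <- c(Apr = 35, Aug = 2, Dec = 2, Feb = 2, Jan = 1, Jun = 34,
                Mar = 4, March = 1, May = 40, Nov = 2, Oct = 2, Sep = 2)

breeding <- c("Apr", "May", "Jun")

n_total    <- sum(detections)            # all detections
n_breeding <- sum(detections[breeding])  # breeding-season detections
n_other    <- n_total - n_breeding       # everything else

print(c(total = n_total, breeding = n_breeding, other = n_other))
```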

--
You received this message because you are subscribed to the Google Groups "distance-sampling" group.
To unsubscribe from this group and stop receiving emails from it, send an email to distance-sampl...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/distance-sampling/bf680ea2-3591-4337-88a8-334a042e808an%40googlegroups.com.

-- 
Eric Rexstad
Centre for Ecological and Environmental Modelling
University of St Andrews
St Andrews is a charity registered in Scotland SC013532

Javier García Reynaud

Feb 20, 2021, 1:23:07 PM

to distance-sampling

Hi Eric:

Thank you very much for your prompt reply. Seeing how carefully you reviewed the data, I think I was very hasty in sending you my questions without having the actual table of field data. I have been typing in the pending data in the last hours and I am attaching the real data to this communication.

The main objective of the study is to estimate the density of quetzals in the sectors with tourist loads (which is why 5 transects were selected and, I should add, why the selection was non-random). So, although the monthly data are available, the data from the months with very few detections could be discarded, as you note, if they are not useful for estimating the density or number of birds. Thank you very much for that insight.

The question that arises when analyzing your answer is: if only the combined data of the reproductive season were used without labeling them as repetitions, wouldn't Distance interpret them as if all the observations were made at a single moment, and would thereby overestimate the density result? How can the data from the months with the highest number of detections be used without falling into this overestimation of density?

Finally, for clarity, the attached data still includes a column at the observation level stating the month to which each detection corresponds. There is also a final column coded A or B, because each transect was walked as a round trip on every visit: observations labeled A were made when walking the transect first thing in the morning, and those labeled B on the return. Is data B somehow usable, or should it be discarded?

Thank you very much again.

LaTigraPharomachrusMocinno.csv

Eric Rexstad

Feb 21, 2021, 10:39:48 AM

to Javier García Reynaud, distance-sampling

Javier

Thanks for sending along your complete data set. Let me address your question regarding the calculation of sampling effort, then I will make a few additional comments.

Your details regarding the sampling protocol are absolutely essential to properly computing effort. In your most recent message, you note that the transects were traversed "out and back" with detections recorded when travelling in both directions. This means that for each of your monthly surveys, the length of the transects should be doubled because effort was expended both going out and coming back along the transect. Note that the vast majority of the detections (100 vs 22) were on the outbound leg.

Furthermore, if we were to combine the data from the three monthly surveys during the breeding season (April, May, June), we would triple the effort computed previously to account for the transects being visited in three months.

In the R language, here is how I would adjust your original file to exclude non-breeding season detections and to properly compute effort:

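# Read the survey file and assign the column names used in the analysis
quetzal <- read.csv("LaTigraPharomachrusMocinno.csv")
names(quetzal) <- c("TranID", "Sample.Label", "Effort",
                    "distance", "when", "outback")
# Keep only breeding-season detections ("Abr" is April in the Spanish month labels)
quetzal.breed <- quetzal[quetzal$when %in% c("Abr-20", "May-20", "Jun-20"), ]
# Three monthly visits, each walked out and back: transect length x 3 x 2
quetzal.breed$Effort <- quetzal.breed$Effort * 3 * 2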
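As a quick check of the filter and the effort multiplier, here is a minimal sketch on a toy data frame. The two transects, their lengths, the distances and the "Ene-20" label are hypothetical stand-ins, not values from the survey file; only the column names follow the code above.

```r
# Toy stand-in for the survey file (hypothetical values throughout)
toy <- data.frame(
  Sample.Label = c("T1", "T1", "T2", "T2"),
  Effort       = c(1000, 1000, 1500, 1500),   # transect length in metres
  distance     = c(5, 20, 0, 10),
  when         = c("Abr-20", "Ene-20", "May-20", "Jun-20"),
  outback      = c("A", "B", "A", "A")
)

# Same filter and effort adjustment as applied to the real file
toy.breed <- toy[toy$when %in% c("Abr-20", "May-20", "Jun-20"), ]
toy.breed$Effort <- toy.breed$Effort * 3 * 2

# Effort is repeated on every detection row, so take one value per transect
effort_per_transect <- tapply(toy.breed$Effort, toy.breed$Sample.Label, unique)
total_effort <- sum(effort_per_transect)
print(total_effort)
```

Note that subsetting rows in this way would also drop any transect with no breeding-season detections, so the effort for such a transect would need to be retained separately.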

Now, for some general comments.

With a small number of replicate transects, and with those transects being non-randomly placed, there is little you can say about the density of your birds beyond the transects you sampled. Non-random transect placement (and the small number of replicate transects) means you cannot imply that the places you sampled are representative of the places you did not sample. You can conduct this survey again in the 2021 breeding season on these transects and compare the abundance estimates between years within the area covered by your transects. But your inference cannot extend to areas outside your sampled transects.

I performed a quick analysis of the breeding season survey. As you noted, there is severe rounding to favoured distances, but here is the fit of a half-normal detection function to the exact distances:

The fitted half-normal model bends downward only slightly, suggesting about 90% of the birds within 35m of the transect are detected. Do you find that number biologically plausible?

Surprisingly, this half-normal model fitted to the exact distances adequately fits the data (Cramér-von Mises test, P-value = 0.064).

The 122 detections divided by the 0.90 probability of detection results in an estimate of 136 birds in the covered area.
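To show where a figure like 0.90 comes from, here is a minimal base R sketch of the average detection probability under a half-normal detection function, g(x) = exp(-x^2 / (2 sigma^2)), within truncation distance w. The scale parameter is not reported in this thread, so the sketch backs out the sigma that would give 0.90 at w = 35 m purely for illustration; it is not the fitted value.

```r
# Half-normal detection function
g <- function(x, sigma) exp(-x^2 / (2 * sigma^2))

# Average detection probability within truncation distance w:
# the area under g from 0 to w, divided by w
p_avg <- function(sigma, w) {
  integrate(g, lower = 0, upper = w, sigma = sigma)$value / w
}

w <- 35  # truncation distance in metres

# Illustrative only: the sigma that yields an average detection probability of 0.90
sigma_090 <- uniroot(function(s) p_avg(s, w) - 0.90, interval = c(5, 500))$root

# Detections scaled up by detection probability: birds in the covered area
n <- 122
N_covered <- n / 0.90
print(round(N_covered))
```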

Eric Rexstad

Feb 21, 2021, 11:18:01 AM

to Javier García Reynaud, distance-sampling

To complete the story (pressed 'send' too soon)

How would you compute density by hand from the previous calculations? Distance for Windows software will do this if you have your effort recorded properly.

How much area was covered by the survey effort? The maximum distance (truncation distance) for the breeding season is 35 m. The total length of the five transects, adjusted for the six traverses (three months, out and back), is 48840 m. So the area (in square meters) covered by the survey is length x width (on both sides):

2 * 35 * 48840 = 3,418,800 square meters

or 341.88ha

The density of birds in the covered area is 136/341.88 = 0.398 birds per hectare.
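The by-hand calculation above can be written out in a few lines of base R. All numbers come from this thread; the variable names are just for illustration, and the abundance is taken as the rounded figure of 136 birds.

```r
w <- 35              # truncation distance (m)
L <- 48840           # total effort over six traverses (m)
N_covered <- 136     # birds in covered area (122 detections / 0.90, rounded)

area_m2 <- 2 * w * L          # strip extends w on both sides of the line
area_ha <- area_m2 / 10000    # 10,000 square metres per hectare

density_per_ha <- N_covered / area_ha
print(round(density_per_ha, 3))
```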

Javier García Reynaud

Feb 21, 2021, 2:41:25 PM

to distance-sampling

Professor Rexstad,

Thank you very much for your guidance; it has been absolutely essential for a better understanding of this analysis.

Regarding your question about the biological plausibility of such high detectability out to 35 m: the recorded observations consist of both sightings and calls. Although we always tried to minimize distance-estimation error in the field, I believe that the detectability value obtained (90% within the truncation distance) and the rounding are, unfortunately, indicators of considerable imprecision. As you mentioned, it is surprising (happily, should I say?) that an adequately fitting model was obtained, with a CvM P-value just above 0.05. I will try working with the data without the detections from the return leg, keeping only those from the first pass, on the theory that disturbance of the birds may have reduced detections on the second pass.
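For reference, a minimal base R sketch of the encounter rates on each leg, using the counts (100 outbound, 22 return) and the total effort of 48840 m mentioned earlier in the thread. It assumes the outbound and return passes cover the same length, so each leg accounts for half of the total effort.

```r
total_effort_m <- 48840                        # six traverses in total
one_way_effort_km <- total_effort_m / 2 / 1000 # three passes in one direction

n_out  <- 100   # detections on the outbound leg
n_back <- 22    # detections on the return leg

# Encounter rate: detections per km walked
er_out  <- n_out / one_way_effort_km
er_back <- n_back / one_way_effort_km
print(c(outbound = er_out, return = er_back))
```

If only the first pass is kept, the effort multiplier for the three breeding months would be 3 rather than 3 x 2, and the detection function would need to be refitted to the outbound detections alone.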