using distance sampling to select cutoff point for truncating count data for N-mixture models


markr...@gmail.com

Nov 14, 2016, 2:15:34 PM
to distance-sampling
Hi all,

I am using N-mixture models to estimate site abundance for a grassland bird using a large, two-year data set. While I do have distances for each count, I am interested in using zero-inflated models in unmarked.
The distances to birds in our data set range from 0 to 330 m; however, I'd like to truncate the data to include only the area where detection probability is reasonably high.

I first truncated our data at the 95th percentile of distances, which leaves counts ranging from 0-200 m. Next, I fit half-normal and hazard-rate curves (figure attached). As you can see, detection for the lowest bin (0-25 m) is quite a bit less than 1.0. I realize that if I were to proceed with distance sampling for my analyses, I would want to omit the lower distances in order to meet the assumption that detection is 1.0 at 0 m. 
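In case it helps to see the truncation step concretely, here is a minimal Python sketch of what I did; the distances are simulated stand-ins (not my real data) and sigma is purely illustrative, not a fitted value:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-in for the real 0-330 m detection distances.
distances = rng.uniform(0, 330, size=3000)

# Right-truncate at the 95th percentile of observed distances.
w = np.percentile(distances, 95)
kept = distances[distances <= w]

# Half-normal detection function g(x) = exp(-x^2 / (2 sigma^2));
# sigma is illustrative only, not estimated from data.
sigma = 80.0
def g_halfnormal(x, sigma=sigma):
    return np.exp(-x**2 / (2.0 * sigma**2))

print(f"truncation distance w = {w:.1f} m, {len(kept)} of {len(distances)} kept")
```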

My question is: if I am only using this technique to quantitatively select a cutoff value for my subsequent analyses, is it necessary to discard all the distances on the far left where detection is clearly not 1.0? When I adjust bin sizes, or omit chunks of data, the plot just starts to get noisy and doesn't really provide me with any better information. I also get plots where some distance classes (bins) rise far higher than 1.0 on the y-axis, which I'm not quite sure how to interpret...

Hopefully this post doesn't seem vague. I'd appreciate any thoughts on how and whether to proceed with fitting a detection curve for my purposes. Ideally I would only include data within the range of distances where p is >0.5. Unfortunately, that would mean including only data from ~25-100 m, and discarding >1,000 of 3,000 counts. :/

Thoughts? Thanks in advance!
Best,
Mark 
[Attachment: grsp_det_function_revised.png]

Tiago Marques

Nov 15, 2016, 6:05:18 AM
to distance-sampling
Hi Mark, list,

There are a number of statements in your e-mail that require some feedback, even though this is not an answer to your question. It is really not clear to me why you would want to use an N-mixture model when you have distance sampling data: if the distance sampling assumptions hold, distance sampling should give you abundance/density estimates without much effort. But since N-mixture models are not my thing, I'll leave it to you to work out whether there might be issues in using this data set in that context. Others might be willing to comment further.

But I can tell you a few things that don't seem to add up with your data and your suggested approach. You never mention the words line or point, so I do not know whether you have line or point transects, and the difference is key. This could be good (or bad; it is not really possible to tell from a plot alone!) point transect data with detection functions overlaid on it incorrectly, or it could be line transect data with a severe assumption violation (and detection functions still overlaid incorrectly).

I am assuming this plot was generated by you, not by software like the R package Distance, because the scaling of the histogram bars does not add up with the data: the area of the bars above the fitted line should equal the area of the bars below it. Even then, bars rising above 1 do not really mean much. There have been previous posts on this topic, but in short, you cannot read probabilities from the histogram bars; only the fitted line gives you the probability of detection.
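To make the scaling point concrete, here is a minimal Python sketch (this is not the Distance package's actual code; the counts, bins, and sigma are all invented) of how such plots rescale the histogram so that total bar area matches the area under the fitted curve, which is why a bar can rise above 1 without being a probability:

```python
import numpy as np

# Half-normal detection function; sigma is an invented, illustrative value.
def halfnormal(x, sigma=80.0):
    return np.exp(-x**2 / (2.0 * sigma**2))

w = 200.0                                              # truncation distance (m)
edges = np.arange(0.0, w + 25.0, 25.0)                 # 25 m bin edges
counts = np.array([40, 160, 150, 100, 60, 30, 15, 5])  # made-up bin counts

# Area under the fitted curve on [0, w] (crude Riemann sum).
xs = np.linspace(0.0, w, 2001)
curve_area = np.sum(halfnormal(xs)) * (xs[1] - xs[0])

# Rescale the bars so that total bar area equals the area under the curve.
widths = np.diff(edges)
heights = counts / counts.sum() * curve_area / widths

print(np.round(heights, 2))  # note that some bar heights exceed 1
```

Even though the curve itself never exceeds 1, the rescaled bars can, so bar height carries no probability interpretation on its own.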

Further, you mention the possibility of left-truncating your data, and I would say that is rarely a good idea unless there is a good understanding of why small distances are missing in the first place. A blind spot under a survey plane, say, might be a good reason, but even then it should be used with care. Avoidance movement by birds, the usual culprit for low numbers close by, could lead to extremely biased inferences if left truncation is blindly applied. So, again, I would say left truncation is a bad idea (a gross oversimplification, of course, but one that avoids about 95% of the problems!) unless carefully justified. Forgive the self-promotion of my own work, but you might want to look at this paper for an example of the issues involved in left truncation:

Marques, T. A. 2016. A comment on Horcajada-Sánchez and Barja (2015): a cautionary tale about left truncation and density gradients in distance sampling. Annales Zoologici Fennici 53: 52-54.

Anyway, it seems to me that you need to understand your data before doing any further modelling. Using only data from the range of distances where p > 0.5 seems ad hoc (why not p > 0.42, p > 0.6, p > 0.9, or p > 0.6783?), it is unclear to me how it would help you, and strong assumptions would be needed to use only the data from the 25-100 m range.
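To illustrate how sensitive such a rule is: for a half-normal g(x) = exp(-x^2 / (2 sigma^2)), the distance at which detection falls to any threshold t solves to x = sigma * sqrt(-2 ln t). A quick Python sketch (sigma is illustrative only, not estimated from your data):

```python
import math

sigma = 80.0  # illustrative half-normal scale parameter, not a fitted value

def cutoff(t, sigma=sigma):
    # Distance at which g(x) = exp(-x^2 / (2 sigma^2)) falls to threshold t.
    return sigma * math.sqrt(-2.0 * math.log(t))

for t in (0.42, 0.5, 0.6, 0.9):
    print(f"p > {t}: keep distances up to {cutoff(t):.0f} m")
```

Each equally defensible threshold gives a different retained range, which is exactly why the choice is arbitrary rather than principled.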

hope this helps,

Tiago