Greetings Tim:
Ah yes, the "spike at zero" challenge. You've presented your analysis according to the standard rules: create candidate models, use goodness-of-fit tests to retain models consistent with the data (those whose significance level exceeds some Type I error rate threshold), and use AIC for model selection.
If this were the full extent of distance sampling analysis, we would code those rules into the software and leave little decision making for the human. But there is more to an analysis than this. The human must also assess the plausibility of the resulting model: plausibility in the sense of whether the detection process could realistically have the shape shown by the fitted model.
In most cases with a surplus of detections at zero distance, the fitted hazard rate key function model fails this plausibility test. What typically occurs in this situation is a precipitous decrease in detectability over a fairly short perpendicular distance range, resulting in an estimated detection probability of, say, 0.2 at distances of 20-40m. The interpretation is that for each animal detected at 30m, there are four more animals at that distance you failed to detect. This often does not pass the sniff test.
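The sniff-test arithmetic can be sketched in a few lines; the function name and the 0.2 value are just for illustration:

```python
# If the fitted detection function gives probability p at some distance,
# then for every animal detected there, (1 - p) / p animals go
# undetected on average.

def missed_per_detected(p):
    """Expected number of undetected animals per detected animal
    at a distance where detection probability is p."""
    return (1 - p) / p

print(missed_per_detected(0.2))  # 4.0: four animals missed for each one seen
```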
A further consequence of fitting hazard rate models that "fit the spike" and fall away very rapidly is that the overall estimated probability of detecting animals within the truncation distance of the transect is also very small. Because this detection probability is in the denominator of the classical distance sampling estimator of abundance or density, the resulting estimates are unrealistically large. Stated more simply: your estimated detection probability is too small, causing your estimated abundance or density to be too large.
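To see how the detection probability in the denominator drives this, here is a minimal sketch of the classical line-transect density estimator, D = n / (2wL p). The numbers are made up purely for illustration:

```python
# Classical line-transect density estimator: n detections within
# truncation distance w (km) on total transect length L (km), divided
# by the covered area 2*w*L scaled by the average detection
# probability p_hat within the strip.

def density_estimate(n, w, L, p_hat):
    """Estimated density (animals per km^2)."""
    return n / (2 * w * L * p_hat)

n, w, L = 60, 0.1, 50  # hypothetical: 60 detections, 100 m truncation, 50 km surveyed
plausible = density_estimate(n, w, L, p_hat=0.6)   # a believable detection probability
spiked    = density_estimate(n, w, L, p_hat=0.15)  # hazard rate chasing the spike
print(plausible, spiked)  # the spiked fit quadruples the density estimate
```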
Here is a passage from Section 5.2 of Buckland et al. (2015). You can substitute "hazard rate" for "negative exponential" in the following:
Often, assumption failure leads to spiked data, for which there are many detections close to the line or point, with a sharp fall-off with distance. For example, in line transect sampling with poor estimation of distance, many detected animals may be recorded
as on the line (zero distance). In such cases, a model selection tool such as AIC might select the negative exponential model because of its spiked shape. However, we should consider whether the negative exponential is a plausible model a priori for a detection
function. If all animals on the line are certain to be detected, it is implausible that many animals just off the line will be missed.
What to do in this situation? The choices include a) redo the fieldwork (as you suggest), or b) use a model that is less consistent with the data but more reliably captures the essence of the detection process you are trying to model, perhaps the half normal key function.
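For comparison, here are the two key functions discussed above, evaluated without any series adjustments. The scale and shape values are arbitrary, chosen only to show the hazard rate's flat shoulder and sharp fall-off versus the half normal's gradual decline:

```python
import math

def half_normal(x, sigma):
    """Half normal key function: exp(-x^2 / (2 sigma^2))."""
    return math.exp(-x**2 / (2 * sigma**2))

def hazard_rate(x, sigma, b):
    """Hazard rate key function: 1 - exp(-(x/sigma)^(-b))."""
    if x == 0:
        return 1.0  # g(0) = 1 by convention
    return 1 - math.exp(-(x / sigma) ** (-b))

# Arbitrary parameters for illustration: a hazard rate with a small
# scale and large shape mimics the spiked fit; the half normal
# declines gradually over the same distances.
for x in (0.0, 10.0, 20.0, 40.0):
    print(x, round(half_normal(x, 25), 3), round(hazard_rate(x, 15, 6), 3))
```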