Model selection and interpreting summary data

Luke Emerson

Dec 1, 2016, 2:42:00 AM

to distance-sampling

Hi Everyone,

Please forgive my ignorance as I am very much a novice when it comes to using R and performing distance sampling analysis.

Firstly, can anyone help me with model selection when all AIC values are very similar? Should the model with the lowest AIC always be selected if goodness of fit is similarly good for all models? When plotting and selecting the best model, do the width of the shoulder and the value at the endpoint of the fitted line also come into consideration?

These are the AIC values I obtained from my data (357 observations in total).

AIC by model and truncation level:

Model                             Truncation 0%   Truncation 5%   Truncation 10%
Half-normal cosine                3154.2          2872.3          2646
Half-normal hermite polynomial    3154.2          2872.3          2646
Uniform cosine                    3152            2871.6          2644.4
Hazard-rate simple polynomial     3163.7          2873.3          2646.8

The following plots of my data are all truncated at 5% and use these models: hazard-rate with simple polynomial, half-normal cosine and uniform cosine. Based on this information, if I decide to go with a 5% truncated model, should the uniform cosine model be selected because it has the lowest AIC value?

Secondly, how do I interpret some of the summary data values? What does Average P mean and how should the associated average P values be interpreted? I have included an example of the summary data for one of the models below.

Any help is much appreciated.
Many thanks, Luke.

Uniform cosine - 5% truncation summary data

Number of observations :  339
Distance range         :  0 - 76

Model : Uniform key function with cosine adjustment term of order 1

Strict monotonicity constraints were enforced.
AIC   : 2871.622

Detection function parameters
Scale coefficient(s): NULL

Adjustment term coefficient(s):
              estimate         se
cos, order 1 0.6084322 0.06430927

                       Estimate          SE         CV
Average p             0.6217235  0.02485811 0.03998258
N in covered region 545.2585052 28.40826471 0.05210054

Summary statistics:
          Region   Area CoveredArea Effort   n  k       ER    se.ER    cv.ER
1 greater_glider 4.0128      4.0128   26.4 339 15 12.84091 1.352816 0.105352

Density:
  Label Estimate       se        cv      lcl      ucl       df
1 Total 135.8798 15.31146 0.1126839 107.3464 171.9976 18.30757

Attached plots: Glider plot haz 5%.png, Glider plot hncos 5%.jpg, Glider plot unifcos 5%.jpg

Eric Rexstad

Dec 9, 2016, 7:21:25 AM

to Luke Emerson, distance-sampling

Luke

Apologies for the delayed response.

A couple of general comments about model selection in relation to distance sampling. For data collected conscientiously (that is, with a reasonable shoulder, so the drop-off in detections is not too sudden; no evidence of rounding of distances to favoured values; good sample sizes), there is usually little difference in fit between the half-normal and hazard-rate key function models. The histograms you sent along suggest your data are reasonably good.

You note that the difference in AIC values among the models you fitted is small with modest truncation. Conventional wisdom suggests there is little evidence to support one model over another if the difference in AIC between them is less than 2 (ΔAIC < 2). That is the case for all of your models at 5% truncation, and for all but the hazard-rate model (ΔAIC = 2.4) at 10%. Your decision process then moves to the absolute measures of fit (goodness-of-fit tests); if they indicate good fit for the candidate models, you are still unconstrained in choosing between them. The uniform key with cosine adjustment is also known as the Fourier series model, and it is known to perform well. You will notice that the shapes of the fitted uniform cosine and the fitted half-normal cosine are nearly identical, particularly at small distances. This suggests that the inference drawn from both models will be nearly identical.
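The AIC comparison above can be checked by hand. The following is a minimal sketch in plain Python (not the Distance package itself) that computes ΔAIC within each truncation level from the values Luke posted; the dictionary keys and the helper name `delta_aic` are illustrative only. AIC is only compared within a truncation level here, because each truncation level is fitted to a different subset of the data.

```python
# AIC values copied from the table in the original post.
# Keys are shorthand labels chosen for this sketch, not package identifiers.
aic = {
    "0%":  {"hn_cos": 3154.2, "hn_herm": 3154.2, "unif_cos": 3152.0, "hr_poly": 3163.7},
    "5%":  {"hn_cos": 2872.3, "hn_herm": 2872.3, "unif_cos": 2871.6, "hr_poly": 2873.3},
    "10%": {"hn_cos": 2646.0, "hn_herm": 2646.0, "unif_cos": 2644.4, "hr_poly": 2646.8},
}


def delta_aic(models):
    """Return each model's AIC minus the smallest AIC in the same set."""
    best = min(models.values())
    return {name: round(value - best, 1) for name, value in models.items()}


# Compare models only within a truncation level: the data differ between levels.
deltas = {trunc: delta_aic(models) for trunc, models in aic.items()}

for trunc, d in deltas.items():
    print(trunc, d)
```

Models whose ΔAIC is below about 2 are the ones for which the rule of thumb says AIC alone gives little reason to prefer one over another.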

Moral of the story: when you have decent data, several models will fit those data equally well, and model selection is not a cause for concern.

Your second question was about the "Average p" reported in the output from the function ds(). This is interpreted as the probability that an individual inside the truncation distance is detected. It is the ratio of the area under the fitted detection function to the area under a rectangle of height 1 extending out to the truncation distance. It is also the quantity by which the number of detections is divided to produce the other quantity in the output, "N in covered region". In your case you had 339 detections after truncation; dividing 339 by 0.622 gives the estimate of "N in covered region" of 545.
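The arithmetic behind these quantities can be reproduced from the posted summary. The sketch below is plain Python, not the Distance package. It assumes the fitted detection function has the form g(x) = (1 + a1·cos(πx/w)) / (1 + a1), that is, a uniform key with one cosine term scaled so that g(0) = 1; that form is an assumption of this sketch, although it is consistent with the reported numbers. All variable names are illustrative.

```python
import math

# Values taken from the posted summary output.
w = 76.0              # truncation distance (upper end of "Distance range")
a1 = 0.6084322        # cosine adjustment coefficient, order 1
n = 339               # detections after 5% truncation
covered_area = 4.0128 # "CoveredArea" from the summary statistics


def g(x):
    """ASSUMED detection function: uniform key + one cosine term, g(0) = 1."""
    return (1.0 + a1 * math.cos(math.pi * x / w)) / (1.0 + a1)


# Area under g(x) on [0, w], approximated with the midpoint rule.
steps = 100_000
dx = w / steps
area_under_g = sum(g((i + 0.5) * dx) for i in range(steps)) * dx

# Average p: area under the curve divided by the area of a rectangle
# of height 1 and width w.
average_p = area_under_g / (w * 1.0)

# Detections divided by average p estimate abundance in the covered region.
n_covered = n / average_p

# Dividing by the covered area gives the density estimate.
density = n_covered / covered_area

print(round(average_p, 4), round(n_covered, 2), round(density, 2))
```

Under that assumed form the cosine term integrates to zero over [0, w], so average p reduces to 1 / (1 + a1), which agrees with the reported 0.6217.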

I hope this is useful.

-- 
Eric Rexstad
Research Unit for Wildlife Population Assessment
Centre for Research into Ecological and Environmental Modelling
University of St. Andrews
St. Andrews Scotland KY16 9LZ
+44 (0)1334 461833
The University of St Andrews is a charity registered in Scotland : No SC013532

Luke Emerson

Dec 12, 2016, 6:22:44 PM

to distance-sampling, lukedaniel...@googlemail.com, eric.r...@st-andrews.ac.uk, er...@st-andrews.ac.uk

Hi Eric,

Thank you so much for taking the time to provide me with an explanation and some advice. It is very much appreciated. It has allowed me to better understand the distance sampling analysis I have performed, and I am now confident in choosing a model (or models) and reporting my results.

Thank you and kind regards,

Luke Emerson.
