Combining Data

Anna Kirby

Jun 23, 2023, 6:13:59 PM

to distance-sampling

Hello,

Apologies for the Friday afternoon email!

I am interested in comparing density estimates for line transect data before and after the emergence of a disease. The "before" data were collected a few years prior to my study, and we followed the same procedures as the previous study to reduce bias between the study periods. The same transects were visited in both study periods, but they were visited fewer times after the disease emerged than before (e.g., "before" study period transects were visited 16-18 times each, whereas in my study they were visited only 5-8 times each). I would like to combine the two datasets and use study period as a covariate in my detection function, but I know that comes with the assumption that both datasets have the same detection function. I have a few questions:

1. Is it appropriate to model each dataset separately to determine which detection function best fits individually, and if the top model according to AIC is the same (e.g., hr or hn), combine the data into one dataset?

2. In some cases the number of observations at the same site dropped sharply between the study periods. For example, one transect had 140 observations in the "before" study period, but only 10 after the disease emerged. If the datasets are combined, assuming the same detection function, is it still appropriate to include this transect in the analysis?

Thank you for any insight!

Anna

Stephen Buckland

Jun 24, 2023, 2:31:31 AM

to Anna Kirby, distance-sampling

Anna, by including study period as a covariate, you assume the same detection function model, but allow the scale parameter to differ between the 2 periods. You can compare that AIC with the sum of the 2 AICs from doing 2 independent analyses, and also with the AIC obtained without including study period as a covariate. Presumably you’ll want to compare densities before and after, and that’s a simpler analysis if you do 2 independent analyses; see the distance sampling books. You should certainly retain all transects unless there was, for example, a major change of habitat between the 2 periods, that is, a known cause other than disease.

Steve Buckland

Stephen T. Buckland

CREEM, The Observatory, Buchanan Gdns, St Andrews KY16 9LZ, Scotland

e-mail st...@st-andrews.ac.uk

To view this discussion on the web visit https://groups.google.com/d/msgid/distance-sampling/c8cc9006-dc2f-48ee-9789-d77822c55e86n%40googlegroups.com.

Anna Kirby

Jun 27, 2023, 6:04:21 PM

to distance-sampling

Hi Steve,

Thank you for your response! Given your advice I've been building and running models for the data separately for each study period as well as combined together to compare AIC. In each of my model sets I have also included covariates such as Region.Label (my transect names, 17 total), vegetation type (factor), shrub cover, herb cover, and moon illumination (all continuous), as well as study.period (factor) for my combined data. I selected models using a stepwise process where I chose the univariate model with the lowest AIC and continued to add covariates until the AIC no longer decreased (I also checked these models against the models with no covariates).
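
The stepwise procedure described above can be sketched in Python. This is a minimal illustration, not the code used in the thread: `forward_select` and `toy_aic` are hypothetical names, the AIC values in `toy_aic` are made up, and the sketch starts from the no-covariate model rather than from the best univariate model.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def forward_select(
    candidates: Iterable[str],
    aic_of: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy forward selection: add the covariate that lowers AIC most,
    and stop as soon as no single addition lowers AIC."""
    selected: FrozenSet[str] = frozenset()
    best_aic = aic_of(selected)  # start from the no-covariate model
    remaining = set(candidates)
    while remaining:
        # Score every one-covariate extension of the current model.
        scored = [(aic_of(selected | {c}), c) for c in sorted(remaining)]
        trial_aic, trial_cov = min(scored)
        if trial_aic >= best_aic:
            break  # no extension improves AIC
        selected = selected | {trial_cov}
        best_aic = trial_aic
        remaining.remove(trial_cov)
    return selected, best_aic


# Toy stand-in for model fitting: made-up AIC values, NOT results from the thread.
def toy_aic(covs: FrozenSet[str]) -> float:
    gains = {"Region.Label": 40.0, "moon": 6.0, "herb": 3.0, "veg": -1.5}
    return 1000.0 - sum(gains[c] for c in covs)


chosen, aic = forward_select(["Region.Label", "moon", "herb", "veg"], toy_aic)
print(sorted(chosen), aic)
```

In a real analysis `aic_of` would fit a detection function with the given covariates and return its AIC; the greedy search only explores one path, so it can miss a better model that a full comparison would find.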

For my Pre study period data the top model for the detection function was the hazard rate key function with Region.Label, moon, herb cover, and shrub cover included (AIC = 9949.484).

In my Post study period the top model for the detection function was the half-normal key function with only Region.Label included as a covariate (AIC = 3467.127).

In the combined data the top model was the hazard rate key function with all covariates included (AIC = 13429.586).

Since the sum of the AICs for the separate study period models (9949.484 + 3467.127 = 13416.611) is less than the AIC of the combined data, I should keep the datasets separate, correct? Looking at options for comparing density estimates of the two study periods with different detection functions, would a t-test be my best option? I reviewed the two-stage models in the Distance Sampling: Methods and Applications book, but these models require only 1 detection function, if I am understanding them correctly.
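
The AIC arithmetic above can be checked with a few lines of Python. The three AIC values are those reported in this post; the variable names are illustrative, and the comparison assumes the separate and combined fits use exactly the same observations and truncation distance.

```python
# AIC values as reported in the post above.
aic_pre = 9949.484        # hazard rate, Pre period only
aic_post = 3467.127       # half-normal, Post period only
aic_combined = 13429.586  # hazard rate, pooled data with all covariates

aic_separate = aic_pre + aic_post
delta = aic_combined - aic_separate

print(f"sum of separate AICs: {aic_separate:.3f}")
print(f"combined minus separate: {delta:.3f}")
preferred = "separate analyses" if aic_separate < aic_combined else "combined analysis"
print("lower AIC:", preferred)
```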

Thank you again for your assistance!

Anna

Eric Rexstad

Jun 28, 2023, 3:03:02 AM

to Anna Kirby, distance-sampling

Anna

Let me jump into this conversation; Steve may have a view as well.

Before getting to the density estimate comparison, I encourage you to look closely at the model diagnostics associated with your models that retained a high number of covariates. How many parameters are being estimated in the pre-study model; how many parameters in your combined data model? Convergence might be an issue with parameter-rich models. Check that the standard errors of the detection function model parameters are sensible as well as the standard errors of the density estimates.
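
As a rough guide to the parameter counts asked about above, the sketch below tallies detection function parameters in Python. It assumes treatment-coded factors (a factor with k levels contributes k - 1 coefficients), no adjustment terms, one parameter for the half-normal key and two for the hazard rate key; `n_params` is a hypothetical helper, and the actual counts should be read from the fitted model summaries.

```python
KEY_PARAMS = {"hn": 1, "hr": 2}  # half-normal: scale; hazard rate: scale + shape


def n_params(key: str, factor_levels=(), n_continuous: int = 0) -> int:
    """Count detection-function parameters, assuming treatment-coded factors
    (k levels -> k - 1 coefficients) and no adjustment terms."""
    factor_coefs = sum(k - 1 for k in factor_levels)
    return KEY_PARAMS[key] + factor_coefs + n_continuous


# Pre model: hazard rate + Region.Label (17 transects) + moon, herb, shrub.
pre = n_params("hr", factor_levels=[17], n_continuous=3)
# Post model: half-normal + Region.Label only.
post = n_params("hn", factor_levels=[17])
print(pre, post)  # 21 17
```

The combined model is left out because the number of vegetation-type levels is not given in the thread.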

You want to make sure the density estimates you are comparing make sense before moving to the inferential step.

Stephen Buckland

Jun 28, 2023, 8:40:38 AM

to Anna Kirby, distance-sampling

Anna, if you do independent analyses, then you can use a t-test or a large sample z-test. See for example p85 of the 2001 distance sampling book.
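
For two independent density estimates, the usual large-sample z statistic is the difference divided by the square root of the summed squared standard errors. The Python sketch below shows that calculation; `z_test_independent` is a hypothetical helper and the densities and standard errors are made-up numbers, not estimates from this thread.

```python
import math


def z_test_independent(d1: float, se1: float, d2: float, se2: float):
    """Large-sample z-test for the difference of two independent estimates.
    Returns the z statistic and a two-sided p-value from the standard normal."""
    z = (d1 - d2) / math.sqrt(se1**2 + se2**2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return z, p


# Hypothetical values for illustration only.
z, p = z_test_independent(d1=2.0, se1=0.3, d2=1.0, se2=0.4)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

The independence assumption matters here: if the two periods shared a detection function, the estimates would be correlated and this simple variance sum would no longer apply.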

Adding to Eric’s comments, I would try to figure out why AIC suggests Region.Label should be in your model. That’s adding a lot of parameters to be estimated. Can transect lines be grouped into homogeneous groups, I wonder? Why does detectability vary a lot among transects? You may be able to model the variation you’re seeing with fewer parameters.

Steve

Anna Kirby

Jun 28, 2023, 12:25:52 PM

to distance-sampling

Thank you Eric and Steve!

Looking at the model outputs of the top Pre and Post study period models, the standard errors for the density estimates in the Pre models seem reasonable to me: they are nearly all less than 1.0 and all less than the estimated density. For the Post models, there are a few transects where the standard error does exceed the density. I suspect this is a result of the small number of observations on those transects (presumably a result of the pathogen's emergence). The detection function parameters, on the other hand, have standard errors larger than the estimates for a number of the transects in both models. On the plus side, the herb cover, shrub cover, and moon illumination covariates produced reasonable estimates and standard errors.
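
The check described above, a standard error that exceeds its estimate, amounts to a coefficient of variation greater than 1. A minimal Python sketch follows; `flag_imprecise` is a hypothetical helper and the transect names and numbers are invented for illustration.

```python
def flag_imprecise(estimates, cv_threshold=1.0):
    """Return (name, CV) pairs whose coefficient of variation,
    SE / |estimate|, exceeds the threshold."""
    flagged = []
    for name, (est, se) in estimates.items():
        cv = float("inf") if est == 0 else se / abs(est)
        if cv > cv_threshold:
            flagged.append((name, round(cv, 2)))
    return flagged


# Hypothetical per-transect (estimate, standard error) pairs.
post_density = {"T01": (0.80, 0.35), "T02": (0.10, 0.14), "T03": (1.20, 0.50)}
print(flag_imprecise(post_density))
```

The same function can be applied to detection function coefficients, although a coefficient near zero will have a large CV even when it is estimated well.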

I suspect the reason AIC suggests including Region.Label in the model is the large area across which the transects are distributed. The intent of the study we're using as our "pre" data was to measure prey availability for golden eagles, so the transects were created within known eagle home ranges, meaning some transects are much farther from others. There is some grouping in where the transects are located across the state, so I can try using this grouping as a covariate rather than Region.Label and see if this improves AIC and the standard errors of the detection function parameters.

Anna

Anna Kirby

Jun 28, 2023, 2:57:41 PM

to distance-sampling

Follow-up after experimenting with grouping transects that are geographically closer together into 3 different groups. In all cases (Pre, Post, and combined Pre/Post data models) AIC still prefers including Region.Label as a covariate over my new Study.Region covariate. The standard errors of the detection function parameters did improve for the top Pre model that included Study.Region, shrub cover, herb cover, and moon illumination as covariates, but its AIC was ~4 points greater than that of the top model that included Region.Label in place of Study.Region. For the Post models, AIC prefers the model with only Region.Label, whose AIC was 9 points less than that of the model with only Study.Region (the model with Study.Region also had a greater AIC than the no-covariate model). However, the standard errors of the density estimates did improve in the Study.Region model. All models produced reasonable estimates of density, and the estimates did not differ much between the model sets with Region.Label vs Study.Region.

In this case is it better to choose the model with fewer parameters but greater AIC to improve standard error?

Thank you,

Anna

Eric Rexstad

Jun 29, 2023, 4:54:25 AM

to Anna Kirby, distance-sampling

Anna

I would not use precision as a model selection tool. Remember the bias-variance trade-off at the heart of AIC and related metrics. If you choose to reduce variance, then you are taking on greater bias in your estimates.
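
One common way to read AIC differences like those reported earlier in the thread (about 4 points for the Pre models, 9 for the Post models) is the relative likelihood exp(-delta/2) of the higher-AIC model. The Python sketch below is only an illustration of that formula, not a method proposed by anyone in the thread; `relative_likelihood` is a hypothetical helper.

```python
import math


def relative_likelihood(delta_aic: float) -> float:
    """Relative likelihood of a model whose AIC is delta_aic above the best model."""
    return math.exp(-delta_aic / 2.0)


# AIC differences as reported in the thread (Study.Region vs Region.Label).
for label, delta in [("Pre models", 4.0), ("Post models", 9.0)]:
    print(f"{label}: delta AIC = {delta:.0f}, "
          f"relative likelihood = {relative_likelihood(delta):.3f}")
```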
