Covariate model with missing observations


OlivierD

Jul 11, 2017, 4:30:31 AM
to distance-sampling

Hi all,

I'm curious how to fit a model with a sex covariate to line transect data in which some transects have no observations. Without the covariate, I can keep distance = NA for these transects and everything works fine. But ds() from the R package Distance does not like the model with the covariate (i.e. sex = NA for those empty transects), and I get the "No models could be fitted" error. I considered making the NA in sex explicit (e.g. as "unknown"): the model fits OK, but I am not sure the estimates I get are really what I'm after. Any advice on how I should deal with this situation?

Thanks

Olivier

OlivierD

Jul 13, 2017, 2:37:57 PM
to distance-sampling
As a workaround, it is possible to manually insert both dummy variables (coded 0/1) for sexF and sexM into the data set and use them together. There are then no missing values in the sex covariate, and the model can be fitted again. I don't find that entirely satisfying, but I guess it can do the trick...
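
A minimal sketch of the workaround described above, in base R only. The data frame `obs` and its values are hypothetical; the names `sexF` and `sexM` come from the post. Rows for empty transects (sex = NA) get 0 in both indicator columns, so no NA remains in the covariates.

```r
# Hypothetical data: one row per detection, plus one row for an empty transect (T2)
obs <- data.frame(
  Sample.Label = c("T1", "T1", "T2", "T3"),
  distance     = c(12.5, 40.1, NA, 7.3),
  sex          = c("F", "M", NA, "F"),
  stringsAsFactors = FALSE
)

# 0/1 indicator columns; rows where sex is NA are coded 0 in both
obs$sexF <- as.integer(!is.na(obs$sex) & obs$sex == "F")
obs$sexM <- as.integer(!is.na(obs$sex) & obs$sex == "M")

obs[, c("Sample.Label", "distance", "sexF", "sexM")]
```

Note that on rows with a detection the two indicators always sum to 1, so a single indicator probably carries the same information as the pair when the model also has an intercept.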


jjr...@gmail.com

Jul 13, 2017, 6:09:36 PM
to distance-sampling
Hi Olivier,

I am not on the Distance team but do have a fair bit of experience fitting detection functions with the R packages. My experience matches yours: ds() and ddf() will fail if the value of a detection function covariate is NA.

The problem of missing values is common to many types of models, not just detection functions, and as far as I know the Distance packages do not provide any special magic for dealing with this generic problem. I think you are left with the usual options you have when faced with the "missing data" problem elsewhere. For example, as you mentioned, you could invent a new factor level of "unknown" to go with "male" and "female". Or, prior to fitting detection functions, you could use other covariates or some external data to model or guess the sex of records that don't have it. Or remove sex as a covariate from your detection function. Or fit two detection functions: one with sex for the records that have it, and one without sex for the records that don't. Etc.

These are all solutions you could imagine when faced with missing data in other situations, e.g. when fitting a linear regression model, and I'm sure you have thought of them already. If someone knows of techniques specifically for dealing with missing values for detection functions, I'd be very interested to hear about them!
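
The first of those options, an explicit "unknown" level, could be sketched in base R roughly as follows. The vector `sex` and its values are made up for illustration; whether the resulting estimates are meaningful is a separate question that this snippet does not address.

```r
# Hypothetical sex records, with NA where the sex was not recorded
sex <- c("F", "M", NA, "F", NA)

# Replace NA with an explicit label, then build a factor with a fixed level order
sex_filled <- ifelse(is.na(sex), "unknown", sex)
sex_factor <- factor(sex_filled, levels = c("F", "M", "unknown"))

table(sex_factor)
```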

All the best,

Jason

Tiago Marques

Jul 14, 2017, 7:05:22 AM
to Jason Roberts, distance-sampling
Hi all,
 
I might have missed the point, but Olivier's message says the covariate is missing for transects without observations. If that is the case, then this is not really a case of missing covariates as Jason described it (which is exactly what most folks would consider a case of missing covariate values).

I would not anticipate any issues in fitting models with covariates in that case, because the covariate is not really missing. A covariate can only be missing if there is an observation, and a transect with no observations has, by definition, no observations whose covariate could be missing.

I would assume that using the right data structure for a data set with transects without observations should work. So I am wondering if this is just a bug in Distance or if there's something else that I might be missing?
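
To make the distinction concrete, here is a small base R sketch that separates the two kinds of NA. The column names follow the flat-file layout used by the Distance package (Region.Label, Area, Sample.Label, Effort, distance); the values and the `sex` column are hypothetical.

```r
# Hypothetical flat file: T2 is an empty transect, T3 has a detection of unknown sex
flat <- data.frame(
  Region.Label = "A",
  Area         = 100,
  Sample.Label = c("T1", "T1", "T2", "T3"),
  Effort       = c(5, 5, 4, 6),
  distance     = c(12.5, 40.1, NA, 7.3),
  sex          = c("F", "M", NA, NA),
  stringsAsFactors = FALSE
)

# Empty transect: no detection, so there is nothing for the covariate to describe
empty_transect <- is.na(flat$distance)

# Genuinely missing covariate: a detection exists but its sex was not recorded
truly_missing <- !is.na(flat$distance) & is.na(flat$sex)

flat[truly_missing, ]
```

Only the rows flagged by `truly_missing` correspond to the missing-data problem Jason describes.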

cheers
T

--
You received this message because you are subscribed to the Google Groups "distance-sampling" group.
To unsubscribe from this group and stop receiving emails from it, send an email to distance-sampling+unsubscribe@googlegroups.com.
To post to this group, send email to distance-sampling@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/distance-sampling/47fe3151-ceca-4b72-9885-cbb38f1b43e2%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Eric Rexstad

Jul 17, 2017, 7:49:46 AM
to OlivierD, distance-sampling

Olivier

Sorry for the slow response, I was travelling last week.  I performed an experiment using the "minke" dataset that is included in the Distance package.  That dataset contains several transects with no sightings (distances recorded as NA), but no covariates.  Consequently, I manufactured a covariate to see if I could get `ds()` to use that covariate in the detection function when there were transects without sightings.

Here is my reproducible code:

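library(Distance)
data(minke)
set.seed(1)  # arbitrary seed so the random covariate is reproducible
eric <- minke  # my toy copy
# Manufactured continuous covariate, one value per row (including empty transects)
eric$cov <- runif(n = nrow(eric), min = 0.01, max = 1)
trouble <- ds(data = eric, key = "hn", formula = ~cov)  # half-normal with covariate
real <- ds(data = eric, key = "hn")  # half-normal, no covariate
summary(trouble)
plot(trouble)
summarize_ds_models(real, trouble, output = "plain")
eric.covNA <- eric  # another copy
# Set the covariate to NA wherever the transect has no sighting
eric.covNA$cov <- ifelse(is.na(eric.covNA$distance), NA, eric.covNA$cov)
real.trouble <- ds(data = eric.covNA, key = "hn", formula = ~cov)
summarize_ds_models(real, trouble, real.trouble, output = "plain")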

I produce identical results (see summarize_ds_models() results) whether I set the covariate on empty transects to NA or to random values; the covariate value is ignored when distance=NA (i.e. empty transects).  Note in this toy example, the covariate is continuous rather than discrete as in your case.

I am using these versions of the relevant packages:

other attached packages:
[1] Distance_0.9.7 mrds_2.1.18  

-- 
Eric Rexstad
Research Unit for Wildlife Population Assessment
Centre for Research into Ecological and Environmental Modelling
University of St. Andrews
St. Andrews Scotland KY16 9LZ
+44 (0)1334 461833
The University of St Andrews is a charity registered in Scotland : No SC013532


OlivierD

Jul 31, 2017, 5:19:43 AM
to distance-sampling, olivier....@gmail.com, eric.r...@st-andrews.ac.uk, er...@st-andrews.ac.uk
Sorry for the late return... I was on holidays.
Eric, I reproduced your code with a categorical covariate and the output is indeed as expected. The covariate is ignored when distance is missing (transects travelled with no observation). So no problem here.
At the time of my initial post, I actually (and rather dumbly I must add) missed that I am fitting a "by year" model... which means that some Year / Sex combinations do not have enough observations to properly fit a model. This explains why I cannot fit the models including the sex covariate.
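
A quick way to spot this kind of gap is to cross-tabulate detections by year and sex before fitting. The sketch below uses base R only; the data frame `det`, its values, and the threshold `min_n` are hypothetical and not a recommendation from the Distance package.

```r
# Hypothetical detections (rows with a recorded distance only)
det <- data.frame(
  year = c(2014, 2014, 2014, 2015, 2015, 2016),
  sex  = c("F", "M", "F", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# Number of detections in each Year / Sex cell
counts <- table(det$year, det$sex)
counts

# List the cells with fewer detections than an arbitrary, illustrative threshold
min_n <- 2
cells <- as.data.frame(counts, stringsAsFactors = FALSE)
names(cells) <- c("year", "sex", "n")
thin <- cells[cells$n < min_n, ]
thin
```

Cells listed in `thin` are the combinations most likely to leave a by-year model with too few observations to fit.
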
If anything, this proves that my holidays were useful... Thanks everyone for your input, and apologies for the noise!
Olivier

Eric Rexstad

Jul 31, 2017, 7:44:50 AM
to OlivierD, distance-sampling

Glad you got it sorted Olivier.  Multiple factor covariate models often lead to data gaps that cause problems with optimisation.
