Structuring Missing Data


Dan

Nov 28, 2012, 12:37:30 PM
to unma...@googlegroups.com
Hello Group,

I have two-survey, double-observer data and am having difficulty running models because of missing data.

I was under the impression that if an observation-level covariate was missing for a particular site and occasion, that occasion would simply be omitted, even if count data were present. E.g.:

 

Site  y1.1  y1.2  y2.1  y2.2  obscov1.1  obscov1.2  obscov2.1  obscov2.2
1     X     X     X     X     X          X          X          X
2     X     X     X     X     NA         NA         X          X

     - where obscov1.1 is the covariate value for the first survey and first observer

However, this produces the following warning when I run p(obsvar1):
Warning message:
Some observations have been discarded because corresponding covariates were missing.

Thinking that this just notified me that some rows were dropped, I went on to put models in a FitList and received this error:
Error in validityMethod(object) :
Data are not the same among models due to missing covariate values. Consider removing NAs before analysis.
In addition: Warning message:
Some observations have been discarded because corresponding covariates were missing.


If I set the corresponding 'y' values to NA, p(obsvar) works fine, e.g.:

Site  y1.1  y1.2  y2.1  y2.2  obscov1.1  obscov1.2  obscov2.1  obscov2.2
1     X     X     X     X     X          X          X          X
2     NA    NA    X     X     NA         NA         X          X

Although this may work for "obsvar", what if I had a second, third, fourth… obs-level covariate that I actually had data for? Setting the 'y' values to NA would discard those observations even though the other covariates are present. I don't believe this method is correct. For example, if I had 5 obs-level covariates and at each site 1 of the 5 was always missed, I would not be able to model any of the obs-level covariates. This is a bit of an extreme example, but you can see how quickly you would lose a lot of information. Or am I wrong?

So the question is how should this be structured so that I can model multiple observation-level covariates?

Any insight is greatly appreciated.

Cheers,
Dan

Richard Chandler

Nov 29, 2012, 9:13:54 PM
to Unmarked package
Hi Dan,

Can you send a simple reproducible example of the problem? I'm not sure what you mean when you say: "when I run p(obsvar1)".

Richard

Paul J Taillie

Dec 4, 2012, 7:56:32 AM
to unma...@googlegroups.com
Dan,

Unfortunately, I think unmarked is unable to fit detection models unless covariate information is available for EVERY observation. This is why, in one of Richard's examples, he mean-imputes the covariate for a couple of observations with missing data (this could be an option for you).

Best,
Paul

Dan

Dec 7, 2012, 3:18:31 PM
to unma...@googlegroups.com
Thanks for the replies,

I have been in contact with Richard off-list and he provided the following comments:

"Basically, anytime you have a NA in an obsCov, the corresponding "y" value needs to be removed. This is because there is no likelihood contribution for such an observation. The warning message you see is just telling you this. It isn't a problem except when you have covariates with different patterns of missing values. In such a case, you need to ensure that the missing values are consistent among covariates if you want to compare them using AIC. This isn't an unmarked issue, it is an issue related to the use of AIC in general."

In regards to imputing the missing obs-level covariates, Richard advised against this:

"I don't think that imputing (making up) values for the missing observations is a good idea. But if you do it, you should assess how sensitive the results are to different made-up values."

Cheers,
Dan
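
One way to follow the advice quoted above can be sketched in base R. This is only an illustration under stated assumptions: the matrices `y`, `cov_a`, and `cov_b` are made-up data (3 sites by 4 observations), and the idea is simply to set a count to NA wherever any covariate in the candidate model set is missing, so that every model being compared by AIC is fitted to the same observations. The masked counts would then be supplied when building the unmarkedFrame.

```r
# Hypothetical data: 3 sites x 4 observations (2 surveys x 2 observers).
y <- matrix(c(2, 1, 0, 3,
              1, 0, 2, 2,
              0, 1, 1, 0), nrow = 3, byrow = TRUE)
cov_a <- matrix(c(5, 6, 7, 8,
                  NA, NA, 4, 5,
                  3, 4, 5, 6), nrow = 3, byrow = TRUE)
cov_b <- matrix(c(1, 2, 3, 4,
                  1, 2, 3, 4,
                  1, 2, NA, 4), nrow = 3, byrow = TRUE)

# An observation is usable only if every covariate in the candidate set is present.
missing_any <- is.na(cov_a) | is.na(cov_b)

# Mask the counts so each model sees the same set of observations.
y_masked <- y
y_masked[missing_any] <- NA
y_masked
```

The trade-off is the one Dan describes: the more covariates enter the comparison, the more observations are masked. Restricting `missing_any` to the covariates that actually appear in the models being compared keeps the loss as small as possible.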

Pablo García Díaz

Dec 10, 2012, 6:26:47 AM
to unma...@googlegroups.com
Dear people,

I want to write a function that calculates and stores the log-likelihoods of all the models I'm running, but the logLik function in unmarked only lets me calculate the values one at a time (i.e., I cannot do logLik(c(garduna, garduna2, garduna4))).

Any suggestion to get the job done?

Many thanks in advance, best regards

Pablo

Richard Chandler

Dec 10, 2012, 9:22:09 AM
to Unmarked package
Hi Pablo,

Your question isn't really about unmarked, but here is one of many ways to do what you want:

c(logLik(garduna), logLik(garduna2))

Richard
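
For a longer list of models, the same idea can be written with `sapply()`. The sketch below uses `lm()` fits on a built-in dataset as stand-ins so that it runs without unmarked; the model names `m1` and `m2` are hypothetical. It assumes that `logLik()` on an unmarked fit returns a single number per model, as the `c(logLik(...), logLik(...))` call above suggests.

```r
# Stand-in models fitted with lm() on a built-in dataset; the names are hypothetical.
fits <- list(
  m1 = lm(mpg ~ wt, data = mtcars),
  m2 = lm(mpg ~ wt + hp, data = mtcars)
)

# Apply logLik() to each model and collect the results in a named numeric vector.
ll <- sapply(fits, function(m) as.numeric(logLik(m)))
ll
```

Keeping the fits in a named list means the resulting vector carries the model names, which makes it easier to match log-likelihoods back to models later.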

MP

Jul 29, 2014, 3:00:02 PM
to unma...@googlegroups.com
I apologize for resurrecting an old thread, but I have a question that follows this discussion and I thought it better to keep all the info together for future researchers.

I am working in pcount with avian point-count data that include some missing observation-covariate values. I would like to know if there's a way to directly remove rows containing one or more NA values from an object of formal class unmarkedFramePCount.
Ideally, I'd like to scan each row in the umf for NA values, then omit the row when they're located.  In a regular dataframe, I'd use na.omit(), but I'm not sure how best to go about this in a umf. 
I am separately analyzing multiple subsets of data, so I need something I can automate. 

I guess my other option would be to locate and remove the NAs in the initial covariate dataset, make note of their rows, and then remove rows from the initial bird dataset using that list of rows (all before creating a umf)?


(I hope this table stays properly formatted in this post.)

> umf1
Data frame representation of unmarkedFrame object.
       y.1 y.2 y.3 DBT.1 DBT.2 DBT.3 JD.1 JD.2 JD.3 ST.1 ST.2 ST.3   PCTSHRB.1   PCTSHRB.2   PCTSHRB.3 YYYY.1 YYYY.2 YYYY.3
V01881   4   0   1  18.3  27.0  30.6  135  154  179  910  812  816 49.40393519 49.40393519 49.40393519   2011   2011   2011
V01884   1   0   1  28.9  26.0  25.0  157  174  180  800  836  700  0.00000000  0.00000000  0.00000000   2011   2011   2011
V01904   2   0   1  26.1  24.0  29.4  157  174  180  656  703  857          NA          NA          NA   2011   2011   2011
V01906   1   0   1  16.7  30.0  30.0  134  156  177  736  714  818 59.21296296 59.21296296 59.21296296   2011   2011   2011
V02542   1   1   0  15.0  25.0  26.0  134  156  177  614  613  627 20.41087963 20.41087963 20.41087963   2011   2011   2011


-Michael

Richard Chandler

Jul 29, 2014, 3:08:26 PM
to Unmarked package
Hi Michael,

You should be able to use standard bracket indexing methods such as:

umf2 <- umf1[-3,]

or

umf2 <- umf1[c(1,2,4,5),]


Richard
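
To automate the choice of rows across many data subsets, the index can be computed rather than typed in. The sketch below is base R only and rests on assumptions: `wide` is a hypothetical site-by-column table laid out like the printed `umf1` (shortened to a few columns), and count columns are assumed to start with `y.`. It finds the sites with complete covariates and produces the same kind of index used in the bracket examples above.

```r
# Hypothetical site-level table in the same wide layout as the printed umf1.
wide <- data.frame(
  y.1 = c(4, 1, 2, 1, 1),
  y.2 = c(0, 0, 0, 0, 1),
  PCTSHRB.1 = c(49.4, 0, NA, 59.2, 20.4),
  PCTSHRB.2 = c(49.4, 0, NA, 59.2, 20.4),
  row.names = c("V01881", "V01884", "V01904", "V01906", "V02542")
)

# Covariate columns: everything that is not a count column.
cov_cols <- !grepl("^y\\.", names(wide))

# Row numbers of sites with no missing covariate values.
keep <- which(complete.cases(wide[, cov_cols, drop = FALSE]))
keep

# With an unmarkedFrame, the index would be applied as in the examples above:
# umf2 <- umf1[keep, ]
```

The same `keep` vector could also be applied to the raw bird and covariate tables before the unmarkedFrame is created, which is the alternative Michael mentions.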





--
Richard Chandler
Assistant Professor
Warnell School of Forestry and Natural Resources
University of Georgia