Missing Covariates Error with fitList

Katie Davis

May 20, 2025, 5:55:12 PM

to unmarked

Hi All,

I'm running into what seems to be a common problem, that the fitList function gives me the following error:

Error in validityMethod(object) :
Data are not the same among models due to missing covariate values. Consider removing NAs before analysis.

I understand that this happens because sites with any missing covariate values are removed from the analysis. However, this is fundamentally problematic for me because the covariates relevant to a survey depend on the survey method. Surveys 1 and 2 are visual surveys and are affected by covariates such as weather. Survey 3 is eDNA; it is unaffected by weather (and therefore has missing values for those covariates) but is affected by water chemistry (which, conversely, has missing values for the visual surveys). I know that some folks use multi-scale models when working with multiple methods, but that is not an appropriate approach for my data. Additionally, my questions of interest are more focused on a multi-state model approach, which is what I'm trying to use.

Is there any way around this issue? I did not have this problem when implementing the same models in RPresence.

Thanks,

Katie

Ken Kellner

May 20, 2025, 6:56:07 PM

to unma...@googlegroups.com

I'm a bit confused by the structure here. It seems like you have a detection model that's something like

~ weather + chemistry

(setting aside that technically you can have different detection models for each true state in the multi-state model which I think isn't relevant here).

Then you have 3 replicate surveys per site. For the first two surveys, the value of chemistry is always NA. For the last one, weather is always NA. Is that right?

The way the model works in unmarked is that every detection probability for every survey has to come from the same regression model, e.g. in this case

logit(p[i]) = intercept + beta_weather * weather[i] + beta_chemistry * chemistry[i]

If any covariate value is NA for a given survey, that survey is dropped from the analysis. Since all surveys will have at least one NA (for either weather or chemistry), then all surveys will be dropped. I am surprised you were able to fit this model at all because it seems like every single data point will be dropped. Maybe I'm misunderstanding?
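As a rough base-R analogy for the dropping behaviour described above (this is not unmarked's internal code, and the data frame and its values are made up for illustration), `model.frame()` with `na.action = na.omit` removes every row that has an NA in any covariate named in the formula:

```r
# Hypothetical observation-level covariates for one site:
# two visual surveys followed by one eDNA survey.
obs <- data.frame(
  survey    = c("visual", "visual", "eDNA"),
  weather   = c(0.3, -1.2, NA),   # only recorded for visual surveys
  chemistry = c(NA, NA, 7.1)      # only recorded for the eDNA survey
)

# Both covariates in one formula: every row has at least one NA,
# so no rows survive.
kept_both <- model.frame(~ weather + chemistry, data = obs,
                         na.action = na.omit)

# Weather only: the eDNA row is the only one dropped.
kept_weather <- model.frame(~ weather, data = obs, na.action = na.omit)

print(nrow(kept_both))
print(nrow(kept_weather))
```

The two formulas keep different numbers of rows, which is the kind of mismatch between models that the fitList error message is pointing at.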

Is the structure you are looking for more like

logit(p[i|surveyType[i] = visual]) = intercept + beta_weather * weather[i]

and

logit(p[i|surveyType[i] = eDNA]) = intercept + beta_chemistry * chemistry[i]

?

I think possibly some kind of interaction model could do this in unmarked, something like

logit(p[i]) = intercept + beta_eDNA * is_eDNA[i] + beta_weather * weather[i] * is_visual[i] + beta_chemistry * chemistry[i] * is_eDNA[i]

Ken

> --
> *** Three hierarchical modeling email lists ***
> (1) unmarked (this list): for questions specific to the R package unmarked
> (2) SCR: for design and Bayesian or non-bayesian analysis of spatial capture-recapture
> (3) HMecology: for everything else, especially material covered in the books by Royle & Dorazio (2008), Kéry & Schaub (2012), Kéry & Royle (2016, 2021) and Schaub & Kéry (2022)
> ---
> You received this message because you are subscribed to the Google Groups "unmarked" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+u...@googlegroups.com.
> To view this discussion visit https://groups.google.com/d/msgid/unmarked/acb8b991-90fa-42ed-bdd4-4dcc3d792ae6n%40googlegroups.com.

Katie Davis

May 20, 2025, 8:06:49 PM

to unmarked

I believe you've understood quite well, actually.

The reason I've been able to fit the models thus far is that I haven't yet fit a global model (one with both weather and water chemistry variables in the same model), as I do have an explicit set of hypotheses regarding detection for each survey type. Essentially, I'm holding occupancy constant to first select a detection model, and comparing several detection models: a null, one that varies just by method, one where detection varies by method and one method is also affected by covariates, and a global model where both methods are affected by covariates. The global model is in the works; I just haven't gotten to modifying that part of my RPresence code.

In RPresence, state could be added as a variable to the detection model, but wasn't by default included as two separate detection models in the implementation of the multi-state model. So in unmarked, I currently am using the same model for the probability of detection of both state 1 and state 2.

An interaction model would make sense - I had originally explored that in RPresence but the design matrix was not making sense, and because that package handles NAs differently, it actually appeared to be redundant (it's been months since I explored this issue, I don't remember the exact details).

How exactly do I write the interaction model in unmarked? I took a very brief stab at it before posting my question but got an odd error. Would it be:

detformulas = c("~ method * pH + method * precipitation", "~ method * pH + method * precipitation")

or

detformulas = c("~ method + eDNAdum * pH + visualdum * precipitation", "~ method + eDNAdum * pH + visualdum * precipitation")

^in this second version, method is a factor of 1 and 2, where 1 corresponds to visual and 2 to eDNA, whereas the "dum" variables are coded as 1 for their respective method and 0 for others. Back when I was testing out RPresence, this was something I had messed around with, and I just held onto both formats in case they were useful at a later point.

Thanks for the help!

Katie

Ken Kellner

May 21, 2025, 7:35:12 AM

to unma...@googlegroups.com

I'd start by using only dummy variables for eDNA/visual. Using R factors here could work but is harder to get correct.

* Create a dummy variable for eDNA survey: 1 if it used eDNA, 0 if not
* Create a dummy variable for visual survey: 1 if it used visual, 0 if not

then the formula would be (here I am assuming that visual is the baseline for comparison)

~ 1 + eDNA + pH:eDNA + precip:visual

note the use of ":" instead of "*". If you use "*" for the interaction R automatically also inserts the corresponding main effects in addition to the interaction, but we don't want those in this case.
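A quick way to see the difference between ":" and "*" is to ask base R which terms each formula expands to. This is a small sketch using `terms()` only; nothing here is specific to unmarked:

```r
# Terms generated by "*" versus ":" for the same pair of variables.
star  <- attr(terms(~ pH * eDNA), "term.labels")
colon <- attr(terms(~ pH:eDNA), "term.labels")

print(star)   # both main effects plus the interaction
print(colon)  # the interaction term only
```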

Now, when eDNA = 1 (an eDNA survey), visual = 0 and the formula technically simplifies to

~ 1 + eDNA + pH

and when eDNA = 0, visual = 1

~ 1 + precip

Then in your data, replace the NAs in pH and precip with some other value (maybe 0) so that unmarked doesn't drop them. You could try a few different values to replace the NAs and make sure this choice doesn't affect the output. It shouldn't since they should be getting zeroed out.
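The "zeroed out" claim can be checked on the design matrix alone with base R before fitting anything. The sketch below uses a made-up three-survey data frame and a hypothetical helper, `make_design()`, that fills the NAs with a chosen value and builds the matrix for the formula above. It only illustrates the design matrix; it does not call unmarked.

```r
# Hypothetical helper: replace NAs with `fill`, then build the design matrix.
make_design <- function(fill) {
  obs <- data.frame(
    eDNA   = c(0, 0, 1),        # dummy: 1 for the eDNA survey
    visual = c(1, 1, 0),        # dummy: 1 for the visual surveys
    pH     = c(NA, NA, 6.8),    # only measured on the eDNA survey
    precip = c(2.5, 0.0, NA)    # only measured on the visual surveys
  )
  obs$pH[is.na(obs$pH)]         <- fill
  obs$precip[is.na(obs$precip)] <- fill
  model.matrix(~ 1 + eDNA + pH:eDNA + precip:visual, data = obs)
}

X0  <- make_design(0)    # NAs replaced with 0
X99 <- make_design(99)   # NAs replaced with an arbitrary other value

print(X0)
print(isTRUE(all.equal(X0, X99)))  # same matrix regardless of fill value
```

Because each filled-in value is multiplied by a dummy that equals 0 on that row, the fill value never reaches the design matrix. It is still worth refitting the actual model with a couple of different fill values, as suggested above, to confirm the estimates do not change.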

Ken
