Different sites surveyed in different seasons (colext and dynamic models)


James Swingler

Feb 2, 2022, 7:31:26 AM
to unmarked
Dear all

I am currently running a two-year single-species dynamic occupancy model (DOM) using unmarked. The data come from the SABAP2 citizen science project, in which different sites are surveyed a different number of times each year. The examples I have seen all use balanced datasets with the same sites surveyed each year; this is not the case for my situation.

My situation is as follows: I have 3935 sites surveyed in 2010 and 3636 sites surveyed in 2011. Over these two years there are 5492 unique sites surveyed, and 2079 of these sites were surveyed in BOTH 2010 and 2011. The number of surveys per season is capped at 50 for each site.
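As a quick consistency check on these counts, the arithmetic can be sketched in R. The variable names are made up for illustration; only the four totals come from the post.

```r
# Site counts quoted in the post.
n_2010 <- 3935   # sites surveyed in 2010
n_2011 <- 3636   # sites surveyed in 2011
n_both <- 2079   # sites surveyed in both years

# Inclusion-exclusion: unique sites = surveyed in 2010 + surveyed in 2011 - both.
n_unique <- n_2010 + n_2011 - n_both

# Sites that appear in only one of the two years.
n_only_2010 <- n_2010 - n_both
n_only_2011 <- n_2011 - n_both

c(unique = n_unique, only_2010 = n_only_2010, only_2011 = n_only_2011)
```

The three groups (2010 only, 2011 only, both years) add up to the 5492 unique sites, so the figures are internally consistent.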

This leads me to ask questions regarding how to format my data and some more technical questions about how the package treats sites that were surveyed in one year and not another.

I was hoping to set up my data in terms of the 5492 unique sites surveyed over these two years. This means that my detection history, siteCovs, obsCovs and yearlySiteCovs are all objects with 5492 rows representing each unique site.

But since only 3935 sites were surveyed in 2010, I am concerned about how the initial occupancy (psiformula) is estimated for sites that were not surveyed in 2010. I cannot supply siteCovs and the other arguments with differing numbers of rows, otherwise I get an error message regarding the dimensionality.

For the first primary period, does the model work by estimating coefficients using the data from the sites that were surveyed in that first period, and then project the fitted model on to the sites that were not surveyed in year 1? And then similarly for sites with no data in year 2, I would imagine the model uses the sites that have data in year 2 to estimate colonisation and extinction and then projects this on to sites that were not surveyed in year 2 in order to estimate occupancy? Or I suppose it may estimate these parameters using only the subset of sites surveyed both times?

Should I rather set up my data to include sites that are common to every season in a study (2079), or to set it up based on the sites observed in the first year of study (3935)?

I hope my question is making sense!

Thanks
James 


Marc Kery

Feb 2, 2022, 8:15:49 AM
to unmarked
Dear James,

my intuition would be to set up your data set with 5492 unique sites. Covariate files are filled up with NAs so they match in their dimensions. I am not entirely sure how unmarked deals with this sort of imbalance (which, however, is common, and is the rule for such citizen-science projects). In particular, I don't know whether unmarked perhaps tosses out any sites that have any missing values anywhere. I would just try this out.
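A minimal sketch of that layout, assuming the raw records are in long format with one row per survey. The toy data frame and its column names (site, year, det) are hypothetical. The detection matrix gets one row per unique site and one block of columns per year, with NA wherever a site was not surveyed.

```r
# Hypothetical long-format records: one row per survey.
surveys <- data.frame(
  site = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D"),
  year = c(2010, 2010, 2011, 2010, 2011, 2011, 2011, 2011, 2011, 2010, 2010),
  det  = c(1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L)
)

# Number the visits within each site-year combination.
surveys$visit <- ave(seq_along(surveys$det), surveys$site, surveys$year,
                     FUN = seq_along)

sites <- sort(unique(surveys$site))   # all unique sites across both years
years <- sort(unique(surveys$year))
J     <- max(surveys$visit)           # maximum number of visits per year

# One row per unique site, J columns per year, NA where no survey took place.
y <- matrix(NA_integer_, nrow = length(sites), ncol = length(years) * J,
            dimnames = list(sites, NULL))
col_idx <- (match(surveys$year, years) - 1) * J + surveys$visit
y[cbind(match(surveys$site, sites), col_idx)] <- surveys$det
y

# Only runs if the unmarked package is installed.
if (requireNamespace("unmarked", quietly = TRUE)) {
  library(unmarked)
  umf <- unmarkedMultFrame(y = y, numPrimary = length(years))
}
```

Site C here has no 2010 surveys, so its first J columns are all NA, while site D has no 2011 surveys. How the fitted model treats such rows is the open question in this thread.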

I would bet that initial occupancy is estimated from all the 2010 data and that colonization and extinction are estimated only from those sites that have at least 1 survey in each year: it's from these sites only that we have any information about the dynamics rates.

However, sites only surveyed in one of the years are not useless: the 2010-only sites contribute information towards estimation of initial occupancy and sites exclusive to either year contribute information towards estimation of detection probability if they have at least 2 surveys in that year.
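To see how many sites fall into each of these groups, one could tabulate the survey pattern directly from the detection matrix. This is only a sketch of the reasoning above, not a description of what unmarked does internally; the toy matrix and the variable names are made up for illustration.

```r
J <- 3  # maximum visits per year in this toy example

# Toy detection histories: columns 1-3 are 2010, columns 4-6 are 2011.
y <- rbind(
  A = c(1, 0, NA,   0, NA, NA),    # surveyed in both years
  B = c(0, NA, NA,  0, 1, NA),     # surveyed in both years
  C = c(NA, NA, NA, 1, 1, 0),      # 2011 only
  D = c(0, 0, NA,   NA, NA, NA)    # 2010 only
)

n_y1 <- rowSums(!is.na(y[, 1:J]))                # surveys per site in 2010
n_y2 <- rowSums(!is.na(y[, (J + 1):(2 * J)]))    # surveys per site in 2011

informs_psi <- n_y1 >= 1                 # any 2010 data: initial occupancy
informs_dyn <- n_y1 >= 1 & n_y2 >= 1     # data in both years: col/ext
informs_p   <- n_y1 >= 2 | n_y2 >= 2     # repeat visits within a year: detection

data.frame(n_y1, n_y2, informs_psi, informs_dyn, informs_p)
```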

Best regards  --- Marc



James Swingler

Feb 2, 2022, 8:40:44 AM
to unmarked
Dear Marc

Thank you very much for the timely response. I appreciate the insight and have a follow up question if you don't mind.

Regarding my siteCovs, I formatted the data to include the measured covariates from 2010 for all 5492 sites (as mentioned before, only 3935 sites were actually surveyed in this year), but I'm not sure if this is an incorrect thing to do?

If, however, I add NAs to the siteCovs for the sites that were not surveyed in 2010, I get the following warning message when I include the relevant covariate in the psiformula or pformula:

Warning messages:
  1: Some observations have been discarded because correspoding covariates were missing.
  2: 1557 sites have been discarded because of missing data.


Also, I've noticed that NAs in the yearlySiteCovs that are included in the col/ext formulas give an outright error in addition to the above warning message:

Error in optim(starts, nll, method = method, hessian = getHessian, ...) :  initial value in 'vmmin' is not finite 
In addition: Warning message:  Some observations have been discarded because correspoding covariates were missing.

If you have the time and would be willing, I could send you my R workspace and the script I've used to create this situation if it would give you a better understanding of what I am asking. 

Thanks for your time
James

Marc Kery

Feb 2, 2022, 9:11:38 AM
to unmarked
Dear James,



> Regarding my siteCovs, I formatted the data to include the measured covariates from 2010 for all 5492 sites (as mentioned before, only 3935 sites were actually surveyed in this year), but I'm not sure if this is an incorrect thing to do?

No, that is correct.

> If, however, I added NA's in the sitecovs for the sites that were not surveyed in 2010, I get the following warning message if I include the relevant covariate in the psiformula or pformula:
>
> Warning messages:
>   1: Some observations have been discarded because correspoding covariates were missing.
>   2: 1557 sites have been discarded because of missing data.

This is just a warning message of course and not an error. And it means that the 5492-3935 = 1557 sites without surveys in 2010 are in fact tossed out by unmarked. (I also note that there is a typo in the unmarked warning...)

> Also, I've noticed that NA's in the yearlySiteCovs, which are included in the col/ext formula, gives an outright error in addition to the above warning message:
>
> Error in optim(starts, nll, method = method, hessian = getHessian, ...) :  initial value in 'vmmin' is not finite
> In addition: Warning message:  Some observations have been discarded because correspoding covariates were missing.

Here, you will have to experiment with providing initial values such that the initial value in the optimisation in 'vmmin' (whatever that is...) becomes finite. For choosing inits, it can be helpful to use the estimates from a "neighbouring" model, typically a simpler one, for instance without covariates.

So, perhaps you could fit the model first without covariates and then use its estimates as inits for the intercepts of the model. To obtain a guess for the slopes of psi on any covariates, you could first run a logistic regression of a binary indicator of whether the species was ever detected in 2010 on those covariates; then put these in the vector of inits. For obtaining inits for slopes of col/ext on covariates, you could separate out sites with visits in both years and then do a similar logistic regression of the observed state in 2011 only for the sites where the species was never detected in 2010 (this is a detection-naive analysis of colonization rate) and of the observed state in 2011 only for those sites where the species was detected at least once in 2010 (this is the detection-naive analysis of extinction rate).
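The detection-naive regressions described above could be sketched as follows, using base R only. The data are simulated purely for illustration: the covariate x, the sample size and every true parameter value are made up. The extinction regression models non-detection in 2011, so its coefficients are on the extinction scale rather than the persistence scale.

```r
set.seed(42)
n <- 400; J <- 3

# Simulated toy data (all values hypothetical): one standardized site covariate.
x  <- rnorm(n)
z1 <- rbinom(n, 1, plogis(0.3 + 0.8 * x))              # true state in 2010
z2 <- rbinom(n, 1, ifelse(z1 == 1,
                          1 - plogis(-1 - 0.5 * x),    # persistence = 1 - extinction
                          plogis(-1 + 0.5 * x)))       # colonization
y1 <- matrix(rbinom(n * J, 1, z1 * 0.5), n, J)         # 2010 detections
y2 <- matrix(rbinom(n * J, 1, z2 * 0.5), n, J)         # 2011 detections

# "Observed state" = species detected at least once in that year.
seen1 <- as.integer(rowSums(y1) > 0)
seen2 <- as.integer(rowSums(y2) > 0)
gone2 <- 1L - seen2                                    # not detected in 2011

# Detection-naive logistic regressions.
psi_fit <- glm(seen1 ~ x, family = binomial)                      # initial occupancy
col_fit <- glm(seen2 ~ x, family = binomial, subset = seen1 == 0) # colonization
ext_fit <- glm(gone2 ~ x, family = binomial, subset = seen1 == 1) # extinction

# Candidate starting values, with a plausible guess for the detection intercept.
inits <- c(coef(psi_fit), coef(col_fit), coef(ext_fit), p_int = qlogis(0.5))
inits
```

These estimates ignore imperfect detection and are therefore biased, but they only need to be close enough to give the optimiser a finite starting point. The ordering used here (psi, col, ext, detection) is assumed to match what colext expects for its starts argument; that is worth checking against the package documentation.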

A simpler approach might just be to set inits at values of the intercepts that appear most plausible to you and all the inits for slopes at 0 (if your covariates are continuous and standardized).
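A minimal sketch of this simpler option, for a model with one standardized covariate on psi and intercept-only col, ext and p. The data are simulated, the chosen probabilities are arbitrary guesses, and the parameter order (psi, col, ext, detection) is an assumption to check against the colext documentation.

```r
set.seed(1)
n <- 300; J <- 3

# Simulated toy data (all values hypothetical).
x  <- rnorm(n)                                   # standardized site covariate
z1 <- rbinom(n, 1, plogis(0.5 * x))
z2 <- rbinom(n, 1, ifelse(z1 == 1, 0.8, 0.2))
y  <- cbind(matrix(rbinom(n * J, 1, z1 * 0.5), n, J),
            matrix(rbinom(n * J, 1, z2 * 0.5), n, J))

# Plausible guesses on the probability scale, converted to the logit scale;
# the slope of psi on x starts at 0.
starts <- c(psi_int   = qlogis(0.5),
            psi_slope = 0,
            col_int   = qlogis(0.2),
            ext_int   = qlogis(0.2),
            p_int     = qlogis(0.5))

# Only runs if the unmarked package is installed.
if (requireNamespace("unmarked", quietly = TRUE)) {
  library(unmarked)
  umf <- unmarkedMultFrame(y = y, siteCovs = data.frame(x = x), numPrimary = 2)
  fit <- colext(psiformula = ~ x, gammaformula = ~ 1, epsilonformula = ~ 1,
                pformula = ~ 1, data = umf, starts = starts)
}
```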

Or, you could run a variant of the analysis for a data set that only contains the sites surveyed in both years at least once. Hopefully, that would give fewer or no numerical problems. And then use the solutions from the analysis of this data set as starting values for the analysis of the full data.
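This two-stage idea might look like the sketch below, again with simulated data and made-up names. The subset fit uses only sites with at least one survey in each year, and its coefficients are then passed as starts for the full data set. Whether the full fit behaves well when some sites have an entirely unsurveyed year is the very thing being tested here.

```r
set.seed(7)
n <- 300; J <- 3

# Simulated toy data (all values hypothetical).
x  <- rnorm(n)
z1 <- rbinom(n, 1, plogis(0.5 * x))
z2 <- rbinom(n, 1, ifelse(z1 == 1, 0.8, 0.2))
y  <- cbind(matrix(rbinom(n * J, 1, z1 * 0.5), n, J),
            matrix(rbinom(n * J, 1, z2 * 0.5), n, J))
y[1:60,   (J + 1):(2 * J)] <- NA   # sites 1-60: surveyed in year 1 only
y[61:120, 1:J]             <- NA   # sites 61-120: surveyed in year 2 only

# Sites with at least one survey in each year.
both <- rowSums(!is.na(y[, 1:J])) > 0 &
        rowSums(!is.na(y[, (J + 1):(2 * J)])) > 0

# Only runs if the unmarked package is installed.
if (requireNamespace("unmarked", quietly = TRUE)) {
  library(unmarked)
  umf_sub  <- unmarkedMultFrame(y = y[both, ],
                                siteCovs = data.frame(x = x[both]),
                                numPrimary = 2)
  umf_full <- unmarkedMultFrame(y = y, siteCovs = data.frame(x = x),
                                numPrimary = 2)

  # Stage 1: fit on the both-years subset.
  fit_sub <- colext(~ x, ~ 1, ~ 1, ~ 1, data = umf_sub)

  # Stage 2: refit on all sites, starting from the subset estimates.
  fit_full <- colext(~ x, ~ 1, ~ 1, ~ 1, data = umf_full,
                     starts = coef(fit_sub))
}
```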

Best regards  --- Marc