Hi Jeff
I am using PGOcc/spPGOcc to assess mammal responses to fire using a dataset comprising 2410 site-surveys (rows), which we have previously corresponded about by email. The 2410 rows are made up of 862 unique sites, each with 1-12 surveys in different seasons or years. I am using a stacked approach because the data come from different projects over many years (site and year are used as random effects).
My models almost always fail goodness-of-fit (GOF) tests (Bayesian p-value typically 0 or close to it). I have tried many possible fixes across multiple species: different random effects structures, different occupancy and detection covariates, aggregating survey days, increasing the number of iterations, and both spatial and non-spatial models.
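For anyone following along, here is a minimal sketch of how I understand the Bayesian p-value to be computed. The vectors `fit_y` and `fit_y_rep` are simulated stand-ins, not my real output; I am assuming they play the role of the per-draw discrepancy values for the observed and replicate data that the posterior predictive check returns. The helper `bayes_p_value()` is hypothetical.

```r
# Hypothetical helper: proportion of posterior draws in which the
# replicate-data discrepancy exceeds the observed-data discrepancy.
bayes_p_value <- function(fit_y, fit_y_rep) {
  stopifnot(length(fit_y) == length(fit_y_rep))
  mean(fit_y_rep > fit_y)
}

# Simulated stand-ins (assumption): the observed data fit worse than
# the replicates, which pushes the p-value toward 0.
set.seed(123)
n_draws   <- 1000
fit_y     <- rchisq(n_draws, df = 60)  # observed-data discrepancy per draw
fit_y_rep <- rchisq(n_draws, df = 40)  # replicate-data discrepancy per draw

bayes_p <- bayes_p_value(fit_y, fit_y_rep)
print(bayes_p)
```

A value near 0 means the observed discrepancy is larger than the replicate discrepancy in almost every draw, i.e. the observed data are more variable than data generated from the fitted model.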
The only thing I have found to produce acceptable p-values (0.1-0.9) across multiple species is to thin the dataset to one survey per site (862 rows). I also tried thinning the data to a maximum of two or three surveys per site (1316 and 1544 rows, respectively), but the GOFs fail in those cases.
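The thinning step can be sketched roughly as below. The data frame `surveys` and the helper `thin_rows()` are hypothetical and only illustrate the idea; with real data the returned row indices would need to be applied consistently to the detection matrix and to the occupancy and detection covariates.

```r
# Hypothetical stacked data: one row per site-survey.
surveys <- data.frame(
  LocationName_numeric = c(1, 1, 1, 2, 3, 3),
  Year                 = c(2015, 2016, 2018, 2015, 2016, 2017)
)

# Hypothetical helper: return the row indices to keep so that each site
# contributes at most `k` randomly chosen surveys.
thin_rows <- function(site, k = 1) {
  rows_by_site <- split(seq_along(site), site)
  keep <- lapply(rows_by_site, function(idx) {
    # sample.int() avoids the sample() pitfall for length-one vectors
    if (length(idx) <= k) idx else idx[sample.int(length(idx), k)]
  })
  sort(unname(unlist(keep)))
}

set.seed(1)
keep_1 <- thin_rows(surveys$LocationName_numeric, k = 1)  # one survey per site
keep_2 <- thin_rows(surveys$LocationName_numeric, k = 2)  # at most two per site

thinned_1 <- surveys[keep_1, , drop = FALSE]
thinned_2 <- surveys[keep_2, , drop = FALSE]
```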
I also noticed the following behaviours:
1) If I take blocks of surveys (e.g., rows 1:100, 301:400, …, summing to 862 rows), the GOFs fail. In that case the dataset still contains multiple surveys per site (1-12), but not all sites are analysed. This suggests the GOFs may be failing because of repeat surveys of sites.
2) If I independently randomise every covariate in the blocked dataset, so that the covariate dependency within sites (i.e. repeated values across surveys) is removed, the GOFs still fail.
3) For the dataset thinned to one survey per site, where the GOFs were good, removing the site random effect from the models makes the GOFs fail for group = 1. This surprises me, since there is only one row of data per site.
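To make the randomisation in point 2 concrete, this is roughly what I mean, shown on a small made-up data frame. The data, the helper `permute_covariates()`, and the check `constant_within_site()` are hypothetical and only sketch the idea: each covariate column is shuffled independently across rows while the site identifier is left alone.

```r
# Hypothetical covariates in which values repeat across surveys of a site.
covs <- data.frame(
  LocationName_numeric = c(1, 1, 1, 2, 2, 3),
  YSF                  = c(5, 5, 5, 12, 12, 30),
  rain                 = c(800, 800, 800, 650, 650, 900)
)

# Hypothetical helper: permute each named column independently,
# leaving every other column (including the site ID) untouched.
permute_covariates <- function(df, cols) {
  for (cl in cols) {
    df[[cl]] <- df[[cl]][sample.int(nrow(df))]
  }
  df
}

# Hypothetical check: is a covariate constant within each site?
constant_within_site <- function(df, site_col, cov_col) {
  tapply(df[[cov_col]], df[[site_col]], function(v) length(unique(v)) == 1)
}

set.seed(42)
shuffled <- permute_covariates(covs, c("YSF", "rain"))

print(constant_within_site(covs, "LocationName_numeric", "YSF"))
print(constant_within_site(shuffled, "LocationName_numeric", "YSF"))
```

The shuffle keeps each covariate's marginal distribution but breaks its link to the site, so any remaining lack of fit cannot come from repeated covariate values within sites.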
These are my model formulae:
occ.formula <- ~ scale(YSF) + I(scale(YSF)^2) + scale(TWI) + scale(rain) + scale(freehold) + (1|LocationName_numeric) + (1|Year)
det.formula <- ~ Track + Season + (1|LocationName_numeric) + (1|Year)
*LocationName_numeric is the ‘site’ variable
I have plotted the discrepancy values and inspected the underlying data. The rows with high discrepancy values appear to span a wide range of covariate values and include both sites that were surveyed once only and sites surveyed multiple times. When I looked at the offending rows for two different species, there was some overlap in the problematic rows, but also many rows with high discrepancy values for one species but not the other.
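The row-level comparison was done roughly as sketched below. The matrices here are made-up stand-ins: I am assuming each is a 5 x n matrix of discrepancy quantiles per row of data, with the median in row 3, for the observed and replicate data respectively. The helpers `as_quants()` and `flag_rows()` are hypothetical, and ranking by observed minus replicate is just the sign convention I chose.

```r
# Hypothetical helper: build a 5 x n quantile matrix around given medians
# (stand-in for the per-row discrepancy quantiles from the GOF check).
as_quants <- function(med) {
  rbind(med - 0.5, med - 0.2, med, med + 0.2, med + 0.5)
}

# Hypothetical helper: indices of the `n_top` rows whose median observed
# discrepancy most exceeds the median replicate discrepancy.
flag_rows <- function(obs_quants, rep_quants, n_top) {
  diff_med <- obs_quants[3, ] - rep_quants[3, ]
  order(diff_med, decreasing = TRUE)[seq_len(n_top)]
}

# Made-up medians for five rows of data and two species.
rep_q   <- as_quants(c(1, 1, 1, 1, 1))
obs_q_a <- as_quants(c(1, 5, 1, 6, 1))  # species A
obs_q_b <- as_quants(c(1, 5, 7, 1, 1))  # species B

flag_a <- flag_rows(obs_q_a, rep_q, n_top = 2)
flag_b <- flag_rows(obs_q_b, rep_q, n_top = 2)

# Rows that are problematic for both species.
shared <- intersect(flag_a, flag_b)
print(shared)
```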
Any advice you have on diagnosing or fixing the problem would be greatly appreciated!
Thanks for your help.
Tim