How are fitted values calculated from a colext fit?


Jorge A

Mar 20, 2013, 7:53:09 PM
to unma...@googlegroups.com
Hi,

I am trying to compute a chi-square-type GOF test for a multi-season occupancy model. I noticed this can be done in unmarked with parboot(), but I have not been able to understand how the fitted() function works. Following the example from the colext vignette, I can see that fitted(m1) yields a matrix of fitted values that are compared to the data in the chisq() function. I get the first three columns as psi1*p1, but I can't figure out how to derive the other columns. I thought columns 4:6 would be (psi1*col1 + (1-psi1)*ext1)*p1, but that doesn't work. I fitted a null model to simplify things, but I still cannot figure it out.
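For context, the projection usually written for dynamic occupancy carries psi forward through the colonization-extinction recursion. A minimal R sketch with made-up values (whether fitted() uses exactly this projection for a colext fit is the open question here):

## standard recursion: psi[t+1] = psi[t]*(1 - ext[t]) + (1 - psi[t])*col[t]
psi1 <- 0.6; col1 <- 0.1; ext1 <- 0.2; p2 <- 0.4  # illustrative values only
psi2 <- psi1 * (1 - ext1) + (1 - psi1) * col1     # expected occupancy, season 2
psi2 * p2  # candidate for columns 4:6 of fitted(m1) when p is constant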

Thanks in advance,

Jorge


Kery Marc

Mar 21, 2013, 4:01:24 AM
to unma...@googlegroups.com

Hi Jorge,

the GOF test for colext in our vignette is wrong: it always indicates a fitting model. We will correct this.

It is my understanding that no GOF test can be conducted directly on a binary response. The deviance and other discrepancy measures are uninformative about fit in the case of a binary response (this can be read somewhere in the GLM bible by McCullagh & Nelder, for instance). One has to aggregate the response in some way before the usual GOF techniques can be applied.

What I did in the context of a dynamic occupancy model fitted in BUGS to binary data with three secondary reps was to aggregate the data to detection frequencies and conduct a Bayesian posterior predictive check of GOF on this binomial variant of the response, based on a chi-square discrepancy. (Note that I still fitted the model to the binary response, though.) This indicated that my model did not fit; hence, that appeared like an improvement ;)
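A minimal sketch of that aggregation step, assuming y is the usual sites x (T*J) binary detection matrix with J = 3 secondary reps per season, and that psi.hat and p.hat (sites x seasons matrices of expected occupancy and detection) have been extracted from the model:

## collapse binary detections to per-season detection frequencies
J <- 3
season <- rep(seq_len(ncol(y) / J), each = J)
y.freq <- t(apply(y, 1, function(v) tapply(v, season, sum, na.rm = TRUE)))
## chi-square discrepancy against the binomial expectations
e.freq <- J * psi.hat * p.hat
D.chisq <- sum((y.freq - e.freq)^2 / (e.freq + 1e-6), na.rm = TRUE)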

I hope that something similar can be done for a likelihood analysis in unmarked. We will have to check this out.

Kind regards  ----- Marc


erikw...@gmail.com

Mar 22, 2013, 6:40:19 PM
to unma...@googlegroups.com
Hi Marc,

Just a quick follow-up on this topic. You mentioned that the chi-square GOF test used in the vignette consistently (and incorrectly) indicates a well-fitting model when applied with parboot() to simulated data. In the vignette, however, your GOF test of the null model (m0) indicated a lack of fit to the simulated data, as seen in Figure 3, page 17. I ran your simulation code and indeed Pr(t_B > t0) < 0.01. For my data, a chi-square GOF test of my best-supported model fit rather well. Do you maintain that the chi-square approach is an incorrect method for GOF tests in colext()?

Aside from the BUGS solution you discuss, would something like the Freeman-Tukey test (Brooks et al. 2000) that you used in the island scrub-jay paper (Sillett et al. 2012) be a suitable alternative? McCullagh and Nelder (1983: page 86) mention that examining residuals is a useful technique to investigate how, and in which direction, the model departs from the data. In lieu of a formal test, would residuals be something to look at to investigate fit? In the figure below, I used your simulated data, but with nonparboot() to plot residuals. Do you think something like this is a useful diagnostic tool, or rubbish?
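(The figure is not reproduced here. A sketch of such a residual check, assuming the vignette's null model m0 and unmarked's residuals() method, which returns observed minus fitted values:)

## residuals vs. fitted values for the null model m0
r <- residuals(m0)  # same dimensions as the detection matrix
plot(as.vector(fitted(m0)), as.vector(r),
     xlab = "Fitted value", ylab = "Residual")
abline(h = 0, lty = 2)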

Thanks for your opinions and helpful vignette, Erik


Brooks, S. P., E. A. Catchpole, and B. J. T. Morgan. 2000. Bayesian animal survival estimation. Statistical Science 15:357–376.

McCullagh, P., and J. A. Nelder. 1983. Generalized linear models. Chapman and Hall.

Sillett, T. S., R. B. Chandler, J. A. Royle, M. Kery, and S. A. Morrison. 2012. Hierarchical distance sampling models to estimate population size and habitat-specific abundance of an island endemic. Ecological Applications 22:1997–2006.

Kery Marc

Mar 26, 2013, 11:03:50 AM
to unma...@googlegroups.com

Dear Erik,

thank you for reminding me that in "my own" vignette there is a counterexample to my claim that the GOF test we give there does not work ... To be honest, I had forgotten this. In a couple of other examples with simulated data for a binary response, the test failed to indicate a non-fitting model. I don't understand why it does show lack of fit in our example, because from my reading of the literature (for instance, some parts of McCullagh and Nelder), we can't do any GOF test directly on a binary response; we always have to aggregate the response first. I will check this in our case and report back sometime.

A couple more remarks:

(1) GOF for a binary response: the key thing to me seems to be whether your response is strictly binary (i.e., only 0 or 1) or whether it is some aggregation of binary numbers. If you have counts with values other than strictly zeroes and ones, then chi-square and similar GOF tests should be OK. In the island scrub-jay paper (Sillett et al. 2012), the response was a vector containing the counts of birds observed per distance class, so this should be OK with our test. If you have a logistic regression (i.e., a binomial GLM) with non-binary data (i.e., where the sample size or binomial index, N, is greater than 1 for some or most responses), then the test should also be OK, because then you have naturally "aggregated" data (remember that the binomial is the sum of N Bernoullis). However, it is my understanding that with binary responses, no discrepancy measure can be used directly to inform about lack of fit.

(2) Residuals: residuals have an important place in model criticism in GLMs too, and I like to think of the hierarchical models that we fit using unmarked (or BUGS, JAGS, etc.) as two or more nested GLMs. So of course there should be a way to compute residuals and inspect them to see whether they truly look like a starry sky or whether there is some remaining structure, perhaps related to some covariates. I don't have much experience with this in the context of GLMs, but I have long thought that I should explore residual checking in these HMs much more.

(3) Quantifying the magnitude of lack of fit: let's assume you fit a hierarchical GLM with a response other than binary and do a parametric bootstrap GOF test using some discrepancy measure such as chi-square. Then, rather than using the parametric bootstrap GOF test simply to give you a "yes-no" answer to the question of whether the model fits, it may be more interesting to get an indication of how far your model is from a model that fits. I think one could perhaps compute a "lack-of-fit statistic" by dividing the GOF measure for your model by that of a perfectly fitting model (which would be the mean of the bootstrap distribution). In the capture-recapture literature (e.g., the CJS model), people have long quantified the lack of fit of a model using a c-hat statistic and often inflate their SEs by its square root to account for "overdispersion" (assuming that all lack of fit is some sort of unstructured noise rather than a structural failure of the model). It appears to me that if this is a sensible thing to do for CJS models, then perhaps something like it could be done for the kinds of HMs we fit in unmarked as well?
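A sketch of that ratio from a parametric bootstrap, assuming chisq is the discrepancy function defined in the vignette (the observed statistic and its bootstrap distribution sit in the t0 and t.star slots of the object parboot() returns):

## c-hat-like lack-of-fit ratio: observed discrepancy over the mean of
## its bootstrap distribution (~1 for a fitting model, >1 for lack of fit)
pb <- parboot(m1, statistic = chisq, nsim = 100)
c.hat <- pb@t0 / mean(pb@t.star)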

Kind regards  ----  Marc

Giancarlo Sadoti

Jun 22, 2014, 5:28:16 AM
to unma...@googlegroups.com
Greetings all,

I know this is a long-dormant thread, but I thought it a good place to ask an (I hope) relevant question. In reading MacKenzie and Bailey (2004) (or from what I can gather of it), they express concern about missing data in the calculation of bootstrapped chi-square goodness-of-fit statistics (using parboot()) from fitted values (psi(ij)*p(ijk)) and suggest that "An extra level of summation would therefore be required in (2.7) to aggregate the test statistic over cohorts" of detection histories. I'm curious what folks think about using fitted values at the site-year level, rather than at the detection (site-year-survey) level, in parboot() to get around the disparate-detection-history issues M and B raise and (possibly) some of the issues Marc raised earlier.

Making these fitted site-year values comparable to the observed (naive) site-year occupancy was first (I believe) suggested by Moore and Swihart (2005) for calculating site-year residuals to assess spatial autocorrelation, and it has since been applied by several others (including myself) to assess spatial autocorrelation and to generate model AUC. For the unfamiliar, this approach calculates the probability of detecting a species at least once, given predicted occupancy at site i in year j (which M and S called D-hat):

D-hat(ij) = psi(ij) * p*(ij), where

p*(ij) = 1 - prod_{k=1..K} (1 - p(ijk))

I've written a (somewhat onerous) function to generate these fitted site-year values (via the M and S approach) that can be called by parboot() within a function like "fitstats" in the unmarked manual or "chisq" in Marc's colext vignette. I'm happy to post the function if the fitted-site-year-values approach seems as appropriate as, or more appropriate than, the current approach using fitted().
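Not Giancarlo's actual function, but a sketch of the D-hat calculation, assuming psi.mat (sites x years occupancy probabilities) and p.arr (sites x years x reps detection probabilities) have already been extracted from the fit:

## Moore & Swihart D-hat: probability of >= 1 detection in site i, year j
p.star <- 1 - apply(1 - p.arr, c(1, 2), prod)  # 1 - prod_k (1 - p[i,j,k])
D.hat <- psi.mat * p.star                      # fitted site-year values
## a chi-square over site-years comparing D.hat with the observed 0/1
## site-year detections could then serve as the statistic for parboot()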

I'm also curious to hear how Marc or others feel (over a year later) about the general validity of a GOF test for a binary-response model like occu or colext.

Thanks,

Giancarlo


Moore, J. E., and R. K. Swihart. 2005. Modeling patch occupancy by forest rodents: incorporating detectability and spatial autocorrelation with hierarchically structured data. Journal of Wildlife Management 69:933–949.

Kery Marc

Jun 22, 2014, 8:41:01 AM
to unma...@googlegroups.com
Hi Giancarlo et al.,

unfortunately, I haven't had time since last year to investigate GOF testing with occupancy models any further. However, I am certain that in general the approach shown in our colext vignette is not valid: you can't test GOF for binary data without aggregating them in some way, e.g., over reps to get site-year totals. Hence, what you suggest makes much sense to me.

What I have wanted to do for a while, to better understand the GOF issue in occupancy models, is to analyse an array of data sets simulated to contain various assumption violations relative to the model used to analyse them (e.g., a site covariate in occupancy when a Null-Null model is fit) and then see how different fit statistics behave.
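A compact sketch of that experiment, assuming library(unmarked) is loaded and chisq is the discrepancy function from the vignette; occupancy depends on a site covariate x, but a Null-Null model is fit:

## simulate a covariate-driven occupancy process, fit a constant model
set.seed(1)
M <- 100; T <- 5; J <- 3
x <- rnorm(M)
gam <- 0.2; eps <- 0.3; p <- 0.4
z <- matrix(NA, M, T)
z[, 1] <- rbinom(M, 1, plogis(-1 + 2 * x))  # the violated assumption
for (t in 2:T)
  z[, t] <- rbinom(M, 1, z[, t - 1] * (1 - eps) + (1 - z[, t - 1]) * gam)
y <- matrix(NA, M, T * J)
for (t in 1:T)
  y[, (t - 1) * J + 1:J] <- rbinom(M * J, 1, z[, t] * p)
umf <- unmarkedMultFrame(y = y, numPrimary = T)
m.null <- colext(~1, ~1, ~1, ~1, data = umf)
pb <- parboot(m.null, statistic = chisq, nsim = 50)
pb  # whether Pr(t_B > t0) flags this violation is the thing to check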

In general, I find it very valuable to start thinking more about the fit of the hierarchical models we fit to distribution and abundance data and to come up with model diagnostics. We should do residual checks in the same way as we do with simple normal or generalised linear models, where we can look, especially visually, for any remaining pattern in the residuals. I would think that such residual checks are more important than simple omnibus tests of fit.

Kind regards  --  Marc


Richard Chandler

Jun 22, 2014, 9:34:26 AM
to Unmarked package
Hi Marc,

Just to offer a dissenting opinion, I think you can test GOF without aggregating data in dynamic models, at least for simple models without covariates and random effects. I say this (and I might be wrong) because these models predict that the metapopulation will reach equilibrium, and I can imagine scenarios in which the observed values don't agree with the expected equilibrium values. I'm thinking of an extreme example in which you have many sites with data like this:

111  000  111 000 111 111 000 000 000 000 000 000 000 000

Regardless, in typical situations, I'm sure it is hard to detect lack of fit without aggregating. 
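For the constant-parameter case, the equilibrium Richard mentions is the stationary point of the colonization-extinction Markov chain; setting psi = psi*(1 - eps) + (1 - psi)*gam and solving gives the sketch below (illustrative values only):

## equilibrium occupancy implied by constant gamma and epsilon
gam <- 0.1; eps <- 0.2
psi.eq <- gam / (gam + eps)  # = 1/3; persistent runs of zeroes after
                             # early occupancy, as above, could disagree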

Richard

--
Richard Chandler
Assistant Professor
Warnell School of Forestry and Natural Resources
University of Georgia

Kery Marc

Jun 22, 2014, 10:32:12 AM
to unma...@googlegroups.com
Dear Richard,

thanks for your dissent, and you must be right! This would explain why in the vignette we DID find a lack of fit when we tested a constant model on data generated with time-dependent params! For this very reason I said "I am certain that in general ..." ;)

I think that GOF is understudied in most hierarchical models.

Kind regards  --  Marc


David Miller

Jun 22, 2014, 11:28:58 AM
to unma...@googlegroups.com
Hi all -

Interesting discussion on goodness of fit. I agree with Marc that there is lots of room to improve methods in this area.

The goodness-of-fit test in the colext vignette has parallels to the parametric bootstraps typically done to calculate c-hat in mark-recapture models. For that test, the parametric bootstrap simulation is key to estimating the distribution of the test statistic used to assess fit. It seems like a good and useful omnibus assessment of fit, but it is not diagnostic as to the cause of lack of fit. It is also not clear to me that it would catch something like Richard's example, where transitions are non-Markovian.

It would be more useful to try a focused GOF test, such as examining the proportion of times a species is detected when it was not detected in the previous year versus when it was not detected for two or more years (since occupancy models generally assume that colonization depends only on the last time step). Of course this is still confounded with the detection process, but if the two groups differ, it would suggest that non-Markovian transitions are occurring (either heterogeneity among sites, or memory at the site level).
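A sketch of that focused check, assuming z is a sites x years 0/1 matrix of naive yearly detection (detected at least once in year t):

## detection in year t after a one-year gap vs. a gap of two or more years
gap1 <- gap2 <- numeric(0)
for (t in 3:ncol(z)) {
  gap1 <- c(gap1, z[z[, t - 1] == 0 & z[, t - 2] == 1, t])
  gap2 <- c(gap2, z[z[, t - 1] == 0 & z[, t - 2] == 0, t])
}
c(mean(gap1), mean(gap2))  # similar proportions are consistent with
                           # first-order Markovian colonization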

Dave  

David Miller
411 Forest Resources Building
Department of Ecosystem Science and Management
Pennsylvania State University
University Park, PA 16802
814.863.1598 (office)