Goodness of fit and NB models (pcount)


Shane Siers

unread,
Jan 23, 2014, 5:04:50 PM1/23/14
to unma...@googlegroups.com
Dear sages,

How would you address the following scenario in a manuscript, and what approach would you take?

  • I have a very well-behaved Poisson pcount model that gives very reasonable estimates. However, it fails all GOF tests (parboot SSE, chi-square, Freeman-Tukey).
  • Alternative NB and ZIP mixtures pass GOF with flying colors; however, the AIC results are effectively identical (delta AIC = 2, exactly the penalty for the extra dispersion term), and the dispersion terms are not significant.
  • The estimates provided by the Poisson and NB models are identical (the data are good: 30 to 175 visits at each site).
Should I present the NB model and report that the dispersion parameter is non-significant, then indicate the GOF test as the reason I chose the mixture, OR present the Poisson model (in which case, how do I deal with GOF failure)? 
 
Thanks in advance,
Shane


--
Shane Siers

Please watch my underwater videos at www.macronesia.net
 

Richard Chandler

unread,
Jan 24, 2014, 9:56:45 AM1/24/14
to Unmarked package
Hi Shane,

Can you show the results of the Poisson and NB models? The P-value for the dispersion parameter (alpha) is not what you want to look at; what you want to assess is how large the dispersion parameter is. The NB model approximates the Poisson model as the dispersion parameter tends toward infinity, so if alpha is very high, you probably don't need to worry about overdispersion. It sounds as though your estimate of alpha is close to 1, which would indicate overdispersion and therefore suggest that the NB model might be better than the Poisson in this case.

Richard
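[Richard's convergence point is easy to verify directly in base R; a minimal sketch (not from the original thread), comparing NB and Poisson probabilities:]

```r
# The negative binomial converges to the Poisson as the dispersion
# parameter (R's 'size'; unmarked's alpha) tends toward infinity.
lam <- 5
x <- 0:20
pois     <- dpois(x, lambda = lam)
nb_small <- dnbinom(x, mu = lam, size = 1)    # strong overdispersion
nb_big   <- dnbinom(x, mu = lam, size = 1e6)  # essentially Poisson
max(abs(nb_small - pois))  # substantial discrepancy
max(abs(nb_big - pois))    # negligible
```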


--
You received this message because you are subscribed to the Google Groups "unmarked" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
Richard Chandler
Assistant Professor
Warnell School of Forestry and Natural Resources
University of Georgia

Shane Siers

unread,
Jan 24, 2014, 12:07:30 PM1/24/14
to unma...@googlegroups.com
Hi, Richard.  Eschewing the estimates, which are all well-behaved, reasonable, and very nearly identical, here are the results:

Poisson:
AIC: 3779.83772938 
Number of sites: 36
optim convergence code: 0
optim iterations: 261 
Bootstrap iterations: 0

NB:
Dispersion (log-scale):
 Estimate   SE    z P(>|z|)
     13.4 32.6 0.41   0.682

AIC: 3781.83971906 
Number of sites: 36
optim convergence code: 0
optim iterations: 271 
Bootstrap iterations: 0

If I understand correctly, the dispersion parameter is theta on the log scale, so alpha = exp(1/theta) = 1.0799, aka "c-hat", correct? 

Richard Chandler

unread,
Jan 24, 2014, 2:46:24 PM1/24/14
to Unmarked package
alpha = exp(13.4) = 660003, so there appears to be no overdispersion. This is strange because your GoF results should be virtually the same for the NB and Poisson models. Can you post a simple reproducible example to demonstrate the problem? 

Shane Siers

unread,
Jan 25, 2014, 12:25:02 PM1/25/14
to unma...@googlegroups.com

Hi, Richard.  Not sure how I'd simplify my data to reproduce the problem.  Here is my GOF process, with results:

Poisson:
p.full <- pcount(~min+hab+rpast+rnow+wind+gust+moon+pr+obs
                 ~site, data=umf, se=FALSE, control=list(maxit=1e4))
top <- p.full
pb.p <- parboot(top, fitstats, nsim=25, report=1)

[Poisson GoF results attached as a screenshot]

Negative Binomial: 
nb.full <- pcount(~min+hab+rpast+rnow+wind+gust+moon+pr+obs
                  ~site, data=umf, mixture="NB", se=FALSE, control=list(maxit=1e4))
top <- nb.full
pb.nb <- parboot(top, fitstats, nsim=25, report=1)

[NB GoF results attached as a screenshot]

Model selection:
   nPars     AIC delta AICwt cumltvWt
P     54 3766.54  0.00  0.73     0.73
NB    55 3768.54  2.01  0.27     1.00

Does this help?  Probably not.

Jeffrey Royle

unread,
Jan 27, 2014, 11:55:23 AM1/27/14
to unma...@googlegroups.com
hi Shane,
 With this kind of discrepancy between the two, given the indicated "no overdispersion", I'm wondering if we have a bug in either getY() or fitted() for the negative binomial model. This will have to be tracked down at some point, but you might consider testing it yourself: it would be easy to write your own fitstats() function that computes the fitted values directly, in which case you should get essentially the same result.
 
regards
andy
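[The user-written fitstats() being passed to parboot() in this thread is not shown; a sketch in the style of the unmarked vignette is below (the statistic names are illustrative). To run the check Andy proposes, the fitted(fm) call would be swapped for expected values computed directly from the coefficients.]

```r
# GOF statistics from observed and expected count matrices.
gof_stats <- function(observed, expected) {
  resids <- observed - expected
  c(SSE          = sum(resids^2, na.rm = TRUE),
    Chisq        = sum((observed - expected)^2 / expected, na.rm = TRUE),
    freemanTukey = sum((sqrt(observed) - sqrt(expected))^2, na.rm = TRUE))
}

# Wrapper with the signature parboot() expects; getY() and fitted() are
# unmarked functions. Replacing fitted(fm) with expected values computed
# directly from coef(fm) is the test Andy suggests.
fitstats <- function(fm) gof_stats(getY(fm@data), fitted(fm))
```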
 
 

 

Shane Siers

unread,
Jan 27, 2014, 1:14:00 PM1/27/14
to unma...@googlegroups.com
Hi, Andy.  Can you clarify something for me?  It is the Poisson that fails the GOF.  That failure is the reason I considered the NB, which does pass the fitstats.  Are you suggesting that, with no overdispersion, the NB should also fail the GOF?  If it is the NB giving the wrong results, I'm back to no fit.  What is my recourse then?  BTW, ZIP models give very similar GOF results to the NB (both pass while the Poisson fails), though the zero-inflation parameters are not significant and there is no improvement in AIC.

Shane

Jeffrey Royle

unread,
Jan 27, 2014, 1:34:42 PM1/27/14
to unma...@googlegroups.com
hi Shane,
 Yes, the NB should also fail the GOF evaluation.
 My point was that the fit statistics should be roughly the same for the two models, since your NB result suggests "no overdispersion", so I'm really curious about what's going on there.
 That's why I'm worried there may be an error in one of the unmarked functions used by fitstats().
 
 It's hard to say what recourse you have unless and until we know whether there's an error lurking here, but it doesn't make sense that the Poisson model doesn't fit and the NB does, given the indicated lack of overdispersion.
 
regards
andy
 

Richard Chandler

unread,
Jan 27, 2014, 5:12:01 PM1/27/14
to Unmarked package
I'll look into this. Something does seem wrong.

Richard Chandler

unread,
Jan 27, 2014, 5:28:53 PM1/27/14
to Unmarked package
I just did a quick simulation, and everything seemed fine. Shane, can you send me (rcha...@warnell.uga.edu) your data and a small script to demonstrate the problem?

Jeffrey Royle

unread,
Jan 27, 2014, 6:45:37 PM1/27/14
to unma...@googlegroups.com
hey Shane,
 CC me too if you don't mind and I'll take a look in the morning.
andy
 

jk3...@gmail.com

unread,
Nov 10, 2016, 6:23:21 AM11/10/16
to unmarked, shane...@gmail.com
Hi,

Was this issue resolved? I'm getting some unintuitive results when computing fitted values for a negative binomial N-mixture model that may be related to this. Here's a simple example:

set.seed(45)
yS = matrix(rnbinom(50*5, mu = 1, size = 1), nrow = 50, ncol = 5)
fmNB = pcount(~1~1, data = unmarkedFramePCount(yS), mixture = "NB", K = 500)
fitted(fmNB)[1,1]
[1] 0.0007500683
exp(coef(fmNB)[1]) * plogis(coef(fmNB)[2])
[1] 1.04

Looking at the source, the fitted values for the negative binomial mixture seem to be computed using a numerical approximation of the mean. The K value for this computation is taken as an argument to the 'fitted' function rather than from the fitted object. In the example above, the default value of K is much too low. Supplying a larger K works better:

fitted(fmNB, K = 500)[1,1]
[1] 1.036696

Jonas

Richard Chandler

unread,
Nov 11, 2016, 8:46:40 AM11/11/16
to Unmarked package, Shane Siers
Hi Jonas,

You simulated y~NB(lam,alpha)  instead of N~NB(lam) and y~Bin(N,p). The first model could be fitted using a standard glm. The second seems to work fine in unmarked:

> set.seed(45)
> nSites  <- 50
> nVisits <- 5
> lam <- 1
> p <- 0.5
> N <- rnbinom(50, mu = lam, size = 1)
> yS <- matrix(rbinom(nSites*nVisits, rep(N,nVisits), p), nrow = nSites, ncol = nVisits)
> fmNB <- pcount(~1~1, data = unmarkedFramePCount(yS), mixture = "NB", K = 500)
> backTransform(fmNB, type="state")
Backtransformed linear combination(s) of Abundance estimate(s)

 Estimate    SE LinComb (Intercept)
     1.07 0.225   0.068           1

Transformation: exp 
> backTransform(fmNB, type="det")
Backtransformed linear combination(s) of Detection estimate(s)

 Estimate     SE LinComb (Intercept)
    0.538 0.0548   0.153           1

Transformation: logistic 
> backTransform(fmNB, type="alpha")
Backtransformed linear combination(s) of Dispersion estimate(s)

 Estimate    SE LinComb alpha
     1.09 0.608  0.0824     1

Transformation: exp 



jk3...@gmail.com

unread,
Nov 11, 2016, 11:14:47 AM11/11/16
to unmarked, rcha...@warnell.uga.edu
Hi Richard,

Thanks for your response! You're right that the data I simulated aren't from a proper N-mixture model. My point was just to try to show what seems to me to be a bug in the 'fitted' function for NB models.

In my example, I would have expected the values from fitted(fmNB) to equal exp(coef(fmNB)[1]) * plogis(coef(fmNB)[2]), but that's not the case. This can also happen for data simulated from a proper N-mixture model. In your example 'fitted' gives the correct answer, but if we change lam to e.g. 10 and size to 1/5, fitted(fmNB) is no longer equal to exp(coef(fmNB)[1]) * plogis(coef(fmNB)[2]):


set.seed(45)
nSites  <- 50
nVisits <- 5
lam <- 10
p <- 0.5
N <- rnbinom(50, mu = lam, size = 1/5)

yS <- matrix(rbinom(nSites*nVisits, rep(N,nVisits), p), nrow = nSites, ncol = nVisits)
fmNB <- pcount(~1~1, data = unmarkedFramePCount(yS), mixture = "NB", K = 500)


Now fitted gives:

> fitted(fmNB)[1,1]
[1] 4.144114

but this is not the same as the product of the back transformed coefficients:


> exp(coef(fmNB)[1]) * plogis(coef(fmNB)[2])
lam(Int)
5.647976


Like above, if a larger K is supplied to the fitted function we get a better agreement:


> fitted(fmNB, K = 500)[1,1]
[1] 5.646496

I hope this clarifies my point a little.

Jonas

Richard Chandler

unread,
Nov 11, 2016, 12:24:39 PM11/11/16
to Unmarked package
Ah yes, now I see. The problem was that fitted() used a default K of max(y)+20. I will change this so that it uses K from the fitted model. Thanks for the bug report!

Richard



jk3...@gmail.com

unread,
Nov 14, 2016, 5:27:50 AM11/14/16
to unmarked
Thanks for fixing this! Is there a problem with calculating the fitted values directly from the coefficients, as is done for the Poisson and ZIP mixtures? It seems that approximating the mean of the negative binomial with a finite sum may not be necessary.

Jonas
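[Jonas's point can be illustrated numerically: for the NB, E[N] equals lam in closed form whatever the dispersion, while the truncated sum only approximates it. A quick sketch (not from the thread) with the heavy-tailed values from the example above, mu = 10 and size = 1/5:]

```r
# Truncated-sum approximation of E[N] versus the closed form E[N] = lam.
lam  <- 10
size <- 1/5   # heavy overdispersion, so the tail contributes a lot
EN <- function(K) sum((0:K) * dnbinom(0:K, mu = lam, size = size))
EN(30)    # badly truncated (comparable to the old max(y)+20 default)
EN(2000)  # essentially exact
lam       # closed form
```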

Richard Chandler

unread,
Nov 14, 2016, 8:56:53 AM11/14/16
to Unmarked package
Hi Jonas,

You may have noticed the comment I have about this in the source code: Ian Fiske probably had a good reason for using the summation to compute the expected value of N, but I'm not sure what it was. I'll probably remove the summation eventually, but in the meantime the result should be fine as long as K was set high enough when the model was fitted.

Richard



jk3...@gmail.com

unread,
Nov 14, 2016, 9:35:00 AM11/14/16
to unmarked
Ok, thanks. Yes, I saw the comment and was curious why it was implemented this way. As you say, it shouldn't make a big difference in practice.

Jonas