Goodness of fit and NB models (pcount)


Shane Siers

unread,
Jan 23, 2014, 5:04:50 PM1/23/14
to unma...@googlegroups.com
Dear sages,

How would you address the following scenario in a manuscript, and what approach would you take?

  • I have a very well-behaved Poisson pcount model that gives very reasonable estimates. However, it fails all GOF tests (parboot SSE, chi-square, Freeman-Tukey).
  • Alternative NB and ZIP mixtures pass GOF with flying colors; however, the AIC results are effectively identical (delta AIC = 2, exactly the penalty for the extra dispersion term), and the dispersion terms are not significant.
  • The estimates provided by the Poisson and NB models are identical (the data are good: 30 to 175 visits at each site).
Should I present the NB model and report that the dispersion parameter is non-significant, then indicate the GOF test as the reason I chose the mixture, OR present the Poisson model (in which case, how do I deal with GOF failure)? 
 
Thanks in advance,
Shane


--
Shane Siers

Please watch my underwater videos at www.macronesia.net
 

Richard Chandler

unread,
Jan 24, 2014, 9:56:45 AM1/24/14
to Unmarked package
Hi Shane,

Can you show the results of the Poisson and NB models? The P-value for the dispersion parameter (alpha) is not what you want to look at; what you want to assess is how large the dispersion parameter is. The NB model approximates the Poisson model as the dispersion parameter tends toward infinity, so if alpha is very high, you probably don't need to worry about overdispersion. It sounds as though your estimate of alpha is close to 1, which would indicate overdispersion and therefore suggest that the NB model might be better than the Poisson in this case.

Richard
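[Richard's convergence point is easy to verify directly in base R; a minimal sketch (not from the original thread), comparing NB and Poisson probabilities:]

```r
# The negative binomial converges to the Poisson as the dispersion
# parameter (R's 'size'; unmarked's alpha) tends toward infinity.
lam <- 5
x <- 0:20
pois     <- dpois(x, lambda = lam)
nb_small <- dnbinom(x, mu = lam, size = 1)    # strong overdispersion
nb_big   <- dnbinom(x, mu = lam, size = 1e6)  # essentially Poisson
max(abs(nb_small - pois))  # substantial discrepancy
max(abs(nb_big - pois))    # negligible
```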


--
You received this message because you are subscribed to the Google Groups "unmarked" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
Richard Chandler
Assistant Professor
Warnell School of Forestry and Natural Resources
University of Georgia

Shane Siers

unread,
Jan 24, 2014, 12:07:30 PM1/24/14
to unma...@googlegroups.com
Hi, Richard.  Eschewing the estimates, which are all well-behaved, reasonable, and very nearly identical, here are the results:

Poisson:
AIC: 3779.83772938 
Number of sites: 36
optim convergence code: 0
optim iterations: 261 
Bootstrap iterations: 0

NB:
Dispersion (log-scale):
 Estimate   SE    z P(>|z|)
     13.4 32.6 0.41   0.682

AIC: 3781.83971906 
Number of sites: 36
optim convergence code: 0
optim iterations: 271 
Bootstrap iterations: 0

If I understand correctly, the dispersion parameter is theta on the log scale, so alpha = exp(1/theta) = 1.0799, aka "c-hat", correct? 

Richard Chandler

unread,
Jan 24, 2014, 2:46:24 PM1/24/14
to Unmarked package
alpha = exp(13.4) = 660003, so there appears to be no overdispersion. This is strange because your GoF results should be virtually the same for the NB and Poisson models. Can you post a simple reproducible example to demonstrate the problem? 

Shane Siers

unread,
Jan 25, 2014, 12:25:02 PM1/25/14
to unma...@googlegroups.com

Hi, Richard.  Not sure how I'd simplify my data to reproduce the problem.  Here is my GOF process, with results:

Poisson:
p.full <- pcount(~min+hab+rpast+rnow+wind+gust+moon+pr+obs
                 ~site, data=umf, se=FALSE, control=list(maxit=1e4))
top <- p.full
pb.p <- parboot(top, fitstats, nsim=25, report=1)

[Poisson GoF results attached as a screenshot]

Negative Binomial: 
nb.full <- pcount(~min+hab+rpast+rnow+wind+gust+moon+pr+obs
                  ~site, data=umf, mixture="NB", se=FALSE, control=list(maxit=1e4))
top <- nb.full
pb.nb <- parboot(top, fitstats, nsim=25, report=1)

[NB GoF results attached as a screenshot]

Model selection:
   nPars     AIC delta AICwt cumltvWt
P     54 3766.54  0.00  0.73     0.73
NB    55 3768.54  2.01  0.27     1.00

Does this help?  Probably not.

Jeffrey Royle

unread,
Jan 27, 2014, 11:55:23 AM1/27/14
to unma...@googlegroups.com
hi Shane,
 With this kind of discrepancy between the two, given the indicated "no overdispersion", I'm wondering if we have a bug in either getY() or fitted() for the negative binomial model. This will have to be tracked down at some point, but you might consider testing it yourself: it would be easy to write your own fitstats() function that computes the fitted values directly, in which case you should get essentially the same result.
 
regards
andy
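[The user-written fitstats() being passed to parboot() in this thread is not shown; a sketch in the style of the unmarked vignette is below (the statistic names are illustrative). To run the check Andy proposes, the fitted(fm) call would be swapped for expected values computed directly from the coefficients.]

```r
# GOF statistics from observed and expected count matrices.
gof_stats <- function(observed, expected) {
  resids <- observed - expected
  c(SSE          = sum(resids^2, na.rm = TRUE),
    Chisq        = sum((observed - expected)^2 / expected, na.rm = TRUE),
    freemanTukey = sum((sqrt(observed) - sqrt(expected))^2, na.rm = TRUE))
}

# Wrapper with the signature parboot() expects; getY() and fitted() are
# unmarked functions. Replacing fitted(fm) with expected values computed
# directly from coef(fm) is the test Andy suggests.
fitstats <- function(fm) gof_stats(getY(fm@data), fitted(fm))
```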
 
 

 

Shane Siers

unread,
Jan 27, 2014, 1:14:00 PM1/27/14
to unma...@googlegroups.com
Hi, Andy.  Can you clarify something for me?  It is the Poisson that fails the GOF.  That failure is the reason I considered the NB, which does pass the fitstats.  Are you suggesting that, with no overdispersion, the NB should also fail the GOF?  If it is the NB giving the wrong results, I'm back to no fit.  What is my recourse then?  BTW, ZIP models give very similar GOF results to the NB (both pass while the Poisson fails), though the zero-inflation parameters are not significant and there is no improvement in AIC.

Shane

Jeffrey Royle

unread,
Jan 27, 2014, 1:34:42 PM1/27/14
to unma...@googlegroups.com
hi Shane,
 Yes, the NB should also fail the GOF evaluation.
 My point was that the fit statistics should be roughly the same for the two models, since your NB result suggests "no overdispersion", so I'm really curious about what's going on there.
 That's why I'm worried there may be an error in one of the unmarked functions used by fitstats().
 
 It's hard to say what recourse you have unless and until we know whether there's an error lurking here, but it doesn't make sense that the Poisson model doesn't fit and the NB does, given the indicated lack of overdispersion.
 
regards
andy
 

Richard Chandler

unread,
Jan 27, 2014, 5:12:01 PM1/27/14
to Unmarked package
I'll look into this. Something does seem wrong.

Richard Chandler

unread,
Jan 27, 2014, 5:28:53 PM1/27/14
to Unmarked package
I just did a quick simulation, and everything seemed fine. Shane, can you send me (rcha...@warnell.uga.edu) your data and a small script to demonstrate the problem?

Jeffrey Royle

unread,
Jan 27, 2014, 6:45:37 PM1/27/14
to unma...@googlegroups.com
hey Shane,
 CC me too if you don't mind and I'll take a look in the morning.
andy
 

jk3...@gmail.com

unread,
Nov 10, 2016, 6:23:21 AM11/10/16
to unmarked, shane...@gmail.com
Hi,

Was this issue resolved? I'm getting some unintuitive results when computing fitted values for a negative binomial N-mixture model that may be related to this. Here's a simple example:

set.seed(45)
yS = matrix(rnbinom(50*5, mu = 1, size = 1), nrow = 50, ncol = 5)
fmNB = pcount(~1~1, data = unmarkedFramePCount(yS), mixture = "NB", K = 500)
fitted(fmNB)[1,1]
[1] 0.0007500683
exp(coef(fmNB)[1]) * plogis(coef(fmNB)[2])
[1] 1.04

Looking at the source, the fitted values for the negative binomial mixture seem to be computed using a numerical approximation of the mean. The K value for this computation is taken as an argument to the 'fitted' function rather than from the fitted object. In the example above, the default value of K is much too low. Supplying a larger K works better:

fitted(fmNB, K = 500)[1,1]
[1] 1.036696

Jonas

Richard Chandler

unread,
Nov 11, 2016, 8:46:40 AM11/11/16
to Unmarked package, Shane Siers
Hi Jonas,

You simulated y~NB(lam,alpha)  instead of N~NB(lam) and y~Bin(N,p). The first model could be fitted using a standard glm. The second seems to work fine in unmarked:

> set.seed(45)
> nSites  <- 50
> nVisits <- 5
> lam <- 1
> p <- 0.5
> N <- rnbinom(50, mu = lam, size = 1)
> yS <- matrix(rbinom(nSites*nVisits, rep(N,nVisits), p), nrow = nSites, ncol = nVisits)
> fmNB <- pcount(~1~1, data = unmarkedFramePCount(yS), mixture = "NB", K = 500)
> backTransform(fmNB, type="state")
Backtransformed linear combination(s) of Abundance estimate(s)

 Estimate    SE LinComb (Intercept)
     1.07 0.225   0.068           1

Transformation: exp 
> backTransform(fmNB, type="det")
Backtransformed linear combination(s) of Detection estimate(s)

 Estimate     SE LinComb (Intercept)
    0.538 0.0548   0.153           1

Transformation: logistic 
> backTransform(fmNB, type="alpha")
Backtransformed linear combination(s) of Dispersion estimate(s)

 Estimate    SE LinComb alpha
     1.09 0.608  0.0824     1

Transformation: exp 



jk3...@gmail.com

unread,
Nov 11, 2016, 11:14:47 AM11/11/16
to unmarked, rcha...@warnell.uga.edu
Hi Richard,

Thanks for your response! You're right that the data I simulated aren't from a proper N-mixture model. My point was just to try to show what seems to me to be a bug in the 'fitted' function for NB models.

In my example, I would have expected the values from fitted(fmNB) to equal exp(coef(fmNB)[1]) * plogis(coef(fmNB)[2]), but that's not the case. This can also happen for data simulated from a proper N-mixture model. In your example 'fitted' gives the correct answer, but if we change lam to e.g. 10 and size to 1/5, fitted(fmNB) is no longer equal to exp(coef(fmNB)[1]) * plogis(coef(fmNB)[2]):


set.seed(45)
nSites  <- 50
nVisits <- 5
lam <- 10
p <- 0.5
N <- rnbinom(50, mu = lam, size = 1/5)

yS <- matrix(rbinom(nSites*nVisits, rep(N,nVisits), p), nrow = nSites, ncol = nVisits)
fmNB <- pcount(~1~1, data = unmarkedFramePCount(yS), mixture = "NB", K = 500)


Now fitted gives:

> fitted(fmNB)[1,1]
[1] 4.144114

but this is not the same as the product of the back transformed coefficients:


> exp(coef(fmNB)[1]) * plogis(coef(fmNB)[2])
lam(Int)
5.647976


Like above, if a larger K is supplied to the fitted function we get a better agreement:


> fitted(fmNB, K = 500)[1,1]
[1] 5.646496

I hope this clarifies my point a little.

Jonas

Richard Chandler

unread,
Nov 11, 2016, 12:24:39 PM11/11/16
to Unmarked package
Ah yes, now I see. The problem was that fitted() used a default K of max(y)+20. I will change this so that it uses K from the fitted model. Thanks for the bug report!

Richard



jk3...@gmail.com

unread,
Nov 14, 2016, 5:27:50 AM11/14/16
to unmarked
Thanks for fixing this! Is there a problem with calculating the fitted values directly from the coefficients, as is done for the Poisson and ZIP mixtures? It seems that approximating the mean of the negative binomial with a finite sum may not be necessary.

Jonas
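[Jonas's point can be illustrated numerically: for the NB, E[N] equals lam in closed form whatever the dispersion, while the truncated sum only approximates it. A quick sketch (not from the thread) with the heavy-tailed values from the example above, mu = 10 and size = 1/5:]

```r
# Truncated-sum approximation of E[N] versus the closed form E[N] = lam.
lam  <- 10
size <- 1/5   # heavy overdispersion, so the tail contributes a lot
EN <- function(K) sum((0:K) * dnbinom(0:K, mu = lam, size = size))
EN(30)    # badly truncated (comparable to the old max(y)+20 default)
EN(2000)  # essentially exact
lam       # closed form
```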

Richard Chandler

unread,
Nov 14, 2016, 8:56:53 AM11/14/16
to Unmarked package
Hi Jonas,

You may have noticed the comment I have about this in the source code: Ian Fiske probably had a good reason for using the summation to compute the expected value of N, but I'm not sure what it was. I'll probably remove the summation eventually, but in the meantime the result should be fine as long as K was set high enough when the model was fitted.

Richard



jk3...@gmail.com

unread,
Nov 14, 2016, 9:35:00 AM11/14/16
to unmarked
Ok, thanks. Yes, I saw the comment and was curious why it was implemented this way. As you say, it shouldn't make a big difference in practice.

Jonas