Poisson regression with overdispersion

269 views
Skip to first unread message

burakb

unread,
Sep 30, 2010, 7:01:31 AM9/30/10
to pystatsmodels
Is there the equivalent of R call glm(.. , family=quasipoisson) in SM
that can be used to fit Poisson regression with overdispersion? I did
a search in the code, in family.py, there was a comment about
quasipoisson being a todo item. Is this correct?

josef...@gmail.com

unread,
Sep 30, 2010, 9:52:06 AM9/30/10
to pystat...@googlegroups.com

Depends on what kind of overdispersion you want. I have been reading
up on it a little bit, and the recommendation I have often seen is to
use negative binomial instead of (quasi)poisson for increased
variance.
So my current opinion is that we don't need quasi-poisson, but I only
looked at the maximum likelihood and not at the GLM literature. And I
haven't looked at what the MLE equivalent of a GLM-quasi-poisson is.

However, I just finished yesterday the basic version of zero-inflated
Poisson, and Poisson with offset using GenericLikelihoodModel, and I
would like to add a few more, e.g. hurdle Poisson and zero-inflated
negative binomial.
These are the models for which I found more information in the literature.

The basic tests for zero-inflated Poisson pass, but there are some
numerical problems with the Hessian if there is no zero-inflation in
the data. This still needs to be made more robust.

An example script that I used to check the results is in
scikits\statsmodels\examples\es_misc_poisson2.py
the model class is in scikits\statsmodels\miscmodels\count.py

If you have a test case, I can look at it tonight or tomorrow.

Josef

josef...@gmail.com

unread,
Sep 30, 2010, 9:53:00 AM9/30/10
to pystat...@googlegroups.com

I forgot to add: this is in my and the devel branch.

Josef

burakb

unread,
Sep 30, 2010, 11:00:22 AM9/30/10
to pystatsmodels
One test case I have is from here:

http://www.stat.wisc.edu/courses/st572-larget/Spring2007/handouts24-2.pdf

It's the so-called Aberrant Crypt Foci example. R test code is here:

http://dl.dropbox.com/u/1570604/quasi.zip

Test code first runs glm using poisson, then quasipoisson, residual
deviance is 28.369 and 24.515 respectively; quasipoisson improving the
fit a little.

I looked at ex_misc_poisson2.py; I guess what I need is PoissonZiGMLE
call ?

On Sep 30, 4:53 pm, josef.p...@gmail.com wrote:
> On Thu, Sep 30, 2010 at 9:52 AM,  <josef.p...@gmail.com> wrote:

burakb

unread,
Sep 30, 2010, 11:06:14 AM9/30/10
to pystatsmodels
One note: For glm.2 the example adds I(endtime^2), trying to increase
variance I assume for testing purposes.

On Sep 30, 6:00 pm, burakb <burakbayra...@gmail.com> wrote:
> One test case I have is from here:
>
> http://www.stat.wisc.edu/courses/st572-larget/Spring2007/handouts24-2...

burakb

unread,
Sep 30, 2010, 11:18:55 AM9/30/10
to pystatsmodels
Actually ignore the last two messages; it seems there is no difference
between poisson and quasipoisson calls. I need to find a better
example.

josef...@gmail.com

unread,
Sep 30, 2010, 11:22:44 AM9/30/10
to pystat...@googlegroups.com
On Thu, Sep 30, 2010 at 11:00 AM, burakb <burakb...@gmail.com> wrote:
> One test case I have is from here:
>
> http://www.stat.wisc.edu/courses/st572-larget/Spring2007/handouts24-2.pdf
>
> It's the so-called Aberrant Crypt Foci example. R test code is here:
>
> http://dl.dropbox.com/u/1570604/quasi.zip
>
> Test code first runs glm using poisson, then quasipoisson, residual
> deviance is 28.369 and 24.515 respectively; quasipoisson improving the
> fit a little.
>
> I looked at ex_misc_poisson2.py; I guess what I need is PoissonZiGMLE
> call ?

Yes, it should work without an offset, look at mod4 at the end. There
are similar examples in miscmodels/tests for all 3 poisson cases.

Josef

burakb

unread,
Sep 30, 2010, 11:45:57 AM9/30/10
to pystatsmodels
You are right; negative binomial gave much better results.

http://dl.dropbox.com/u/1570604/burg.zip

Is there negative binomial based regression on statsmodels?

On Sep 30, 6:22 pm, josef.p...@gmail.com wrote:
> On Thu, Sep 30, 2010 at 11:00 AM, burakb <burakbayra...@gmail.com> wrote:
> > One test case I have is from here:
>
> >http://www.stat.wisc.edu/courses/st572-larget/Spring2007/handouts24-2...

josef...@gmail.com

unread,
Sep 30, 2010, 11:57:49 AM9/30/10
to pystat...@googlegroups.com
On Thu, Sep 30, 2010 at 11:45 AM, burakb <burakb...@gmail.com> wrote:
> You are right; negative binomial gave much better results.
>
> http://dl.dropbox.com/u/1570604/burg.zip
>
> Is  there negative binomial based regression on statsmodels?

there is NegativeBinomial as a GLM family

discretemod has also negative binomial, but there are some code
comments that it is not fully working. I haven't tried it yet or not
recently enough to remember. .

Josef

burakb

unread,
Sep 30, 2010, 12:20:22 PM9/30/10
to pystatsmodels
Hi Josef, there was an error with Poisson fit. BUT Negative Binomial
worked fine, and gave results very close to MASS glm.nb.

http://dl.dropbox.com/u/1570604/burg.zip

The error message says I should report it the error from GLM Poisson,
so I am

Warning: divide by zero encountered in log
Warning: invalid value encountered in multiply
Traceback (most recent call last):
File "burglary.py", line 11, in <module>
res = glm.fit()
File "/usr/local/lib/python2.6/dist-packages/
scikits.statsmodels-0.2.0-py2.6.egg/scikits/statsmodels/glm.py", line
378, in fit
returned a nan. This could be a boundary problem and should be
reported."
ValueError: The first guess on the deviance function returned a nan.
This could be a boundary problem and should be reported.

On Sep 30, 6:57 pm, josef.p...@gmail.com wrote:

josef...@gmail.com

unread,
Sep 30, 2010, 12:40:48 PM9/30/10
to pystat...@googlegroups.com
On Thu, Sep 30, 2010 at 12:20 PM, burakb <burakb...@gmail.com> wrote:
> Hi Josef, there was an error with Poisson fit. BUT Negative Binomial
> worked fine, and gave results very close to MASS glm.nb.

good to hear


>
> http://dl.dropbox.com/u/1570604/burg.zip
>
> The error message says I should report it the error from GLM Poisson,
> so I am
>
> Warning: divide by zero encountered in log
> Warning: invalid value encountered in multiply
> Traceback (most recent call last):
>  File "burglary.py", line 11, in <module>
>    res = glm.fit()
>  File "/usr/local/lib/python2.6/dist-packages/
> scikits.statsmodels-0.2.0-py2.6.egg/scikits/statsmodels/glm.py", line
> 378, in fit
>    returned a nan.  This could be a boundary problem and should be
> reported."
> ValueError: The first guess on the deviance function returned a nan.
> This could be a boundary problem and should be reported.

This has been fixed during summer and is not in 0.2.0. If you run it
with the devel branch or trunk, then it should work.

It's a problem if there are zeros, 0*log(0)

burakb

unread,
Sep 30, 2010, 12:41:11 PM9/30/10
to pystatsmodels
When I clean out data with '0' values, Poisson fit works too.

josef...@gmail.com

unread,
Sep 30, 2010, 8:00:37 PM9/30/10
to pystat...@googlegroups.com
On Thu, Sep 30, 2010 at 12:41 PM, burakb <burakb...@gmail.com> wrote:
> When I clean out data with '0' values, Poisson fit works too.
>
> On Sep 30, 7:20 pm, burakb <burakbayra...@gmail.com> wrote:
>> Hi Josef, there was an error with Poisson fit. BUT Negative Binomial
>> worked fine, and gave results very close to MASS glm.nb.
>>
>> http://dl.dropbox.com/u/1570604/burg.zip

We also have a summary for glm in development:

>>> res.summary()
Generalized linear model
===================================================================================
Model Family: NegativeBinomial # of obs:
500
Method: IRLS Df residuals:
498
Dependent Variable: Y Df model:
1
Date: Thu, 30 Sep 2010 Scale:
0.3543
Time: 19:46:20 Log likelihood:
-1482.3815
=================================================================================
coefficient stand errors t-statistic Conf.
Interval
---------------------------------------------------------------------------------
x0 5.5857 42.1032 [ 5.326,
5.846]
x1 -0.0608 -27.9251 [-0.065,
-0.057]
=================================================================================

example doesn't look zero-inflated from the histogram, so ZIP won't
help in this case.

Do you know what goodness-of-fit and diagnostic test statistics are
commonly used for count data?

I haven't looked at your links yet, is the burglary data also from the
wisconsin website ?

A note: in trunk (and devel) the module "family" has been renamed to
"families", e.g. family=sm.families.Poisson()

Josef

burakb

unread,
Oct 1, 2010, 4:56:31 AM10/1/10
to pystatsmodels

> example doesn't look zero-inflated from the histogram, so ZIP won't
> help in this case.
>
> Do you know what goodness-of-fit and diagnostic test statistics are
> commonly used for count data?
>

There is a overdispersion test in R qcc package called
qcc.overdispersion.test, can return '1' or '0'. Its code most likely
standardizes residuals from a calculated mean, then checks the
statistic on pchisq().. I actually code this myself, have not used qcc
call. I dont know about others.

> I haven't looked at your links yet, is the burglary data also from the
> wisconsin website ?

I took this data from statpower.net site, the lecturer had it along
with his lecture notes, I assume it is a classic, well-known data set.
Wisconsin is a common repo for this kinda stuff right? It could be
from there..
Reply all
Reply to author
Forward
0 new messages