example heteroscedasticity, overdispersion, Poisson

josef...@gmail.com

Apr 13, 2016, 1:08:36 AM

to pystatsmodels

Here is a quickly made-up example where standard Poisson doesn't work, and NegativeBinomial would work only partially.

We have robust standard errors, but we don't yet have any facility in GLM or discrete to model heteroscedasticity, i.e. overdispersion or variance that varies with explanatory variables.

It's also an example where the two-sample comparison for Poisson rates and all the "exact" hypothesis tests will break down because the assumptions are not satisfied.

Standard GLM Quasi-Poisson (in R) also wouldn't work for varying overdispersion.

https://gist.github.com/josef-pkt/ff08f8c446576faa3654d17694da01fc

-----------------------

Overdispersion, Poisson, two-sample comparison

The following is a simple example in preparation for modelling dispersion in GLM.

The setup is a simple case of comparing the means of two samples using Poisson. The difficulty is that the assumption that the samples come from a Poisson distribution is violated. Instead, the samples are drawn from a Negative Binomial distribution with different amounts of overdispersion in the two samples. This is analogous to a t-test where the two samples have different variances.

The results:

Poisson underestimates the standard errors of the parameters.

Using QuasiGLM adjusts the standard errors in the right direction, but uses an average adjustment instead of a sample-specific adjustment for overdispersion.

Using a heteroscedasticity-robust sandwich estimator corrects both for overdispersion and for different amounts of overdispersion in the two samples.

Adjusting the standard errors for each sample separately by the amount of sample-specific overdispersion produces standard errors that are very close to the HC standard errors.
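
The four results above can be checked in closed form for this two-sample setup. The following is a minimal sketch, not the code from the gist: it assumes a model with one indicator per sample, so that each coefficient is the log of a sample mean, and the sample sizes, the common mean, and the NB dispersion values (`alpha`) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def nb_sample(mean, alpha, size):
    """Draw NB2 counts with var = mean + alpha * mean**2."""
    n = 1.0 / alpha
    p = n / (n + mean)
    return rng.negative_binomial(n, p, size=size)


def se_log_mean(y, phi_pooled):
    """Standard error of log(mean(y)) under four variance assumptions."""
    n, m = len(y), y.mean()
    ss = ((y - m) ** 2).sum()
    v_poisson = 1.0 / (n * m)          # Poisson: var(y) = mean
    phi_group = ss / m / (n - 1)       # Pearson dispersion of this sample only
    return {
        "poisson": np.sqrt(v_poisson),
        "quasi": np.sqrt(phi_pooled * v_poisson),     # one average adjustment
        "hc0": np.sqrt(ss) / (n * m),                 # sandwich, no df correction
        "groupwise": np.sqrt(phi_group * v_poisson),  # sample-specific adjustment
    }


# Same mean, mild versus strong overdispersion (hypothetical values).
samples = [nb_sample(5.0, 0.1, 500), nb_sample(5.0, 1.0, 500)]

# Pooled Pearson chi-square divided by residual degrees of freedom.
pearson = sum((((y - y.mean()) ** 2) / y.mean()).sum() for y in samples)
df_resid = sum(len(y) for y in samples) - len(samples)
phi_pooled = pearson / df_resid

results = [se_log_mean(y, phi_pooled) for y in samples]
for g, res in enumerate(results):
    print(g, {k: round(float(v), 4) for k, v in res.items()})
```

With these settings the pooled Quasi-Poisson adjustment should overstate the standard error in the mildly overdispersed sample and understate it in the strongly overdispersed one, while the HC0 and per-sample values differ only by a degrees-of-freedom factor.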

---------------------------------------

Josef

josef...@gmail.com

Apr 13, 2016, 1:25:52 AM

to pystatsmodels

I added an example for how to use GLM with the Quasi-Poisson overdispersion correction.

I just discovered a few days ago that this is already implemented. I have no idea what the status of this is.

This is not an advertisement for it, because I still need to review it.

https://github.com/statsmodels/statsmodels/issues/2888

I've never seen an example where somebody used this (or I didn't pay attention).

Josef

josef...@gmail.com

Apr 13, 2016, 12:23:37 PM

to pystatsmodels

Another follow-up, without an issue yet.

That's finally a good case for adding specification tests for heteroscedasticity to GLM and discrete models similar to what's available for the linear model.

GLM/exponential family has built-in heteroscedasticity through the variance function, but that might not be the correct one for a specific dataset.
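
One way such a specification test could look, as a rough sketch in the spirit of the Breusch-Pagan test for the linear model: if the Poisson variance function is correct, squared Pearson residuals have expectation 1 everywhere, so regressing them on the explanatory variables should explain nothing. This is not an existing statsmodels function; the auxiliary regression, the `n * R^2` statistic, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two samples, same mean, different NB overdispersion (made-up values).
n_obs = 1000
group = np.repeat([0.0, 1.0], n_obs)
alpha = np.where(group == 0, 0.1, 1.0)
size = 1.0 / alpha
y = rng.negative_binomial(size, size / (size + 5.0))

# With an intercept and one indicator the Poisson fit is saturated,
# so the fitted means are simply the two sample means.
mu = np.where(group == 0, y[group == 0].mean(), y[group == 1].mean())
resid2 = (y - mu) ** 2 / mu  # squared Pearson residuals under var(y) = mu

# Auxiliary OLS regression of squared Pearson residuals on the regressors.
exog = np.column_stack([np.ones_like(group), group])
coef, *_ = np.linalg.lstsq(exog, resid2, rcond=None)
fitted = exog @ coef
r_squared = 1.0 - ((resid2 - fitted) ** 2).sum() / (
    (resid2 - resid2.mean()) ** 2
).sum()

# LM-type statistic; compare with the chi2(1) 5% critical value of about 3.84.
lm_stat = len(y) * r_squared
print("dispersion in sample 0:", coef[0])
print("difference in dispersion:", coef[1])
print("n * R^2:", lm_stat)
```

Here the intercept estimates the dispersion in the first sample and the slope estimates how much the dispersion differs in the second, so a large statistic points to variance that depends on the explanatory variable.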

(BTW: Stata has a heteroscedastic probit model.)

Josef

josef...@gmail.com

Apr 13, 2016, 8:05:52 PM

to pystatsmodels

On Wed, Apr 13, 2016 at 1:25 AM, <josef...@gmail.com> wrote:

I've never seen an example where somebody used this (or I didn't pay attention).

I didn't pay enough attention and only read selectively in the past.

Kerby already used it in a notebook last year for overdispersed Poisson:

http://nbviewer.jupyter.org/urls/umich.box.com/shared/static/0lgt5635uo6rhakjj9d5.ipynb

Which means it's ancient code that is subject to our backwards compatibility policy.

(When I first saw this, I thought it was only for continuous distributions with scale estimates. I didn't know much about GLM back then.)

Josef
