Are GLM models equivalent to their non-GLM models?


Stuart Reynolds

Apr 25, 2017, 6:54:36 PM4/25/17
to pystatsmodels
Should I expect different output from:

   sm.GLM(y, X, family=sm.families.Poisson()).fit(method="lbfgs").predict(X) 

and 

   sm.Poisson(y, X).fit(method="lbfgs").predict(X)

I seem to get very different answers for the same inputs.

Thanks,
- Stuart

josef...@gmail.com

Apr 25, 2017, 7:28:47 PM4/25/17
to pystatsmodels
The models are theoretically the same, so there should not be any
difference beyond convergence criteria.

When I compared several of the GLM models with their discrete
counterparts, they basically agreed in params and standard errors
(and the other items I checked), except that the Hessian in Probit
differs from the Hessian in GLM Binomial with probit link at a few
decimals. I didn't manage to find a reason for it.

Nevertheless, the code and implementation differ quite a bit, and the
default convergence criteria might differ. I compared mainly GLM IRLS
with discrete, but AFAIK we have unit tests comparing GLM with scipy
optimizers against IRLS for most families and links.

AFAIK, these comparisons are mostly with "nice" cases.

So the main sources of possible differences are numerical precision,
including edge behavior such as predictions close to zero in the
Poisson case, and optimization and convergence problems.
For the latter, a quick check is `results.model.score(results.params)`:
the gradient should be close to zero if the optimization has
converged. Relatedly, try different optimizers; Newton methods are
best in the neighborhood of the optimum because they use gradient and
Hessian information and give the most accurate local convergence. One
big difference is that the two models use different starting values
for the parameters, and I think GLM should be more robust in most
cases.


Another theoretical difference: it is possible to have convergence
problems because of perfect separation in Poisson as well, but I have
only seen some comments about it and never ran into an example.
However, even in perfect separation cases the predictions should not
differ by much, even if the parameters didn't converge to the same
values.


Check convergence, and if that is not the problem, then you could
provide a dataset that illustrates it so we can investigate.


Bugs are never ruled out, but are very very unlikely in this case
(except for corner case behavior) because Poisson is a popular case
for unit tests.

Josef






josef...@gmail.com

Apr 25, 2017, 7:43:41 PM4/25/17
to pystatsmodels
(not relevant for predict)
To add a bit more on general differences:
AFAIK, GLM IRLS uses the standard expected information matrix for
cov_params. GLM with scipy optimizers and all discrete models use the
Hessian, i.e. the observed information matrix. The two differ in
cases with a non-canonical link.
Extra options for cov_params are on the wishlist, but I haven't gotten
back to this yet to see how to add them.

The NegativeBinomial models are only partially the same. GLM
NegativeBinomial uses a fixed, user-provided dispersion parameter.
discrete.NegativeBinomial has 3 versions built in, one of which
corresponds to the GLM version, and the discrete version estimates the
dispersion parameter as part of the maximum likelihood estimation.

Scale in GLM NegativeBinomial corresponds to excess dispersion. I
don't know if this is a bug or feature, or just an imitation of R.
https://github.com/statsmodels/statsmodels/issues/2888 (It's
inconsistent with the rest because the default in GLM is MLE and not
Quasi-MLE.)


Josef




josef...@gmail.com

Apr 25, 2017, 9:40:08 PM4/25/17
to pystatsmodels
One more difference between GLM and discrete that I just remembered:

predict is fine, but fittedvalues in the discrete models is a design
bug. fittedvalues in discrete_model is the linear predictor, not the
expected value of the response as in GLM, which should be the
"standard convention" for statsmodels.

Recommendation: don't use the discrete models' fittedvalues, because
eventually we will have to change it.

Josef

