Possion regressor usage

14 views
Skip to first unread message

Johannes Moeller

unread,
Jun 21, 2023, 7:16:57 AM6/21/23
to pystatsmodels

I am tying to replicate a fit that looks like this in R:

DLmodel <- glm(outcome ~ comparison + A + B + C + D + delta2 + delta3,family = poisson, data = expanded_data)
or supposedly equivalent but more efficient:
DLmodel <- gnm(outcome ~ A + B + C + D + delta2 + delta3, eliminate = comparison,
family = poisson, data = expanded_data)

I have put the same Data in a pandas frame and called
model2 =  sm.formula.glm("outcome ~ comparison + A + B + C + D + delta2 +delta3",family=sm.families.Poisson(), data=data)

My Results are orders of magnitude wrong, and I suspect that I am lacking some statistics knowledge, about what i am actually doing.

It is from this paper, complete example in Appendix A:
https://arxiv.org/pdf/1909.07123.pdf

josef...@gmail.com

unread,
Jun 21, 2023, 10:00:19 AM6/21/23
to pystat...@googlegroups.com
Is comparison supposed to be treated as factor or categorical variable?

In that case you will need "C(comparison)" in the formula. Check that the formula creates the same design matrix as in R.
Based on a brief look I don't see any other reason.

The basic Poisson in GLM and discrete works in the same way as in other packages, R, Stata, in regular cases.
If there are numerical problems or (near) singular design matrix, then the behavior can differ by design.


Josef

--
You received this message because you are subscribed to the Google Groups "pystatsmodels" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pystatsmodel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pystatsmodels/e9caf5b1-cc71-4fa6-af57-25b8c651307en%40googlegroups.com.
Message has been deleted

josef...@gmail.com

unread,
Jun 21, 2023, 10:56:00 AM6/21/23
to pystat...@googlegroups.com


On Wed, Jun 21, 2023 at 10:18 AM Johannes Moeller <ggf...@gmail.com> wrote:
Thanks for your Answer!
And for the hint about categorial variables: Unfortunately I already tried that.
I think it should be one, but declaring it such does not make any difference.

I thought I might have missed a default option that differs between the Implementations.
But since I cannot even produce significantly different (wrong) output when changing options, I do not even know where to look.

Paper example:

## Coefficients of interest:
## A B C D delta2 delta3
## 2.071 6.864 2.071 NA 2.390 3.249

coefficient for D is `NA` here.
This means most likely that the design matrix is singular, and R drops "D" as column.
statsmodels uses pinv in the singular case, so params will differ.
I.e. the packages choose a different solution among the set of unidentified solutions.

AFAIU your variables below, you should drop "alt_4" as a reference category to match R 

 

My results:
                           coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------------
Intercept              -32.1278   4627.716     -0.007      0.994   -9102.285    9038.029
C(comparison)[T.2.0]    -0.0001      1.683  -6.05e-05      1.000      -3.299       3.298
C(comparison)[T.3.0]    -0.4904      1.595     -0.307      0.758      -3.617       2.636
C(comparison)[T.4.0]     0.2562      1.455      0.176      0.860      -2.596       3.108
alt_1                   30.9017   4627.716      0.007      0.995   -9039.255    9101.058
alt_2                   30.9016   4627.716      0.007      0.995   -9039.255    9101.058
alt_3                   29.5082   4627.716      0.006      0.995   -9040.649    9099.666
alt_4                   28.2816   4627.717      0.006      0.995   -9041.876    9098.439
delta2                   0.5230      1.042      0.502      0.616      -1.519       2.565
delta3                   1.0617     46.297      0.023      0.982     -89.678      91.802
========================================================================================

Johannes Moeller

unread,
Jun 25, 2023, 6:40:34 AM6/25/23
to pystatsmodels
Thanks for your answers, Josef.

The real problem was me misunderstanding what the fit should actually look like ^^
Reply all
Reply to author
Forward
0 new messages