Python/statsmodels solutions for "An Introduction to GLMs", by Dobson & Barnett


Thomas Haslwanter

May 15, 2013, 11:50:10 AM
to pystat...@googlegroups.com
I have now finished the Python/statsmodels implementation of all the code that is provided in R or Stata in the book

Dobson AJ & Barnett AG: "An Introduction to Generalized Linear Models"
3rd ed
CRC Press (2008)

You can find the results at
https://github.com/thomas-haslwanter/dobson

As far as I can tell, there are three points where the statsmodels results are inconsistent with the results provided in the book.
Three further problems cannot be solved with the current functions in statsmodels. Solutions for the missing functions seem to be in progress or planned.
One point in the book is unclear to me, so I was not sure what should be implemented there.
Systematic tests are also missing; they should be added before this code goes into a book.

As for asking Annette Dobson whether she would be willing to include the Python/statsmodels code in the next edition of her book, I think the three "Errors" below should first be fixed, or the differences at least logically justified (if they reflect different assumptions in the underlying functions).

- [Error 1] cloglog values in "logistic_regression" are wrong

- [Unclear] in "senility_and_WAIS" I don't understand what the "grouped response"
  is supposed to mean

- [Error 2] the signs of the parameters in "nominal_logistic_regression" are
  incorrect

- [Missing 1] "ordinal_logistic_regression" is not yet implemented in statsmodels

- [Error 3] the standard errors in "poisson_regression" are wrong

- [Missing 2] Cox proportional hazards are not yet implemented in statsmodels

- [Missing 3] Repeated measures models are not yet implemented in statsmodels


Skipper Seabold

May 15, 2013, 12:25:24 PM
to pystat...@googlegroups.com
On Wed, May 15, 2013 at 11:50 AM, Thomas Haslwanter
<thomas.h...@gmail.com> wrote:
> I have now finished the Python/statsmodels implementation of all the code
> that is provided in R or Stata in the book
>
> Dobson AJ & Barnett AG: "An Introduction to Generalized Linear Models"
> 3rd ed
> CRC Press(2008)
>
> You can find the results on
> https://github.com/thomas-haslwanter/dobson
>

This is great. Thanks for doing this.
Correct me if I'm wrong, but I thought the conclusion of the various
threads was that none of these "errors" are actually errors. cloglog is
the only one I'm not sure about, because I didn't follow the thread,
though if anything it seemed to be a different (and incorrect)
convention in R vs. what we use, which is what Stata uses.

Skipper

josef...@gmail.com

May 15, 2013, 12:34:34 PM
to pystat...@googlegroups.com
On Wed, May 15, 2013 at 11:50 AM, Thomas Haslwanter
<thomas.h...@gmail.com> wrote:
> I have now finished the Python/statsmodels implementation of all the code
> that is provided in R or Stata in the book
>
> Dobson AJ & Barnett AG: "An Introduction to Generalized Linear Models"
> 3rd ed
> CRC Press(2008)
>
> You can find the results on
> https://github.com/thomas-haslwanter/dobson
>
> As far as I can tell, there are three points where the statsmodels results
> are inconsistent with the results provided in the book.
> An additional three problems can not be solved with the current functions in
> statsmodels. Solutions seem to be in progress/planned for the missing
> functions.
> One point in the book is unclear to me, so I was not sure what should be
> implemented there.
> Also missing are systematic tests, which should be added before this code
> goes into a book.
>
> As for asking Annette Dobson if she would be willing to include the
> Python/statsmodels code into the next version of her book, I think the three
> "Errors" below should first be fixed, or the differences at least logically
> justified (if they represent different assumptions for the underlying
> functions.)
>
> - [Error 1] cloglog values in "logistic_regression" are wrong

my guess is that loglog is missing (and cloglog is right)
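
A minimal sketch, not taken from the thread, of how the two link functions differ. It assumes the convention loglog(p) = -log(-log(p)); some references define the log-log link with the opposite sign, so check which one a given package uses. The function names are illustrative and are not the statsmodels API.

```python
import math

def cloglog(p):
    """Complementary log-log link: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))

def cloglog_inverse(eta):
    """Inverse of cloglog: 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

def loglog(p):
    """Log-log link under the assumed convention -log(-log(p))."""
    return -math.log(-math.log(p))

def loglog_inverse(eta):
    """Inverse of loglog: exp(-exp(-eta))."""
    return math.exp(-math.exp(-eta))

# The two links give different linear predictors for the same probability,
# so fitting one where the other was intended changes the estimates.
for p in (0.1, 0.5, 0.9):
    print(f"p={p:.1f}  cloglog={cloglog(p):+.4f}  loglog={loglog(p):+.4f}")
```

Under this convention the two links are mirror images, cloglog(p) = -loglog(1 - p), so a table computed with one link cannot be reproduced by a fit that uses the other.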

>
> - [Unclear] in "senility_and_WAIS" I don't understand what the "grouped
> response"
> is supposed to mean

I don't have the book, so I don't know what this is.

>
> - [Error 2] the signs of the parameters in "nominal_logistic_regression" are
> incorrect

Not incorrect; they don't match because of a different reference category.
This needs a workaround until we have an option to choose the reference category.
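
A small sketch, with made-up slopes rather than the book's numbers, of why a different reference category changes the signs without changing the model. The helper names `rebase` and `probs` are hypothetical and intercepts are left out to keep it short.

```python
import math

def rebase(slopes, new_ref):
    """Re-express slopes relative to a new reference category.

    `slopes` maps category -> slope relative to the current reference,
    whose own slope is 0 by construction.
    """
    shift = slopes[new_ref]
    return {cat: b - shift for cat, b in slopes.items()}

def probs(slopes, x):
    """Category probabilities for linear predictors eta_j = b_j * x."""
    weights = {cat: math.exp(b * x) for cat, b in slopes.items()}
    total = sum(weights.values())
    return {cat: w / total for cat, w in weights.items()}

# Hypothetical slopes with category 0 as the reference.
slopes_ref0 = {0: 0.0, 1: 0.8, 2: -0.5}
# The same model with category 2 as the reference.
slopes_ref2 = rebase(slopes_ref0, 2)

print(slopes_ref0)
print(slopes_ref2)
print(probs(slopes_ref0, 1.5))
print(probs(slopes_ref2, 1.5))
```

The slope for "category 2 vs. category 0" and the slope for "category 0 vs. category 2" have the same magnitude and opposite signs, while the predicted probabilities are identical under both parameterizations.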

>
> - [Missing 1] "ordinal_logistic_regression" is not yet implemented in
> statsmodels

needs volunteers

>
> - [Error 3] the standard errors in "poisson_regression" are wrong

see thread, need to use exposure as Skipper showed
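
A rough illustration of why the exposure matters for the standard errors. It uses an intercept-only Poisson model with made-up counts and exposures, not the book's data or the code from the earlier thread. In statsmodels the corresponding option is the `exposure` argument of the Poisson models, which enters the linear predictor as an offset of log(exposure).

```python
import math

# Hypothetical data: event counts and the exposure (e.g. person-years) behind them.
counts = [4, 10, 25]
exposure = [100.0, 200.0, 400.0]

# With exposure: y_i ~ Poisson(n_i * exp(beta)).
# The intercept-only MLE is the log of the pooled rate.
beta_exposure = math.log(sum(counts) / sum(exposure))
fitted = [n * math.exp(beta_exposure) for n in exposure]
# The score equation sum(y_i - mu_i) = 0 holds at the MLE.
score = sum(y - mu for y, mu in zip(counts, fitted))
# Fisher information for the intercept is sum(mu_i).
se_exposure = 1.0 / math.sqrt(sum(fitted))

# Without exposure: treating the rates y_i / n_i as if they were Poisson counts.
rates = [y / n for y, n in zip(counts, exposure)]
beta_naive = math.log(sum(rates) / len(rates))
se_naive = 1.0 / math.sqrt(len(rates) * math.exp(beta_naive))

print(f"with exposure:    beta={beta_exposure:.4f}  se={se_exposure:.4f}")
print(f"without exposure: beta={beta_naive:.4f}  se={se_naive:.4f}")
```

The information in the data comes from the number of events, so dividing the counts by the exposure before fitting discards it and inflates the standard error. Whether this is the exact cause of the mismatch with the book is not stated here.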

>
> - [Missing 2] Cox proportional hazards are not yet implemented in
> statsmodels

It is in a branch, in the refactoring queue

>
> - [Missing 3] Repeated measures models are not yet implemented in
> statsmodels

big gap that hopefully will be closed within half a year (at least for
the basic models)


Thanks again


Josef


josef...@gmail.com

May 15, 2013, 1:02:27 PM
to pystat...@googlegroups.com
(needed to run before finishing)

I think this is *very* useful, both in terms of documentation (and
advertising), but also for us to see where we stand with statsmodels.
I wish we had more work in this direction, also for some of the
application fields.

One useful result is that we see where the major gaps are. In this
case, the only one where we don't have specific plans, or where I don't
know of anyone working on it, is ordered logit (but we might get more
discrete choice models during the summer as a GSoC project).

I tried to go through Greene in a somewhat similar way (I wasn't patient
enough to write all the examples)
http://www.amazon.com/Econometric-Analysis-7th-William-Greene/dp/0131395386
But Greene is now at 1232 pages, and I think we cover at most half of it.

Josef
