Generalized linear models

630 views
Skip to first unread message

Douglas Bates

unread,
Apr 4, 2012, 12:45:00 PM4/4/12
to juli...@googlegroups.com
I have an initial implementation of generalized linear models (GLMs) in Julia at https://github.com/dmbates/julia/extras/glm.jl

The generalized linear model is a type of regression model that provides for different distributions of the response, such as Bernoulli (for binary data), Poisson (for count data), etc.  It is a general class of models that includes specific types such as logistic regression, Poisson regression, probit regression, etc.

In R the glm function for fitting such models calls glm.fit to perform the model fitting using an algorithm called Iteratively Reweighted Least Squares (IRLS).  R also provides "family" objects that represent the distribution being used and the link function.  What I have written are composite types for the distribution and for the link plus instantiations for common distributions and links.  I have also written an initial implementation of glmFit.  A simple test file is enclosed.  Running that on my system produces

Coefficient vector for simulation: [-0.54105, 0.308874, -1.02009]
Converged to coefficient vector: [-0.551101, 0.301489, -1.02574]
Deviance at estimates: 6858.466827205941
Elapsed time for fitting binary response with X 10000 by 3: 0.05460500717163086 seconds

A corresponding R test produces similar results and comparable timing (about 0.1 seconds).  (Because R and Julia are using different random number generators the two tests are on problems of the same size but with different data.)

As a first cut at Julia code for fitting such models this is encouraging.  To me the code in Julia is clearer than the R code but also has a few traps for the unwary.  (I'm actually modifying the contents of the response and predictor objects during the iterations, which may cause some surprises).  To be fair, the R code in glm.fit is very old and not the finest example of R programming in existence.

I am still intrigued by the idea of using distributed arrays for such algorithms (set up the model matrix X as an array distributed on the first dimension and all the vectors: y, mu, eta, etc. as distributed.  Then the update and accumulate steps can be performed in parallel.)  I haven't yet been able to figure out enough of the distributed arrays stuff to do that, however.

glmTest.jl
glmTest.R

Patrick O'Leary

unread,
Apr 4, 2012, 1:01:57 PM4/4/12
to juli...@googlegroups.com
On Wednesday, April 4, 2012 11:45:00 AM UTC-5, Douglas Bates wrote:
I have an initial implementation of generalized linear models (GLMs) in Julia at https://github.com/dmbates/julia/extras/glm.jl
Reply all
Reply to author
Forward
0 new messages