Generalized linear models

630 views

Skip to first unread message

Douglas Bates

unread,

Apr 4, 2012, 12:45:00 PM4/4/12

to juli...@googlegroups.com

I have an initial implementation of generalized linear models (GLMs) in Julia at https://github.com/dmbates/julia/extras/glm.jl

The generalized linear model is a type of regression model that provides for different distributions of the response, such as Bernoulli (for binary data), Poisson (for count data), etc. It is a general class of models that includes specific types such as logistic regression, Poisson regression, probit regression, etc.

In R the glm function for fitting such models calls glm.fit to perform the model fitting using an algorithm called Iteratively Reweighted Least Squares (IRLS). R also provides "family" objects that represent the distribution being used and the link function. What I have written are composite types for the distribution and for the link plus instantiations for common distributions and links. I have also written an initial implementation of glmFit. A simple test file is enclosed. Running that on my system produces

Coefficient vector for simulation: [-0.54105, 0.308874, -1.02009]

Converged to coefficient vector: [-0.551101, 0.301489, -1.02574]

Deviance at estimates: 6858.466827205941

Elapsed time for fitting binary response with X 10000 by 3: 0.05460500717163086 seconds

A corresponding R test produces similar results and comparable timing (about 0.1 seconds). (Because R and Julia are using different random number generators the two tests are on problems of the same size but with different data.)

As a first cut at Julia code for fitting such models this is encouraging. To me the code in Julia is clearer than the R code but also has a few traps for the unwary. (I'm actually modifying the contents of the response and predictor objects during the iterations, which may cause some surprises). To be fair, the R code in glm.fit is very old and not the finest example of R programming in existence.

I am still intrigued by the idea of using distributed arrays for such algorithms (set up the model matrix X as an array distributed on the first dimension and all the vectors: y, mu, eta, etc. as distributed. Then the update and accumulate steps can be performed in parallel.) I haven't yet been able to figure out enough of the distributed arrays stuff to do that, however.

glmTest.jl

glmTest.R

Patrick O'Leary

unread,

Apr 4, 2012, 1:01:57 PM4/4/12

to juli...@googlegroups.com

On Wednesday, April 4, 2012 11:45:00 AM UTC-5, Douglas Bates wrote:

I have an initial implementation of generalized linear models (GLMs) in Julia at https://github.com/dmbates/julia/extras/glm.jl

Github doesn't seem to like that; try https://github.com/dmbates/julia/blob/master/extras/glm.jl.

Reply all

Reply to author

Forward

0 new messages