I have an initial implementation of generalized linear models (GLMs) in Julia at
https://github.com/dmbates/julia/extras/glm.jl
The generalized linear model is a type of regression model that provides for different distributions of the response, such as Bernoulli (for binary data), Poisson (for count data), etc. It is a general class of models that includes specific types such as logistic regression, Poisson regression, probit regression, etc.
In R the glm function for fitting such models calls glm.fit to perform the model fitting using an algorithm called Iteratively Reweighted Least Squares (IRLS). R also provides "family" objects that represent the distribution being used and the link function. What I have written are composite types for the distribution and for the link plus instantiations for common distributions and links. I have also written an initial implementation of glmFit. A simple test file is enclosed. Running that on my system produces
Coefficient vector for simulation: [-0.54105, 0.308874, -1.02009]
Converged to coefficient vector: [-0.551101, 0.301489, -1.02574]
Deviance at estimates: 6858.466827205941
Elapsed time for fitting binary response with X 10000 by 3: 0.05460500717163086 seconds
A corresponding R test produces similar results and comparable timing (about 0.1 seconds). (Because R and Julia are using different random number generators the two tests are on problems of the same size but with different data.)
As a first cut at Julia code for fitting such models this is encouraging. To me the code in Julia is clearer than the R code but also has a few traps for the unwary. (I'm actually modifying the contents of the response and predictor objects during the iterations, which may cause some surprises). To be fair, the R code in glm.fit is very old and not the finest example of R programming in existence.
I am still intrigued by the idea of using distributed arrays for such algorithms (set up the model matrix X as an array distributed on the first dimension and all the vectors: y, mu, eta, etc. as distributed. Then the update and accumulate steps can be performed in parallel.) I haven't yet been able to figure out enough of the distributed arrays stuff to do that, however.