I think you mean GLMMs (generalized linear mixed models), not GLMs, which, not surprisingly, are found in the GLM.jl package.
What is it particularly about glmer2stan that you want to accomplish?
I have thought about MCMC methods for linear mixed models and generalized linear mixed models, but I always end up stumbling over the updates of the covariance matrices for the random effects. I am also not entirely sure I know how to define priors for covariance matrices in a way that I would be comfortable with.
I guess I should tread carefully when discussing MCMC, to avoid stepping on others' toes, but it seems peculiar to me to want to perform a Bayesian analysis with MCMC when the maximum likelihood estimates are comparatively easy to obtain. Consider the first example from the glmer2stan package, fit in Julia:
julia> using DataFrames, MixedModels, RCall
julia> sleep = rcopy("lme4::sleepstudy");
julia> @time m2 = fit!(lmm(Reaction ~ 1 + Days + (1+Days|Subject), sleep))
0.020823 seconds (32.90 k allocations: 1.277 MB, 54.73% gc time)
Linear mixed model fit by maximum likelihood
 logLik: -875.969672, deviance: 1751.939344, AIC: 1763.939344, BIC: 1783.097086

Variance components:
            Variance   Std.Dev.   Corr.
 Subject   565.51066  23.780468
            32.68212   5.716828   0.08
 Residual  654.94145  25.591824
 Number of obs: 180; levels of grouping factors: 18

 Fixed-effects parameters:
             Estimate  Std.Error  z value
(Intercept)   251.405    6.63226  37.9064
Days          10.4673    1.50224  6.96781
Fitting the same model with glmer2stan in R, the timings are:
# fit with glmer2stan
m1_g2s <- lmer2stan( Reaction ~ Days + (Days | subject_index), data=sleepstudy )
# [timings for 3GHz i5, no memory limitations]
#Elapsed Time: 46.1606 seconds (Warm-up)
# 32.3429 seconds (Sampling)
# 78.5035 seconds (Total)
The contrast is even greater on larger examples. Here is a model fit to a data set with over a million observations:

julia> ml1m = rcopy("Timings::ml1m");
julia> @time fit!(lmm(Y ~ 1 + (1|G) + (1|H), ml1m))
25.253732 seconds (39.95 M allocations: 1.399 GB, 0.93% gc time)
Linear mixed model fit by maximum likelihood
 logLik: -1331986.005811, deviance: 2663972.011622, AIC: 2663980.011622, BIC: 2664027.274500

Variance components:
             Variance    Std.Dev.
 G         0.12985210  0.36034996
 H         0.36879694  0.60728654
 Residual  0.81390867  0.90216887
 Number of obs: 1000209; levels of grouping factors: 6040, 3706

 Fixed-effects parameters:
             Estimate  Std.Error  z value
(Intercept)   3.33902  0.0114624  291.302
A naive MCMC formulation, such as produced by glmer2stan, would involve nearly 10,000 parameters.
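A rough tally of where those parameters come from, assuming (as glmer2stan does) that every random effect is sampled explicitly; the exact count depends on the parameterization:

```julia
# Parameter count for a naive MCMC formulation of the ml1m model:
# one random effect per level of each grouping factor, plus the
# fixed-effects and variance parameters.
n_G     = 6040   # levels of grouping factor G
n_H     = 3706   # levels of grouping factor H
n_fixed = 1      # (Intercept)
n_var   = 3      # variances for G, H, and the residual
n_G + n_H + n_fixed + n_var  # 9750, i.e. nearly 10,000
```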
Taking a problem that can be simplified this much and throwing it into general MCMC iterations over all the variance parameters, fixed-effects parameters, and random effects, in the hope that it will converge after a long period of time, doesn't seem sensible to me.
Am I somehow missing the point of why MCMC analysis would be worthwhile?