A more thorough follow-up. Hopefully this isn't too cryptic -- please let me know if anything is unclear.
(1) I really like the idea of pushing on Julia as a tool for MCMC, but I'd like to encourage us to start small. A full MCMC system like JAGS or Stan requires a team whose job is exclusively to maintain the system. For example, Stan (whose main selling point is being faster than JAGS) occasionally discovers that JAGS has a clever trick for special classes of models that lets JAGS beat out Stan. See
https://groups.google.com/forum/#!topic/stan-users/QDumPPbRa4Q/discussion for one example. Maintaining a system that's competitive with JAGS/Stan would be a major undertaking on the same scale as building Stan, which had three full-time developers, all postdocs in Gelman's group, working on the project for more than a year.
(2) One way we could start small is to focus on translating a JAGS-like syntax into Julia code that uses the samplers in Distributions. (Or, as the StrangeLoop crowd would say, we need a JAGS-to-Julia transpiler.) I think this is promising because we could very slowly build up a mini-language that translates into very specific Julia code. For example, we could implement just the standard conjugate pairs and provide a brief language for working with them that is translated into Julia. I would really like something like that, because it would make the system much more transparent than JAGS, which is very hard to debug. If a modeling language were translated into Julia, we would not have to worry as much about performance, because end-users could customize the generated code when they realize the translator isn't exploiting their domain knowledge. The problem with JAGS/Stan is that the efficiency they achieve is largely a black box, which makes it very hard for outsiders to contribute to the project. In other words, we could try to do better than JAGS/Stan by following Julia's strategy of doing everything in Julia, rather than dropping down to a C-level language for speed.
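To make (2) a bit more concrete, here's a rough sketch of what the translator's output could look like for a single conjugate pair. This is Python standing in for the eventual Julia, and all the names (`sample_beta`, `posterior_draw`) are made up for illustration -- the point is just that a declaration like `theta ~ Beta(1, 1); k ~ Binomial(n, theta)` could compile down to a closed-form conjugate update rather than a generic sampler:

```python
import random

def sample_beta(alpha, beta_, rng=random.Random(0)):
    # Sample Beta(alpha, beta) via the standard two-Gamma construction.
    x = rng.gammavariate(alpha, 1.0)
    y = rng.gammavariate(beta_, 1.0)
    return x / (x + y)

def posterior_draw(k, n, a0=1.0, b0=1.0, rng=random.Random(0)):
    # Beta-Binomial conjugacy: with a Beta(a0, b0) prior and k successes
    # in n trials, the posterior is Beta(a0 + k, b0 + n - k), so the
    # "sampler" is a single exact draw -- no MCMC machinery needed.
    return sample_beta(a0 + k, b0 + n - k, rng)
```

Because the output is plain, readable code like this, a user who sees the translator fall back to something generic could swap in the exact update by hand.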
(3) Before we can even start on (2), we'll need two things: a nice parsing library and a nice automatic differentiation library. Perhaps those should be our original focus? I know there's already work on automatic differentiation. I'm less aware of work on parsing, although it's come up in the context of Formula syntax for GLMs in JuliaData discussions. Maybe thinking more about that is the best place to start, especially because I have so little knowledge of how to write a decent parser.
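For anyone unfamiliar with why automatic differentiation matters here: it gives exact derivatives of arbitrary model code without symbolic algebra or finite differences. The classic minimal version is forward mode via dual numbers, sketched below in Python (again standing in for Julia; the `Dual` class and `derivative` helper are illustrative names, not any existing library's API):

```python
class Dual:
    """Forward-mode AD: carry a value and its derivative together."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        # Product rule: (uv)' = u'v + uv'.
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def derivative(f, x):
    # Seed the derivative slot with 1.0, run f, read the result back out.
    return f(Dual(x, 1.0)).der
```

In Julia, operator overloading plus multiple dispatch should make this sort of thing especially natural to write.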
(4) I think BUGS is much less like SAS/SPSS than it might seem. BUGS is a full programming language (it might actually be Turing-complete) that allows one to implement essentially any model imaginable. It's just horribly slow and clunky when you start to write models it wasn't designed to handle. Mixture models, for example, can be very hard to express in BUGS. Stan is, linguistically, about as expressive as BUGS, except that the Hamiltonian MCMC tricks that power it only apply to models with continuous variables, because the system increases statistical efficiency by using the gradient of the log posterior. In short, it's basically the same language, but it translates continuous models into better algorithms for sampling the model parameters.
MCMC-style computations are what drew me to Julia originally, so I hope we can find useful places for abstraction here.
-- John