Here are some more thoughts on various topics. Bottom line
is we should stay on our toes, but I think we'll be OK for
the foreseeable future.
This isn't just about PyMC3, of course --- there are lots of
other packages out there whose functionality overlaps heavily with
Stan's (BUGS, JAGS, Laplace's Demon, Church and its progeny,
Figaro, emcee, NONMEM, GPstuff, lme4, NIMBLE, INLA, etc.).
METAPROGRAMMING
The key drawback for Stan is that we can't do any
metaprogramming relative to the Stan object language because
the Stan program itself isn't an object.
I'm genuinely curious what people would use metaprogramming for.
I think this is an area that's really worth exploring.
It really struck me looking at Jeromy Anglin's workflow
for building and comparing multiple models in JAGS. He
was reduced to cutting and pasting strings to swap in
and out priors, etc., which is pretty painful.
One thing we could do is define a graphical language within
Python and translate it to Stan. Perhaps even PyMC's own, if it's
decent --- I don't quite understand its boundaries from
the examples. I was hoping we could piggyback on NIMBLE for
that in R.
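
To make the idea concrete, here is a minimal sketch of generating Stan
source from plain Python structures so that priors can be swapped without
hand-editing strings. Everything in it is hypothetical: stan_program, the
dict layout, and the two prior sets are made up for illustration and are
not part of Stan, PyStan, or PyMC. It only builds program text; compiling
and fitting are left out.

```python
# Hypothetical sketch: none of these names come from Stan, PyStan, or PyMC.

def stan_program(data_decls, param_decls, priors, likelihood):
    """Assemble Stan source text from plain Python structures."""
    def block(name, lines):
        body = "\n".join("  " + line for line in lines)
        return f"{name} {{\n{body}\n}}"

    # One sampling statement per prior, followed by the likelihood.
    model_lines = [f"{name} ~ {dist};" for name, dist in priors.items()]
    model_lines.append(likelihood)
    return "\n".join([
        block("data", data_decls),
        block("parameters", param_decls),
        block("model", model_lines),
    ])


# The parts of the model that stay fixed across comparisons.
base = dict(
    data_decls=["int<lower=0> N;", "vector[N] y;"],
    param_decls=["real mu;", "real<lower=0> sigma;"],
    likelihood="y ~ normal(mu, sigma);",
)

# The parts we want to swap in and out.
candidate_priors = {
    "weak": {"mu": "normal(0, 10)", "sigma": "cauchy(0, 5)"},
    "tight": {"mu": "normal(0, 1)", "sigma": "exponential(1)"},
}

programs = {
    label: stan_program(priors=priors, **base)
    for label, priors in candidate_priors.items()
}
print(programs["tight"])
```

This is still string assembly under the hood, so it only moves the cutting
and pasting into one function; a real design would want a proper model
object that can be inspected and checked before it is rendered.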
FLEXIBILITY
What I'm trying to understand is exactly how flexible
the PyMC modeling language is given that it's backed by
Theano. Rob was digging around in the Python source today
trying to figure out which probability functions they
support.
I have the same question about Church, etc.
HARD WORK
Learning the Stan language involves work --- a lot of work to
learn it well. I'd compare that to learning the PyMC structures
you need in Python. In the end, assuming Python competence in
users and reasonable design choices, PyMC3 should be
less work to learn.
PyMC3 is also completely in-workflow --- you don't need to call
another language and have source files for it. This is good, but
to the extent that we could define a model by a simple Python
structure, we might be able to swing something along these lines
ourselves. In which case, it'd just come down to ease of installation
and who designed the better Python interface.
From what I've seen, Stan's language is more direct for statistical
modeling, but is obviously way more limited in what it can do than
Python. We can't stop and open up the Twitter API for more data in
the middle of a run, for instance --- I'm not sure if PyMC3 would
give you that kind of control (defining a variable in the model
as doing external work or getting updated asynchronously).
SPEED
Someone should measure this. We've put a ton of effort into
our derivative speed and multivariates. It'd be interesting
to see how PyMC3 compares.
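
A rough sketch of what a timing harness for that comparison might look
like, using only the Python standard library. The names time_runs,
summarize, and placeholder_fit are hypothetical, and the workload is a
stand-in: the real comparison would replace it with calls that fit the
same model on the same data in each package.

```python
import statistics
import time


def time_runs(run, repeats=5):
    """Return wall-clock seconds for each of `repeats` calls to `run`."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Report the median and minimum; the median resists one slow outlier."""
    return {"median": statistics.median(timings), "min": min(timings)}


# Placeholder workload (an assumption, not a real model fit).
def placeholder_fit():
    sum(i * i for i in range(10_000))


result = summarize(time_runs(placeholder_fit, repeats=3))
print(result)
```

Wall time alone probably isn't the fair measure for samplers; something
like effective sample size per second would account for how well each
chain mixes, and compile time should likely be reported separately.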
FEATURES
Also consider things like constrained variables, covariance priors,
Cholesky factors, ODE solvers, etc. These should all be possible
in something like PyMC3, but I have no idea how to tell what they've
already implemented.
They certainly already have optimization, but I don't know if they
do the lm-like uncertainty estimation. Of course, that'd be pretty
easy.
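
As an illustration of why it'd be easy, here is a one-parameter sketch,
reading "lm-like uncertainty estimation" as standard errors taken from the
curvature of the log density at the optimum (that reading is an
assumption). The toy model, the data, and the function names are all made
up; sigma is assumed known so the mode has a closed form.

```python
import math


def log_density(mu, y, sigma=1.0):
    """Normal log likelihood in mu, up to a constant; sigma assumed known."""
    return -sum((yi - mu) ** 2 for yi in y) / (2 * sigma ** 2)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2


y = [1.2, 0.7, 1.9, 1.4]          # made-up data
mode = sum(y) / len(y)            # closed-form maximizer for this toy model
curvature = second_derivative(lambda m: log_density(m, y), mode)
std_error = 1 / math.sqrt(-curvature)
print(mode, std_error)
```

For this toy model the standard error should agree with sigma / sqrt(N).
With more than one parameter the same idea uses the inverse of the
negative Hessian at the mode rather than a single second derivative.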
PORTABILITY
Stan models are portable across R, Python, command-line, etc.
That's good from a model-building community point of view, but
sub-optimal from an infrastructure perspective and it means we
have to do a ton of work as a team to field all these interfaces (the
point of the ongoing Stan 3 design and Allen's visit was to figure out
simpler interfaces without losing any power).
The same goes for platforms. Windows is an ongoing pain, but there
are a lot of users there. R's also been restricting what we can
do, as has Python to some extent, in terms of moving to C++11, updating
dependent packages, structuring interface data types, etc.
For all we know, Julia's the future and Python and R are soon-to-be
zombies! I'd rather hedge our bets on the platform choices.
- Bob