DEV: a toy package for experimenting with MCMC package design


John Salvatier

Mar 16, 2011, 2:56:11 PM
to py...@googlegroups.com
Hi Guys,

I've made a new package for experimenting with MCMC design (https://github.com/jsalvatier/mcex). It's short (about 300 lines total) and outsources its computational core to Theano. I discuss the advantages and disadvantages of some of the design decisions I've made in the readme (https://github.com/jsalvatier/mcex/blob/master/readme.rst). I encourage others to fork it and play around with different designs. It currently has only one distribution (normal), but it's easy to add more, provided they use functions Theano already has, or you add any extra functions you need (like gammaln). I don't think this is difficult, but I haven't done it.

John

Thomas Wiecki

Mar 16, 2011, 4:01:02 PM
to py...@googlegroups.com
Hi John,

that looks really interesting, thanks for sharing. Since it uses Theano, does that mean that all likelihoods can be computed on the GPU transparently? That seems like a major advantage!

-Thomas


John Salvatier

Mar 16, 2011, 4:12:56 PM
to py...@googlegroups.com
I'm not well versed in Theano, but I haven't done anything very unusual with it, so yes, that should work.

Anand Patil

Mar 17, 2011, 7:24:56 AM
to py...@googlegroups.com
John,

Thanks for the announcement. PyMC is a much better package because of your efforts over the past several months, and I'm looking forward to seeing what new ideas you've expressed in this code.

Anand

Flavio Coelho

Mar 17, 2011, 10:31:38 AM
to py...@googlegroups.com
Hi John,

It's a great initiative! I don't know much about Theano, but I will look into it.

On the map module, have you considered using OpenOpt? For some applications, I have found its optimizers more robust than the ones from scipy.

thanks for sharing,

Flávio Codeço Coelho
================





John Salvatier

Mar 17, 2011, 11:14:07 AM
to py...@googlegroups.com
I have not, but robustness has been an issue in the past, so I will look into it. Any optimizer in particular that you have had good experiences with? 

Flavio Coelho

Mar 17, 2011, 11:21:41 AM
to py...@googlegroups.com, John Salvatier
I have used NLP successfully for a variety of problems, and to a lesser extent GLP. They are also more flexible than the scipy optimizers in how they report convergence and in the overall specification of the optimization approach.

best,


Flávio Codeço Coelho
================




Anand Patil

Mar 24, 2011, 11:00:43 AM
to py...@googlegroups.com
Hi John,


I've hacked together a bare-bones, experimental version of PyMC in Theano also, which is at https://github.com/apatil/pymc-theano. "In theory" I think it should be much faster than PyMC... but as it stands it's much slower. However, some bits of it might still be useful in the future.

The ways I tried to improve on PyMC were:
  - Leave all graphical analysis to Theano. Step methods don't worry about their variables' Markov blankets; they just ask for the joint log-probability of the entire model and let Theano figure out which terms it doesn't need to compute.
  - Make fine-grained optimizations available. If you change the mean of a normal, you don't have to update the -.5*log(2*pi*v) term. In PyMC it's counterproductive to cache at such a fine level, but Theano doesn't need to cache; it can cancel like terms at compile time.
  - Separate the model from the state, so that multiple actors, possibly in multiple threads, can work on the model without interfering with each other.
  - Make it easier to seed computations for reproducibility.
  - Compile entire MCMC loops together.
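The like-term cancellation in the second point can be checked by hand for a normal density. A schematic sketch (plain Python, not Theano; the function names are illustrative):

```python
import math

def normal_logp(x, mu, v):
    """Full log-density of Normal(mu, v) at x."""
    return -0.5 * math.log(2 * math.pi * v) - (x - mu) ** 2 / (2 * v)

def delta_logp_mean_change(x, mu_old, mu_new, v):
    """logp change when only the mean moves: the -0.5*log(2*pi*v) term
    cancels, so it never has to be recomputed."""
    return ((x - mu_old) ** 2 - (x - mu_new) ** 2) / (2 * v)

x, v = 1.3, 2.0
full = normal_logp(x, 0.7, v) - normal_logp(x, 0.2, v)
fast = delta_logp_mean_change(x, 0.2, 0.7, v)
assert abs(full - fast) < 1e-12
```

A symbolic system like Theano can perform this cancellation automatically at compile time, with no caching machinery at run time.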


To do that, I represented a model as the following data structure:

{'stream': A random stream, used by all the model's variables and the fitting methods.
'variables': The variables in a probability model, represented as Theano expressions. Variables can be either stochastic or deterministic conditional on their parents.
                If these expressions are evaluated directly, you get a draw from the model's joint prior.
'factors':  The individual terms in the joint log-probability of the model, broken down as finely as possible. These are also Theano expressions whose inputs are in 'variables'.}
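A plain-Python stand-in for that structure, with closures playing the role of Theano expressions (the key names 'stream', 'variables', and 'factors' come from the message above; the concrete model, a single mu ~ Normal(0, 1), is purely illustrative):

```python
import math
import random

stream = random.Random(42)  # shared random stream, seedable for reproducibility

def draw_mu():
    # Stochastic variable: evaluating it directly gives a draw from the
    # prior, as the message describes.
    return stream.gauss(0.0, 1.0)

def mu_factor(mu):
    # One term of the joint log-probability, broken out on its own.
    return -0.5 * math.log(2 * math.pi) - 0.5 * mu ** 2

model = {
    'stream': stream,
    'variables': {'mu': draw_mu},
    'factors': [mu_factor],
}

# A prior draw, and the joint logp evaluated at that draw.
mu = model['variables']['mu']()
joint_logp = sum(f(mu) for f in model['factors'])
```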


Cheers,
Anand

John Salvatier

Mar 24, 2011, 12:23:16 PM
to py...@googlegroups.com
We seem to have made similar choices, which I think is encouraging.

Some similarities:

MCEx also leaves the DAG analysis up to Theano, and I have also separated the state from the model.

Some differences:

I leave the evaluation of the model completely up to the step method (multiple step methods are handled via a CompoundStep method). Step methods are free to step however they like. I expect most step methods to use some kind of "model view" object which provides an interface for getting information about the model (such as the logp and gradient_logp at a certain point, or the conditional distribution for some free variable). The advantage of this approach is that no single Model object has to support multiple ways of looking at the model, and users are free to come up with their own ways of looking at it (for example, analytically evaluated Hamiltonian terms).

In the future, when I further separate the model from its evaluation, it should be possible to incorporate either Gibbs sampling or Draw From Prior as different kinds of ModelView-like objects.
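The arrangement described above might look roughly like this (a sketch; apart from CompoundStep, which the message names, the classes and methods are hypothetical, not MCEx's actual API):

```python
class LogpView:
    """Minimal 'model view': exposes only logp at a point."""
    def __init__(self, logp_fn):
        self.logp = logp_fn

class CompoundStep:
    """Handles multiple step methods by applying each in turn."""
    def __init__(self, steps):
        self.steps = steps
    def step(self, state):
        for s in self.steps:
            state = s.step(state)
        return state

class HillClimbStep:
    """Toy 'step method': accepts a fixed move only if logp improves.
    Stands in for Metropolis/HMC just to show the interface."""
    def __init__(self, view, eps=0.1):
        self.view = view
        self.eps = eps
    def step(self, state):
        candidate = state + self.eps
        if self.view.logp(candidate) > self.view.logp(state):
            return candidate
        return state

view = LogpView(lambda x: -(x - 3.0) ** 2)  # unnormalized normal logp
stepper = CompoundStep([HillClimbStep(view), HillClimbStep(view)])
state = 0.0
for _ in range(50):
    state = stepper.step(state)
# state has climbed to roughly 3.0, the mode of the logp
```

Note the step methods never see the model itself, only the view, which is what lets different views (logp, gradients, conditionals) coexist.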

Some comments about pymc-theano:

I don't think you want .stream to be part of your model data structure. It will interfere with your separation between model and chain, and therefore with parallelism.

I like the notion of compiling an MCMC loop together; I will have to look into that. I also like the idea of using random streams for randomization, reproducibility is a good thing.

Some comments about MCEx:

I am pleased with MCEx's design; it ended up very simple and modular. Each of the moving parts would be simple to replace. For example, sample() only knows about a step_method, a chain_state and a history object, and interacts with each of these in only one way. Writing this has made me realize that keeping the representation of the model separate from its evaluation is a good general principle.
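That narrow interface for sample() can be sketched as follows (illustrative code, not MCEx's actual signatures):

```python
class History:
    """Records visited states; sample() only calls record()."""
    def __init__(self):
        self.samples = []
    def record(self, state):
        self.samples.append(state)

class CountingStep:
    """Trivial stand-in step method; sample() only calls step()."""
    def step(self, state):
        return state + 1

def sample(draws, step_method, chain_state, history):
    for _ in range(draws):
        chain_state = step_method.step(chain_state)  # the one step call
        history.record(chain_state)                  # the one history call
    return history

h = sample(5, CountingStep(), 0, History())
# h.samples == [1, 2, 3, 4, 5]
```

Because sample() touches each collaborator through a single method, any one of them can be swapped out without touching the others.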

John

Thomas Wiecki

Mar 24, 2011, 12:28:28 PM
to py...@googlegroups.com
Hi Anand,

that looks great. You don't give a reason as to why it's slower. Any ideas?

-Thomas

Anand Patil

Mar 24, 2011, 12:38:15 PM
to py...@googlegroups.com
On Thu, Mar 24, 2011 at 4:28 PM, Thomas Wiecki <thomas...@googlemail.com> wrote:
Hi Anand,

that looks great. You don't give a reason as to why it's slower. Any ideas?

None at all at the moment; I'm still on the first rung of understanding Theano.

Anand

Anand Patil

Mar 24, 2011, 1:03:26 PM
to py...@googlegroups.com
Hi John,


I just had a look at mcex also. As you said, we seem to have chosen similar designs in some ways. It's refreshing to see how concisely you could implement the derivative-based step methods by piggybacking on Theano's automatic differentiation.

We differed in that you compiled the logp's and gradients and then wrote all of your fitting functionality in straight Python, whereas I tried to compile entire MCMC loops, meaning step methods had to take Theano expressions and return new Theano expressions. Your approach definitely wins at the moment because mine is so slow, for reasons as yet unexplained. Your code is also much easier to understand, and it breaks less of PyMC. Even before the MCMC starts, Theano takes a long time to build and compile the declarative MCMC loops, which are really complicated expression graphs.

Good point about the random streams: ideally those would be held by the fitting methods rather than stuck in the model, but it's not immediately clear how to do that, since random Theano nodes have to be created from streams.


The only change you've made that I think isn't a good idea is separating free variables from priors. The statement on mcex's readme page, that it makes it easier to have things like f(x) ~ d, is not quite right. 

If you have

f(x) ~ d,

then 

p(x) = d(f(x)) |f'(x)|

not d(f(x)), as you might obtain using potentials. For a stark illustration of the difference, run the code below. If you want f(x) ~ d, the way to do it is to rearrange your model so that y ~ d (d is the prior of y) and x = f^{-1}(y).
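A quick numerical way to see the point, independent of PyMC: with f = invlogit and d = Uniform(0, 1), the naive density d(f(x)) equals 1 everywhere on the real line (improper), while the corrected density d(f(x))|f'(x)| = |f'(x)| is a proper density that integrates to 1:

```python
import math

def invlogit(x):
    return 1.0 / (1.0 + math.exp(-x))

def jacobian(x):
    # |f'(x)| for f = invlogit: f'(x) = f(x) * (1 - f(x))
    p = invlogit(x)
    return p * (1.0 - p)

# Trapezoid-rule integral of the corrected density over [-30, 30]
# (the mass in the tails beyond that is ~1e-13 and can be ignored).
a, b, n = -30.0, 30.0, 6000
h = (b - a) / n
total = sum(jacobian(a + i * h) for i in range(n + 1))
total -= 0.5 * (jacobian(a) + jacobian(b))
total *= h
# total comes out ~1.0, while the naive density d(f(x)) = 1 would
# integrate to b - a = 60 over this interval alone.
```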

Another advantage to priors is that if you tag some potentials as being priors you can see when you can use the DrawFromPrior step method for parts of models that are just forward simulations.


Cheers,
Anand



---


import pymc as pm
import pylab as pl

f = pm.invlogit
finv = pm.logit

def model1():
    "p(x) = phi(f(x))"
    x = pm.Uninformative('x',value=0)
    
    @pm.deterministic
    def y(x=x):
        return f(x)
    
    @pm.potential
    def phi(y=y):
        return pm.uniform_like(y,0,1)
        
    return locals()
    
def model2():
    "f(x) ~ phi"
    y = pm.Uniform('y',0,1)

    @pm.deterministic
    def x(y=y):
        return finv(y)
        
    return locals()
    
    
M1 = pm.MCMC(model1())
M2 = pm.MCMC(model2())

M1.isample(10000,5000,5)
M2.isample(1000)

pl.clf()
pl.subplot(2,1,1)
pl.plot(M1.trace('x')[:])
pl.title('model 1')
pl.subplot(2,1,2)
pl.plot(M2.trace('x')[:])
pl.title('model 2')

John Salvatier

Mar 24, 2011, 4:29:35 PM
to py...@googlegroups.com
You're absolutely correct. I have to think about how to rework the design.

John Salvatier

May 6, 2012, 9:25:43 PM
to py...@googlegroups.com
Hello, 

I don't know if people are still interested in this, but I've recently started working on this experimental package again. Still located here: https://github.com/jsalvatier/mcex

I've reworked the package a bit, giving it a more functional design (as in functional programming). I've eliminated the design problem with separating random variables and priors (by simply requiring that they be added to the model at the same time) while still separating them conceptually.

I've added an example of a hierarchical model and a logistic regression model. I've also added a metropolis sampler. 

I invite people to play around with the package, make criticisms and suggestions, etc.

There are a couple of interesting papers on Hamiltonian Monte Carlo that I've discovered recently and that I'll be implementing in the near future.

Split Hamiltonian Monte Carlo: a technique for sampling from part of the distribution exactly, leading to faster mixing. I think this will also have the effect of making HMC scale closer to O(1) with the number of dimensions. I recall Anand was interested in this.

Quasi-Newton Methods for Markov Chain Monte Carlo: a technique for continuously adapting the covariance matrix, significantly reducing the need to tune HMC.
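For context, the building block both papers modify is the standard leapfrog integrator at the heart of HMC; a minimal sketch (textbook algorithm, not code from either paper):

```python
def leapfrog(q, p, grad_logp, eps, steps):
    """One leapfrog trajectory: half momentum step, alternating full
    position/momentum steps, closing half momentum step. Reversible and
    volume-preserving, which is what makes the HMC accept rule valid."""
    p = p + 0.5 * eps * grad_logp(q)
    for _ in range(steps - 1):
        q = q + eps * p
        p = p + eps * grad_logp(q)
    q = q + eps * p
    p = p + 0.5 * eps * grad_logp(q)
    return q, p

# Standard normal target: logp = -q**2/2, so grad_logp(q) = -q.
grad = lambda q: -q
q0, p0 = 1.0, 0.5
q1, p1 = leapfrog(q0, p0, grad, eps=0.1, steps=20)

# The Hamiltonian H = q**2/2 + p**2/2 is nearly conserved (error ~ eps**2),
# which is why HMC proposals are accepted with high probability.
H0 = 0.5 * (q0 ** 2 + p0 ** 2)
H1 = 0.5 * (q1 ** 2 + p1 ** 2)
```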

Chris Fonnesbeck

May 7, 2012, 9:53:07 AM
to py...@googlegroups.com
Hi John,

This is very cool. I'm amazed at how concise the code is. So, do you think this is the way forward? Your advantages/disadvantages list seems to suggest so. After I release PyMC 2.2 (sometime this week), I was going to focus on switching much of the Fortran code to Cython, perhaps breaking the distributions out as a stand-alone package upon which PyMC would depend and which could be used by other packages. However, farming out the heavy lifting to Theano is attractive on some levels (particularly since this is not my area of expertise) and would allow the focus to stay on implementing new MCMC algorithms and other statistical problems.

Thanks for doing this,
cf

John Salvatier

May 7, 2012, 2:00:43 PM
to py...@googlegroups.com
Hi Chris, 

I do think this is the way forward, but it's a good idea to do more experimentation to figure out if there are big problems when doing different things. For example, I'm not sure how to do Gibbs sampling in this framework.

If this is the way forward, I'm not sure what would be most productive:

1) Incorporate just the Theano likelihoods into PyMC
2) Rework the design of PyMC to use a more Functional approach and incorporate the Theano likelihoods
3) Maintain PyMC and start a new package based on the Functional/Theano approach

I think there's value in a Functional design as it allows for easier swapping out of different components, but maybe I am overly biased by my sense of aesthetics. I do think that Theano likelihoods are the clearest source of value.
Options 2 and 3 would require recreating or reworking many PyMC features like missing value imputation, database storage, and graph printing (Theano can print the computational graph, but not the model graph). 

What ends up being most productive of course depends on what other people are interested in working on.

Cheers,
John 


Chris Fonnesbeck

May 7, 2012, 2:23:57 PM
to py...@googlegroups.com
The functional approach seems to make for more elegant algorithm implementation, but somewhat less elegant model specification. Based on your examples, I think the current approach to specifying models is more "user friendly", at least for non-computer scientists.

The only other reservation I have regarding Theano is that it is somewhat of a bear to install with GPU support, at least on OS X. I spent most of the morning trying, and am still not successful. One of the barriers to adoption for PyMC has been installation, with all the Fortran code that needs to be built. Having Theano as a dependency may significantly worsen this. I suppose this will always be the case when using GPUs. More generally, it seems to be an inevitable tradeoff when trying to boost Python performance that some new package dependency is introduced.

Any idea how much faster Theano will be than, say, cython?

cf




Thomas Wiecki

May 7, 2012, 2:49:37 PM
to py...@googlegroups.com
On Mon, May 7, 2012 at 2:23 PM, Chris Fonnesbeck <fonne...@gmail.com> wrote:
> The only other reservation I have regarding Theano is that it is somewhat of a bear to install with GPU support, at least on OS X. I spent most of the morning trying, and am still not successful.

Is a GPU toolchain a necessary dependency for Theano? I was under the
impression that it's optional and that Theano will work fine
without it.


John Salvatier

May 7, 2012, 2:51:36 PM
to py...@googlegroups.com
That seems like a fair assessment. I've been trying to think about how to make the model specification simpler and more user friendly, with a bit of success (Thomas Wiecki helped me out), but it probably does need more work. The code behind the functional design is simple enough that it may be useful to explain to users what is going on behind the scenes. The AddVar and AddData functions are really simple, and Model is really just two lists.
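A sketch of the "two lists" idea (AddVar and AddData are named above, but their bodies here are guesses at the spirit of the design, not MCEx's actual implementation):

```python
class Model:
    """'Model is really just two lists.'"""
    def __init__(self):
        self.free_vars = []  # names of variables the sampler moves
        self.factors = []    # log-probability terms to be summed

def AddVar(model, name, logp_term):
    # A free variable arrives together with its prior logp term, which is
    # how the variable/prior pairing can be enforced at model-build time.
    model.free_vars.append(name)
    model.factors.append(logp_term)

def AddData(model, logp_term):
    # Observed data contributes a likelihood factor but no free variable.
    model.factors.append(logp_term)

def logp(model, point):
    return sum(f(point) for f in model.factors)

m = Model()
AddVar(m, 'mu', lambda pt: -0.5 * pt['mu'] ** 2)      # mu ~ N(0, 1), up to a constant
AddData(m, lambda pt: -0.5 * (2.0 - pt['mu']) ** 2)   # datum 2.0 ~ N(mu, 1), up to a constant
# logp(m, {'mu': 1.0}) == -1.0
```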

Getting Theano to use the GPU is tricky (I haven't played around with it much), but getting it to work without the GPU hasn't been difficult for me. You still get compilation down to C even if you're not using the GPU.

I don't have a sense for the speed difference between Cython and Theano, but that would probably be a useful thing to investigate.



John Salvatier

May 7, 2012, 2:52:23 PM
to py...@googlegroups.com
Using the GPU is optional. 