I just wanted to send a quick note on a topic that may be of some
interest to PyMC users.
There has been quite a bit of interest lately in using GPUs to speed
up MCMC in various ways-- the most obvious target is the computation
of log-likelihoods in MH steps, particularly when the number of
observed data points is very large. See, for example, this reference:
http://pubs.amstat.org/doi/abs/10.1198/jcgs.2010.10016
With large data sets, it should be no surprise that doing this can
dramatically reduce the runtime. We recently started working on a
general C library of CUDA functions (still at a very early prototype
stage, with an unstable API, etc.) which can easily be called from
Python (I am using Cython to do it):
https://github.com/dukestats/gpustats
The goal at the moment is to implement a variety of density functions
on the GPU. An immediate use (outside of using them in hand-coded
MCMC) would be to wire these into PyMC somehow (haven't thought too
much about this) to allow folks with GPUs to put them to work in their
PyMC code. We don't have a set timeline for this, but it will likely
be an ongoing project.
As a toy example I took John McFarland's code from a few days ago,
ratcheted up the size of the observed data, and monkey-patched
pymc.MvNormal to use the GPU multivariate normal pdf function
implemented in the above repo:
import numpy as np
import numpy.random as rand
from numpy.linalg import inv
import pymc as pm
import pdfs  # GPU density functions from the gpustats repo

k = 5
rand.seed(1)
true_std = rand.randn(k)
true_mean = rand.randn(k)
true_cov = np.eye(k) * np.outer(true_std, true_std)

n = 10000
np.random.seed(1)
vals = np.random.multivariate_normal(true_mean, true_cov, n)

# Trying to give tau a vague prior and a reasonable starting value...
tau = pm.Wishart('tau', n=k + 1, Tau=np.eye(k),
                 value=inv(np.eye(k) * 10.0))
m = pm.Normal('m', mu=0, tau=1e-5, value=np.zeros(k), size=k)

data = pm.MvNormal('data', m, tau, value=vals, observed=True)

def logp(value=None, mu=None, tau=None):
    # Call the GPU function -- which needs a covariance matrix
    return pdfs.mvnpdf(value, [mu], [inv(tau)]).sum()

# Monkey-patch the observed node's log-probability function
data._logp.fun = logp

S = pm.MCMC(locals())
S.sample(iter=1000, burn=0, thin=1)
The difference in runtime speaks for itself:
Without the GPU function:
15:09 ~/code/gpustats $ time python pymc_test.py
2011-02-20 15:21:25-0500 [-] Log opened.
Sampling: 100% [0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
Iterations: 1000
real 0m31.912s
user 0m32.120s
sys 0m0.120s
With:
15:21 ~/code/gpustats $ time python pymc_test.py
2011-02-20 15:22:27-0500 [-] Log opened.
Sampling: 100% [0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000]
Iterations: 1000
real 0m3.798s
user 0m3.260s
sys 0m1.000s
So roughly a 10x speedup. As the data size shrinks, the overhead of
the CUDA calls eventually dominates and you start to break even. With
1,000 observations instead of 10,000, the speedup drops to only about
1.5x on my computer.
The fairest comparison is to look directly at the speed of the logp
functions themselves (here logp is as defined above):
In [19]: timeit pm.mv_normal_chol_like(vals, true_mean,
np.linalg.cholesky(true_cov))
10 loops, best of 3: 36.4 ms per loop
In [20]: timeit logp(vals, true_mean, true_cov)
1000 loops, best of 3: 1.43 ms per loop
In [21]: 36.4 / 1.43
Out[21]: 25.454545454545453
Or with 1e6 data points:
In [23]: timeit pm.mv_normal_chol_like(vals, true_mean,
np.linalg.cholesky(true_cov))
1 loops, best of 3: 366 ms per loop
In [24]: timeit logp(vals, true_mean, true_cov)
100 loops, best of 3: 3.47 ms per loop
In [25]: 366 / 3.47
Out[25]: 105.47550432276657
These speeds of course depend on what kind of GPU you have.
Anyway-- if anyone is interested in getting involved with this effort,
please feel free to contact me on or off the PyMC list.
Cheers,
Wes
I think that approach is highly promising -- thanks for sharing!
I tried something similar for my hand-coded likelihood function. I
used PyCUDA (http://mathema.tician.de/software/pycuda), which made
integrating CUDA code into Python extremely easy -- have you looked
into that?
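For anyone who hasn't seen it, a toy PyCUDA kernel looks roughly like
this (a minimal sketch that just squares an array in place -- nothing
to do with my actual likelihood code):

import numpy as np
import pycuda.autoinit              # sets up a CUDA context
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# CUDA C source, compiled at runtime by PyCUDA
mod = SourceModule("""
__global__ void square(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] = x[i] * x[i];
}
""")
square = mod.get_function("square")

x = np.random.randn(1 << 20).astype(np.float32)
# InOut copies x to the device, runs the kernel, and copies it back
square(drv.InOut(x), np.int32(x.size),
       block=(256, 1, 1), grid=(x.size // 256, 1))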
In my case I hit a memory transfer bottleneck: each call would copy
the data from RAM over to device memory at significant cost. But I
think it depends on how you code the likelihood (e.g. caching the data
on the device would be an option).
-Thomas
PyCUDA is definitely on my radar -- there is some concern about
creating a Python-only library when many Bayesians are still using
Matlab, R, and their own C code. One really nice thing about PyCUDA is
that it lets you manage the lifetime of the data in GPU global memory
from Python-land, which may matter down the road -- right now the code
is "stateless": each function call is a full round trip of input data
and results, which, as you mentioned, can be a problem.
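Just to illustrate the idea, here's a rough sketch of keeping the
observed data resident on the device between likelihood calls, using
PyCUDA's gpuarray and a toy univariate normal logp (not the gpustats
API):

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray

vals = np.random.randn(100000).astype(np.float32)

# One host-to-device transfer up front; d_vals stays in GPU global
# memory for as long as the Python object is alive.
d_vals = gpuarray.to_gpu(vals)

def logp_normal(mu, sigma):
    # Elementwise arithmetic on a GPUArray runs on the device; only
    # the final reduced scalar comes back to the host.
    z = (d_vals - mu) / sigma
    quad = float(gpuarray.sum(-0.5 * z * z).get())
    return quad - vals.size * np.log(sigma * np.sqrt(2.0 * np.pi))

# Each MH step can now call logp_normal(mu, sigma) without re-copying
# vals from the host.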
- Wes
I think using PyOpenCL is an inevitability, especially given that
Apple is putting AMD GPUs in the next line of MacBook Pros. I've heard
that developing OpenCL code is a bit more difficult. CUDA can be
enough of a challenge -- I had to tinker quite a bit to get all of the
global memory transactions to coalesce and to avoid shared memory bank
conflicts (for those who know what that CUDA-speak means =)).
I got PyCUDA 0.94 to install out of the box on my MBP -- but you have
to a) use 32-bit Python and b) have the CUDA driver installed. Part a)
doesn't make me very happy (my target at the moment is desktops
running Linux, where this is less of an issue).
- Wes
One potential problem with OpenCL appears to be the lack of a CURAND
equivalent. There is an MD5-based PRNG in PyOpenCL, but I'm not sure
how it compares in speed or in the quality of the random numbers it
produces. For the purposes of PyMC the main benefit will be computing
densities, though, and that can all be done in OpenCL.
- Wes
Sorry to drag up an old topic, but I've recently begun playing around with Theano and was wondering if anyone has made any progress on using it with PyMC lately. It looks like Theano's RNG capabilities have increased quite a bit over the past few months.
The other nice thing is that Theano can do automatic differentiation
(e.g. of one's own likelihood), so that might enable automatic use of
gradient-based samplers such as Hamiltonian Monte Carlo.
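As a rough sketch of what that buys you (a toy Gaussian log-likelihood
in Theano, nothing PyMC-specific):

import numpy as np
import theano
import theano.tensor as T

x = T.vector('x')        # observed data
mu = T.dscalar('mu')     # parameter we want the gradient for
sigma = T.dscalar('sigma')

# Hand-written Gaussian log-likelihood, expressed symbolically
logp = T.sum(-0.5 * ((x - mu) / sigma) ** 2 - T.log(sigma)
             - 0.5 * np.log(2 * np.pi))

# Theano derives the gradient automatically -- the ingredient a
# Hamiltonian / gradient-based sampler needs.
dlogp_dmu = T.grad(logp, mu)

logp_and_grad = theano.function([x, mu, sigma], [logp, dlogp_dmu])

value, grad = logp_and_grad(np.random.randn(1000), 0.0, 1.0)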