Gaussian mixture model using PyMC


Wes McKinney

Feb 10, 2011, 5:59:40 PM
to py...@googlegroups.com
I was playing with doing inference on a simple mixture of multivariate
normal distributions using PyMC and wanted to see if anyone else had
done this before, or if someone who is a bit more of a PyMC expert
could comment on whether I've set things up properly (because the
results are not what I would expect).

See link for a sample of what the simulated data looks like:

http://db.tt/hcsjNIc

Priors are set to the ensemble mean / covariance for simplicity-- I'm
just trying to recover the generative process for starters. I also
know this isn't the best way to do this, computationally speaking; I'm
mainly doing it to get comfortable setting up PyMC models.

Thanks in advance,
Wes

import pymc as pm
import pymc.distributions as dist
import numpy as np
from numpy.linalg import inv, cholesky as chol
import numpy.linalg as L
import numpy.random as rand

import matplotlib.pyplot as plt

#-------------------------------------------------------------------------------
# Generate MV normal mixture

gen_mean = {
    0: [0, 5],
    1: [-10, 0],
    2: [-10, 10]
}

gen_sd = {
    0: [0.5, 0.5],
    1: [0.5, 1],
    2: [1, 0.25]
}

gen_corr = {
    0: 0.5,
    1: -0.5,
    2: 0
}

def generate_data(n=1e5, k=2, ncomps=3, seed=1):
    rand.seed(seed)
    data_concat = []
    labels_concat = []

    for j in range(ncomps):
        mean = gen_mean[j]
        sd = gen_sd[j]
        corr = gen_corr[j]

        # build the covariance matrix from the marginal sds and correlation
        cov = np.empty((k, k))
        cov.fill(corr)
        cov[np.diag_indices(k)] = 1
        cov *= np.outer(sd, sd)

        rvs = pm.rmv_normal_cov(mean, cov, size=int(n))

        data_concat.append(rvs)
        labels_concat.append(np.repeat(j, int(n)))

    return (np.concatenate(labels_concat),
            np.concatenate(data_concat, axis=0))

N = int(2e2) # n data points per component
K = 2 # ndim
ncomps = 3 # n mixture components

true_labels, data = generate_data(n=N, k=K, ncomps=ncomps)

def plot_2d_mixture(data, labels):
    plt.figure(figsize=(10, 10))
    colors = 'bgr'
    for j in np.unique(labels):
        x, y = data[labels == j].T
        plt.plot(x, y, '%s.' % colors[j], ms=1)


def plot_thetas(sampler):
    plot_2d_mixture(data, true_labels)

    def plot_theta(i):
        x, y = sampler.trace('theta_%d' % i)[:].T
        plt.plot(x, y, 'k.')

    for i in range(3):
        plot_theta(i)

#-------------------------------------------------------------------------------
# set up PyMC model

# priors, fairly vague
prior_mean = data.mean(0)
sigma0 = np.diag([1., 1.])
c0 = np.cov(data.T)

# shared hyperparameter?
# theta_tau = pm.Wishart('theta_tau', n=4, Tau=L.inv(sigma0))

thetas = []
covs = []
for j in range(ncomps):
    # need a hyperparameter for degrees of freedom?
    cov = pm.InverseWishart('C_%d' % j, n=4, C=c0)
    theta = pm.MvNormalCov('theta_%d' % j, mu=prior_mean, C=cov)

    thetas.append(theta)
    covs.append(cov)

alpha0 = np.ones(3) / 3.
weights = pm.Dirichlet('weights', theta=alpha0)
labels = pm.Categorical('labels', p=weights, size=len(data))

@pm.stochastic(observed=True)
def mixture(value=data, thetas=thetas, covs=covs, labels=labels,
            weights=weights):
    # complete-data log-likelihood: each point contributes under the
    # component it is currently assigned to by `labels`
    loglike = 0.
    for j, (theta, cov) in enumerate(zip(thetas, covs)):
        this_data = value[labels == j]
        ch = np.linalg.cholesky(cov)
        loglike += pm.mv_normal_chol_like(this_data, theta, ch)

    return loglike

sampler = pm.MCMC({'mixture': mixture,
                   'thetas': thetas,
                   'covs': covs,
                   'weights': weights,
                   'labels': labels})

sampler.sample(iter=10000, burn=1000, thin=10)
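
(For completeness -- after sampling, the component mean traces can be
overlaid on the data with the plotting helper defined above, and PyMC's
built-in Matplot module gives standard diagnostics. A sketch:)

plot_thetas(sampler)       # scatter the sampled theta_j values over the data
plt.show()

pm.Matplot.plot(sampler)   # trace / autocorrelation / histogram plots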

John McFarland

Feb 19, 2011, 11:40:28 AM
to PyMC
On Feb 10, 4:59 pm, Wes McKinney <wesmck...@gmail.com> wrote:
> I was playing with doing inference on a simple mixture of multivariate
> normal distributions using PyMC and wanted to see if anyone else had
> done this before, or if someone who is a bit more of a PyMC expert
> could comment on whether I've set things up properly (because the
> results are not what I would expect).
>
> See link for a sample of what the simulated data looks like:
>
> http://db.tt/hcsjNIc
>
> Priors are set to the ensemble mean / covariance for simplicity-- I'm
> just trying to recover the generative process for starters. I also
> know this isn't the best way to do this, computationally speaking; I'm
> mainly doing it to get comfortable setting up PyMC models.
>
> Thanks in advance,
> Wes

Wes,

Thanks for sharing this. I'm not an expert in PyMC or mixture models,
but I think this is an interesting problem. There appears to be a lot
of literature out there on Bayesian estimation of mixture models, so
it might be worth some review (if you search "Bayesian mixture" on
Google, the first page includes a paper "Robust Bayesian Mixture
Modeling" by Bishop and Svensen that might be worth looking at; a
"variational approach" seems to be preferred).

My guess is that estimation of both the means and labels is too
difficult for the MCMC sampler. The chance of the sampler randomly
configuring the labels just right while the component means
simultaneously move to the right locations seems too small. Even
if I start the means at the correct values, they each still go
immediately to the ensemble mean and get stuck there.

I don't know exactly what your objective is with the example, but the
model setup looks basically OK to me, although I might formulate the
priors a little differently. For example, using the same covariance
for both the data and the means seems odd to me (but correct me if
that is a common formulation); I would probably just put a vague prior
on the means:
theta = pm.Normal('theta_%d' % j, mu=0.0, tau=1e-5, size=2,
                  value=np.zeros(2))

Also I would be careful defining your InverseWishart prior with C=c0
where c0 is the sample covariance of the data. The InverseWishart
model appears to be counter-intuitive, in that increasing the
magnitude of C actually decreases the variance you see in the
InverseWishart distribution. So in your case, you would get a tighter
prior on the covariance when your data show more scatter; probably the
opposite of what you want. Based on some of the other discussion on
the list, it seems a more standard formulation might be to define your
model in terms of a precision matrix, and give it a vague Wishart
prior:
tau = pm.Wishart('T_%d' % j, n=4, Tau=np.eye(2), value=np.eye(2))
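
Concretely, an untested sketch of how those two suggestions might slot
into your per-component loop (still PyMC 2.x; the likelihood would then
use the precision-parameterized mv_normal_like instead of
mv_normal_chol_like):

thetas = []
taus = []
for j in range(ncomps):
    # vague, data-independent prior on the component mean
    theta = pm.Normal('theta_%d' % j, mu=0.0, tau=1e-5, size=2,
                      value=np.zeros(2))
    # vague Wishart prior on the component precision matrix
    tau = pm.Wishart('T_%d' % j, n=4, Tau=np.eye(2), value=np.eye(2))
    thetas.append(theta)
    taus.append(tau)

# and in the likelihood, per component:
#     loglike += pm.mv_normal_like(this_data, theta, tau)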

As far as identification of the posterior distributions, the sampler
does appear to converge to a reasonable posterior if you start both
the means and the labels at "correct" values. From a practical
perspective, it would be easy to first run your data through an EM
algorithm (which can easily get a good MLE) and then use that as a
starting point in PyMC. However, I'm not sure how well a basic MCMC
approach would do (given good starting values) on a problem in which
the component densities were closer to each other, introducing some
ambiguity about the classification of the data. As you said, maybe
someone has more experience with Bayesian inference on mixture models?
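
To make the EM idea concrete, here is a sketch (assuming scikit-learn's
GaussianMixture as the EM implementation -- any EM routine that returns
component means, covariances, and hard labels would do) of seeding the
PyMC starting values:

from sklearn.mixture import GaussianMixture

gm = GaussianMixture(n_components=ncomps).fit(data)

# use the EM estimates as MCMC starting values; the EM component
# ordering is arbitrary, which is fine since the component priors are
# exchangeable
for j in range(ncomps):
    thetas[j].value = gm.means_[j]
    covs[j].value = gm.covariances_[j]
labels.value = gm.predict(data)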

Regards,
John

Wes McKinney

Feb 19, 2011, 12:22:52 PM
to py...@googlegroups.com

Thanks for the note-- I discussed this model with a colleague and we
concluded that the mixing is just far too slow. A preferred approach
is to estimate the mixture weights only, then label the data points
afterward using the posterior distributions for each mixture component
(selecting the component with the highest posterior probability).
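
An untested sketch of that marginalized likelihood (the labels are
integrated out, so the sampler only has to move the weights, means, and
covariances; logsumexp is from SciPy, and the per-point Python loop is
slow but keeps the example simple):

from scipy.special import logsumexp

@pm.stochastic(observed=True)
def mixture(value=data, thetas=thetas, covs=covs, weights=weights):
    # log p(x) = logsumexp_j [ log w_j + log N(x | theta_j, C_j) ]
    # PyMC's Dirichlet stores only the first k-1 weights; append the last
    w = np.append(weights, 1 - np.sum(weights))
    logps = []
    for j, (theta, cov) in enumerate(zip(thetas, covs)):
        ch = np.linalg.cholesky(cov)
        lp = np.array([pm.mv_normal_chol_like(x, theta, ch) for x in value])
        logps.append(np.log(w[j]) + lp)
    return logsumexp(np.vstack(logps), axis=0).sum()

# afterward, each point can be assigned to the component with the
# highest posterior probability, computed from posterior summaries of
# the weights, means, and covariances taken from the traces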

> I don't know exactly what your objective is with the example, but the
> model setup looks basically OK to me, although I might formulate the
> priors a little differently.  For example, using the same covariance
> for both the data and the means seems odd to me (but correct me if
> that is a common formulation); I would probably just put a vague prior
> on the means:
> theta = pm.Normal('theta_%d'%j, mu=0.0, tau=1e-5, size=2,
> value=np.zeros(2))

I was only looking to get my feet wet with PyMC on a slightly
non-trivial example-- so didn't worry too much about the prior
specification.

> Also I would be careful defining your InverseWishart prior with C=c0
> where c0 is the sample covariance of the data.  The InverseWishart
> model appears to be counter-intuitive, in that increasing the
> magnitude of C actually decreases the variance you see in the
> InverseWishart distribution.  So in your case, you would get a tighter
> prior on the covariance when your data show more scatter; probably the
> opposite of what you want.  Based on some of the other discussion on
> the list, it seems a more standard formulation might be to define your
> model in terms of a precision matrix, and give it a vague Wishart
> prior:
> tau = pm.Wishart('T_%d' %j, n=4, Tau=np.eye(2), value=np.eye(2))
>
> As far as identification of the posterior distributions, the sampler
> does appear to converge to a reasonable posterior if you start both
> the means and the labels at "correct" values.  From a practical
> perspective, it would be easy to first run your data through an EM
> algorithm (which can easily get a good MLE) and then use that as a
> starting point in PyMC.  However, I'm not sure how well a basic MCMC
> approach would do (given good starting values) on a problem in which
> the component densities were closer to each other, introducing some
> ambiguity about the classification of the data.  As you said, maybe
> someone has more experience with Bayesian inference on mixture models?
>
> Regards,
> John

I'll tinker around with it again when I have some time. David
Cournapeau has an EM library, so that might be interesting to play
with as well.
