See the link for a sample of what the simulated data looks like:
Priors are set to the ensemble mean / covariance for simplicity -- I'm
just trying to recover the generative process for starters. I also
know this isn't the best way to do this, computationally speaking;
this is mainly an exercise in getting comfortable setting up PyMC models.
Thanks in advance,
Wes
import pymc as pm
import numpy as np
import numpy.linalg as L
import numpy.random as rand
import matplotlib.pyplot as plt
#-------------------------------------------------------------------------------
# Generate MV normal mixture
gen_mean = {
    0: [0, 5],
    1: [-10, 0],
    2: [-10, 10]
}
gen_sd = {
    0: [0.5, 0.5],
    1: [0.5, 1],
    2: [1, 0.25]
}
gen_corr = {
    0: 0.5,
    1: -0.5,
    2: 0
}
def generate_data(n=int(1e5), k=2, ncomps=3, seed=1):
    rand.seed(seed)
    data_concat = []
    labels_concat = []
    for j in range(ncomps):
        mean = gen_mean[j]
        sd = gen_sd[j]
        corr = gen_corr[j]
        # build the covariance matrix from the sds and the common correlation
        cov = np.empty((k, k))
        cov.fill(corr)
        cov[np.diag_indices(k)] = 1
        cov *= np.outer(sd, sd)
        rvs = pm.rmv_normal_cov(mean, cov, size=n)
        data_concat.append(rvs)
        labels_concat.append(np.repeat(j, n))
    return (np.concatenate(labels_concat),
            np.concatenate(data_concat, axis=0))
N = int(2e2) # n data points per component
K = 2 # ndim
ncomps = 3 # n mixture components
true_labels, data = generate_data(n=N, k=K, ncomps=ncomps)
def plot_2d_mixture(data, labels):
    plt.figure(figsize=(10, 10))
    colors = 'bgr'
    for j in np.unique(labels):
        x, y = data[labels == j].T
        plt.plot(x, y, '%s.' % colors[j], ms=1)

def plot_thetas(sampler):
    plot_2d_mixture(data, true_labels)

    def plot_theta(i):
        x, y = sampler.trace('theta_%d' % i)[:].T
        plt.plot(x, y, 'k.')

    for i in range(ncomps):
        plot_theta(i)
#-------------------------------------------------------------------------------
# set up PyMC model

# priors, fairly vague
prior_mean = data.mean(0)
sigma0 = np.diag([1., 1.])
c0 = np.cov(data.T)

# shared hyperparameter?
# theta_tau = pm.Wishart('theta_tau', n=4, Tau=L.inv(sigma0))

thetas = []
covs = []
for j in range(ncomps):
    # need a hyperparameter for degrees of freedom?
    cov = pm.InverseWishart('C_%d' % j, n=4, C=c0)
    theta = pm.MvNormalCov('theta_%d' % j, mu=prior_mean, C=cov)
    thetas.append(theta)
    covs.append(cov)

# symmetric Dirichlet prior on the mixture weights
alpha0 = np.ones(ncomps) / ncomps
weights = pm.Dirichlet('weights', theta=alpha0)
labels = pm.Categorical('labels', p=weights, size=len(data))
@pm.stochastic(observed=True)
def mixture(value=data, thetas=thetas, covs=covs, labels=labels,
            weights=weights):
    # conditional on the labels, sum the MV normal log-likelihoods of
    # the points assigned to each component
    loglike = 0.
    for j, (theta, cov) in enumerate(zip(thetas, covs)):
        this_data = value[labels == j]
        ch = np.linalg.cholesky(cov)
        loglike += pm.mv_normal_chol_like(this_data, theta, ch)
    return loglike
sampler = pm.MCMC({'mixture': mixture,
                   'thetas': thetas,
                   'covs': covs,
                   'weights': weights,
                   'labels': labels})
sampler.sample(iter=10000, burn=1000, thin=10)
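The sampled component means can then be overlaid on the simulated data with the plotting helper defined above, something like:

# plot the data colored by true label, with the sampled means in black
plot_thetas(sampler)
plt.show()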
Thanks for the note -- I discussed this model with a colleague and we
concluded that the mixing is just far too slow. A preferred approach
is to estimate the mixture weights only, and then label the data
points afterwards using the posterior distributions for each mixture
component (for each point, select the component with the highest
posterior probability).
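For concreteness, here is a rough sketch of what I mean by that labeling step, assuming posterior point estimates of the weights, means, and covariances are already in hand (post_weights, post_means, post_covs are just placeholder names for whatever comes out of the sampler, and this is untested):

import numpy as np

def posterior_labels(data, post_weights, post_means, post_covs):
    # compute log w_j + log N(x | mu_j, Sigma_j) for every point and
    # component, then take the argmax over components
    n, k = data.shape
    log_post = np.empty((n, len(post_weights)))
    for j, (w, mu, cov) in enumerate(zip(post_weights, post_means, post_covs)):
        diff = data - mu
        solved = np.linalg.solve(cov, diff.T).T      # Sigma_j^{-1} (x - mu_j)
        maha = (diff * solved).sum(axis=1)           # squared Mahalanobis distance
        _, logdet = np.linalg.slogdet(cov)
        log_post[:, j] = (np.log(w)
                          - 0.5 * (k * np.log(2 * np.pi) + logdet + maha))
    return log_post.argmax(axis=1)

so something like hard_labels = posterior_labels(data, post_weights, post_means, post_covs) would give the hard assignments.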
> I don't know exactly what your objective is with the example, but the
> model setup looks basically OK to me, although I might formulate the
> priors a little differently. For example, using the same covariance
> for both the data and the means seems odd to me (but correct me if
> that is a common formulation); I would probably just put a vague prior
> on the means:
> theta = pm.Normal('theta_%d'%j, mu=0.0, tau=1e-5, size=2,
> value=np.zeros(2))
I was only looking to get my feet wet with PyMC on a slightly
non-trivial example, so I didn't worry too much about the prior
specification.
> Also I would be careful defining your InverseWishart prior with C=c0
> where c0 is the sample covariance of the data. The InverseWishart
> model appears to be counter-intuitive, in that increasing the
> magnitude of C actually decreases the variance you see in the
> InverseWishart distribution. So in your case, you would get a tighter
> prior on the covariance when your data show more scatter; probably the
> opposite of what you want. Based on some of the other discussion on
> the list, it seems a more standard formulation might be to define your
> model in terms of a precision matrix, and give it a vague Wishart
> prior:
> tau = pm.Wishart('T_%d' %j, n=4, Tau=np.eye(2), value=np.eye(2))
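That makes sense. If I switch to the precision parameterization, I think the per-component block in my model would end up looking roughly like this (untested sketch; I'm assuming Normal with size=2 and Wishart work as in your snippets):

taus = []
thetas = []
for j in range(ncomps):
    # vague Wishart prior on the precision matrix, as suggested
    tau = pm.Wishart('T_%d' % j, n=4, Tau=np.eye(2), value=np.eye(2))
    # vague independent Normal prior on the component mean
    theta = pm.Normal('theta_%d' % j, mu=0.0, tau=1e-5, size=2,
                      value=np.zeros(2))
    taus.append(tau)
    thetas.append(theta)

with pm.mv_normal_like(this_data, theta, tau) replacing the pm.mv_normal_chol_like call in the mixture likelihood (again assuming that likelihood function takes a precision matrix).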
>
> As far as identification of the posterior distributions, the sampler
> does appear to converge to a reasonable posterior if you start both
> the means and the labels at "correct" values. From a practical
> perspective, it would be easy to first run your data through an EM
> algorithm (which can easily get a good MLE) and then use that as a
> starting point in PyMC. However, I'm not sure how well a basic MCMC
> approach would do (given good starting values) on a problem in which
> the component densities were closer to each other, introducing some
> ambiguity about the classification of the data. As you said, maybe
> someone has more experience with Bayesian inference on mixture models?
>
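Agreed about the starting values. This is roughly what I had in mind for starting the chain at the truth before building the MCMC object (sketch only; it assumes array-valued stochastics let you assign to their value attribute directly, which I believe PyMC does):

# start each component mean at its generating value and the labels at truth
for j, theta in enumerate(thetas):
    theta.value = np.asarray(gen_mean[j], dtype=float)
labels.value = true_labels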
> Regards,
> John
>
I'll tinker around with it again when I have some time. David
Cournapeau has an EM library, so it might be interesting to play with
that as well.
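If I go the EM route, something along these lines is what I'd try for pulling out starting values -- sketched here with scikit-learn's GaussianMixture standing in for whichever EM implementation I end up using, so the class and attribute names are assumptions about that library rather than anything PyMC-specific:

from sklearn.mixture import GaussianMixture  # stand-in EM implementation

em = GaussianMixture(n_components=ncomps, covariance_type='full',
                     random_state=1).fit(data)
start_means = em.means_          # (ncomps, K) component means
start_covs = em.covariances_     # (ncomps, K, K) component covariances
start_labels = em.predict(data)  # hard assignments to seed the label vector

Those could then be assigned to the stochastics' value attributes, as in the sketch above, before calling sampler.sample.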