Multivariate Regression in PyMC

blan

unread,

Feb 12, 2011, 1:08:14 PM2/12/11

to PyMC

Hi all,

I am new to PyMC and am having some trouble fitting a multivariate
regression model. I was hoping somebody wouldn't mind taking a look
at my basic code and pointing out any mistakes I may be making. I am
trying to fit a basic linear model:

Y = Xb + e,

where Y is (n x 1) and X is (n x k). Here k = 4. I have tried many
different configurations of the following basic approach:

import numpy as np
from pymc import InverseGamma, deterministic, Normal, MvNormalCov
from config import s2, Y, X

Ik = np.identity(4)

b = MvNormalCov('b', mu = np.zeros(4), C =
1e8*Ik) # Prior on coefficient vector
sigma2 = InverseGamma('sigma2', alpha = 3, beta = (1.0 / (2 * s2))) #
Prior on sigma^2

@deterministic
def mu(X = X, b = b):
return np.dot(X,b)

@deterministic
def tau(sigma2 = sigma2):
return 1.0 / sigma2

Y = Normal('Y', mu, tau, value = Y, observed = True)

I have fit the model with frequentest methods, so know about where the
posterior means should be. However, when I simulate the posterior in
PyMC I am getting means that are way off. For instance, the OLS
estimate of b[1] is -3.35, while the posterior mean for this parameter
is 2469. Clearly I have a mistake! Any help would be greatly
appreciated. Thanks!

Chris Fonnesbeck

unread,

Feb 12, 2011, 1:29:57 PM2/12/11

to PyMC

Can you post your data file, so that I can mess around with the model?
One problem I see is that you named the data stochastic Y, which will
overwrite the data vector itself. May not matter, but you might want
to give it a unique name. Also, you probably want to avoid in inverse
gamma prior on the variance if you want it to be uninformative; they
turn out to be pretty informative (see Gelman 2006). If you dont mind
the information, then no worries.

One other idea -- you may want to provide reasonable initial values
for b, since the variance is so huge. It can easily get stuck in low-
probability regions in cases like this.

Chris

blan

unread,

Feb 12, 2011, 1:38:13 PM2/12/11

to PyMC

Hi Chris,

That would be great. The data is just from the text I am using to
teach myself this stuff, Carlin and Louis (2009). The link to the
data is

http://www.biostat.umn.edu/~brad/data/land_data.txt

As it is not for my actual work, I am not too worried about the
priors. Thank you for the tip though. I will try what you
recommended and see if it fixes things. Thanks for your help!

Chris Fonnesbeck

unread,

Feb 12, 2011, 1:58:08 PM2/12/11

to PyMC

Yeah, giving some sane starting values seems to have worked.

{'95% HPD interval': array([[ 3.61151514, 5.14205003],
[-4.13174923, -2.64597387],
[ 2.50344411, 3.85207883],
[ 0.85618149, 2.28345262]]),
'mc error': array([ 0.03873973, 0.0391429 , 0.03033799,
0.03350415]),
'mean': array([ 4.38028987, -3.36625853, 3.28085591, 1.61798147]),
'n': 5000,
'quantiles': {2.5: array([ 3.6564718 , -4.13003482, 2.52034961,
0.85618149]),
25: array([ 4.11925573, -3.63458265, 3.05276479, 1.389134 ]),
50: array([ 4.32780316, -3.33692739, 3.30093981, 1.62899525]),
75: array([ 4.62855652, -3.04346921, 3.50765626, 1.87653607]),
97.5: array([ 5.29443095, -2.64191083, 3.90125011, 2.28345262])},
'standard deviation': array([ 0.41222244, 0.41570342, 0.34188305,
0.36581739])}

I made a few other cosmetic changes as well. Here is the updated
model:

http://dl.dropbox.com/u/233041/mvreg.py

Cheers,
Chris

blan

unread,

Feb 12, 2011, 2:13:05 PM2/12/11

to PyMC

Great, that works for me as well. Thank you for spending time working
on that for me. I appreciate it.

Best,
Randall

Reply all

Reply to author

Forward