This is a really nice example, and very clearly written up. I put it in a git repository to make it easy to keep track of changes and suggestions, in case others also speak up.
https://github.com/aflaxman/pymc-multilevel-example
It seems like you can make it 2x faster by messing around with step methods and initial values. I didn't have enough procrastination time to push the issue of how many steps of MCMC are really needed, though. Some additional notes are in the git repo.
--Abie
Hi All,
Best,
Scott
###Code###
###End Code###
--
You received this message because you are subscribed to the Google Groups "PyMC" group.
To post to this group, send email to py...@googlegroups.com.
To unsubscribe from this group, send email to pymc+uns...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/pymc?hl=en.
Thank you for working on the code. The README file in the git repo is instructive. I'll contribute any changes I make to the repo going forward.
A couple of follow-up questions (to you and the list)
1 - You mention in the README that the data wrangling section of the model opaque. I too find this to be the case. Having come from doing data analysis only in stats packages such as Stata, where datasets are usually just thought of as a rectangular object, I have a hard time figuring how to manage the data in python. In fact, that is why I ultimately used the matrix form of the model in the @deterministic section because then I knew what data were getting used in the analysis. So my question is: Is there anything in particular about my model specification that appeared problematic? My python skills are growing quickly but I still feel a little "wobbly."
2 - This dovetails with 1. Any thoughts on how to specify the stochastic section so that it uses a different tau depending on whether y_i is in the clustered condition or the unclustered condition? This is where the python way of doing things with data is not clear to me and I suspect it is simply due to the fact that I am used to traditional stat packages. In Stata or SAS, I would do an if-then type statement. My attempts for if-then statements in python haven't worked out.
Thanks again for the insights!
Best,
Scott
I had some time to try the changes in data wrangling and model specification that are my response to your questions. I tried to track the changes well with git commits, and update that README file, so it's all here:
https://github.com/aflaxman/pymc-multilevel-example
https://github.com/aflaxman/pymc-multilevel-example/commits/master/
The big change is using csv2rec to load the csv, which I think leads to much clearer data manipulation code:
data = pl.csv2rec('data.csv')
## Design matrix for intercept and intervention effect
X = [[1., t_ij] for t_ij in data.treat]
## Design matrix for cluster effect
Z = pl.zeros([len(data), 10])
for row, j in enumerate(data.j):
if j <= 10:
Z[row, j-1] = 1.
Then the change to do different taus for the treatment and control is just changes to two lines:
var_e1 = mc.Uniform('var_e1', lower=0, upper=100, value=[1., 1.])
tau_e1 = mc.Lambda('tau_e1', lambda v=var_e1: v**-1, trace=False)
@mc.stochastic(observed=True)
def y_i(value=data.y, mu=y_hat, tau=tau_e1):
return mc.normal_like(value,mu,tau[data.treat])
--Abie
________________________________________
From: py...@googlegroups.com [py...@googlegroups.com] on behalf of Scott Baldwin [scott_...@byu.edu]
Sent: Friday, May 13, 2011 8:32 AM
To: py...@googlegroups.com
Subject: Re: [pymc] Help Specifying a Multilevel Model for Partially Clustered Data
Thank you very much. Your changes are very instructive. The pylab csv change leads to an object that is much more like a dataframe. Likewise, I didn't realize that I'd only need to index the tau in the stochastic call.
You mentioned Stata's xtmixed in your README file. If you've got Stata 11 or greater, you can fit this model with REML or ML with the following code:
xtmixed y tx || j:treat, nocons residuals(independent, by(tx))
In the end, I am with you -- I like the flexibility of the Bayesian approach in PyMC. Further, the confidence interval coverage will be below the nominal rate in Stata (they use z-tests) but the Bayesian approach will give us appropriate coverage.
Thanks a million. I'll keep watching the github repository and will let you know if I make any other changes.
Best,
Scott