Can't you just use numpy.cov? With the most recent pandas master:
import numpy as np
import pandas
import scikits.statsmodels.api as sm
data = sm.datasets.macrodata.load()
df = pandas.DataFrame(data.data)
# rowvar=0: each column is a variable, each row an observation
cv = np.cov(df, rowvar=0)
# put the column labels back on the covariance matrix
cv_df = pandas.DataFrame(cv, index=df.columns, columns=df.columns)
Skipper
That will work. The only addition is that you'll want to call dropna
first to exclude rows with missing observations:
df.dropna(axis=0)
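Putting the two steps together, here is a minimal sketch of dropna followed by numpy.cov. The toy data and the names `complete`, `cv` and `cv_df` are made up for illustration, and it assumes a reasonably recent pandas (one that has `to_numpy`):

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration; the third row has a missing value.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, np.nan, 8.0],
})

# Drop every row that contains at least one NaN before estimating.
complete = df.dropna(axis=0)

# rowvar=False: columns are variables, rows are observations.
cv = np.cov(complete.to_numpy(), rowvar=False)

# Re-attach the column labels to the covariance matrix.
cv_df = pd.DataFrame(cv, index=complete.columns, columns=complete.columns)
print(cv_df)
```

Note that this throws away a whole row even if only one column is missing there, so every entry of the matrix is estimated from the same set of complete rows.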
If you look at DataFrame.corr, it computes pairwise correlations; the
same could be done for cov. In a past life I implemented the Stambaugh
covariance estimator for time series starting at different time points
(http://www.nber.org/papers/w5918), but it's a bit more work.
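A rough sketch of what a pairwise version for cov could look like: for each pair of columns, use only the rows where both are observed. The helper name `pairwise_cov` and the toy data are hypothetical, not anything in pandas, and this is not the Stambaugh estimator:

```python
import numpy as np
import pandas as pd

def pairwise_cov(df):
    """Covariance of each column pair over the rows where both are observed.

    Hypothetical helper for illustration only.
    """
    cols = df.columns
    out = pd.DataFrame(np.nan, index=cols, columns=cols)
    for i in cols:
        for j in cols:
            # Rows where both series have a value.
            mask = df[i].notna() & df[j].notna()
            if mask.sum() > 1:  # need at least two points for a sample covariance
                x = df.loc[mask, i].to_numpy()
                y = df.loc[mask, j].to_numpy()
                # np.cov returns a 2x2 matrix; the off-diagonal is cov(x, y).
                out.loc[i, j] = np.cov(x, y)[0, 1]
    return out

# Toy data, made up for illustration; columns are missing in different rows.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, np.nan, 8.0],
    "c": [np.nan, 1.0, 0.0, 1.0],
})
pc = pairwise_cov(df)
print(pc)
```

Because each entry is estimated from a different subset of rows, the resulting matrix is not guaranteed to be positive semi-definite, which is part of why something like the Stambaugh approach takes more work.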
-W
Your pandas 0.4.0dev is too old; I implemented dropna on 7/29. I swear
I'm going to get the final release out in the next 2 weeks! Are you
equipped to build the latest code from source? (If not, I can post a
binary later today.) I need to get set up to post automatic nightly
development snapshots...
-W