covariance matrix in pandas

J

Aug 31, 2011, 11:51:46 AM

to pystat...@googlegroups.com

Hi all,

Imagine I have the following DataFrame of stock returns:

<class 'pandas.core.frame.DataFrame'>
Index: 520 entries, 2009-08-03 00:00:00 to 2011-08-23 00:00:00
Data columns:
aapl    519 non-null values
c       519 non-null values
goog    519 non-null values
gs      519 non-null values
ibm     519 non-null values
jpm     519 non-null values
siri    519 non-null values
tgt     519 non-null values
wmt     519 non-null values
x       519 non-null values
dtypes: float64(10)

Is there a built-in way to build a covariance matrix, or does one need to create it manually by slicing and dicing the frame?

Skipper Seabold

Aug 31, 2011, 12:05:37 PM

to pystat...@googlegroups.com

Can't you just use numpy.cov? With the most recent pandas master:

import numpy as np
import pandas
import scikits.statsmodels.api as sm

data = sm.datasets.macrodata.load()
df = pandas.DataFrame(data.data)

# rowvar=0: treat columns as variables and rows as observations
cv = np.cov(df, rowvar=0)

# put the column labels back on both axes of the result
cv_df = pandas.DataFrame(cv, index=df.columns, columns=df.columns)

Skipper
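
For readers without the statsmodels dataset at hand, here is a minimal self-contained sketch of the same np.cov approach. It assumes a recent NumPy and pandas (np.random.default_rng and DataFrame.to_numpy did not exist in 2011); the returns are random numbers and the three ticker columns are only placeholders.

```python
import numpy as np
import pandas as pd

# Synthetic daily "returns": random values, used purely for illustration.
rng = np.random.default_rng(0)
dates = pd.date_range("2009-08-03", periods=520, freq="B")
returns = pd.DataFrame(
    rng.normal(0.0, 0.01, size=(520, 3)),
    index=dates,
    columns=["aapl", "goog", "ibm"],
)

# rowvar=False: each column is a variable, each row is an observation.
cov_values = np.cov(returns.to_numpy(), rowvar=False)

# np.cov returns a bare ndarray, so re-attach the column labels on both axes.
cov_df = pd.DataFrame(cov_values, index=returns.columns, columns=returns.columns)
print(cov_df)
```

The result is a symmetric 3 x 3 frame indexed by ticker on both axes, with the variances on the diagonal.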

Wes McKinney

Aug 31, 2011, 12:29:34 PM

to pystat...@googlegroups.com

That will work. The only addition is that you'll want to call dropna
first to exclude rows with missing observations:

df.dropna(axis=0)

If you look at DataFrame.corr, it does a pairwise correlation; the same
could be done for cov. In my past life I implemented the Stambaugh
covariance estimator for time series starting at different time points
(http://www.nber.org/papers/w5918), but it's a bit more work.

-W
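
To make the difference between the two options concrete, here is a small sketch comparing dropna followed by np.cov (only rows complete in every column are used) with a pairwise computation (each pair of columns uses the rows where both are present). It assumes a later pandas release, where a DataFrame.cov method performs the pairwise computation; the data and the column names a, b, c are made up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
returns = pd.DataFrame(rng.normal(0.0, 0.01, size=(6, 3)), columns=["a", "b", "c"])
returns.iloc[0, :] = np.nan   # the first return of every series is missing
returns.loc[3, "c"] = np.nan  # one extra gap, in column c only

# Option 1: drop every row that has any missing value, then use np.cov.
complete = returns.dropna(axis=0)
cov_listwise = pd.DataFrame(
    np.cov(complete.to_numpy(), rowvar=False),
    index=returns.columns,
    columns=returns.columns,
)

# Option 2: pairwise covariance. The (a, b) entry still uses row 3,
# because only column c is missing there.
cov_pairwise = returns.cov()

print(len(complete))
print(cov_listwise)
print(cov_pairwise)
```

The two results agree for any pair involving c, because the rows that pair has in common are exactly the complete rows, and generally differ for the (a, b) pair. Pairwise estimates use more data, but the resulting matrix is not guaranteed to be positive semi-definite.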

J

Aug 31, 2011, 1:44:58 PM

to pystat...@googlegroups.com

So I'm using pandas 0.4.0dev and dropna is throwing a fit:

Traceback (most recent call last):
  File "portfolio.py", line 413, in <module>
    ret = port.get_covariance_matrix()
  File "portfolio.py", line 262, in get_covariance_matrix
    cv = np.cov(frame.dropna())
AttributeError: 'DataFrame' object has no attribute 'dropna'

If I remove dropna, numpy throws a fit:

Traceback (most recent call last):
  File "portfolio.py", line 413, in <module>
    ret = port.get_covariance_matrix()
  File "portfolio.py", line 262, in get_covariance_matrix
    cv = np.cov(frame)
  File "C:\Python27\lib\site-packages\numpy\lib\function_base.py", line 1971, in cov
    X = array(m, ndmin=2, dtype=float)
TypeError: __array__() takes exactly 1 argument (2 given)

I'm using numpy 1.6.0

Any thoughts?
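
Separately from the version problem, the np.cov calls in both tracebacks omit the rowvar argument used earlier in the thread. A minimal sketch of what that changes, assuming a plain array with 520 observations in rows and 10 series in columns (random data, recent NumPy):

```python
import numpy as np

# 520 observations (rows) of 10 series (columns); values are random.
obs = np.random.default_rng(2).normal(size=(520, 10))

# Default rowvar=True: every ROW is treated as a variable.
by_rows = np.cov(obs)

# rowvar=False: every COLUMN is treated as a variable.
by_cols = np.cov(obs, rowvar=False)

print(by_rows.shape)  # (520, 520)
print(by_cols.shape)  # (10, 10)
```

For a frame laid out like the one in the original question, the 10 x 10 result is the covariance matrix of the tickers.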

Wes McKinney

Aug 31, 2011, 1:48:14 PM

to pystat...@googlegroups.com

Your pandas 0.4.0dev is too old; I implemented dropna on 7/29. I swear
I'm going to get the final release out in the next 2 weeks! Are you
equipped to build the latest code from source? (If not, I can post a
binary later today.) I need to get set up to post automatic nightly
development snapshots...

-W

J

Aug 31, 2011, 2:02:21 PM

to pystat...@googlegroups.com

I'm working on a Windows machine during the day to test things out. My "prod" environment is Mac OS X/UNIX. I think the last time I tried building from source it failed and I needed the binary. There's no real rush, because my main goal is building the covariance matrix (thanks A LOT for that paper, by the way). Unless, of course, my np.cov error is related to the older 0.4.0dev...
