covariance matrix in pandas


J

Aug 31, 2011, 11:51:46 AM
to pystat...@googlegroups.com
Hi all,
 
Imagine I have the following DataFrame of stock returns:
 
<class 'pandas.core.frame.DataFrame'>
Index: 520 entries, 2009-08-03 00:00:00 to 2011-08-23 00:00:00
Data columns:
aapl    519  non-null values
c       519  non-null values
goog    519  non-null values
gs      519  non-null values
ibm     519  non-null values
jpm     519  non-null values
siri    519  non-null values
tgt     519  non-null values
wmt     519  non-null values
x       519  non-null values
dtypes: float64(10)
 
Is there a "built-in" way to build a covariance matrix, or does one need to create it "manually" by slicing and dicing the frame?

Skipper Seabold

Aug 31, 2011, 12:05:37 PM
to pystat...@googlegroups.com

Can't you just use numpy.cov? Using the most recent pandas master (with numpy as np and pandas already imported earlier in the session):

[~/statsmodels/statsmodels-git/scikits/statsmodels/tsa/]
[74]: import scikits.statsmodels.api as sm

[~/statsmodels/statsmodels-git/scikits/statsmodels/tsa/]
[75]: data = sm.datasets.macrodata.load()

[~/statsmodels/statsmodels-git/scikits/statsmodels/tsa/]
[76]: df = pandas.DataFrame(data.data)

[~/statsmodels/statsmodels-git/scikits/statsmodels/tsa/]
[77]: cv = np.cov(df, rowvar=0)

[~/statsmodels/statsmodels-git/scikits/statsmodels/tsa/]
[78]: cv_df = pandas.DataFrame(cv, index=df.columns, columns=df.columns)

Skipper
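
For readers without the statsmodels dataset at hand, the same approach can be sketched on synthetic data. This is only an illustration: the column names and random returns below are made up, and it uses current numpy/pandas spellings (`rowvar=False`, `to_numpy()`) rather than the 2011 ones.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns; the tickers and values are illustrative only.
rng = np.random.default_rng(0)
returns = pd.DataFrame(
    rng.normal(0.0, 0.01, size=(250, 3)),
    columns=["aapl", "goog", "ibm"],
)

# rowvar=False tells numpy that each column is a variable and each row
# is an observation, which matches the DataFrame layout.
cov = np.cov(returns.to_numpy(), rowvar=False)

# Re-attach the column labels so the result reads as a labeled matrix.
cov_df = pd.DataFrame(cov, index=returns.columns, columns=returns.columns)
print(cov_df)
```

The result is a symmetric 3 x 3 frame indexed by ticker on both axes.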

Wes McKinney

Aug 31, 2011, 12:29:34 PM
to pystat...@googlegroups.com

That will work-- the only addition is that you'll want to call dropna
to exclude rows with missing observations:

df.dropna(axis=0)

If you look at DataFrame.corr, it does a pairwise correlation; the same
could be done for cov. In my past life I implemented the Stambaugh
covariance estimator for time series starting at different time points
(http://www.nber.org/papers/w5918), but it's a bit more work.

-W
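
The two options mentioned above (dropping incomplete rows versus a pairwise computation) can be sketched as follows. The data and the helper name `pairwise_cov` are hypothetical, written only for illustration; this is not the Stambaugh estimator, and it is not pandas' own implementation. Current pandas releases also ship `DataFrame.cov`, which handles missing values pairwise.

```python
import numpy as np
import pandas as pd

# Hypothetical returns with a few missing observations.
returns = pd.DataFrame(
    {
        "a": [0.01, -0.02, 0.03, np.nan, 0.00],
        "b": [0.02, 0.01, np.nan, -0.01, 0.01],
        "c": [-0.01, 0.00, 0.02, 0.01, -0.02],
    }
)

# Option 1: drop every row that has any missing value, then use np.cov.
complete = returns.dropna(axis=0)
cov_listwise = pd.DataFrame(
    np.cov(complete.to_numpy(), rowvar=False),
    index=returns.columns,
    columns=returns.columns,
)


def pairwise_cov(frame):
    """Covariance of each column pair over the rows where both are present."""
    cols = frame.columns
    out = pd.DataFrame(np.nan, index=cols, columns=cols, dtype=float)
    for i in cols:
        for j in cols:
            both = frame[i].notna() & frame[j].notna()
            if both.sum() > 1:  # need at least two joint observations
                x = frame.loc[both, i].to_numpy()
                y = frame.loc[both, j].to_numpy()
                out.loc[i, j] = np.cov(x, y)[0, 1]
    return out


# Option 2: use every jointly observed pair of values.
cov_pairwise = pairwise_cov(returns)
print(cov_listwise)
print(cov_pairwise)
```

The pairwise version keeps more data per entry, but the resulting matrix is not guaranteed to be positive semi-definite, which matters if it feeds a portfolio optimizer.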

J

Aug 31, 2011, 1:44:58 PM
to pystat...@googlegroups.com
So I'm using pandas 0.4.0dev and dropna is throwing a fit:
 
Traceback (most recent call last):
  File "portfolio.py", line 413, in <module>
    ret = port.get_covariance_matrix()
  File "portfolio.py", line 262, in get_covariance_matrix
    cv = np.cov(frame.dropna())
AttributeError: 'DataFrame' object has no attribute 'dropna'
 
If I remove dropna, numpy throws a fit:
 
Traceback (most recent call last):
  File "portfolio.py", line 413, in <module>
    ret = port.get_covariance_matrix()
  File "portfolio.py", line 262, in get_covariance_matrix
    cv = np.cov(frame)
  File "C:\Python27\lib\site-packages\numpy\lib\function_base.py", line 1971, in cov
    X = array(m, ndmin=2, dtype=float)
TypeError: __array__() takes exactly 1 argument (2 given)
 
I'm using numpy 1.6.0
 
Any thoughts?
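
One possible workaround, offered as a guess rather than a confirmed diagnosis: the TypeError suggests that this older DataFrame's `__array__` does not accept the dtype argument numpy passes, so handing `np.cov` the underlying ndarray via `frame.values` may sidestep it. Separately, the calls in the tracebacks omit `rowvar`, so numpy would treat each date (row) as a variable. The small frame below is made-up data for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical returns: 4 dates (rows) x 2 tickers (columns).
frame = pd.DataFrame(
    {
        "aapl": [0.010, -0.004, 0.007, 0.002],
        "goog": [0.003, 0.001, -0.006, 0.004],
    }
)

# Pass the plain ndarray so numpy never has to convert the DataFrame itself.
values = frame.values

# Default rowvar=True treats each row as a variable: a dates x dates matrix.
cv_by_rows = np.cov(values)

# rowvar=False treats each column as a variable: a tickers x tickers matrix.
cv = np.cov(values, rowvar=False)

print(cv_by_rows.shape)
print(cv.shape)
```

With 520 dates and 10 tickers, the default call would produce a 520 x 520 matrix instead of the intended 10 x 10 one.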
 

Wes McKinney

Aug 31, 2011, 1:48:14 PM
to pystat...@googlegroups.com

That pandas 0.4.0dev build is too old; I implemented dropna on 7/29. I swear
I'm going to get the final release out in the next 2 weeks! Are you
equipped to build the latest code from source? (If not, I can post a
binary later today.) I need to get set up to post automatic nightly
development snapshots...

-W

J

Aug 31, 2011, 2:02:21 PM
to pystat...@googlegroups.com
I'm working on a Windows machine during the day to test things out. My "prod" environment is Mac OS X/UNIX. I think the last time I tried building from source it failed and I needed the binary. There's no real rush, because my main goal is building the covariance matrix (thanks A LOT for that paper, by the way). Unless, of course, my np.cov error is related to the older 0.4.0dev...