Wouter Overmeire
Dec 1, 2011, 4:34:19 AM
to pystat...@googlegroups.com
MultiIndex seems to always store the level data as dtype('object').
When using DataFrame.delevel(), the columns added from the index therefore also have dtype('object').
This prevents using DataFrame.delevel().corr() to look at the correlation between the original DataFrame columns and the index level values. Does anyone have an idea how to work around this?
See the example below:
In [1]: import pandas
In [2]: import numpy as np
In [3]: import itertools
In [4]: tuples = list(itertools.product(['foo', 'bar'], [10, 20], [1.0, 1.1]))
In [5]: index = pandas.MultiIndex.from_tuples(tuples, names=['prm0', 'prm1', 'prm2'])
In [6]: df = pandas.DataFrame(np.random.randn(8,3), columns=['A', 'B', 'C'], index=index)
In [7]: df
Out[7]:
                      A        B        C
prm0 prm1 prm2
foo  10   1.0    0.2074   0.3425   -1.295
          1.1    0.3194   0.8114    2.133
foo  20   1.0   -0.1798   -1.162   0.5774
          1.1   -0.4635    1.436    1.419
bar  10   1.0    -1.013   0.7605   -1.184
          1.1   -0.4716   0.6983   0.5209
bar  20   1.0     -0.87  -0.3788    0.272
          1.1     1.018  -0.4496    1.132
In [8]: df.corr()
Out[8]:
         A        B        C
A        1  -0.2445   0.3852
B  -0.2445        1  0.08211
C   0.3852  0.08211        1
In [9]: df.delevel().corr()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
...
2535 cols = self.columns
2536 mat = self.as_matrix(cols).T
-> 2537 baseCov = np.cov(mat)
2538
2539 sigma = np.sqrt(np.diag(baseCov))
.../python2.7/site-packages/numpy/lib/function_base.pyc in cov(m, y, rowvar, bias, ddof)
1920 raise ValueError("ddof must be integer")
1921
-> 1922 X = array(m, ndmin=2, dtype=float)
1923 if X.shape[0] == 1:
1924 rowvar = 1
ValueError: setting an array element with a sequence.
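The traceback ends inside np.cov, which casts its input with array(m, ndmin=2, dtype=float). A minimal numpy-only sketch of that cast, assuming the delevel()ed frame turns into an object matrix that still contains the 'foo'/'bar' labels (the `mixed` and `numeric_obj` arrays are hypothetical stand-ins), suggests it is the string entries, not the object dtype as such, that make the cast fail. The exact error message differs between numpy versions.

```python
import numpy as np

# Hypothetical object-dtype matrix mixing a string label with numbers,
# similar to what as_matrix() yields when the prm0 column is included.
mixed = np.array([["foo", 10, 1.0], ["bar", 20, 1.1]], dtype=object)

try:
    # The same cast np.cov performs on its input.
    np.array(mixed, ndmin=2, dtype=float)
    cast_ok = True
except ValueError:
    cast_ok = False

# Numbers stored in an object array cast to float without trouble.
numeric_obj = np.array([[10, 1.0], [20, 1.1]], dtype=object)
as_float = np.array(numeric_obj, ndmin=2, dtype=float)

print(cast_ok, as_float.dtype)
```

So the string column is one obstacle; whether numeric values held in object columns are a second one depends on how the caller casts them.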
My guess is that this exception occurs because corr() cannot handle strings (the prm0 column holds 'foo' and 'bar').
So let's try it without the string column.
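There are two ways to leave out the strings, and they behave differently when the level columns are object dtype. A small sketch, assuming a later pandas version that has `select_dtypes` and `drop(columns=...)` (neither is part of the 2011 API used in the session), on a hypothetical frame `flat` built from the first four rows above with the level columns forced to object dtype:

```python
import pandas as pd

# Hypothetical stand-in for the first four rows of df.delevel(): the level
# columns are forced to object dtype, as in the session above.
flat = pd.DataFrame({
    "prm0": pd.Series(["foo", "foo", "foo", "foo"], dtype=object),
    "prm1": pd.Series([10, 10, 20, 20], dtype=object),
    "prm2": pd.Series([1.0, 1.1, 1.0, 1.1], dtype=object),
    "A": [0.2074, 0.3194, -0.1798, -0.4635],
})

# Selecting by dtype drops every object column, numeric-looking or not,
# so prm1 and prm2 disappear along with prm0.
numeric_only = flat.select_dtypes(include="number")
print(list(numeric_only.columns))

# Dropping the string column by name keeps prm1/prm2, but they stay object.
no_strings = flat.drop(columns=["prm0"])
print(no_strings.dtypes.astype(str).tolist())
```

Selecting by name, as in the next step of the session, keeps the level columns but does not change their dtype.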
In [10]: df.delevel()[['prm1', 'prm2', 'A', 'B', 'C']]
Out[10]:
  prm1 prm2        A        B        C
0   10    1   0.2074   0.3425   -1.295
1   10  1.1   0.3194   0.8114    2.133
2   20    1  -0.1798   -1.162   0.5774
3   20  1.1  -0.4635    1.436    1.419
4   10    1   -1.013   0.7605   -1.184
5   10  1.1  -0.4716   0.6983   0.5209
6   20    1    -0.87  -0.3788    0.272
7   20  1.1    1.018  -0.4496    1.132
In [11]: df.delevel()[['prm1', 'prm2', 'A', 'B', 'C']].corr()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
[...]
TypeError: function not supported for these types, and can't coerce safely to supported types
In [12]: df.delevel()['prm1'].values.dtype
Out[12]: dtype('object')
In [13]: df.delevel()['prm1']
Out[13]:
0 10
1 10
2 20
3 20
4 10
5 10
6 20
7 20
Name: prm1
In [14]: index.levels
Out[14]:
[Index([bar, foo], dtype=object),
Index([10, 20], dtype=object),
Index([1.0, 1.1], dtype=object)]
In [15]:
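One possible workaround, sketched here for a later pandas where delevel() is called reset_index(): select the numeric columns by name and cast them explicitly with astype(float) before calling corr(). The explicit cast turns object columns that hold numbers into float64 and is harmless if the levels already come back as int or float. In recent pandas the levels generally keep numeric dtypes, which `level_dtypes` lets you check; the seeded generator is only there to make the sketch repeatable.

```python
import itertools

import numpy as np
import pandas as pd

# Same setup as the session above, with a seeded generator for repeatability.
tuples = list(itertools.product(["foo", "bar"], [10, 20], [1.0, 1.1]))
index = pd.MultiIndex.from_tuples(tuples, names=["prm0", "prm1", "prm2"])
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((8, 3)), columns=["A", "B", "C"], index=index)

# Inspect how this pandas version stores each level.
level_dtypes = [str(level.dtype) for level in index.levels]
print(level_dtypes)

# reset_index() moves the levels into ordinary columns.
flat = df.reset_index()

# Leave out the string level and cast the rest to float explicitly,
# so corr() gets a purely numeric frame whatever the level dtypes are.
cols = ["prm1", "prm2", "A", "B", "C"]
numeric = flat[cols].astype(float)
corr = numeric.corr()
print(corr.round(3))
```

The string level (prm0) is left out rather than encoded; including it would need an explicit numeric encoding, which is a separate modelling choice.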