Stay tuned:
https://github.com/wesm/pandas/issues/41
https://github.com/wesm/pandas/issues/115
https://github.com/wesm/pandas/issues/218
https://github.com/wesm/pandas/issues/273
https://github.com/wesm/pandas/issues/479
The main reason I haven't done this yet is that there are thorny
corner cases that are unpleasant to deal with in the implementation.
But anyway, I've already got quite a bit of functioning concatenation
code done inside GroupBy, so I'm going to see if I can make it
user-friendly, fast, and ready for the imminent 0.7.0 release. Having
a working API function that isn't all that fast is preferable to no
function at all, I guess -- I'd rather have you complaining about it
being slow than not being able to do it at all :)
- Wes
Wes: I had found your GH #479 when I googled this problem. That's how
I knew it was in the works. Looks like it has just been released,
great! So the new syntax is:
big_df = df_list[0].join(df_list[1:])
correct? I got it from GH #115. But I will have to re-install 0.7
from source in order to try it myself.
Thanks!!
Chris
hi Chris,
yes, I finally did a proper job of implementing multi-joins and
multi-appends. There is now a single API function, concat, that does
all the hard labor. So you can do:
df_list[0].append(df_list[1:])
or simply
concat(df_list, axis=0)
With concat you have more control-- namely you can choose how the
other axes should be handled (i.e. use the columns from the first
object, or union/intersect them).
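Roughly, that looks like this (a minimal sketch with made-up frames;
the keyword is join, with 'outer' taking the union of the other axis
and 'inner' the intersection -- reusing the first object's columns
went through a separate join_axes argument in those releases, if
memory serves):

from pandas import DataFrame, concat

df1 = DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = DataFrame({'b': [5, 6], 'c': [7, 8]})

# union of the columns (the default); missing cells become NaN
concat([df1, df2], axis=0, join='outer')

# intersection of the columns; only 'b' survives
concat([df1, df2], axis=0, join='inner')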
The last thing I'm going to do is allow you to pass a list of keys
for the groups to form a hierarchical index along the concatenation
axis. For example, if you have:
>>> df1
   a  b
0  1  2
1  3  4
2  5  6
>>> df2
    a   b
0   7   8
1   9  10
2  11  12
you might want:
>>> concat([df1, df2], axis=1, group_keys=['one', 'two'])
  one     two
    a  b    a   b
0   1  2    7   8
1   3  4    9  10
2   5  6   11  12
i.e. a hierarchical index along the concatenation axis.
This will be part of the upcoming 0.7.0 release -- stay tuned. Of
course, everything but that last bit (which is already implemented for
groupby -- actually somewhat nontrivial -- but which I need to expose
with a reasonable API) is available in git master.
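(Spoiler: in what shipped, the argument is spelled keys rather than
group_keys -- you'll see it in the examples below. A self-contained
version of the example above reads roughly:)

from pandas import DataFrame, concat

df1 = DataFrame({'a': [1, 3, 5], 'b': [2, 4, 6]})
df2 = DataFrame({'a': [7, 9, 11], 'b': [8, 10, 12]})

# 'one' and 'two' become the top level of a hierarchical
# (MultiIndex) column index along the concatenation axis
concat([df1, df2], axis=1, keys=['one', 'two'])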
- W
To your question about associating a key with each DataFrame, this is
hot off the presses:
In [2]: df
Out[2]:
        0        1       2
0  1.6614 -0.71357  0.9032
1  0.9877 -0.43574 -1.8906
2  0.1742 -0.06604  0.5700
3  0.9013 -0.80383 -1.8286
4 -1.8021 -1.20078  0.4313

In [3]: concat([df, df], keys=[0, 1], axis=1)
Out[3]:
         0                          1
         0        1       2         0        1       2
0   1.6614 -0.71357  0.9032    1.6614 -0.71357  0.9032
1   0.9877 -0.43574 -1.8906    0.9877 -0.43574 -1.8906
2   0.1742 -0.06604  0.5700    0.1742 -0.06604  0.5700
3   0.9013 -0.80383 -1.8286    0.9013 -0.80383 -1.8286
4  -1.8021 -1.20078  0.4313   -1.8021 -1.20078  0.4313

In [4]: concat([df, df], keys=[0, 1], axis=0)
Out[4]:
            0        1       2
0 0  1.6614 -0.71357  0.9032
  1  0.9877 -0.43574 -1.8906
  2  0.1742 -0.06604  0.5700
  3  0.9013 -0.80383 -1.8286
  4 -1.8021 -1.20078  0.4313
1 0  1.6614 -0.71357  0.9032
  1  0.9877 -0.43574 -1.8906
  2  0.1742 -0.06604  0.5700
  3  0.9013 -0.80383 -1.8286
  4 -1.8021 -1.20078  0.4313
This makes it very easy to concatenate DataFrame objects and
simultaneously index them with a MultiIndex based on some keys. For
example:
In [5]: glued = concat([df, df], keys=['foo', 'bar'], axis=1)
In [6]: glued
Out[6]:
       foo                         bar
         0        1       2         0        1       2
0   1.6614 -0.71357  0.9032    1.6614 -0.71357  0.9032
1   0.9877 -0.43574 -1.8906    0.9877 -0.43574 -1.8906
2   0.1742 -0.06604  0.5700    0.1742 -0.06604  0.5700
3   0.9013 -0.80383 -1.8286    0.9013 -0.80383 -1.8286
4  -1.8021 -1.20078  0.4313   -1.8021 -1.20078  0.4313

In [7]: glued['bar']
Out[7]:
        0        1       2
0  1.6614 -0.71357  0.9032
1  0.9877 -0.43574 -1.8906
2  0.1742 -0.06604  0.5700
3  0.9013 -0.80383 -1.8286
4 -1.8021 -1.20078  0.4313
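The same selection works along the rows: after an axis=0 concat, the
outer key pulls one block back out. A quick sketch (with a modern
pandas, where .loc handles the outer level; .xs(key) works too):

from pandas import DataFrame, concat

df = DataFrame({'x': [1.0, 2.0], 'y': [3.0, 4.0]})
stacked = concat([df, df], keys=['foo', 'bar'], axis=0)

# select everything under the outer key 'bar'; the result drops
# that level and looks like the original df again
stacked.loc['bar']
# equivalently: stacked.xs('bar')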
- Wes
Perhaps it'd be nice to support concat({0: df, 1: df}, axis=1) as
well (with no guaranteed order for the resulting concatenation, of
course, but lots of times there's no reason to care).
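Something like this sketch, say (for what it's worth, later pandas
releases do accept a mapping: its sorted keys stand in for the keys
argument):

from pandas import DataFrame, concat

df = DataFrame({'a': [1, 2], 'b': [3, 4]})

# the sorted dict keys (0, 1) label the top level of the columns
concat({0: df, 1: df}, axis=1)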
-- N