--
--
--
Ah this is perfect thanks a lot!
--
just a pedantic note (hammering on Wouter's remark "for Python 2.7")
to make it more robust: use the integer division (// instead of /) so
that the grouping lambda survives the transition to Python 3 (or a
from __future__ import division statement at the top of the module)
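To make the difference concrete, here is a minimal sketch in plain Python, assuming integer row labels 0..9 (the names floor_keys and true_keys are only for illustration):

```python
# Grouping keys for the first ten integer row labels.
# Floor division gives the same integer keys on Python 2 and Python 3.
floor_keys = [x // 5 for x in range(10)]
# True division (Python 3, or Python 2 with the __future__ import) gives floats.
true_keys = [x / 5 for x in range(10)]

print(floor_keys)                                 # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(len(set(floor_keys)), len(set(true_keys)))  # 2 10
```

With // every block of five labels shares one key; with true division each label gets its own key.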
--
I actually tried it before writing, on Wouter's example, and this was
the result:
In [5]: df_resampled = df.groupby(lambda x:x/5.).mean()
In [6]: df_resampled
Out[6]:
<class 'pandas.core.frame.DataFrame'>
Index: 100 entries, 0.0 to 19.8
Columns: 100 entries, 0 to 99
dtypes: float64(100)
note index contains 100 entries, not 20. This was on 0.8.1 right now
and on 0.9.rcsomething on the morning.
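The 100-entry index is what one would expect if dividing by a float gives every row its own key. A small sketch, assuming the example was a 100x100 frame of random values with a default integer index (the frame below is a stand-in, not the original data):

```python
import numpy as np
import pandas as pd

# Stand-in for the example frame: 100 rows, 100 columns, index 0..99.
df = pd.DataFrame(np.random.randn(100, 100))

# Dividing by a float yields 100 distinct keys (0.0, 0.2, ..., 19.8).
by_true_div = df.groupby(lambda x: x / 5.).mean()
# Floor division yields 20 keys (0..19), five rows per group.
by_floor_div = df.groupby(lambda x: x // 5).mean()

print(by_true_div.shape, by_floor_div.shape)  # (100, 100) (20, 100)
```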
On Thu, Oct 4, 2012 at 4:15 AM, Alvaro Tejero Cantero <alv...@minin.es> wrote:
> just a pedantic note (hammering on Wouter's remark "for Python 2.7")
> to make it more robust: use the integer division (// instead of /) so
> that the grouping lambda survives the transition to Python 3 (or a
> from __future__ import division statement at the top of the module)
So, in my application, I am actually passing floats as the divisor (e.g. x / 4.55) and letting pandas figure out the closest integer by which to split the table. It seems that pandas converts to an integer internally, regardless of whether one passes a float or an integer as the divisor.
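I can't confirm what pandas does internally with such keys, so here is a sketch that does not depend on any implicit conversion: it floors the key explicitly inside the lambda. The 100x3 frame and the name bin_width are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with integer row labels 0..99.
df = pd.DataFrame(np.random.randn(100, 3))

bin_width = 4.55  # float divisor, as in x / 4.55

# Floor explicitly so each row label maps to an integer bin number.
df_binned = df.groupby(lambda x: int(x // bin_width)).mean()

print(df_binned.shape)  # (22, 3): bins 0 through 21
```

Without the explicit floor, x / 4.55 gives a different float for every label, so every row would form its own group.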
--
Hi Wouter,
Just a quick followup to this thread. At one point, you showed that by doing integer division, one gets an array of indices. You said:
Let's compare the first 10 grouping keys for x // 5 and x / 5.
In [18]: [x // 5 for x in range(100)][:10]
Out[18]: [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
I was wondering if it is possible to still do the same grouping/averaging but by passing in such an array of indices. For example, I do some histogram binning in another program, and pass in a list of digitized indices, and now want to group my dataframe along these indices. Is something like this possible?
I tried doing something like:
>>> df=DataFrame(randn(10,10))
>>> dfresamp=df.groupby([0,0,0,0,0,1,1,1,1,1,1]).mean()
>>> df.shape, dfresamp.shape
((10, 10), (10, 8))
It doesn't really work.
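One thing that stands out: the list in that attempt has 11 entries (five 0s and six 1s) while the frame has 10 rows. A minimal sketch, assuming the intent was one key per row, with the key array the same length as the frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 10))

# One key per row: exactly 10 entries for 10 rows.
keys = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
assert len(keys) == len(df)

# Rows are grouped by position according to the key array.
dfresamp = df.groupby(keys).mean()

print(df.shape, dfresamp.shape)  # (10, 10) (2, 10)
```

The same pattern should work for digitized bin indices coming from another program, as long as there is one index per row.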
--
--
If you have time, can you clarify one last aspect of this for me. You said:
If I have row and column labels, this should take those in, no?
In [19]: df.groupby(lambda x: [0,0,0,0,0,1,1,1,1,1][x]).mean()
Out[19]:
          0         1         2         3         4         5         6         7         8         9
0  0.620309  0.674822 -0.154680 -1.150960  0.092368  0.160989  0.147444  0.111853 -0.084692 -0.556367
1  0.068149 -0.273187  0.388405  0.046407 -0.054020 -0.395190  0.509529  0.095781 -0.152507  0.036615
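On the labels question: the function passed to groupby is called with each row label, so indexing a list with x only works while the labels happen to be the integers 0..n-1. A sketch with string labels, where the frame, the labels and the name label_to_bin are all made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical labeled frame: row labels a..d, column labels x and y.
df = pd.DataFrame(np.random.randn(4, 2), index=list("abcd"), columns=["x", "y"])
bins = [0, 0, 1, 1]  # one bin number per row, in row order

# Option 1: look up each row label in a label-to-bin mapping.
label_to_bin = dict(zip(df.index, bins))
by_label = df.groupby(lambda label: label_to_bin[label]).mean()

# Option 2: pass the bin numbers positionally; the row labels are not consulted.
by_position = df.groupby(np.asarray(bins)).mean()

print(by_label.shape, by_position.shape)  # (2, 2) (2, 2)
```

Both give the same two averaged rows; the column labels x and y are carried through unchanged.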
--