What do you think about a kwarg to rank, like "levels=None"? If None, it does the default, which is the 1..n rank. If levels=k, it ranks into 1..k instead, e.g. 1..10. Then you could do
df.groupby(df.A.rank(levels=10)).mean()
which is a hell of a lot prettier than (df.A.rank()/float(len(df))*10.).astype(int)
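To make the idea concrete, here is a rough sketch of what I mean. The name rank_levels is hypothetical (it is not a pandas function), and the ceil-based bucketing is just one reasonable assumption about how 1..n ranks would map onto 1..k:

```python
import numpy as np
import pandas as pd


def rank_levels(s, levels=None):
    """Hypothetical helper, not part of pandas: rank s into 1..levels."""
    r = s.rank()            # default 1..n rank (ties get the average rank)
    if levels is None:
        return r
    n = s.count()           # number of non-NaN values
    # Scale ranks to (0, levels] and round up, so rank 1 lands in level 1
    # and rank n lands in level `levels`. NaNs stay NaN (result is float).
    return np.ceil(r * levels / n)


df = pd.DataFrame({"A": np.random.randn(1000), "B": np.random.randn(1000)})
print(df.groupby(rank_levels(df.A, levels=10)).mean())
```

The result is kept as float so that missing values pass through as NaN rather than breaking an integer cast.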
Actually, what I am after is a bit more than this. I have some tile summary code that I use all the time. It takes x and y as input and returns a record array with the tile label, the min and max of x, and descriptive statistics for y in each tile. It looks like this in use:
In [14]: rt = nansafe.tile_summary(df.A, df.B, levels=np.arange(10, 101, 10))
In [15]: print mlab.rec2txt(rt)
label        xmin     xmax   qcount    qmean  qmedian     qmin     qmax   qcimin   qcimax
tile 0     -3.466   -1.259      100   -0.067   -0.105   -2.424    2.308   -0.165    0.030
tile 1     -1.258   -0.821      100    0.031    0.014   -3.017    2.016   -0.067    0.129
tile 2     -0.820   -0.533      100   -0.013    0.024   -2.924    2.476   -0.117    0.091
tile 3     -0.531   -0.258      100    0.190    0.236   -2.449    2.454    0.091    0.289
tile 4     -0.252    0.016      100    0.008   -0.063   -1.717    1.789   -0.073    0.090
tile 5      0.025    0.240      100    0.014   -0.018   -2.116    2.513   -0.081    0.109
tile 6      0.243    0.494      100    0.050    0.006   -1.957    1.618   -0.030    0.130
tile 7      0.494    0.820      100    0.170   -0.011   -1.924    2.327    0.082    0.257
tile 8      0.826    1.244      100   -0.036   -0.131   -2.317    2.360   -0.137    0.065
tile 9      1.246    3.055      100   -0.153   -0.123   -3.225    1.845   -0.249   -0.058
tile ALL   -3.466    3.055     1000    0.019   -0.020   -3.225    2.513   -0.011    0.049
and the output y columns are customizable::
Definition: nansafe.tile_summary(x, y, stats=(<function count at 0x98cebc4>, <function mean at 0x98ce994>, <function median at 0x98ce9cc>, <function min at 0x98ceca4>, <function max at 0x98cecdc>, <function cimin at 0x98cec34>, <function cimax at 0x98cec6c>), levels=(20.0, 40.0, 60.0, 80.0, 100.0), breakpoints=None, tilelabel='tile', names=None, catd=None, verbose_output=False, force_categorical=False)
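For comparison, a much-reduced version of the same summary can be sketched on top of pandas itself. This is only an approximation under stated assumptions: tile_summary_sketch is a hypothetical name and not the nansafe code, it relies on pd.qcut and named aggregation being available, it treats levels as upper percentile bounds, and it leaves out the cimin/cimax columns and the ALL row:

```python
import numpy as np
import pandas as pd


def tile_summary_sketch(x, y, levels=(20, 40, 60, 80, 100), tilelabel="tile"):
    """Hypothetical approximation of a tile summary; not nansafe.tile_summary."""
    # Turn upper percentile bounds (e.g. 20, 40, ...) into quantile edges
    # in [0, 1], with 0 prepended as the lower edge of the first tile.
    q = np.concatenate(([0.0], np.asarray(levels, dtype=float) / 100.0))
    # labels=False returns integer tile codes 0..k-1 instead of intervals.
    tiles = pd.qcut(x, q, labels=False)
    frame = pd.DataFrame({"tile": tiles, "x": x, "y": y})
    out = frame.groupby("tile").agg(
        xmin=("x", "min"),
        xmax=("x", "max"),
        qcount=("y", "count"),
        qmean=("y", "mean"),
        qmedian=("y", "median"),
        qmin=("y", "min"),
        qmax=("y", "max"),
    )
    out.index = ["%s %d" % (tilelabel, i) for i in out.index]
    return out


df = pd.DataFrame({"A": np.random.randn(1000), "B": np.random.randn(1000)})
print(tile_summary_sketch(df.A, df.B, levels=np.arange(10, 101, 10)))
```

Note that pd.qcut raises an error when the quantile edges are not unique, so this sketch assumes x is reasonably continuous.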
Do you think something like this has a place in pandas?
JDH