Message from discussion
Pandas row binning
MIME-Version: 1.0
Date: Mon, 8 Oct 2012 17:24:56 -0400
Message-ID: <CAMHV+dBCaSFSDvokbk+T6Z_FQh9J0V-6PrsEYgo-R3zcExJ...@mail.gmail.com>
Subject: Re: [pydata] Pandas row binning
From: Adam Hughes <hughesada...@gmail.com>
To: pydata@googlegroups.com
Content-Type: multipart/alternative; boundary=bcaec54a3e025e50b604cb92da66
--bcaec54a3e025e50b604cb92da66
Content-Type: text/plain; charset=ISO-8859-1
Ok thanks for the explanation. I will use Series.
On Mon, Oct 8, 2012 at 4:23 PM, Wouter Overmeire <loda...@gmail.com> wrote:
>
>
> 2012/10/8 Adam Hughes <hughesada...@gmail.com>
>
>> If you have time, can you clarify one last aspect of this for me. You
>> said:
>>
>> In [19]: df.groupby(lambda x: [0,0,0,0,0,1,1,1,1,1][x]).mean()
>> Out[19]:
>>           0         1         2         3         4         5         6         7         8         9
>> 0  0.620309  0.674822 -0.154680 -1.150960  0.092368  0.160989  0.147444  0.111853 -0.084692 -0.556367
>> 1  0.068149 -0.273187  0.388405  0.046407 -0.054020 -0.395190  0.509529  0.095781 -0.152507  0.036615
>>
>> If I have row and column labels, this should take those in, no?
>
>
> When a function is passed as the 'by' argument to groupby, it is called
> once for each label on the axis that groupby runs along. The function
> returns a groupby key/label, so there are as many groups as there are
> unique values returned across all the axis labels. The example above ran
> on axis=0 and received 0, 1, 2, 3, 4, ... as input. It returned two unique
> values, 0 and 1, so there are two groups, labeled 0 and 1.
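For reference, here is a minimal sketch of the behaviour described above. It assumes a frame with the default integer row labels 0..9; the deterministic `np.arange` data and the names `bins` and `row_means` are illustrative and not taken from the thread.

```python
import numpy as np
import pandas as pd

# Illustrative data: 10 rows x 10 columns with default integer row labels 0..9.
df = pd.DataFrame(np.arange(100, dtype=float).reshape(10, 10))

# One group key per row position: rows 0-4 go to group 0, rows 5-9 to group 1.
bins = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# The function receives each row label (0, 1, 2, ...) and returns its group key.
# Indexing the list works here only because the row labels are integers.
row_means = df.groupby(lambda label: bins[label]).mean()

print(row_means)
```

The result has one row per unique key returned by the function, so two rows here.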
>
> The example below runs on axis=1. That axis has labels 'A', 'B', 'C',
> 'D', 'E', ..., so the lambda function gets those same strings as input.
> This results in a TypeError, because a list can only be indexed with an
> integer.
> I would use a Series here. If you want to use a lambda function, it needs
> to be rewritten so that it takes a string as input and returns whatever
> groupby label you need.
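A rough sketch of both suggestions, under a few assumptions of mine: the data is deterministic `np.arange` output rather than `randn`, the names `col_groups` and `position` are made up for illustration, and the columns are grouped by transposing first instead of passing `axis=1`, since newer pandas releases deprecate `axis=1` in groupby.

```python
import string

import numpy as np
import pandas as pd

# Same labels as in the thread: columns 'A'..'J', rows 'a'..'j'.
df = pd.DataFrame(
    np.arange(100, dtype=float).reshape(10, 10),
    columns=list(string.ascii_uppercase[:10]),
    index=list(string.ascii_lowercase[:10]),
)

# Option 1: a Series that maps each column label to its group label.
col_groups = pd.Series(list("aaaaabbbbb"), index=df.columns)
by_series = df.T.groupby(col_groups).mean().T

# Option 2: a function that accepts the string label itself.
# It looks up the label's position first, then picks the group.
position = {label: i for i, label in enumerate(df.columns)}
by_func = df.T.groupby(lambda label: "a" if position[label] < 5 else "b").mean().T

print(by_series)
```

Both options should give the same two output columns, 'a' and 'b', each holding the mean of five original columns per row.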
>
>
>> In [14]: df = pd.DataFrame(randn(10,10),
>>     columns=list(ascii_uppercase[:10]), index=list(ascii_lowercase[:10]))
>>
>> >>> df.groupby(lambda x: ['a','a','a','a','a','b','b','b','b','b'][x],
>>     axis=1).mean()
>>
>> This is not correct, it gives an error. Am I misunderstanding?
>
--bcaec54a3e025e50b604cb92da66--