Message from discussion Pandas row binning

MIME-Version: 1.0
Date: Mon, 8 Oct 2012 17:24:56 -0400
Message-ID: <CAMHV+dBCaSFSDvokbk+T6Z_FQh9J0V-6PrsEYgo-R3zcExJ...@mail.gmail.com>
Subject: Re: [pydata] Pandas row binning
From: Adam Hughes <hughesada...@gmail.com>
To: pydata@googlegroups.com
Content-Type: multipart/alternative; boundary=bcaec54a3e025e50b604cb92da66

--bcaec54a3e025e50b604cb92da66
Content-Type: text/plain; charset=ISO-8859-1

OK, thanks for the explanation. I will use a Series.

On Mon, Oct 8, 2012 at 4:23 PM, Wouter Overmeire <loda...@gmail.com> wrote:

>
>
> 2012/10/8 Adam Hughes <hughesada...@gmail.com>
>
>> If you have time, can you clarify one last aspect of this for me.  You
>> said:
>>
>> In [19]: df.groupby(lambda x: [0,0,0,0,0,1,1,1,1,1][x]).mean()
>> Out[19]:
>>           0         1         2         3         4         5         6         7         8         9
>> 0  0.620309  0.674822 -0.154680 -1.150960  0.092368  0.160989  0.147444  0.111853 -0.084692 -0.556367
>> 1  0.068149 -0.273187  0.388405  0.046407 -0.054020 -0.395190  0.509529  0.095781 -0.152507  0.036615
>>
>> If I have row and column labels, this should take those in, no?
>
>
> If a function is passed as the 'by' argument of groupby, it is called for
> each label of the axis on which groupby runs. The function returns a
> groupby key/label, and there will be as many groups as there are unique
> values returned across all the axis labels. The above example ran on axis=0
> and got as input 0, 1, 2, 3, 4, ... It gave back two unique values, 0 and 1,
> so there are two groups, labeled 0 and 1.
>
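
The mechanism described above can be sketched as follows. This is only an
illustration, not code from the thread: the seeded random data and the names
`key` and `seen` are assumptions, added to make the per-label calls visible.

```python
import numpy as np
import pandas as pd

# Illustrative data: a 10x10 frame with the default integer row labels 0..9.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 10)))

seen = []  # hypothetical helper list recording the labels passed to the function

def key(label):
    """Map a row label to its group key (0 for rows 0-4, 1 for rows 5-9)."""
    seen.append(label)
    return [0, 0, 0, 0, 0, 1, 1, 1, 1, 1][label]

# groupby calls key() with the row labels (axis=0 is the default).
means = df.groupby(key).mean()

print(sorted(set(seen)))  # the labels the function received
print(means)              # one row per unique key returned
```

Two unique keys come back, so the result has two rows, one per group.
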
> The example below runs on axis=1. This axis has labels 'A', 'B', 'C', 'D',
> 'E', ... so the lambda function gets those same values 'A', 'B', 'C', 'D',
> 'E', ... as input. This results in a TypeError, because you need an integer
> to index a list.
> I would use a Series here. If you want to use a lambda function, it needs
> to be rewritten so that it takes a string as input and outputs whatever
> groupby label you need.
>
>
>> In [14]: df = pd.DataFrame(randn(10,10), columns=list(ascii_uppercase[:10]), index=list(ascii_lowercase[:10]))
>>
>> >>> df.groupby(lambda x: ['a','a','a','a','a','b','b','b','b','b'][x], axis=1).mean()
>>
>> This is not correct, it gives an error.  Am I misunderstanding?
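
A minimal sketch of the two alternatives suggested above (a Series, or a
function that accepts string labels), under some assumptions. The names
`col_groups`, `col_key` and `bad_key` are hypothetical, the data is seeded
random noise, and the columns are grouped by transposing the frame rather than
passing axis=1, because newer pandas releases have deprecated that argument.

```python
import numpy as np
import pandas as pd
from string import ascii_lowercase, ascii_uppercase

rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((10, 10)),
    columns=list(ascii_uppercase[:10]),
    index=list(ascii_lowercase[:10]),
)

# Why the original lambda fails: it receives 'A', 'B', ... and a list
# cannot be indexed with a string.
bad_key = lambda x: ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'][x]
try:
    bad_key("A")
except TypeError as exc:
    error_message = str(exc)

# Option 1: a Series that maps each column label to its group key.
col_groups = pd.Series(list("aaaaabbbbb"), index=df.columns)
by_series = df.T.groupby(col_groups).mean().T

# Option 2: a function that takes the string label directly.
def col_key(label):
    """Columns 'A'-'E' go to group 'a', columns 'F'-'J' to group 'b'."""
    return "a" if label < "F" else "b"

by_func = df.T.groupby(col_key).mean().T

print(by_series)
```

Both options should give the same result: one column per group, holding the
row-wise mean of the columns in that group.
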

--bcaec54a3e025e50b604cb92da66--