Re: [PANDA] pandas: trouble transforming dataframe into aggregated dataframe


Christopher Groskopf

Jun 5, 2015, 10:47:06 PM
to panda-pro...@googlegroups.com
Hey Pallav, this is the mailing list for PANDA: the data warehousing tool. You're looking for *pandas* the data processing library.

C

On Fri, Jun 5, 2015 at 5:16 PM, Pallav Gupta <palla...@gmail.com> wrote:

Cross-posted on SO: http://stackoverflow.com/questions/30676476/pandas-trouble-transforming-dataframe-into-aggregated-dataframe

I'm a new user of pandas. I have been going through the help docs and trying various experiments (groupby(), MultiIndex, value_counts()), but I am not able to get the desired end result.

My dataframe is as follows (it is time indexed):

DATE        GROUP  X  Y  STATUS
2014-01-01  A      0  0  PASS
2014-01-01  A      0  1  FAIL
2014-01-01  A      1  0  PASS
2014-01-02  B      0  0  PASS
2014-01-02  B      0  1  PASS
2014-01-02  B      1  1  FAIL
....

The 'STATUS' column is of dtype=category. I would like to end up with a new dataframe that looks as follows:

DATE        GROUP  STATUS  PCT
2014-01-01  A      PASS    0.667
2014-01-01  A      FAIL    0.333
2014-01-02  B      PASS    0.667
2014-01-02  B      FAIL    0.333

Essentially, for each group, I want to calculate the percentage of rows with each status.

I have tried df.groupby('GROUP').value_counts() followed by dividing by sum() to calculate the percentages. That works OK. However, I lose the index information, and I don't know how to add it to the new dataframe to achieve the desired output above. There must be some easy way in pandas to do it, but I'm not seeing it.

Any suggestions are appreciated. Thanks.
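
For reference, the transformation described above can be sketched roughly as follows. This is only a minimal sketch, not a confirmed answer from the thread: it assumes DATE is the index as in the sample, that STATUS is categorical, and that the installed pandas accepts the `observed` argument to groupby. The names `counts`, `pct`, and `result` are illustrative.

```python
import pandas as pd

# Sample rows copied from the question; DATE is set as the index.
df = pd.DataFrame(
    {
        "DATE": pd.to_datetime(["2014-01-01"] * 3 + ["2014-01-02"] * 3),
        "GROUP": ["A", "A", "A", "B", "B", "B"],
        "X": [0, 0, 1, 0, 0, 1],
        "Y": [0, 1, 0, 0, 1, 1],
        "STATUS": ["PASS", "FAIL", "PASS", "PASS", "PASS", "FAIL"],
    }
).set_index("DATE")
df["STATUS"] = df["STATUS"].astype("category")

# Move DATE back into a column so it can be a grouping key, then count
# rows per (DATE, GROUP, STATUS). observed=True keeps only combinations
# that actually occur, which matters because STATUS is categorical.
counts = (
    df.reset_index()
    .groupby(["DATE", "GROUP", "STATUS"], observed=True)
    .size()
)

# Divide each count by the total for its (DATE, GROUP) pair.
pct = counts / counts.groupby(level=["DATE", "GROUP"]).transform("sum")

# Flatten the MultiIndex into ordinary columns: DATE, GROUP, STATUS, PCT.
result = pct.rename("PCT").reset_index()
print(result)
```

Grouping on DATE together with GROUP is what keeps the date in the output; grouping on GROUP alone is where the index information gets dropped.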

--
You received this message because you are subscribed to the Google Groups "PANDA Project Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to panda-project-u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
