pandas DataFrame boxplot method mixes up data - bug?


Gregor Hendel

Oct 7, 2013, 4:20:38 AM
to pystat...@googlegroups.com
Hi,

I'm new to this group and apologize if I'm missing a previous post on the same topic.
 
I often have DataFrames of identical shape from multiple sources that I want to compare. I usually use the concat function
to get a hierarchically indexed DataFrame.
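
To illustrate what I mean by that, here is a minimal sketch. The frame names, sizes, and values are made up for this illustration and are not my real data:

```python
import numpy as np
import pandas as pd

# Two small, equally shaped frames standing in for data from two sources
# (names and values are hypothetical, chosen only for this illustration).
frame_a = pd.DataFrame(np.zeros((3, 2)))
frame_b = pd.DataFrame(np.ones((3, 2)))

# Concatenating along the columns with `keys` adds an outer column level,
# so each source keeps its own label.
combined = pd.concat([frame_a, frame_b], axis=1, keys=["A", "B"])

# The columns are now (source, original column) pairs.
print(list(combined.columns))

# Selecting by the outer key gives back that source's columns.
print(combined["A"].shape)
```

Each column of the combined frame is then addressed by a tuple such as ("A", 1), which is how the columns are selected in the code further down.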

Calling the boxplot method with an additional grouping argument ('by') appears to mix up the data: the result
shows four identical plots. The code example below illustrates this behaviour.

I also have trouble using the boxplot method together with the 'ax' keyword argument. Instead of filling my four axes
as I would expect, the final figure contains only one axis.

I also provide a working example which does what I need.

Should this be reported as a bug in pandas somewhere else?

I'm working with pandas version 0.11.0 in an IPython notebook 1.0.0 WITHOUT inlined pylab (because I thought this might be the cause of the trouble).
I appreciate all comments about this issue.

Best regards,
Gregor

Now, here is the code:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# create 4 data frames with 2 columns and 20 rows each; the mean is shifted by 10 * i to make the difference clearer
dfs = [pd.DataFrame(np.random.randn(20,2) + (10 * i)) for i in xrange(4)]

# create a big, hierarchically indexed data frame
newdf = pd.concat(dfs, axis=1, keys=list("ABCD"))
keys = zip(list("ABCD"), [1] * 4)
cat = pd.qcut(dfs[0][0], 3)

# the standard boxplot method (without grouping) works as expected
newdf[keys].boxplot()
plt.show()

# this is what I was expecting to see all the time
grouped = newdf.groupby(cat.labels)
fig, axes = plt.subplots(2,2,figsize=(18,10), sharey=True)
for idx, letter in enumerate(list("ABCD")):
    ax = axes[idx // 2][idx % 2]
    data = [grouped.get_group(group)[[(letter, 1)]] for group in xrange(3)]
    ax.boxplot(data)
    ax.set_title("Data from frame " + letter)
plt.show()

# first example: all four plots look the same
newdf[keys].boxplot(by=cat.labels)
plt.show()

# second example: only the last box plot is drawn
fig = plt.figure(figsize=(18,10))
axes = [fig.add_subplot(2,2,i + 1) for i in xrange(4)]  # subplot positions are numbered from 1
for i in xrange(4):
    dfs[i].boxplot(column=1, by=cat.labels, ax=axes[i])
plt.show()

