pandas.pivot_table indexing problem/bug:
Gagi  
From: Gagi <dragol...@gmail.com>
Date: Fri, 16 Nov 2012 15:53:38 -0800 (PST)
Local: Fri, Nov 16 2012 6:53 pm
Subject: pandas.pivot_table indexing problem/bug:

Hi pandas gurus,

I started using pandas hoping to get fast, large-scale pivoting to
convert long-form data coming out of a database into a nice table. I have
4 columns as keys and one numerical value to average across possibly
~2500 unique column names. The CSV file is ~3-5M rows. I have constructed
a similar test DataFrame below to show the error. Any help is
appreciated!

With NUM_ROWS=100:      It works :) But I have 2.5M rows.
With NUM_ROWS=1000:    IndexError: index out of range for array
With NUM_ROWS=10000:  ValueError: negative dimensions are not allowed

With NUM_ROWS=10000000 & pivot_table(rows=['A', 'B', 'C'], ...), it works!
(Maxed out at 24GB RAM :D)

Out:
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 381231 entries, (0, 0, -7) to (99, 299, 6)
Columns: 3000 entries, 0 to 2999
dtypes: float64(3000)

pandas.__version__ = '0.9.1rc1'
numpy.__version__ = '1.6.2'
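
As a rough sanity check on the memory use mentioned above, the sketch below computes the size of a dense float64 table with the reported shape (381231 rows by 3000 columns). It counts only the values array; the index, the input DataFrame, and any intermediate copies made during the pivot are not included, so the real peak would be higher.

```python
# Shape taken from the "Out:" summary in the post above.
n_rows = 381231
n_cols = 3000
BYTES_PER_FLOAT64 = 8

cells = n_rows * n_cols
values_bytes = cells * BYTES_PER_FLOAT64
values_gib = values_bytes / float(2 ** 30)

print("cells:", cells)
print("values array: %.2f GiB" % values_gib)
```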

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Generate Long File & Test Pivot
NUM_ROWS = 1000
df = pd.DataFrame({'A' : np.random.randint(100, size=NUM_ROWS), 'B' : np.random.randint(300, size=NUM_ROWS), 'C' : np.random.randint(-7, 7, size=NUM_ROWS), 'D' : np.random.randint(-19,19, size=NUM_ROWS),'E' : np.random.randint(3000, size=NUM_ROWS),'F' : np.random.randn(NUM_ROWS)})
df_pivoted = df.pivot_table(rows=['A', 'B', 'C', 'D'], cols='E', values='F')
df_pivoted

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-2-21cdba88317b> in <module>()
      2 NUM_ROWS = 1000
      3 df = pd.DataFrame({'A' : np.random.randint(100, size=NUM_ROWS), 'B' : np.random.randint(300, size=NUM_ROWS), 'C' : np.random.randint(-7, 7, size=NUM_ROWS), 'D' : np.random.randint(-19,19, size=NUM_ROWS),'E' : np.random.randint(3000, size=NUM_ROWS),'F' : np.random.randn(NUM_ROWS)})
----> 4 df_pivoted = df.pivot_table(rows=['A', 'B', 'C', 'D'], cols='E', values='F')
      5 df_pivoted

C:\Python27\lib\site-packages\pandas\tools\pivot.pyc in pivot_table(data, values, rows, cols, aggfunc, fill_value, margins)
    103                   for i in range(len(rows), len(keys))]
    104
--> 105     table = agged.unstack(to_unstack)
    106
    107     if isinstance(table, DataFrame):

C:\Python27\lib\site-packages\pandas\core\frame.pyc in unstack(self, level)
   3694         """
   3695         from pandas.core.reshape import unstack
-> 3696         return unstack(self, level)
   3697
   3698     #----------------------------------------------------------------------

C:\Python27\lib\site-packages\pandas\core\reshape.pyc in unstack(obj, level)
    357 def unstack(obj, level):
    358     if isinstance(level, (tuple, list)):
--> 359         return _unstack_multiple(obj, level)
    360
    361     if isinstance(obj, DataFrame):

C:\Python27\lib\site-packages\pandas\core\reshape.pyc in _unstack_multiple(data, clocs)
    260                           columns=data.columns)
    261
--> 262         unstacked = dummy.unstack('__placeholder__')
    263         if isinstance(unstacked, Series):
    264             unstcols = unstacked.index

C:\Python27\lib\site-packages\pandas\core\frame.pyc in unstack(self, level)
   3694         """
   3695         from pandas.core.reshape import unstack
-> 3696         return unstack(self, level)
   3697
   3698     #----------------------------------------------------------------------

C:\Python27\lib\site-packages\pandas\core\reshape.pyc in unstack(obj, level)
    361     if isinstance(obj, DataFrame):
    362         if isinstance(obj.index, MultiIndex):
--> 363             return _unstack_frame(obj, level)
    364         else:
    365             return obj.T.stack(dropna=False)

C:\Python27\lib\site-packages\pandas\core\reshape.pyc in _unstack_frame(obj, level)
    399     else:
    400         unstacker = _Unstacker(obj.values, obj.index, level=level,
--> 401                                value_columns=obj.columns)
    402         return unstacker.get_result()
    403

C:\Python27\lib\site-packages\pandas\core\reshape.pyc in __init__(self, values, index, level, value_columns)
     78
     79         self._make_sorted_values_labels()
---> 80         self._make_selectors()
     81
     82     def _make_sorted_values_labels(self):

C:\Python27\lib\site-packages\pandas\core\reshape.pyc in _make_selectors(self)
    117         selector = self.sorted_labels[-1] + stride * group_index
    118         mask = np.zeros(np.prod(self.full_shape), dtype=bool)
--> 119         mask.put(selector, True)
    120
    121         # compress labels

IndexError: index out of range for array


 
Gagi  
From: Gagi <dragol...@gmail.com>
Date: Fri, 16 Nov 2012 16:07:53 -0800 (PST)
Local: Fri, Nov 16 2012 7:07 pm
Subject: Re: pandas.pivot_table indexing problem/bug:

Here is the code not clipped:

import pandas as pd
import numpy as np

# Generate Long File & Test Pivot
NUM_ROWS = 1000000
df = pd.DataFrame({'A': np.random.randint(100, size=NUM_ROWS),
                   'B': np.random.randint(300, size=NUM_ROWS),
                   'C': np.random.randint(-7, 7, size=NUM_ROWS),
                   'D': np.random.randint(-19, 19, size=NUM_ROWS),
                   'E': np.random.randint(3000, size=NUM_ROWS),
                   'F': np.random.randn(NUM_ROWS)})

df_pivoted = df.pivot_table(rows=['A', 'B', 'C'], cols='E', values='F')
df_pivoted


 
Gagi  
From: Gagi <dragol...@gmail.com>
Date: Fri, 16 Nov 2012 16:13:19 -0800 (PST)
Local: Fri, Nov 16 2012 7:13 pm
Subject: Re: pandas.pivot_table indexing problem/bug:

And here is the code that fails. Note that the code below pivots on
combinations of 4 columns (A, B, C, D) and fails with only 1000 rows,
whereas the code above, which uses 3 row keys, works on 1M rows.

import pandas as pd
import numpy as np

# Generate Long File & Test Pivot
NUM_ROWS = 1000
df = pd.DataFrame({'A': np.random.randint(100, size=NUM_ROWS),
                   'B': np.random.randint(300, size=NUM_ROWS),
                   'C': np.random.randint(-7, 7, size=NUM_ROWS),
                   'D': np.random.randint(-19, 19, size=NUM_ROWS),
                   'E': np.random.randint(3000, size=NUM_ROWS),
                   'F': np.random.randn(NUM_ROWS)})

df_pivoted = df.pivot_table(rows=['A', 'B', 'C', 'D'], cols='E', values='F')
df_pivoted
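
One possible reading of why 3 row keys work and 4 fail is sketched below. It is an assumption that nothing in this thread confirms. The traceback shows a mask being allocated with `np.prod(self.full_shape)`, and if that product is computed in 32-bit integers (as NumPy's default integer was on Windows at the time), the dense key grid for 4 row keys plus the column key would overflow, while the 3-key grid would still fit. The level sizes used here are upper bounds read off the `randint` ranges, not the observed counts, and the helper names are hypothetical.

```python
import numpy as np

# Upper bounds on distinct values per key, read off the randint ranges in the
# test DataFrame. Assumption: the actual unstack code works with observed
# level sizes, which can be somewhat smaller.
LEVEL_SIZES = {'A': 100, 'B': 300, 'C': 14, 'D': 38, 'E': 3000}


def dense_cells(keys):
    """Exact cell count of the dense key grid, using Python integers."""
    total = 1
    for key in keys:
        total *= LEVEL_SIZES[key]
    return total


def int32_product(keys):
    """Same product forced into 32-bit integers, where overflow wraps silently."""
    sizes = np.array([LEVEL_SIZES[key] for key in keys], dtype=np.int32)
    return int(np.prod(sizes, dtype=np.int32))


INT32_MAX = np.iinfo(np.int32).max
three_keys = ['A', 'B', 'C', 'E']        # rows A, B, C and columns E
four_keys = ['A', 'B', 'C', 'D', 'E']    # rows A, B, C, D and columns E

for keys in (three_keys, four_keys):
    exact = dense_cells(keys)
    print(keys, "exact:", exact,
          "fits in int32:", exact <= INT32_MAX,
          "int32 product:", int32_product(keys))
```

If this reading is right, a wrapped product that happens to be positive would give a mask that is too small (an index error), and one that wraps to a negative number would give a negative array dimension, which would match the two errors reported for 1000 and 10000 rows.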


 
Wes McKinney  
From: Wes McKinney <w...@lambdafoundry.com>
Date: Fri, 16 Nov 2012 21:33:06 -0500
Local: Fri, Nov 16 2012 9:33 pm
Subject: Re: [pydata] Re: pandas.pivot_table indexing problem/bug:

Hi Gagi,

I'm fairly certain this is an issue in the unstack algorithm. I'm
surprised it never came up until now. Creating a GitHub issue; I will
address as soon as I can.

http://github.com/pydata/pandas/issues/2278

Thanks,
Wes


 
Gagi Drmanac  
From: Gagi Drmanac <dragol...@gmail.com>
Date: Sat, 17 Nov 2012 17:18:06 -0800
Local: Sat, Nov 17 2012 8:18 pm
Subject: Re: [pydata] Re: pandas.pivot_table indexing problem/bug:

Thanks Wes!

I'll gladly test it out once the issue has been tracked down.

Thanks,
-Gagi



 
Wes McKinney  
From: Wes McKinney <w...@lambdafoundry.com>
Date: Thu, 22 Nov 2012 18:09:22 -0500
Local: Thurs, Nov 22 2012 6:09 pm
Subject: Re: [pydata] Re: pandas.pivot_table indexing problem/bug:

Hi Gagi,

I fixed this issue today; unstack is also much faster now in a lot of
cases. Keep in mind that the pivot table will be of size N by K, where
N is the number of observed key tuples in the rows and K the number in
the columns. That could potentially be very large depending on your
data set.

- Wes
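
The N by K point can be checked before pivoting. The sketch below is only an illustration: `estimate_pivot_shape` is a hypothetical helper, not a pandas function, and it uses the current pandas keyword names (`index=` and `columns=`) rather than the `rows=` and `cols=` of the 0.9.x API used earlier in this thread. The byte count assumes a dense float64 result and ignores the index and any temporaries.

```python
import numpy as np
import pandas as pd


def estimate_pivot_shape(df, row_keys, col_key):
    """Return (N, K, bytes) for a dense float64 pivot of df.

    N is the number of distinct row-key tuples that actually occur,
    K the number of distinct values in the column key.
    """
    n = len(df[row_keys].drop_duplicates())
    k = df[col_key].nunique()
    return n, k, n * k * 8


# Same synthetic layout as the test DataFrame in this thread, with a fixed seed.
rng = np.random.RandomState(0)
NUM_ROWS = 1000
df = pd.DataFrame({'A': rng.randint(100, size=NUM_ROWS),
                   'B': rng.randint(300, size=NUM_ROWS),
                   'C': rng.randint(-7, 7, size=NUM_ROWS),
                   'D': rng.randint(-19, 19, size=NUM_ROWS),
                   'E': rng.randint(3000, size=NUM_ROWS),
                   'F': rng.randn(NUM_ROWS)})

row_keys = ['A', 'B', 'C', 'D']
n, k, nbytes = estimate_pivot_shape(df, row_keys, 'E')
print("estimated shape:", (n, k), "approx bytes:", nbytes)

# The default aggregation (mean) matches the averaging described in the
# original post.
pivoted = df.pivot_table(index=row_keys, columns='E', values='F')
print("actual shape:", pivoted.shape)
```

Running the estimate on the full data set first gives an idea of whether the result will fit in memory before the pivot itself is attempted.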


 
Gagi Drmanac  
From: Gagi Drmanac <dragol...@gmail.com>
Date: Thu, 22 Nov 2012 15:11:50 -0800
Local: Thurs, Nov 22 2012 6:11 pm
Subject: Re: [pydata] Re: pandas.pivot_table indexing problem/bug:
Thanks Wes,

I'll give it a try. Is the new code available in one of the nightly
build binaries?

Thanks,
-Gagi



 
Wes McKinney  
From: Wes McKinney <w...@lambdafoundry.com>
Date: Thu, 22 Nov 2012 18:23:39 -0500
Local: Thurs, Nov 22 2012 6:23 pm
Subject: Re: [pydata] Re: pandas.pivot_table indexing problem/bug:

(pls bottom post if you would!)

There appears to be an issue with the nightly binaries process; they
haven't updated for the last 10 days. I will have to look into it next
week. To get the fix earlier, you'd have to build from source.

- Wes


 
Gagi  
From: Gagi <dragol...@gmail.com>
Date: Tue, 27 Nov 2012 10:33:03 -0800 (PST)
Local: Tues, Nov 27 2012 1:33 pm
Subject: Re: [pydata] Re: pandas.pivot_table indexing problem/bug:

Thanks for looking into this. I'll check the binaries folder for the update.

-Gagi


 