I started using pandas hoping to get fast, large-scale pivoting to convert long-form data coming out of a database into a nice table. I have 4 key columns and one numerical value to average across possibly ~2500 unique column names. The CSV file has ~3-5M rows. I have constructed a similar test DataFrame below to show the error. Any help is appreciated!
With NUM_ROWS=100: It works :) But I have 2.5M rows.
With NUM_ROWS=1000: IndexError: index out of range for array
With NUM_ROWS=10000: ValueError: negative dimensions are not allowed
With NUM_ROWS=10000000 and pivot_table(rows=['A', 'B', 'C'], ...), it works! (Maxed out at 24 GB of RAM :D)
And here is the code that fails. Note that in the code below I am pivoting on combinations of 4 columns (A, B, C, D); this fails with only 1000 rows, whereas the code above works on 1M rows.
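The code from the original post is not reproduced in this copy of the thread. As a rough stand-in, here is a minimal sketch of the kind of test described: four key columns A to D and one numeric column to average. The value column name "E", the key cardinalities, and the split of keys between rows and columns are all assumptions, and the call uses the current pandas keywords index=/columns= rather than the rows= keyword quoted above.

```python
import numpy as np
import pandas as pd

NUM_ROWS = 1000  # the row count at which the report above hit IndexError

rng = np.random.default_rng(0)

# Long-form data: four key columns and one numeric value column.
# The cardinalities and the name "E" are illustrative assumptions.
df = pd.DataFrame({
    "A": rng.integers(0, 100, NUM_ROWS),
    "B": rng.integers(0, 100, NUM_ROWS),
    "C": rng.integers(0, 100, NUM_ROWS),
    "D": rng.integers(0, 2500, NUM_ROWS),
    "E": rng.standard_normal(NUM_ROWS),
})

# Average E for each observed (A, B, C) row key,
# with one output column per observed value of D.
table = pd.pivot_table(df, values="E", index=["A", "B", "C"],
                       columns="D", aggfunc="mean")
print(table.shape)
```

Raising NUM_ROWS grows both dimensions of the result, so memory use climbs much faster than the input size.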
On Fri, Nov 16, 2012 at 7:13 PM, Gagi <dragol...@gmail.com> wrote:
> And Here is the code that fails. Note that in the code below I am pivoting
> on combinations of 4 Columns A, B, C, D, and this fails for only 1000 rows,
> but the above code works on 1M rows.
I'm fairly certain this is an issue in the unstack algorithm. I'm
surprised it never came up until now. Creating a GitHub issue; I will
address as soon as I can.
On Sat, Nov 17, 2012 at 8:18 PM, Gagi Drmanac <dragol...@gmail.com> wrote:
> Thanks Wes!
> I'll gladly test it out once the issue has been tracked down.
> Thanks,
> -Gagi
I fixed this issue today; unstack is also much faster now in a lot of
cases. Keep in mind that the pivot table will be of size N by K, where
N is the number of observed key-tuples in the rows and K the number in
the columns. That could potentially be very large depending on your
data set.
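To make the N and K point concrete, here is a small sketch that counts the observed key tuples before pivoting and estimates the memory of the dense result. The helper name estimate_pivot_size is hypothetical, and the 8 bytes per cell assumes float64 values.

```python
import pandas as pd

def estimate_pivot_size(df, row_keys, col_keys, itemsize=8):
    """Return (N, K, bytes) for a dense pivot of df.

    N and K are the numbers of observed key tuples in the row and
    column keys; itemsize=8 assumes float64 cells.
    """
    n_rows = df.groupby(row_keys).ngroups
    n_cols = df.groupby(col_keys).ngroups
    return n_rows, n_cols, n_rows * n_cols * itemsize

# Tiny illustrative frame: 4 distinct (A, B) tuples and 3 distinct D values.
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 3],
    "B": ["x", "x", "y", "z", "z"],
    "D": ["p", "q", "p", "r", "q"],
    "E": [0.1, 0.2, 0.3, 0.4, 0.5],
})
n, k, nbytes = estimate_pivot_size(df, ["A", "B"], ["D"])
print(n, k, nbytes)  # 4 3 96
```

As a worst case for the numbers in this thread, 2.5M distinct row tuples against 2500 columns would come to 2.5e6 * 2500 * 8 bytes, roughly 50 GB, if every cell were stored as a float64.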
On Thu, Nov 22, 2012 at 6:11 PM, Gagi Drmanac <dragol...@gmail.com> wrote:
> Thanks Wes,
> I'll give it a try. Is the new code available in one of the nightly
> build binaries?
> Thanks,
> -Gagi
(pls bottom post if you would!)
There appears to be an issue with the nightly binaries process and
they haven't updated for the last 10 days. Will have to look into it
next week-- you'd have to build from source to get it earlier.
On Thursday, November 22, 2012 3:23:40 PM UTC-8, Wes McKinney wrote:
> (pls bottom post if you would!)
> There appears to be an issue with the nightly binaries process and
> they haven't updated for the last 10 days. Will have to look into it
> next week-- you'd have to build from source to get it earlier.
> - Wes
Thanks for looking into this. I'll check the binaries folder for the update.