Let me share one more happy bit :)


Valery Khamenya

Dec 17, 2009, 11:44:48 AM
to Unladen Swallow
Hi,

Briefly: u-s (Unladen Swallow) worked where CPython got hopelessly stuck.

Details:

My use case was as simple as:

1. unpickle a large dictionary 
2. make a minor change 
3. cPickle the dictionary back to disk.
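The three steps above amount to a load/modify/dump round trip. A minimal sketch, written for Python 3 (where `pickle` automatically uses its C implementation, the counterpart of Python 2's `cPickle`); the function name `update_pickled_dict` and its parameters are hypothetical, not taken from Valery's code:

```python
import pickle


def update_pickled_dict(path, key, new_value):
    """Load a pickled dict from path, change one entry, write it back."""
    # 1. Unpickle the large dictionary.
    with open(path, "rb") as f:
        d = pickle.load(f)

    # 2. Make a minor change.
    d[key] = new_value

    # 3. Pickle the dictionary back to disk. A binary protocol is requested
    #    explicitly; Python 2's default (protocol 0) is the ASCII format.
    with open(path, "wb") as f:
        pickle.dump(d, f, protocol=pickle.HIGHEST_PROTOCOL)
    return d
```

Overwriting `path` in place means a crash during step 3 can leave a truncated file; writing to a temporary file and renaming it afterwards avoids that.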

where "large dictionary" means:

  60K elements, 
  each element's value is on average 4K ASCII chars,
  the keys are up to 20 chars.

That is, the dict's data size is only about 250 MB.
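A quick check of the 250 MB figure, assuming one byte per ASCII character and counting only the raw characters (Python's per-object overhead in memory comes on top of this):

```latex
60\,000 \times 4\,000\ \text{B} = 2.4 \times 10^{8}\ \text{B} \approx 240\ \text{MB},
\qquad
60\,000 \times 20\ \text{B} = 1.2 \times 10^{6}\ \text{B} = 1.2\ \text{MB}
```

If "4K" means 4,096 characters, the values come to 245,760,000 B, about 246 MB. Either way the values dominate and the total is in line with "about 250 MB".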

I use 64-bit Ubuntu with 5 GB of RAM. When CPython gets to step 3 (pickling), it hits the RAM limit, and during pickling it uses 1 GB of swap.

Here CPython has reached both the ceiling of my RAM and my patience :)

The swapping killed the performance completely. After half an *hour* of waiting I killed the process and tried u-s.

IT WORKED OUT A GREAT DEAL BETTER.

The u-s process used only about 50% of RAM and finished the whole job in half a *minute*.
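To put a number on "about 50% RAM" for each interpreter, one option on Unix is to read the process's peak resident set size from the standard `resource` module at the end of the run. A small sketch (the helper name `peak_rss_mib` is hypothetical):

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of the current process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


# Call at the end of the script, after the final pickle.dump():
print("peak RSS: %.0f MiB" % peak_rss_mib())
```

This counts resident memory only, so once a process starts swapping heavily it understates the real footprint; the swap figures quoted in the thread come from watching the system instead.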

u-s team, thanks!!

best regards
--
Valery

sk...@pobox.com

Dec 17, 2009, 11:52:03 AM
to Valery Khamenya, Unladen Swallow

Valery> briefly: u-s worked out where CPython stuck incredibly.

Valery> Details:

Valery> My use case was as simple as:

Valery> 1. unpickle a large dictionary
Valery> 2. make a minor change
Valery> 3. cPickle the dictionary back to disk.


Valery> where "large dictionary" means:

Valery> 60K elements,
Valery> each element's value is in average 4K ASCII chars,
Valery> the keys are up to 20 chars.

Valery> That is, dict's data size is about 250Mb only.

Valery,

Can you provide a simple test case to support a report for the Python bug
tracker?

Thx,

--
Skip Montanaro - sk...@pobox.com - http://www.smontanaro.net/

Alex Gaynor

Dec 17, 2009, 11:53:55 AM
to sk...@pobox.com, Valery Khamenya, Unladen Swallow

It may be worth benchmarking your script against the Python 2.7
alpha that came out. It should include a number of backported pickle
improvements that, it seems, would help you tremendously.

Alex

--
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me
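Alex's suggestion amounts to running the same workload under several interpreters and comparing. A minimal timing helper for that, assuming Python 3.3+ for `time.perf_counter` (on Python 2, `time.time` would have to stand in); the name `time_round_trip` is hypothetical:

```python
import pickle
import sys
import time


def time_round_trip(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Return (dump_seconds, load_seconds, pickled_bytes) for obj."""
    t0 = time.perf_counter()
    data = pickle.dumps(obj, protocol)
    t1 = time.perf_counter()
    pickle.loads(data)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, len(data)


if __name__ == "__main__":
    # Distinct values per key: pickle memoizes repeated objects, so a dict
    # whose values are all the same string would pickle unrealistically small.
    sample = {"key%05d" % i: ("ACGT%05d" % i) * 400 for i in range(1000)}
    dump_s, load_s, size = time_round_trip(sample, protocol=2)
    print(sys.version.split()[0],
          "dump %.3fs, load %.3fs, %d bytes" % (dump_s, load_s, size))
```

The demo pins `protocol=2`, the highest protocol Python 2 supports, so every interpreter writes the same format and the timings stay comparable.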

Collin Winter

Dec 17, 2009, 12:05:56 PM
to Alex Gaynor, sk...@pobox.com, Valery Khamenya, Unladen Swallow
On Thu, Dec 17, 2009 at 8:53 AM, Alex Gaynor <alex....@gmail.com> wrote:
> It may be worth it to benchmark your script against the python 2.7
> alpha that came out.  That should include a number of backported
> improvements to pickle which it seems would help you tremendously.

Some of those patches have yet to land, but yes: cPickle in Unladen
Swallow should easily blow the doors off the CPython 2.x version.
There will be a renewed push to land all the cPickle patches in the
upcoming merge process.

I'm glad the work we put into cPickle is paying off :) That work was
originally done for YouTube.

Collin

Valery Khamenya

Dec 17, 2009, 1:36:06 PM
to sk...@pobox.com, Unladen Swallow
Hi Skip,

> Can you provide a simple test case to support a report for the Python bug
> tracker?

Sure. A simplified version of my code is attached.

Here is the CPython run (I killed the test after 10 minutes and 3 GB of swap):
~/wrk/unladen-trunk$ date;python2.6 unladen-bmarks/performance/bm_huge_pickle.py;date
Thu Dec 17 19:00:44 CET 2009
dict is created
dict is pickled
dict is unpickled
dict is changed.
^C^C^C
Killed
Thu Dec 17 19:10:37 CET 2009

And here is the u-s run:
~/wrk/unladen-trunk$ date;./python unladen-bmarks/performance/bm_huge_pickle.py;date
Thu Dec 17 18:59:31 CET 2009
dict is created
dict is pickled
dict is unpickled
dict is changed.
dict is pickled
Thu Dec 17 18:59:52 CET 2009

20 sec, no swap.

No need for any fine-grained benchmarking ;)
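The attachment itself is not reproduced in the thread. Going only by the progress messages printed above, the benchmark appears to create a dict, pickle it, unpickle it, split every value on 'CG', and pickle it again. A hypothetical reconstruction along those lines, for Python 3.6+ (for `Random.choices`); the random `ACGT` data, the key format and pickling to memory rather than disk are all assumptions, not the contents of `bm_huge_pickle.py`:

```python
import pickle
import random


def make_dict(n_elements, value_len=4000, seed=0):
    """Build n_elements random 'ACGT' strings of value_len chars each.

    Assumption: DNA-like data, so that split('CG') below has work to do.
    """
    rng = random.Random(seed)
    return {"key%08d" % i: "".join(rng.choices("ACGT", k=value_len))
            for i in range(n_elements)}


def run(n_elements=60000):
    d = make_dict(n_elements)
    print("dict is created")
    data = pickle.dumps(d, pickle.HIGHEST_PROTOCOL)
    print("dict is pickled")
    d = pickle.loads(data)
    print("dict is unpickled")
    # The "minor change": every value becomes a list of substrings.
    for k in list(d):
        d[k] = d[k].split("CG")
    print("dict is changed.")
    data = pickle.dumps(d, pickle.HIGHEST_PROTOCOL)
    print("dict is pickled")
    return d, data
```

Try `run()` with a small `n_elements` first: at the default size the raw characters alone are roughly 240 MB, and the split step turns each 4K string into many small string objects.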

REMARKS:

1. If one changes the line
d[k] = d[k].split('CG')
to
d[k] = tuple(d[k].split('CG'))
then CPython isn't that crazy anymore. However, if I made such a fix in
my app, I'd go crazy ;)
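The two variants of the change step differ only in the container stored per key. The thread does not establish why the tuple version spared CPython 2.6; one visible difference is that a tuple stores its items inline with no spare capacity, so it is a smaller object than the list `split()` returns. A sketch showing both variants side by side (the `change_values` name and its `as_tuple` flag are hypothetical):

```python
import sys


def change_values(d, as_tuple=False):
    """Split every value on 'CG', in place.

    as_tuple=False mirrors  d[k] = d[k].split('CG')
    as_tuple=True  mirrors  d[k] = tuple(d[k].split('CG'))
    """
    for k in list(d):
        parts = d[k].split("CG")
        d[k] = tuple(parts) if as_tuple else parts
    return d


parts = "ACGTTCGAACG".split("CG")
# Size of the container object alone, in bytes (the strings are not counted).
print("list:", sys.getsizeof(parts), "tuple:", sys.getsizeof(tuple(parts)))
```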

2. If you have something other than a 64-bit box with 5 GB of RAM,
just change the n_elements parameter.
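For a starting value rather than trial and error, one rough approach is to scale the original 60K elements by your RAM relative to the 5 GB box, assuming memory use grows about linearly with n_elements. A sketch (both helper names are hypothetical; `os.sysconf` with these names is available on Linux):

```python
import os


def total_ram_gib():
    """Physical memory of this machine in GiB (Unix, via sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024 ** 3


def scaled_n_elements(ram_gib, reference_n=60000, reference_ram_gib=5):
    """Scale the thread's 60K elements / 5 GB setup to ram_gib of RAM.

    Assumes memory use is roughly proportional to n_elements.
    """
    return int(reference_n * ram_gib / reference_ram_gib)
```

For example, `scaled_n_elements(total_ram_gib())` gives a first guess for the current machine, which can then be adjusted by hand.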

best regards
--
Valery

bm_huge_pickle.py