Valery> Details:
Valery> My use case was as simple as:
Valery> 1. unpickle a large dictionary
Valery> 2. make a minor change
Valery> 3. cPickle the dictionary back to disk.
Valery> where "large dictionary" means:
Valery> 60K elements,
Valery> each element's value is on average 4K ASCII chars,
Valery> the keys are up to 20 chars.
Valery> That is, the dict's data size is only about 250Mb.
Valery,
Can you provide a simple test case to support a report for the Python bug
tracker?
Thx,
--
Skip Montanaro - sk...@pobox.com - http://www.smontanaro.net/
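The three-step workflow quoted above can be sketched in Python 3, where the `pickle` module automatically uses its C implementation (the successor to Python 2's `cPickle`). This is not Valery's code: the helper name `update_pickled_dict`, the file name and the sample data are hypothetical, and the sizes are scaled down so the sketch runs quickly.

```python
import os
import pickle
import tempfile


def update_pickled_dict(path, key, new_value):
    """Hypothetical helper: load a pickled dict, change one entry, write it back."""
    # 1. Unpickle the (possibly large) dictionary.
    with open(path, "rb") as f:
        d = pickle.load(f)
    # 2. Make a minor change.
    d[key] = new_value
    # 3. Pickle the dictionary back to disk with the newest available protocol.
    with open(path, "wb") as f:
        pickle.dump(d, f, protocol=pickle.HIGHEST_PROTOCOL)
    return d


if __name__ == "__main__":
    # Scaled-down stand-in for the real data (60K keys, ~4K ASCII chars per value).
    n_elements, value_len = 1000, 4000
    data = {"key%06d" % i: "ACGT" * (value_len // 4) for i in range(n_elements)}
    path = os.path.join(tempfile.mkdtemp(), "big_dict.pickle")
    with open(path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    update_pickled_dict(path, "key000000", "changed")
    with open(path, "rb") as f:
        print(pickle.load(f)["key000000"])  # prints "changed"
```

At the quoted sizes the raw string data alone is roughly 60,000 × 4,000 bytes ≈ 240 MB, which matches the "about 250Mb" estimate.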
It may be worth benchmarking your script against the Python 2.7
alpha that just came out. It should include a number of backported
improvements to pickle which seem likely to help you tremendously.
Alex
--
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me
Some of those patches have yet to land, but yes: cPickle in Unladen
Swallow should easily blow the doors off the CPython 2.x version.
There will be a renewed push to land all the cPickle patches in the
upcoming merge process.
I'm glad the work we put into cPickle is paying off :) That work was
originally done for YouTube.
Collin
> Can you provide a simple test case to support a report for the Python bug
> tracker?
Sure. A simplified version of my code is attached.
Here is the CPython run (I killed the test after 10 minutes and 3Gb of swap):
~/wrk/unladen-trunk$ date;python2.6 unladen-bmarks/performance/bm_huge_pickle.py;date
Thu Dec 17 19:00:44 CET 2009
dict is created
dict is pickled
dict is unpickled
dict is changed.
^C^C^C
Killed
Thu Dec 17 19:10:37 CET 2009
Here is the Unladen Swallow (u-s) run:
~/wrk/unladen-trunk$ date;./python unladen-bmarks/performance/bm_huge_pickle.py;date
Thu Dec 17 18:59:31 CET 2009
dict is created
dict is pickled
dict is unpickled
dict is changed.
dict is pickled
Thu Dec 17 18:59:52 CET 2009
20 sec, no swap.
No need for any fine-grained benchmarking ;)
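The attached script itself is not reproduced in the thread. As a rough illustration only, here is a hypothetical Python 3 sketch that walks through the same steps the runs above print (create, pickle, unpickle, change, pickle again). Only `n_elements` and the `split('CG')` change come from the thread; the function name `run`, the value contents and the elapsed-time line are assumptions. Note that in the CPython 2.6 run it was the final pickling step that never finished. Start with a small `n_elements` to avoid swapping.

```python
import pickle
import time


def run(n_elements=60000, value_len=4000):
    """Hypothetical reconstruction of a bm_huge_pickle-style benchmark."""
    start = time.perf_counter()
    # Keys of 18 chars, values of value_len ASCII chars containing many 'CG' substrings.
    d = {"key%015d" % i: "ACGT" * (value_len // 4) for i in range(n_elements)}
    print("dict is created")
    data = pickle.dumps(d, protocol=pickle.HIGHEST_PROTOCOL)
    print("dict is pickled")
    d = pickle.loads(data)
    print("dict is unpickled")
    # The "minor change": each string value becomes a list of substrings.
    # Reassigning existing keys while iterating is safe because the size never changes.
    for k in d:
        d[k] = d[k].split('CG')
    print("dict is changed.")
    data = pickle.dumps(d, protocol=pickle.HIGHEST_PROTOCOL)
    print("dict is pickled")
    print("elapsed: %.1f s" % (time.perf_counter() - start))
    return d


if __name__ == "__main__":
    run(n_elements=1000)
```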
REMARKS:
1. If one changes the line
d[k] = d[k].split('CG')
to
d[k] = tuple(d[k].split('CG'))
then CPython is no longer that crazy. However, if I made such a fix in
my app, I would go crazy ;)
2. If you have something other than a 64-bit box with 5Gb of RAM,
just change the n_elements parameter.
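Remark 1 can be tried side by side with a small harness. The hypothetical `change_and_repickle` function below applies either the original `d[k].split('CG')` change or the `tuple(...)` workaround and times only the re-pickling step. Unlike the thread's code it builds a new dict instead of modifying values in place, and it targets Python 3; no particular timings are implied, since the slowdown was reported against CPython 2.6 and results depend on the interpreter version.

```python
import pickle
import time


def change_and_repickle(d, as_tuple):
    """Hypothetical helper: split every value on 'CG', then time pickling the result."""
    if as_tuple:
        changed = {k: tuple(v.split('CG')) for k, v in d.items()}  # workaround from remark 1
    else:
        changed = {k: v.split('CG') for k, v in d.items()}  # original change
    start = time.perf_counter()
    data = pickle.dumps(changed, protocol=pickle.HIGHEST_PROTOCOL)
    return changed, len(data), time.perf_counter() - start


if __name__ == "__main__":
    n_elements = 2000  # shrink or grow this to fit the available RAM
    d = {"key%015d" % i: "ACGT" * 1000 for i in range(n_elements)}
    for as_tuple in (False, True):
        _, size, secs = change_and_repickle(d, as_tuple)
        print("as_tuple=%s: %d bytes in %.3f s" % (as_tuple, size, secs))
```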
best regards
--
Valery