
pickle.load() extremely slow performance

Jim Garrison

Mar 20, 2009, 6:25:37 PM
I'm converting a Perl system to Python, and have run into a severe
performance problem with pickle.

One facet of the system involves scanning and loading into memory a
couple of parallel directory trees containing on the order of 10^4 files. The
trees don't change during development/testing and the scan takes 30-40
seconds, so to save time I cache the loaded tree structure to disk, in
Perl with module Storable, and in Python with pickle.

In Perl, the save operation produces a file of about 3MB, and both
save and restore take a second or two. In Python, pickle.dump()
produces a similar-size file but takes 20 seconds, and pickle.load()
takes 45 seconds, which is actually LONGER than the time required to
scan the directory trees.

Is there anything I can do to speed up pickle.load() to get
performance comparable to Perl's Storable?
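
A minimal sketch of the cache-on-disk pattern described above, assuming the tree can be represented as a plain dict mapping each directory path (relative to the root) to its sorted file names. `scan_tree` and `load_tree_cached` are hypothetical names, and the cache is never invalidated, which is only reasonable because the trees don't change during development/testing.

```python
import os
import pickle


def scan_tree(root):
    """Hypothetical scanner: map each directory (relative to root) to its sorted file names."""
    tree = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        tree[os.path.relpath(dirpath, root)] = sorted(filenames)
    return tree


def load_tree_cached(root, cache_path):
    """Return the tree under root, reusing a pickle cache if one already exists.

    The cache is never invalidated: this assumes the trees do not change
    between runs, as in the development/testing setup described above.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    tree = scan_tree(root)
    with open(cache_path, "wb") as f:
        pickle.dump(tree, f, protocol=pickle.HIGHEST_PROTOCOL)
    return tree
```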

John Machin

Mar 20, 2009, 6:40:25 PM

Have you read this:
http://www.python.org/doc/2.6/library/pickle.html
?
Have you considered using cPickle instead of pickle?
Have you considered using *ickle.dump(..., protocol=-1) ?
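
A rough way to compare protocols on your own data, assuming a hypothetical `tree` of nested dicts standing in for the real directory structure. Passing `protocol=-1` selects `pickle.HIGHEST_PROTOCOL`. The binary protocols are generally more compact and faster to load than text protocol 0, but the actual numbers depend on the data and the Python version, so this measures rather than assumes.

```python
import pickle
import time

# Hypothetical stand-in for the cached tree: about 10^4 small nested records.
tree = {f"dir{i // 100}/file{i}.txt": {"size": i, "parts": ["a", "b", str(i)]}
        for i in range(10_000)}


def time_round_trip(obj, protocol):
    """Return (pickle size in bytes, dump seconds, load seconds) for one protocol."""
    start = time.perf_counter()
    data = pickle.dumps(obj, protocol=protocol)
    dumped = time.perf_counter()
    restored = pickle.loads(data)
    loaded = time.perf_counter()
    assert restored == obj  # sanity check: the round trip is lossless
    return len(data), dumped - start, loaded - dumped


for proto in (0, pickle.HIGHEST_PROTOCOL):
    size, t_dump, t_load = time_round_trip(tree, proto)
    print(f"protocol {proto}: {size} bytes, dump {t_dump:.3f}s, load {t_load:.3f}s")
```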

Jim Garrison

Mar 20, 2009, 8:26:22 PM

I'm using Python 3 on Windows (Server 2003). According to the docs

"The pickle module has an transparent optimizer (_pickle) written
in C. It is used whenever available. Otherwise the pure Python
implementation is used."

How can I tell if _pickle is being used?

Jim Garrison

Mar 20, 2009, 8:39:41 PM
Jim Garrison wrote:
> John Machin wrote:
[snip]

>> Have you considered using cPickle instead of pickle?
>> Have you considered using *ickle.dump(..., protocol=-1) ?
>
> I'm using Python 3 on Windows (Server 2003). According to the docs
>
> "The pickle module has an transparent optimizer (_pickle) written
> in C. It is used whenever available. Otherwise the pure Python
> implementation is used."
>
> How can I tell if _pickle is being used?

Answered my own question

>>> import _pickle
>>> dir (_pickle)
['PickleError', 'Pickler', 'PicklingError', 'Unpickler',
'UnpicklingError', '__doc__', '__name__', '__package__']
>>> dir(_pickle.Pickler)
['__class__', '__delattr__', '__doc__', '__eq__', '__format__',
'__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
'__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'bin', 'clear_memo', 'dump', 'fast', 'memo', 'persistent_id']

_pickle seems to be there. Also, if I step into the load
call (pydev under Eclipse) it steps into pickle.load() but
won't step into the call to the Unpickler constructor. I
assume that means it's calling out to the C implementation.
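
Stepping in a debugger is only indirect evidence. In current CPython 3 releases, `pickle.py` keeps its pure-Python classes under the private names `_Pickler` and `_Unpickler` and rebinds the public `Pickler`/`Unpickler` to the `_pickle` versions when that extension imports successfully. The check below relies on those private names, which are an implementation detail and may not match the 3.0 release discussed in this thread.

```python
import pickle


def uses_c_accelerator():
    """True if pickle's public Pickler/Unpickler are the C classes from _pickle."""
    # pickle._Pickler / pickle._Unpickler are the pure-Python fallbacks
    # (private names, CPython implementation detail).
    return (pickle.Pickler is not pickle._Pickler
            and pickle.Unpickler is not pickle._Unpickler)


print("C accelerator in use:", uses_c_accelerator())
print("Unpickler comes from module:", pickle.Unpickler.__module__)
```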

Carl Banks

Mar 20, 2009, 10:30:07 PM

The slow performance is most likely due to the poor performance of
Python 3's IO, which is caused by (among other things) bad buffering
strategy. It's a Python 3 growing pain, and is being rewritten.
Python 3.1 should be much faster but it's not been released yet.

As a workaround, mmap the file instead. For example (untested):


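import mmap
import pickle

f = open('dirlisting.dat', 'rb')
try:
    # Find the file size, then rewind.
    f.seek(0, 2)
    size = f.tell()
    f.seek(0, 0)
    # Map the whole file read-only; pickle.loads accepts any bytes-like
    # object, so it can read straight from the mapping.
    m = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ)
    try:
        dir_listing = pickle.loads(m)
    finally:
        m.close()
finally:
    f.close()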


Pickling the output left as an exercise.


Carl Banks
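
Carl leaves the dump side as an exercise. One possible version, assuming (as he suggests) that the slowdown comes from many small I/O calls, builds the whole pickle in memory with `pickle.dumps` and writes it in a single call. The load half is a condensed form of his mmap example, using mmap's context-manager support (Python 3.2+) and a length of 0, which maps the whole (non-empty) file. `save_listing` and `load_listing` are hypothetical names.

```python
import mmap
import pickle


def save_listing(obj, path):
    """Serialize in memory, then hand the file layer one large write."""
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, "wb") as f:
        f.write(data)


def load_listing(path):
    """Map the file read-only and unpickle directly from the mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; mmap objects support the buffer
        # protocol, so pickle.loads can read from them without a copy to bytes.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return pickle.loads(m)
```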

bearoph...@lycos.com

Mar 20, 2009, 10:41:21 PM
Carl Banks:

> The slow performance is most likely due to the poor performance of
> Python 3's IO, which is caused by [...]

My suggestion for the Original Poster is just to try using Python 2.x,
if possible :-)

Bye,
bearophile

Terry Reedy

Mar 21, 2009, 12:21:20 AM
to pytho...@python.org
Carl Banks wrote:
>
> The slow performance is most likely due to the poor performance of
> Python 3's IO, which is caused by (among other things) bad buffering
> strategy. It's a Python 3 growing pain, and is being rewritten.
> Python 3.1 should be much faster but it's not been released yet.

3.1a1 is out and I believe it has the io improvements.

Benjamin Peterson

Mar 21, 2009, 1:08:49 PM
to pytho...@python.org
Terry Reedy <tjreedy <at> udel.edu> writes:
>
> 3.1a1 is out and I believe it has the io improvements.

Massive ones, too. It'd be interesting to see your results on the alpha.


Jim Garrison

Mar 23, 2009, 11:57:54 AM

On 3.1a1 the unpickle step takes 2.4 seconds, an 1875% improvement.

Thanks.

Jean-Paul Calderone

Mar 23, 2009, 12:40:17 PM
to pytho...@python.org

Surely you mean a 94.7% improvement?

Jean-Paul

Steve Holden

Mar 23, 2009, 1:31:44 PM
to pytho...@python.org
Jean-Paul Calderone wrote:
> On Mon, 23 Mar 2009 10:57:54 -0500, Jim Garrison <j...@acm.org> wrote:
> Surely you mean a 94.7% improvement?
>
Well, since it's now running almost twenty times faster, the speed has
increased by 1875%. Not sure what the mathematics of improvement are ...

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Want to know? Come to PyCon - soon! http://us.pycon.org/

Jim Garrison

Mar 23, 2009, 5:21:05 PM
Steve Holden wrote:
> Jean-Paul Calderone wrote:
>> On Mon, 23 Mar 2009 10:57:54 -0500, Jim Garrison <j...@acm.org> wrote:
>>> Benjamin Peterson wrote:
>>>> Terry Reedy <tjreedy <at> udel.edu> writes:
>>>>> 3.1a1 is out and I believe it has the io improvements.
>>>> Massive ones, too. It'd be interesting to see your results on the alpha.
>>> On 3.1a1 the unpickle step takes 2.4 seconds, an 1875% improvement.
>> Surely you mean a 94.7% improvement?
>>
> Well, since it's now running almost twenty times faster, the speed has
> increased by 1875%. Not sure what the mathematics of improvement are ...
>
> regards
> Steve
The arithmetic depends on whether you're looking at time or
velocity, which are inverses of each other.

If you double your velocity (100% increase) the time required goes
down by 50%. A 1000% increase in velocity results in a 90% decrease
in time... etc. I guess I equate "performance" to velocity.
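
The three readings in this subthread can be computed side by side. Using the 45 s and 2.4 s load times reported above, the ratio is 18.75x, elapsed time drops by about 94.7%, and speed (the inverse of time) rises by 1775%; 1875% is the new speed expressed as a percentage of the old one rather than the size of the increase. `speedup_stats` is a hypothetical helper.

```python
def speedup_stats(old_seconds, new_seconds):
    """Express one timing change as a ratio, a time reduction, and a speed increase."""
    ratio = old_seconds / new_seconds          # how many times faster
    time_cut = 1 - new_seconds / old_seconds   # fractional decrease in elapsed time
    speed_gain = ratio - 1                     # fractional increase in speed (1/time)
    return ratio, time_cut, speed_gain


ratio, time_cut, speed_gain = speedup_stats(45.0, 2.4)
print(f"{ratio:.2f}x faster")
print(f"time decreased by {time_cut:.1%}")
print(f"speed increased by {speed_gain:.0%}")
```

Doubling the speed, `speedup_stats(2.0, 1.0)`, gives the 100% increase and 50% time reduction described in the post.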

0 new messages