Segmenting a pickle stream without unpickling

Boris Borcic

May 19, 2006, 4:44:03 AM

Assuming that the items of my_stream share no content (they are
dumps of db cursor fetches), is there a simple way to do the
equivalent of

def pickles(my_stream) :
    from cPickle import load,dumps
    while 1 :
        yield dumps(load(my_stream))

without the overhead associated with unpickling objects
just to pickle them again ?

TIA, Boris Borcic

Paul Rubin

May 19, 2006, 5:04:55 AM

Boris Borcic <bbo...@gmail.com> writes:
> def pickles(my_stream) :
>     from cPickle import load,dumps
>     while 1 :
>         yield dumps(load(my_stream))
>
> without the overhead associated with unpickling objects
> just to pickle them again ?

I think you'd have to write something special. The unpickler parses
as it goes along, and all the dispatch actions build up objects.
You'd have to write a set of actions that just read past the
representations. I think there's no way to know where an object ends
without parsing it, including parsing any objects nested inside it.
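
[Editor's illustration, not part of the original post.] A minimal sketch of why the end of a pickle cannot be found by scanning for the STOP byte: the same byte value can occur inside string payloads, so only an opcode-aware walk finds the real boundary. It assumes Python 3 and the standard pickletools.genops() iterator; pickle_length is a hypothetical helper name introduced here.

```python
import pickle
import pickletools

# Two pickles back to back; the first one contains a '.' inside its payload.
data = pickle.dumps("a.b", protocol=1) + pickle.dumps([3, 4], protocol=1)

# Counting '.' bytes overestimates the number of pickles, because the
# STOP opcode is also the byte '.'.
dot_count = data.count(b".")


def pickle_length(buf):
    """Return the length in bytes of the first pickle in buf.

    Hypothetical helper: walks the opcodes with pickletools.genops(),
    which parses arguments but never builds the pickled objects.
    """
    for opcode, arg, pos in pickletools.genops(buf):
        if opcode.name == "STOP":
            return pos + 1  # STOP is a single byte at offset pos
    raise ValueError("no STOP opcode found")


first_len = pickle_length(data)
first, rest = data[:first_len], data[first_len:]
```

Here dot_count is larger than the number of pickles, while first and rest are each a complete pickle.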

Tim Peters

May 19, 2006, 3:51:28 PM

to pytho...@python.org

[Boris Borcic]

cPickle (but not pickle.py) Unpickler objects have a barely documented
noload() method. This "acts like" load(), except doesn't import
modules or construct objects of user-defined classes. The return
value of noload() is undocumented and usually useless. ZODB uses it a
lot ;-)

Anyway, that can go much faster than load(), and works even if the
classes and modules referenced by pickles aren't available in the
unpickling environment. It doesn't return the individual pickle
strings, but they're easy to get at by paying attention to the file
position between noload() calls. For example,

import cPickle as pickle
import os

# Build a pickle file with 4 pickles.

PICKLEFILE = "temp.pck"

class C:
    pass

f = open(PICKLEFILE, "wb")
p = pickle.Pickler(f, 1)

p.dump(2)
p.dump([3, 4])
p.dump(C())
p.dump("all done")

f.close()

# Now use noload() to extract the 4 pickle
# strings in that file.

f = open(PICKLEFILE, "rb")
limit = os.path.getsize(PICKLEFILE)
u = pickle.Unpickler(f)
pickles = []
pos = 0
while pos < limit:
    u.noload()
    thispos = f.tell()
    f.seek(pos)
    pickles.append(f.read(thispos - pos))
    pos = thispos

from pprint import pprint
pprint(pickles)

That prints a list containing the 4 pickle strings:

['K\x02.',
 ']q\x01(K\x03K\x04e.',
 '(c__main__\nC\nq\x02o}q\x03b.',
 'U\x08all doneq\x04.']

You could do much the same by calling pickletools.dis() and ignoring
its output, but that's likely to be slower.
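
[Editor's illustration, not part of the original post.] The pickletools route mentioned above can be sketched as follows. This assumes Python 3, where cPickle no longer exists, and a seekable binary stream. It uses pickletools.genops(), the opcode iterator underneath dis(), rather than dis() itself, so there is no output to discard. split_pickles is a hypothetical name, and since genops() is pure Python it may well be slow on large streams.

```python
import io
import pickle
import pickletools


def split_pickles(stream):
    """Yield each pickle in a seekable binary stream as raw bytes."""
    while True:
        start = stream.tell()
        if not stream.read(1):  # end of stream: no further pickles
            return
        stream.seek(start)
        # genops() parses opcodes without importing modules or building
        # objects, and stops right after the current pickle's STOP opcode.
        for _ in pickletools.genops(stream):
            pass
        end = stream.tell()
        stream.seek(start)
        yield stream.read(end - start)


# Usage: write three pickles into one buffer, then recover them as bytes.
buf = io.BytesIO()
objects = [2, [3, 4], "all done"]
for obj in objects:
    pickle.dump(obj, buf, protocol=1)
buf.seek(0)

segments = list(split_pickles(buf))
```

As with the noload() approach, the segments are found by watching the file position between parses. Each element of segments can be passed to pickle.loads() on its own.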
