A bug in cPickle?

Victor Kryukov

May 16, 2007, 1:06:20 PM
Hello list,

The following behavior is completely unexpected. Is it a bug or a
by-design feature?

Regards,
Victor.

-----------------

from pickle import dumps
from cPickle import dumps as cdumps

print dumps('1001799')==dumps(str(1001799))
print cdumps('1001799')==cdumps(str(1001799))

>>>>output:>>>>
True
False

vicbook:~ victor$ python
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
vicbook:~ victor$ uname -a
Darwin vicbook 8.9.1 Darwin Kernel Version 8.9.1: Thu Feb 22 20:55:00
PST 2007; root:xnu-792.18.15~1/RELEASE_I386 i386 i386
vicbook:~ victor$

Chris Cioffi

May 16, 2007, 1:33:08 PM
to Victor Kryukov, pytho...@python.org
On 16 May 2007 10:06:20 -0700, Victor Kryukov <victor....@gmail.com> wrote:
> Hello list,
>
> The following behavior is completely unexpected. Is it a bug or a
> by-design feature?
>
> Regards,
> Victor.
>
> -----------------
>
> from pickle import dumps
> from cPickle import dumps as cdumps
>
> print dumps('1001799')==dumps(str(1001799))
> print cdumps('1001799')==cdumps(str(1001799))
>
> >>>>output:>>>>
> True
> False
>


Python 2.4 gives the same behavior on Windows:

ActivePython 2.4.3 Build 12 (ActiveState Software Inc.) based on
Python 2.4.3 (#69, Apr 11 2006, 15:32:42) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from pickle import dumps
>>> from cPickle import dumps as cdumps
>>> print dumps('1001799') == dumps(str(1001799))
True
>>> print cdumps('1001799') == cdumps(str(1001799))
False
>>> print cdumps('1001799')
S'1001799'
p1
.
>>> print cdumps(str(1001799))
S'1001799'
.
>>> print dumps('1001799')
S'1001799'
p0
.
>>> print dumps(str(1001799))
S'1001799'
p0
.

This does seem odd, at the very least.

Chris
--
"A little government and a little luck are necessary in life, but only
a fool trusts either of them." -- P. J. O'Rourke

infidel

May 16, 2007, 6:40:53 PM
ActivePython 2.5.1.1 as well:

PythonWin 2.5.1 (r251:54863, May 1 2007, 17:47:05) [MSC v.1310 32 bit
(Intel)] on win32.
Portions Copyright 1994-2006 Mark Hammond - see 'Help/About PythonWin'
for further copyright information.


>>> from pickle import dumps
>>> from cPickle import dumps as cdumps

>>> print dumps('10')
S'10'
p0
.
>>> print dumps(str(10))
S'10'
p0
.
>>> print cdumps('10')
S'10'
p1
.
>>> print cdumps(str(10))
S'10'
.

Nick Craig-Wood

May 17, 2007, 7:30:05 AM
Victor Kryukov <victor....@gmail.com> wrote:
> The following behavior is completely unexpected. Is it a bug or a
> by-design feature?
>
> from pickle import dumps
> from cPickle import dumps as cdumps
>
> print dumps('1001799')==dumps(str(1001799))
> print cdumps('1001799')==cdumps(str(1001799))
>
> >>>>output:>>>>
> True
> False

Does it matter since it is decoded properly?

>>> import pickle
>>> import cPickle
>>> cPickle.dumps('1001799')
"S'1001799'\np1\n."
>>> pickle.dumps('1001799')
"S'1001799'\np0\n."

>>> pickle.loads(pickle.dumps('1001799'))
'1001799'
>>> pickle.loads(cPickle.dumps('1001799'))
'1001799'
>>> cPickle.loads(pickle.dumps('1001799'))
'1001799'
>>> cPickle.loads(cPickle.dumps('1001799'))
'1001799'
>>>


--
Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick

Facundo Batista

May 17, 2007, 8:49:34 AM
to pytho...@python.org
Victor Kryukov wrote:


> The following behavior is completely unexpected. Is it a bug or a
> by-design feature?
>
> ...
>
> from pickle import dumps
> from cPickle import dumps as cdumps
>
> print dumps('1001799')==dumps(str(1001799))
> print cdumps('1001799')==cdumps(str(1001799))

It's a feature; the behaviour is described in the documentation:

"""
Since the pickle data format is actually a tiny stack-oriented
programming language, and some freedom is taken in the encodings of
certain objects, it is possible that the two modules produce different
data streams for the same input objects. However it is guaranteed that
they will always be able to read each other's data streams.
"""

>>> from pickle import dumps, loads
>>> from cPickle import dumps as cdumps, loads as cloads
>>> s = '1001799'
>>> s == cloads(dumps(s))
True
>>> s == loads(cdumps(s))
True
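
The practical consequence is to compare what the pickles decode to, not their raw bytes. A minimal sketch, assuming Python 3, where cPickle no longer exists as a separate module and `pickle` picks up its C accelerator automatically; the helper name `same_pickled_value` is hypothetical and only for illustration:

```python
import pickle

def same_pickled_value(data_a, data_b):
    """Return True if two pickle byte strings decode to equal objects."""
    return pickle.loads(data_a) == pickle.loads(data_b)

literal = pickle.dumps('1001799', protocol=0)
computed = pickle.dumps(str(1001799), protocol=0)

# The raw bytes are an implementation detail and may or may not match;
# the decoded values are what the format guarantees.
print(same_pickled_value(literal, computed))
```

Comparing decoded values sidesteps any differences in memo opcodes between implementations or versions.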

Regards,

--
. Facundo
.
Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/


"Martin v. Löwis"

May 17, 2007, 12:46:00 PM
> This does seem odd, at the very least.

The difference between the pN codes (p1 from cPickle, p0 from pickle) comes
from this code in cPickle.c:

    /* Make sure memo keys are positive! */
    /* XXX Why?
     * XXX And does "positive" really mean non-negative?
     * XXX pickle.py starts with PUT index 0, not 1.  This makes for
     * XXX gratuitous differences between the pickling modules.
     */
    p++;

The second difference (where sometimes p1 is written and sometimes not)
comes from this block in put:

    if (ob->ob_refcnt < 2 || self->fast)
        return 0;

Here, the object is only stored in the memo (and the corresponding PUT
opcode, the pN line, only written) if the object has more than one
reference to it. The string literal does; the dynamically computed
string does not. If there is only one reference to an object, there is
no need to store it in the memo, as it can't possibly be referenced
later on.
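
The memo opcodes can be inspected with the standard `pickletools` module, and `pickletools.optimize` removes any PUT that no later GET refers to, which is the same idea as the refcount shortcut above. A sketch under stated assumptions: it is written for Python 3 (the thread's cPickle is Python 2 only), the helper name `memo_opcodes` is hypothetical, and the unoptimized opcode lists may vary between Python versions and implementations:

```python
import pickle
import pickletools

def memo_opcodes(data):
    """Return the names of memo-related opcodes (PUT/GET/MEMOIZE) in a pickle."""
    return [op.name for op, arg, pos in pickletools.genops(data)
            if 'PUT' in op.name or 'GET' in op.name or op.name == 'MEMOIZE']

# A lone string: any PUT written for it is never read back by a GET,
# so optimize() can drop it.
lone = pickle.dumps(str(1001799), protocol=0)
print(memo_opcodes(lone))
print(memo_opcodes(pickletools.optimize(lone)))

# The same list referenced twice: the second reference is written as a GET,
# so the PUT for that list has to stay.
inner = ['x']
shared = pickle.dumps([inner, inner], protocol=0)
print(memo_opcodes(shared))
print(memo_opcodes(pickletools.optimize(shared)))
```

Running both pickles through `pickletools.optimize` before comparing is one way to ignore differences that come only from unused memo entries.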

Regards,
Martin
