The following behavior is completely unexpected. Is it a bug or a
by-design feature?
Regards,
Victor.
-----------------
from pickle import dumps
from cPickle import dumps as cdumps
print dumps('1001799')==dumps(str(1001799))
print cdumps('1001799')==cdumps(str(1001799))
>>>>output:>>>>
True
False
vicbook:~ victor$ python
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
vicbook:~ victor$ uname -a
Darwin vicbook 8.9.1 Darwin Kernel Version 8.9.1: Thu Feb 22 20:55:00
PST 2007; root:xnu-792.18.15~1/RELEASE_I386 i386 i386
vicbook:~ victor$
Python 2.4 gives the same behavior on Windows:
ActivePython 2.4.3 Build 12 (ActiveState Software Inc.) based on
Python 2.4.3 (#69, Apr 11 2006, 15:32:42) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from pickle import dumps
>>> from cPickle import dumps as cdumps
>>> print dumps('1001799') == dumps(str(1001799))
True
>>> print cdumps('1001799') == cdumps(str(1001799))
False
>>> print cdumps('1001799')
S'1001799'
p1
.
>>> print cdumps(str(1001799))
S'1001799'
.
>>> print dumps('1001799')
S'1001799'
p0
.
>>> print dumps(str(1001799))
S'1001799'
p0
.
This does seem odd, at the very least.
Chris
--
"A little government and a little luck are necessary in life, but only
a fool trusts either of them." -- P. J. O'Rourke
PythonWin 2.5.1 (r251:54863, May 1 2007, 17:47:05) [MSC v.1310 32 bit
(Intel)] on win32.
Portions Copyright 1994-2006 Mark Hammond - see 'Help/About PythonWin'
for further copyright information.
>>> from pickle import dumps
>>> from cPickle import dumps as cdumps
>>> print dumps('10')
S'10'
p0
.
>>> print dumps(str(10))
S'10'
p0
.
>>> print cdumps('10')
S'10'
p1
.
>>> print cdumps(str(10))
S'10'
.
Does it matter since it is decoded properly?
>>> import pickle
>>> import cPickle
>>> cPickle.dumps('1001799')
"S'1001799'\np1\n."
>>> pickle.dumps('1001799')
"S'1001799'\np0\n."
>>> pickle.loads(pickle.dumps('1001799'))
'1001799'
>>> pickle.loads(cPickle.dumps('1001799'))
'1001799'
>>> cPickle.loads(pickle.dumps('1001799'))
'1001799'
>>> cPickle.loads(cPickle.dumps('1001799'))
'1001799'
>>>
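To illustrate that the pN line is optional: it is a PUT opcode, which only stores the string in the unpickler's memo so that a later GET could refer back to it. The following is a minimal sketch, not a definitive recipe. It assumes Python 3, where cPickle no longer exists as a separate module and pickle picks up the C implementation on its own. It uses pickletools.optimize, which drops PUT opcodes that nothing refers to. The helper name memo_opcodes is made up for this example.

```python
import pickle
import pickletools

# Protocol 0 is the text format shown in the sessions above.
raw = pickle.dumps('1001799', protocol=0)

# optimize() removes PUT opcodes that no GET opcode refers to.
stripped = pickletools.optimize(raw)


def memo_opcodes(data):
    """Hypothetical helper: names of the memo-writing opcodes in a pickle."""
    return [op.name for op, arg, pos in pickletools.genops(data)
            if op.name in ('PUT', 'BINPUT', 'LONG_BINPUT', 'MEMOIZE')]


print(memo_opcodes(raw))
print(memo_opcodes(stripped))

# Both byte strings decode to the same value.
print(pickle.loads(stripped) == pickle.loads(raw))
```

With the unused PUT stripped, the pickle still loads to the same string, so the presence or absence of the pN line does not affect decoding.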
--
Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick
> The following behavior is completely unexpected. Is it a bug or a by-
> design feature?
>
> ...
>
> from pickle import dumps
> from cPickle import dumps as cdumps
>
> print dumps('1001799')==dumps(str(1001799))
> print cdumps('1001799')==cdumps(str(1001799))
It's a feature; the behaviour is described in the documentation:
"""
Since the pickle data format is actually a tiny stack-oriented
programming language, and some freedom is taken in the encodings of
certain objects, it is possible that the two modules produce different
data streams for the same input objects. However it is guaranteed that
they will always be able to read each other's data streams
"""
>>> from pickle import dumps, loads
>>> from cPickle import dumps as cdumps, loads as cloads
>>> s = '1001799'
>>> s == cloads(dumps(s))
True
>>> s == loads(cdumps(s))
True
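If the aim is to test whether two pickles hold the same object, a sketch like the following compares the decoded values instead of the byte strings, so it does not depend on which encoding a given pickler happened to choose. It assumes Python 3; same_pickled_value is a hypothetical helper, not part of the pickle module.

```python
import pickle


def same_pickled_value(data_a, data_b):
    """Hypothetical helper: compare two pickles by what they decode to."""
    return pickle.loads(data_a) == pickle.loads(data_b)


literal = '1001799'
computed = str(1001799)

# Protocol 0 is the text format discussed in this thread.
a = pickle.dumps(literal, protocol=0)
b = pickle.dumps(computed, protocol=0)

print(same_pickled_value(a, b))  # True
```

This only works for objects that define a meaningful ==, which is the case for strings.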
Regards,
--
. Facundo
.
Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
The difference between the pN codes (p0 from pickle, p1 from cPickle)
comes from this comment in cPickle.c:
/* Make sure memo keys are positive! */
/* XXX Why?
 * XXX And does "positive" really mean non-negative?
 * XXX pickle.py starts with PUT index 0, not 1. This makes for
 * XXX gratuitous differences between the pickling modules.
 */
p++;
The second difference (where sometimes p1 is written and sometimes not)
comes from this block in put:
if (ob->ob_refcnt < 2 || self->fast)
    return 0;
Here, the object is only stored in the memo (and the pN line only
written) if the object has more than one reference to it. The string
literal does; the dynamically computed string does not. If there is only
one reference to an object, there is no need to store it in the memo, as
it can't possibly be referenced later on.
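The reference-count difference between the two strings can be observed from Python itself. This is only a rough sketch, assuming CPython 3: sys.getrefcount reports an implementation detail, the exact numbers differ between interpreter versions, and the count it returns includes the temporary reference held by the call itself. The variable names are chosen for this example.

```python
import sys

literal = '1001799'      # also referenced by the code object's constants
computed = str(1001799)  # freshly built; referenced only by this name

# Equal values, but two distinct objects with different reference counts.
print(literal == computed)
print(literal is computed)

# Exact counts vary by CPython version, so none are assumed here.
print(sys.getrefcount(literal), sys.getrefcount(computed))
```

The literal is kept alive by the compiled code as well as by the variable, which is why a refcount test like the one in put treats it differently from the computed string.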
Regards,
Martin