Unpickling problem (backwards incompatibility) in Python 3

Simon King

Sep 1, 2019, 8:25:52 PM
to sage-...@googlegroups.com
Hi!

I have a pickle that I can unpickle in Sage-with-Python-2, but it fails
to unpickle in Sage-with-Python-3, because of some UnicodeError.
Strangely, when I read the pickle as a string
open('path/to/file.sobj').read()
then it fails with a (different?) UnicodeError in Python-3.
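
A minimal sketch of why the plain `open(...).read()` call fails in Python-3, assuming the file holds raw non-ASCII bytes: text mode tries to decode the content, while binary mode returns it untouched. The temporary file name `example.sobj` and the two-byte payload are made up for illustration, and the text encoding is set explicitly to UTF-8 so that the result does not depend on the locale.

```python
import os
import tempfile

# Hypothetical stand-in for a pickle file: two non-ASCII bytes.
payload = b"\x80\x1f"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.sobj")
    with open(path, "wb") as handle:
        handle.write(payload)

    # Text mode decodes the bytes and fails on 0x80.
    try:
        with open(path, encoding="utf-8") as handle:
            handle.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True

    # Binary mode returns the raw bytes unchanged.
    with open(path, "rb") as handle:
        raw = handle.read()
```

So the error from reading the file as a string is a separate matter from the unpickling error itself.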

The details (namely the pickle and the steps to read it in Python-2) can
be found in https://trac.sagemath.org/ticket/28414#comment:36
Can some expert please tell me how to unpickle the pickle in Python-3?

Best regards,
Simon

Simon King

Sep 2, 2019, 11:55:45 AM
to sage-...@googlegroups.com
On 2019-09-02, Simon King <simon...@uni-jena.de> wrote:
> Strangely, when I read the pickle as a string
> open('path/to/file.sobj').read()
> then it fails with a (different?) UnicodeError in Python-3.

That's not the problem. However, I am not able to track down the problem
myself. So, I'd appreciate help.

I think I can show that the unpickling problem is NOT related to an
optional package. I believe that being unable to read a pickle is a blocker
for the transition to Python-3. Therefore, I opened #28444.

Best regards,
Simon

Vincent Delecroix

Sep 2, 2019, 12:06:12 PM
to sage-...@googlegroups.com
Hello Simon,

As discussed in

https://groups.google.com/forum/#!topic/sage-devel/JuKzzgxDlmA

Pickling/unpickling is not supposed to work across Sage versions
(including the Python version you use).

Is this what you are trying to make work?

Vincent

Simon King

Sep 2, 2019, 12:22:24 PM
to sage-...@googlegroups.com
On 2019-09-02, Vincent Delecroix <20100.d...@gmail.com> wrote:
> As discussed in
>
> https://groups.google.com/forum/#!topic/sage-devel/JuKzzgxDlmA
>
> Pickling/unpickling is not supposed to work across Sage versions
> (including the Python version you use).
>
> Is this what you are trying to make work?

The pickle that I posted at #28444 is of course no more than a small
example and takes only seconds to reconstruct in Python-3. However,
losing data that are the result of several months of computation would
be hard to swallow.

Also, if I understand the error message "UnicodeDecodeError: 'ascii'
codec can't decode byte 0x80 in position 0: ordinal not in range(128)"
correctly, it is an error in unpickling a string. And this, I believe,
can and must be totally absolutely backwards compatible.

Cheers,
Simon

Simon King

Sep 2, 2019, 4:43:38 PM
to sage-devel
On Monday, September 2, 2019 at 6:22:24 PM UTC+2, Simon King wrote:
> Also, if I understand the error message "UnicodeDecodeError: 'ascii'
> codec can't decode byte 0x80 in position 0: ordinal not in range(128)"
> correctly, it is an error in unpickling a string. And this, I believe,
> can and must be totally absolutely backwards compatible.

The problem apparently boils down to the following:
- Pickle the string '\x80\x1f' in Python-2
- Try to load that pickle in Python-3 (it fails).
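
The two steps can be reproduced in Python-3 alone with a rough sketch like the following. The byte string is a hand-written protocol-2 pickle of the Python-2 str '\x80\x1f'. Its layout (PROTO 2, SHORT_BINSTRING of length 2, BINPUT 0, STOP) is what I would expect Python-2's `pickle.dumps('\x80\x1f', 2)` to emit, but it is an assumption and not a dump taken from the ticket.

```python
import pickle

# Hand-written protocol-2 pickle of the Python-2 str '\x80\x1f'
# (assumed layout: PROTO 2, SHORT_BINSTRING len=2, BINPUT 0, STOP).
py2_pickle = b"\x80\x02U\x02\x80\x1fq\x00."

# By default Python-3 decodes Python-2 str payloads as ASCII,
# so the byte 0x80 cannot be decoded.
try:
    pickle.loads(py2_pickle)
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)
```

Here `error_message` holds the text of the decode error raised for byte 0x80.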

Bummer!

AFAIK, what I need to unpickle my old data is a way to tell Python-3 to unpickle all strings as bytes (at least temporarily), so that the pickled string '\x80\x1f' is read as b'\x80\x1f'. Is there a way?

Best regards,
Simon

Julien Puydt

Sep 3, 2019, 1:40:14 AM
to sage-...@googlegroups.com
Hi,
On 02/09/2019 at 22:43, Simon King wrote:
> AFAIK, what I need to unpickle my old data is a way to tell Python-3
> that it shall (at least temporarily) unpickle all strings as bytes, in
> the sense that the pickled string '\x80\x1f' should be understood as
> b'\x80\x1f'. Is there a way?

I tried to run the following two small scripts, and it didn't complain:


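#!/usr/bin/python2

import pickle

# A Python-2 str holding two non-ASCII bytes.
s = '\x80\x1f'
# Pickles are binary data, so open the file in binary mode.
with open('/tmp/data.pickle', 'wb') as handle:
    pickle.dump(s, handle, protocol=2)


#!/usr/bin/python3

import pickle

# encoding='bytes' loads Python-2 str objects as Python-3 bytes.
with open('/tmp/data.pickle', 'rb') as handle:
    s = pickle.load(handle, encoding='bytes')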



I hope that helps,

JP

Simon King

Sep 3, 2019, 2:32:45 AM
to sage-...@googlegroups.com
Hi!

On 2019-09-02, Simon King <simon...@uni-jena.de> wrote:
> The problem apparently boils down to the following:
> - Pickle the string '\x80\x1f' in Python-2
> - Try to load that pickle in Python-3 (it fails).
>
> Bummer!

Nils Bruin pointed me to
https://stackoverflow.com/questions/28218466/unpickling-a-python-2-object-with-python-3

The proposed solution there is to use pickle.load(<file>,
encoding='bytes'). The encoding keyword exists only in Python-3.
If I understand correctly (and some basic tests confirm the following
statements):
- Pickles created with Python-2 can be read with Python-2, and pickles
  created with Python-3 can be read with Python-3.
- Without `encoding='bytes'`, Python-3 can in general not unpickle any
  pickle created with Python-2 that contains strings.
- With `encoding='bytes'`, Python-3 *can* unpickle a pickle created with
  Python-2 containing strings, and it doesn't interfere with unpickling
  a pickle created with Python-3.
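
A small check of the last statement, runnable in Python-3 only. The Python-2 pickle is again a hand-written protocol-2 byte string (an assumed layout, not a dump from the ticket), and the dictionary `py3_value` is an arbitrary example.

```python
import pickle

# Assumed protocol-2 pickle of the Python-2 str '\x80\x1f'.
py2_pickle = b"\x80\x02U\x02\x80\x1fq\x00."

# With encoding='bytes' the Python-2 str comes back as bytes.
from_py2 = pickle.loads(py2_pickle, encoding="bytes")

# A pickle written by Python-3 round-trips unchanged with the same keyword:
# str stays str and bytes stays bytes.
py3_value = {"name": "sage", "blob": b"\x80\x1f"}
py3_pickle = pickle.dumps(py3_value, protocol=2)
from_py3 = pickle.loads(py3_pickle, encoding="bytes")
```

In this sketch `from_py2` is the bytes object b'\x80\x1f' and `from_py3` compares equal to `py3_value`.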

So, I suggest that at #28444 we change sage.misc.persist so that it uses
pickle.load() with the `encoding` keyword in Python-3 and without that
keyword in Python-2. Do the Python-3 experts agree that that approach
makes sense? Given that a Python-2 string corresponds to a Python-3
bytes, I think it does, but I am not an expert.
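
To make the suggestion concrete, here is a rough sketch of such a version switch. The helper name `load_compat` is hypothetical, and this is not the actual sage.misc.persist code (which, as far as I know, also deals with compression); it only shows the idea of passing the keyword where it exists.

```python
import pickle
import sys


def load_compat(data):
    """Unpickle `data` (a bytes object) on either Python version.

    Hypothetical helper for illustration only.
    """
    if sys.version_info[0] >= 3:
        # Python-3: read Python-2 str payloads as bytes instead of
        # trying to decode them as ASCII.
        return pickle.loads(data, encoding="bytes")
    # Python-2: pickle.loads() has no `encoding` keyword.
    return pickle.loads(data)
```

A caller would pass the raw pickle bytes, for example `load_compat(open(path, 'rb').read())` for an uncompressed pickle file.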

Best regards,
Simon

Simon King

Sep 3, 2019, 2:54:50 AM
to sage-...@googlegroups.com
PS:

On 2019-09-03, Simon King <simon...@uni-jena.de> wrote:
> - Without `encoding='bytes'`, Python-3 can in general not unpickle any
> pickle created with Python-2 that contains strings.

Stated differently:
Ostensibly, Python-2 str corresponds to Python-3 bytes, and Python-2
unicode corresponds to Python-3 str. But Python-3 tries to unpickle the
pickle of a Python-2 str as a Python-3 str, NOT as a Python-3 bytes. And
that's an annoying oddity (or indeed a bug?) in Python, IMHO.
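
The type mapping can be illustrated in Python-3 with two hand-written protocol-2 pickles. Both byte strings are assumptions about what Python-2 would produce for the str 'abc' and the unicode u'\xe9'; they were not taken from a real Python-2 session.

```python
import pickle

# Assumed Python-2 pickles: a str uses SHORT_BINSTRING ('U'),
# a unicode uses BINUNICODE ('X', UTF-8 payload).
py2_str_pickle = b"\x80\x02U\x03abcq\x00."
py2_unicode_pickle = b"\x80\x02X\x02\x00\x00\x00\xc3\xa9q\x00."

# Default: the Python-2 str is decoded as ASCII into a Python-3 str.
default_str = pickle.loads(py2_str_pickle)

# encoding='bytes': the same Python-2 str becomes Python-3 bytes.
as_bytes = pickle.loads(py2_str_pickle, encoding="bytes")

# A Python-2 unicode becomes a Python-3 str regardless of the keyword.
unicode_value = pickle.loads(py2_unicode_pickle, encoding="bytes")
```

One consequence worth keeping in mind: with `encoding='bytes'`, ASCII-only Python-2 strings also come back as bytes, so code that expects str may have to decode them afterwards.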

Cheers,
Simon
