

Jul 24, 2020, 6:03:07 PM

to sage-devel

I rely on lots of precomputed data computed with Sage 8.9 and earlier versions. I saved that data because it took a lot of computation time to obtain it. The data is stored as .sobj files.

Suddenly I get "invalid pickle data" errors when trying to load some of that data with the new version of Sage, so I am unable to do computations on top of the precomputed data.

I strongly suspect that this issue is due to Sage 9.x being based on Python 3.

How do I convert the data so that I can use it in the new version of Sage?

I'm now using **sage-9.1-OSX_10.15.4-x86_64.app** on a late 2013 MacBook Pro running macOS Catalina 10.15.6. A toy example of an .sobj file that now produces the error is attached. The error messages are reproduced below.

Help is greatly appreciated.

Now, said error messages:

```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-90aa9dc4a056> in <module>()
----> 1 DiGraph = load('A3VuDiGraph')

/Applications/SageMath-9.1.app/Contents/Resources/sage/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2900)()
    156
    157     ## Load file by absolute filename
--> 158     with open(filename, 'rb') as fobj:
    159         X = loads(fobj.read(), compress=compress, **kwargs)
    160     try:

/Applications/SageMath-9.1.app/Contents/Resources/sage/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2850)()
    157     ## Load file by absolute filename
    158     with open(filename, 'rb') as fobj:
--> 159         X = loads(fobj.read(), compress=compress, **kwargs)
    160     try:
    161         X._default_filename = os.path.abspath(filename)

/Applications/SageMath-9.1.app/Contents/Resources/sage/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.loads (build/cythonized/sage/misc/persist.c:7424)()
   1042
   1043     unpickler = SageUnpickler(io.BytesIO(s), **kwargs)
-> 1044     return unpickler.load()
   1045
   1046

/Applications/SageMath-9.1.app/Contents/Resources/sage/local/lib/python3.7/site-packages/sage/matrix/matrix0.pyx in sage.matrix.matrix0.unpickle (build/cythonized/sage/matrix/matrix0.c:39715)()
   5874     A._cache = cache
   5875     if version >= 0:
-> 5876         A._unpickle(data, version)
   5877     else:
   5878         A._unpickle_generic(data, version)

/Applications/SageMath-9.1.app/Contents/Resources/sage/local/lib/python3.7/site-packages/sage/matrix/matrix_integer_dense.pyx in sage.matrix.matrix_integer_dense.Matrix_integer_dense._unpickle (build/cythonized/sage/matrix/matrix_integer_dense.c:8217)()
    540                 self._unpickle_matrix_2x2_version0(data)
    541             else:
--> 542                 raise RuntimeError("invalid pickle data")
    543         else:
    544             raise RuntimeError("unknown matrix version (=%s)"%version)

RuntimeError: invalid pickle data
```

Jul 24, 2020, 9:26:13 PM

to sage-devel

On Friday, July 24, 2020 at 3:03:07 PM UTC-7, Udo Baumgartner wrote:

> How do I convert the data so that I can use it in the new version of Sage?

One way would be:

- load it in a sage version where you still can

- write the data in a text format (something JSON-like?)

- use that to reconstruct the data in sage 9.x

From that point on you could use pickle again or, having been burnt by it once, you might decide to stick with a more text-based format.
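For an integer matrix, the three steps above might look like the following minimal sketch. It is plain Python and deliberately avoids Python 3-only syntax, so the same helpers could in principle be used in the old Sage (to write the file) and in Sage 9.x (to read it). The function names and the JSON layout are hypothetical, not an existing Sage format. In Sage one would pass something like `[list(r) for r in M.rows()]` and `M.ncols()`, and rebuild with `matrix(ZZ, nrows, ncols, rows)`.

```python
import json


def dump_int_matrix(rows, ncols, path):
    """Write an integer matrix, given as a list of rows, to a JSON file.

    The dimensions are stored explicitly so that matrices without rows
    (e.g. 0 x 3) survive the round trip.
    """
    data = {
        "type": "integer_matrix",  # hypothetical tag, not a Sage standard
        "nrows": len(rows),
        "ncols": ncols,
        # int() turns Sage Integers (or any integral type) into plain ints
        "rows": [[int(x) for x in row] for row in rows],
    }
    with open(path, "w") as f:
        json.dump(data, f)


def load_int_matrix(path):
    """Read the file back; returns (nrows, ncols, rows) with plain ints."""
    with open(path) as f:
        data = json.load(f)
    if data.get("type") != "integer_matrix":
        raise ValueError("not an integer matrix file")
    rows = data["rows"]
    if len(rows) != data["nrows"] or any(len(r) != data["ncols"] for r in rows):
        raise ValueError("inconsistent dimensions")
    return data["nrows"], data["ncols"], rows
```

Storing `ncols` separately is a deliberate choice: a bare list of rows cannot distinguish a 0 x 3 matrix from a 0 x 5 one.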

An alternative is to use the pickle toolchain that exists in Python (and to which Sage adds quite a bit!) to analyse and debug the unpickling in Sage 9.x. Pickle is a well-defined format with good specifications and override procedures to make "outdated" pickles readable. In principle it would be a fairly custom job, but you might find that just a few things have changed. Or someone else has already discovered that and can describe a few steps that will likely make your pickle readable again (although I'm not aware of any work in that direction -- and this would be a good place to post it).
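As a starting point for that kind of analysis, here is a sketch using only the standard library's `pickletools` module. It assumes the .sobj file is a zlib-compressed pickle (Sage's `save` compresses by default) and falls back to the raw bytes otherwise. `pickletools` only parses the opcodes and never imports or executes anything, so it works even on pickles whose classes no longer load. The helper names are hypothetical.

```python
import pickletools
import zlib


def read_sobj_bytes(path):
    """Return the raw pickle bytes of an .sobj file.

    Assumes zlib compression (Sage's default for save); if the file is
    not zlib data, its contents are returned unchanged.
    """
    with open(path, "rb") as f:
        raw = f.read()
    try:
        return zlib.decompress(raw)
    except zlib.error:
        return raw


def referenced_globals(pickle_bytes):
    """List the 'module name' pairs named by GLOBAL/INST opcodes.

    Python 2 era pickles (protocol <= 2) refer to classes and functions
    this way. STACK_GLOBAL (protocol 4+) takes its names from the stack
    and is not resolved here.
    """
    found = []
    for opcode, arg, pos in pickletools.genops(pickle_bytes):
        if opcode.name in ("GLOBAL", "INST"):
            found.append(arg)
    return found
```

`pickletools.dis(read_sobj_bytes('A3VuDiGraph.sobj'))` would print the full opcode listing, while `referenced_globals` just extracts which classes and functions (for instance `sage.matrix.matrix0 unpickle`) the file depends on.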

In your case the problem seems to be with a matrix with integer entries. That's a data structure for which JSON could work really well, so if you can arrange access to an older Sage version, the JSON route may be fairly easy. Otherwise, you'd have to intercept the call pickle makes to the unpickler routine in which the error is raised and make sure the data is parsed the old way. That's probably doable, but it will require a significant time investment. It might be helpful to look at the git history of the routine sage.matrix.matrix_integer_dense.Matrix_integer_dense._unpickle, since it may well show what changed (and what you need to undo to make it work!).
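Intercepting such a call is possible with the standard `pickle.Unpickler.find_class` hook. The sketch below redirects selected (module, name) pairs to replacement callables; for the traceback above the relevant pair would presumably be `("sage.matrix.matrix0", "unpickle")`, with a wrapper that fixes up `data` before calling the original. The class and function names are hypothetical. Note that a plain `pickle.Unpickler` subclass bypasses whatever extra handling Sage's own `SageUnpickler` does; Sage also has a `register_unpickle_override` function in `sage.misc.persist`, intended for redirecting renamed classes.

```python
import io
import pickle


class RedirectingUnpickler(pickle.Unpickler):
    """Unpickler that swaps selected globals for replacement callables.

    `overrides` maps (module, name) pairs, as they appear in the pickle,
    to the object that should be used instead. Everything else is
    resolved normally.
    """

    def __init__(self, file, overrides, **kwargs):
        super().__init__(file, **kwargs)
        self.overrides = dict(overrides)

    def find_class(self, module, name):
        if (module, name) in self.overrides:
            return self.overrides[(module, name)]
        return super().find_class(module, name)


def loads_with_overrides(data, overrides, **kwargs):
    """Like pickle.loads, but with the given globals redirected.

    Extra keyword arguments (e.g. encoding='latin1' for Python 2
    pickles) are passed on to the unpickler.
    """
    return RedirectingUnpickler(io.BytesIO(data), overrides, **kwargs).load()
```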

Jul 24, 2020, 9:55:43 PM

to sage-devel

Sage version 9.1 still supports Python 2. Compile it from source using "./configure --with-python=2" and check whether the resulting Sage can read your file.


Jul 29, 2020, 3:17:19 AM

to sage-devel

Many thanks for your replies. I now understand the problem much better, and obviously I have to figure out a way to save the information in a version-independent way asap and meanwhile keep using Sage 8.9. It doesn't sound easy to do.

Those pointers to tickets #28302 and #28444 were particularly helpful.

Aside: It is disappointing that no secure method to store computed values is available yet. One of my computations took half a year to complete, and on top of that, some of the data used in those computations cannot be loaded with Sage 9.1 either.

Jul 29, 2020, 4:33:30 AM

to sage-devel

Note that Sage 9.1 for Python 2 can be built from source.

Download the source tarball, extract it where you want your Sage installed, and change to the resulting directory.

Set the number of jobs to run in parallel (-j4 for 4 jobs):

```
$ export MAKE='make -j4 -s V=0'
```

Build for Python 2:

```
$ make configure
$ ./configure --with-python=2
$ make
```

Jul 29, 2020, 5:18:41 AM

to sage-devel

On Wed, Jul 29, 2020 at 9:33 AM Samuel Lelievre <samuel....@gmail.com> wrote:

>

> Note that Sage 9.1 for Python 2 can be built from source.

>

However, there is no guarantee that the pickles from version 8.9 remain readable.

Probably the manual should include a warning in this regard. SageMath does not guarantee indefinite backward compatibility: a documented feature or type might get deprecated and subsequently removed.


> You received this message because you are subscribed to the Google Groups "sage-devel" group.

> To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+...@googlegroups.com.

> To view this discussion on the web visit https://groups.google.com/d/msgid/sage-devel/36ac2226-3961-409a-8f53-6765f1d8d346o%40googlegroups.com.

Jul 29, 2020, 11:53:10 AM

to sage-devel

> have to figure out a way to save the information in a version-independent way asap and meanwhile keep using sage8.9. It doesn't sound easy to do.

Let me try to help. As I understand it, your test case consists of matrices over the integers, right? If you convert them into pure Python dictionaries over Python integers before you save them to an sobj file, you will have a much more stable format. From such a dictionary you can easily recover your matrix using Sage's constructors:


```
sage: m = matrix(ZZ, [[1, 2], [3, 4]])
sage: mpy = {tuple(k):int(v) for k, v in m.dict().items()}
sage: mpy
{(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
sage: save(mpy, 'mpy.sobj')
sage: mpy_back = load('mpy.sobj')
sage: m == matrix(mpy_back)
True
```
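Two remarks on this approach, sketched in plain Python. First, as far as I can tell `m.dict()` lists only the nonzero entries, so `matrix(mpy_back)` has to infer the size from the largest index, and a trailing zero row or column would be lost; storing the dimensions alongside and rebuilding with `matrix(ZZ, nrows, ncols, entries)` avoids that. Second, the reason such a dict is more stable is that its pickle names no classes at all, which can be checked with `pickletools`. Both helper names are hypothetical.

```python
import pickle
import pickletools


def sparse_int_matrix(entries, nrows, ncols):
    """Bundle nonzero entries with explicit dimensions, using builtins only.

    `entries` maps (i, j) -> value, as returned by m.dict() in Sage;
    keys and values are converted to plain Python ints.
    """
    return {
        "nrows": int(nrows),
        "ncols": int(ncols),
        "entries": {(int(i), int(j)): int(v) for (i, j), v in entries.items()},
    }


def pickle_is_class_free(obj, protocol=2):
    """True if pickling obj references no module-level classes or functions."""
    data = pickle.dumps(obj, protocol=protocol)
    return not any(op.name in ("GLOBAL", "STACK_GLOBAL", "INST")
                   for op, arg, pos in pickletools.genops(data))
```

In Sage this would be used as `sparse_int_matrix(m.dict(), m.nrows(), m.ncols())` before calling `save`.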

If you still feel unsafe with that, you should choose a readable format, for example JSON as suggested before. I prefer YAML since it can easily be used for dictionaries. Unfortunately it is not included in Sage by default, so you have to install it first:

`sage -pip install ruamel.yaml`

Now doing the same as above with YAML:

```
sage: from ruamel.yaml import YAML
sage: yaml = YAML()
sage: fp = open('mpy.yaml', "w")
sage: yaml.dump(mpy, fp)
sage: fp.close()
sage: fp = open('mpy.yaml', "r")
sage: mpy_back_yaml = yaml.load(fp)
sage: mpy_back_yaml
ordereddict([((0, 0), 1), ((0, 1), 2), ((1, 0), 3), ((1, 1), 4)])
sage: m == matrix(mpy_back_yaml)
True
```

The file looks like this:

```
$ cat mpy.yaml
? - 0
  - 0
: 1
? - 0
  - 1
: 2
? - 1
  - 0
: 3
? - 1
  - 1
: 4
```
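JSON works just as well, with one wrinkle: JSON object keys must be strings, so passing `mpy` directly to `json.dumps` raises a `TypeError`. A minimal sketch, assuming the tuple keys are stored as a sorted list of `[i, j, value]` triples together with the dimensions (the layout and the function names are hypothetical):

```python
import json


def matrix_dict_to_json(mpy, nrows, ncols):
    """Serialize a {(i, j): value} dict as JSON.

    JSON object keys must be strings, so the tuple keys are stored
    as a list of [i, j, value] triples instead.
    """
    triples = sorted([int(i), int(j), int(v)] for (i, j), v in mpy.items())
    return json.dumps({"nrows": nrows, "ncols": ncols, "entries": triples})


def matrix_dict_from_json(text):
    """Return (mpy, nrows, ncols) with tuple keys restored."""
    data = json.loads(text)
    mpy = {(i, j): v for i, j, v in data["entries"]}
    return mpy, data["nrows"], data["ncols"]
```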

In the first ticket I've linked to this thread I've uploaded an ipynb file that demonstrates more advanced applications of YAML with Sage.

Jul 29, 2020, 3:02:20 PM

to sage-...@googlegroups.com

One disadvantage of pickle is safety,
in the computer security sense. See the warning at [1] about it,
that it is possible to execute arbitrary code during unpickling.
This of course also happens to be useful in Sage [2].

If it happens that your data can be
(easily) represented as iterated Python containers (tuples, lists,
dicts or sets) containing strings, integers or booleans, then a
textual format can be a good fit. JSON, YAML or even
ast.literal_eval [3] sound appropriate. More complex objects are
indeed more complex to handle. At least for polynomials the parser
at [4] might help.
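For the container case, a minimal sketch of the `ast.literal_eval` route: the file is just the `repr` of the data, and reading it back with `ast.literal_eval` only accepts literals, so unlike unpickling it cannot run arbitrary code. The helper names are hypothetical; the check in `save_literal` rejects objects whose `repr` is not a plain literal.

```python
import ast


def save_literal(obj, path):
    """Write obj as a Python literal (its repr) to a text file."""
    text = repr(obj)
    # literal_eval raises ValueError for non-literal reprs such as
    # "Fraction(1, 3)"; the comparison catches lossy reprs.
    if ast.literal_eval(text) != obj:
        raise ValueError("object does not round-trip through repr()")
    with open(path, "w") as f:
        f.write(text + "\n")


def load_literal(path):
    """Read the file back; literal_eval never calls constructors or functions."""
    with open(path) as f:
        return ast.literal_eval(f.read())
```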

JSON or YAML are interoperable, which
is important if you would like to use the data in another system.
It does mean that doing something clever with the data might still
require porting code from Sage that handles the data.

Regards,

TB


Aug 3, 2020, 1:43:01 PM

to sage-devel

Many thanks to all of you for your helpful suggestions. The structures that cause the problems in all cases appear indeed to be integer matrices.

However, much of the saved data is composite, and I will probably use a slightly different approach to conversion for each individual case.

The most complicated stored objects are directed graphs. For those I currently plan to relabel them, save the labelling-map separately and reattach the old labels in the new version of sage.
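That plan might look like the following sketch in plain Python. It assumes each vertex label (a free module in the toy example) has first been encoded as a tuple of echelon basis rows, e.g. with something like `tuple(tuple(v) for v in V.basis())` in Sage, and that the vertices and edges come from `G.vertices()` and `G.edges(labels=False)`; the modules could then be rebuilt with something like `(ZZ^3).span(rows)`. The function names and JSON layout are hypothetical.

```python
import json


def split_labels(vertices, edges):
    """Replace arbitrary hashable vertex labels by 0, 1, 2, ...

    Returns (int_edges, labels) where labels[k] is the original label of
    vertex k, i.e. the inverse of the map relabel(return_map=True) gives.
    Isolated vertices are kept because the vertex list is passed in.
    """
    labels = list(vertices)
    index = {v: k for k, v in enumerate(labels)}
    int_edges = [(index[u], index[v]) for u, v in edges]
    return int_edges, labels


def graph_to_json(vertices, edges):
    """Store integer edges plus labels; each label is a tuple of basis rows."""
    int_edges, labels = split_labels(vertices, edges)
    return json.dumps({
        "edges": int_edges,
        "labels": [[[int(x) for x in row] for row in lab] for lab in labels],
    })


def graph_from_json(text):
    """Return (labels, edges) with the original labels reattached."""
    data = json.loads(text)
    labels = [tuple(tuple(row) for row in lab) for lab in data["labels"]]
    edges = [(labels[u], labels[v]) for u, v in data["edges"]]
    return labels, edges
```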

If this sort of unpickling incompatibility recurs in the future, complex objects will be a pain to save, because the most abstract structures will be the most painful to save in an elementary form and to restore again.

In case you are curious: for the submitted toy example, the command `.relabel(return_map=True)` returns the following labelling map for the encoded directed graph:

```
{Free module of degree 3 and rank 1 over Integer Ring
Echelon basis matrix:
[0 1 0]: 1, Free module of degree 3 and rank 2 over Integer Ring
Echelon basis matrix:
[ 1 0 -1]
[ 0 1 -1]: 0, Free module of degree 3 and rank 1 over Integer Ring
Echelon basis matrix:
[ 1 -1 1]: 2, Free module of degree 3 and rank 1 over Integer Ring
Echelon basis matrix:
[1 0 0]: 4, Free module of degree 3 and rank 2 over Integer Ring
Echelon basis matrix:
[1 0 0]
[0 0 1]: 3, Free module of degree 3 and rank 2 over Integer Ring
Echelon basis matrix:
[1 0 0]
[0 1 0]: 12, Free module of degree 3 and rank 1 over Integer Ring
Echelon basis matrix:
[ 1 0 -1]: 8, Free module of degree 3 and rank 1 over Integer Ring
Echelon basis matrix:
[ 0 1 -1]: 6, Free module of degree 3 and rank 1 over Integer Ring
Echelon basis matrix:
[ 1 -1 0]: 9, Free module of degree 3 and rank 2 over Integer Ring
Echelon basis matrix:
[ 1 -1 0]
[ 0 0 1]: 7, Free module of degree 3 and rank 2 over Integer Ring
Echelon basis matrix:
[ 1 0 0]
[ 0 1 -1]: 10, Free module of degree 3 and rank 1 over Integer Ring
Echelon basis matrix:
[0 0 1]: 11, Free module of degree 3 and rank 2 over Integer Ring
Echelon basis matrix:
[0 1 0]
[0 0 1]: 5}
```

Aug 3, 2020, 3:45:46 PM

to sage-devel

Looking at the offending code (in matrix_integer_dense.pyx, line 535):

```
    def _unpickle(self, data, int version):
        if version == 0:
            if isinstance(data, bytes):
                self._unpickle_version0(data)
            elif isinstance(data, list):
                self._unpickle_matrix_2x2_version0(data)
            else:
                raise RuntimeError("invalid pickle data")
        else:
            raise RuntimeError("unknown matrix version (=%s)"%version)
```

It's the classic Py2/Py3 unpickling incompatibility: the meaning of str/bytes/unicode shifted between Python 2 and Python 3 in a way that rather fundamentally breaks pickling of them. It can be fixed fairly easily. We're probably unpickling what used to be a Python 2 str (str == bytes) object as a Python 3 latin1-decoded string, because most Py2 str/bytes objects are supposed to be strings (and latin1 has the nice property that it preserves bit patterns). So in the code above, we just need to check for that case.

I think we want to change this to something like:

```
         if version == 0:
+            if isinstance(data, str):  # old Py2 pickle: old "bytes" object reaches us as a latin1-encoded string
+                data = data.encode('latin1')
             if isinstance(data, bytes):
```
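The mechanism can be reproduced in plain Python 3, without Sage. The bytes below are a hand-assembled protocol 2 pickle of the kind Python 2 writes for a `str` (opcode `SHORT_BINSTRING`). Loaded with `encoding='latin1'` (assumed here to be what Sage's unpickler uses, as suspected above), it arrives as a `str`, and `.encode('latin1')` gets the original bytes back, which is what the patch does. The constant and function names are hypothetical.

```python
import pickle

# PROTO 2, SHORT_BINSTRING of length 3 holding bytes ff 00 41, STOP.
PY2_STR_PICKLE = b"\x80\x02U\x03\xff\x00A."


def load_py2_bytes(data):
    """Load a Python 2 str pickle and recover the original bytes."""
    obj = pickle.loads(data, encoding="latin1")
    # latin1 maps each byte 0..255 to the code point with the same value,
    # so encoding back with latin1 restores the exact bit pattern.
    if isinstance(obj, str):
        obj = obj.encode("latin1")
    return obj
```

`encoding='bytes'` would return raw bytes directly, but it would also turn every legitimate Python 2 text string into bytes, which is presumably why a targeted fix in `_unpickle` is preferable.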

Making this change is really worthwhile: these are small problems that are hard to find without a "pickle jar" with good coverage. Each of them is probably fairly easy to fix. These problems come up particularly when data is encoded as "bytes", which happens in optimized picklers. There aren't that many of them, so the number of changes required is probably limited, and each of them can have a huge positive knock-on effect, as you can see here.

(If I have important, expensive computational data I want to save for posterity, I would not go for Pickle anyway: you want something that's platform-independent. Sage will die at some point in the future too, and Pickle is not a format that is *that* easy to maintain: It's not archive quality in the librarian sense).

Aug 20, 2020, 9:37:36 PM

to sage-devel

See also https://trac.sagemath.org/ticket/30402 which relates.

Aug 23, 2020, 3:42:48 AM

to sage-devel

I applied the change suggested by Nils Bruin (above) to Sage 9.1 installed from source on Kubuntu 20.04, and it works for me.

Aug 31, 2020, 9:57:10 AM

to sage-devel

On Wed, Jul 29, 2020 at 9:02 PM TB <math...@gmail.com> wrote:

>

> One disadvantage of pickle is safety, in the computer security sense. See the warning at [1] about it, that it is possible to execute arbitrary code during unpickling. This of course also happens to be useful in Sage [2].

>

> If it happens that your data can be (easily) represented as iterated Python containers (tuples, lists, dicts or sets) containing strings, integers or booleans, then a textual format can be a good fit. JSON, YAML or even ast.literal_eval [3] sound appropriate. More complex objects are indeed more complex to handle. At least for polynomials the parser at [4] might help.

>

> JSON or YAML are interoperable, which is important if you would like to use the data in another system. It does mean that doing something clever with the data might still require porting code from Sage that handles the data.

I have suggested many times, for this reason, that Sage needs a standard interface for converting Sage objects to JSON-compatible data structures (which would have to be defined, preferably along with a schema, for any and all types that want to support this), possibly with some optional support for BSON for a more efficient binary representation of large objects.
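To make the idea concrete, here is a minimal sketch of what such an interface might look like: each supported type registers an encoder to JSON-compatible data and a decoder, and every document carries a type tag and a format version so that future versions can still dispatch on old files. Everything here (the registry, the tag names, the document layout) is hypothetical and not an existing Sage API; `fractions.Fraction` stands in for a Sage type.

```python
import json
from fractions import Fraction

# Hypothetical registry: Python type -> (tag, encoder), tag -> decoder.
ENCODERS = {}
DECODERS = {}
FORMAT_VERSION = 1


def register(tag, cls, encode, decode):
    """Declare how objects of type cls map to and from JSON-compatible data."""
    ENCODERS[cls] = (tag, encode)
    DECODERS[tag] = decode


def to_json(obj):
    """Wrap obj as {"type": tag, "version": ..., "data": ...}."""
    tag, encode = ENCODERS[type(obj)]
    return json.dumps({"type": tag, "version": FORMAT_VERSION, "data": encode(obj)})


def from_json(text):
    """Dispatch on the type tag, refusing unknown format versions."""
    doc = json.loads(text)
    if doc.get("version") != FORMAT_VERSION:
        raise ValueError("unsupported format version: %r" % (doc.get("version"),))
    return DECODERS[doc["type"]](doc["data"])


# Example registration, with Fraction standing in for a Sage type.
register("rational", Fraction,
         lambda q: [q.numerator, q.denominator],
         lambda d: Fraction(d[0], d[1]))
```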

I would be happy to help with such an effort from the technical side, but I'm not the best person to lead it, since it cuts to the core of the subject of mathematical data interchange, a subject on which there are experts, and I am not one of them.

But I really do wish we would de-emphasize the use of pickle for saving Sage objects. It's perfectly good for caching certain computations and storing large computational results to disk for near-future use, as well as for distribution over clusters and the like. But we need to make it very clear to users that this is *not* an archival data format, while at the same time offering something better that is.

Aug 31, 2020, 9:58:39 AM

to sage-devel

format for serializing Sage Objects, as it supports a mix of structured metadata and binary data, which would be useful for a broad range of purposes. But I'm biased ;)

https://asdf-standard.readthedocs.io/en/1.5.0/

Sep 1, 2020, 12:10:45 PM

to sage-...@googlegroups.com

Hi,

Perhaps a poor man's approach that could be considered in the strategy is to extend Sage's ability to give the user a way to regenerate, from the interpreter, the objects it constructed, with the sage_input function:

```
sage: M = random_matrix(ZZ,3,3)
sage: sage_input(M)
matrix(ZZ, [[1, -3, 1], [-3, 0, 1], [4, -3, -1]])
```

Such a non-binary "format" (a sequence of Sage commands) seems to be more stable in the longer term, and if something goes wrong due to backward incompatibilities, one can still deal with it by hand.

Unfortunately, this method is not complete:

```
sage: ZZ['x'].random_element()
4*x + 1
sage: sage_input(_)
R.<x> = ZZ[]
4*x + 1
```

but

```
sage: GF(9)['x'].random_element()
2*z2 + 2
sage: sage_input(_)
ValueError: Can't convert 2*z2 + 2 to sage_input form
```

Ciao,

Thierry
