unable to open some sobj-files computed on earlier versions of sage with sage9.1


Udo Baumgartner

Jul 24, 2020, 6:03:07 PM
to sage-devel

I have lots of precomputed data computed by sage8.9 and before that I rely on. The reason I save that data is that it took a lot of computation time to obtain it. The data is saved as .sobj-files. 

Suddenly I get "invalid pickle data"-errors when trying to load some of that data with the new version of sage. I am therefore unable to do computations on top of the precomputed data. 

I strongly suspect that this issue is due to basing sage9.x on Python3. 

How do I convert the data so that I can use it in the new version of sage?

I'm now using sage-9.1-OSX_10.15.4-x86_64.app on a late-2013 MacBook Pro running macOS Catalina 10.15.6. A toy example of an sobj-file that now produces the error is attached. The error messages are reproduced below.


Help is greatly appreciated.


Here are the error messages:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-90aa9dc4a056> in <module>()
----> 1 DiGraph = load('A3VuDiGraph')

/Applications/SageMath-9.1.app/Contents/Resources/sage/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2900)()
    156 
    157     ## Load file by absolute filename
--> 158     with open(filename, 'rb') as fobj:
    159         X = loads(fobj.read(), compress=compress, **kwargs)
    160     try:

/Applications/SageMath-9.1.app/Contents/Resources/sage/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.load (build/cythonized/sage/misc/persist.c:2850)()
    157     ## Load file by absolute filename
    158     with open(filename, 'rb') as fobj:
--> 159         X = loads(fobj.read(), compress=compress, **kwargs)
    160     try:
    161         X._default_filename = os.path.abspath(filename)

/Applications/SageMath-9.1.app/Contents/Resources/sage/local/lib/python3.7/site-packages/sage/misc/persist.pyx in sage.misc.persist.loads (build/cythonized/sage/misc/persist.c:7424)()
   1042 
   1043     unpickler = SageUnpickler(io.BytesIO(s), **kwargs)
-> 1044     return unpickler.load()
   1045 
   1046 

/Applications/SageMath-9.1.app/Contents/Resources/sage/local/lib/python3.7/site-packages/sage/matrix/matrix0.pyx in sage.matrix.matrix0.unpickle (build/cythonized/sage/matrix/matrix0.c:39715)()
   5874     A._cache = cache
   5875     if version >= 0:
-> 5876         A._unpickle(data, version)
   5877     else:
   5878         A._unpickle_generic(data, version)

/Applications/SageMath-9.1.app/Contents/Resources/sage/local/lib/python3.7/site-packages/sage/matrix/matrix_integer_dense.pyx in sage.matrix.matrix_integer_dense.Matrix_integer_dense._unpickle (build/cythonized/sage/matrix/matrix_integer_dense.c:8217)()
    540                 self._unpickle_matrix_2x2_version0(data)
    541             else:
--> 542                 raise RuntimeError("invalid pickle data")
    543         else:
    544             raise RuntimeError("unknown matrix version (=%s)"%version)

RuntimeError: invalid pickle data


Attachment: A3VuDiGraph.sobj

Nils Bruin

Jul 24, 2020, 9:26:13 PM
to sage-devel
On Friday, July 24, 2020 at 3:03:07 PM UTC-7, Udo Baumgartner wrote:

> I have lots of precomputed data computed by sage8.9 and before that I rely on. The reason I save that data is that it took a lot of computation time to obtain it. The data is saved as .sobj-files.
>
> Suddenly I get "invalid pickle data"-errors when trying to load some of that data with the new version of sage. I am therefore unable to do computations on top of the precomputed data.
>
> I strongly suspect that this issue is due to basing sage9.x on Python3.
>
> How do I convert the data so that I can use it in the new version of sage?

One way would be:
 - load it in a sage version where you still can
 - write the data in a text format (something json-like?)
 - use that to reconstruct the data in sage 9.x

From that point on you could use pickle again or, having been burnt by it once, you might decide to stick with a more text-based format.
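For an integer matrix, the "text format" step can be very small. The sketch below is plain Python (no Sage) and the helper names are hypothetical. It assumes the rows have already been extracted as Python ints in the old Sage version, for instance with something like `[[int(x) for x in row] for row in m.rows()]`, and that in Sage 9.x something like `matrix(ZZ, nrows, ncols, rows)` rebuilds the matrix; both Sage calls are assumptions, not tested here.

```python
import json

def matrix_to_json(nrows, ncols, rows):
    """Serialize a dense integer matrix, given as a list of rows, to JSON text."""
    # Store the shape explicitly so that 0 x n matrices survive the round trip.
    return json.dumps({"nrows": nrows, "ncols": ncols,
                       "rows": [[int(x) for x in row] for row in rows]})

def matrix_from_json(text):
    """Return (nrows, ncols, rows), checking that the stored shape is consistent."""
    d = json.loads(text)
    rows = d["rows"]
    if len(rows) != d["nrows"] or any(len(r) != d["ncols"] for r in rows):
        raise ValueError("stored shape does not match the stored rows")
    return d["nrows"], d["ncols"], rows

text = matrix_to_json(2, 2, [[1, 2], [3, 4]])
print(text)  # {"nrows": 2, "ncols": 2, "rows": [[1, 2], [3, 4]]}
```

The resulting file is ordinary JSON, so it can also be read by tools outside Sage and Python.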

An alternative is to use the pickle toolchain that exists on python (and to which sage adds quite a bit!) to analyse and debug the unpickling on sage 9.x. Pickle is a well-defined format with good specifications and override procedures to make "outdated" pickles readable. It would be a fairly custom job in principle, but you might find it's just a few things that have changed. Or someone else has already discovered that and can describe a few steps that likely make your pickle readable again (although I'm not aware of any work in that direction -- and here is a good place to post)
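A reasonable first step in that kind of debugging is to see which classes and functions a pickle refers to, which the standard-library `pickletools` module can do without executing anything. This is a plain-Python sketch: it only looks at the `GLOBAL` opcode used by pickle protocols 0 to 3 (as far as I know, Sage's Python 2 pickles used protocol 2), so protocol-4 pickles, which use `STACK_GLOBAL`, would need extra handling. The `sobj_globals` helper is hypothetical and assumes the `.sobj` file is zlib-compressed, which is Sage's default for `save`.

```python
import pickle
import pickletools
import zlib
from fractions import Fraction

def referenced_globals(data):
    """Return the set of 'module name' strings named by GLOBAL opcodes (protocols 0-3)."""
    found = set()
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            found.add(arg)  # pickletools renders this argument as "module name"
    return found

def sobj_globals(path):
    """Hypothetical helper: inspect a .sobj file, assuming it is zlib-compressed."""
    with open(path, "rb") as f:
        return referenced_globals(zlib.decompress(f.read()))

# Self-contained demonstration on an ordinary protocol-2 pickle instead of a .sobj file.
data = pickle.dumps([Fraction(1, 2), {"x": 3}], protocol=2)
print(referenced_globals(data))
pickletools.dis(data)  # full opcode listing, useful for locating the failing object
```

On a real file, the set of globals points directly at the Sage routines (such as `sage.matrix.matrix0 unpickle`) whose behaviour may have changed.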

In your case the problem seems to be with a matrix with integer entries. That's a data structure for which JSON could work really well, so if you can arrange access to an older Sage version, the JSON route may be fairly easy. Otherwise, you'd have to intercept the call pickle makes to the unpickler routine in which the error is generated, and make sure the data is parsed the old way. That's probably doable, but it will require a significant time investment. It might be helpful to look at the git history of the routine sage.matrix.matrix_integer_dense.Matrix_integer_dense._unpickle, since it may well show what changed (and what you need to undo to make it work!)
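The interception idea can be sketched with the standard `pickle.Unpickler.find_class` hook: when the old pickle asks for a particular global, return a wrapper that converts the data before calling the real routine. Everything here is a hypothetical stand-in (the module name `oldpkg.matrix`, the function `legacy_unpickle`, and a hand-built Python 2 style pickle); Sage's own `SageUnpickler` is not used. Applying this to real Sage pickles would mean wrapping the corresponding Sage functions instead.

```python
import io
import pickle

def legacy_unpickle(data):
    """Stand-in for a routine that, like the one in the traceback, only accepts bytes."""
    if not isinstance(data, bytes):
        raise RuntimeError("invalid pickle data")
    return list(data)

def bytes_from_latin1(func):
    """Wrap func so that a str argument is first turned back into the original bytes."""
    def wrapper(data, *args):
        if isinstance(data, str):
            data = data.encode("latin1")
        return func(data, *args)
    return wrapper

class InterceptingUnpickler(pickle.Unpickler):
    # (module, name) as written in the old pickle -> callable to use instead
    overrides = {("oldpkg.matrix", "unpickle"): bytes_from_latin1(legacy_unpickle)}

    def find_class(self, module, name):
        if (module, name) in self.overrides:
            return self.overrides[(module, name)]
        return super().find_class(module, name)

# Hand-built protocol-2 pickle of the call oldpkg.matrix.unpickle(<Py2 str '\xff\x00a'>),
# using the SHORT_BINSTRING opcode ('U') that Python 2 emitted for str objects.
OLD_PICKLE = b"\x80\x02coldpkg.matrix\nunpickle\nU\x03\xff\x00a\x85R."

result = InterceptingUnpickler(io.BytesIO(OLD_PICKLE), encoding="latin1").load()
print(result)  # [255, 0, 97]
```

Passing `encoding="latin1"` makes the Python 3 unpickler decode old Python 2 strings byte for byte, so the wrapper can recover the exact original bytes.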
 

Matthias Koeppe

Jul 24, 2020, 9:55:43 PM
to sage-devel
Sage version 9.1 still supports python2. Compile it from source using "./configure --with-python=2" and check whether the resulting sage can read your file.



Sebastian Oehms

Jul 28, 2020, 6:11:28 AM
to sage-devel
Related tickets to this issue are #28302 and #28444!

Best,
Sebastian

Udo Baumgartner

Jul 29, 2020, 3:17:19 AM
to sage-devel
Many thanks for your replies. I understand the problem now much better and obviously have to figure out a way to save the information in a version-independent way asap and meanwhile keep using sage8.9. It doesn't sound easy to do.

Those pointers to tickets #28302 and #28444 were particularly helpful.

Aside: It is disappointing that no secure method to store computed values is available yet. One of my computations took half a year to complete, and on top of that, some of the data used in those computations cannot be loaded with sage9.1 either.

Samuel Lelievre

Jul 29, 2020, 4:33:30 AM
to sage-devel
Note that Sage 9.1 for Python 2 can be built from source.

Download the source tarball, extract it where you want
your Sage installed. Change to the obtained directory.

Set the number of jobs to run in parallel (-j4 for 4 jobs):
```
$ export MAKE='make -j4 -s V=0'
```

Build for Python 2:
```
$ make configure
$ ./configure --with-python=2
$ make
```

Dima Pasechnik

Jul 29, 2020, 5:18:41 AM
to sage-devel
On Wed, Jul 29, 2020 at 9:33 AM Samuel Lelievre
<samuel....@gmail.com> wrote:
>
> Note that Sage 9.1 for Python 2 can be built from source.
>
However, there is no guarantee that pickles from version 8.9
remain readable.

Probably the manual should include a warning in this regard. Sagemath
does not guarantee indefinite backward compatibility - a documented
feature/type might get deprecated and subsequently removed.



> --
> You received this message because you are subscribed to the Google Groups "sage-devel" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/sage-devel/36ac2226-3961-409a-8f53-6765f1d8d346o%40googlegroups.com.

Sebastian Oehms

Jul 29, 2020, 11:53:10 AM
to sage-devel
> have to figure out a way to save the information in a version-independent way asap and meanwhile keep using sage8.9. It doesn't sound easy to do.

Let me try to help: as I understand it, your test case consists of matrices over the integers, right? If you convert each matrix into a pure Python dictionary of Python integers before you save it to an sobj-file, you will have a much more stable format. From such a dictionary you can easily recover your matrix with Sage's construction functionality:

```
sage: m = matrix(ZZ, [[1, 2], [3, 4]])
sage: mpy = {tuple(k):int(v) for k, v in m.dict().items()}
sage: mpy
{(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
sage: save(mpy, 'mpy.sobj')
sage: mpy_back = load('mpy.sobj')
sage: m == matrix(mpy_back)
True
```
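The same idea in plain Python, without Sage, looks roughly like this sketch (helper names are hypothetical). One caveat worth checking: Sage's `m.dict()` lists only the nonzero entries, so a matrix whose last rows or columns are zero could come back smaller from `matrix(mpy_back)`. The sketch therefore also records the dimensions; that is an addition of ours, not part of the recipe above. In Sage, something like `matrix(ZZ, d['nrows'], d['ncols'], d['entries'])` should then rebuild the matrix (an assumption, untested here).

```python
import pickle

def to_plain(rows):
    """Encode a dense integer matrix (list of rows) as a dict of plain Python objects."""
    nrows = len(rows)
    ncols = len(rows[0]) if rows else 0
    # Keep only nonzero entries, as m.dict() does, but record the shape as well.
    entries = {(i, j): int(v)
               for i, row in enumerate(rows)
               for j, v in enumerate(row) if v != 0}
    return {"nrows": nrows, "ncols": ncols, "entries": entries}

def from_plain(d):
    """Rebuild the list of rows, filling in the zeros that were not stored."""
    rows = [[0] * d["ncols"] for _ in range(d["nrows"])]
    for (i, j), v in d["entries"].items():
        rows[i][j] = v
    return rows

m = [[1, 2], [3, 4], [0, 0]]      # the last row is zero
blob = pickle.dumps(to_plain(m))  # only builtins: dict, tuple, int, str
assert from_plain(pickle.loads(blob)) == m
```

Because the pickle contains only builtin types, it does not depend on any Sage class surviving unchanged.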



If you still feel unsafe with that, you should choose a readable format, for example JSON as suggested before. I prefer YAML since it handles dictionaries easily. Unfortunately it is not included in Sage by default, so you have to install it first:


```
sage -pip install ruamel.yaml
```




Now doing the same as above with YAML:


```
sage: from ruamel.yaml import YAML
sage: yaml = YAML()
sage: fp = open('mpy.yaml', "w")
sage: yaml.dump(mpy, fp)
sage: fp.close()
sage: fp = open('mpy.yaml', "r")
sage: mpy_back_yaml = yaml.load(fp)
sage: mpy_back_yaml
ordereddict([((0, 0), 1), ((0, 1), 2), ((1, 0), 3), ((1, 1), 4)])
sage: m == matrix(mpy_back_yaml)
True
```




The file looks like this:


```
$ cat mpy.yaml
? - 0
  - 0
: 1
? - 0
  - 1
: 2
? - 1
  - 0
: 3
? - 1
  - 1
: 4
```



In the first ticket I linked to in this thread, I've uploaded an ipynb file demonstrating more advanced applications of YAML with Sage.

TB

Jul 29, 2020, 3:02:20 PM
to sage-...@googlegroups.com
One disadvantage of pickle is safety, in the computer security sense. See the warning at [1] about it, that it is possible to execute arbitrary code during unpickling. This of course also happens to be useful in Sage [2].
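When pickle cannot be avoided, a common mitigation described in the Python documentation is to restrict which globals an unpickler may load by overriding `find_class`. A minimal sketch, assuming the data consists only of plain containers; it would reject Sage objects outright, so it illustrates the trade-off rather than offering a replacement for Sage's `load`. The allowlist is an illustrative choice.

```python
import io
import pickle
from fractions import Fraction

# Builtins that are harmless to construct; every other global is refused.
SAFE_BUILTINS = {"set", "frozenset", "complex", "range", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain containers of ints and strings need no globals and load fine.
print(restricted_loads(pickle.dumps([1, 2, {"a": (3, 4)}])))

# Anything that names a class or function is refused before it can be called.
try:
    restricted_loads(pickle.dumps(Fraction(1, 2)))
except pickle.UnpicklingError as err:
    print("refused:", err)
```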

If it happens that your data can be (easily) represented as iterated Python containers (tuples, lists, dicts or sets) containing strings, integers or booleans, then a textual format can be a good fit. JSON, YAML or even ast.literal_eval [3] sound appropriate. More complex objects are indeed more complex to handle. At least for polynomials the parser at [4] might help.
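For the "iterated containers" case, `ast.literal_eval` is a particularly small option: write `repr(data)` to a text file and read it back; only Python literals are accepted, so nothing in the file can be executed. A sketch, assuming the data is shaped like the toy dictionary below (an integer matrix stored as nested tuples):

```python
import ast
import os
import tempfile

data = {"name": "A3", "rows": ((1, 2, 3), (0, -1, 4)), "tags": {"dense", "integer"}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open(path, "w") as f:
        f.write(repr(data))  # human-readable and valid literal syntax
    with open(path) as f:
        back = ast.literal_eval(f.read())

assert back == data

# Anything that is not a literal, such as a function call, is rejected.
try:
    ast.literal_eval("__import__('os').system('echo hi')")
except ValueError as err:
    print("rejected:", err)
```

One limitation to keep in mind: an empty set is written as `set()`, which is a call and therefore not accepted by `literal_eval`.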

JSON or YAML are interoperable, which is important if you would like to use the data in another system. It does mean that doing something clever with the data might still require porting code from Sage that handles the data.

Regards,
TB


Udo Baumgartner

Aug 3, 2020, 1:43:01 PM
to sage-devel
Many thanks to all of you for your helpful suggestions. The structures that cause the problems in all cases appear indeed to be integer matrices. 

However, much of the saved data is composite, and I will probably use a slightly different approach to conversion for each individual case.

The most complicated stored objects are directed graphs. For those I currently plan to relabel them, save the labelling-map separately and reattach the old labels in the new version of sage.
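The relabel-and-store-the-map plan can be sketched without Sage: replace each vertex label by an integer, keep the labels in a separate list, and write both in a text format. The toy labels below stand in for echelon basis matrices (one tuple per row); the helpers are hypothetical and are not Sage's `relabel`, which does the first step for a real `DiGraph`.

```python
import json

def relabel(edges):
    """Replace hashable vertex labels by 0..n-1; return (integer edges, list of labels)."""
    index, labels = {}, []
    for u, v in edges:
        for x in (u, v):
            if x not in index:
                index[x] = len(labels)
                labels.append(x)
    return [(index[u], index[v]) for u, v in edges], labels

def reattach(int_edges, labels):
    """Inverse of relabel: put the original labels back on the edges."""
    return [(labels[i], labels[j]) for i, j in int_edges]

# Toy labels standing in for echelon basis matrices (one tuple per row).
A = ((0, 1, 0),)
B = ((1, 0, -1), (0, 1, -1))
C = ((1, 0, 0),)
edges = [(A, B), (C, B)]

int_edges, labels = relabel(edges)
text = json.dumps({"edges": int_edges, "labels": labels})  # tuples become JSON lists

loaded = json.loads(text)
labels_back = [tuple(tuple(row) for row in lab) for lab in loaded["labels"]]
edges_back = reattach(loaded["edges"], labels_back)
assert edges_back == edges
```

This edge-list version ignores isolated vertices; a real graph would also need its vertex list stored alongside the edges.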

If this sort of unpickling incompatibility recurs in the future, complex objects will be a pain to save, because the most abstract structures will be the most painful to restore or to save in an elementary form.

In case you are curious, for the submitted toy-example the command 

.relabel(return_map=True) 

returns the labelling map for the encoded directed graph:

```
{Free module of degree 3 and rank 1 over Integer Ring Echelon basis matrix: [0 1 0]: 1,
 Free module of degree 3 and rank 2 over Integer Ring Echelon basis matrix: [ 1 0 -1] [ 0 1 -1]: 0,
 Free module of degree 3 and rank 1 over Integer Ring Echelon basis matrix: [ 1 -1 1]: 2,
 Free module of degree 3 and rank 1 over Integer Ring Echelon basis matrix: [1 0 0]: 4,
 Free module of degree 3 and rank 2 over Integer Ring Echelon basis matrix: [1 0 0] [0 0 1]: 3,
 Free module of degree 3 and rank 2 over Integer Ring Echelon basis matrix: [1 0 0] [0 1 0]: 12,
 Free module of degree 3 and rank 1 over Integer Ring Echelon basis matrix: [ 1 0 -1]: 8,
 Free module of degree 3 and rank 1 over Integer Ring Echelon basis matrix: [ 0 1 -1]: 6,
 Free module of degree 3 and rank 1 over Integer Ring Echelon basis matrix: [ 1 -1 0]: 9,
 Free module of degree 3 and rank 2 over Integer Ring Echelon basis matrix: [ 1 -1 0] [ 0 0 1]: 7,
 Free module of degree 3 and rank 2 over Integer Ring Echelon basis matrix: [ 1 0 0] [ 0 1 -1]: 10,
 Free module of degree 3 and rank 1 over Integer Ring Echelon basis matrix: [0 0 1]: 11,
 Free module of degree 3 and rank 2 over Integer Ring Echelon basis matrix: [0 1 0] [0 0 1]: 5}
```


Nils Bruin

Aug 3, 2020, 3:45:46 PM
to sage-devel
Looking at the offending code (in matrix_integer_dense.pyx, line 535):

```
def _unpickle(self, data, int version):
    if version == 0:
        if isinstance(data, bytes):
            self._unpickle_version0(data)
        elif isinstance(data, list):
            self._unpickle_matrix_2x2_version0(data)
        else:
            raise RuntimeError("invalid pickle data")
    else:
        raise RuntimeError("unknown matrix version (=%s)"%version)
```

It's the classic Py2/Py3 unpickling incompatibility: the roles of str/bytes/unicode shifted between Python 2 and Python 3, which rather fundamentally breaks pickling of them. It can be fixed fairly easily. We're probably unpickling what used to be a Python 2 (str == bytes) object as a Python 3 latin1-encoded string, because most Py2 str/bytes objects are supposed to be strings (and latin1 has the nice property that it preserves bit patterns). So in the code above, we just need to check for a str first and turn it back into bytes.

I think we want to change this to something like:

```
         if version == 0:
+            if isinstance(data, str): #old Py2 pickle: old "bytes" object reaches us as a latin1-encoded string
+                data = data.encode('latin1')
             if isinstance(data, bytes):
```
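Why the latin1 trick is safe can be checked in a few lines of plain Python: decoding arbitrary bytes as latin1 and encoding them again gives back exactly the same bytes, for all 256 byte values. The function below mirrors the patched dispatch in plain Python; its return values are made up for the illustration, whereas the real Sage method calls `_unpickle_version0` and `_unpickle_matrix_2x2_version0`.

```python
def unpickle_v0(data):
    """Plain-Python mirror of the patched version-0 branch (return values are illustrative)."""
    if isinstance(data, str):
        # Old Py2 pickle: the Py2 str (== bytes) arrives as a latin1-decoded str.
        data = data.encode("latin1")
    if isinstance(data, bytes):
        return ("bytes", data)
    elif isinstance(data, list):
        return ("list", data)
    raise RuntimeError("invalid pickle data")

raw = bytes(range(256))
as_py3_str = raw.decode("latin1")          # what a latin1-decoding unpickler hands over
assert as_py3_str.encode("latin1") == raw  # every bit pattern survives the round trip
assert unpickle_v0(as_py3_str) == ("bytes", raw)
```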


Making this change is really worthwhile: these are small problems that are hard to find without a "pickle jar" with good coverage. Each of them is probably fairly easy to fix. These problems come up particularly when data is encoded as "bytes", which happens for optimized picklers. There aren't that many of them, so the number of changes required is probably limited, and each of them can have a huge positive knock-on effect, as you can see here.

(If I have important, expensive computational data I want to save for posterity, I would not go for Pickle anyway: you want something that's platform-independent. Sage will die at some point in the future too, and Pickle is not a format that is that easy to maintain: It's not archive quality in the librarian sense).

Paul Leopardi

Aug 20, 2020, 9:37:36 PM
to sage-devel

Paul Leopardi

Aug 23, 2020, 3:42:48 AM
to sage-devel
I applied the change suggested by Nils Bruin (above) to Sage 9.1 installed from source on Kubuntu 20.04, and it works for me.

E. Madison Bray

Aug 31, 2020, 9:57:10 AM
to sage-devel
On Wed, Jul 29, 2020 at 9:02 PM TB <math...@gmail.com> wrote:
>
> One disadvantage of pickle is safety, in the computer security sense. See the warning at [1] about it, that it is possible to execute arbitrary code during unpickling. This of course also happens to be useful in Sage [2].
>
> If it happens that your data can be (easily) represented as iterated Python containers (tuples, lists, dicts or sets) containing strings, integers or booleans, then a textual format can be a good fit. JSON, YAML or even ast.literal_eval [3] sound appropriate. More complex objects are indeed more complex to handle. At least for polynomials the parser at [4] might help.
>
> JSON or YAML are interoperable, which is important if you would like to use the data in another system. It does mean that doing something clever with the data might still require porting code from Sage that handles the data.

I have suggested many times, for this reason, that Sage needs a standard
interface for converting Sage objects to JSON-compatible data
structures (which would have to be defined, preferably along with a
schema, for any and all types that want to support this), possibly
with some optional support for BSON for a more efficient binary
representation of large objects.
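To make the proposal concrete, here is one possible shape for such an interface, as a plain-Python sketch: each supporting type gets a stable tag and a schema version, and a registry maps tags back to decoders. None of these names exist in Sage; `IntMatrix`, `register`, `to_json` and `from_json` are hypothetical.

```python
import json

_DECODERS = {}

def register(tag):
    """Class decorator: record cls under a stable, version-independent tag."""
    def deco(cls):
        cls.json_tag = tag
        _DECODERS[tag] = cls
        return cls
    return deco

@register("matrix/integer")
class IntMatrix:
    schema_version = 1

    def __init__(self, nrows, ncols, entries):
        self.nrows, self.ncols = nrows, ncols
        self.entries = [list(r) for r in entries]

    def __eq__(self, other):
        return (isinstance(other, IntMatrix) and
                (self.nrows, self.ncols, self.entries) ==
                (other.nrows, other.ncols, other.entries))

    def to_data(self):
        return {"nrows": self.nrows, "ncols": self.ncols, "entries": self.entries}

    @classmethod
    def from_data(cls, data, version):
        # A later release could add branches here to keep reading old schema versions.
        if version != 1:
            raise ValueError(f"unsupported schema version {version}")
        return cls(data["nrows"], data["ncols"], data["entries"])

def to_json(obj):
    return json.dumps({"type": obj.json_tag, "version": obj.schema_version,
                       "data": obj.to_data()})

def from_json(text):
    doc = json.loads(text)
    cls = _DECODERS[doc["type"]]
    return cls.from_data(doc["data"], doc["version"])

m = IntMatrix(2, 2, [[1, 2], [3, 4]])
text = to_json(m)
print(text)
assert from_json(text) == m
```

The explicit version field is the design point: it lets decoders keep accepting old files deliberately, instead of depending on internal class layouts the way pickles do.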

I would be happy to help with such an effort from the technical side,
but I'm not the best person to lead it since it cuts to the core of
the subject of mathematical data interchange, a subject on which there
are experts, and I am not one of them.

But I really do wish we would de-emphasize the use of pickle for
saving Sage objects. It's perfectly good for caching certain
computations and storing large computational results to disk for
near-future use, as well as for distribution over clusters and the
like. But we need to make it very clear to users that this is *not*
an archival data format, while at the same time offering something
better that is.

E. Madison Bray

Aug 31, 2020, 9:58:39 AM
to sage-devel
P.S. I think it would be extremely cool if Sage adopted the ASDF file
format for serializing Sage Objects, as it supports a mix of
structured metadata and binary data, which would be useful for a broad
range of purposes. But I'm biased ;)
https://asdf-standard.readthedocs.io/en/1.5.0/

thierry

Sep 1, 2020, 12:10:45 PM
to sage-...@googlegroups.com
Hi,

perhaps a poor man's approach that could be considered in the strategy
is to extend the ability of Sage to provide to the user a way to
regenerate from the interpreter the objects that it constructed, with
the sage_input function:

sage: M = random_matrix(ZZ,3,3)
sage: sage_input(M)
matrix(ZZ, [[1, -3, 1], [-3, 0, 1], [4, -3, -1]])

Such a non-binary "format" (a sequence of Sage commands) seems to be
more stable in the longer term, and if something goes wrong due to
backward incompatibilities, one can still deal with it by hand.
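Outside Sage, the same "store the commands that rebuild the object" idea can be imitated with `repr` for types whose repr is valid source, evaluated later in a namespace that contains only the constructors you expect. This is a plain-Python analogue, not how `sage_input` works internally; the use of `fractions.Fraction` and the helper names are illustrative assumptions, and `eval` should only ever be applied to trusted files.

```python
from fractions import Fraction

def to_source(obj):
    """Return Python source text that rebuilds obj (works when repr round-trips)."""
    return repr(obj)

def from_source(src, constructors):
    """Evaluate src with no builtins and only the given constructors in scope."""
    return eval(src, {"__builtins__": {}}, dict(constructors))

data = [Fraction(1, 2), {"k": Fraction(-3, 4)}, (1, 2)]
src = to_source(data)
print(src)  # [Fraction(1, 2), {'k': Fraction(-3, 4)}, (1, 2)]
assert from_source(src, {"Fraction": Fraction}) == data
```

As with `sage_input`, the approach only works for objects whose textual form is complete; anything whose repr is not valid source has to be handled separately.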

Unfortunately, this method is not complete:

sage: ZZ['x'].random_element()
4*x + 1
sage: sage_input(_)
R.<x> = ZZ[]
4*x + 1

but

sage: GF(9)['x'].random_element()
2*z2 + 2
sage: sage_input(_)
ValueError: Can't convert 2*z2 + 2 to sage_input form

Ciao,
Thierry