save/loads and the pickle jar

Vincent Delecroix

Jul 31, 2019, 1:32:01 PM
to sage-devel
Dear developers,

It appears that a Sage object saved with the "save" command sometimes
cannot be loaded again with "load". This is the second time in a month
that people have complained that it is not working (because of a Sage
version mismatch).

situation number 1: attached to https://arxiv.org/abs/1802.06923
there are various sobj files (storing matrices over number fields).
These files no longer load on more recent Sage versions!

(For some strange reason the pickle is half broken: it contains
"implementation='flint'", which is not available over number fields.)
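A minimal plain-Python sketch of how such a pickle can break. It assumes that the object pickles itself by recording a constructor call, so a stored keyword like implementation='flint' is replayed at load time. If a newer release rejects that keyword, loading fails. The module name `fakelib`, the helper `publish`, and the classes `MatrixV1`/`MatrixV2` are hypothetical stand-ins, not Sage's actual matrix code.

```python
import pickle
import sys
import types

# Hypothetical module standing in for "the library"; not real Sage code.
lib = types.ModuleType("fakelib")
sys.modules["fakelib"] = lib


def publish(cls):
    """Install cls as fakelib.Matrix, simulating a library release."""
    cls.__module__ = "fakelib"
    cls.__qualname__ = "Matrix"
    lib.Matrix = cls
    return cls


@publish
class MatrixV1:
    """Old release: any implementation name is accepted."""

    def __init__(self, entries, implementation="generic"):
        self.entries = entries
        self.implementation = implementation

    def __reduce__(self):
        # The pickle records "call fakelib.Matrix(entries, implementation)",
        # so the keyword argument is replayed verbatim at load time.
        return (type(self), (self.entries, self.implementation))


old_pickle = pickle.dumps(MatrixV1([[1, 2], [3, 4]], implementation="flint"))


@publish
class MatrixV2(MatrixV1):
    """New release: 'flint' is rejected for this (hypothetical) base ring."""

    def __init__(self, entries, implementation="generic"):
        if implementation == "flint":
            raise ValueError("implementation='flint' is not available here")
        super().__init__(entries, implementation)


def try_load(blob):
    """Return the unpickled object, or the error message if loading fails."""
    try:
        return pickle.loads(blob)
    except ValueError as err:
        return f"load failed: {err}"


print(try_load(old_pickle))  # load failed: implementation='flint' is not available here
```

The stored bytes are fine. What breaks is the replayed constructor call, which is checked against the *new* code.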

situation number 2: a colleague has access to two Sage versions, one on
his laptop and one on a server. He hoped that by doing "save" on the
server and "load" on his laptop he would be able to transfer
data... Again, it does not work (with symbolic expressions).

In the current situation I think we should disable save/load (or at
least warn) to make it clear that it is not safe across versions.

I opened the following ticket for the issue:

https://trac.sagemath.org/ticket/28302

On the other hand, I think this feature is almost essential to Sage
(as shown by the fact that two people complained to me about it). The
early developers of Sage had the wonderful idea of the pickle jar, but
it only contains old objects. Let me recall the existence of the
long-standing ticket

https://trac.sagemath.org/ticket/10768

Best
Vincent

E. Madison Bray

Aug 2, 2019, 5:47:04 AM
to sage-devel
On Wed, Jul 31, 2019 at 7:32 PM Vincent Delecroix
As I have written before, pickle is not an appropriate format for
long-term stable serialization, and never has been, as it is
inherently tied to the code which produced it (and is highly
Python-specific, at that). If this is an endemic problem for someone,
they should use a different serialization format.
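To illustrate "inherently tied to the code which produced it", here is a plain-Python sketch. A pickle stores only the import path of an object's class, not the class itself, so renaming or moving the class in a later release makes old pickles unloadable. The module `oldlib` and class `Point` are hypothetical.

```python
import pickle
import sys
import types

# Hypothetical module standing in for the code that wrote the pickle.
oldlib = types.ModuleType("oldlib")
sys.modules["oldlib"] = oldlib


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


Point.__module__, Point.__qualname__ = "oldlib", "Point"
oldlib.Point = Point

data = pickle.dumps(Point(1, 2))
# The pickle stores the import path of the class, not the class itself.
print(b"oldlib" in data, b"Point" in data)  # True True

# Simulate a later release in which Point was renamed or moved.
del oldlib.Point


def try_load(blob):
    """Return the unpickled object, or the error message if loading fails."""
    try:
        return pickle.loads(blob)
    except (AttributeError, ImportError) as err:  # missing class or module
        return f"cannot load: {err}"


print(try_load(data))
```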

Unfortunately, Sage does not currently have a standard, non-pickle way
of serializing most objects (and indeed the formats one might use to
serialize an object might depend heavily on what type of object it is,
and how it is to be used).

Sage's use of pickle for save()/load() is a mis-feature IMO: one that
made sense at the time, for lack of a better choice, and one that is
still useful as a means of saving/restoring some objects between
sessions. But I don't think you can always guarantee it to be stable.

Simon King

Aug 2, 2019, 6:36:46 AM
to sage-...@googlegroups.com
Hi,

On 2019-08-02, E. Madison Bray <erik....@gmail.com> wrote:
> Sage's use of pickle for save()/load() is a mis-feature IMO. One that
> made sense at a time, for lack of a better choice. And that's still
> useful as a means of saving/restoring some objects between sessions.
> But I don't think you can always guarantee it to be stable.

What better format do you suggest?

I am surprised that you see a fundamental problem with Sage's pickling. If
code or even the data structures change, it is still possible to
fit old data (pickles) into the new data format. Evidently, when you
decide to change the data structures in your algorithms, you *have* to
write a function that translates old data into the new format, but
on the positive side, you *can* write such a function.
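A plain-Python sketch of such a translation function. This uses Python's standard `__setstate__` hook, not any Sage-specific mechanism. A hypothetical class changes its internal layout between two releases, and the new release upgrades legacy state on load. The module `mylib`, the helper `release`, and the field names are assumptions made for illustration.

```python
import pickle
import sys
import types

lib = types.ModuleType("mylib")  # hypothetical library module
sys.modules["mylib"] = lib


def release(cls):
    """Simulate shipping cls as mylib.Poly in a new library version."""
    cls.__module__, cls.__qualname__ = "mylib", "Poly"
    lib.Poly = cls
    return cls


@release
class PolyV1:
    # Old layout: coefficients stored in a list attribute named "c".
    def __init__(self, coeffs):
        self.c = list(coeffs)


old_data = pickle.dumps(PolyV1([1, 0, 3]))


@release
class PolyV2:
    # New layout: an immutable tuple "coeffs" plus an explicit variable name.
    def __init__(self, coeffs, variable="x"):
        self.coeffs = tuple(coeffs)
        self.variable = variable

    def __setstate__(self, state):
        # Translation function for legacy pickles: map old fields to new ones.
        if "c" in state:
            state = {"coeffs": tuple(state["c"]), "variable": "x"}
        self.__dict__.update(state)


p = pickle.loads(old_data)
print(type(p).__name__, p.coeffs, p.variable)  # PolyV2 (1, 0, 3) x
```

New pickles written by `PolyV2` carry the new layout and pass through `__setstate__` unchanged. Only the legacy branch needs to be maintained.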

Is there a different approach that allows you to change your code over time
(evidently that's absolutely needed for a project that isn't dead!)
while still being able to read data that you have stored with your old
code?

Best regards,
Simon

Nils Bruin

Aug 2, 2019, 11:45:00 AM
to sage-devel
On Friday, August 2, 2019 at 2:47:04 AM UTC-7, E. Madison Bray wrote:
As I have written before, pickle is not an appropriate format for
long-term stable serialization, and never has been, as it is
inherently tied to the code which produced it (and is highly
Python-specific, at that).  If this is an endemic problem for someone,
they should use a different serialization format.

If one reads the documentation of "pickle" in Python, one does get the idea that it is designed to provide serialization that should also work over longer time stretches. It would take a lot of discipline to do the versioning correctly, and one shouldn't start supporting pickling on new data structures too soon (it could lead to horribly expensive legacy support when one changes the way data is stored). It's certainly the kind of serialization format one comes up with for storing complicated data structures such as those in computer algebra.
In principle, the discipline can be helped a lot by having a pickle jar that provides good coverage of (legacy) pickles.
I agree that it's very ambitious to try to support pickling across Sage, across time, and with the loose feature management and high diversity in developer interests it may well be unachievable/unmaintainable. But I think this is more a problem with the task than with the pickle format.
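A minimal sketch of the kind of regression check a pickle jar enables. It is plain Python, not Sage's actual pickle-jar test. The jar maps names to stored pickle bytes plus the value each must still decode to. Here the entries are generated on the fly as an assumption; a real jar would hold files written by older releases.

```python
import pickle
from fractions import Fraction

# Hypothetical jar: name -> (pickle bytes, value it must still decode to).
JAR = {
    "fraction_p2": (pickle.dumps(Fraction(3, 4), protocol=2), Fraction(3, 4)),
    "nested_p0": (pickle.dumps({"a": [1, 2]}, protocol=0), {"a": [1, 2]}),
}


def check_jar(jar):
    """Return the names of entries that no longer load to the expected value."""
    failures = []
    for name, (blob, expected) in jar.items():
        try:
            if pickle.loads(blob) != expected:
                failures.append(name)
        except Exception:  # any load error counts as a regression
            failures.append(name)
    return failures


print(check_jar(JAR))  # []
```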

I agree that for data storage that really needs to be able to stand the test of time, one needs to go with something human readable/copy-pastable. It's still open for misinterpretation, but at least one stands a chance of decoding it when the original tools have disappeared. In reality, the important thing is to properly document how the data was generated in the first place.
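As a sketch of the human-readable alternative, here is one possible plain-JSON encoding of a rational matrix. It carries an explicit format version and a free-text note on how the data was generated. The schema (`"rational-matrix"`, `format_version`, `generated_by`) is a hypothetical example, not an existing standard.

```python
import json
from fractions import Fraction


def matrix_to_json(rows, note):
    """Serialize a rational matrix as plain, versioned JSON (hypothetical schema)."""
    return json.dumps({
        "format": "rational-matrix",
        "format_version": 1,
        "generated_by": note,  # how the data was produced, for future readers
        "nrows": len(rows),
        "ncols": len(rows[0]) if rows else 0,
        "entries": [[str(x) for x in row] for row in rows],  # e.g. "5/7"
    }, indent=2)


def matrix_from_json(text):
    """Parse the JSON back, refusing formats this reader does not know."""
    doc = json.loads(text)
    if doc.get("format") != "rational-matrix" or doc.get("format_version") != 1:
        raise ValueError("unsupported format")
    return [[Fraction(x) for x in row] for row in doc["entries"]]


m = [[Fraction(1, 2), Fraction(0)], [Fraction(-3), Fraction(5, 7)]]
text = matrix_to_json(m, "hypothetical: computed by some script")
print(text)
assert matrix_from_json(text) == m
```

Entries are stored as strings like "1/2" so they stay exact and copy-pastable into any system that reads rationals.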

E. Madison Bray

Aug 12, 2019, 8:42:30 AM
to sage-devel
This is partly why we invented ASDF [1]. ASDF is also quite complex, and
can be used to store arbitrarily complicated data structures. But
it's mostly human-readable. I say "mostly" because it does support
blocks of binary data, though most of the time binary data is stored
through a binary array data structure which uses plain text to
describe the array format, so as long as the format of the binary part
is itself reasonably simple, it's easy to reconstruct using the
metadata in the plain-text portions.
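To make the "plain-text metadata describing a binary block" idea concrete, here is a toy sketch. It is NOT the ASDF format: ASDF itself uses a YAML tree plus its own binary-block headers. This sketch substitutes a one-line JSON header and `struct`-packed little-endian float64 values purely for illustration. The header fields `datatype`, `byteorder` and `shape` are hypothetical names.

```python
import json
import struct


def write_block(values, shape):
    """Plain-text JSON header line, then the raw float64 payload (toy format)."""
    header = {"datatype": "float64", "byteorder": "little", "shape": shape}
    payload = struct.pack("<%dd" % len(values), *values)
    # json.dumps without indent emits no newline, so b"\n" safely ends the header.
    return json.dumps(header).encode("ascii") + b"\n" + payload


def read_block(blob):
    """Decode the binary part using only the plain-text metadata."""
    head, _, payload = blob.partition(b"\n")
    meta = json.loads(head)
    if meta["datatype"] != "float64":
        raise ValueError("unsupported datatype")
    prefix = "<" if meta["byteorder"] == "little" else ">"
    count = 1
    for n in meta["shape"]:
        count *= n
    values = struct.unpack(prefix + "%dd" % count, payload)
    return meta["shape"], list(values)


blob = write_block([1.0, 2.5, -3.0, 4.0], [2, 2])
print(blob.partition(b"\n")[0].decode("ascii"))  # the human-readable part
print(read_block(blob))  # ([2, 2], [1.0, 2.5, -3.0, 4.0])
```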

Although it was designed primarily with astronomy applications in
mind, the core format is domain-agnostic. It would be really neat to
see some ASDF "schemas" (descriptions of how specific types of data
are serialized in ASDF) for pure mathematics.

[1] https://en.wikipedia.org/wiki/Advanced_Scientific_Data_Format

Volker Braun

Aug 12, 2019, 8:45:55 AM
to sage-devel
Not to be confused with Another System Definition Facility, the de facto standard build facility for Common Lisp...

Nils Bruin

Aug 12, 2019, 12:41:03 PM
to sage-devel
On Monday, August 12, 2019 at 5:42:30 AM UTC-7, E. Madison Bray wrote:

Although it was designed primarily with astronomy applications in
mind, the core format is domain-agnostic.  It would be really neat to
see some ASDF "schemas" (descriptions of how specific types of data
are serialized in ASDF) for pure mathematics.

[1] https://en.wikipedia.org/wiki/Advanced_Scientific_Data_Format

On that scale of operation, there have been other initiatives; for instance OpenMath/MathML.
I don't think OpenMath managed to get much traction. From the perspective of a single computer algebra system, there's always the problem that OpenMath isn't fully suited for the particular niche data that system is interested in. If you go back, you'll find that in the early days Sage actually had a design criterion that overlapped with some of the intended applications of OpenMath: interoperability of math software. I'd say Sage has (in practice) been more successful in achieving that than OpenMath.

If you're interested in positioning ASDF for interoperability and long-term object storage in mathematics it would probably be good to compare it to OpenMath and point out where it's different (if OpenMath has indeed failed -- I may just be not aware of areas where it's been successful).