pickling vs. default attributes


Martin Albrecht

May 20, 2011, 9:29:21 AM
to Sage Development
Hi there,

I have a Python question. Over at

#11316: Weighted degree term orders added

Kwankyu adds weighted term orders to Sage, which is awesome.

However, the patch breaks pickling as follows.

TermOrder objects now have an attribute __weights which is initialised to None
in __init__(). However, by default __init__() is not called when a pickled
Python object is loaded; instead, an empty object is constructed and then its
attribute dictionary is filled with the data from the pickle. Of course, old
pickles do not have a "__weights" entry and hence it does not get initialised
to None ... Boom.
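
To make the failure concrete, here is a minimal sketch in plain Python. The class TermOrderSketch and the helper restore_like_pickle are hypothetical names made up for illustration, not Sage code; the helper only imitates what default unpickling does (create the object without calling __init__(), then fill its attribute dictionary).

```python
class TermOrderSketch:
    """Hypothetical stand-in for a class that gained an attribute later."""

    def __init__(self, name):
        self.__name = name
        self.__weights = None  # attribute added in the newer version

    def weights(self):
        return self.__weights


def restore_like_pickle(cls, state):
    """Imitate default unpickling: no __init__() call, just fill __dict__."""
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj


# State as an old pickle would have stored it: there is no "__weights" entry.
# Keys use the name-mangled form that Python gives to double-underscore attributes.
old_state = {"_TermOrderSketch__name": "degrevlex"}
old_obj = restore_like_pickle(TermOrderSketch, old_state)

try:
    old_obj.weights()
    failed = False
except AttributeError:
    failed = True  # the missing attribute only surfaces on first access
```

A freshly constructed TermOrderSketch("lex") works, because __init__() sets the attribute; only the restored object fails.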

What's the least intrusive way of dealing with this, i.e. of ensuring that this
attribute is always initialised to None when an object of this type is
created? A class attribute or something?

Cheers,
Martin


--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_otr: 47F43D1A 5D68C36F 468BAEBA 640E8856 D7951CCF
_www: http://martinralbrecht.wordpress.com/
_jab: martinr...@jabber.ccc.de

Kwankyu Lee

May 22, 2011, 1:11:45 AM
to sage-...@googlegroups.com
Hi Martin,

Thank you for raising this on sage-devel.

I would like to understand better the pickled Sage objects stored in the pickle jar. First let me explain what I understand. The pickle jar is there to ensure that upgrading Sage does not break unpickling of Sage objects pickled in a previous version of Sage. That is good for backward compatibility. But to enhance Sage, I guess, it is frequently necessary to change the internal logic or data structure of objects, which may lead to unpickling failures. If this happens, I think the proper way to deal with the situation is not to tweak the code to handle the old pickled objects, but to replace the old pickled objects with new objects pickled with the upgraded Sage, and to provide some means to (hopefully seamlessly) upgrade the old objects pickled by users of previous versions of Sage (or just inform users about this backward incompatibility). This policy is different from the one for the user interface. I think that while we should be very conservative in changing, for example, signatures of methods or functions, we should be less so in changing internals for the purpose of enhancing Sage.

I have only been looking into pickling and the pickle jar for a few days. Let me know if I have misunderstood something.


Kwankyu

Simon King

May 22, 2011, 3:26:11 AM
to sage-devel
Hi Kwankyu

On 22 Mai, 07:11, Kwankyu Lee <ekwan...@gmail.com> wrote:
> I want to be more informed about this pickled Sage objects stored in the
> pickle jar. First let me explain what I understand. The pickle jar is there
> to ensure that upgrading Sage does not break unpickling of Sage objects
> pickled in a previous version of Sage.

I agree.

> But to enhance Sage, I guess, it is frequently necessary to
> change internal logic or data structure of objects, which may lead to
> unpickling failures.

And this must not happen. If the internal data structure has changed,
then it is imperative that data stored in the old format will be
correctly translated into the new format when unpickling.

> If this happens, I think a proper way to deal with this
> situation is not to tweak the code to deal with the old pickled objects, but
> to replace the old pickled objects with new objects pickled with the
> upgraded Sage, and to provide some means to (hopefully seamlessly) upgrade
> the old objects pickled by users of previous versions of Sage (or just
> inform users about this backward incompatibility).

I disagree.

> This policy is different
> from the one for user interface. I think while we should be very
> conservative in changing, for example, signatures of methods or functions,
> we should be less so in changing internals for the purpose of enhancing
> Sage.

"Changing internals" and "breaking pickles" are two very different
things, IMO. Unpickling old pickles should take care of a change of
internal representation. What you call "upgrade the old objects
pickled by users" must be part of the usual unpickling process.

@Martin:

I tried your idea of using a class attribute to initialise an instance
attribute as None. It seemed to work. But perhaps the Python experts
on this list know a pitfall...

Cheers,
Simon

Kwankyu Lee

May 22, 2011, 8:45:03 PM
to sage-...@googlegroups.com
Hi Simon,

It seems to me that your remarks

"If the internal data structure has changed, then it is imperative that data stored in the old format will be correctly translated into the new format when unpickling. ... What you call 'upgrade the old objects pickled by users' must be part of the usual unpickling process."

are a nice form of my "seamlessly upgrading old pickles". My worry, then, is that the "translating" code may clutter the main code implementing the objects. I feel that there should be some infrastructure to guide writing the "translating" code, instead of each author doing it ad hoc. I imagine that the class of the objects defines a "_translate_objects_from_xxx_to_XXX(self)" method, where xxx is the old version and XXX is the newer version, and that the unpickling procedure automatically invokes these methods to upgrade old objects pickled by Sage ver. xxx to objects of Sage ver. XXX. An author changing the data structure may then add such a method. The translation may happen in several steps, from xxx1 to xxx2, from xxx2 to xxx3, and so on to XXX.


Kwankyu




Simon King

May 23, 2011, 2:42:20 AM
to sage-devel
On 23 Mai, 02:45, Kwankyu Lee <ekwan...@gmail.com> wrote:
> Then my worry is
> that the "translating" code may clutter the main code implementing the
> objects.

No. The main code implementing the objects is not touched by pickling/
unpickling.

> I feel that there should be some infrastructure to guide writing
> the "translating" code instead of each author doing it ad hoc. I imagine
> that the class of objects define "_translate_objects_from_xxx_to_XXX(self)"
> method where xxx is the old version and XXX is the newer version, and the
> unpickling procedure automatically invoke the methods for old objects
> pickled by Sage ver. xxx to upgrade them to objects of Sage ver. XXX. An
> author changing the data structure then may add such a method. The
> translation may happen in several steps from xxx1 to xxx2, and from xxx2 to
> xxx3, and so on to XXX.

That is all done by the usual Python or Cython pickling methods. See
http://docs.python.org/library/pickle.html#pickle-protocol

The basic idea is that you do *not* store implementation details or a
bit-wise copy of the memory. Instead, you store a reference to a
function unpickle_function, together with arguments A to that
function, so that unpickle_function(*A) reconstructs the object that
you wanted to pickle.
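
In plain Python, the idea can be sketched roughly as follows. The class Ring and the function unpickle_ring are hypothetical toy names, not part of Sage; the sketch only shows the shape of what __reduce__ returns and how the pair is used to rebuild the object.

```python
class Ring:
    """Hypothetical toy class; not Sage's polynomial ring."""

    def __init__(self, base, names, order):
        self._base = base
        self._names = list(names)
        self._order = order

    def __reduce__(self):
        # Tell pickle: "call unpickle_ring(base, names, order) to rebuild me".
        # Only the function reference and its arguments are stored.
        return (unpickle_ring, (self._base, self._names, self._order))

    def __repr__(self):
        return "Ring in %s over %s (%s)" % (
            ", ".join(self._names), self._base, self._order)


def unpickle_ring(base, names, order):
    """Rebuild a Ring from the arguments recorded by __reduce__."""
    return Ring(base, names, order)


P = Ring("QQ", ["a", "b", "c"], "degrevlex")
func, args = P.__reduce__()
Q = func(*args)  # a new object with the same data as P
```

Nothing about the internal attribute layout of Ring appears in the pair (func, args), which is what leaves room for changing the internals later.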

EXAMPLE:

If you have an extension class, then pickling relies on a method
__reduce__, such as here:

sage: P.<a,b,c> = PolynomialRing(QQ)
sage: L = P.__reduce__(); L
(<built-in function unpickle_MPolynomialRing_libsingular>,
(Rational Field, ['a', 'b', 'c'], Degree reverse lexicographic term
order))
sage: L[0](*L[1])
Multivariate Polynomial Ring in a, b, c over Rational Field

So, the unpickle_function returns a polynomial ring with a given
basering, list of variable names, and term order.

Here is the string that is used to save P:

sage: dumps(P, compress=False)
'\x80\x02csage.rings.polynomial.multi_polynomial_libsingular
\nunpickle_MPolynomialRing_libsingular\nq\x01csage.rings.rational_field
\nRationalField\nq\x02)Rq\x03]q\x04(U\x01aU\x01bU
\x01cecsage.rings.polynomial.term_order\nTermOrder\nq\x05)\x81q\x06}q
\x07(U\x18_TermOrder__singular_strq\x08U\x02dpq\tU\x10_TermOrder__nameq
\nU\tdegrevlexq\x0bU\x07__doc__q\x0cU\xf3\nDegree reverse
lexicographic (degrevlex) term ordering.\n\nLet $deg(x^a) = a_0 + ...
+ a_{n-1},$ then $x^a < x^b <=> deg(x^a) < deg(x^b)$ or\n$deg(x^a) =
deg(x^b)$ and $\\exists\\ 0 <= i < n: a_{n-1} = b_{n-1}, ..., a_{i+1}
= b_{i+1}, a_i > b_i.$\nq\rU\x12_TermOrder__matrixq\x0eNU
\x19_TermOrder__macaulay2_strq\x0fU\x07GRevLexq\x10U
\x12_TermOrder__lengthq\x11K\x03U\x06blocksq\x12h\x05)\x81q\x13}q\x14(h
\x08h\th\nU\tdegrevlexq\x15h\x0ch\rh\x0eNh\x0fh\x10h\x11K\x03h\x12)U
\x15_TermOrder__magma_strq\x16U\t"grevlex"q\x17U\x11_TermOrder__forceq
\x18\x89ub\x85q\x19h\x16h\x17h\x18\x89ub\x87Rq\x1a.'

So, the first part of the string (that would usually be compressed, of
course) only describes where the unpickle function can be found, *not*
how it is implemented (and that is essential!).

Now, assume that back in the old times you had a "Polynomial ring over
QQ generated by a,b in degrevlex order, implemented in libSingular",
and you stored it by save(P, 'my_pickle.sobj').

Later, William Stein decides to replace libSingular by a new
implementation of polynomial rings, say, written in Lisp (he sometimes
does those things on April 1st).

William could completely remove libSingular from sage, as long as he
keeps a function
sage.rings.polynomial.multi_polynomial_libsingular.unpickle_MPolynomialRing_libsingular.
That function should still accept a ring, a list of strings and a term
order as arguments, but it should now return an instance of his new
Lisp-implementation of polynomial rings, with the given basering,
variable names, and term order.

Then, load('my_pickle.sobj') will simply return a "Polynomial ring
over QQ generated by a,b in degrevlex order, implemented in Lisp".

Note that there was no "cluttering William's nice new Lisp code by
details of how libSingular used to work", and no transformation of
your old pickle in a new pickle format was needed.
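
As a toy illustration of that scenario (all names except the unpickle function's are made up, and the lookup dictionary is only a stand-in for the module namespace in which pickle resolves the stored name):

```python
class LispPolynomialRing:
    """Hypothetical replacement implementation (toy)."""

    def __init__(self, base, names, order):
        self.base = base
        self.names = list(names)
        self.order = order


def unpickle_MPolynomialRing_libsingular(base, names, order):
    # Kept only for old pickles: same name, same arguments,
    # but it now returns an instance of the new implementation.
    return LispPolynomialRing(base, names, order)


# Toy stand-in for what an old pickle records: a function name plus arguments.
old_pickle = ("unpickle_MPolynomialRing_libsingular",
              ("QQ", ["a", "b"], "degrevlex"))

# Toy stand-in for the module namespace where the name is looked up.
namespace = {
    "unpickle_MPolynomialRing_libsingular":
        unpickle_MPolynomialRing_libsingular,
}

name, args = old_pickle
P = namespace[name](*args)  # old data, new implementation
```

The old record is untouched; only the function behind the recorded name has changed.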

Cheers,
Simon

Simon King

May 23, 2011, 5:15:31 AM
to sage-devel
Hi Martin,

sorry that this thread has changed topic. Back to your original
question:

On 20 Mai, 15:29, Martin Albrecht <martinralbre...@googlemail.com>
wrote:
> What's the least intrusive way of dealing with this: i.e. to ensure that this
> attribute is always initialised to None when an object of this type is
> created. A class attribute or something?

It seems to me that it would work, such as here:

sage: class MyClass(object):
....:     __weight = None
....:     def __init__(self, weight):
....:         self.__weight = weight
....:     def test(self):
....:         return self.__weight
....:
sage: A = MyClass(3)
sage: A.test()
3 # no surprise

Now, we create a non-initialised instance of MyClass:
sage: B = A.__new__(MyClass)

The attribute is correctly initialised to None (so, during
initialisation of A, the value of the class attribute has not been
changed):
sage: print B.test()
None

But, to be honest, I'd appreciate it if some Python expert could explain
whether there are pitfalls in that approach.

Of course, one could easily override the default value of the
attribute (too easily, perhaps):
sage: B.__class__._MyClass__weight = 15 # use Python's name mangling
sage: C = A.__new__(MyClass)
sage: C.test()
15 # and not None.

Cheers,
Simon

Martin Albrecht

May 23, 2011, 5:42:50 AM
to sage-...@googlegroups.com

I think this is fine. Python does allow you to shoot yourself in the foot like
this, but it's unlikely to happen by accident. My approach was to implement
__getattr__ for _weights, but yours is less code :)
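
For comparison, here is a rough sketch of the two approaches side by side. The class names are hypothetical and this is not the code from the ticket; it only assumes an attribute called _weights that old objects lack.

```python
class WithGetattr:
    """Fallback via __getattr__ (hypothetical sketch)."""

    def __init__(self, weights=None):
        self._weights = weights

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup has failed.
        if name == "_weights":
            return None
        raise AttributeError(name)


class WithClassDefault:
    """Fallback via a class attribute (hypothetical sketch)."""

    _weights = None  # found on the class when the instance has no entry

    def __init__(self, weights=None):
        self._weights = weights


# Objects created without __init__(), as happens when unpickling by default.
old_a = WithGetattr.__new__(WithGetattr)
old_b = WithClassDefault.__new__(WithClassDefault)
```

Both old_a._weights and old_b._weights evaluate to None, while instances built through __init__() keep whatever value they were given.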

Nicolas M. Thiery

Jun 3, 2011, 2:36:06 PM
to sage-...@googlegroups.com
On Mon, May 23, 2011 at 10:42:50AM +0100, Martin Albrecht wrote:
> I think this is fine. Python does allow you to shoot yourself in the foot like
> this, but it's unlikely to happen by accident. My approach was to implement
> __getattr__ for _weights, but yours is less code :)

Besides, I tend to use __getattr__ only as a last resort. Indeed a
given class can only have one __getattr__ method. So if there are
simultaneous needs for __getattr__ (e.g. for categories, for cached
methods, for...), it gets messy to handle them all. Even more so while
keeping efficiency.

Cheers,
Nicolas
--
Nicolas M. Thiéry "Isil" <nth...@users.sf.net>
http://Nicolas.Thiery.name/
