Re: [sage-devel] element_wrapper.py: Sage 4.3.2.alpha1 segfault on Mac OS X 10.6.2

8 views
Skip to first unread message
Message has been deleted

Nicolas M. Thiery

unread,
Feb 2, 2010, 4:16:00 PM2/2/10
to sage-...@googlegroups.com
On Wed, Feb 03, 2010 at 04:28:00AM +1100, Minh Nguyen wrote:
> Hi folks,
> > sage -t "devel/sage/sage/structure/element_wrapper.py" # Segfault
>
> I get the same result on bsd.math (Mac OS X 10.6.2). Doing a verbose
> long doctest, I get:
>
> Trying:
> Integer(1) < l11###line 213:_sage_ >>> 1 < l11
> Expecting:
> False

Ouch. Well, ElementWrapper is a pure python class, and here the crash
is probably in the comparison function for Integer's. Ah, but 1 and
l11 have the same parent (that's just because, for the tests, I needed
a parent to use). Maybe the comparison function of Integer then gets
confused by this, and wrongly assumes that l11 is an Integer and does
some wrong memory manipulation. If that's the case, then the following
should be a fairly minimal code causing the crash:

sage: class F(Element):
....: pass
....:
sage: x = F(ZZ)
sage: 1 < x
False

If this is confirmed, I don't mind using a more sane parent for the
tests. On the other hand, getting a segfault with an (admittedly ill)
piece of pure Python code is not good. Could any expert of the arcanes
of Integer comparison have a look?

Cheers,
Nicolas


> ----------------------------------------------------------------------
> The following tests failed:
>
>
> sage -t -verbose -long "devel/sage-main/sage/structure/element_wrapper.py"
>
> The relevant doctest from sage/structure/element_wrapper.py that
> causes the segfault is:
>
> sage: class MyElement(ElementWrapper):
> ... __lt__ = ElementWrapper._lt_by_value
> ...
> sage: parent1 = ZZ
> sage: parent2 = QQ
> sage: l11 = MyElement(1, parent = parent1)
> sage: l12 = MyElement(2, parent = parent1)
> sage: l21 = MyElement(1, parent = parent2)
> sage: l22 = MyElement(2, parent = parent2)
> sage: l11 < l11
> False
> sage: l11 < l12, l12 < l11 # values differ
> (True, False)
> sage: l11 < l21 # parents differ
> False
> sage: l11 < 1 # class differ
> False
> sage: 1 < l11
> False
>
> --
> Regards
> Minh Van Nguyen
>
> --
> To post to this group, send an email to sage-...@googlegroups.com
> To unsubscribe from this group, send an email to sage-devel+...@googlegroups.com
> For more options, visit this group at http://groups.google.com/group/sage-devel
> URL: http://www.sagemath.org
Nicolas
--
Nicolas M. Thi�ry "Isil" <nth...@users.sf.net>
http://Nicolas.Thiery.name/

Message has been deleted

Craig Citro

unread,
Feb 5, 2010, 5:44:59 AM2/5/10
to sage-devel, Nicolas M. Thiery
> If this is confirmed, I don't mind using a more sane parent for the
> tests. On the other hand, getting a segfault with an (admittedly ill)
> piece of pure Python code is not good. Could any expert of the arcanes
> of Integer comparison have a look?
>

Yep, using ZZ as a parent for something which isn't of class
sage.rings.integer.Integer is what was causing the segfault here.
There's actually a bit more to the story -- in particular, one should
wonder why it only segfaulted on OSX. I'll make some comments about
what was going on below; more importantly, though, is what we should
do about fixing it. In general, there are a lot of parents that assume
that elements with that parent are of some fixed type; breaking this
rule would be bad on two counts: (1) we'd pay the price by having to
do a bunch of typechecks everywhere, which is no good, and (2) the
logic that depends on this isn't clearly defined -- it's just an
*implicit* assumption in lots of code all over the place.

So how should we fix it? Robert Bradshaw pointed out that there should
probably be a corresponding ParentWrapper, that one could use to
create wrapped parents for the wrapped elements. In fact, I think we
should go one step further -- I don't see why you should be able to
end up with an ElementWrapper without the corresponding ParentWrapper.
So passing parent=P should probably just create a wrapper out of P, if
it isn't one already. In general, being able to just choose your
parent out of the blue is a dangerous thing ... this might be a
reasonably controlled way to do so.

In the short term, someone needs to change up the docstrings in
element_wrapper.py, and at the very least, put a big warning in about
the dangers of getting to choose your parents. (There's surely a good
joke I'm missing here.) Nicolas, it might make the most sense for you
to modify the docstrings, if you have time ...

Now for the interesting part: what was going on? Well, the example on
the ticket creates an integer wrapper object, and then calls into the
integer comparison code with it. Since the two have identical parents,
the code lets it in, and passes it off to a function that implicitly
assumes both inputs are of type sage.rings.integer.Integer. Once that
code (_cmp_c_impl) gets its hands on them, it calls off to the gmp
mpz_cmp function to do a comparison on the two underlying mpz_ts.
However, the second argument was some other random Python object --
so, on OS X, it ends up taking some random chunk of the object, and
dereferencing it ... boom.

But why only on OSX? The sage.rings.integer.Integer object stores just
a bit of information and an mpz_t; the mpz_t is the number of
allocated limbs, the number of used limbs, and a pointer to the limbs.
I just ended up printing out the relevant bytes on my laptop and
sage.math ... on my laptop, both of the first two entries were just
large numbers. However, on sage.math, it happens that in the same
situation, the second entry happened to be all zeros. That means that
when mpz_cmp looks at the number, it decides that it's just zero, and
does the comparison without ever trying to actually access the limbs
-- so it works, except that it pretends the number is 0. (Indeed, 0 ==
a will return True on sage.math, for a as in the description on
#8177.) I just chatted with Robert Bradshaw, and at least in one
example on my laptop, creating a weakref will "fix" the problem -- no
more segfault. But then, you've still got a nonsense value it found
elsewhere in memory. (Interestingly, on his machine, he has the
opposite behavior -- a acts like 0 before the weakref, and segfaults
once it's there.)

-cc

Robert Bradshaw

unread,
Feb 5, 2010, 6:32:48 AM2/5/10
to sage-...@googlegroups.com

For the curious, on my machine (little-endian 32-bit):

sage: hex(id(ZZ))
'0x191e6c0'

sage: hexdump(Integer(3))
000: 04 00 00 00
004: 00 34 4d 03
008: a0 b1 4e 03
012: c0 e6 91 01 <-- parent (=ZZ)
016: 0a 00 00 00 <-- mpz_t._alloc
020: 01 00 00 00 <-- mpz_t._size
024: 00 98 33 0a <-- mpz_t._d

sage: b = ElementWrapper(3, ZZ)
sage: hex(id(b.__dict__))
'0xb84b810'

sage: hexdump(ElementWrapper(b, ZZ))
000: 03 00 00 00
004: 00 0d a9 01
008: a0 31 d2 01
012: c0 e6 91 01 <-- parent (=ZZ)
016: 60 46 85 0b <-- __dict__
020: 00 00 00 00 <-- __weakref__ (=NULL)
<-- shorter struct

# acts like Integer(0) as the NULL __weakref__ is interpreted as
mpz_t._size = 0
sage: 5 + b
5
sage: 0 == b
True

sage: f = weakref.ref(b)
sage: hex(id(b.__weakref__))
'0xb86fe70'

sage: hexdump(b)
000: 04 00 00 00
004: 00 0d a9 01
008: a0 31 d2 01
012: c0 e6 91 01
016: 10 b8 84 0b
020: 70 fe 86 0b <-- __weakref__ (garbage used for mpz_t._size,
dereference off the end of the struct)

sage: 5 + b # BOOM

Nicolas M. Thiery

unread,
Feb 6, 2010, 5:27:59 AM2/6/10
to sage-...@googlegroups.com
Hi Craig!

Thanks for investigating this in detail!

On Fri, Feb 05, 2010 at 02:44:59AM -0800, Craig Citro wrote:
> Yep, using ZZ as a parent for something which isn't of class
> sage.rings.integer.Integer is what was causing the segfault here.
> There's actually a bit more to the story -- in particular, one should
> wonder why it only segfaulted on OSX. I'll make some comments about
> what was going on below; more importantly, though, is what we should
> do about fixing it. In general, there are a lot of parents that assume
> that elements with that parent are of some fixed type; breaking this
> rule would be bad on two counts: (1) we'd pay the price by having to
> do a bunch of typechecks everywhere, which is no good, and (2) the
> logic that depends on this isn't clearly defined -- it's just an
> *implicit* assumption in lots of code all over the place.

That's indeed a reasonable assumption, and I would not mind if
breaking this assumption would result in an exception, or some mild
unpredictable result. On the other hand, I am uncomfortable with
getting a segfault by writing rather innocent looking pure Python
code, especially if the segfault is non systematic or platform
dependent.

Just a quick rant. Here the thing is that we have an invariant for:

self.parent() = ZZ <=> self.__class__ = Integer

When some low-level method of IntegerRing / IntegerRing needs to check
than an object is an integer, and don't want to test both, wouldn't it
be safer (and as fast?) to test that the class of the object is
Integer, and then assume that its parent is ZZ, rather than the
converse? That sounds safer to me, because breaking the assumption
requires doing something wrong with the class Integer, which is well
localized. Whereas any element far far away in user's code can wrongly
set its parent to IntegerRing.

That being said, all of this is far from my area of expertise, and I
let you judge for the best.

> So how should we fix it? Robert Bradshaw pointed out that there should
> probably be a corresponding ParentWrapper, that one could use to
> create wrapped parents for the wrapped elements. In fact, I think we
> should go one step further -- I don't see why you should be able to
> end up with an ElementWrapper without the corresponding ParentWrapper.
> So passing parent=P should probably just create a wrapper out of P, if
> it isn't one already. In general, being able to just choose your
> parent out of the blue is a dangerous thing ... this might be a
> reasonably controlled way to do so.


For all practical use cases, the purpose of ElementWrapper is to
construct an element class for a specific parent that one is
implementing (see the category examples for extensive use
cases). Those parents are not wrapping anything in particular. The
fact that the data structure of an element consists of an integer does
not mean that the parent has anything to do with IntegerRing.


To make it short: the use of ZZ in the doctest is purely for testing
purposes, and does not reflect any practical use cases. Florent's fix
is good (thanks Florent by the way).

Cheers,

Robert Bradshaw

unread,
Feb 6, 2010, 6:21:54 AM2/6/10
to sage-...@googlegroups.com
On Feb 6, 2010, at 2:27 AM, Nicolas M. Thiery wrote:

> Hi Craig!
>
> Thanks for investigating this in detail!
>
> On Fri, Feb 05, 2010 at 02:44:59AM -0800, Craig Citro wrote:
>> Yep, using ZZ as a parent for something which isn't of class
>> sage.rings.integer.Integer is what was causing the segfault here.
>> There's actually a bit more to the story -- in particular, one should
>> wonder why it only segfaulted on OSX. I'll make some comments about
>> what was going on below; more importantly, though, is what we should
>> do about fixing it. In general, there are a lot of parents that
>> assume
>> that elements with that parent are of some fixed type; breaking this
>> rule would be bad on two counts: (1) we'd pay the price by having to
>> do a bunch of typechecks everywhere, which is no good, and (2) the
>> logic that depends on this isn't clearly defined -- it's just an
>> *implicit* assumption in lots of code all over the place.
>
> That's indeed a reasonable assumption, and I would not mind if
> breaking this assumption would result in an exception, or some mild
> unpredictable result. On the other hand, I am uncomfortable with
> getting a segfault by writing rather innocent looking pure Python
> code, especially if the segfault is non systematic or platform
> dependent.

It was just pure luck that it ever worked on any system... I agree
that an exception would be prettier, but the same parent => avoid
typechecks is an important optimization, especially for arithmetic of
basic types.

> Just a quick rant. Here the thing is that we have an invariant for:
>
> self.parent() = ZZ <=> self.__class__ = Integer
>
> When some low-level method of IntegerRing / IntegerRing needs to check
> than an object is an integer, and don't want to test both, wouldn't it
> be safer (and as fast?) to test that the class of the object is
> Integer, and then assume that its parent is ZZ, rather than the
> converse? That sounds safer to me, because breaking the assumption
> requires doing something wrong with the class Integer, which is well
> localized. Whereas any element far far away in user's code can wrongly
> set its parent to IntegerRing.

That wouldn't work because there is usually a many to one relationship
between parents and classes. Fortunately, users typically won't be
setting unrelated parents of elements directly.

- Robert

Nicolas M. Thiery

unread,
Feb 6, 2010, 7:28:00 AM2/6/10
to sage-...@googlegroups.com
On Sat, Feb 06, 2010 at 03:21:54AM -0800, Robert Bradshaw wrote:
> It was just pure luck that it ever worked on any system... I agree
> that an exception would be prettier, but the same parent => avoid
> typechecks is an important optimization, especially for arithmetic
> of basic types.
>
> >Just a quick rant. Here the thing is that we have an invariant for:
> >
> > self.parent() = ZZ <=> self.__class__ = Integer
> >
> >When some low-level method of IntegerRing / IntegerRing needs to check
> >than an object is an integer, and don't want to test both, wouldn't it
> >be safer (and as fast?) to test that the class of the object is
> >Integer, and then assume that its parent is ZZ, rather than the
> >converse? That sounds safer to me, because breaking the assumption
> >requires doing something wrong with the class Integer, which is well
> >localized. Whereas any element far far away in user's code can wrongly
> >set its parent to IntegerRing.
>
> That wouldn't work because there is usually a many to one
> relationship between parents and classes.

My suggestion was surely only for basic arithmetic types where speed
is an critical issue. For those, don't we have a one to one relation?

> Fortunately, users typically won't be setting unrelated parents of
> elements directly.

Users probably not. Bugs in the code might. And when this will occur
this will most likely be in an intricate situation. I am glad that you
are here volunteering for debugging any segfaults that might be of
this nature :-)

By the way: I have moved the "improvement to ElementWrapper's doctest"
to a new ticket #8200, and updated #8177 to focus on this specific
issue.

Robert Bradshaw

unread,
Feb 6, 2010, 1:57:09 PM2/6/10
to sage-...@googlegroups.com
On Feb 6, 2010, at 4:28 AM, Nicolas M. Thiery wrote:

> On Sat, Feb 06, 2010 at 03:21:54AM -0800, Robert Bradshaw wrote:
>> It was just pure luck that it ever worked on any system... I agree
>> that an exception would be prettier, but the same parent => avoid
>> typechecks is an important optimization, especially for arithmetic
>> of basic types.
>>
>>> Just a quick rant. Here the thing is that we have an invariant for:
>>>
>>> self.parent() = ZZ <=> self.__class__ = Integer
>>>
>>> When some low-level method of IntegerRing / IntegerRing needs to
>>> check
>>> than an object is an integer, and don't want to test both,
>>> wouldn't it
>>> be safer (and as fast?) to test that the class of the object is
>>> Integer, and then assume that its parent is ZZ, rather than the
>>> converse? That sounds safer to me, because breaking the assumption
>>> requires doing something wrong with the class Integer, which is well
>>> localized. Whereas any element far far away in user's code can
>>> wrongly
>>> set its parent to IntegerRing.
>>
>> That wouldn't work because there is usually a many to one
>> relationship between parents and classes.
>
> My suggestion was surely only for basic arithmetic types where speed
> is an critical issue. For those, don't we have a one to one relation?

For a couple of types, like ZZ and QQ, yes. RR and CC support
different precision parents (and rounding modes), and of course Z/nZ
is pretty basic (and should be insanely fast). The code path in
question here is the very top-level check in binary operations, where
if the Parent is the same we assume we can get directly to work.
Specific classes can do their own typechecking there if they feel its
necessary (that would be hard to do generically anyways).

Perhaps a better way to phrase the implicit assumption is that an
Element knows about all possible other Elements with the same parent.

- Robert

Reply all
Reply to author
Forward
0 new messages