How to proceed to reduce Sage's memory leaks?


Nils Bruin

Nov 3, 2012, 3:58:08 PM
to sage-devel
Presently, Sage has a significant memory leak issue: uniqueness of
parents is currently guaranteed by keeping them in memory
permanently. This prevents many computational strategies that are
otherwise perfectly legitimate but require the construction of, for
instance, many finite fields and/or polynomial rings. A lot of
arithmetic-geometric constructions fall into that category, and current
approaches to tackling noncommutative algebra do too. Every single time
I have tried to use Sage for some significant computation, this has
prevented me from completing it.

There has been work on resolving this issue by replacing permanent
storage with weak caching, so that parents can actually get deleted; see
tickets #715 and #11521, for instance. The code on these tickets is by
now one of the most carefully reviewed (future) parts of Sage.
However, time and again, new issues crop up because there is broken
code elsewhere that never got exercised, since parents were never
deleted even when they should have been.

We have been in good shape a couple of times now, with all noticeable
issues resolved. However, the merger of *other* tickets brought yet
more issues to light, resulting in #715 and #11521 being pulled.

If we ever want Sage to be usable on a reasonable scale, parents need
to be deleted every now and again. The basic design allows them to be.
It's just that there is a lot of code in Sage that breaks when that
actually happens. Apparently, the normal ticket review and merger
process is not suitable for a change this fundamental to Sage's
infrastructure, because it favours small-scale and superficial patches
(and hence keeps moving the goalposts for infrastructure changes).
Any ideas on how to get this done?
For me this is a must-have for considering Sage a viable platform, and
I suspect I am not the only one for whom it is.

Cheers,

Nils

Jeroen Demeyer

Nov 3, 2012, 4:12:54 PM
to sage-...@googlegroups.com
Let me add that the bugs revealed by these tickets are often
quite complex. They are hard to debug, both for Nils Bruin and Simon
King working on the ticket, and for me as release manager.

For example, I remember two seemingly unrelated tickets in the past
which caused a bug together, but not independently.

Travis Scrimshaw

Nov 3, 2012, 5:58:10 PM
to sage-...@googlegroups.com
Here are my thoughts on the matter, but I'm not an expert on the inner workings of Sage, so please forgive/tell me if this is already done/impossible.

I propose limiting the size of the cache of parents and keeping track of the number of references to each parent. Parents with the fewest references would then be evicted from the cache once we reach the maximum. Additionally, if a parent has no references, we allow the garbage collector to take it. To get around referenced parents being spontaneously deleted, every time we return a parent object we actually return a lightweight bridge object, which recreates the parent on demand if it has been deleted (and which also gets notified when the parent is deleted). Something like this:

class ParentBridge:
    def __init__(self, parent_class, data):
        self._parent_class = parent_class
        self._data = data  # arguments originally passed to the parent
        self._parent = None

    def parent(self):
        if self._parent is None:
            self._create_parent()
        return self._parent

    def _create_parent(self):
        # Recreate the parent (in real code, through the parent cache,
        # so that uniqueness is preserved).
        self._parent = self._parent_class(*self._data)

    def _parent_deleted(self):
        # Callback invoked when the cached parent is collected.
        self._parent = None
We also return the same ParentBridge when the parent is stored in the cache. This would basically be a slight modification of a weak reference that recreates the target object if it is invalid. Another variant is to implement some other type of cache replacement algorithm (http://en.wikipedia.org/wiki/Cache_algorithms).

Alternatively, we could just allow parents with no references to be garbage collected. This would likely not break any doctests, since checks of parent identity usually happen on successive lines, and the garbage collector usually does not have time to collect anything within a few lines when doctesting. We might also want to add a flag for (very) select instances saying that they can never be collected. A sketch of such a weak cache follows.
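
[Editorial note: a minimal sketch of that idea, using a plain weak-value cache; get_parent and the key scheme here are illustrative, not Sage's actual factory machinery.]

import weakref

# Values are held weakly: once nothing else references a parent, the
# garbage collector may reclaim it and its cache entry disappears.
_parent_cache = weakref.WeakValueDictionary()

def get_parent(parent_class, *args):
    key = (parent_class, args)
    try:
        return _parent_cache[key]
    except KeyError:
        P = parent_class(*args)
        _parent_cache[key] = P
        return P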

In both of the above, there is at most one instance of a given parent at any one time, so I do not foresee any problems (as long as we can reconstruct the parent object and the appropriate references if it is deleted). Nevertheless, however we implement this, it must change the interface only minimally, and I suspect the first approach I suggested may require substantial changes...

Best,
Travis

Volker Braun

Nov 3, 2012, 7:18:11 PM
to sage-...@googlegroups.com
I'd say talk to Jeroen about making collectable parents a priority for one release. For example, let's have 5.5 be the release where we add the collectable parents. Push out a beta1 with these patches; then we'll have a month during Jeroen's holiday where we can check any other tickets. No other tickets get merged if they break the parents stuff.




Jeroen Demeyer

Nov 3, 2012, 7:41:56 PM
to sage-...@googlegroups.com
An extra complication is that the breakage is often non-reproducible and
system-dependent. Together with the weird interactions between seemingly
unrelated patches, even determining whether a patch breaks the parent
stuff is highly non-trivial.

Volker Braun

Nov 3, 2012, 8:06:42 PM
to sage-...@googlegroups.com
You make it sound like there is just not enough doctest coverage. The Sage doctests generally do not generate a lot of parents in one go. Maybe it's just that the coverage of this use case needs to be improved? E.g. create a list of thousands of parents, delete a random subset, garbage collect, repeat? A sketch of such a stress test follows.
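
[Editorial note: a minimal sketch of such a stress loop, assuming the Sage globals PolynomialRing and QQ; the choice of parents is just for illustration.]

import gc
import random

def stress_parents(n=1000, rounds=5, seed=0):
    """Create many parents, drop random subsets, and force collection."""
    rng = random.Random(seed)
    parents = [PolynomialRing(QQ, 'x%s' % i) for i in range(n)]
    for _ in range(rounds):
        # Keep a random half of the surviving parents; the rest become garbage.
        parents = [P for P in parents if rng.random() < 0.5]
        gc.collect()
        # Exercise the survivors so that dangling internals would surface.
        for P in parents:
            assert P.gen() in P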

I admit that I haven't followed these patches as closely as I would have liked. It's clear that deleting parents can trigger lots of nasty stuff. We need to understand how to exercise that code.

If we can agree to dedicate a point release to this issue, then that just means that beta0 is going to be broken on some systems. I take it this is Nils' original objection: not every beta has to work perfectly on every system. If you merge a hundred small patches, then it's reasonable to kick back anything that triggers a doctest failure. But if you want to make progress on a big issue, then you have to accept that a beta is going to be imperfect and is meant to expose a ticket to a much wider audience.

Francois Bissey

Nov 3, 2012, 8:23:57 PM
to sage-...@googlegroups.com
On 04/11/12 13:06, Volker Braun wrote:
> But if you want to make progress on a big issue, then you have to accept
> that a beta is going to be imperfect and is meant to expose a ticket to a
> much wider audience.

Actually, because some of the bugs are platform-dependent etc., the
audience from a beta may not be big enough.
But nevertheless, we just have to bite the bullet, do the best we can,
and fix things as they become apparent. We cannot stop moving forward
forever because we are afraid of accidentally breaking stuff.

Francois

Jeroen Demeyer

Nov 4, 2012, 3:29:16 AM
to sage-...@googlegroups.com
On 2012-11-04 01:06, Volker Braun wrote:
> You make it sound like there is just not enough doctest coverage. The
> Sage doctests generally do not generate a lot of parents in one go.
> Maybe it's just that the coverage of this use case needs to be improved?
> E.g. create a list of thousands of parents, delete a random subset,
> garbage collect, repeat?
It would be absolutely awesome if we had good doctests for this.
Of all the tickets I have ever seen as release manager, this is
probably the single hardest one to debug and figure out why stuff
breaks (with #12221 as an honorable second).

Jeroen Demeyer

Nov 4, 2012, 3:36:48 AM
to sage-...@googlegroups.com
On 2012-11-04 01:23, Francois Bissey wrote:
> But nevertheless, we just have to bite the bullet, do the best we can,
> and fix things as they become apparent. We cannot stop moving forward
> forever because we are afraid of accidentally breaking stuff.
OK, let's go for it!

Do you also want other tickets like #12215 and #12313, or should we do
just #715 + #11521?

Francois Bissey

Nov 4, 2012, 4:14:41 AM
to sage-...@googlegroups.com
It may be best to do only one set of big changes at a time, just so as
not to confuse issues. But these two sets may be similar enough.
Any other opinions?

Francois

Robert Bradshaw

Nov 5, 2012, 3:12:02 PM
to sage-...@googlegroups.com
+1. I've been meaning to get back to this for ages but just haven't
found the time. If we're going to make a big push to get this in, I'll
do what I can to help.

For testing, I would propose we manually insert gc operations
periodically to see if we can reproduce the failures more frequently.
We could then mark some (hopefully a very small number of) parents as
"unsafe to garbage collect" and go forward with this patch, holding
hard references to all "unsafe" parents so we can look into them later
(which isn't a regression).

- Robert

Simon King

Nov 5, 2012, 6:25:20 PM
to sage-...@googlegroups.com
Hi Robert,

On 2012-11-05, Robert Bradshaw <robe...@gmail.com> wrote:
> +1. I've been meaning to get back to this for ages but just haven't
> found the time. If we're going to make a big push to get this in, I'll
> do what I can to help.

I'd appreciate your support!

> For testing, I would propose we manually insert gc operations
> periodically to see if we can reproduce the failures more frequently.

How can one insert gc operations? You mean by inserting gc.collect()
into doctests, or by manipulating the Python call hook?

> We could then mark some (hopefully a very small number of) parents as
> "unsafe to garbage collect" and go forward with this patch, holding
> hard references to all "unsafe" parents so we can look into them later
> (which isn't a regression).

That is actually what we tried: there was a bug that occurred only
on bsd.math, and it could be fixed by keeping a strong cache for
polynomial rings (which is unacceptable for my own project, but which
is at least no regression).

Anyway, I have not looked into the new problems yet. If it is (again)
about libsingular polynomial rings, then I think we should really make
an effort to get reference counting for libsingular rings right.

Best regards,
Simon

Robert Bradshaw

Nov 5, 2012, 8:15:07 PM
to sage-...@googlegroups.com
On Mon, Nov 5, 2012 at 3:25 PM, Simon King <simon...@uni-jena.de> wrote:
> How can one insert gc operations? You mean by inserting gc.collect()
> into doctests, or by manipulating the Python call hook?

I was thinking about inserting it into the doctesting code, e.g. with
a random (known seed) x% chance between any two statements.
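
[Editorial note: a minimal sketch of that, assuming a hook called between doctest statements; maybe_collect is an illustrative name, not part of Sage's actual doctest framework.]

import gc
import random

# Fixed seed so that a failing run can be replayed exactly.
_rng = random.Random(0)

def maybe_collect(probability=0.05):
    """Force a full garbage collection with the given probability."""
    if _rng.random() < probability:
        gc.collect()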

> Anyway, I have not looked into the new problems yet. If it is (again)
> about libsingular polynomial rings, then I think we should really make
> an effort to get reference counting for libsingular rings right.

True, but I'd rather not let any particular ring hold us back from
getting the general fix in.

- Robert

Jeroen Demeyer

Nov 12, 2012, 4:47:16 PM
to sage-...@googlegroups.com
Bad news again. During a preliminary test of sage-5.5.beta2, I again got
a segmentation fault in
devel/sage/sage/schemes/elliptic_curves/ell_number_field.py,
but this time on a different system (arando: Linux i686) and with a
different set of patches than before. And for added fun: this time the
error isn't always reproducible.

Nils Bruin

Nov 12, 2012, 10:16:15 PM
to sage-devel
On Nov 12, 1:47 pm, Jeroen Demeyer <jdeme...@cage.ugent.be> wrote:
> And for added fun: this time the error isn't always reproducible.

That's excellent news! Just keep trying until it's not reproducible
anymore. Then we're fine!

Seriously though, the fact that the bug pops up in the same file as
before indicates that the deletion of a similar kind of object is
probably to blame here. We just need to keep trying until we find a
way to consistently produce the error on a platform with reasonable
debugging tools.

Incidentally: are PPC-OSX4 (or wherever the problem arose earlier)
and i686 both 32-bit platforms? My bet is Singular, since we know the
refcounting there (or at least our interfacing with it) is handled
fishily, and a previous issue indicated that omalloc is almost tailor-
made to generate different problems on different word lengths.

Michael Welsh

Nov 12, 2012, 10:17:57 PM
to sage-...@googlegroups.com
On 13/11/2012, at 4:16 PM, Nils Bruin <nbr...@sfu.ca> wrote:
>
> Incidentally: are PPC-OSX4 (or wherever the problem arose earlier)
> and i686 both 32-bit platforms?

Yes.

Jean-Pierre Flori

Nov 13, 2012, 8:13:04 PM
to sage-...@googlegroups.com
I'll try to set up a 32-bit (i686) install of the latest beta this weekend and give this a shot.
If I'm lucky enough, I'll be able to reproduce the problem and get a proper backtrace, hopefully pointing to libsingular.

Jeroen Demeyer

Nov 14, 2012, 11:29:48 AM
to sage-...@googlegroups.com
It also happens on other systems, including 64-bit ones. It's easy to
reproduce on the Skynet machine "sextus" (Linux x86_64), where it
happens about 71% of the time.

Nils Bruin

Nov 14, 2012, 1:06:53 PM
to sage-devel
On Nov 14, 8:29 am, Jeroen Demeyer <jdeme...@cage.ugent.be> wrote:
> It also happens on other systems, including 64-bit ones. It's easy to
> reproduce on the Skynet machine "sextus" (Linux x86_64), where it
> happens about 71% of the time.

That might be workable. Which exact version/patches reproduce the
problem? (I don't think I have a login on "sextus".) I don't promise
that I'll actually have time to build/test/track down this problem,
but I'll see. Other people should definitely look at it too.

Jeroen Demeyer

Nov 14, 2012, 2:28:16 PM
to sage-...@googlegroups.com
On 2012-11-14 19:06, Nils Bruin wrote:
> I don't think I have a login on "sextus"
FYI: it's a Fedora 16 system with an Intel(R) Pentium(R) 4 CPU 3.60GHz
processor running Linux 3.3.7-1.fc16.x86_64.

Nils Bruin

Nov 14, 2012, 5:34:27 PM
to sage-devel
On Nov 14, 11:28 am, Jeroen Demeyer <jdeme...@cage.ugent.be> wrote:
> FYI: it's a Fedora 16 system with an Intel(R) Pentium(R) 4 CPU 3.60GHz
> processor running Linux 3.3.7-1.fc16.x86_64.

That sounded convenient, because my desktop is similar:

Fedora 16 running 3.6.5-2.fc16.x86_64 #1 SMP on an Intel(R) Core(TM)
i7-2600 CPU @ 3.40GHz

No such luck, however. With

$ ./sage -v
Sage Version 5.5.beta2, Release Date: 2012-11-13

I ran

for i in `seq 100`; do
    echo $i;
    ./sage -t devel/sage/sage/schemes/elliptic_curves/ell_number_field.py || echo FAULT AT i is $i;
done

which succeeded all 100 times.

Nils Bruin

Nov 14, 2012, 6:42:23 PM
to sage-devel
However, in an effort to make memory errors during testing a little
more reproducible, I made the little edit below to local/bin/sagedoctest.py
to ensure the garbage collector is run before every doctested line:

--------------------------------------------------------------------
diff --git a/sagedoctest.py b/sagedoctest.py
--- a/sagedoctest.py
+++ b/sagedoctest.py
@@ -1,7 +1,9 @@
 from __future__ import with_statement
 
 import ncadoctest
+import gc
 import sage.misc.randstate as randstate
+import sys
 
 OrigDocTestRunner = ncadoctest.DocTestRunner
 class SageDocTestRunner(OrigDocTestRunner):
@@ -35,6 +37,8 @@ class SageDocTestRunner(OrigDocTestRunne
         except Exception, e:
             self._timeit_stats[key] = e
         # otherwise, just run the example
+        sys.stderr.write('testing example %s\n'%example)
+        gc.collect()
         OrigDocTestRunner.run_one_example(self, test, example,
             filename, compileflags)
 
     def save_timeit_stats_to_file_named(self, output_filename):
--------------------------------------------------------------------

(i.e., just add a gc.collect() to run_one_example)

and it causes a reliable failure in crypto/mq/mpolynomialsystem.py:

Trying:
    C[Integer(0)].groebner_basis()###line 84:_sage_ sage: C[0].groebner_basis()
Expecting:
    Polynomial Sequence with 26 Polynomials in 16 Variables
testing example <ncadoctest.Example instance at 0x69706c8>
ok
Trying:
    A,v = mq.MPolynomialSystem(r2).coefficient_matrix()###line 87:_sage_ sage: A,v = mq.MPolynomialSystem(r2).coefficient_matrix()
Expecting nothing
testing example <ncadoctest.Example instance at 0x6970710>
*** glibc detected *** python: double free or corruption (out): 0x00000000075c58c0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x31cfe7da76]
/lib64/libc.so.6[0x31cfe7ed5e]
/usr/local/sage/5.5b2/local/lib/python/site-packages/sage/rings/polynomial/pbori.so(+0x880aa)[0x7fa5eba7e0aa]
/usr/local/sage/5.5b2/local/lib/python/site-packages/sage/rings/polynomial/pbori.so(+0x1d993)[0x7fa5eba13993]
...

Running it under sage -t --gdb gives:

(gdb) bt
#0  0x00000031cfe36285 in raise () from /lib64/libc.so.6
#1  0x00000031cfe37b9b in abort () from /lib64/libc.so.6
#2  0x00000031cfe7774e in __libc_message () from /lib64/libc.so.6
#3  0x00000031cfe7da76 in malloc_printerr () from /lib64/libc.so.6
#4  0x00000031cfe7ed5e in _int_free () from /lib64/libc.so.6
#5  0x00007fffce5cb0aa in Delete<polybori::groebner::ReductionStrategy> (mem=0x547db30)
    at /usr/local/sage/5.5b2/local/include/csage/ccobject.h:77
#6  __pyx_pf_4sage_5rings_10polynomial_5pbori_17ReductionStrategy_2__dealloc__ (__pyx_v_self=<optimized out>)
    at sage/rings/polynomial/pbori.cpp:37868
#7  __pyx_pw_4sage_5rings_10polynomial_5pbori_17ReductionStrategy_3__dealloc__ (__pyx_v_self=0x54bf390)
    at sage/rings/polynomial/pbori.cpp:37834
#8  __pyx_tp_dealloc_4sage_5rings_10polynomial_5pbori_ReductionStrategy (o=0x54bf390)
    at sage/rings/polynomial/pbori.cpp:52283
#9  0x00007fffce560993 in __pyx_tp_clear_4sage_5rings_10polynomial_5pbori_GroebnerStrategy (o=0x54baeb0)
    at sage/rings/polynomial/pbori.cpp:52545
#10 0x00007ffff7d4b637 in delete_garbage (old=0x7ffff7fe19e0, collectable=0x7fffffffbb60) at Modules/gcmodule.c:769
#11 collect (generation=2) at Modules/gcmodule.c:930
#12 0x00007ffff7d4bdc9 in gc_collect (self=<optimized out>, args=<optimized out>, kws=<optimized out>) at Modules/gcmodule.c:1067

which should give a pretty good pointer for the PolyBoRi people to figure
out which memory deallocation is actually botched.

Nils Bruin

Nov 14, 2012, 7:15:34 PM
to sage-devel
<polybori problem>: this is actually reproducible in plain 5.0. It is now tracked at

http://trac.sagemath.org/sage_trac/ticket/13710

Nils Bruin

Nov 14, 2012, 7:22:24 PM
to sage-devel
Other consequences of the gc.collect() insertions:

sage -t -force_lib devel/sage/sage/crypto/mq/mpolynomialsystem.py # Killed/crashed
sage -t -force_lib devel/sage/sage/rings/polynomial/multi_polynomial_sequence.py # Killed/crashed

(same problem; reported as above)


**********************************************************************
File "/usr/local/sage/5.5b2/devel/sage/sage/modular/abvar/
abvar_ambient_jacobian.py", line 345:
sage: J0(33).decomposition(simple=False)
Expected:
[
Abelian subvariety of dimension 2 of J0(33),
Simple abelian subvariety 33a(None,33) of dimension 1 of J0(33)
]
Got:
[
Abelian subvariety of dimension 2 of J0(33),
Abelian subvariety of dimension 1 of J0(33)
]
**********************************************************************

sage -t -force_lib devel/sage/sage/modular/abvar/abvar_ambient_jacobian.py # 1 doctests failed

(i.e., the doctest relies on a previous copy of 33a remaining in
memory, on which additional computations have changed the way it
prints. That's a violation of immutability anyway, and the doctest
shouldn't rely on such behaviour.)


**********************************************************************
File "/usr/local/sage/5.5b2/devel/sage/sage/modular/abvar/abvar.py",
line 2840:
sage: J0(33).is_simple(none_if_not_known=True)
Expected:
False
Got nothing
**********************************************************************
sage -t -force_lib devel/sage/sage/modular/abvar/abvar.py # 1 doctests failed

Same problem! Since J0(33) is freshly constructed, one should not rely
on anything being cached on it, and the test explicitly asks not to
compute anything.

Jean-Pierre Flori

Nov 14, 2012, 9:58:45 PM
to sage-...@googlegroups.com
We dealt with something very similar in one of the "memleaks" tickets.
I'm not sure whether it was #715 or #11521; maybe it was #12313 (the numbers here might be wrong...).
So the fix is potentially not included in 5.5.beta2, if it was in the latter.

Jean-Pierre Flori

Nov 14, 2012, 10:00:15 PM
to sage-...@googlegroups.com
OK, I took the time to check: you actually posted on #13710 that the fix is included in #12313, so it is not in 5.5.beta2 if I'm not mistaken (nor in 5.0, of course).

Jeroen Demeyer

Nov 16, 2012, 2:59:02 AM
to sage-...@googlegroups.com
On 2012-11-14 23:34, Nils Bruin wrote:
> On Nov 14, 11:28 am, Jeroen Demeyer <jdeme...@cage.ugent.be> wrote:
>> FYI: it's a Fedora 16 system with an Intel(R) Pentium(R) 4 CPU 3.60GHz
>> processor running Linux 3.3.7-1.fc16.x86_64.
>
> That sounded convenient because my desktop is similar:
>
> Fedora 16 running 3.6.5-2.fc16.x86_64 #1 SMP on Intel(R) Core(TM)
> i7-2600 CPU @ 3.40GHz

Could you try again with sage-5.5.beta1?

Nils Bruin

Nov 16, 2012, 1:35:52 PM
to sage-devel
On Nov 15, 11:59 pm, Jeroen Demeyer <jdeme...@cage.ugent.be> wrote:
> Could you try again with sage-5.5.beta1?

Same behaviour. Was there a reason to expect differently?
I guess something is different on sextus. Bad memory or other hardware
problems?

I was surprised by how few issues arose from inserting garbage
collections between all doctests. That should upset the memory usage
patterns so much that I would expect it to shake out many problems.
Only things like Singular's omalloc would be immune, because it hides
alloc/dealloc operations from the OS: you really need to wait for an
actual corruption to see a problem. The guarded-malloc experiment on
OSX and similar operations took care of that. See

http://trac.sagemath.org/sage_trac/ticket/13447

for a dirty Singular package that swaps out omalloc for a system
malloc, which then allows normal OS tools to check memory allocation,
access, and deallocation. See also the ticket for notes on how the approach
taken there can be adapted to let Singular use the system malloc under
Linux (one Singular malloc routine needs to know the size of an
allocated block, which is a non-POSIX malloc feature that both OSX and
Linux support, in different ways).

Do we have other memory managers in Sage that play tricks like
omalloc? Things run a lot slower when you switch them back to the system
malloc, but it does enable conventional memory-sanitation tests.

Valgrind produces far too many warnings to be useful. All you want is
a segfault on any access-after-dealloc or double-dealloc (out-of-bounds
access would be nice too). OSX's libgmalloc is perfect for
that. Is there a Linux equivalent (or a way to configure valgrind to
do just this)?
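
[Editorial note: for concreteness, this is roughly how the two options are invoked, assuming the environment variables propagate through the sage script to the actual Python process; the test file is just an example.]

# Linux/glibc: report and abort as soon as heap corruption is detected
MALLOC_CHECK_=3 ./sage -t devel/sage/sage/schemes/elliptic_curves/ell_number_field.py

# OS X: libgmalloc puts each allocation on its own page and unmaps freed
# pages, so an access-after-free segfaults on the spot
DYLD_INSERT_LIBRARIES=/usr/lib/libgmalloc.dylib ./sage -t devel/sage/sage/schemes/elliptic_curves/ell_number_field.py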

I pose it as a challenge that no one is able to do comprehensive
testing of memory alloc/dealloc in Sage. Even though I have outlined
above the exact approach that would make it a relatively straightforward
process, no one has the stamina and heroic hacker skills to pull it
off. Prove me wrong!

Jeroen Demeyer

Nov 17, 2012, 4:01:22 AM
to sage-...@googlegroups.com
On 2012-11-16 19:35, Nils Bruin wrote:
> On Nov 15, 11:59 pm, Jeroen Demeyer <jdeme...@cage.ugent.be> wrote:
>> Could you try again with sage-5.5.beta1?
>
> Same behaviour. Was there a reason to expect differently?
After adding every single ticket, there is reason to expect differently.
This stuff is *so sensitive* to changes, even changes that look
completely unrelated.

For example, at first sight, the errors are gone again in sage-5.5.beta2.

Nils Bruin

Nov 17, 2012, 2:01:19 PM
to sage-devel
On Nov 17, 1:01 am, Jeroen Demeyer <jdeme...@cage.ugent.be> wrote:
> After adding every single ticket, there is reason to expect differently.
> This stuff is *so sensitive* to changes, even changes that look
> completely unrelated.
That's why the effort to do strict checking of memory management
should help (and it was in that light that I interpreted your
request). I think the sensitivity comes from the fact that you have to
wait for the coincidence that a freed-too-early location gets reused
and is *then* written to in its own role (i.e., actual corruption).

Calling gc.collect() all the time should make deletions a little more
predictable, and a very strict malloc/free should detect the problem
sooner. I'm afraid that MALLOC_CHECK_ isn't as good as BSD's gmalloc,
where even an access-after-free is a segfault (and many out-of-bounds
accesses are too).

Once one gets a little better at writing valgrind suppressions, it's
easy to make valgrind produce less irrelevant output, so perhaps
there's a future for that. Or perhaps a tool to query and sort
valgrind reports after the fact (basically, filtering after the fact).
Perhaps it's time for William to hire someone again who is really good
at this stuff, because mathematically it's utterly uninteresting work
(and it really is finding and cleaning up other people's mess).

Ivan Andrus

Nov 17, 2012, 3:20:01 PM
to sage-...@googlegroups.com
At one point I had the goal of creating a suppressions file so that the doctests passed "cleanly". I'm sure some of the suppressions were actual problems, but it would at least allow you to find new problems. I still have the scripts that I used to collect and remove duplicate suppressions, and I would be happy to run them again if people think it would be useful. Sadly, my machine isn't the fastest, so it takes quite a while (running all the doctests under valgrind is _slow_), and I never did make it all the way through the test suite. But especially if I knew the likely areas, it wouldn't be too hard to run some tests overnight and see what turns up.
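
[Editorial note: for readers unfamiliar with the format, a suppression entry looks roughly like the hypothetical one below; the name and object pattern are made up, and real entries are best generated with valgrind's --gen-suppressions=all option and then pruned by hand.]

{
   hypothetical_libpari_noise
   Memcheck:Cond
   obj:*/libpari*.so*
}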

-Ivan

Nils Bruin

Nov 17, 2012, 3:56:14 PM
to sage-devel
On Nov 17, 12:20 pm, Ivan Andrus <darthand...@gmail.com> wrote:

> At one point I had the goal of creating a suppressions file so that the doctests passed "cleanly". [...] But especially if I knew the likely areas, it wouldn't be too hard to run some tests overnight and see what turns up.

Anything that has to do with libsingular. The problem is that OTHER
tests may well exercise this code much better than libsingular's own
doctests do.

However, with an unmodified libsingular it's unlikely you'll find
anything. omalloc allocates pages of system memory and then manages
pieces of them by itself, so as far as valgrind is concerned there is
relatively little allocation/deallocation activity. I think you can go
further and tell valgrind about the workings of alternative memory
managers; that would improve the diagnostics a little. But if the compact
memory layout of omalloc (compactness is its purpose) isn't
changed, you still have a good chance that an access-after-free refers
to perfectly valid memory (a block that has since been reallocated for a
different purpose).

This is the issue I'm trying to address with the malloc version of
Singular. Combining that with a malloc implementation that puts blocks on
separate pages, at the edge of the page, unmaps any page upon
deallocation, and tries to avoid reusing pages or using adjacent logical
pages means that any illegal access is almost sure to segfault. BSD's
gmalloc does that. It seems glibc's malloc with MALLOC_CHECK_=2 or 3
does at least a bit of it.

The real problem here is that we (Simon, Volker, and I) don't know for
sure what the refcount and deletion protocols for Singular objects
are. It seems to be the kind of thing that is folklore inside the
Singular group but was never properly documented. Singular was not
designed to be a clean library, although that does seem to be a direction
Singular is heading, so perhaps this will get documented
properly at some point. I just think Sage can't wait for the decade or so
that this is probably going to take.

Ivan Andrus

Nov 17, 2012, 4:42:07 PM
to sage-...@googlegroups.com
Thanks for the explanation. That makes sense. It sounds like there's not much valgrind will help with, but I'll give it a go anyway.

-Ivan
