Unhandled SIGSEGV: Ticket #973

27 views
Skip to first unread message

Jaap Spies

unread,
Oct 23, 2007, 7:23:04 AM10/23/07
to sage-...@googlegroups.com
Last year after less than two days I could finish
a calculation and write to William:

> -------- Original Message --------
> Subject: dance(10)
> Date: Thu, 09 Mar 2006 00:10:19 +0100
> From: Jaap Spies <j.s...@hccnet.nl>
> To: William Stein <wst...@ucsd.edu>
>
> William,
>
> Some time ago, dance(10) finished:
>
> sage: time dance(10)
> [1, 90, 3405, 71040, 901152, 7225344, 36862960, 117340800, 221170456,
> 220658800, 87396728]
> h^10 - 35*h^9 + 675*h^8 - 8610*h^7 + 78435*h^6 - 523467*h^5 +
> 2562525*h^4 - 9008160*h^3 + 21623220*h^2 - 31840760*h + 21750840
> CPU times: user 141822.70 s, sys: 18387.03 s, total: 160209.74 s
> Wall time: 165997.13
>
>
> Jaap

To day I got with sage-2.8.8.1:

> sage: time dance(10)
>
>
> ------------------------------------------------------------
> Unhandled SIGSEGV: A segmentation fault occured in SAGE.
> This probably occured because a *compiled* component
> of SAGE has a bug in it (typically accessing invalid memory)
> or is not properly wrapped with _sig_on, _sig_off.
> You might want to run SAGE under gdb with 'sage -gdb' to debug this.
> SAGE will now terminate (sorry).
> ------------------------------------------------------------

With sage -gdb:

> sage: time dance(9)
> h^9 - 27*h^8 + 414*h^7 - 4158*h^6 + 29421*h^5 - 148743*h^4 + 530796*h^3 - 1276992*h^2 + 1866384*h - 1255608
> CPU times: user 1786.82 s, sys: 23.05 s, total: 1809.87 s
> Wall time: 1831.52
>
> sage: time dance(10)
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread -1208523072 (LWP 30162)]
> 0x0064d473 in strlen () from /lib/libc.so.6
> (gdb)

The program (see below) uses methods from sage.matrix.matrix2.pyx

Jaap


##########################################################################
# Copyright (C) 2006 Jaap Spies, jaap...@gmail.com
#
# Distributed under the terms of the GNU General Public License (GPL):
#
# http://www.gnu.org/licenses/
##########################################################################

"""
Usage from sage

sage: attach 'dancing.sage'

sage: dance(4)
h^4 - 2*h^3 + 9*h^2 - 8*h + 6

"""

# use variable 'h' in the polynomial ring over the rationals

h = QQ['h'].gen()

def dance(m):
"""
Generates the polynomial solutions of the Dancing School Problem
Based on a modification of theorem 7.2.1 from Brualdi and Ryser,
Combinatorial Matrix Theory.

See NAW 5/7 nr. 4 december 2006 p. 285

INPUT: integer m

OUTPUT: polynomial in 'h'

EXAMPLE:
sage: dance(4)
h^4 - 2*h^3 + 9*h^2 - 8*h + 6

AUTHOR: Jaap Spies (2006)
"""
n = 2*m-2
M = MatrixSpace(ZZ, m, n)
A = M([0 for i in range(m*n)])
for i in range(m):
for j in range(n):
if i > j or j > i + n - m:
A[i,j] = 1
rv = A.rook_vector()
# print rv
s = sum([(-1)^k*rv[k]*falling_factorial(m+h-k, m-k) for k in range(m+1)])
print s

mabshoff

unread,
Oct 23, 2007, 11:37:19 AM10/23/07
to sage-devel

Hello Jaap,

I am looking into this, but I am pressed for time until Thursday, so
don't expect any solution before that. It will take forever to
valgrind this anyway. If you still have that gdb session it would be
great to get a backtrace. I did run the code with smaller parameters,
i.e. dance(4) and dance(5), under valgrind and nothing popped up
there. So I am suspecting that the code might smash the stack with
large parameters, but that ought to be taken with a grain of salt
until I get a backtrace and some valgrind output

Cheers,

Michael

>
> ##########################################################################
> # Copyright (C) 2006 Jaap Spies, jaapsp...@gmail.com

mabshoff

unread,
Oct 23, 2007, 8:11:07 PM10/23/07
to sage-devel
Jaap,

dance(10) computes fine on sage.math (in about 6 hours under gdb), I
am running dance(11) to see if it finishes [I guess you would like the
result ;)]. So, any chance you are running the computation on a 32 bit
box and/or run out of memory/have highly fragmented memory after a
long uptime? I have to wait for the valgrind run to finish (which will
take a long, long time), to say something definite, but I will look
into it some more.

Cheers,

Michael

Jaap Spies

unread,
Oct 24, 2007, 6:31:48 AM10/24/07
to sage-...@googlegroups.com
Michael,

You wrote:
>
> dance(10) computes fine on sage.math (in about 6 hours under gdb), I
> am running dance(11) to see if it finishes [I guess you would like the
> result ;)].

Yes, sure!

So, any chance you are running the computation on a 32 bit
> box and/or run out of memory/have highly fragmented memory after a
> long uptime?

There is a long uptime, but I saw in the system console a steady line
on 80% memory usage.


I have to wait for the valgrind run to finish (which will
> take a long, long time), to say something definite, but I will look
> into it some more.
>

Below the output from bt

Thanks!

Jaap

sage: time dance(10)

Program received signal SIGSEGV, Segmentation fault.

[Switching to Thread -1208297792 (LWP 3914)]
0x0a6c2a64 in ?? ()
(gdb) bt
#0 0x0a6c2a64 in ?? ()
#1 0x08087aa2 in PyObject_RichCompare (v=0x8f94f44, w=0x8f94f44, op=2) at Objects/object.c:914
#2 0x080c3862 in PyEval_EvalFrameEx (f=0xa7bb864, throwflag=0) at Python/ceval.c:3980
#3 0x0810b4ed in gen_send_ex (gen=0xa70a3ec, arg=0x16, exc=0) at Objects/genobject.c:82
#4 0x080c00cd in PyEval_EvalFrameEx (f=0xa7bd5e4, throwflag=0) at Python/ceval.c:2164
#5 0x0810b4ed in gen_send_ex (gen=0xa70a32c, arg=0x16, exc=0) at Objects/genobject.c:82
#6 0x080c00cd in PyEval_EvalFrameEx (f=0xa7bd15c, throwflag=0) at Python/ceval.c:2164
#7 0x0810b4ed in gen_send_ex (gen=0xa70a72c, arg=0x16, exc=0) at Objects/genobject.c:82
#8 0x080c00cd in PyEval_EvalFrameEx (f=0xa7bc4fc, throwflag=0) at Python/ceval.c:2164
#9 0x0810b4ed in gen_send_ex (gen=0xa70a70c, arg=0x16, exc=0) at Objects/genobject.c:82
#10 0x080c00cd in PyEval_EvalFrameEx (f=0xa7bea2c, throwflag=0) at Python/ceval.c:2164
#11 0x0810b4ed in gen_send_ex (gen=0xa70a6ec, arg=0x16, exc=0) at Objects/genobject.c:82
#12 0x080c00cd in PyEval_EvalFrameEx (f=0xa7be134, throwflag=0) at Python/ceval.c:2164
#13 0x0810b4ed in gen_send_ex (gen=0xa70a68c, arg=0x16, exc=0) at Objects/genobject.c:82
#14 0x080c00cd in PyEval_EvalFrameEx (f=0xa7bf77c, throwflag=0) at Python/ceval.c:2164
#15 0x0810b4ed in gen_send_ex (gen=0xa70a6cc, arg=0x16, exc=0) at Objects/genobject.c:82
#16 0x080c00cd in PyEval_EvalFrameEx (f=0xa7be434, throwflag=0) at Python/ceval.c:2164
#17 0x0810b4ed in gen_send_ex (gen=0xa70aa2c, arg=0x16, exc=0) at Objects/genobject.c:82
#18 0x080c00cd in PyEval_EvalFrameEx (f=0xa7bb9dc, throwflag=0) at Python/ceval.c:2164
#19 0x0810b4ed in gen_send_ex (gen=0xa70a66c, arg=0x16, exc=0) at Objects/genobject.c:82
#20 0x0805a213 in PyIter_Next (iter=0xa70a66c) at Objects/abstract.c:2375
#21 0x0121c5bd in __pyx_f_py_7matrix2_6Matrix_permanent (__pyx_v_self=0x9c37194, unused=0x0) at sage/matrix/matrix2.c:1633
#22 0x0805a277 in PyObject_Call (func=0x92c1a54, arg=0xb7f6d02c, kw=0x0) at Objects/abstract.c:1860
---Type <return> to continue, or q <return> to quit---
#23 0x080be7cc in PyEval_CallObjectWithKeywords (func=0xa71282c, arg=0xb7f6d02c, kw=0x0) at Python/ceval.c:3433
#24 0x0805a490 in PyObject_CallObject (o=0xa71282c, a=0x0) at Objects/abstract.c:1851
#25 0x0121b491 in __pyx_f_py_7matrix2_6Matrix_permanental_minor (__pyx_v_self=0x9c37fa4, __pyx_arg_k=0x8f94ee4)
at sage/matrix/matrix2.c:2074
#26 0x0805a277 in PyObject_Call (func=0x92c1a54, arg=0x9186b4c, kw=0x0) at Objects/abstract.c:1860
#27 0x080be7cc in PyEval_CallObjectWithKeywords (func=0xa70ab2c, arg=0x9186b4c, kw=0x0) at Python/ceval.c:3433
#28 0x0805a490 in PyObject_CallObject (o=0xa70ab2c, a=0x9186b4c) at Objects/abstract.c:1851
#29 0x01212462 in __pyx_f_py_7matrix2_6Matrix_rook_vector (__pyx_v_self=0x9c37fa4, __pyx_args=0xb7f6d02c, __pyx_kwds=0x0)
at sage/matrix/matrix2.c:2399
#30 0x080c539c in PyEval_EvalFrameEx (f=0x9c58bf4, throwflag=0) at Python/ceval.c:3564
#31 0x080c57c5 in PyEval_EvalFrameEx (f=0x9c662b4, throwflag=0) at Python/ceval.c:3650
#32 0x080c65d5 in PyEval_EvalCodeEx (co=0x9c37068, globals=0xa7048ac, locals=0xa7048ac, args=0x0, argcount=0, kws=0x0,
kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2831
#33 0x080c6647 in PyEval_EvalCode (co=0x9c37068, globals=0xa7048ac, locals=0xa7048ac) at Python/ceval.c:494
#34 0x080bdc01 in builtin_eval (self=0x0, args=0xa70886c) at Python/bltinmodule.c:571
#35 0x080c539c in PyEval_EvalFrameEx (f=0x9ba3cbc, throwflag=0) at Python/ceval.c:3564
#36 0x080c65d5 in PyEval_EvalCodeEx (co=0x91c8338, globals=0x91bd1c4, locals=0x0, args=0x9c5af7c, argcount=2, kws=0x9c5af84,
kwcount=0, defs=0x91dc6d8, defcount=1, closure=0x0) at Python/ceval.c:2831
#37 0x080c4a89 in PyEval_EvalFrameEx (f=0x9c5ae2c, throwflag=0) at Python/ceval.c:3660
#38 0x080c57c5 in PyEval_EvalFrameEx (f=0x9b8a354, throwflag=0) at Python/ceval.c:3650
#39 0x080c65d5 in PyEval_EvalCodeEx (co=0x9ae78d8, globals=0xa7048ac, locals=0xa7048ac, args=0x0, argcount=0, kws=0x0,
kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2831
#40 0x080c5852 in PyEval_EvalFrameEx (f=0x9c5a61c, throwflag=0) at Python/ceval.c:494
---Type <return> to continue, or q <return> to quit---
#41 0x080c65d5 in PyEval_EvalCodeEx (co=0x91871d0, globals=0x91768ac, locals=0x0, args=0x9ba09c8, argcount=2, kws=0x9ba09d0,
kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2831
#42 0x080c4a89 in PyEval_EvalFrameEx (f=0x9ba087c, throwflag=0) at Python/ceval.c:3660
#43 0x080c65d5 in PyEval_EvalCodeEx (co=0x9187140, globals=0x91768ac, locals=0x0, args=0x9b925e4, argcount=3, kws=0x9b925f0,
kwcount=0, defs=0x91eb178, defcount=2, closure=0x0) at Python/ceval.c:2831
#44 0x080c4a89 in PyEval_EvalFrameEx (f=0x9b9249c, throwflag=0) at Python/ceval.c:3660
#45 0x080c57c5 in PyEval_EvalFrameEx (f=0xa743c0c, throwflag=0) at Python/ceval.c:3650
#46 0x080c65d5 in PyEval_EvalCodeEx (co=0x9177e78, globals=0x91768ac, locals=0x0, args=0xa7441c4, argcount=2, kws=0xa7441cc,
kwcount=0, defs=0x91eb4d8, defcount=1, closure=0x0) at Python/ceval.c:2831
#47 0x080c4a89 in PyEval_EvalFrameEx (f=0xa744084, throwflag=0) at Python/ceval.c:3660
#48 0x080c65d5 in PyEval_EvalCodeEx (co=0x9177da0, globals=0x91768ac, locals=0x0, args=0x906c7d0, argcount=2, kws=0x906c7d8,
kwcount=0, defs=0x91eb4b8, defcount=1, closure=0x0) at Python/ceval.c:2831
#49 0x080c4a89 in PyEval_EvalFrameEx (f=0x906c68c, throwflag=0) at Python/ceval.c:3660
#50 0x080c65d5 in PyEval_EvalCodeEx (co=0x9122d58, globals=0x912968c, locals=0x0, args=0x8f9fe4c, argcount=1, kws=0x8f9fe50,
kwcount=2, defs=0x915cb18, defcount=2, closure=0x0) at Python/ceval.c:2831
#51 0x080c4a89 in PyEval_EvalFrameEx (f=0x8f9fd14, throwflag=0) at Python/ceval.c:3660
#52 0x080c65d5 in PyEval_EvalCodeEx (co=0xb7f32530, globals=0xb7f85acc, locals=0xb7f85acc, args=0x0, argcount=0, kws=0x0,
kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2831
#53 0x080c6647 in PyEval_EvalCode (co=0xb7f32530, globals=0xb7f85acc, locals=0xb7f85acc) at Python/ceval.c:494
#54 0x080e52c8 in PyRun_FileExFlags (fp=0x8f93068, filename=0xbfe5fc5a "/usr/local/sage/local/bin/sage-gdb-pythonstartup",
start=257, globals=0xb7f85acc, locals=0xb7f85acc, closeit=0, flags=0xbfe5d618) at Python/pythonrun.c:1271
#55 0x080e5557 in PyRun_SimpleFileExFlags (fp=0x8f93068,
filename=0xbfe5fc5a "/usr/local/sage/local/bin/sage-gdb-pythonstartup", closeit=0, flags=0xbfe5d618)
---Type <return> to continue, or q <return> to quit---
at Python/pythonrun.c:877
#56 0x080571a6 in Py_Main (argc=0, argv=0xbfe5d6e4) at Modules/main.c:134
#57 0x08056432 in main (argc=1, argv=0x8144c20) at ./Modules/python.c:23
(gdb)

mabshoff

unread,
Oct 24, 2007, 7:07:14 AM10/24/07
to sage-devel

On Oct 24, 12:31 pm, Jaap Spies <j.sp...@hccnet.nl> wrote:
> Michael,

Hello Jaap,

>
> You wrote:
>
> > dance(10) computes fine on sage.math (in about 6 hours under gdb), I
> > am running dance(11) to see if it finishes [I guess you would like the
> > result ;)].
>
> Yes, sure!

Ok, should it finish I will send you the output.

>
> > So, any chance you are running the computation on a 32 bit
> > box and/or run out of memory/have highly fragmented memory after a
> > long uptime?
>
> There is a long uptime, but I saw in the system console a steady line
> on 80% memory usage.

Ok.

>
> I have to wait for the valgrind run to finish (which will> take a long, long time), to say something definite, but I will look
> > into it some more.
>
> Below the output from bt

Thanks.

The above corresponds to the following lines in matrix2.pyx:279-281:

tmp = []
for cols in lst:
tmp.append(self.prod_of_row_sums(cols))

The crash itself happens when the tmp.append() fails. That indicates a
problem with python's memory management and not an issue in the Sage
code. I would expect python to throw some kind of allocation error,
but since python uses pymalloc it usually asks for a "large" amount of
memory to parcel out smaller chunks for the requests it receives. That
could explain why despite having consumed roughly 80% of memory
available the allocation fails. The other 20% might also be taken up
to a large extent by the kernel and user space, so that makes a
situation where you encounter a "no more memory" situation very
likely. The matrix code in Sage does throw a RuntimeError when it
fails to allocate memory, so I think it is off the hook for now.

Because you run at about 80% memory consumption and have a large
uptime I would suggest to do a reboot and see if the computation still
fails; hopefully your heap fragmentation problem is gone due to the
reboot and the computation will succeed. Depending on the version of
Sage you last tried this with the memory footprint of Sage might have
grown in relative terms so that the computation no longer succeeds
with the amount of RAM you have, so another possibility would be to
increase the amount of swap you have in that box. Adding more physical
RAM will probably makes the problem go away, too.

If anything pops up from the still running debug sessions I will let
you know.

Cheers,

Michael

Jaap Spies

unread,
Oct 24, 2007, 12:20:13 PM10/24/07
to sage-...@googlegroups.com
mabshoff wrote:

I rebooted with no succes. Same error using 50-60% of 1 GB memory.
Swap space is 5 GB.


Jaap

William Stein

unread,
Oct 24, 2007, 12:27:58 PM10/24/07
to sage-...@googlegroups.com
On 10/24/07, Jaap Spies <j.s...@hccnet.nl> wrote:
> > The above corresponds to the following lines in matrix2.pyx:279-281:
> >
> > tmp = []
> > for cols in lst:
> > tmp.append(self.prod_of_row_sums(cols))
> >
> > The crash itself happens when the tmp.append() fails. That indicates a

It would perhaps be worth changing the code to

some_large_integer = 3**(10**5)
tmp = [some_large_integer]*len(lst)
for i in xrange(len(lst)):
tmp[i] = self.prod_of_row_sums(cols[i])

and see what happens. I mean, maybe the problem is in
prod_of_row_sums instead of append?

We really need to replicate this on another 32-bit machine.

By the way, could you remind me where dance is defined?

sage: search_src('dance')
[nothing]
sage:

> > problem with python's memory management and not an issue in the Sage
> > code. I would expect python to throw some kind of allocation error,
> > but since python uses pymalloc it usually asks for a "large" amount of
> > memory to parcel out smaller chunks for the requests it receives. That
> > could explain why despite having consumed roughly 80% of memory
> > available the allocation fails. The other 20% might also be taken up
> > to a large extent by the kernel and user space, so that makes a
> > situation where you encounter a "no more memory" situation very
> > likely. The matrix code in Sage does throw a RuntimeError when it
> > fails to allocate memory, so I think it is off the hook for now.
> >
> > Because you run at about 80% memory consumption and have a large
> > uptime I would suggest to do a reboot and see if the computation still
> > fails; hopefully your heap fragmentation problem is gone due to the
> > reboot and the computation will succeed. Depending on the version of
> > Sage you last tried this with the memory footprint of Sage might have
> > grown in relative terms so that the computation no longer succeeds
> > with the amount of RAM you have, so another possibility would be to
> > increase the amount of swap you have in that box. Adding more physical
> > RAM will probably makes the problem go away, too.
> >
>
> I rebooted with no succes. Same error using 50-60% of 1 GB memory.
> Swap space is 5 GB.
>
>
> Jaap
>
>
> > If anything pops up from the still running debug sessions I will let
> > you know.
> >
>
>
> >
>


--
William Stein
Associate Professor of Mathematics
University of Washington
http://wstein.org

mabshoff

unread,
Oct 24, 2007, 12:36:25 PM10/24/07
to sage-devel

On Oct 24, 6:27 pm, "William Stein" <wst...@gmail.com> wrote:


> On 10/24/07, Jaap Spies <j.sp...@hccnet.nl> wrote:
>
> > > The above corresponds to the following lines in matrix2.pyx:279-281:
>
> > > tmp = []
> > > for cols in lst:
> > > tmp.append(self.prod_of_row_sums(cols))
>
> > > The crash itself happens when the tmp.append() fails. That indicates a
>
> It would perhaps be worth changing the code to
>
> some_large_integer = 3**(10**5)
> tmp = [some_large_integer]*len(lst)
> for i in xrange(len(lst)):
> tmp[i] = self.prod_of_row_sums(cols[i])
>
> and see what happens. I mean, maybe the problem is in
> prod_of_row_sums instead of append?
>
> We really need to replicate this on another 32-bit machine.

I am on that, I got a 32 bit build of 2.8.9.alpha0.


>
> By the way, could you remind me where dance is defined?
>
> sage: search_src('dance')
> [nothing]
> sage:
>
>

It is in matrix/matrix2.pyx

>
> > > problem with python's memory management and not an issue in the Sage
> > > code. I would expect python to throw some kind of allocation error,
> > > but since python uses pymalloc it usually asks for a "large" amount of
> > > memory to parcel out smaller chunks for the requests it receives. That
> > > could explain why despite having consumed roughly 80% of memory
> > > available the allocation fails. The other 20% might also be taken up
> > > to a large extent by the kernel and user space, so that makes a
> > > situation where you encounter a "no more memory" situation very
> > > likely. The matrix code in Sage does throw a RuntimeError when it
> > > fails to allocate memory, so I think it is off the hook for now.
>
> > > Because you run at about 80% memory consumption and have a large
> > > uptime I would suggest to do a reboot and see if the computation still
> > > fails; hopefully your heap fragmentation problem is gone due to the
> > > reboot and the computation will succeed. Depending on the version of
> > > Sage you last tried this with the memory footprint of Sage might have
> > > grown in relative terms so that the computation no longer succeeds
> > > with the amount of RAM you have, so another possibility would be to
> > > increase the amount of swap you have in that box. Adding more physical
> > > RAM will probably makes the problem go away, too.
>
> > I rebooted with no succes. Same error using 50-60% of 1 GB memory.
> > Swap space is 5 GB.
>
> > Jaap
>

Ok, I will investigate on a 32 bit box then. Could you give us
distribution/gcc and so on please? It might be related to specific
compilers.

> > > If anything pops up from the still running debug sessions I will let
> > > you know.
>

Cheers,

Michael

Jaap Spies

unread,
Oct 24, 2007, 12:52:51 PM10/24/07
to sage-...@googlegroups.com
mabshoff wrote:
>
>
> On Oct 24, 6:27 pm, "William Stein" <wst...@gmail.com> wrote:
>> On 10/24/07, Jaap Spies <j.sp...@hccnet.nl> wrote:

>
> I am on that, I got a 32 bit build of 2.8.9.alpha0.
>> By the way, could you remind me where dance is defined?
>>
>> sage: search_src('dance')
>> [nothing]
>> sage:
>>
>>

It is not in matrix2.pyx, It is on the bottom of my first message and heer below.

It uses some functions/methods present in matrix2.pyx:
rook_vector, permanental_minor, prod_of_row_sums, permanent.


>>>> increase the amount of swap you have in that box. Adding more physical
>>>> RAM will probably makes the problem go away, too.
>>> I rebooted with no succes. Same error using 50-60% of 1 GB memory.
>>> Swap space is 5 GB.
>>> Jaap
>
> Ok, I will investigate on a 32 bit box then. Could you give us
> distribution/gcc and so on please? It might be related to specific
> compilers.
>

Linux paix 2.6.22.9-91.fc7 #1 SMP Thu Sep 27 23:10:59 EDT 2007 i686 i686 i386 GNU/Linux

gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-27)

Jaap

-------

##########################################################################
# Copyright (C) 2006 Jaap Spies, jaap...@gmail.com

mabshoff

unread,
Oct 24, 2007, 1:00:14 PM10/24/07
to sage-devel

On Oct 24, 6:52 pm, Jaap Spies <j.sp...@hccnet.nl> wrote:
> mabshoff wrote:
>
> > On Oct 24, 6:27 pm, "William Stein" <wst...@gmail.com> wrote:
> >> On 10/24/07, Jaap Spies <j.sp...@hccnet.nl> wrote:
>
> > I am on that, I got a 32 bit build of 2.8.9.alpha0.
> >> By the way, could you remind me where dance is defined?
>
> >> sage: search_src('dance')
> >> [nothing]
> >> sage:
>
> It is not in matrix2.pyx, It is on the bottom of my first message and heer below.
>
> It uses some functions/methods present in matrix2.pyx:
> rook_vector, permanental_minor, prod_of_row_sums, permanent.
>

Yep, you are obviously right, sorry for the confusion.

> >>>> increase the amount of swap you have in that box. Adding more physical
> >>>> RAM will probably makes the problem go away, too.
> >>> I rebooted with no succes. Same error using 50-60% of 1 GB memory.
> >>> Swap space is 5 GB.
> >>> Jaap
>
> > Ok, I will investigate on a 32 bit box then. Could you give us
> > distribution/gcc and so on please? It might be related to specific
> > compilers.
>
> Linux paix 2.6.22.9-91.fc7 #1 SMP Thu Sep 27 23:10:59 EDT 2007 i686 i686 i386 GNU/Linux
>
> gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-27)

Great, I have

Linux localhost.localdomain 2.6.22.1-41.fc7 #1 SMP Fri Jul 27 17:26:33


EDT 2007 i686 i686 i386 GNU/Linux

gcc version 4.1.2 20070626 (Red Hat 4.1.2-13)

so I will apply the latest FC7 updates.

>
> Jaap
>

Cheers,

Michael

> -------
>
> ##########################################################################
> # Copyright (C) 2006 Jaap Spies, jaapsp...@gmail.com

William Stein

unread,
Oct 24, 2007, 1:00:32 PM10/24/07
to sage-...@googlegroups.com
On 10/24/07, Jaap Spies <j.s...@hccnet.nl> wrote:
>
> It is not in matrix2.pyx, It is on the bottom of my first message and here below.

>
> It uses some functions/methods present in matrix2.pyx:
> rook_vector, permanental_minor, prod_of_row_sums, permanent.

I think you should submit a patch to Sage so that this code gets
included standard. It could go somewhere in the combinat directory.
Is it a sloane sequences? It shouldn't be in sage.all by default,
but should be easy for users to import.

mabshoff

unread,
Oct 25, 2007, 5:31:46 AM10/25/07
to sage-devel

On Oct 24, 7:00 pm, "William Stein" <wst...@gmail.com> wrote:


> On 10/24/07, Jaap Spies <j.sp...@hccnet.nl> wrote:
>
>
>
> > It is not in matrix2.pyx, It is on the bottom of my first message and here below.
>
> > It uses some functions/methods present in matrix2.pyx:
> > rook_vector, permanental_minor, prod_of_row_sums, permanent.
>
> I think you should submit a patch to Sage so that this code gets
> included standard. It could go somewhere in the combinat directory.
> Is it a sloane sequences? It shouldn't be in sage.all by default,
> but should be easy for users to import.
>
>
>
>
>
> > >>>> increase the amount of swap you have in that box. Adding more physical
> > >>>> RAM will probably makes the problem go away, too.
> > >>> I rebooted with no succes. Same error using 50-60% of 1 GB memory.
> > >>> Swap space is 5 GB.
> > >>> Jaap
>
> > > Ok, I will investigate on a 32 bit box then. Could you give us
> > > distribution/gcc and so on please? It might be related to specific
> > > compilers.
>
> > Linux paix 2.6.22.9-91.fc7 #1 SMP Thu Sep 27 23:10:59 EDT 2007 i686 i686 i386 GNU/Linux
>
> > gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-27)
>
> > Jaap
>
> > -------
>
> > ##########################################################################

> > # Copyright (C) 2006 Jaap Spies, jaapsp...@gmail.com

Hello folks,

over night I got a better bt on my local 32 bit FC 7 box:

sage: time dance(10)

Program received signal SIGSEGV, Segmentation fault.

[Switching to Thread -1208650048 (LWP 3143)]
0x00786473 in strlen () from /lib/libc.so.6
(gdb)
(gdb) bt
#0 0x00786473 in strlen () from /lib/libc.so.6
#1 0x0809354f in PyString_FromFormatV (format=0x811ef6c "'%.200s'
object cannot be interpreted as an index",
vargs=<value optimized out>) at Objects/stringobject.c:202
#2 0x080d4ee0 in PyErr_Format (exception=0x813b560, format=0x811ef6c
"'%.200s' object cannot be interpreted as an index")
at Python/errors.c:522
#3 0x0805d578 in PyNumber_Index (item=0x0) at Objects/abstract.c:965
#4 0x011ac5ee in
__pyx_f_py_20matrix_integer_dense_20Matrix_integer_dense_prod_of_row_sums
(__pyx_v_self=0xa4a8104,
__pyx_v_cols=0xaf96ecc) at sage/matrix/matrix_integer_dense.c:
13576
#5 0x0805a277 in PyObject_Call (func=0x0, arg=0xa4a048c, kw=0x0) at
Objects/abstract.c:1860
#6 0x080be7cc in PyEval_CallObjectWithKeywords (func=0xaf9b62c,
arg=0xa4a048c, kw=0x0) at Python/ceval.c:3433
#7 0x0805a490 in PyObject_CallObject (o=0xaf9b62c, a=0xa4a048c) at
Objects/abstract.c:1851
#8 0x07db98c6 in __pyx_f_py_7matrix2_6Matrix_permanent
(__pyx_v_self=0xa4a8104, unused=0x0) at sage/matrix/matrix2.c:1657
#9 0x0805a277 in PyObject_Call (func=0x0, arg=0xb7f1702c, kw=0x0) at
Objects/abstract.c:1860
#10 0x080be7cc in PyEval_CallObjectWithKeywords (func=0xaf9694c,
arg=0xb7f1702c, kw=0x0) at Python/ceval.c:3433
#11 0x0805a490 in PyObject_CallObject (o=0xaf9694c, a=0x0) at Objects/
abstract.c:1851
#12 0x07db8711 in __pyx_f_py_7matrix2_6Matrix_permanental_minor
(__pyx_v_self=0xa4a8a94, __pyx_arg_k=0x97f8ee4)
at sage/matrix/matrix2.c:2077
#13 0x0805a277 in PyObject_Call (func=0x0, arg=0xa41c44c, kw=0x0) at
Objects/abstract.c:1860
#14 0x080be7cc in PyEval_CallObjectWithKeywords (func=0xaf87cec,
arg=0xa41c44c, kw=0x0) at Python/ceval.c:3433
#15 0x0805a490 in PyObject_CallObject (o=0xaf87cec, a=0xa41c44c) at
Objects/abstract.c:1851
#16 0x07daf6e2 in __pyx_f_py_7matrix2_6Matrix_rook_vector
(__pyx_v_self=0xa4a8a94, __pyx_args=0xb7f1702c, __pyx_kwds=0x0)
at sage/matrix/matrix2.c:2402
#17 0x080c539c in PyEval_EvalFrameEx (f=0xb00497c, throwflag=0) at
Python/ceval.c:3564
#18 0x080c57c5 in PyEval_EvalFrameEx (f=0xb004814, throwflag=0) at
Python/ceval.c:3650
#19 0x080c65d5 in PyEval_EvalCodeEx (co=0xa4a8338, globals=0xaf1035c,
locals=0xaf1035c, args=0x0, argcount=0, kws=0x0,


kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:
2831

#20 0x080c6647 in PyEval_EvalCode (co=0xa4a8338, globals=0xaf1035c,
locals=0xaf1035c) at Python/ceval.c:494
#21 0x080bdc01 in builtin_eval (self=0x0, args=0xaf96c8c) at Python/
bltinmodule.c:571
#22 0x080c539c in PyEval_EvalFrameEx (f=0xb00466c, throwflag=0) at
Python/ceval.c:3564
#23 0x080c65d5 in PyEval_EvalCodeEx (co=0x9a345c0, globals=0x9a2c3e4,
locals=0x0, args=0xb00463c, argcount=2,
kws=0xb004644, kwcount=0, defs=0x9a4a8d8, defcount=1, closure=0x0)
at Python/ceval.c:2831
#24 0x080c4a89 in PyEval_EvalFrameEx (f=0xb0044ec, throwflag=0) at
Python/ceval.c:3660
#25 0x080c57c5 in PyEval_EvalFrameEx (f=0xb004394, throwflag=0) at
Python/ceval.c:3650
#26 0x080c65d5 in PyEval_EvalCodeEx (co=0xa4a8218, globals=0xaf1035c,
locals=0xaf1035c, args=0x0, argcount=0, kws=0x0,


kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:
2831

#27 0x080c5852 in PyEval_EvalFrameEx (f=0xaff7aec, throwflag=0) at


Python/ceval.c:494
---Type <return> to continue, or q <return> to quit---

#28 0x080c65d5 in PyEval_EvalCodeEx (co=0x99ebad0, globals=0x99ddacc,
locals=0x0, args=0xaff7ab8, argcount=2,
kws=0xaff7ac0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at
Python/ceval.c:2831
#29 0x080c4a89 in PyEval_EvalFrameEx (f=0xaff796c, throwflag=0) at
Python/ceval.c:3660
#30 0x080c65d5 in PyEval_EvalCodeEx (co=0x99eba40, globals=0x99ddacc,
locals=0x0, args=0xaff7944, argcount=3,
kws=0xaff7950, kwcount=0, defs=0x9a5a338, defcount=2, closure=0x0)
at Python/ceval.c:2831
#31 0x080c4a89 in PyEval_EvalFrameEx (f=0xaff77fc, throwflag=0) at
Python/ceval.c:3660
#32 0x080c57c5 in PyEval_EvalFrameEx (f=0xafc9604, throwflag=0) at
Python/ceval.c:3650
#33 0x080c65d5 in PyEval_EvalCodeEx (co=0x99eb7b8, globals=0x99ddacc,
locals=0x0, args=0xaf85b7c, argcount=2,
kws=0xaf85b84, kwcount=0, defs=0x9a5a698, defcount=1, closure=0x0)
at Python/ceval.c:2831
#34 0x080c4a89 in PyEval_EvalFrameEx (f=0xaf85a3c, throwflag=0) at
Python/ceval.c:3660
#35 0x080c65d5 in PyEval_EvalCodeEx (co=0x99eb6e0, globals=0x99ddacc,
locals=0x0, args=0x98c98f0, argcount=2,
kws=0x98c98f8, kwcount=0, defs=0x9a5a678, defcount=1, closure=0x0)
at Python/ceval.c:2831
#36 0x080c4a89 in PyEval_EvalFrameEx (f=0x98c97ac, throwflag=0) at
Python/ceval.c:3660
#37 0x080c65d5 in PyEval_EvalCodeEx (co=0x999a410, globals=0x99988ac,
locals=0x0, args=0x980366c, argcount=1,
kws=0x9803670, kwcount=2, defs=0x99c1c38, defcount=2, closure=0x0)
at Python/ceval.c:2831
#38 0x080c4a89 in PyEval_EvalFrameEx (f=0x9803534, throwflag=0) at
Python/ceval.c:3660
#39 0x080c65d5 in PyEval_EvalCodeEx (co=0xb7edd5c0,
globals=0xb7f2facc, locals=0xb7f2facc, args=0x0, argcount=0, kws=0x0,


kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:
2831

#40 0x080c6647 in PyEval_EvalCode (co=0xb7edd5c0, globals=0xb7f2facc,
locals=0xb7f2facc) at Python/ceval.c:494
#41 0x080e52c8 in PyRun_FileExFlags (fp=0x982c788,
filename=0xbff57c9e "/tmp/Work/sage-2.8.9.alpha0/local/bin/sage-
gdb-pythonstartup", start=257, globals=0xb7f2facc,
locals=0xb7f2facc, closeit=0, flags=0xbff55698) at Python/
pythonrun.c:1271
#42 0x080e5557 in PyRun_SimpleFileExFlags (fp=0x982c788,
filename=0xbff57c9e "/tmp/Work/sage-2.8.9.alpha0/local/bin/sage-
gdb-pythonstartup", closeit=0, flags=0xbff55698)
at Python/pythonrun.c:877
#43 0x080571a6 in Py_Main (argc=0, argv=0xbff55764) at Modules/main.c:
134
#44 0x08056432 in main (argc=Cannot access memory at address 0x0
) at ./Modules/python.c:23
(gdb)

I will have a look into this later. Any help is appreciated ;)

Cheers,

Michael

mabshoff

unread,
Oct 26, 2007, 5:43:07 PM10/26/07
to sage-devel
Hello Jaap,

I assume you care about the following (computed on sage.math):

sage: dance(11)
h^11 - 44*h^10 + 1045*h^9 - 16500*h^8 + 187935*h^7 - 1595748*h^6 +
10199343*h^5 - 48691500*h^4 + 169140180*h^3 - 405230320*h^2 +
600311624*h - 415232800

I am currently computing dance(12) on sage.math. William: feel free to
kill pid 5655 in case you consider this a waste of your resources.

I had to kill the valgrind process on my local box because valgrind
stopped counting errors after the 1,000,000th. I also extrapolated how
long it would take and I am not willing to wait a month for it to
finish (if it does at all).

I started looking at the code itself to see if we cannot come up with
an easier to run test case that segfaults and currently I am looking
at the components like permanent() and so on that are using via
rook_vector. So if you have any ideas where that code could overflow
on 32 bit please let me know.

Cheers,

Michael

Jaap Spies

unread,
Oct 26, 2007, 6:25:17 PM10/26/07
to sage-...@googlegroups.com
mabshoff wrote:
> Hello Jaap,
>
> I assume you care about the following (computed on sage.math):
>
> sage: dance(11)
> h^11 - 44*h^10 + 1045*h^9 - 16500*h^8 + 187935*h^7 - 1595748*h^6 +
> 10199343*h^5 - 48691500*h^4 + 169140180*h^3 - 405230320*h^2 +
> 600311624*h - 415232800
>

Whow! This is great! I can now updat the corresponding entry in the OEIS.
My next wish is to find a formula for this kind of permanents!


I'm improving speed of the methods in matrix2.pyx in the meantime.
I'm makeing progress.

>> sage: time dance(9)
>> h^9 - 27*h^8 + 414*h^7 - 4158*h^6 + 29421*h^5 - 148743*h^4 + 530796*h^3 - 1276992*h^2 + 1866384*h - 1255608

>> CPU times: user 1453.75 s, sys: 8.03 s, total: 1461.78 s
>> Wall time: 1487.04

Actually I'm running dance(10) now on my machine and I hope there will be no
segfaults, because I simplified the code. It will take less than 3.5 hours to finish.

> I am currently computing dance(12) on sage.math. William: feel free to
> kill pid 5655 in case you consider this a waste of your resources.
>
> I had to kill the valgrind process on my local box because valgrind
> stopped counting errors after the 1,000,000th. I also extrapolated how
> long it would take and I am not willing to wait a month for it to
> finish (if it does at all).
>

Cheers!!!

> I started looking at the code itself to see if we cannot come up with
> an easier to run test case that segfaults and currently I am looking
> at the components like permanent() and so on that are using via
> rook_vector. So if you have any ideas where that code could overflow
> on 32 bit please let me know.
>

There are some issues related to the very art of permanents. There is no
one who want to compute the permanent of a matrix larger than an int x int!
In the literature you only will find calculations of certain types
of (0,1) matrices larger than 50 x 50, except the trivial ones.

I did send a patch for trac #931 to speed up the _choose part,
and also boost things up with a local variant of the binomial
function, more to come!(?)

Jaap


Jaap Spies

unread,
Oct 26, 2007, 7:57:46 PM10/26/07
to sage-...@googlegroups.com
Jaap Spies wrote:

>
> Actually I'm running dance(10) now on my machine and I hope there will be no
> segfaults, because I simplified the code. It will take less than 3.5 hours to finish.
>

Sorry, my hope was idle: same error!!

Jaap

mabshoff

unread,
Oct 27, 2007, 2:06:55 AM10/27/07
to sage-devel

On Oct 27, 1:57 am, Jaap Spies <j.sp...@hccnet.nl> wrote:
> Jaap Spies wrote:

Hello Jaap,

>
> > Actually I'm running dance(10) now on my machine and I hope there will be no
> > segfaults, because I simplified the code. It will take less than 3.5 hours to finish.
>
> Sorry, my hope was idle: same error!!

Arrg, change of strategy: Since rook_vector() consumes the vast amount
of time to compute I am trying to compute it on my box. If it does
compute without segfaulting for dance(10) I will then run the rest of
the code under valgrind, otherwise I will start adding debug print
statements into the code the rook_vector() code and the components
used. I am also running 2.8.9 + your patch from #931, which has
already been merged for 2.8.10.alpha0.

On a side note: Could you comment on #217? It is rather vague and
there have been some improvements. If you have more improvements
relative to #931 you should open another/more tickets for it/them.

>
> Jaap

Cheers,

Michael

Jaap Spies

unread,
Oct 27, 2007, 7:48:21 AM10/27/07
to sage-...@googlegroups.com
Hi Michael,

You wrote:


> On a side note: Could you comment on #217? It is rather vague and
> there have been some improvements. If you have more improvements
> relative to #931 you should open another/more tickets for it/them.
>

Now you have changed #217 to milestone sage-2.8.10, I'll
better send a patch here, so this old ticket can be closed.

To be honnest, I never saw this ticket before!

Jaap

mabshoff

unread,
Oct 27, 2007, 8:06:26 AM10/27/07
to sage-devel

On Oct 27, 1:48 pm, Jaap Spies <j.sp...@hccnet.nl> wrote:
> Hi Michael,
>
> You wrote:
> > On a side note: Could you comment on #217? It is rather vague and
> > there have been some improvements. If you have more improvements
> > relative to #931 you should open another/more tickets for it/them.
>
> Now you have changed #217 to milestone sage-2.8.10, I'll
> better send a patch here, so this old ticket can be closed.

:), but feel free to change the title of the ticket and also the
milestone it is tagged against. William said that it is one of those
very vague generic milestones from the old days and we have been
invalidating those and usually open more specific tickets in place of
them.

>
> To be honnest, I never saw this ticket before!

I came across it today by accident searching for another ticket, so
you are not alone.

dance(10) crashed for me inside rook_vector(), so I will be taking it
from here by adding debug output to nail the exact place where it
happens.

> Jaap

Cheers,

Michael

Jaap Spies

unread,
Oct 27, 2007, 8:57:02 AM10/27/07
to sage-...@googlegroups.com
mabshoff wrote:
>
>
> On Oct 27, 1:48 pm, Jaap Spies <j.sp...@hccnet.nl> wrote:

>> Now you have changed #217 to milestone sage-2.8.10, I'll
>> better send a patch here, so this old ticket can be closed.
>
> :), but feel free to change the title of the ticket and also the
> milestone it is tagged against.

I attached a patch bundle. If it gets accepted #217 can be closed
and, as you said, more specific tickets can be opened.


Jaap


Permanents are here forever!

mabshoff

unread,
Oct 28, 2007, 4:44:54 AM10/28/07
to sage-devel
Okay, I got some new results:

Changing permanent() slightly in matrix2.pyx:

print "entering permanent() - m: ",m," n: ",n
from sage.rings.arith import binomial
for r from 1 <= r < m+1:
lst = _choose(n, r)
print "lst", lst


tmp = []
for cols in lst:
tmp.append(self.prod_of_row_sums(cols))

s = sum(tmp)
# sn = (-1)^(m-r)
if (m - r) % 2 == 0:
sn = 1
else:
sn = -1
perm = perm + sn * binomial(n-r, m-r) * s
print "exiting permanent() - perm: ", perm

Where _choose is defined as:

def _choose(int n, int t):
"""
Returns all possible sublists of length t from range(n)

Based on algoritm L from Knuth's taocp part 4: 7.2.1.3 p.4

AUTHOR:
-- Jaap Spies (2007-10-22)
"""

x = []
c = range(t)
c.append(n)
c.append(0)
j = 0

while j < t:
x.append(c[:t])
j = 0
while c[j]+1 == c[j+1]:
c[j] = j
j = j+1
c[j] = c[j]+1

return x
[end of _choose code]

At some point the refcount for the list elements goes FUBAR (this is
not the first occurence of this, but those scrolled off my screen):

We should get (r=1 in this case):
entering permanent() - m: 7 n: 7
lst [[1],[2],[3],[4],[5],[6]]

But we get:

entering permanent() - m: 7 n: 7
lst [[<refcnt -2137432134 at 0x92f70ac>], [<refcnt -2137432183 at
0x92f70a0>], [<refcnt -2137464436 at 0x92f7094>], [<refcnt -2138866030
at 0x92f7088>], [4], [5], [6]]

I assume on 64 bit boxen the refcount is 64 bit, so we never see the
issue on sage.math.

So does anybody have any idea what is wrong?

Cheers,

Michael

mabshoff

unread,
Oct 29, 2007, 3:53:47 PM10/29/07
to sage-devel
I did look into this some more and I did verify that the refcounts
wraps:

lst [[0, 1, 2, 3, 4, 5, 6]]
lst [[0], [1], [2], [3], [4], [5], [6]]
lst [[0, 1], [0, 2], [1, 2], [0, 3], [1, 3], [2, 3], [0, 4], [1, 4],
[2, 4], [3, 4], [0, 5], [1, 5], [2, 5], [3, 5], [4, 5],
[0, 6], [1, 6], [2, 6], [3, 6], [4, 6], [5, 6]]
lst [[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3], [0, 1, 4], [0, 2, 4],
[1, 2, 4], [0, 3, 4], [1, 3, 4], [2, 3, 4], [0, 1, 5]
, [0, 2, 5], [1, 2, 5], [0, 3, 5], [1, 3, 5], [2, 3, 5], [0, 4, 5],
[1, 4, 5], [2, 4, 5], [3, 4, 5], [0, 1, 6], [0, 2, 6], [
1, 2, 6], [0, 3, 6], [1, 3, 6], [2, 3, 6], [0, 4, 6], [1, 4, 6], [2,
4, 6], [3, 4, 6], [0, 5, 6], [1, 5, 6], [2, 5, 6], [3,
5, 6], [4, 5, 6]]
lst [[<refcnt -2147483582 at 0x97550ac>, 1, 2, 3], [<refcnt
-2147483582 at 0x97550ac>, 1, 2, 4], [<refcnt -2147483582 at 0x9
7550ac>, 1, 3, 4], [<refcnt -2147483582 at 0x97550ac>, 2, 3, 4], [1,
2, 3, 4], [<refcnt -2147483582 at 0x97550ac>, 1, 2, 5],
[<refcnt -2147483582 at 0x97550ac>, 1, 3, 5], [<refcnt -2147483582 at
0x97550ac>, 2, 3, 5], [1, 2, 3, 5], [<refcnt -2147483
582 at 0x97550ac>, 1, 4, 5], [<refcnt -2147483582 at 0x97550ac>, 2, 4,
5], [1, 2, 4, 5], [<refcnt -2147483582 at 0x97550ac>,
3, 4, 5], [1, 3, 4, 5], [2, 3, 4, 5], [<refcnt -2147483582 at
0x97550ac>, 1, 2, 6], [<refcnt -2147483582 at 0x97550ac>, 1,
3, 6], [<refcnt -2147483582 at 0x97550ac>, 2, 3, 6], [1, 2, 3, 6],
[<refcnt -2147483582 at 0x97550ac>, 1, 4, 6], [<refcnt -2
147483582 at 0x97550ac>, 2, 4, 6], [1, 2, 4, 6], [<refcnt -2147483582
at 0x97550ac>, 3, 4, 6], [1, 3, 4, 6], [2, 3, 4, 6], [
<refcnt -2147483582 at 0x97550ac>, 1, 5, 6], [<refcnt -2147483582 at
0x97550ac>, 2, 5, 6], [1, 2, 5, 6], [<refcnt -214748358
2 at 0x97550ac>, 3, 5, 6], [1, 3, 5, 6], [2, 3, 5, 6], [<refcnt
-2147483582 at 0x97550ac>, 4, 5, 6], [1, 4, 5, 6], [2, 4, 5,
6], [3, 4, 5, 6]]
lst [[<refcnt -2147483447 at 0x97550ac>, <refcnt -2147483523 at
0x97550a0>, 2, 3, 4], [<refcnt -2147483447 at 0x97550ac>, <r
efcnt -2147483523 at 0x97550a0>, 2, 3, 5], [<refcnt -2147483447 at
0x97550ac>, <refcnt -2147483523 at 0x97550a0>, 2, 4, 5],
[<refcnt -2147483447 at 0x97550ac>, <refcnt -2147483523 at 0x97550a0>,
3, 4, 5], [<refcnt -2147483447 at 0x97550ac>, 2, 3, 4
, 5], [<refcnt -2147483523 at 0x97550a0>, 2, 3, 4, 5], [<refcnt
-2147483447 at 0x97550ac>, <refcnt -2147483523 at 0x97550a0>
, 2, 3, 6], [<refcnt -2147483447 at 0x97550ac>, <refcnt -2147483523 at
0x97550a0>, 2, 4, 6], [<refcnt -2147483447 at 0x97550
ac>, <refcnt -2147483523 at 0x97550a0>, 3, 4, 6], [<refcnt -2147483447
at 0x97550ac>, 2, 3, 4, 6], [<refcnt -2147483523 at 0
x97550a0>, 2, 3, 4, 6], [<refcnt -2147483447 at 0x97550ac>, <refcnt
-2147483523 at 0x97550a0>, 2, 5, 6], [<refcnt -214748344
7 at 0x97550ac>, <refcnt -2147483523 at 0x97550a0>, 3, 5, 6], [<refcnt
-2147483447 at 0x97550ac>, 2, 3, 5, 6], [<refcnt -214
7483523 at 0x97550a0>, 2, 3, 5, 6], [<refcnt -2147483447 at
0x97550ac>, <refcnt -2147483523 at 0x97550a0>, 4, 5, 6], [<refcn
t -2147483447 at 0x97550ac>, 2, 4, 5, 6], [<refcnt -2147483523 at
0x97550a0>, 2, 4, 5, 6], [<refcnt -2147483447 at 0x97550ac
>, 3, 4, 5, 6], [<refcnt -2147483523 at 0x97550a0>, 3, 4, 5, 6], [2, 3, 4, 5, 6]]

The crash happens when the refcount for [0] goes up to zero. I now
tend to blame Cython for this, so can somebody take a closer look at
the code Cython generates for _choose.

I would also conjecture that these kinds of refcounting issues cause a
lot of the "still reachable memory" in a valgrind run.

Carl Witty suggested to turn the refcount macros into functions that
jump to a predefined function in case the refcount exceeds a certain
threshold like 1,000 or even 1,000,000. That way we can search for
those issues and hopefully eliminate them by setting a breakpoint on
that predefined function.

Cheers,

Michael

Jaap Spies

unread,
Oct 29, 2007, 4:18:42 PM10/29/07
to sage-...@googlegroups.com
Michael,

Remembering that dance(10) has a segmentation fault long before
there was a function _choose, is it possible that some other
part of the code is going wild?

Jaap

mabshoff

unread,
Oct 29, 2007, 4:49:23 PM10/29/07
to sage-devel

On Oct 29, 9:18 pm, Jaap Spies <j.sp...@hccnet.nl> wrote:
> Michael,
>

Hello Jaap,

> Remembering that dance(10) has a segmentation fault long before
> there was a function _choose, is it possible that some other
> part of the code is going wild?
>

Sure, but so far no lead. Which specific version did exhibit that
behavior? I can build older releases locally and see if I can come up
with anything.

Now that 2.8.10 with all your speed improvements is out I will do some
more testing with that release on sage.math with a 32 bit binary to
see if I can flush anything out with that.

As I wrote above: The issues with the refcount going insane is a nice
result from the debugging and that will hopefully lead to some
improvements in our fight against memleaks.

> Jaap

Cheers,

Michael

Jaap Spies

unread,
Oct 29, 2007, 5:03:17 PM10/29/07
to sage-...@googlegroups.com
Hi Machael,

You wrote:
>
>
> On Oct 29, 9:18 pm, Jaap Spies <j.sp...@hccnet.nl> wrote:
>> Michael,
>>
>
> Hello Jaap,
>
>> Remembering that dance(10) has a segmentation fault long before
>> there was a function _choose, is it possible that some other
>> part of the code is going wild?
>>
>
> Sure, but so far no lead. Which specific version did exhibit that
> behavior? I can build older releases locally and see if I can come up
> with anything.
>

On 23 Oct I wrote:

To day I got with sage-2.8.8.1:

> > sage: time dance(10)
> >
> >
> > ------------------------------------------------------------
> > Unhandled SIGSEGV: A segmentation fault occured in SAGE.
> > This probably occured because a *compiled* component
> > of SAGE has a bug in it (typically accessing invalid memory)
> > or is not properly wrapped with _sig_on, _sig_off.
> > You might want to run SAGE under gdb with 'sage -gdb' to debug this.
> > SAGE will now terminate (sorry).
> > ------------------------------------------------------------

> Now that 2.8.10 with all your speed improvements is out I will do some
> more testing with that release on sage.math with a 32 bit binary to
> see if I can flush anything out with that.
>

Warning: see the reopened trac #217, I stupid changed some Py_ssize_t
in ints, but knowing the nrows and ncols are small, there will be no
problem, I hope!

> As I wrote above: The issues with the refcount going insane is a nice
> result from the debugging and that will hopefully lead to some
> improvements in our fight against memleaks.
>

Thanks and good luck,

Jaap

mabshoff

unread,
Oct 30, 2007, 3:19:07 AM10/30/07
to sage-devel

On Oct 29, 10:03 pm, Jaap Spies <j.sp...@hccnet.nl> wrote:
> Hi Machael,
>

Hi Jaap,

>
>
> You wrote:
>
> > On Oct 29, 9:18 pm, Jaap Spies <j.sp...@hccnet.nl> wrote:
> >> Michael,
>
> > Hello Jaap,
>
> >> Remembering that dance(10) has a segmentation fault long before
> >> there was a function _choose, is it possible that some other
> >> part of the code is going wild?
>
> > Sure, but so far no lead. Which specific version did exhibit that
> > behavior? I can build older releases locally and see if I can come up
> > with anything.
>
> On 23 Oct I wrote:
>
> To day I got with sage-2.8.8.1:
>

Well, I just looked at the patch where you introduced _choose and the
lst you create is indentical.

> > > sage: time dance(10)
> > >
> > >
> > > ------------------------------------------------------------
> > > Unhandled SIGSEGV: A segmentation fault occured in SAGE.
> > > This probably occured because a *compiled* component
> > > of SAGE has a bug in it (typically accessing invalid memory)
> > > or is not properly wrapped with _sig_on, _sig_off.
> > > You might want to run SAGE under gdb with 'sage -gdb' to debug this.
> > > SAGE will now terminate (sorry).
> > > ------------------------------------------------------------
>
> > Now that 2.8.10 with all your speed improvements is out I will do some
> > more testing with that release on sage.math with a 32 bit binary to
> > see if I can flush anything out with that.
>
> Warning: see the reopened trac #217, I stupid changed some Py_ssize_t
> in ints, but knowing the nrows and ncols are small, there will be no
> problem, I hope!
>

Nope, I understand the argument to change it back, but the change
shouldn't case any trouble.

> > As I wrote above: The issues with the refcount going insane is a nice
> > result from the debugging and that will hopefully lead to some
> > improvements in our fight against memleaks.
>
> Thanks and good luck,

Carl had some interesting input:

[02:03] <cwitty> I was looking at those refcounts some more.
[02:03] <mabshoff> ok
[02:03] <cwitty> It turns out that Python caches and shares small int
objects:
[02:04] <cwitty> sage: a = int(0)
[02:04] <cwitty> sage: sys.getrefcount(a)
[02:04] <cwitty> 6288
[02:04] <cwitty> sage: b = int(0)
[02:04] <cwitty> sage: sys.getrefcount(a)
[02:04] <cwitty> 6289
[02:04] <cwitty> sage: a is b
[02:04] <cwitty> True
[02:04] <mabshoff> ok
[02:04] <cwitty> So the reference-counting problem could be anywhere
that deals with small Python ints.
[02:05] <mabshoff> mmmh.
[02:05] <mabshoff> So it is not the list [0], but the smallint 0 that
has the reference counter overflow?
[02:06] <cwitty> Let me go look at your printouts again. I thought
so, but I didn't look closely.
[02:06] <cwitty> Yeah, that's what it looks like to me. (Just judging
from your next-to-latest e-mail.)
[02:07] <mabshoff> Because the numbers "jump", while I would expect
them to increase by 1 each time.
[02:11] <cwitty> So it looks like between your first and second
printouts, the refcount for "0" jumped by 135, and the refcount for
"1" jumped by at least 125 (to make it wrap).
[02:11] <mabshoff> Yep, otherwise it would take *forever* to even wrap
on a 32 bit box.

So now comes the question: Why doesn't the refcount get decremented?
How do we fix this?

>
> Jaap

Cheers,

Michael

Robert Bradshaw

unread,
Oct 30, 2007, 3:27:43 AM10/30/07
to sage-...@googlegroups.com
I had an error like this and it turned out to be caused by a GC
error. Specifically, an object A was allocated, then erroneously
deallocated (but there was still stuff pointing to it). Object B was
then allocated, with memory overlapping that of object A. Object A
was then modified, and that modified the refcount field of object B.
That might be a variant of what's going on here.

I tracked it down by using gdb's watch command on the object with the
weird ref count...

- Robert

mabshoff

unread,
Oct 30, 2007, 10:39:38 PM10/30/07
to sage-devel

Robert Bradshaw wrote:

Hi Robert,

> I had an error like this and it turned out to be caused by a GC
> error. Specifically, an object A was allocated, then erroneously
> deallocated (but there was still stuff pointing to it). Object B was
> then allocated, with memory overlapping that of object A. Object A
> was then modified, and that modified the refcount field of object B.
> That might be a variant of what's going on here.
>
> I tracked it down by using gdb's watch command on the object with the
> weird ref count...
>
> - Robert

As it turned out it was a bug in 2.8.9's Cython [and probably at least
some of the earlier versions] which Carl tracked down after I isolated
the cause:

[02:30] <mabshoff> Ok, every time c=0 the refcount for int(0) goes up
by one.
[02:42] <cwitty> It's a Cython bug.
[02:42] <mabshoff> Yep, for c in cols:
[02:42] <mabshoff> increments the refcount on int(?)
[02:42] <cwitty> I've been having a hard time debugging it because the
bug doesn't occur any more in the current Cython (so it's gone in
2.8.10).
[02:43] <mabshoff> Ok, let me upgrade and see if the problem goes
away.
[02:44] <mabshoff> Got any more technical details?
[02:45] <cwitty> In 2.8.9's version of Cython, converting a Python
object (like the int objects we're dealing with) to a Py_Ssize_t goes
via this line of code:
[02:45] <cwitty> #define __pyx_PyIndex_AsSsize_t(b)
PyInt_AsSsize_t(PyNumber_Index(b))
[02:46] <cwitty> PyNumber_Index(b) returns a new object (although in
this case it's actually the exact same int object), with its refcount
incremented.
[02:46] <mabshoff> Ok, that makes sense.
[02:46] <cwitty> So the caller of PyNumber_Index() is supposed to
decref the return value at some point.

With 2.8.10 the refcount of int(0) doesn't increase by invoking
dance(m) any more:

sage: a=int(0)
sage: sys.getrefcount(a)
5304
sage: dance(5)
h^5 - 5*h^4 + 25*h^3 - 55*h^2 + 80*h - 46
sage: sys.getrefcount(a)
5304
sage: dance(6)
h^6 - 9*h^5 + 60*h^4 - 225*h^3 + 555*h^2 - 774*h + 484
sage: sys.getrefcount(a)
5304

It used to be that sys.getrefcount(a) would end up around 48,000 after
dance(5), so now I consider this issues closed.

Cheers,

Michael

Reply all
Reply to author
Forward
0 new messages