"Fatal Python error: Cannot recover from stack overflow." in Python 3 tests

Aaron Meurer

Apr 6, 2012, 3:19:57 AM
to sy...@googlegroups.com
Hi.

Recently, we started getting "Fatal Python error: Cannot recover from
stack overflow." in the Python 3 tests (when running setup.py test),
which is immediately followed by "Abort trap: 6", which kills Python.
On Mac OS X, this generates a problem report automatically with
logging information. I've uploaded the test output and the logging
info at https://gist.github.com/2317869. This has shown up in at
least two other people's SymPy-Bot tests (see
https://github.com/sympy/sympy/pull/1208), so I'm pretty sure it's not
a problem on my end.

Does anyone have any idea what could be causing this? I don't see how
it could be anything but a Python bug. I'm currently in the process
of bisecting the history to find the offending commit. It's taking a
while, though, as I have to rerun ./bin/use2to3 and setup.py test to
reproduce it (just running ./bin/test on the failing test file does
not reproduce the error). I'll report back when I get it.

Aaron Meurer

Chris Smith

Apr 6, 2012, 3:54:21 AM
to sy...@googlegroups.com
Note that running tests via bin/test does not generate the error. 

Aaron Meurer

Apr 6, 2012, 4:03:28 AM
to sy...@googlegroups.com
I bisected it to

commit 0856119bd7399a416c21e1692855a1077164f21c
Author: Aaron Meurer <asme...@gmail.com>
Date:   Mon Mar 12 23:05:30 2012 -0600

    Factor out setup.py test into run_all_tests() in sympy/utilities/runtests.py

    This way, we can easily get the functionality of running all the tests in a
    forward compatible way, but still have the ability to pass arguments and
    keyword arguments to the various test functions.

This would explain why bin/test does not generate the error. The
problem has something to do with setup.py.

I don't know how this causes a problem. I don't think it's related to
the translation with 2to3. A diff on the Python 2 file and the
translated file reveals that the only changes it made to the contents
of that commit are to add () to the end of the "print" lines. I think
it has something to do with calling a function from the sympy module
inside setup.py somehow.

If necessary, this commit can be reverted. Nothing else depends on this change.

By the way, apparently this failure actually showed up in the pull
request implementing this (https://github.com/sympy/sympy/pull/1115),
but it went unnoticed. I think we need to make it clearer in the
bot test summary when there are failures, as a failed run looks too
much like a passing one at a glance.

Aaron Meurer

Aaron Meurer

Apr 6, 2012, 4:24:38 AM
to sy...@googlegroups.com
On Fri, Apr 6, 2012 at 2:03 AM, Aaron Meurer <asme...@gmail.com> wrote:
> I bisected it to
>
> commit 0856119bd7399a416c21e1692855a1077164f21c
> Author: Aaron Meurer <asme...@gmail.com>
> Date:   Mon Mar 12 23:05:30 2012 -0600
>
>    Factor out setup.py test into run_all_tests() in sympy/utilities/runtests.py
>
>    This way, we can easily get the functionality of running all the tests in a
>    forward compatible way, but still have the ability to pass arguments and
>    keyword arguments to the various test functions.
>
> This would explain why bin/test does not generate the error.  The
> problem has something to do with setup.py.
>
> I don't know how this causes a problem.  I don't think it's related to
> the translation with 2to3.  A diff on the Python 2 file and the
> translated file reveals that the only changes it made to the contents
> of that commit are to add () to the end of the "print" lines.  I think
> it has something to do with calling a function from the sympy module
> inside setup.py somehow.
>
> If necessary, this commit can be reverted.  Nothing else depends on this change.
>
> By the way, apparently this failure actually showed up in the pull
> request implementing this (https://github.com/sympy/sympy/pull/1115),
> but it went unnoticed.  I think we need to make it clearer in the
> bot test summary when there are failures, as a failed run looks too
> much like a passing one at a glance.

I created https://github.com/sympy/sympy-bot/issues/106 for this.

Aaron Meurer

Joachim Durchholz

Apr 6, 2012, 12:42:49 PM
to sy...@googlegroups.com
Am 06.04.2012 09:19, schrieb Aaron Meurer:
> Recently, we started getting "Fatal Python error: Cannot recover from
> stack overflow." in the Python 3 tests (when running setup.py test),
> which is immediately followed by "Abort trap: 6", which kills Python.
> On Mac OS X, this generates a problem report automatically with
> logging information. I've uploaded the test output and the logging
> info at https://gist.github.com/2317869. This has shown up in at
> least two other people's SymPy-Bot tests (see
> https://github.com/sympy/sympy/pull/1208), so I'm pretty sure it's not
> a problem on my end.
>
> Does anyone have any idea what could be causing this?

My first bet would be an endless recursion, possibly in SymPy's test
routines, possibly in the Python runtime.
The stack depth is a bit small for that though.

Aaron Meurer

Apr 7, 2012, 12:33:22 AM
to sy...@googlegroups.com
Well, it is infinite recursion (though this is not a surprise, as the
error is a stack overflow). This is the test that causes the problem:

@XFAIL
def test_complex_2899():
    # infinite recursion in coth,
    # https://code.google.com/p/sympy/issues/detail?id=2899
    a, b = symbols('a,b', real=True)
    for deep in [True, False]:
        for func in [sinh, cosh, tanh, coth]:
            assert func(a).expand(complex=True, deep=deep) == func(a)

The Python stack is never supposed to overflow, though. It should
raise a RecursionError.
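
For reference, here is a minimal sketch of the behaviour expected here. The `runaway` function is a hypothetical stand-in for the non-terminating coth expansion, not SymPy code. Note that `RecursionError` only exists as a separate class on Python 3.5 and later; on the interpreters discussed in this thread the same condition surfaces as a plain `RuntimeError`, which is why the sketch catches the base class.

```python
def runaway(n=0):
    # Hypothetical stand-in for the non-terminating coth expansion.
    return runaway(n + 1)


def recursion_outcome():
    """Return the name of the exception raised by runaway recursion."""
    try:
        runaway()
    except RuntimeError as exc:
        # RecursionError (Python 3.5+) is a subclass of RuntimeError;
        # older interpreters raise a plain RuntimeError here.
        return type(exc).__name__
    return "no exception"


print(recursion_outcome())
```

On a healthy interpreter the recursion ends in a catchable exception rather than an abort, so the surrounding test runner can carry on with the next test.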

A print statement (actually print function, since this is Python 3)
reveals that the error happens with deep=True, func=coth, which is
actually the first one to cause a recursion error.

I wonder if my patch caused the error simply because it added another
function call to the stack. I can't see how else it would affect it.
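
A rough sketch of that idea, assuming nothing about the actual setup.py code path: `through_wrapper` is a hypothetical stand-in for the extra call the refactoring introduced. It measures how deep a recursion gets before the interpreter refuses, once directly and once through the wrapper.

```python
def depth_reached(n=1):
    """Recurse until the interpreter refuses, then report how deep we got."""
    try:
        return depth_reached(n + 1)
    except RuntimeError:  # RecursionError on Python 3.5+
        return n


def through_wrapper():
    # Hypothetical stand-in for the extra call added by the refactoring.
    return depth_reached()


def compare():
    # Both measurements start from this same frame, so only the wrapper differs.
    return depth_reached(), through_wrapper()


direct, wrapped = compare()
print(direct, wrapped)
```

The wrapped measurement should come out smaller than the direct one, because the wrapper's own frame uses up part of the budget. The recursion limit is therefore reached at a different point in the callee, which is all it would take to land on a different code path at the moment of overflow.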

Aaron Meurer

Joachim Durchholz

Apr 8, 2012, 4:13:21 AM
to sy...@googlegroups.com
Am 07.04.2012 06:33, schrieb Aaron Meurer:
> Well, it is infinite recursion (though this is not a surprise, as the
> error is a stack overflow). [...]

>
> The Python stack is never supposed to overflow, though. It should
> raise a RecursionError.

This could be a bug in Python's stack overflow detection.
That wouldn't be a surprise: detecting stack overflows is a trade-off
between being accurate and being fast, and such bugs are hard to trigger
reliably, so it's not easy to test all potential cases. A setup like
that can start producing bugs at the drop of a hat.

Aaron Meurer

Apr 9, 2012, 2:50:42 PM
to sy...@googlegroups.com
Yeah, this is clearly a Python bug. I've submitted
http://bugs.python.org/issue14537 for this.

In the meantime, so that the tests can be run in Python 3 again, we
need to do one of the following:

- Revert the above commit
- Fix the recursion error bug
- Comment out the XFAIL test

If we choose either of the last two options, there may be other
similar XFAIL tests that we'll have to do the same for.

Aaron Meurer

Aaron Meurer

Apr 9, 2012, 11:31:11 PM
to sy...@googlegroups.com
Well, they've already got a patch for that on the Python issue. It
will likely be fixed in Python 3.3.

In the meanwhile, if I understand the issue correctly, it should be
enough to just fix this one recursion error. So I think we should
just fix issue 2899, and even if there are other recursion problems
they shouldn't show up (unless we are very unlucky). We need to fix
that issue anyway.

Aaron Meurer

Ondřej Čertík

Apr 10, 2012, 12:48:39 AM
to sy...@googlegroups.com
On Mon, Apr 9, 2012 at 8:31 PM, Aaron Meurer <asme...@gmail.com> wrote:
> Well, they've already got a patch for that on the Python issue.  It
> will likely be fixed in Python 3.3.
>
> In the meanwhile, if I understand the issue correctly, it should be
> enough to just fix this one recursion error.  So I think we should
> just fix issue 2899, and even if there are other recursion problems
> they shouldn't show up (unless we are very unlucky).  We need to fix
> that issue anyway.

I would just comment out that particular test, since it is XFAILing anyway.

Ondrej

Joachim Durchholz

Apr 10, 2012, 1:03:01 AM
to sy...@googlegroups.com
Am 10.04.2012 05:31, schrieb Aaron Meurer:
> So I think we should
> just fix issue 2899, and even if there are other recursion problems
> they shouldn't show up (unless we are very unlucky). We need to fix
> that issue anyway.

As far as I understand the situation, we have an infinite recursion on
the SymPy side, which also crashes the Python side of things.

So we need to fix the recursion anyway.
Python crashing means we get a stack dump and all tests further down the
line get ignored. So the consequence is that some recursion bugs (which
need to be fixed anyway) get a higher priority than they would have
gotten otherwise, right?

Aaron Meurer

Apr 10, 2012, 1:49:37 AM
to sy...@googlegroups.com
On Mon, Apr 9, 2012 at 11:03 PM, Joachim Durchholz <j...@durchholz.org> wrote:
> Am 10.04.2012 05:31, schrieb Aaron Meurer:
>
>> So I think we should
>> just fix issue 2899, and even if there are other recursion problems
>> they shouldn't show up (unless we are very unlucky).  We need to fix
>> that issue anyway.
>
>
> As far as I understand the situation, we have an infinite recursion on the
> SymPy side, which also crashes the Python side of things.
>
> So we need to fix the recursion anyway.

Yes, this is right. The test is an XFAIL test, so it's just testing
the broken behavior. But we should fix it.

> Python crashing means we get a stack dump and all tests further down the
> line get ignored. So the consequence is that some recursion bugs (which need
> to be fixed anyway) get a higher priority than they would have gotten
> otherwise, right?

From my understanding, some C function in CPython was suppressing the
error. That is usually a more subtle bug, but in this case the
suppressed error could no longer stop the Python stack from
overflowing. I think the chances of hitting this particular bug are
pretty slim. You have to call whatever function was suppressing the
error exactly when the Python stack fills up. That's why my patch made
the error show up: it added another function call to the stack, so the
stack filled up in a different place. Also, if it were easy to hit,
someone else would have found it by now.
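
To illustrate what "suppressing the error" means, here is a pure-Python analogy only. The real suppression happened in C code inside CPython, which this thread does not identify, and `call_quietly` is a hypothetical helper invented for the sketch. It deliberately does not try to reproduce the fatal abort.

```python
def runaway(n=0):
    # Hypothetical stand-in for a non-terminating computation.
    return runaway(n + 1)


def call_quietly(func, default=None):
    # Hypothetical helper that swallows every exception, including the
    # recursion-limit error that is supposed to unwind the stack.
    try:
        return func()
    except Exception:
        return default


result = call_quietly(runaway, default="fallback")
print(result)  # prints "fallback": the overflow was silently discarded
```

Here the swallowed error is harmless because the stack has already unwound by the time the helper returns. The dangerous case described above is when the swallowing happens deep in the stack and the caller keeps going as if nothing had gone wrong.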

Aaron Meurer