sporadic segfault

Angus Griffith

Jan 6, 2014, 8:58:31 AM
to pytho...@googlegroups.com
I'm not exactly sure what's going on here, but I'm getting sporadic crashes using cffi with pypy.

Steps to reproduce
git clone git@github.com:sn6uv/gmpy_cffi.git
cd gmpy_cffi

Run
py.test-pypy
repeatedly.

Details
About 90% of the time the tests run and pass, but about 10% of the time I get some form of error. So far I've seen several different errors:
  1. Most commonly, zsh: segmentation fault (core dumped)  /opt/pypy/bin/py.test
  2. double free, e.g. *** Error in `/usr/bin/pypy': double free or corruption (out): 0x00002b101d3c55d0 ***
  3. corrupted double-linked list, e.g. *** Error in `/usr/bin/pypy': corrupted double-linked list: 0x00000000041fcdf0 ***
  4. corrupted double-linked list (malloc variant), e.g. *** Error in `/usr/bin/pypy': malloc(): smallbin double linked list corrupted: 0x00000000057bcfa0 ***
  5. free()/realloc() invalid pointer, e.g. *** Error in `/usr/bin/pypy': realloc(): invalid pointer: 0x0000000005c388b0 ***
  6. free() invalid size, e.g. *** Error in `/usr/bin/pypy': free(): invalid size: 0x00000000077e6820 ***
  7. Kernel panic requiring hard restart of my system. This one is exceedingly rare (<< 1%).
The errors occur at different points throughout the tests, but it's always in the same test file. Upon removing that file from the test suite the errors just occur in the next file.

Occasionally I get a backtrace [1] accompanying the error but most of the time not.

I've confirmed this on my desktop and my laptop (both run archlinux with PyPy 2.2.1-final, cffi version 0.8). Also on the travis CI tests [2] (which seem to error more regularly than my local systems).

Angus Griffith

Jan 9, 2014, 7:22:26 AM
to pytho...@googlegroups.com
On the PyPy bug tracker [1] it was pointed out that this issue affects CPython as well as PyPy, and that it is a keepalive issue.

To quote the PyPy bug tracker discussion:
CPython crashes consistently
on a line like this:
 
    print gmpy_cffi.mpc(1.5, 2.3).real
 
I guess it's because the .real returns a reference inside the data
managed by the mpc object, but it is deallocated while the .real
object is still alive, which ends up in a crash. With CPython, you
can find it reliably by compiling _cffi_backend with -DCFFI_MEM_DEBUG
before running the tests. (I guess that it is a trick that should be
mentioned on the doc page...)

I don't quite understand this. I'd expect the following to crash:

>>>> import gmpy_cffi
>>>> x = gmpy_cffi.mpc(1,2)
>>>> y = x.real
>>>> del(x)
>>>> y
mpfr('1.0')

but it doesn't. Calling del doesn't touch (deallocate) the C objects. Is there a way I can deallocate the C objects directly?

Armin Rigo

Jan 10, 2014, 4:28:38 AM
to pytho...@googlegroups.com
Hi Angus,

On Thu, Jan 9, 2014 at 1:22 PM, Angus Griffith <16s...@gmail.com> wrote:
>> >>>> import gmpy_cffi
>> >>>> x = gmpy_cffi.mpc(1,2)
>> >>>> y = x.real
>> >>>> del(x)
>> >>>> y
>> mpfr('1.0')

For me this happened to segfault when I tried, but it's not reliable
with PyPy. The GC runs at random times. It might be that x is not
deallocated for a while. Try simply to ask for "y" repeatedly; I'm sure
it will eventually crash. Or use "import gc; gc.collect()" to be sure
that x is gone now.

To fix the problem, a simple workaround would be: in the "real"
property getter, where you build a new mpfr instance and copy cffi's
internal pointer, also add to the mpfr object an attribute "_keepalive"
that points back to the original object. This should make sure that the
original object is kept alive as long as needed (and thus the C data).
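
A minimal sketch of that workaround, simulated in pure Python so that it runs without gmpy_cffi or cffi. All names here (FakeAllocation, Owner, View, read_after_owner_dropped) are hypothetical stand-ins and not the gmpy_cffi API; the "free" is only a flag set in __del__, where the real code would release C memory.

```python
import gc


class FakeAllocation(object):
    """Simulated C allocation; `freed` flips when the owner is finalized."""

    def __init__(self, values):
        self.values = list(values)
        self.freed = False


class Owner(object):
    """Hypothetical stand-in for the wrapper that owns the C data."""

    def __init__(self, re, im):
        self._mem = FakeAllocation([re, im])

    def __del__(self):
        self._mem.freed = True  # simulates the C-level free()

    def real(self, keepalive=True):
        view = View(self._mem, 0)
        if keepalive:
            view._keepalive = self  # back-reference pins the owner
        return view


class View(object):
    """Hypothetical stand-in for the object returned by .real."""

    def __init__(self, mem, index):
        self._mem = mem
        self._index = index

    def read(self):
        if self._mem.freed:
            raise RuntimeError("simulated use after free")
        return self._mem.values[self._index]


def read_after_owner_dropped(keepalive):
    """Drop the only outside reference to the owner, then read the view."""
    x = Owner(1.0, 2.0)
    y = x.real(keepalive=keepalive)
    del x
    gc.collect()  # makes sure the owner is finalized if nothing pins it
    return y.read()
```

With keepalive=True the read succeeds because the view still references the owner; with keepalive=False the owner is finalized first and the read raises the simulated error. In the real code the second case would be a read of freed C memory rather than a clean exception.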


A bientôt,

Armin.

Angus Griffith

Jan 10, 2014, 9:24:20 AM
to pytho...@googlegroups.com, ar...@tunes.org
Firstly, thanks so much for your help.

I created a file:

import gmpy_cffi
import gc

x = gmpy_cffi.mpc(1,2)
y = x.real
del(x)
gc.collect()
print(y)

Without your suggested fix it crashes, but after adding a _keepalive reference it no longer does!

The problem is that the tests are still crashing sometimes. Moreover, when I add the above test with gc.collect() to my test suite (called by py.test) it crashes every time (even with the patch).

Even stranger, I tried to raise a TypeError while fetching the .real property and managed to cause an INTERNALERROR> MemoryError [1] consistently (no gc.collect or even del, just accessing x.real).

Perhaps there is another keepalive issue. Any tips on how to hunt these down?

Thanks,
Angus

Armin Rigo

Jan 10, 2014, 9:51:10 AM
to pytho...@googlegroups.com
Hi,

On Fri, Jan 10, 2014 at 3:24 PM, Angus Griffith <16s...@gmail.com> wrote:
> Perhaps there is another keepalive issue. Any tips on how to hunt these
> down?

Here's how I found the previous bug. Try to run it on CPython.
Download CFFI, and compile the _cffi_backend module with
-DCFFI_MEM_DEBUG or -DCFFI_MEM_LEAK (see steps below.) Then use gdb
when running.

$ cd c
$ gcc -pthread -DUSE__THREAD -fno-strict-aliasing -g -fwrapv -Wall \
      -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c _cffi_backend.c \
      -o _cffi_backend.o -I/usr/include/libffi -DCFFI_MEM_DEBUG
$ gcc -pthread -DUSE__THREAD -shared _cffi_backend.o \
      -o ../_cffi_backend.so -lffi -g

Also, I'd recommend that you understand exactly (if needed obviously
:-) why it crashed, and then review the code with this in mind. In
particular, calls similar to mpfr._from_c_mpfr() are all potentially
dangerous: you have to make very sure that the cdata pointer points to
memory that stays alive as long as the new object.
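
One possible way to support such a review is a small check that can be pointed at each suspicious accessor. This is only a sketch: keeps_owner_alive, Box and Ref are hypothetical names, and it assumes the owner object supports weak references (plain Python classes do, unless they define __slots__ without __weakref__).

```python
import gc
import weakref


def keeps_owner_alive(make_owner, derive):
    """Return True if derive(owner) still pins owner once our reference is gone."""
    owner = make_owner()
    probe = weakref.ref(owner)  # a weak reference does not keep owner alive
    derived = derive(owner)
    del owner
    gc.collect()  # needed on PyPy; harmless on CPython
    alive = probe() is not None
    del derived
    return alive


class Box(object):
    """Hypothetical owner of some C data."""


class Ref(object):
    """Hypothetical derived wrapper; owner=None models a missing keepalive."""

    def __init__(self, owner=None):
        self._keepalive = owner
```

Here keeps_owner_alive(Box, Ref) reports that the owner is pinned, while keeps_owner_alive(Box, lambda b: Ref()) reports that it is not. For gmpy_cffi the call might look like keeps_owner_alive(lambda: gmpy_cffi.mpc(1, 2), lambda z: z.real), which I have not run. A negative result is only a hint: an accessor that copies the data instead of sharing a pointer does not need the owner at all.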


A bientôt,

Armin.