cython memory leak when one forgets to cdef the for loop variable


Sébastien Labbé

Oct 19, 2016, 6:33:47 PM
to sage-devel
Dear sage-devel,

Writing cython code, I was having a problem with memory leaks, and I managed to simplify the problem to a simple for loop computing a sum.

If the loop variable a is cdef, everything is fine:

sage: %%cython
....: def test_with_cdef_a(int N):
....:     cdef long S = 0
....:     cdef int a
....:     for a in range(1, N):
....:         sig_check() # Check for keyboard interrupt
....:         S += a
....:     return S
....:
sage: %time test_with_cdef_a(10**8)      # fast, takes no memory, great
CPU times: user 103 ms, sys: 2.64 ms, total: 105 ms
Wall time: 106 ms
4999999950000000
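As a sanity check on the printed value: the loop adds a = 1, ..., N-1, so the result should be N(N-1)/2. The following is a minimal pure-Python sketch. The function names are hypothetical. It assumes a C `long` is a signed 64-bit integer, which holds on typical 64-bit Linux and macOS but not on 64-bit Windows. It confirms the value and checks that S still fits in a `long` for N = 10**9:

```python
# Sanity check of the value printed by test_with_cdef_a(10**8).
# The loop adds a = 1, ..., N-1, so the result is N*(N-1)/2.

def loop_sum(N):
    """Same computation as the Cython loop, in plain Python (slow for big N)."""
    S = 0
    for a in range(1, N):
        S += a
    return S

def closed_form(N):
    """Arithmetic-series formula for 1 + 2 + ... + (N-1)."""
    return N * (N - 1) // 2

# Assumption: a C `long` is a signed 64-bit integer (typical on 64-bit Linux/macOS).
LONG_MAX_64 = 2**63 - 1

assert loop_sum(1000) == closed_form(1000)
print(closed_form(10**8))                  # 4999999950000000
print(closed_form(10**9) <= LONG_MAX_64)   # True: S does not overflow for N = 10**9
```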

As expected, if I forget the "cdef int a" line, it takes longer. But, most surprisingly, it uses a *lot* of memory during the computation (40%), and not all of that memory is freed after the computation (30% stays in use).

sage: %%cython
....: def test_no_cdef_a(int N):
....:     cdef long S = 0
....:     for a in range(1, N):
....:         sig_check() # Check for keyboard interrupt
....:         S += a
....:     return S
....:
sage: %time test_no_cdef_a(10**8)        # takes a lot of memory (40%), plus a leak (30% still in use after the computation)
CPU times: user 8.36 s, sys: 787 ms, total: 9.14 s
Wall time: 9.24 s
4999999950000000
sage: %time test_no_cdef_a(10**9)        # takes all of the memory and starts swapping

I am using:

$ sage -cython -V
Cython version 0.24.1
$ sage -version  
SageMath version 7.4.beta6, Release Date: 2016-09-24

Are you able to reproduce this?

Sébastien


Volker Braun

Oct 19, 2016, 7:11:14 PM
to sage-devel
That's the expected behavior. Without a type annotation, Cython does the same as Python (it creates a list of 10**8 elements and iterates over it). With a type annotation, it is a C-level for loop.
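Volker's point can be sketched in plain Python 3. The sketch assumes that `list(range(...))` is a fair stand-in for Python 2's `range` (which returns a list). It uses Python 3's `range`, which like Python 2's `xrange` produces one value at a time, for the lazy loop. The helper names are hypothetical. `tracemalloc` only sees Python-level allocations, and exact byte counts vary by interpreter.

```python
import tracemalloc

def peak_bytes(fn, N):
    """Return the peak memory traced by tracemalloc while fn(N) runs."""
    tracemalloc.start()
    try:
        fn(N)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

def sum_over_list(N):
    # Emulates Python 2 `range`: the whole list is built before the loop starts.
    S = 0
    for a in list(range(1, N)):
        S += a
    return S

def sum_over_lazy_range(N):
    # Python 3 `range` (like Python 2 `xrange`) yields one value at a time.
    S = 0
    for a in range(1, N):
        S += a
    return S

N = 10**6
print("list:", peak_bytes(sum_over_list, N))
print("lazy:", peak_bytes(sum_over_lazy_range, N))
```

The peak for the list version grows with N (list slots plus the int objects it holds), while the lazy version stays small regardless of N.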

Volker Braun

Oct 19, 2016, 7:12:17 PM
to sage-devel
PS: Write your code in a file and compile it with "cython -a myfile.pyx"; that generates an HTML file with explanations.

Sébastien Labbé

Oct 19, 2016, 7:19:51 PM
to sage-devel
Does this also explain the leak?

Vincent Delecroix

Oct 20, 2016, 2:42:05 AM
to sage-devel
It might not be a leak. *After* the loop the memory should be back to
normal. The very same as with

sage: a = range(10**8) # takes a lot of memory
sage: del a # free the memory

Vincent

PS: This would have been different with Python 3.
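Vincent's `range`/`del` comparison can be checked from inside Python with `tracemalloc`. This is a sketch assuming Python 3, where `list(range(...))` plays the role of Python 2's `range(...)`. It measures memory as seen by Python's allocator, not what `top` reports for the whole process.

```python
import tracemalloc

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

a = list(range(10**6))   # stands in for Python 2's `a = range(10**6)`
with_list, _ = tracemalloc.get_traced_memory()

del a                    # last reference dropped: the list and its ints are freed
after_del, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(with_list - baseline)   # large: list slots plus int objects
print(after_del - baseline)   # back near zero
```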

Volker Braun

Oct 20, 2016, 2:48:51 AM
to sage-devel
On Thursday, October 20, 2016 at 1:19:51 AM UTC+2, Sébastien Labbé wrote:
> Does this also explain the leak?

Freed memory is not immediately returned to the system (mostly because it would be hilariously slow for small allocations). Whether a one-off computation increases process memory usage depends on many things, like the details of the malloc() implementation, the heap layout, and, for Python objects, whether the garbage collector has been run. To identify a memory leak by looking at "top", you need to run your code in a loop and check whether memory usage grows roughly linearly in the number of calls.
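The loop-and-compare approach can be sketched with `tracemalloc`. Because it traces Python-level allocations, it sidesteps the malloc/heap effects described above. `memory_growth`, `leaky` and `not_leaky` are hypothetical names for illustration. Memory obtained with C-level malloc in Cython code is invisible to `tracemalloc`. In that case, the same idea applies using the process memory reported by the OS.

```python
import gc
import tracemalloc

def memory_growth(fn, calls=50):
    """Call fn() repeatedly; return traced bytes after each call (after gc)."""
    tracemalloc.start()
    samples = []
    try:
        for _ in range(calls):
            fn()
            gc.collect()   # rule out garbage that simply has not been collected yet
            samples.append(tracemalloc.get_traced_memory()[0])
    finally:
        tracemalloc.stop()
    return samples

_leaked = []   # hypothetical leak: a global that keeps growing

def leaky():
    _leaked.append(list(range(1000)))

def not_leaky():
    L = list(range(1000))
    del L

for fn in (leaky, not_leaky):
    s = memory_growth(fn)
    print(fn.__name__, s[-1] - s[0])
```

A real leak shows traced memory growing roughly in proportion to the number of calls. A one-off allocation that is later freed does not.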

Sébastien Labbé

Oct 20, 2016, 5:16:36 AM
to sage-devel
On Thursday, October 20, 2016 at 8:42:05 AM UTC+2, vdelecroix wrote:
> It might not be a leak. *After* the loop the memory should be back to
> normal. The very same as with
>
> sage: a = range(10**8)  # takes a lot of memory
> sage: del a                    # free the memory

Ok, so now, I understand why it takes the memory: a list was created.

My previous example can now be simplified to:

sage: %%cython
....: def f(int N):
....:     L = range(N)
....:     del L
....:
sage: f(10**8)     # computation takes 40% of memory, 30% is not freed after computation

Thanks to Volker, I can confirm that everything is ok after the garbage collector has done its job:

sage: import gc
sage: gc.collect()   # the 30% of occupied memory gets freed
7

Thank you!

Sébastien

Johan S. H. Rosenkilde

Oct 20, 2016, 5:47:38 AM
to sage-...@googlegroups.com
>> sage: a = range(10**8) # takes a lot of memory
>> sage: del a # free the memory
>
> Ok, so now, I understand why it takes the memory: a list was created.

Using xrange instead of range will also avoid creating the list even
without cdef'ing a (the code is still slow of course).

But then the code will not immediately work in Python 3, I guess.

Best,
Johan

Frédéric Chapoton

Oct 20, 2016, 7:53:45 AM
to sage-devel
xrange will still be allowed in Cython files, even after (if) we switch to Python 3.

Frederic

Vincent Delecroix

Oct 20, 2016, 8:00:42 AM
to sage-devel
It might be allowed but I do not see the point of using it. The most
reasonable way is

cdef int a # or possibly, unsigned int, size_t, etc
for a in range(100):
...

And this should be thought of as the C for loop

int a;
for(a = 0; a < 100; a++) ...

And as Volker said, it is always a good idea to look at the generated
C code in HTML form.