Memory leak (quite bad)


Gonzalo Tornaria

Jul 5, 2023, 3:24:27 PM
to sage-devel
Memory usage grows slowly but inexorably: computing `sqrt(T2)`, with `T2 = sqrt(2)`, leaks 32 bytes each and every time (asymptotically).

Found by a student who, through no fault of his own, brought down our server (we were unable to ssh in until the OOM killer triggered, and since the leak is slow it takes a while to fill 16G of swap).

===
$ cat memleak.py
from sage.all import sqrt
T2 = sqrt(2)
import psutil
ps = psutil.Process()
base = ps.memory_info().rss
for a in range(1, 10):
    for b in range(num := 100_000):
        C = sqrt(T2)
    mem = ps.memory_info().rss - base
    print(f"{mem/1e6 :.2f} MB ({mem/a/num :.2f} bytes/iter)")
$ sage memleak.py
2.70 MB (27.03 bytes/iter)
5.95 MB (29.74 bytes/iter)
9.19 MB (30.64 bytes/iter)
12.44 MB (31.09 bytes/iter)
15.41 MB (30.82 bytes/iter)
18.65 MB (31.09 bytes/iter)
21.90 MB (31.28 bytes/iter)
25.14 MB (31.43 bytes/iter)
28.39 MB (31.54 bytes/iter)
===

Replace the 10 in the outer loop with something larger at your own peril: each outer iteration leaks about 3.2 MB, so 10_000 should kill a laptop in an hour or two.
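As a quick sanity check on those figures, here is a small sketch that assumes the asymptotic rate of 32 bytes per call and the inner loop length used in memleak.py. The names `BYTES_PER_CALL`, `CALLS_PER_OUTER` and `leaked_bytes` are made up for illustration.

```python
# Back-of-the-envelope estimate of the leak, assuming 32 bytes per sqrt(T2) call.
BYTES_PER_CALL = 32        # asymptotic rate reported by memleak.py
CALLS_PER_OUTER = 100_000  # inner loop length (num in memleak.py)


def leaked_bytes(outer_iterations):
    """Total bytes leaked after the given number of outer iterations."""
    return BYTES_PER_CALL * CALLS_PER_OUTER * outer_iterations


print(f"{leaked_bytes(1) / 1e6:.1f} MB per outer iteration")
print(f"{leaked_bytes(10_000) / 1e9:.1f} GB after 10_000 outer iterations")
```

Under these assumptions, 10_000 outer iterations leak about 32 GB, which is more than the 16G of swap mentioned above.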

This is with the system SageMath 10.0, but it also happens with 9.6, 9.7, 9.8 and 10.0 on cocalc.com.

Best,
Gonzalo

Edgar Costa

Jul 5, 2023, 3:29:44 PM
to sage-...@googlegroups.com
Hi Gonzalo,

I highly recommend using https://github.com/rfjakob/earlyoom instead of waiting for OOM to kick in.

Cheers,
Edgar

--
You received this message because you are subscribed to the Google Groups "sage-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sage-devel/26ec41f7-c591-48b6-ab57-463b8b8a1675n%40googlegroups.com.

Nils Bruin

Jul 5, 2023, 4:39:32 PM
to sage-devel
The leak does not seem to be on the Python heap, so Pynac is the next likely candidate (I don't think this code should be hitting maxima_lib).

Nils Bruin

Jul 6, 2023, 4:16:46 PM
to sage-devel
On Wednesday, 5 July 2023 at 08:29:44 UTC-7 Edgar Costa wrote:
> Hi Gonzalo,
>
> I highly recommend using https://github.com/rfjakob/earlyoom instead of waiting for OOM to kick in.

Wouldn't setting ulimit with -m (memory) or -v (virtual memory) for the process that is liable to exceed its memory quota be more sensible? It's not unusual to set such bounds by default for users on multi-user machines. Then the process should just die by itself before your whole server becomes unresponsive. If you have many processes making large memory demands you could still end up thrashing and triggering the OOM killer, but that would require extremely bad luck or malice.
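For illustration, a minimal sketch of the `ulimit -v` idea applied from Python, assuming a Unix system. It uses the standard `resource` module to cap the virtual address space (`RLIMIT_AS`) of a child process. The helper names `cap_address_space` and `run_capped` are hypothetical, and the 1 GiB cap is an arbitrary choice for the demo.

```python
import resource
import subprocess
import sys


def cap_address_space(max_bytes):
    """Cap this process's virtual address space, roughly what `ulimit -v` does."""
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        # An unprivileged process cannot raise an existing hard limit.
        max_bytes = min(max_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))


def run_capped(argv, max_bytes):
    """Run argv in a child process whose address space is capped at max_bytes."""
    return subprocess.run(
        argv,
        preexec_fn=lambda: cap_address_space(max_bytes),  # runs in the child before exec
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Demo: a 1 GiB cap, and a child that tries to allocate 2 GiB at once.
    result = run_capped([sys.executable, "-c", "x = bytearray(2 * 1024**3)"], 1024**3)
    print("exit status:", result.returncode)
    print(result.stderr.strip().splitlines()[-1] if result.stderr else "(no error)")
```

For the leak in this thread one would pass something like `["sage", "memleak.py"]` and a cap suited to the machine. Note that `RLIMIT_AS` takes bytes while the shell's `ulimit -v` takes KiB, and that it limits address space rather than resident memory. As far as I know, the limit behind `ulimit -m` is not enforced by current Linux kernels.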

Michael Orlitzky

Jul 7, 2023, 12:56:27 AM
to sage-...@googlegroups.com
On 2023-07-06 09:16:46, Nils Bruin wrote:
> > On Wednesday, 5 July 2023 at 08:29:44 UTC-7 Edgar Costa wrote:
> >
> > Hi Gonzalo,
> >
> > I highly recommend using https://github.com/rfjakob/earlyoom instead of
> > waiting for OOM to kick in.
>
> Wouldn't setting ulimit with -m (memory) or -v (virtual memory) for the
> process that is liable to exceed its memory quota be a more sensible thing?

If you don't need a portable solution, then on Linux, cgroups are the
most flexible way to leave yourself juuust enough RAM to be able to
move your mouse over the X.
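One way to try the cgroup approach without writing to the cgroup filesystem by hand is `systemd-run`. The sketch below only builds the command line. It assumes a systemd machine with cgroup v2 and a user session that is allowed to use the memory controller. The helper name `systemd_run_argv` and the 4G cap are hypothetical.

```python
import shlex


def systemd_run_argv(command, memory_max="4G"):
    """Build a systemd-run command line that runs `command` in a memory-capped scope."""
    return [
        "systemd-run", "--user", "--scope",
        "-p", f"MemoryMax={memory_max}",  # hard memory cap for the transient scope
        *command,
    ]


argv = systemd_run_argv(["sage", "memleak.py"], memory_max="4G")
print(shlex.join(argv))  # copy-pasteable shell command
```

Passing `argv` to `subprocess.run` would launch the script inside the capped scope. Unlike a per-process rlimit, the cap applies to everything running in that scope together.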

Georgi Guninski

Jul 7, 2023, 8:54:42 AM
to sage-...@googlegroups.com
A simpler test case is to replace `C = sqrt(T2)`
with `C = SR(int(2)).sqrt()`.
Both int() and sqrt() appear to be necessary; sin() doesn't leak for me.

Edgar Costa

Jul 7, 2023, 2:53:22 PM
to sage-...@googlegroups.com
Nils,

This is my recommendation for avoiding my worst-case scenario, where someone must go to some faraway basement and power cycle the server manually after waiting a couple of days for the OOM killer to kick in.
I'm okay with a user using 90% of the RAM; if that becomes an issue, I can always email them or kill their process. But more often than not, until I started using earlyoom, memory usage would slowly creep to 100% and the culprit process would only get killed by the OOM killer or a power cycle.

Cheers,
Edgar


Nils Bruin

Jul 7, 2023, 3:01:39 PM
to sage-devel
On Friday, 7 July 2023 at 07:53:22 UTC-7 Edgar Costa wrote:
> I'm okay with a user using 90% of the RAM; if that becomes an issue, I can always email them or kill their process. But more often than not, until I started using earlyoom, memory usage would slowly creep to 100% and the culprit process would only get killed by the OOM killer or a power cycle.

In that case, would you get the desired result by setting the relevant ulimit to 90%? I'm trying to understand what earlyoom offers over the quota system that POSIX offers out of the box.

Edgar Costa

Jul 7, 2023, 3:13:52 PM
to sage-...@googlegroups.com
Two users try to use 51% of the memory simultaneously.


Volker Braun

Aug 3, 2023, 10:59:51 PM
to sage-devel
A quick valgrind run for 

    from sage.all import sqrt
    T2 = sqrt(2)
    for b in range(num := 100_000):
        C = sqrt(T2)

confirms that it is in pynac:

==3947957== 799,912 bytes in 99,989 blocks are definitely lost in loss record 1,299 of 1,300
==3947957==    at 0x484182F: malloc (vg_replace_malloc.c:431)
==3947957==    by 0x1A6D3B07: sig_malloc (memory.c:1898)
==3947957==    by 0x1A6D3B07: __pyx_f_4sage_3ext_6memory_sage_sig_malloc (memory.c:1517)
==3947957==    by 0x13D0EC2B: ???
==3947957==    by 0x13D0FEFD: ???
==3947957==    by 0x1FBA2393: GiNaC::numeric::integer_rational_power(GiNaC::numeric&, GiNaC::numeric const&, GiNaC::numeric const&) (numeric.cpp:1621)
==3947957==    by 0x1FBA266D: GiNaC::numeric::integer_rational_power(GiNaC::numeric&, GiNaC::numeric const&, GiNaC::numeric const&) (numeric.cpp:1614)
==3947957==    by 0x1FBA94DC: GiNaC::rational_power_parts(GiNaC::numeric const&, GiNaC::numeric const&, GiNaC::numeric&, GiNaC::numeric&, bool&) (numeric.cpp:1692)
==3947957==    by 0x1FBAA2AF: GiNaC::numeric::power(GiNaC::numeric const&) const (numeric.cpp:1916)
==3947957==    by 0x1FBBA918: GiNaC::power::eval(int) const (power.cpp:536)
==3947957==    by 0x1FB09107: GiNaC::ex::construct_from_basic(GiNaC::basic const&) (ex.cpp:923)
==3947957==    by 0x1FBB9D89: ex (ex.h:314)
==3947957==    by 0x1FBB9D89: GiNaC::power::eval(int) const (power.cpp:507)
==3947957==    by 0x1FB09107: GiNaC::ex::construct_from_basic(GiNaC::basic const&) (ex.cpp:923)
==3947957==

Dima Pasechnik

Aug 3, 2023, 11:46:15 PM
to sage-devel


Volker Braun

Aug 8, 2023, 9:58:57 AM
to sage-devel
I've created a PR with the missing mpz_clear and a unit test strategy for memory leaks at https://github.com/sagemath/sage/pull/36046
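
To give a rough idea of what a leak check can look like, here is a minimal sketch. It is not necessarily the strategy used in the PR. It calls a function many times and reports how much the process's peak resident set size grows per call, using only the standard `resource` module. The helper names and the `leaky`/`clean` demo functions are hypothetical.

```python
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of this process, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


def bytes_leaked_per_call(func, calls=100_000, warmup=1_000):
    """Estimate the growth of peak RSS per call to func."""
    for _ in range(warmup):  # let caches and allocator pools settle first
        func()
    before = peak_rss_bytes()
    for _ in range(calls):
        func()
    return (peak_rss_bytes() - before) / calls


if __name__ == "__main__":
    hoard = []

    def leaky():
        hoard.append(bytes(1000))  # keeps every allocation alive on purpose

    def clean():
        bytes(1000)  # allocation is released immediately

    print(f"clean: {bytes_leaked_per_call(clean):.2f} bytes/call")
    print(f"leaky: {bytes_leaked_per_call(leaky):.2f} bytes/call")
```

Inside Sage one would pass something like `lambda: sqrt(T2)` as `func`. Because the measurement is based on peak RSS, it can only detect growth, and a test built on it needs a generous threshold to tolerate allocator noise.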