Significant slowdown of basic arithmetic

152 views
Skip to first unread message

Marc Mezzarobba

unread,
Nov 23, 2019, 1:41:31 PM11/23/19
to sage-...@googlegroups.com
Something like object creation, memory allocation, basic arithmetic, or
cython calls seems to have become a lot slower recently.

Overall, my Python code using sage runs about 10% slower with 9.0.beta6
than with 8.9. The slowdown seems to be spread evenly across many
different functions.

But here is a simple example that shows a significant difference:

# 8.9
sage: def f():
....: a = 0
....: for i in xrange(10^7):
....: a += 1
sage: %time f()
CPU times: user 1.27 s, sys: 3 ms, total: 1.28 s
Wall time: 1.27 s

# 9.0.beta6
sage: def f():
....: a = 0
....: for i in range(10^7):
....: a += 1
sage: %time f()
CPU times: user 1.71 s, sys: 0 ns, total: 1.71 s
Wall time: 1.7 s

Note that a similar function running an empty loop is quite a bit
*faster* with 9.0.beta6 (as one may have expected with the Python
upgrade): ~260 ms vs ~300 ms.

Also,

sage: %time a = 3**(2**30+1)

and other computations using GMP with no back-and-forth with Python run
at the same speed with both versions, so GMP/MPIR versions or build
options are probably not to blame.

Any clue what is going on?

--
Marc

Vincent Delecroix

unread,
Nov 23, 2019, 1:49:25 PM11/23/19
to sage-...@googlegroups.com
This is very annoying... and should have been detected automatically.
Julian started [1] and it would be good to finish setting it up!

@Marc: could you perform some C profiling (it might work directly
inside Sage via %crun [2]).

[1] https://trac.sagemath.org/ticket/25262

[2]
http://doc.sagemath.org/html/en/thematic_tutorials/profiling.html#c-level-function-calls-crun

Marc Mezzarobba

unread,
Nov 23, 2019, 2:13:26 PM11/23/19
to sage-...@googlegroups.com
Vincent Delecroix wrote:
> @Marc: could you perform some C profiling (it might work directly
> inside Sage via %crun [2]).

Yes, I'll try to investigate a bit more, but I first wanted to ask if it
was a known issue.

--
Marc

Vincent Delecroix

unread,
Nov 23, 2019, 2:15:55 PM11/23/19
to sage-...@googlegroups.com
And could you also time compare

a = Integer(1r)

on both versions?

John H Palmieri

unread,
Nov 23, 2019, 2:34:51 PM11/23/19
to sage-devel
It's not actually Sage 8.9 vs. 9.0, it is? Rather it's Python 2 vs. Python 3. On my computer, I see the same timings with Python 3 builds of 8.9 and 9.0.beta6, but things look faster with Python 2 builds of 8.9 and 9.0.beta6.

By the way, Integer(1r) is also faster with Python 2 than with Python 3.

  John

Vincent Delecroix

unread,
Nov 23, 2019, 3:38:35 PM11/23/19
to sage-...@googlegroups.com
Le 23/11/2019 à 11:34, John H Palmieri a écrit :
> By the way, Integer(1r) is also faster with Python 2 than with Python 3.

By how much? Does it explain the 25% slowdown of the original post?

Eric Gourgoulhon

unread,
Nov 23, 2019, 3:50:53 PM11/23/19
to sage-devel
Le samedi 23 novembre 2019 20:34:51 UTC+1, John H Palmieri a écrit :
It's not actually Sage 8.9 vs. 9.0, it is? Rather it's Python 2 vs. Python 3.

 I confirm this: since at least Sage 8.5, Python 3 is roughly 10% slower than Python 2 in running the manifold doctests via
sage -tp --long src/sage/manifolds

Here are some benchmarks, with respectively Total time, CPU time, Cumulative wall time in seconds:

- Sage 8.5.beta2:
  - py2: 153        1241          1178
  - py3: 164        1315          1258

- Sage 8.8.beta2
   -py2: 154        1269          1224
   -py3: 162        1287          1242

Eric. 

John H Palmieri

unread,
Nov 23, 2019, 4:06:47 PM11/23/19
to sage-devel
Python 2:

sage: %timeit Integer(1r)
The slowest run took 41.65 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 45.8 ns per loop
sage: %time Integer(2r)
CPU times: user 3 µs, sys: 2 µs, total: 5 µs
Wall time: 5.01 µs
2

Python 3:

sage: %timeit Integer(1r)
The slowest run took 25.70 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 5: 71.2 ns per loop
sage: %time Integer(2r)
CPU times: user 3 µs, sys: 1 µs, total: 4 µs
Wall time: 6.91 µs
2

Vincent Delecroix

unread,
Nov 23, 2019, 4:20:54 PM11/23/19
to sage-...@googlegroups.com
Then I guess the reason of the slowdown comes from the change in
the integer types in Python 3 and the way we handle the conversion
from Python integers to Sage integers.

Namely we have low level mpz_set_pylong (for "long" integers) and
mpz_set_si (for machine integers, which is a GMP function). But in
Python3 there is no more distinction and every integer is a "long"
integer. Hence on Sagemath Python3, the conversion always go
through mpz_set_pylong which seems 25% slower than mpz_set_si.

Marc Mezzarobba

unread,
Nov 24, 2019, 2:34:41 AM11/24/19
to sage-...@googlegroups.com
Vincent Delecroix wrote:
> Then I guess the reason of the slowdown comes from the change in
> the integer types in Python 3 and the way we handle the conversion
> from Python integers to Sage integers

This may be part of it, but I don't think it explains everything:

sage: a = 1
sage: %timeit a+a
The slowest run took 103.92 times longer than the fastest. This could
mean that an intermediate result is being cached.
10000000 loops, best of 3: 57.4 ns per loop

vs

sage: %timeit a+a
The slowest run took 93.42 times longer than the fastest. This could
mean that an intermediate result is being cached.
10000000 loops, best of 5: 65.1 ns per loop

--
Marc

Marc Mezzarobba

unread,
Nov 24, 2019, 2:57:32 AM11/24/19
to sage-...@googlegroups.com
Some profiling data (via linux-perf) for

sage: def f():
....: for i in [x]range(100000000):
....: a+a

Py2:

Samples: 26K of event 'cycles', Event count (approx.): 17630706503
Children Self Command Shared Object Symbol
+ 77,85% 47,79% python2 libpython2.7.so.1.0 [.] PyEval_EvalFrameEx
+ 75,42% 0,00% python2 libpython2.7.so.1.0 [.] PyEval_EvalCodeEx
+ 75,42% 0,00% python2 libpython2.7.so.1.0 [.] call_function (inlined)
+ 75,42% 0,00% python2 libpython2.7.so.1.0 [.] fast_function (inlined)
+ 75,39% 0,00% python2 libpython2.7.so.1.0 [.] PyObject_Call
+ 75,38% 0,00% python2 libpython2.7.so.1.0 [.] PyEval_EvalCode
+ 75,38% 0,00% python2 libpython2.7.so.1.0 [.] Py_Main
+ 75,38% 0,00% python2 libpython2.7.so.1.0 [.] PyRun_SimpleFileExFlags
+ 75,38% 0,00% python2 libpython2.7.so.1.0 [.] PyRun_FileExFlags
+ 75,38% 0,00% python2 libpython2.7.so.1.0 [.] run_mod (inlined)
+ 75,38% 0,00% python2 libpython2.7.so.1.0 [.] function_call
+ 75,37% 0,00% python2 python2.7 [.] _start
+ 75,37% 0,00% python2 libc-2.29.so [.] 0x00007fde64129bba
+ 75,34% 0,00% python2 libpython2.7.so.1.0 [.] exec_statement (inlined)
+ 75,34% 0,00% python2 libpython2.7.so.1.0 [.] ext_do_call (inlined)
- 75,33% 0,00% python2 libpython2.7.so.1.0 [.] builtin_eval
builtin_eval
PyEval_EvalCode
PyEval_EvalCodeEx
PyEval_EvalFrameEx
call_function (inlined)
fast_function (inlined)
- PyEval_EvalFrameEx
+ 15,44% PyNumber_Add
6,93% lookdict <---
+ 2,28% __pyx_f_4sage_5rings_7integer_fast_tp_dealloc
- 1,82% PyInt_FromLong <---
PyInt_FromLong
0,91% rangeiter_next
0,82% PyInt_FromLong@plt
0,66% 0x7fde644a9060
0,61% PyThread_acquire_lock
+ 17,25% 6,03% python2 libpython2.7.so.1.0 [.] binary_op1
+ 16,84% 1,76% python2 libpython2.7.so.1.0 [.] PyNumber_Add
+ 15,05% 5,02% python2 integer.so [.] __pyx_pw_4sage_5rings_7integer_7Integer_63__add__
+ 13,43% 13,43% python2 libpython2.7.so.1.0 [.] lookdict
+ 12,52% 0,00% python2 integer.so [.] __pyx_pf_4sage_5rings_7integer_7Integer_62__add__ (inlined)
+ 9,98% 9,56% python2 libgmp.so.23.0.3 [.] __gmpz_add
+ 6,01% 0,00% python2 anon [.] 0x00007fde62ec9d3f
+ 3,70% 3,70% python2 integer.so [.] __pyx_f_4sage_5rings_7integer_fast_tp_dealloc
+ 3,29% 0,00% python2 integer.so [.] __pyx_f_4sage_3ext_7stdsage_PY_NEW (inlined)
+ 3,10% 3,10% python2 integer.so [.] __pyx_f_4sage_5rings_7integer_fast_tp_new
+ 3,09% 0,00% python2 [unknown] [k] 0xffffffffffffffff
+ 2,96% 0,00% python2 libgmp.so.23.0.3 [.] __gmpn_add (inlined)

Py3:

Samples: 39K of event 'cycles', Event count (approx.): 26191091752
Children Self Command Shared Object Symbol
+ 79,05% 24,87% python3 libpython3.7m.so.1.0 [.] _PyEval_EvalFrameDefault
+ 78,13% 0,00% python3 [unknown] [.] 0xffffffffffffffff
+ 77,94% 0,00% python3 libpython3.7m.so.1.0 [.] _PyEval_EvalCodeWithName
+ 77,94% 0,00% python3 libpython3.7m.so.1.0 [.] call_function (inlined)
+ 77,94% 0,00% python3 libpython3.7m.so.1.0 [.] function_code_fastcall
+ 77,94% 0,00% python3 libpython3.7m.so.1.0 [.] _PyFunction_FastCallKeywords
+ 77,89% 0,00% python3 libpython3.7m.so.1.0 [.] PyEval_EvalCodeEx
+ 77,89% 0,00% python3 libpython3.7m.so.1.0 [.] PyEval_EvalCode
+ 77,89% 0,00% python3 libpython3.7m.so.1.0 [.] _PyMethodDef_RawFastCallKeywords
+ 77,88% 0,00% python3 libpython3.7m.so.1.0 [.] _PyFunction_FastCallDict
+ 77,86% 0,00% python3 libpython3.7m.so.1.0 [.] _PyObject_Call_Prepend
+ 77,85% 0,00% python3 libpython3.7m.so.1.0 [.] _PyCFunction_FastCallKeywords
+ 77,85% 0,00% python3 libpython3.7m.so.1.0 [.] PyObject_Call
+ 77,84% 0,00% python3 libpython3.7m.so.1.0 [.] do_call_core (inlined)
+ 77,84% 0,00% python3 libpython3.7m.so.1.0 [.] builtin_exec
+ 77,84% 0,00% python3 libpython3.7m.so.1.0 [.] builtin_exec_impl (inlined)
- 77,83% 0,00% python3 libpython3.7m.so.1.0 [.] builtin_eval
builtin_eval
builtin_eval_impl (inlined)
PyEval_EvalCode
PyEval_EvalCodeEx
_PyEval_EvalCodeWithName
_PyEval_EvalFrameDefault
call_function (inlined)
function_code_fastcall
- _PyEval_EvalFrameDefault
- 22,27% _PyDict_LoadGlobal <---
- 17,48% lookdict_unicode_nodummy
- lookdict_unicode_nodummy
4,12% dk_get_index (inlined)
+ 10,77% PyNumber_Add
- 9,66% PyLong_FromLong <---
- 7,44% _PyLong_New
- 5,22% _PyObject_Malloc
+ pymalloc_alloc (inlined)
- 5,85% _PyObject_Free
- _PyObject_Free
- pymalloc_free (inlined)
1,16% address_in_range (inlined)
+ 1,86% __pyx_f_4sage_5rings_7integer_fast_tp_dealloc
1,41% _PyDict_LoadGlobal@plt
0,78% rangeiter_next
0,55% PyLong_FromLong@plt
+ 77,83% 0,00% python3 libpython3.7m.so.1.0 [.] builtin_eval_impl (inlined)
+ 25,69% 8,20% python3 libpython3.7m.so.1.0 [.] _PyDict_LoadGlobal
+ 20,53% 20,51% python3 libpython3.7m.so.1.0 [.] lookdict_unicode_nodummy
+ 12,20% 3,16% python3 libpython3.7m.so.1.0 [.] binary_op1
+ 11,68% 1,05% python3 libpython3.7m.so.1.0 [.] PyNumber_Add
+ 10,97% 3,73% python3 integer.cpython-37m-x86_64-linux-gnu.so [.] __pyx_pw_4sage_5rings_7integer_7Integer_63__add__
+ 10,64% 2,79% python3 libpython3.7m.so.1.0 [.] PyLong_FromLong
+ 9,20% 0,00% python3 integer.cpython-37m-x86_64-linux-gnu.so [.] __pyx_pf_4sage_5rings_7integer_7Integer_62__add__ (inlined)
+ 7,92% 2,02% python3 libpython3.7m.so.1.0 [.] _PyLong_New
+ 7,49% 7,15% python3 libgmp.so.23.0.3 [.] __gmpz_add
+ 7,29% 1,95% python3 libpython3.7m.so.1.0 [.] _PyObject_Free
+ 6,42% 0,00% python3 libpython3.7m.so.1.0 [.] pymalloc_free (inlined)
+ 6,42% 6,42% python3 libpython3.7m.so.1.0 [.] pymalloc_free.isra.0
+ 5,40% 2,43% python3 libpython3.7m.so.1.0 [.] _PyObject_Malloc
+ 4,54% 0,00% python3 libpython3.7m.so.1.0 [.] pymalloc_alloc (inlined)
+ 4,54% 4,54% python3 libpython3.7m.so.1.0 [.] pymalloc_alloc.isra.6.part.7
+ 4,12% 0,00% python3 libpython3.7m.so.1.0 [.] dk_get_index (inlined)
+ 3,89% 0,00% python3 anon [.] 0x00007fb9cc7a983f
+ 3,58% 0,00% python3 [heap] [.] 0x0000558b4e6c58af
+ 3,33% 0,00% python3 libpython3.7m.so.1.0 [.] pymalloc_alloc (inlined)
+ 2,38% 2,38% python3 integer.cpython-37m-x86_64-linux-gnu.so [.] __pyx_f_4sage_5rings_7integer_fast_tp_dealloc
+ 2,31% 0,00% python3 anon [.] 0x00007fb9cbe17047
+ 2,26% 0,00% python3 integer.cpython-37m-x86_64-linux-gnu.so [.] __pyx_f_4sage_3ext_7stdsage_PY_NEW (inlined)
+ 2,22% 0,00% python3 libgmp.so.23.0.3 [.] __gmpn_add (inlined)

--
Marc

Marc Mezzarobba

unread,
Nov 24, 2019, 3:13:29 AM11/24/19
to sage-...@googlegroups.com
Marc Mezzarobba wrote:

> Some profiling data (via linux-perf) for
>
> sage: def f():
> ....: for i in [x]range(100000000):
> ....: a+a

where a = Integer(1).

If a is made local to the function, the very expensive lookups
disappear, and we still see calls to PyLong_FromLong taking much longer
than PyInt_FromLong, as Vincent suspected. However, I don't understand
why PyLong_FromLong needs to be called at all here.

- 79,71% 0,00% python2 libpython2.7.so.1.0 [.] builtin_eval
builtin_eval
PyEval_EvalCode
PyEval_EvalCodeEx
PyEval_EvalFrameEx
call_function (inlined)
fast_function (inlined)
- PyEval_EvalFrameEx
+ 19,47% PyNumber_Add
+ 4,23% __pyx_f_4sage_5rings_7integer_fast_tp_dealloc
- 2,19% PyInt_FromLong
PyInt_FromLong
1,65% rangeiter_next
1,12% PyInt_FromLong@plt
0,67% 0x7fde644a906

- 77,70% 0,00% python3 libpython3.7m.so.1.0
[.] builtin_eval
builtin_eval
builtin_eval_impl (inlined)
PyEval_EvalCode
PyEval_EvalCodeEx
_PyEval_EvalCodeWithName
_PyEval_EvalFrameDefault
call_function (inlined)
function_code_fastcall
- _PyEval_EvalFrameDefault
+ 15,77% PyNumber_Add
- 13,30% PyLong_FromLong
+ 10,01% _PyLong_New
0,63% _PyLong_New@plt
+ 8,37% _PyObject_Free
+ 2,70% __pyx_f_4sage_5rings_7integer_fast_tp_dealloc
1,20% rangeiter_next
0,83% PyLong_FromLong@plt
0,83% PyNumber_Add@plt

--
Marc

Marc Mezzarobba

unread,
Nov 24, 2019, 1:04:33 PM11/24/19
to sage-...@googlegroups.com
Marc Mezzarobba wrote:
>> sage: def f():
>> ....: for i in [x]range(100000000):
>> ....: a+a
>
> where a = Integer(1).
>
> If a is made local to the function, the very expensive lookups
> disappear, and we still see calls to PyLong_FromLong taking much
> longer than PyInt_FromLong, as Vincent suspected.

...But these seem to come from the loop indices. And indeed:

sage: def f():
....: a = Integer(1)
....: for i in range(100000000):
....: a+a

sage: %time f() # py3
CPU times: user 5.58 s, sys: 0 ns, total: 5.58 s
Wall time: 5.58 s

sage: import itertools
sage: def f():
....: a = Integer(1)
....: for i in itertools.repeat(1, 100000000):
....: a+a

sage: %time f() # py3
CPU times: user 3.7 s, sys: 3.92 ms, total: 3.71 s
Wall time: 3.71 s

sage: %time f() # py2
CPU times: user 4.83 s, sys: 3.92 ms, total: 4.83 s
Wall time: 4.83 s

But then I don't understand why the empty loop ran faster under py3...
But I also no longer manage to reproduce that observation. So maybe what
one observes with that particular example is just that range() has got
slower with the int/long merge? I'd be surprised, however, if this
explained a 10% slowdown in so many things.

--
Marc

TB

unread,
Nov 24, 2019, 1:28:50 PM11/24/19
to sage-...@googlegroups.com
With xsrange(100000000) I see that py3 is a bit faster, because then the
indices are indeed Sage's Integer. The reverse is true with
xsrange(100000000r).

Regards,
TB
Reply all
Reply to author
Forward
0 new messages