Pure python mode for numerical code: performance

38 views
Skip to first unread message

BattleSushi

unread,
Jun 20, 2024, 1:20:39 AM (11 days ago) Jun 20
to cython-users
I hope I am in the right place here to discuss the usage of Cython. If not, I apologize and hope you can redirect me.

With the new pure Python mode of Cython 3, I thought about (optionally) compile some of my numerical codes for better speed and simpler code using the cython.ufunc decorator.

See the following code:

import cython
from math import pi as pypi

from cython.cimports.libcpp import cmath
from cython.cimports.scipy.special.cython_special import ellipk as ellipk_r


cnan: cython.doublecomplex = complex("nan+nanj")
pi: cython.double = pypi
inf: cython.double = float("inf")

@cython.ufunc
@cython.cfunc
@cython.nogil
@cython.exceptval(check=False)
@cython.cdivision(True)
def cayley(z: cython.doublecomplex, half_bandwidth: cython.double) -> cython.doublecomplex:
    d_inv = 1 / half_bandwidth
    z_rel = z * d_inv
    return 2 * d_inv * z_rel * (1 - (1 - z_rel**-2)**0.5)


@cython.cfunc
@cython.exceptval(check=False)
@cython.inline
def isfinite(z: cython.doublecomplex) -> cython.bint:
    return cmath.isfinite(z.real) and cmath.isfinite(z.imag)


@cython.cfunc
@cython.exceptval(check=False)
def agm(a: cython.doublecomplex, b: cython.doublecomplex) -> cython.doublecomplex:
    if not (isfinite(a) and isfinite(b)):
        return cnan
    if 0 + 0j in (a, b):
        return 0 + 0j
    if -a == b:
        return 0 + 0j
    eps = 1.49e-8  # TODO: data-type dependent type
    while True:
        a, b = 0.5 * (a + b), (a * b)**0.5
        size = max(abs(a), abs(b))
        err = abs(a - b)
        if size == 0 or err < eps*size:
            return a


@cython.ccall
@cython.exceptval(check=False)
def ellipk(z: cython.doublecomplex) -> cython.doublecomplex:
    re: cython.double = z.real
    im: cython.double = z.imag
    if im == 0:
        if re == +inf:
            return 0 + 0j
        if re <= 1:
            return ellipk_r(re) + 0j
    a = (1 - z)**0.5
    v = agm(1, a)
    return cython.cdiv(0.5 * pi, v)

Sadly, the performance isn't as good as I expected, and the annotation is still quite yellow in some places. The compiled cayley function is much slower, both for scalars and for NumPy arrays. My guess is:
  • for scalars, the overhead of using Cython code is greater than the  benefit
  • for NumPy can leverage some SIMD which more than compensates its cost for the allocation of intermediate arrays
Are there any more options to optimize the Cython code (beside compiler flags -O3 --fast-math)?
I would like to compile the function not only for double precision but also for quad and single precision. Is there a good way to achieve this? My guess was using fused types for the float precision and using C++'s templated complex type, but that doesn't seem to compile.

For the following functions, I still have problems getting rid of the 'yellow' in the annotated HTML file. The issues I can see are:

  • In spite of using @cython.exceptval(check=False) it looks to be as there are still some error checks.
  • I don't find any way to calculate the real and imaginary part without falling back to Python. With libcpp.complex's real or libc.complex's creal I get the error 'no suitable method found'.
  • I haven't found any way, to define any constants without calls to Python. Declaring the variables globally with type declarations results in__Pyx_GetModuleGlobalName.  Not defining them results in the conversion of Python objects to the typed variables.
I am thankful for any advice, I have little experience with Cython.

da-woods

unread,
Jun 20, 2024, 2:43:45 AM (11 days ago) Jun 20
to cython...@googlegroups.com
When I compile it to a html annotated version, the issue is the module globals:

cnan: cython.doublecomplex = complex("nan+nanj")
pi: cython.double = pypi
inf: cython.double = float("inf")

Cython ignores annotations on module globals (essentially because that would annotation variables aren't visible in the module externally, so would change from Python semantics). This means that any time you use these numbers they are looked up from a dict and treated as Python values.

I think you need to use `cython.declare` rather than annotation typing for these.

> for scalars, the overhead of using Cython code is greater than the  benefit

Yes, that's probably true, especially for ufunc, which goes through a fairly complex dispatch mechanism.
--

---
You received this message because you are subscribed to the Google Groups "cython-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cython-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cython-users/860cd380-2a3a-40e6-aeeb-c6eae1abf7d8n%40googlegroups.com.


da-woods

unread,
Jun 20, 2024, 1:10:17 PM (10 days ago) Jun 20
to cython...@googlegroups.com
A few more comments having read your question a bit more thoroughly:

On 19/06/2024 22:38, 'BattleSushi' via cython-users wrote:
My guess was using fused types for the float precision and using C++'s templated complex type, but that doesn't seem to compile.

Fused types and `ufunc` should definitely work, but I'm not sure how easily they combine with C++ templates. It's probably easier just to use cython's native complex types if you can.

  • In spite of using @cython.exceptval(check=False) it looks to be as there are still some error checks.

I can't see much error checking so I'm not sure that's an issue?

  • I don't find any way to calculate the real and imaginary part without falling back to Python. With libcpp.complex's real or libc.complex's creal I get the error 'no suitable method found'.
If you're using Cython's native complex types then `val.real` and `val.imag` should work without needing to explicitly call C or C++.

BattleSushi

unread,
Jun 21, 2024, 4:44:49 AM (10 days ago) Jun 21
to cython-users
You are perfectly correct, I overlooked the documentation in https://cython.readthedocs.io/en/latest/src/tutorial/pure.html#pep-484-type-annotations:
> Annotations on global variables are currently ignored. This is because we expect annotation-typed code to be in majority written for Python, and global type annotations would turn the Python variable into an internal C variable, thus removing it from the module dict. To declare global variables as typed C variables, use @cython.declare().
Dropping the global definitions and instead declaring the constants in the functions removes the interactions with Python. E.g.

@cython.cfunc
@cython.exceptval(check=False)
def agm(a: cython.doublecomplex, b: cython.doublecomplex) -> cython.doublecomplex:
    cython.declare(nan=cython.doublecomplex, nanj=cython.doublecomplex)

    if not (isfinite(a) and isfinite(b)):
        return nan + nanj

    if 0 + 0j in (a, b):
        return 0 + 0j
    if -a == b:
        return 0 + 0j
    eps = 1.49e-8  # TODO: data-type dependent type
    while True:
        a, b = 0.5 * (a + b), (a * b)**0.5
        size = max(abs(a), abs(b))
        err = abs(a - b)
        if size == 0 or err < eps*size:
            return a

This also solves my problems with the error checks, after eliminating all Python code there are no more checks.

BattleSushi

unread,
Jun 21, 2024, 4:44:58 AM (10 days ago) Jun 21
to cython-users
> Fused types and `ufunc` should definitely work, but I'm not sure how easily they combine with C++ templates. It's probably easier just to use cython's native complex types if you can.
To correctly get different precisions, I (think I) need something like

# distutils: language=c++
import cython

from cython.cimports.libcpp.complex import complex

# @cython.ufunc
@cython.cfunc
# @cython.nogil
@cython.exceptval(check=False)
@cython.cdivision(True)
def cayley(z: complex[cython.floating], half_bandwidth: cython.floating) -> complex[cython.floating]:

    d_inv = 1 / half_bandwidth
    z_rel = z * d_inv
    return 2 * d_inv * z_rel * (1 - (1 - z_rel**-2)**0.5)

Adding the `ufunc` decorator y


Adding the `ufunc` decorator yields the error `Type 'complex[float]' cannot be used as a ufunc argument`. Surprisingly, the `nogil` decorator also yields an error  `Calling gil-requiring function not allowed without gil`. Without both of the decorators, the function compiles and yields code without any annotations.

> I can't see much error checking so I'm not sure that's an issue?
You are right, after fixing the issue with annotations of global variables, there is no more issue. Thanks for that fix.

> If you're using Cython's native complex types then `val.real` and `val.imag` should work without needing to explicitly call C or C++.
You mean `cython.doublecomplex`? This produces code like
50: re: cython.double = z.real __pyx_t_1 = __Pyx_CREAL(__pyx_v_z);
__pyx_v_re = __pyx_t_1;

I am not very good with the compiled code, but from the color scheme alone I would infer that `__Pyx_CREAL` is a Python call, or is this unproblematic?
For my `isfinite` code, results like even worse:
+21: @cython.cfunc static CYTHON_INLINE int __pyx_f_11test_script_isfinite(__pyx_t_double_complex __pyx_v_z) {
  int __pyx_r;
 /* … */
  /* function exit code */
   __pyx_L1_error:;
   __Pyx_WriteUnraisable("test_script.isfinite", __pyx_clineno, __pyx_lineno, __pyx_filename, 1, 0);
   __pyx_r = 0;
   __pyx_L0:;
   return __pyx_r;
 }
  22: @cython.exceptval(check=False)
  23: @cython.inline
  24: def isfinite(z: cython.doublecomplex) -> cython.bint:
+25: return cmath.isfinite(z.real) and cmath.isfinite(z.imag) try {
   __pyx_t_2 = std::isfinite(__Pyx_CREAL(__pyx_v_z));
} catch(...) {
   __Pyx_CppExn2PyErr();
   __PYX_ERR(1, 25, __pyx_L1_error)
}
if (__pyx_t_2) {
} else {
   __pyx_t_1 = __pyx_t_2;
   goto __pyx_L3_bool_binop_done;
}
try {
   __pyx_t_2 = std::isfinite(__Pyx_CIMAG(__pyx_v_z));
} catch(...) {
   __Pyx_CppExn2PyErr();
   __PYX_ERR(1, 25, __pyx_L1_error)
}
__pyx_t_1 = __pyx_t_2;
__pyx_L3_bool_binop_done:;
 __pyx_r = __pyx_t_1;
 goto __pyx_L0;

da-woods

unread,
Jun 22, 2024, 4:09:23 AM (9 days ago) Jun 22
to cython...@googlegroups.com
On 21/06/2024 08:43, 'BattleSushi' via cython-users wrote:
> If you're using Cython's native complex types then `val.real` and `val.imag` should work without needing to explicitly call C or C++.
You mean `cython.doublecomplex`? This produces code like
50: re: cython.double = z.real __pyx_t_1 = __Pyx_CREAL(__pyx_v_z);
__pyx_v_re = __pyx_t_1;

I am not very good with the compiled code, but from the color scheme alone I would infer that `__Pyx_CREAL` is a Python call, or is this unproblematic?

The implementation of __Pyx_CREAL varies slightly depending on your platform, but it's very cheap - basically the same as accessing the fields of a C/C++ complex.

What's happening there is that `std::isfinite` is declared as "could raise a C++ exception". I think that's probably wrong and it will never really raise a C++ exception. I'm sure that code costs something, but it should be pretty low - most C++ compilers don't generate a lot of overhead for unused exception handling.


Reply all
Reply to author
Forward
0 new messages