Canonical way for using SIMD in Cython?

Sebastian M. Ernst

Jun 22, 2021, 5:10:09 PM
to cython...@googlegroups.com
Hi all,

digging through the archives of this mailing list and plenty of code
online, it appears that many people use SIMD intrinsics in Cython, i.e.
`x86intrin.h` [1] and friends, just like one would do in plain C.
Apparently, this practice is frowned upon in certain circles because it
is specific to x86 and cannot (easily at least) be ported to other
architectures. If x86 is the only target architecture that one is
optimizing for, one could stick to intrinsics and be ok. However, I am
seeing more and more ARM popping up, also in the HPC arena. RISC-V is on
the horizon, too. In this context, the commonly recommended approach is
to go for something more portable like GCC's vector extensions [2]. I
have not yet managed to find Cython code which is going down this
particular path - could be for technical reasons (?) or simply me not
looking in the right places.
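
For reference, here is a minimal sketch of what the vector-extension route [2] looks like on the C side. The type name `v4f` and the helper `mul_add_f32` are made up for illustration, and the sketch assumes GCC or Clang, since `vector_size` is a compiler extension and not ISO C.

```c
#include <assert.h>
#include <string.h>

/* Four packed floats (16 bytes). vector_size is a GCC/Clang extension. */
typedef float v4f __attribute__((vector_size(16)));

/* Hypothetical helper: out[i] = a[i] * b[i] + c[i].
 * For brevity, n must be a multiple of 4 (no scalar tail loop). */
static void mul_add_f32(const float *a, const float *b, const float *c,
                        float *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        v4f va, vb, vc, vr;
        /* memcpy keeps the loads valid for unaligned pointers;
         * compilers typically turn it into a single vector load. */
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        memcpy(&vc, c + i, sizeof vc);
        /* Element-wise arithmetic; the compiler picks SSE, NEON, etc.,
         * or falls back to scalar code if the target has no SIMD. */
        vr = va * vb + vc;
        memcpy(out + i, &vr, sizeof vr);
    }
}
```

From Cython, such a function could presumably be declared in a `cdef extern from` block naming the header it lives in and then called like any other C function, so the `.pyx` file itself stays free of architecture-specific code.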

What is the canonical approach in Cython - if there is any?

Thanks,
Sebastian


1: https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/x86intrin.h
2: http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html

Jérôme Kieffer

Jun 23, 2021, 6:05:57 AM
to Sebastian M. Ernst, cython...@googlegroups.com
On Tue, 22 Jun 2021 22:04:22 +0200
"Sebastian M. Ernst" <er...@pleiszenburg.de> wrote:

> Hi all,
>
> digging through the archives of this mailing list and plenty of code
> online, it appears that many people use SIMD intrinsics in Cython, i.e.
> `x86intrin.h` [1] and friends, just like one would do in plain C.
> Apparently, this practice is frowned upon in certain circles because it
> is specific to x86 and cannot (easily at least) be ported to other
> architectures. If x86 is the only target architecture that one is
> optimizing for, one could stick to intrinsics and be ok. However, I am
> seeing more and more ARM popping up, also in the HPC arena. RISC-V is on
> the horizon, too. In this context, the commonly recommended approach is
> to go for something more portable like GCC's vector extensions [2]. I
> have not yet managed to find Cython code which is going down this
> particular path - could be for technical reasons (?) or simply me not
> looking in the right places.
>
> What is the canonical approach in Cython - if there is any?

Interesting question ...
I use PyOpenCL with a CPU driver (Intel or PoCL) when small vectors are needed for performance,
but this does not answer your question.

Cheers,
--
Jerome

Jonathan Kliem

Jun 23, 2021, 8:13:45 AM
to cython-users
Hello Sebastian,

I have basically been asking myself the same question and have not found a satisfying answer yet. We simply outsource the actual intrinsics to a C header file, which can handle preprocessor directives:


There is a file bitset_base.pxd, which does everything except for the actual intrinsics; those are delegated to bitset_intrinsics.h.

I don't know if such a solution is acceptable for you. It is portable at least. The code needs to be compiled for the specific machine, and the compiler needs to be told to enable the relevant instruction sets, e.g. with -march=native.
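
To illustrate the idea, here is a minimal sketch of a header that picks an implementation via the preprocessor. It is not the contents of bitset_intrinsics.h; the function name `bitset_and_128` is made up for this example. It dispatches on the predefined macros `__SSE2__` and `__ARM_NEON` and falls back to plain C everywhere else.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics (x86) */
#elif defined(__ARM_NEON)
#include <arm_neon.h>    /* NEON intrinsics (ARM) */
#endif

/* Hypothetical helper: dst = a & b for one 128-bit chunk (two 64-bit limbs). */
static inline void bitset_and_128(uint64_t *dst, const uint64_t *a,
                                  const uint64_t *b)
{
#if defined(__SSE2__)
    /* SSE2 is part of the x86-64 baseline; wider sets such as AVX2 are
     * only defined when enabled, e.g. via -mavx2 or -march=native. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_and_si128(va, vb));
#elif defined(__ARM_NEON)
    vst1q_u64(dst, vandq_u64(vld1q_u64(a), vld1q_u64(b)));
#else
    /* Portable scalar fallback for any other target. */
    dst[0] = a[0] & b[0];
    dst[1] = a[1] & b[1];
#endif
}
```

The Cython side only ever sees the one function name, so the `.pxd` file does not need to know which branch was compiled in.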

Jonathan