Hi Luke,
Adding to Stan's information, Numba is designed with the limitations of LLVM in mind. Knowing that LLVM is tuned for C/C++, we designed Numba to emit LLVM IR in a similar fashion to what a C frontend would do. The bulk of the high-level optimization is done before reaching LLVM IR. For instance, array expressions are lowered as explicit loop nests, so the result looks like:
for (int i = 0; i < N; ++i) {
    for (int j = 0; j < M; ++j) {
        ...
    }
}
That gives LLVM a familiar workload to perform its loop optimizations.
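To make the lowering concrete, here is a rough pure-Python sketch (not Numba's actual implementation) of what "array expression → explicit loop nest" means: an elementwise expression such as c = a * 2 + b over 2-D data becomes a double loop, mirroring the C-style nest above. Plain nested lists stand in for arrays so the sketch needs no dependencies:

```python
def array_expr(a, b):
    # High-level form (elementwise): c = a * 2 + b
    N, M = len(a), len(a[0])
    c = [[0] * M for _ in range(N)]
    # Lowered form: explicit loop nest over both dimensions,
    # which is the shape of workload LLVM's loop passes expect
    for i in range(N):
        for j in range(M):
            c[i][j] = a[i][j] * 2 + b[i][j]
    return c

print(array_expr([[1, 2], [3, 4]], [[10, 20], [30, 40]]))
# [[12, 24], [36, 48]]
```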
Type inference is used to statically resolve much of the dynamism early on. Even though this means the subset of Python we support is quite restrictive, real numeric use cases rarely require runtime polymorphism. Also, after Numba's type inference, each Python function is fully specialized to its call site. You can think of every Python function as a C++ templated function, with type inference supplying the template parameters for each invocation. For the remaining unsupported cases, our excuse is that users are expected to apply Numba's JIT strategically to compute-intensive code 😉.
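The template analogy can be sketched in plain Python. This is a hypothetical toy dispatcher, not Numba's machinery: each distinct tuple of argument types gets its own specialization entry, the way a C++ template is instantiated once per parameter set (where real Numba would run type inference and emit LLVM IR, the sketch just records the key):

```python
def specialize(func):
    # Toy per-type-signature dispatcher: one "specialization" per
    # distinct tuple of argument types, like C++ template instantiation.
    cache = {}

    def dispatcher(*args):
        key = tuple(type(a) for a in args)
        if key not in cache:
            # Numba would type-infer and compile here; we only record it.
            cache[key] = func
        return cache[key](*args)

    dispatcher.signatures = cache
    return dispatcher

@specialize
def add(x, y):
    return x + y

add(1, 2)        # instantiates the (int, int) specialization
add(1.0, 2.0)    # instantiates the (float, float) specialization
print(len(add.signatures))  # 2
```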