Speed difference when using Python's built-in math instead of numpy in a jitted function


William Shipman
Jan 10, 2016, 8:55:23 PM
to numba...@continuum.io
Hi,

I've been comparing jitted functions that use functions from the standard math module against ones that use the equivalent NumPy functions, to see which approach is faster. I do this with a template that I compile at runtime (via Python's exec) to define the jitted function:

code = '''
import {module:s}
from numba import jit

def f(x):
    total = 0.0  # avoid shadowing the built-in sum
    for idx in range(x.size):
        total += {module:s}.{function:s}(x[idx])
    return total

jit_f = jit(f)
array_fn = {module:s}.{function:s}
'''
I replace module with math or numpy, and function with one of several functions I'm testing, then time the jit_f function. Here's my version information:

Python version: 2.7.11 |Anaconda 2.4.1 (64-bit)| (default, Dec  7 2015, 14:10:42) [MSC v.1500 64 bit (AMD64)]
NumPy version:  1.10.1
Numba version:  0.22.1

I'm testing on 10**6-element NumPy arrays of float64 type. I've attached my source code. Run it using "python numpynumbaspeedtest.py -N 1000000" to duplicate the results.
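For reference, a minimal sketch of the harness around that template (the timing loop and names here are illustrative, not the attached script):

import timeit
import numpy as np

x = np.random.rand(10**6)  # 10**6 float64 test values

for module in ('math', 'numpy'):
    namespace = {}
    # Build and exec the template for the chosen module/function pair.
    exec(code.format(module=module, function='ceil'), namespace)
    jit_f = namespace['jit_f']
    jit_f(x)  # warm-up call so JIT compilation time is excluded
    t = timeit.timeit(lambda: jit_f(x), number=10)
    print('{0}.ceil: {1:.4f} s'.format(module, t))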

Most of the time, the speeds are quite close for both versions. But sometimes there's a >5x speedup using the math package instead of NumPy. This happens with ceil, floor and trunc.

I inspected the generated assembly (via the inspect_asm method of the jitted function) and found that the math version of the function is replaced with an inline equivalent, while the NumPy version remains a function call (numba.npymath.ceil is called, for example). When both versions performed similarly, both were using function calls.
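For reference, a minimal way to dump that assembly (jit_f is the compiled function from above; it must have been called at least once so a compiled signature exists):

# Print the generated assembly for every compiled signature of jit_f.
for signature, asm in jit_f.inspect_asm().items():
    print(signature)
    print(asm)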

There are also two cases where the inlining actually disadvantaged the math package: degrees and radians. Here, math.degrees or math.radians was about 3-4 times slower than numpy.degrees/numpy.radians.

My questions:
  1. What causes these functions to get inlined?
  2. Am I doing this comparison fairly? Is there something extra needed to get the NumPy functions inlined?
  3. What is the difference between numba.npymath.exp and the exp used when the jitted function calls math.exp? The same question applies to the other numba.npymath functions.
  4. Why are the inlined versions of math.degrees and math.radians so much slower than calling the npymath functions? Aren't these simple enough conversions that the inlined versions should be really fast? (The snippet below shows how simple the conversions are.)
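To make question 4 concrete, each conversion is a single multiplication by a constant; illustrative plain-Python equivalents:

import math

def degrees(x):
    # equivalent to math.degrees(x): one multiply by a constant
    return x * (180.0 / math.pi)

def radians(x):
    # equivalent to math.radians(x): one multiply by a constant
    return x * (math.pi / 180.0)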
Thanks for the info.

Regards,
William.
Attachment: numpynumbaspeedtest.py

Stanley Seibert
Jan 11, 2016, 10:08:41 AM
to Numba Public Discussion - Public
Hi William,

This is a very interesting study!  Thanks for posting the results.  I think all the differences you are seeing are due to a historical quirk of Numba that we should address.

When the math module was implemented in Numba, the approach taken was to use LLVM intrinsic operations when possible.  The LLVM IR that Numba emits for compilation supports a number of basic math operations, like sin, cos, exp, ceil, trunc, etc.  Because these instructions are LLVM intrinsics, the compiler target decides how to generate the ASM, which means it might be able to inline the implementation for simple functions.  For other functions (like the transcendentals), LLVM emits a function call to the platform's math library.
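A quick way to see this (a hedged sketch; the exact IR varies by platform and Numba version) is to look at the LLVM IR Numba emits for a math.ceil call:

import math
from numba import jit

@jit(nopython=True)
def f(x):
    return math.ceil(x)

f(1.5)  # trigger compilation
ir = next(iter(f.inspect_llvm().values()))
# The math version typically lowers to the @llvm.ceil.f64 intrinsic,
# which the backend is free to inline as a single instruction.
print('llvm.ceil' in ir)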

NumPy also exports a C interface for many of the special math functions that it supports.  I believe this interface exists to work around variation in the availability (or correctness?) of platform math library functions.  Because several of the math functions in the numpy module do not exist as LLVM IR intrinsics, we used the NumPy C functions pretty consistently in the Numba definition of the numpy module.  Unfortunately, LLVM cannot inline the implementation of a C function in a shared library, so we incur the function call even for simple functions.

It's clear from your test that we should make the numpy.* implementations the same as the math.* implementations when they are faster.  Could you open an issue in our bug tracker with your results?

As for the speed degradation when inlining the conversions between degrees and radians, I agree that is very puzzling. That should be a separate issue in the bug tracker for us to investigate.



William Shipman
Jan 11, 2016, 12:06:34 PM
to numba...@continuum.io
Thanks for the feedback, Stanley. I'll create new issues on the tracker and put the code and results online. I was doing this for a blog post, so everything will appear there too.

If there's variation in the platform math libraries for a particular function, I'd rather stick with the most reliable implementation. Personally, I'd prefer the version that works without weird hassles and gives the same results whichever OS you choose. What might be useful here is some sort of "fast math" flag. By default it's set to false and the reliable functions get used. However, if the user doesn't care about exact accuracy (e.g. in some image processing), they can set this flag to true and the faster but less reliable functions get used.
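A sketch of how such a flag might be spelled (hypothetical at the time of writing; later Numba releases did add a fastmath keyword to jit that is similar in spirit):

from numba import jit

# Hypothetical opt-in (illustrative): trade strict cross-platform
# reproducibility of math results for speed.
@jit(nopython=True, fastmath=True)
def halve_all(x):
    return x * 0.5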

I'll reply here when everything is available online.


Stanley Seibert
Jan 11, 2016, 12:31:16 PM
to Numba Public Discussion - Public
I definitely agree about the reliability criterion.  Fortunately, the functions where you see the biggest speed discrepancy are relatively simple ones where consistency and portability should not be a problem.  We'll have to evaluate this on a case-by-case basis.

Regardless of how the implementation is done, our unit tests always compare the results of the Numba-compiled implementation to the math or numpy module (as appropriate) on every platform we support.  We may beef up the number of test cases just to make sure we are adequately covering the corner cases.
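A minimal sketch of the kind of consistency check described above (illustrative; the actual test suite is more thorough):

import math
from numba import jit

@jit(nopython=True)
def jit_ceil(x):
    return math.ceil(x)

# Compare the compiled result against the stdlib on some corner values.
for value in [-1.5, -0.5, -0.0, 0.0, 0.5, 1.5, 1e15]:
    assert jit_ceil(value) == math.ceil(value)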

Naveen Michaud-Agrawal
Jan 11, 2016, 1:06:02 PM
to numba...@continuum.io
Have you guys thought of using Hypothesis (https://hypothesis.readthedocs.org/en/latest/) for generating/testing corner cases? It's similar to QuickCheck in Haskell.
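For example, a property-based test along these lines (a sketch, reusing the consistency check discussed above):

import math
from hypothesis import given
from hypothesis import strategies as st
from numba import jit

@jit(nopython=True)
def jit_ceil(x):
    return math.ceil(x)

# Hypothesis generates corner-case floats automatically.
@given(st.floats(allow_nan=False, allow_infinity=False))
def test_ceil_matches_stdlib(x):
    assert jit_ceil(x) == math.ceil(x)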

Naveen

Stanley Seibert
Jan 11, 2016, 2:39:26 PM
to Numba Public Discussion - Public
Yeah, I've looked at Hypothesis.  It does seem to have strategies for testing float arguments, which would be relevant.  We would need to figure out how it fits into our existing test framework, which is already highly generative (to cover different data types and values at the same time).

Stanley Seibert
Jan 12, 2016, 9:21:55 AM
to Numba Public Discussion - Public
FYI: Antoine took a look through the functions you mentioned, and we've fixed the performance of the following on the master branch:
  • math.isfinite()
  • math.degrees()
  • math.radians()
  • np.degrees()
  • np.radians()
  • np.floor()
  • np.rint()
  • np.ceil()
  • np.trunc()
  • np.copysign()
  • np.fabs()

These functions should all have inline-able implementations now, and also interfere less with SIMD autovectorization in the compiler.  This will make it into the Numba 0.23 release, which should be out by the end of the week.
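Once 0.23 lands, a quick (hedged) way to confirm the numpy versions are being inlined, reusing a jit_f compiled from the template above with numpy.ceil; the exact instruction names depend on your CPU and LLVM version:

# Assumes jit_f was compiled with numpy.ceil and already called once.
asm = next(iter(jit_f.inspect_asm().values()))
# An inlined ceil typically shows up as a (v)roundsd/(v)roundpd
# rounding instruction rather than a call into npymath:
print('round' in asm)
print('npymath' not in asm)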

William Shipman
Jan 12, 2016, 4:50:24 PM
to numba...@continuum.io