Numba 0.36 Released!

Stanley Seibert

Dec 8, 2017, 5:04:11 PM
to Numba Public Discussion - Public
Hi Numba Users,

I'm very excited to announce the release of Numba 0.36.1.  (We accidentally tagged 0.36.0 before making a change to improve detection of compilers for Numba's ahead-of-time (AOT) compilation, so version 0.36.0 was never released.)  This release has some very large new features, which I'll describe below.

I first want to acknowledge the support and contributions of Intel, especially the Intel Labs team of Todd Anderson, Paul Liu, and Ehsan Totoni.  In addition, I want to give kudos to Stuart Archibald, one of the core Numba developers, who worked very closely with the Intel Labs team to test and improve their contributions, and who also worked tirelessly to upgrade Numba's build process to use the new Anaconda Distribution 5.0 compilers.

Now for the new features!

LLVM 5
======

We've upgraded Numba to require llvmlite 0.21, which increases the required LLVM version to 5.0.  This should bring some minor improvements to code generation, especially for AVX-512 (now also available on Skylake, as well as Knights Landing).  LLVM 5 also adds support for AMD Ryzen CPUs, although we don't have any with which to test.

On a related note, we've also started using tools like Helgrind (http://valgrind.org/docs/manual/hg-manual.html) to systematically search for subtle thread-safety issues in the compiler and how it uses LLVM.  (Note: this is about thread safety when the compiler is running, not when executing Numba-generated code.)  Some fixes have made it into this release, and some will appear in the next release.

Stencils
========

We're very excited to debut a new compiler decorator in this release: @stencil.  Similar to @vectorize, this allows you to write a simple kernel function that is expanded into a larger array calculation.  In this case, @stencil is for implementing "neighborhood" calculations, like local averages and convolutions.  The kernel function accesses a view of the full array using relative indexing (i.e. a[-1] means one element to the left of the central element) and returns a scalar that goes into the output array.  The ParallelAccelerator compiler passes can also multithread a stencil the same way they multithread an array expression.  The current @stencil implements only one option for boundary handling, but more can be added, and it does allow for asymmetric neighborhoods, which are important for trailing averages.

Big thanks to Intel for contributing this feature!  We'll have a blog post coming out in the next week that shows some stencil examples, but for now you can also take a look at the documentation: http://numba.pydata.org/numba-doc/latest/user/stencil.html

ParallelAccelerator Improvements
================================

The Intel developers also implemented a number of bug fixes based on user feedback from the last release, and made improvements to range analysis and reductions.  The improved range analysis can now determine that a[1:n-1] has the same size as b[0:n-2], which increases opportunities for loop fusion.  Support for parallel reductions has been expanded to include min(), max(), argmin(), and argmax(), as well as general reductions using functools.reduce.

New Anaconda Compilers
======================

As part of the Anaconda Distribution 5.0 release, Anaconda started using custom builds of GCC 7.2 (on Linux) and clang 4.0 (on OS X) to build conda packages in order to ensure the latest compiler performance and security features were enabled, even on older Linux distributions like CentOS 6.  (See https://www.anaconda.com/blog/developer-blog/utilizing-the-new-compilers-in-anaconda-distribution-5/ for more info.)  We've migrated the build process for Numba conda packages on Mac and Linux over to these compilers for consistency with the rest of the distribution.  When doing AOT compilation, Numba uses the same compiler that was used to build NumPy, so on Anaconda it will remind you to install the required compiler packages with conda.

The Numba wheels will continue to be built with the system compilers used in previous releases to ensure compatibility with the manylinux1 spec (that supports Linux distributions back to CentOS 5).  We've also added a fix so the Numba wheels should work when the Python distribution lacks a libpython shared library.

Miscellaneous
=============

There are also a number of other fixes for user-reported bugs, as well as some minor feature improvements:
  • Support for passing tuples of indices to np.take()
  • Initialization of the RNG states in the CUDA xoroshiro128+ generator is done on the CPU, speeding it up significantly
  • A potential cache file collision in the on-disk function cache has been resolved


As always, you can get the new release in conda with:

    conda update numba

And the source and wheels are on PyPI.

(If you build from source, don't forget to install LLVM 5, build the new llvmlite, then build Numba!)

Coming up we'll be doing a CUDA-focused shorter development cycle to address some issues with pyculib and CUDA 9 support, then we'll switch back to general Numba development with a development cycle focused on internal improvements.  Thanks again to everyone for their questions, feature ideas, and bug reports!