is numba ready for prime time?


josef...@gmail.com

Jan 31, 2018, 2:12:11 PM
to pystatsmodels
Triggered by a blog post by Matthew Rocklin, I tried out numba for the first time.
https://github.com/statsmodels/statsmodels/issues/4239

numba looks good where we can either avoid huge intermediate arrays, or where we need a recursive loop that cannot be vectorized, as in time series analysis.

This might be useful when we don't have a cython speedup (yet).
E.g. the currently merged Holt-Winters code uses a Python loop; Chad has a new PR that uses cython, and numba could be used in between.

See end of Matthew's blog post for difficulties in maintaining or debugging numba code. However, if we just use it for some core functions like the recursive time series predict functions, then this might not be much of a problem.
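For illustration, a minimal sketch of such a recursive loop (a simple exponential smoothing recursion; names are hypothetical, and numba is kept optional so the same code runs either way):

```python
import numpy as np

try:
    from numba import njit  # numba stays optional
except ImportError:
    def njit(func):  # pure-python fallback: same results, just slower
        return func

@njit
def ses_recursion(y, alpha):
    # Simple exponential smoothing: level[t] depends on level[t - 1],
    # so this loop cannot be vectorized away.
    level = np.empty(y.shape[0])
    level[0] = y[0]
    for t in range(1, y.shape[0]):
        level[t] = alpha * y[t] + (1.0 - alpha) * level[t - 1]
    return level
```

The decorated function is the only numba-specific piece; everything around it stays ordinary numpy.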

I don't expect that numba will be a replacement for cython in heavy code like the Kalman filters and statespace models.

And as an aside: I'm an array/matrix person and have problems thinking quickly in loops.

Josef




Matthew Brett

Jan 31, 2018, 2:54:02 PM
to pystatsmodels
Hi,

On Wed, Jan 31, 2018 at 7:12 PM, <josef...@gmail.com> wrote:
> Triggered by a blog post by Matthew Rocklin, I tried out numba for the first
> time.
> https://github.com/statsmodels/statsmodels/issues/4239
>
> numba looks good where we can either avoid huge intermediate arrays, or need
> a recursive loop as in time series analysis that cannot be vectorized.

I think that you'd be the first big scientific Python project to
depend on numba. There are pip wheels now, but they're pretty recent.
I don't know anyone who has used them myself (partly because numba isn't
used in any of the packages I support).

Cheers,

Matthew

Ralf Gommers

Jan 31, 2018, 3:19:25 PM
to pystat...@googlegroups.com
On Thu, Feb 1, 2018 at 8:12 AM, <josef...@gmail.com> wrote:
[...]

I don't expect that numba will be a replacement for cython in heavy code like the kalman filters and statespace models.

Why not? My concern about numba is about having the dependency in the first place (portability, debuggability). Once you have it installed and working, numba is easier to use as well as significantly faster than Cython.

Ralf
 

josef...@gmail.com

Jan 31, 2018, 3:19:37 PM
to pystatsmodels
Kevin has it as an optional dependency in https://github.com/bashtage/arch

How good is numba support in Debian?
Windows looks fine based on my brief trials. 
However, we sometimes had problems with Debian (or rather, Debian packaging had problems with us) because the dependencies for the doc build are much larger than for the main code, especially because of ipython/jupyter.

Early on, statsmodels had cython as an optional dependency for one or two years before making it required; the extra work of maintaining two versions became too large.

Josef

 


josef...@gmail.com

Jan 31, 2018, 3:41:51 PM
to pystatsmodels
I guess debugging a simple function like in my examples, and keeping the number of mistakes small in the first place, should be relatively easy.
statespace models are essentially low level "C" with a lot of BLAS/LAPACK usage. Some parts of the cython code look more like C than python to me.

I wouldn't trust numba to handle all the BLAS/LAPACK things correctly. And I would expect that the overall size and the use of classes would make it more difficult for numba to optimize. Also, AFAIK at that level there are no ducks; it's either float or complex, and then we might as well precompile it.
Some Julia users complain about startup or warmup time, and my guess is that numba has the same problem for large code sizes; there is a visible effect even in my simple examples.

Maybe numba will eventually manage to handle low level interfaces to scipy's C/cython/Fortran, or have fast high level equivalents, but it doesn't sound like debugging numba usage is easy even compared to cython.

Josef

josef...@gmail.com

Jan 31, 2018, 4:10:53 PM
to pystatsmodels
On Wed, Jan 31, 2018 at 2:12 PM, <josef...@gmail.com> wrote:
Triggered by a blog post by Matthew Rocklin, I tried out numba for the first time.
https://github.com/statsmodels/statsmodels/issues/4239

numba looks good where we can either avoid huge intermediate arrays, or need a recursive loop as in time series analysis that cannot be vectorized.

Another area that is currently slow in statsmodels is some of the kernel methods in nonparametrics. Some of those loops are currently in Python and cannot easily be vectorized, or only by blowing up the memory.

If the numba.jit decorator works for some of the core looping functions, then this would be much easier, given that we currently don't have a cython developer in that neighborhood.
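As an illustration of such a core looping function, here is a hedged sketch (hypothetical names, not the statsmodels implementation): a Gaussian kernel density evaluated with explicit loops instead of a large (n, m) intermediate array:

```python
import math
import numpy as np

try:
    from numba import njit
except ImportError:
    def njit(func):  # fall back to the plain python double loop
        return func

@njit
def gauss_kde_eval(grid, data, bw):
    # Evaluate a Gaussian kernel density at each grid point with an explicit
    # double loop; the vectorized version needs a (len(grid), len(data))
    # intermediate array, which blows up memory for large samples.
    out = np.empty(grid.shape[0])
    c = 1.0 / (data.shape[0] * bw * math.sqrt(2.0 * math.pi))
    for i in range(grid.shape[0]):
        s = 0.0
        for j in range(data.shape[0]):
            u = (grid[i] - data[j]) / bw
            s += math.exp(-0.5 * u * u)
        out[i] = c * s
    return out
```

With numba, the nested loop avoids the memory blow-up while keeping the speed; without it, the results are identical, just slower.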

A related example for difficulty in debugging nested loops: 
Our current nonparametric.lowess cannot handle a large number of ties in x. I gave up after trying to debug the cython loops for several hours, because keeping track of the various neighborhood indices and figuring out how to adjust them for ties was just too confusing and painful, and that's only for one-dimensional arrays.

(My pain tolerance for marginal use cases is not very high.)

Josef

Matthew Brett

Jan 31, 2018, 5:03:24 PM
to pystatsmodels
Hi,
Yes, I agree completely with Ralf, about the main concern being the
dependency. I am sure that's why all the other big projects have
avoided it as a dependency thus far. Honestly, given your lack of
developer resources, it seems like a pretty bad idea to be the first
project in the ecosystem to iron out those issues. Why not wait until
scikit-image or scikit-learn has taken the plunge with a release, and
reconsider then?

See you,

Matthew

josef...@gmail.com

Jan 31, 2018, 6:44:48 PM
to pystatsmodels
Waiting for others to figure out the problems is my/our usual way, e.g. switching to pytest.
Even if we start to plan for it, it won't happen very fast.

However, there are several applications within statsmodels where it might be a "cheap" speedup and lets us get closer to other packages in terms of speed.
Some things are worth the risk, e.g. AFAIK statsmodels was the first large package with ipython in the docs and the first large package to use the c/cython interface to scipy's BLAS/LAPACK, and of course there are pandas and patsy. (None of that was my work, except sometimes as cheerleader.)

The trigger for me was "pairwise", because I have some functions like that which require either loops or large memory in a reduce operation; cases where einsum could be used, except that the computations are not limited to multiplication.

Josef


 


josef...@gmail.com

Jan 31, 2018, 7:30:46 PM
to pystatsmodels
fun reading: How the performance game is played.


What are the preferences for the trade-off between performance and code uglification?

Josef
spell check
uglify, uglification: First Known Use: 1576
 



josef...@gmail.com

Jan 31, 2018, 11:26:28 PM
to pystatsmodels
one step closer
Both travis and appveyor work as expected. I used numba on only one version each; the others use the pure python loop and fail the speed unit test. In this case numba is optional, similar to matplotlib, with a simple import check.

On both CIs we install from conda.


I didn't try to figure out how to get the cache decorator to work, nor how to use type annotations or specifications. 150 times faster already looks pretty good to me.


Josef

Matthew Brett

Feb 1, 2018, 7:17:48 AM
to pystatsmodels
Yes, if you only use conda and Anaconda, I imagine you won't hit many
problems. Your maintenance problems will likely come from your pip
users.

Cheers,

Matthew

Tom Augspurger

Feb 1, 2018, 7:25:19 AM
to pystat...@googlegroups.com
I tested out the branch with virtualenv + pip, and everything worked great (macOS).


Matthew Brett

Feb 1, 2018, 7:35:15 AM
to pystatsmodels
Sure, but the problems will come from a statsmodels release, where you
will have a large number of possible configurations in use.
You'll effectively become stress testers for the numba wheels on
Windows, macOS and Linux. I suppose the numba wheels contain an LLVM
compiler... Obviously you can avoid most of the risk by making it an
optional dependency, but that will add extra maintenance burden. You
could also expand your travis-ci / appveyor testing to cover a wider
range of configurations, but that will also take some maintenance
time. My guess is that things will nearly always be fine using conda
(numba is a Continuum product after all). For pip, it's difficult to
predict, and I doubt you'll know for sure until you release.

Cheers,

Matthew

Tom Augspurger

Feb 1, 2018, 7:49:34 AM
to pystat...@googlegroups.com
In my experience the numba devs have been happy to help debug issues.
 

josef...@gmail.com

Feb 1, 2018, 8:35:28 AM
to pystatsmodels
I assume that pure install problems can be redirected to numba support: the mailing list or stackoverflow.

We can have a one- to two-year trial period where numba is optional and used only for a few functions. If we start within the next year, then we might just be ready to get serious when others have figured out the problems.
(And then we don't have to worry too much about Julia competition, if we stay close enough without uglifying our codebase.)

In terms of relative maintenance cost:
If we add the actual model for PoissonARMA, GLMARMA or nonlinear time series with an exponential mean function (log link), then I expect a week of debugging convergence, overflow and non-stationarity problems, especially after users throw different datasets at it. Those should be independent of numba, as long as the basic functions behave the same in numerical corner or extreme cases.

stencils sound interesting for local nonparametrics


Josef



 
 


josef...@gmail.com

Feb 1, 2018, 9:54:11 AM
to pystatsmodels
On Wed, Jan 31, 2018 at 3:19 PM, <josef...@gmail.com> wrote:
[...]

Kevin has it as optional dependency in https://github.com/bashtage/arch

How good is numba support in Debian?

availability of numba across architectures looks good

The statsmodels page just says "all" architectures but that might be close to the same set.
I don't remember how to find test results for individual architectures in the Debian information system.

I didn't try to check availability of backports. 

Josef

josef...@gmail.com

Feb 1, 2018, 7:23:07 PM
to pystatsmodels
On Wed, Jan 31, 2018 at 3:41 PM, <josef...@gmail.com> wrote:
[...]

I guess debugging a simple function like in my examples, and keeping the number of mistakes small in the first place, should be relatively easy.
statespace models are essentially low level "C" with a lot of BLAS/LAPACK usage. Some parts of the cython code look more like C than python to me.

I wouldn't trust numba to handle all the BLAS/LAPACK things correctly. And I would expect that the overall size and use of classes would make it more difficult for numba to optimize it. Also, AFAIK at that level there are no ducks, it's either float or complex and then we can as well precompile it. 
Some Julia users complain about startup or warmup time, and my guess is numba has the same problem for large code sizes, there is a visible effect even in my simple examples.

another improvement:
When I use an `out` parameter to avoid allocating the results array in the jitted function, caching works and the startup/warmup time almost disappears.

if has_numba:
    predict_exparma = numba.jit(nopython=True, cache=True)(_predict_exparma)

This looks like we won't be able to get the cache benefits with current numba if the jitted function has to allocate python objects (like the results array) itself.
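A self-contained sketch of that out-parameter pattern (the recursion here is a hypothetical stand-in, not the actual _predict_exparma):

```python
import numpy as np

def _predict_loop(y, phi, out):
    # The jitted body only fills `out`; all allocation stays in the caller,
    # which is what lets nopython mode with cache=True work here.
    out[0] = y[0]
    for t in range(1, y.shape[0]):
        out[t] = phi * out[t - 1] + (1.0 - phi) * y[t]

try:
    import numba
    # cache=True persists the compiled function to disk, so later python
    # processes skip most of the warmup cost.
    predict_loop = numba.jit(nopython=True, cache=True)(_predict_loop)
except ImportError:
    predict_loop = _predict_loop  # pure python fallback, same results

y = np.array([1.0, 2.0, 4.0])
out = np.empty_like(y)  # allocated in plain python, outside the jit
predict_loop(y, 0.5, out)
```

Moving the allocation to the caller is a small API change, but it pays off exactly in the script-per-process workflow described below.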


My standard workflow during development and debugging is to run a python script each time in a new python process, so startup time is important for this and similar scripting cases.

Josef

Kevin Sheppard

Feb 2, 2018, 3:28:58 AM
to pystatsmodels
I suspect that many projects that may be interested are waiting for a 1.0 release.  There are semi-frequent regressions that require waiting a month or more for a fix. I think once 1.0 gets closer these should likely stop happening.

The other issue that may affect some projects is that not all dtypes are supported for wrapped functions. In particular, it is usually the case for linear algebra that double and complex128 are supported, but not float32 or complex64. A related issue is how BLAS support is baked into numba: in conda it uses MKL. When you install from pip, what BLAS is used?

Overall I think it is a pretty simple thing to make it a soft dependency. I didn't do this, but it would probably be a good idea to also have an environment variable to disable numba even if it is installed, in case there is an upstream break.

In terms of support outside of conda and possibly pip, I suspect that it is pretty worthless. numba has monthly releases, so most traditional packaging environments are rapidly out of date.

josef...@gmail.com

Feb 2, 2018, 8:43:04 AM
to pystatsmodels
On Fri, Feb 2, 2018 at 3:28 AM, Kevin Sheppard <kevin.k....@gmail.com> wrote:
I suspect that many projects that may be interested are waiting for a 1.0 release.  There are semi-frequent regressions that require waiting a month or more for a fix. I think once 1.0 gets closer these should likely stop happening.

The other issue that may affect some projects is that not all dtypes are supported for wrapped functions.  In particular, it is usually the case for linear algebra that double and complex128 are supported, but not float or complex64.  A related issue is how BLAS support is baked into Numba.  In conda it uses MKL.  When you install from pip what BLAS is used?

AFAICS, that's the same as we currently use, e.g. in the statespace cython code,
according to
numba also uses whatever BLAS/LAPACK library comes with scipy

The advantage is that the scipy devs put a lot of effort into getting a cython API that is independent of the underlying BLAS/LAPACK implementation.

 

Overall I think it is a pretty simple thing to make it a soft dependency.  I didn't do this, but it would probably be a good idea to also have an environmental variable to disable Numba even if installed in case there is an upstream break.

AFAICS, the standard numba environment variable still applies and can be used by users, or during testing, to disable the numba jit and make jit a no-op.
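For reference, numba's documented kill switch is NUMBA_DISABLE_JIT; a sketch of an availability check that honors it (helper name hypothetical):

```python
import os

def numba_available():
    # Hypothetical availability check: treat numba as unavailable when the
    # user sets numba's own kill switch, NUMBA_DISABLE_JIT=1, or when the
    # import fails.
    if os.environ.get("NUMBA_DISABLE_JIT", "0") == "1":
        return False
    try:
        import numba  # noqa: F401
    except ImportError:
        return False
    return True
```

With NUMBA_DISABLE_JIT=1 set, numba's own @jit also becomes a pass-through, so the pure python path can be tested without uninstalling anything.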
 

In terms of support outside of conda and possibly pip, I suspect that is is pretty worthless.  Numba has monthly releases and so most traditional packing environments are rapidly out-of-date. 

That's potentially a big problem for us.
However, I guess that if we restrict ourselves initially to basic usage, like simple nobs or (nobs, nobs) loops, then we should get relatively stable behavior. That wouldn't be much different from how we treat, or have treated, fast-moving changes in pandas or scipy.
Do you think that is stable enough, e.g. from your experience with arch?

Josef


Matthew Brett

Feb 9, 2018, 12:44:20 PM
to pystatsmodels
Hi,

Relevant to this discussion:
https://matthewrocklin.com/blog/work/2018/01/30/the-case-for-numba

(Just flagged in an email to the pydata mailing list by "John E").

In particular, see the section:

Update from the original blogpost authors

Cheers,

Matthew

josef...@gmail.com

Feb 9, 2018, 1:01:40 PM
to pystatsmodels
On Fri, Feb 9, 2018 at 12:43 PM, Matthew Brett <matthe...@gmail.com> wrote:
Hi,

Relevant to this discussion:
https://matthewrocklin.com/blog/work/2018/01/30/the-case-for-numba

(Just flagged in an email to the pydata mailing list by "John E").

In particular, see the section:

Update from the original blogpost authors

That's the blog post that got me started looking into this.

The final comments mainly tell me to start small, with basic loops.
For now, we don't have a need to use numba in, for example, heavy linear algebra methods.

In contrast, we really needed the cython based linear algebra as soon as it became available in scipy, and Skipper and then Chad put a lot of effort into that. But for numba there is no need for us to be the "debuggers".

Josef

Matthew Brett

Feb 9, 2018, 1:17:56 PM
to pystatsmodels
Hi,

On Fri, Feb 9, 2018 at 6:01 PM, <josef...@gmail.com> wrote:
> [...]
>
> The final comments mainly tell me to start small with basic loops.
> We don't have a need for now to use numba for example in heavy linear
> algebra methods.
>
> In contrast, we really needed the cython based linear algebra as soon as it
> became available in scipy, and Skipper and then Chad put a lot of effort in
> that. But for numba there is no need for us to be the "debuggers".

I was looking particularly at:

"""
However, I would never use Numba to build larger systems, precisely
for the reason Jake mentioned. Subjectively, Numba feels hard to
debug, has cryptic error messages, and seemingly inconsistent
behavior. It is not a “decorate and forget” solution; instead it
always involves plenty of fiddling to get right.
"""

Cheers,

Matthew

josef...@gmail.com

Feb 9, 2018, 2:12:47 PM
to pystatsmodels
That's what I meant before.

We won't use numba as a replacement for cython in a large system like the statespace framework.
But it seems to me that using it instead of cython for some time consuming but relatively simple loops shouldn't require a large amount of debugging.

For example, the recursive time series loop is just the core predict function for a model where everything else is in Python.
And I don't care about gaining another 20% speedup (after already getting a 150-fold one) by using "fancy" numba features that are still changing and more difficult to debug.

In the example, my "fiddling" was to move the intermediate array creation outside the numba loop to get nopython mode with caching to work.
That wasn't too hard, even if the error messages before were roughly "nopython doesn't work in this case".

Josef

 


Matthew Brett

Feb 9, 2018, 3:41:49 PM
to pystatsmodels
You're thinking of debugging in the sense of getting the right answer;
I think the respondents were thinking about trying to work out how to
make things go faster. This is Jake:

"""
One thing I think would be worth fleshing-out a bit (you mention it in
the final bullet list) is the fact that numba is kind of a black box
from the perspective of the developer. My experience is that it works
well for straightforward applications, but when it doesn’t work well
it’s *extremely difficult to diagnose what the problem might be.*
"""

By "work well" - I think he means "makes the code go much faster". So
I think he's saying that - in contrast to Cython - it's harder to
reason about how the optimizations are working, and therefore, how to
get the best speedups - except in the simplest cases.

My impression from the later comments was that, if you really want to
avoid compiling and Cython, then numba may be a good choice, but if
you're already committed to Cython, it may not be:

"""
That being said, if I were to build some high-level scientific library
à la Astropy with some few performance bottlenecks, I would definitely
favor Numba over Cython (and if it’s just to spare myself the headache
of getting a working C compiler on Windows).
"""

Cheers,

Matthew

josef...@gmail.com

Feb 12, 2018, 9:08:42 AM
to pystatsmodels
One important point above 
"My experience is that it works well for straightforward applications"

I also meant getting numba to work fast; given that it's essentially python, debugging the algorithm won't be different from the usual.
There might be cases where numba doesn't behave as expected in terms of the algorithm, which might need correctness debugging.

I was browsing the numba questions on stackoverflow and the mailing list a bit.

My impression was that those questions sound too advanced. If we can get 80% of the possible speedup with straightforward usage, then we won't need the extra 20% from the complicated things, at least for the first few years. Similarly, the parallel features have just been added back in, AFAICS. I would not use them for now and would wait until this has settled; we might be able to include them in a few years.

> but if you're already committed to Cython, it may not be <a good choice>


As a package, statsmodels is committed to Cython and that will not change. However, Chad is currently essentially the only cython developer.
The potential advantage of numba is that we can just use a decorator, spend a few hours on timing and adjustments, and it runs.

Josef

 


Matthew Brett

Feb 12, 2018, 9:20:27 AM
to pystatsmodels
Hi,
Yes, indeed, that's what I think these comments are referring to. If
your problem is simple, it may well work without much extra effort; if
it does not work well, you may spend a long time trying to work out why.
In practice, what I think these comments are saying is that numba
promises easy gains without much investment of time, but in practice
it will get the time back from you as you try to work out what's
happening in the not-simple cases. That lost time could have been
spent learning Cython, which you already need.

Cheers,

Matthew

Ralf Gommers

Feb 12, 2018, 8:20:00 PM
to pystat...@googlegroups.com
From my experience, Josef's expectation is right. Simple cases (and that covers what Josef is talking about) are simple; all you need to understand is %timeit and @jit. You're also disregarding that numba's performance is usually significantly better than Cython's.

Also, I expect that it's not always numba or cython; it may be "try @jit, and if it's not faster then never mind, just keep it in Python".
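That workflow might look like the following sketch (the toy loop and names are hypothetical):

```python
import time
import numpy as np

def rolling_sum(x, w):
    # Toy loop to benchmark: windowed sums without a 2-D intermediate array.
    out = np.empty(x.shape[0] - w + 1)
    for i in range(out.shape[0]):
        s = 0.0
        for j in range(w):
            s += x[i + j]
        out[i] = s
    return out

candidate = rolling_sum
try:
    from numba import njit
    candidate = njit(rolling_sum)
except ImportError:
    pass  # never mind, keep the python loop

x = np.random.rand(20_000)
for name, func in (("python", rolling_sum), ("candidate", candidate)):
    func(x, 50)  # warm up (and compile, if jitted)
    t0 = time.perf_counter()
    func(x, 50)
    print(name, time.perf_counter() - t0)
# Decision rule: if "candidate" is not clearly faster, drop the jit.
```

The warm-up call matters: the first invocation of a jitted function includes compilation, so timing the second call is what answers "is it actually faster".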

Ralf

