Triggered by a blog post by Matthew Rocklin, I tried out numba for the first time.

https://github.com/statsmodels/statsmodels/issues/4239

numba looks good where we can either avoid huge intermediate arrays, or need a recursive loop as in time series analysis that cannot be vectorized. This might be useful when we don't have a cython speedup (yet). E.g. the currently merged Holt-Winters uses a Python loop, Chad has a new PR that uses cython, and numba could be used in between.

See the end of Matthew's blog post for difficulties in maintaining or debugging numba code. However, if we just use it for some core functions like the recursive time series predict functions, then this might not be much of a problem.

I don't expect that numba will be a replacement for cython in heavy code like the Kalman filters and statespace models.
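The recursive predict loops mentioned above are exactly the kind of code numba handles well. A minimal sketch, using a simple exponential smoothing recursion as a stand-in (the function names here are illustrative, not statsmodels' actual API), with numba as an optional dependency:

```python
import numpy as np

def _ses_predict(y, alpha):
    # Simple exponential smoothing: each level depends on the previous
    # one, so this loop cannot be vectorized with array operations.
    level = np.empty(len(y))
    level[0] = y[0]
    for t in range(1, len(y)):
        level[t] = alpha * y[t] + (1 - alpha) * level[t - 1]
    return level

# Compile the same source with numba when it is available; otherwise
# fall back to the pure-Python loop.
try:
    import numba
    ses_predict = numba.njit(cache=True)(_ses_predict)
except ImportError:
    ses_predict = _ses_predict
```

The jitted and pure-Python versions return identical results; only the speed of the loop differs.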
Josef
On Wed, Jan 31, 2018 at 2:53 PM, Matthew Brett <matthe...@gmail.com> wrote:
> Hi,
>
> On Wed, Jan 31, 2018 at 7:12 PM, <josef...@gmail.com> wrote:
> > Triggered by a blog post by Matthew Rocklin, I tried out numba for the
> > first time.
> > https://github.com/statsmodels/statsmodels/issues/4239
> >
> > numba looks good where we can either avoid huge intermediate arrays, or need
> > a recursive loop as in time series analysis that cannot be vectorized.
>
> I think that you'd be the first big scientific Python project to
> depend on numba. There are pip wheels now, but they're pretty recent.
> I don't know anyone who has used them, myself (partly because it isn't
> used in any of the packages I support).

Kevin has it as an optional dependency in https://github.com/bashtage/arch

How good is numba support in Debian?
On Wed, Jan 31, 2018 at 3:19 PM, Ralf Gommers <ralf.g...@gmail.com> wrote:
> On Thu, Feb 1, 2018 at 8:12 AM, <josef...@gmail.com> wrote:
> > Triggered by a blog post by Matthew Rocklin, I tried out numba for the
> > first time.
> > https://github.com/statsmodels/statsmodels/issues/4239
> >
> > numba looks good where we can either avoid huge intermediate arrays, or
> > need a recursive loop as in time series analysis that cannot be
> > vectorized. This might be useful when we don't have a cython speedup
> > (yet). E.g. the currently merged Holt-Winters uses a Python loop, Chad
> > has a new PR that uses cython, and numba could be used in between.
> >
> > See the end of Matthew's blog post for difficulties in maintaining or
> > debugging numba code. However, if we just use it for some core functions
> > like the recursive time series predict functions, then this might not be
> > much of a problem.
> >
> > I don't expect that numba will be a replacement for cython in heavy code
> > like the Kalman filters and statespace models.
>
> Why not? My concern about numba is about having the dependency in the
> first place (portability, debuggability). Once you have it installed and
> working, numba is easier to use as well as significantly faster than Cython.

I guess debugging a simple function like in my examples, and keeping the number of mistakes small in the first place, should be relatively easy.

The statespace models are essentially low-level "C" with a lot of BLAS/LAPACK usage. Some parts of the cython code look more like C than Python to me. I wouldn't trust numba to handle all the BLAS/LAPACK things correctly, and I would expect that the overall size and the use of classes would make it more difficult for numba to optimize. Also, AFAIK at that level there are no ducks, it's either float or complex, and then we can just as well precompile it.

Some Julia users complain about startup or warmup time, and my guess is that numba has the same problem for large code sizes; there is a visible effect even in my simple examples.
    if has_numba:
        predict_exparma = numba.jit(nopython=True, cache=True)(_predict_exparma)

I suspect that many projects that may be interested are waiting for a 1.0 release. There are semi-frequent regressions that require waiting a month or more for a fix. I think once 1.0 gets closer these should likely stop happening.

The other issue that may affect some projects is that not all dtypes are supported for wrapped functions. In particular, it is usually the case for linear algebra that double and complex128 are supported, but not float or complex64. A related issue is how BLAS support is baked into Numba. In conda it uses MKL. When you install from pip, what BLAS is used?
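The dtype caveat above can be made explicit in the wrapper itself. A hedged sketch (the function names are made up for illustration): give numba an explicit signature so only float64 inputs get a compiled path, and keep the Python fallback for everything else:

```python
def _scale(x, a):
    # Trivial stand-in for a real predict function.
    return x * a

try:
    import numba
    # An explicit signature string compiles eagerly and restricts the
    # compiled path to float64, the dtype that is reliably supported.
    scale = numba.jit("float64(float64, float64)", nopython=True)(_scale)
except ImportError:
    scale = _scale
```

Either way, callers see the same function; the signature just pins down which dtype combination is precompiled.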
Overall I think it is a pretty simple thing to make it a soft dependency. I didn't do this, but it would probably also be a good idea to have an environment variable to disable Numba even if it is installed, in case there is an upstream break.
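That kill switch could look roughly like this (the variable name STATSMODELS_DISABLE_NUMBA is a made-up placeholder, not an existing setting):

```python
import os

def use_numba():
    # Honor the kill switch even when numba is importable, so users can
    # fall back to pure Python after an upstream break.
    if os.environ.get("STATSMODELS_DISABLE_NUMBA"):
        return False
    try:
        import numba  # noqa: F401
    except ImportError:
        return False
    return True
```

Checking the environment variable before the import means the override works regardless of whether numba is installed.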
In terms of support outside of conda and possibly pip, I suspect that it is pretty worthless. Numba has monthly releases, and so most traditional packaging environments are rapidly out of date.
Hi,
Relevant to this discussion:
https://matthewrocklin.com/blog/work/2018/01/30/the-case-for-numba
(Just flagged in an email to the pydata mailing list by "John E").
In particular, see the section:
Update from the original blogpost authors
Cheers,
Matthew