lmfit and numba

Julian Stürmer

Feb 26, 2016, 10:05:47 PM
to lmfit-py
Hi,

I'm trying to speed up fitting by using numba. I don't have much experience with numba, but it seems to give a speed-up of roughly a factor of four for simple functions:

from lmfit.models import moffat
from numba import jit
import numpy as np

@jit
def moffat_numba(x, amplitude=1., center=0., sigma=1., beta=1.):
    return amplitude / (((x - center)/sigma)**2 + 1)**beta

if __name__ == '__main__':
    import time
    xs = np.linspace(-4, 4, 100)
    ys = moffat(xs, 2.5, 0, 0.5, 1.)
    np.random.seed(1)
    moffat_numba(xs)  # call it once because it gets compiled on the first call

    t = time.time()
    for i in range(100000):
        moffat(xs)

    t2 = time.time()
    for i in range(100000):
        moffat_numba(xs)
    t3 = time.time()

    print 'without numba', t2-t
    print 'numba', t3-t2
This gives me:
without numba 1.10714912415
numba 0.231529951096

However, when I try to build an lmfit Model that uses the numba function, I get an error:
from lmfit.models import *
from lmfit.lineshapes import *
from numba import jit
import numpy as np

@jit
def moffat_numba(x, amplitude=1., center=0., sigma=1., beta=1.):
    return amplitude / (((x - center)/sigma)**2 + 1)**beta

class MoffatModelnumba(Model):
    __doc__ = moffat.__doc__ + COMMON_DOC if moffat.__doc__ else ""

    def __init__(self, *args, **kwargs):
        super(MoffatModelnumba, self).__init__(moffat_numba, *args, **kwargs)
        self.set_param_hint('fwhm', expr="2*%ssigma*sqrt(2**(1.0/%sbeta)-1)" % (self.prefix, self.prefix))

    def guess(self, data, x=None, negative=False, **kwargs):
        pars = guess_from_peak(self, data, x, negative, ampscale=0.5, sigscale=1.)
        return update_param_vals(pars, self.prefix, **kwargs)


if __name__ == '__main__':
    import time
    xs = np.linspace(-4, 4, 100)
    ys = moffat(xs, 2.5, 0, 0.5, 1.)
    np.random.seed(1)
    moffat_numba(xs)

    t = time.time()
    mod = MoffatModel()
    for i in range(100):
        yn = ys + 0.1*np.random.normal(size=len(xs))
        pars = mod.guess(yn, x=xs)
        out = mod.fit(yn, pars, x=xs)

    t2 = time.time()
    mod = MoffatModelnumba()
    for i in range(100):
        yn = ys + 0.1*np.random.normal(size=len(xs))
        pars = mod.guess(yn, x=xs)
        out = mod.fit(yn, pars, x=xs)
    t3 = time.time()

    print 'without numba', t2-t
    print 'numba', t3-t2
Gives me:
  File "xxxxxx/.PyCharm50/config/scratches/scratch_35", line 12, in __init__
    super(MoffatModelnumba, self).__init__(moffat_numba, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/lmfit/model.py", line 113, in __init__
    self._parse_params()
  File "/usr/local/lib/python2.7/dist-packages/lmfit/model.py", line 159, in _parse_params
    argspec = inspect.getargspec(self.func)
  File "/usr/lib/python2.7/inspect.py", line 816, in getargspec
    raise TypeError('{!r} is not a Python function'.format(func))
TypeError: CPUDispatcher(<function moffat_numba at 0x7f0d5203d230>) is not a Python function

I don't know enough about lmfit to judge whether there is an easy way around that issue, or how much work it would be to make the Model class accept numba functions.
However, I think it could be a great enhancement.

Best,
Julian

Matt Newville

Feb 28, 2016, 9:44:28 AM
to Julian Stürmer, lmfit-py
Hi Julian,

I have essentially no experience with numba.  A quick try of your first example script shows that getting good performance from numba can be tricky.  Just increasing the size of `xs` from 100 to 1000 makes the numba version slower than the non-numba version for me.  I don't know why that is.

In your second example, the real problem seems to be that the jit-decorated function is no longer a plain Python function (it is a numba CPUDispatcher object), so inspect.getargspec from the Python standard library raises a TypeError on it.  Whether such introspection should work on jit-compiled functions seems like a question for the numba developers, though it may be a known limitation.
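
One possible workaround (untested; the name moffat_numba_wrapped is just for illustration) might be to keep a thin, plain-Python wrapper around the jit-compiled function, so that inspect.getargspec sees an ordinary signature while the numerical work still goes through numba:

from numba import jit

@jit
def _moffat_jit(x, amplitude, center, sigma, beta):
    return amplitude / (((x - center)/sigma)**2 + 1)**beta

def moffat_numba_wrapped(x, amplitude=1., center=0., sigma=1., beta=1.):
    # plain Python function: inspect.getargspec can read this signature,
    # while the actual computation is delegated to the jit-compiled version
    return _moffat_jit(x, amplitude, center, sigma, beta)

# Model(moffat_numba_wrapped) should then build a Model in the usual way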

As you probably understand, the model and objective Python functions used in lmfit are typically called by C code from scipy.optimize, and that is generally where performance matters most.  So an important question here would be: can the C code in scipy.optimize call numba-jit-compiled Python code and see a performance benefit?  I have no idea.
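
A minimal way to test that question would be something like the following (an untested sketch; the Moffat residual, array size, and starting values are made up for illustration), timing scipy.optimize.leastsq with and without the @jit decorator on the residual:

import time
import numpy as np
from numba import jit
from scipy.optimize import leastsq

@jit
def residual(pars, x, data):
    # Moffat model minus data; MINPACK (via leastsq) calls this from compiled code
    amplitude, center, sigma, beta = pars[0], pars[1], pars[2], pars[3]
    return amplitude / (((x - center)/sigma)**2 + 1)**beta - data

x = np.linspace(-4, 4, 1000)
data = 2.5 / (((x - 0.0)/0.5)**2 + 1)**1.0 + 0.1*np.random.normal(size=x.size)

residual(np.array([2.0, 0.1, 0.6, 1.2]), x, data)   # warm up the jit compiler
t0 = time.time()
for i in range(100):
    best, ier = leastsq(residual, np.array([2.0, 0.1, 0.6, 1.2]), args=(x, data))
print 'fit time', time.time() - t0, best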

I'd like to be encouraging and supportive about ways to speed up lmfit, but I'll also be cautious.  Lmfit is a pure-python module -- it may use a few "advanced" python features like `inspect`, `ast`, and `lambda`, but it does not do any dirty tricks.  Specifically, the Model class uses inspect.getargspec for its basic functionality of turning a user's function into a fitting model.  If such features are not supported by some 3rd-party library like numba, playing well with that library would mean extra work for us -- in this case, a radically different approach to the Model class that would likely break many existing uses.  Generally, one would have to demonstrate a clear benefit (not just "wouldn't it be cool if it worked?") or a willingness from the other library to work together.

All that said, I'm not opposed to trying any such speed-ups, and I'd encourage you or anyone else with more experience with numba (or pypy or other tools) to work on this.

--Matt

Julian Stürmer

Feb 29, 2016, 12:10:39 PM
to lmfit-py, astro.s...@gmail.com, newv...@cars.uchicago.edu
Hi Matt,

Thanks for pointing out that numba isn't faster for larger sizes of 'xs'. It seems to depend on the architecture though - for me it's still faster for a size of 1000, but slower for 10000.

I will keep digging into that when I find the time.
I share your concern about 3rd party dependencies and complications.

In my particular case I am CPU-bound right now, and I'm looking for ways to speed the code up without having to change too much of it (or switch to C++).
Lmfit was incredibly helpful for 'prototyping' my application and I'm trying to stick with it.
However, I couldn't get a multiprocessing approach working (although it works well for 'simple' examples), neither with the multiprocessing library (due to pickling issues) nor with other multiprocessing libraries that use dill for pickling.
That was my main motivation for looking into other ways to speed up the code.

I'll keep you posted if I discover anything.

Julian

Cagey1

Feb 29, 2016, 2:00:07 PM
to lmfit-py
I know nothing about numba, but I'm curious about your model that is so slow. I have code that uses the lmfit Parameter class for its features. My data is fit with 5-20 nonlinear parameters but thousands of linear parameters, which requires a variable projection algorithm that I wrote for Python. The code can fit 5 exponential models over a time-and-wavelength data grid with about 40,000 points. The fitting time on a modern PC is about 2-10 s, and roughly 10 times longer for complex cases where we are varying time-axis parameters as part of the nonlinear exponential model as well as the time constants.

So, is your model separable into linear and nonlinear parameters, with a small number of nonlinear parameters? If so, you should use the variable projection algorithm, which I will eventually publish on GitHub in a more general form than my specific complex femtosecond spectroscopy case, dressed up for non-coding users. The variable projection procedure is set up for matrix math and data grids.
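
To illustrate the idea on a toy problem (this is just a sketch, not my actual code; the two-exponential example and values are made up): the optimizer only ever sees the nonlinear decay rates, and the linear amplitudes are solved exactly by linear least squares inside the residual.

import numpy as np
from scipy.optimize import leastsq

def varpro_residual(rates, t, data):
    # basis matrix: one exponential decay per nonlinear parameter (decay rate)
    A = np.exp(-np.outer(t, rates))
    # linear amplitudes solved exactly here, so the optimizer never sees them
    coefs = np.linalg.lstsq(A, data)[0]
    return A.dot(coefs) - data

t = np.linspace(0, 10, 200)
data = 3.0*np.exp(-0.7*t) + 1.5*np.exp(-2.5*t) + 0.02*np.random.normal(size=t.size)
rates, ier = leastsq(varpro_residual, [1.0, 3.0], args=(t, data))
print 'decay rates', rates
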
Ken Spears

Matt Newville

Feb 29, 2016, 3:25:44 PM
to Julian Stürmer, lmfit-py
Hi Julian,

On Mon, Feb 29, 2016 at 11:10 AM, Julian Stürmer <astro.s...@gmail.com> wrote:
> Hi Matt,
>
> Thanks for pointing out that numba isn't faster for larger sizes of 'xs'. It seems to depend on the architecture though - for me it's still faster for a size of 1000, but slower for 10000.
>
> I will keep digging into that when I find the time.
> I share your concern about 3rd party dependencies and complications.


Oh, well if numba (or similar) gave an obvious improvement, I'd be OK with requiring a 3rd party library.

 
> In my particular case I am CPU-bound right now, and I'm looking for ways to speed the code up without having to change too much of it (or switch to C++).
> Lmfit was incredibly helpful for 'prototyping' my application and I'm trying to stick with it.
> However, I couldn't get a multiprocessing approach working (although it works well for 'simple' examples), neither with the multiprocessing library (due to pickling issues) nor with other multiprocessing libraries that use dill for pickling.
> That was my main motivation for looking into other ways to speed up the code.


I think the main issue for gaining performance is that scipy.optimize has C/Fortran code (MINPACK, for example) that calls user-supplied Python code.  My guess is that getting the compiled code to use multiple processors simultaneously would be the most obvious way to boost performance (say, calculating each column of the Jacobian -- that is, the derivatives for each fitting variable -- on a different core).  Being able to automatically Cythonize or numba-fy or otherwise speed up the user's objective function would be nice too.  Clearly there is the potential for big benefits there, but since the objective function can do anything, that will be a bit more of an engineering challenge than parallelizing the columns of the Jacobian.
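
To make the Jacobian idea concrete, here is a rough, untested sketch (the function names are made up): each forward-difference column is computed in a separate process, and such a function could in principle be handed to scipy.optimize.leastsq through its Dfun argument.

import numpy as np
from multiprocessing import Pool

def _jac_column(args):
    # one forward-difference column of the Jacobian, for one fitting variable
    residual, params, i, eps, extra = args
    p = np.array(params, dtype=float)
    p[i] += eps
    return (residual(p, *extra) - residual(params, *extra)) / eps

def parallel_jacobian(residual, params, extra=(), eps=1.e-8, nprocs=4):
    # residual must be a module-level function so that it can be pickled
    pool = Pool(processes=nprocs)
    cols = pool.map(_jac_column, [(residual, params, i, eps, extra)
                                  for i in range(len(params))])
    pool.close()
    return np.column_stack(cols)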

 --Matt

Julian Stürmer

Mar 4, 2016, 1:06:22 PM
to lmfit-py

Hi Ken,

my data looks like the attached image. It's from an application in laser physics - it's a transmission spectrum of two etalons and hyperfine lines of rubidium.
I have a bunch of Moffat functions (blue peaks) with 4 parameters each, a Lorentzian peak (green), and three copies of a sum of a Gaussian plus 6 Lorentzians. So my total number of free parameters is about 200.
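
In lmfit terms, the composite is built up roughly like this (illustrative only; the number of Moffat peaks and the prefixes are made up):

from lmfit.models import MoffatModel, LorentzianModel, GaussianModel

# several Moffat peaks, one Lorentzian, and three (gaussian + 6 lorentzians)
# groups, combined into one composite model via unique prefixes
model = LorentzianModel(prefix='lz_')
for i in range(10):                       # illustrative number of Moffat peaks
    model = model + MoffatModel(prefix='m%d_' % i)
for g in range(3):
    model = model + GaussianModel(prefix='rb%d_g_' % g)
    for j in range(6):
        model = model + LorentzianModel(prefix='rb%d_l%d_' % (g, j))

params = model.make_params()
# params now holds every peak's parameters, each tagged by its prefix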

The challenge is that I get a data package like this at a rate of about 4 Hz. That rate is limited by the hardware, but if my software could handle 10 Hz, it would be relatively easy to upgrade the hardware to acquire spectra at 10 Hz or even faster.
Right now my code can handle about 1.5 Hz (on an average PC), which is 'ok', but speeding it up would be helpful in many respects.

I don't quite understand what you mean by 'separable into linear and nonlinear parameters'. My problem would definitely be well suited for multiprocessing, as I can fit the blue peaks independently of each other, as well as the three rubidium groups.
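
Something along these lines is what I have in mind for the independent peaks (a simplified, untested sketch; the peak positions and noise level are made up): each worker builds its own Model and receives only numpy arrays, which should sidestep the pickling issues.

import numpy as np
from multiprocessing import Pool
from lmfit.models import MoffatModel

def fit_peak(args):
    # build the Model inside the worker, so only numpy arrays are pickled
    x, y = args
    mod = MoffatModel()
    pars = mod.guess(y, x=x)
    out = mod.fit(y, pars, x=x)
    return out.best_values

if __name__ == '__main__':
    x = np.linspace(-4, 4, 100)
    peaks = [(x, 2.5/(((x - c)/0.5)**2 + 1) + 0.05*np.random.normal(size=x.size))
             for c in (-2.0, 0.0, 2.0)]
    pool = Pool(processes=3)
    results = pool.map(fit_peak, peaks)
    print results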