Advice on using lmfit within a loop


Vital Fernández

Nov 13, 2016, 1:40:02 AM
to lmfit-py
I wonder if anyone could give me some insight on this behavior which I do not fully understand:

I want to do a "bootstrap" fit of a Gaussian model to my data:

from numpy import sqrt, pi, exp, linspace, random, empty, array
from scipy.optimize import curve_fit
from lmfit import Model
from timeit import default_timer as timer

def gaussian(x, amp, cen, wid):
    return amp * exp(-(x-cen)**2 / wid)

init_vals = [1, 0, 1]
x = linspace(-10, 10)
n_points = len(x)
gaussian_true = gaussian(x, 2.33, 0.21, 1.51)
gmod = Model(gaussian)

# one fit with each library
print '--Single fit'
start = timer()
best_vals, covar = curve_fit(gaussian, x, gaussian_true + random.normal(0, 0.2, n_points), p0=init_vals)
end = timer()
print 'curve fit', best_vals, ' time ', (end - start)

start = timer()
result = gmod.fit(gaussian_true + random.normal(0, 0.2, n_points), x=x, amp=1, cen=0, wid=1)
end = timer()
print 'lmfit', array(result.params.valuesdict().values()), (end - start)

# 1000 fits with each library, adding fresh noise every iteration
print '--Bootstrap'
results_matrix_curvefit = empty([3, 1000])
results_matrix_lmfit = empty([3, 1000])

start = timer()
for i in range(1000):
    best_vals, covar = curve_fit(gaussian, x, gaussian_true + random.normal(0, 0.2, n_points), p0=init_vals)
    results_matrix_curvefit[:, i] = best_vals
end = timer()
print 'curve fit', results_matrix_curvefit.mean(1), ' time ', (end - start)

start = timer()
for i in range(1000):
    result = gmod.fit(gaussian_true + random.normal(0, 0.2, n_points), x=x, amp=1, cen=0, wid=1)
    results_matrix_lmfit[:, i] = array(result.params.valuesdict().values())
end = timer()
print 'lmfit', results_matrix_lmfit.mean(1), (end - start)

which gives

--Single fit
curve fit [ 2.20907852  0.21520868  1.3615026 ]  time  0.0145289897919
lmfit [ 2.45153259  0.30430127  1.28085118]  time  0.00659894943237
--Bootstrap
curve fit [ 2.34031556  0.20673496  1.51793454]  time  0.713760137558
lmfit [ 2.33999564  0.21131336  1.51693473]  time  4.11775302887

As the timers show, the runtime of the lmfit loop scales with the number of iterations, increasing by about three orders of magnitude (which makes sense!). In the case of curve_fit, however, it increases by only about one order of magnitude.

Why is that? Can this time be reduced in lmfit too?

Thanks



Matt Newville

Nov 13, 2016, 1:50:15 PM
to lmfit-py
Hi Vital,



I think there are a couple of points to be made here, not in any particular order.

A. Do you *really* want to do what you call a "bootstrap" fit, that is, 1000 fits of a Gaussian model to data plus different random arrays? Is that really the goal, or is this just an attempt to benchmark how long a fit takes? Because if runtime is really that important, the task really that simple, and you're using so few features of lmfit.Model, then you should probably be using scipy.optimize directly, or even C/Fortran, and neither lmfit nor curve_fit.

B. Lmfit, and especially Model, are not designed to give runtimes that are as fast as possible. They're designed for ease of use, flexibility, and saving the programmer's/analyst's time. So it's *possible* that runtimes could be sped up, but that doesn't necessarily mean they should be (see the sketch after point D).

C. Benchmarking is tricky. timeit.default_timer is time.time on POSIX systems and time.clock on Windows -- neither is a very accurate measure of how many CPU cycles were used. Also, related to point A, benchmarking the wrong problem is often counter-productive.

D. One could easily spend 3 hours trying to save that 3 seconds.
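
On point B, one cheap thing to try is building the Parameters object once, outside the loop, instead of passing amp=1, cen=0, wid=1 on every call, so that Model.fit() repeats less setup work per iteration. A minimal, untested sketch, reusing the names from your script:

params = gmod.make_params(amp=1, cen=0, wid=1)   # build once, reuse 1000 times
for i in range(1000):
    result = gmod.fit(gaussian_true + random.normal(0, 0.2, n_points),
                      params, x=x)
    results_matrix_lmfit[:, i] = [result.params[name].value
                                  for name in ('amp', 'cen', 'wid')]

I would not expect that to close a 5x gap, but it does remove some per-iteration overhead.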

What surprises me most is that your first run of curve_fit() vs Model.fit() shows curve_fit() to be ~2x slower. That doesn't seem quite right to me: Model.fit() does a lot more work and should be slower. You might try running lmfit.Model.fit() *before* curve_fit(); I'd expect the results to be quite different.

It's probably a clue that 1000 runs of curve_fit() take significantly less time than 1000x the time to run curve_fit() once. That doesn't make a lot of sense -- I'm pretty sure curve_fit() isn't caching anything. So why is that happening? I don't know.
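
If you want to chase it down, measuring with timeit rather than wall-clock deltas would be a first step, per point C. A sketch, again with the names from your script (note it times repeated fits of one fixed noisy dataset rather than fresh noise per call):

from timeit import repeat

y = gaussian_true + random.normal(0, 0.2, n_points)
t_cf = min(repeat(lambda: curve_fit(gaussian, x, y, p0=init_vals),
                  repeat=5, number=100)) / 100   # best-of-5 per-fit time
t_lm = min(repeat(lambda: gmod.fit(y, x=x, amp=1, cen=0, wid=1),
                  repeat=5, number=100)) / 100
print 'per-fit time: curve_fit', t_cf, ' lmfit', t_lm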

Cheers,

--Matt Newville

Andrew Nelson

Nov 13, 2016, 4:40:59 PM
to lmfit-py
Bootstrapping in this manner is a way of estimating uncertainties on the model parameters. Vital, if I were you I'd be looking into the `emcee` method of the `lmfit.Minimizer` object. It's a way of exploring the posterior distribution for the model parameters, but I'm not sure how much faster/slower it might be.




--
_____________________________________
Dr. Andrew Nelson


_____________________________________

Vital Fernández

Nov 13, 2016, 7:30:51 PM
to lmfit-py
Thank you very much for your reply.

Yes, this is a simplified example: I updated my code to measure the emission lines in spectra (it is now much cleaner thanks to lmfit) and everything works as well as before, with half the lines of code. However, I noticed that when I do this bootstrapping to quantify the uncertainties on the Gaussian curve parameters, each measurement takes between 7 and 10 seconds.

For me the weird thing, as you pointed out, is that curve_fit() is slower than lmfit for a single fit, yet over 1000 iterations its runtime grows by only about one order of magnitude...

I had been reading this issue on GitHub: https://github.com/lmfit/lmfit-py/issues/259 and wondered whether there was another way to declare the Model in lmfit that would give a faster result, but now I think that is not the issue.

I will try to see if I can find a way to save those 3 seconds.

Thanks

Vital Fernández

Nov 13, 2016, 7:33:10 PM
to lmfit-py
Thank you very much for your reply.

That is exactly my purpose. I have worked with pymc and the runtimes are around the same, but I will check how emcee works in lmfit.

Thanks


Vital Fernández

Nov 20, 2016, 10:25:32 PM
to lmfit-py
Hello Andrew. I have been trying to work on your suggestion:

I have been looking at the "Model selection using lmfit and emcee" notebook (https://github.com/lmfit/lmfit-py/commit/89f0860672b0fcf7356506efd0b917cbc29cc794), which I think is actually yours. However, I have trouble understanding the code structure well enough to make a simpler version. Could you give some insight on how to declare the observed data and the log-posterior probability function?


from numpy import exp, linspace, random, sum, log, pi
from lmfit import Parameters, Minimizer
from timeit import default_timer as timer


def gaussian_curve(A, mu, sigma, x):
    return A * exp(-(x-mu)*(x-mu)/(2 * sigma * sigma))


def gaussian_residual(params, x, y, err):
    return (gaussian_curve(params['A'], params['mu'], params['sigma'], x) - y) / err


def lnprob(p, x, y, err):
    resid = gaussian_residual(params, x, y, err)
    return -0.5 * sum((resid)**2 + log(2 * pi * err**2))


x = linspace(-10, 10)
n_points = len(x)

gaussian_true = gaussian_curve(2.33, 0.21, 1.51, x)
err = random.normal(0, 0.2, n_points)
y_obs = gaussian_true + err

params = Parameters()
params.add('A', value=1)
params.add('mu', value=0)
params.add('sigma', value=1)

mini = Minimizer(lnprob, params, args=(x, y_obs, err))

out = mini.emcee(steps=1000, params=params)

Thanks a lot again

 


Andrew Nelson

Nov 22, 2016, 12:29:51 AM
to lmfit-py
There are a few issues with your code.
1) When weighting by the data uncertainty, err, all the uncertainties should be positive. In this case I think the uncertainty you weight by should be 0.2, because the jitter you add to all the points is drawn from a normal distribution with standard deviation 0.2.

2) To set up the minimizer you need to use `fcn_args`, not `args`.
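
Concretely, something like this (a sketch using the names from your script; ones_like comes from numpy):

weights = ones_like(y_obs) * 0.2                       # constant 1-sigma noise level
mini = Minimizer(lnprob, params, fcn_args=(x, y_obs, weights))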



Andrew Nelson

Nov 22, 2016, 2:24:15 AM
to lmfit-py
Also, there's a major mistake in your lnprob function. Hint: it's to do with the parameters.

Andrew Nelson

Nov 22, 2016, 2:33:25 AM
to lmfit-py

A mistake that wasted an hour of my time, I might add.

Andrew Nelson

Nov 22, 2016, 4:32:37 AM
to lmfit-py
# coding: utf-8

from lmfit import Parameters, Minimizer
import numpy as np


def gaussian_curve(A, mu, sigma, x):
    return A * np.exp(-1 * (x-mu)*(x-mu)/(2 * sigma * sigma))

def gaussian_residual(p, x, y, dy):
    return (gaussian_curve(p['A'], p['mu'], p['sigma'], x) - y) / dy

def lnprob(p, x, y, dy):
    # note: uses the parameters passed in, p, not a global
    resid = gaussian_residual(p, x, y, dy)
    return -0.5 * np.sum(resid**2 + np.log(2 * np.pi * dy**2))

x = np.linspace(-10, 10)
n_points = len(x)

gaussian_true = gaussian_curve(2.33, 0.21, 1.51, x)
err = np.random.normal(0, 0.2, n_points)
y_obs = gaussian_true + err
# weight by the known noise level, 0.2, not by the noise draws themselves
weights = np.ones_like(err) * 0.2

params = Parameters()
params.add('A', value=4, min=0.1, max=12)
params.add('mu', value=0.7, min=0.1, max=10)
params.add('sigma', value=3, min=0.1, max=10)

# a least-squares fit first, to give the sampler good starting values
guess = Minimizer(gaussian_residual, params, fcn_args=(x, y_obs, weights)).minimize()
mini = Minimizer(lnprob, guess.params, fcn_args=(x, y_obs, weights))
print(guess.params)

r = mini.emcee(steps=400, burn=200, thin=10)
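
Once the sampler finishes, the result can be inspected like any other MinimizerResult, e.g.:

print(r.params)                  # chain medians, with 1-sigma spreads as stderr
print(r.flatchain.describe())    # the flattened chain is a pandas DataFrame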

Vital Fernández

Nov 28, 2016, 5:32:45 PM
to lmfit-py
Thank you very much for all the advice. 

I am very sorry it wasted your time.




