Fitting to data with error bars


Albin Nilsson

Oct 27, 2016, 11:25:07 AM
to lmfit-py
I'm trying to use lmfit for parameter estimation via emcee. The data I want to fit to is somewhat noisy, and each data point has a symmetric error bar associated with it. 
The data and errors are loaded into two different arrays:

x = np.array([float(datum[1]) for datum in data])
y = np.array([float(datum[2]) for datum in data])
errs = np.array([float(datum[3]) for datum in data])

and the error bar for each data point is y[i] +- errs[i] (in pseudocode)

Does anyone know how to include the error bars in the code? Any help would be greatly appreciated.

Matt Newville

Oct 27, 2016, 11:33:37 AM
to lmfit-py
Generally speaking, one would want to make some model for the data (is that "y", and is "x" the independent data?)  and then minimize the array
     (model - data) / errs
 
Typically, one would write an objective function that returned such a quantity. If you're using the Model class, you would specify a "weights" option to Model.fit(), with a value of 1/errs (checking for divide-by-zero, of course).
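As a concrete sketch of that weighting (with made-up numbers and a hypothetical straight-line model, not your data), here is what the weighted residual array looks like in plain numpy; Model.fit(..., weights=1.0/errs) applies the same per-point scaling internally:

```python
import numpy as np

# made-up data with per-point uncertainties, for illustration only
x = np.linspace(0.0, 4.0, 5)
y = np.array([0.9, 2.2, 3.0, 4.3, 4.9])
errs = np.array([0.1, 0.2, 0.1, 0.3, 0.2])

def model(x, k, m):
    # a simple straight-line model; substitute your own
    return k * x + m

# the array an objective function should return: (model - data) / errs
resid = (model(x, 1.0, 1.0) - y) / errs

# a scalar minimizer would reduce this to chi-square
chi2 = np.sum(resid**2)
```

Points with small errs contribute more to chi-square, which is exactly what "including the error bars" means here.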

--Matt

Albin Nilsson

Oct 28, 2016, 3:45:58 AM
to lmfit-py
Thanks for replying. 

x is some arbitrary x-axis, y is the data. I then have a function mod(x), which gives me the model value. To get the residual I then do y-mod(x). 
Does the weights option take arrays? The errors are different for each point.
I'm new to lmfit, sorry if this is obvious.

Andrew Nelson

Oct 28, 2016, 4:49:19 AM
to lmfit-py
The documentation at http://lmfit.github.io/lmfit-py/fitting.html#lmfit.minimizer.emcee should be able to answer this question, but we're always on the lookout to improve documentation.

You need to write your objective function something like:

def generative(p):
    v = p.valuesdict()
    return v['a1'] * np.exp(-x / v['t1']) + v['a2'] * np.exp(-(x - 0.1) / v['t2'])

def lnprob(p):
    mod = generative(p)

    # per-point chi2 contributions
    resid = (mod - y) / errs
    resid *= resid

    # Gaussian normalisation term
    resid += np.log(2 * np.pi * errs**2)

    # return the log-posterior probability
    return -0.5 * np.sum(resid)

lnprob is the objective function you pass to Minimizer. Note that you wouldn't want to minimize that objective function; you want to maximize it (float_behavior='posterior', is_weighted=True). Alternatively you could use the less correct:

def prob(p):
    mod = generative(p)
    resid = (mod - y)/ errs

    return resid

This is the objective function you'd normally use with lmfit; you can scalar-minimize it, as well as use it with emcee (is_weighted=True).
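To see how the two forms relate (illustrative numbers only): the log-probability version is just -0.5 times chi2 plus a normalisation term that doesn't depend on the parameters, which is why the plain residual form is "less correct" but usually gives the same answer:

```python
import numpy as np

# made-up data, model values and uncertainties
y = np.array([1.1, 1.9, 3.2])
mod = np.array([1.0, 2.0, 3.0])
errs = np.array([0.1, 0.1, 0.2])

resid = (mod - y) / errs      # what prob(p) returns
chi2 = np.sum(resid**2)       # what a scalar minimizer reduces it to

# what lnprob(p) returns: -0.5 * (chi2 + Gaussian normalisation)
lnpost = -0.5 * np.sum(resid**2 + np.log(2 * np.pi * errs**2))
```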

HTH.

Matt Newville

Oct 28, 2016, 9:03:49 AM
to lmfit-py
On Fri, Oct 28, 2016 at 2:45 AM, Albin Nilsson <kold...@gmail.com> wrote:
> Does the weight option take arrays? The errors are different for each point.

Yes, of course: it can be a scalar or an array of the same size as the data and model. That's a feature of how scipy + numpy + python work.
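For illustration, numpy broadcasting is what makes both cases work without any special handling (made-up values):

```python
import numpy as np

resid = np.array([1.0, -2.0, 3.0])

# a scalar weight scales every element...
half = resid * 0.5

# ...and an array weight (e.g. 1/errs) scales element by element
errs = np.array([0.5, 1.0, 2.0])
weighted = resid / errs
```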
 




--
--Matt Newville <newville at cars.uchicago.edu> 630-252-0431

Albin Nilsson

Nov 3, 2016, 5:14:34 AM
to lmfit-py
Thanks, this was really helpful! :)

Let's say that my model looks like this:
y = k*x + m
and I want to marginalise over m, because I don't care about it. Is there a way to do that with lmfit and emcee? I have been looking at examples but none of them really make sense to me. My errors are in a separate array errs in the same units as y.

Thanks again for your help

Andrew Nelson

Nov 3, 2016, 5:45:26 AM
to lmfit-py

Does this help further?

http://dan.iel.fm/emcee/current/user/line/
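The short version of that example: once you have posterior samples, marginalising over m just means looking at the samples for k alone; the one-dimensional histograms that corner draws are exactly those marginals. A rough numpy sketch with made-up samples (not the actual example's chain):

```python
import numpy as np

rng = np.random.default_rng(42)
# pretend joint posterior samples for (k, m), e.g. from res.flatchain
cov = [[0.10, 0.05],
       [0.05, 0.20]]
chain = rng.multivariate_normal([2.0, 1.0], cov, size=5000)

# marginal distribution of k: take its column and ignore m entirely;
# the sampler has already integrated over m for you
k_samples = chain[:, 0]
k_lo, k_med, k_hi = np.percentile(k_samples, [16, 50, 84])
```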



Albin Nilsson

Nov 3, 2016, 6:50:01 AM
to lmfit-py
I have read this quite a few times, and if I understand it correctly, the code produces marginalised distributions of all the parameters automatically?

Is there any reason why emcee and corner would give a reasonable numerical value for the parameter I'm trying to estimate, but a terrible-looking corner plot? The model is very reasonable and looks good when you just plot it against the data points. For completeness, see an abridged version of my code below (mu0 is a nuisance parameter):

My model:

def d_L(x, Om, M):
    def integrand(xes, Om):
        return float(1.0 / (np.sqrt(Om * (1 + xes)**3.0 + (1 - Om))))
    int_res = [integr.quad(integrand, 0, xes, Om)[0] * (1 + zed) / M for xes in x]
    return int_res

def func_mod(x, M, Om, mu0):
    return 5 * np.log10(d_L(x, Om, M)) + mu0

The lnprob function is defined as you described in an earlier post, with the error weighting as:
mod = (mod-mus)/errs

Then I run emcee:

mini = lmfit.Minimizer(lnprob, p)
print 'Running MCMC...'
res = mini.emcee(burn=300, steps=600, thin=10, nwalkers=100, params=p, float_behavior='posterior', is_weighted=True)


The fit report gives me a good value for my 'M' parameter, but the corner plot looks strange; the histograms look like step functions and the ellipses are squares.
My plot command is:

quantiles = np.percentile(res.flatchain['M'], [2.28, 15.9, 50, 84.2, 97.7])
fig = corner.corner(res.flatchain, labels=res.var_names, quantiles=[0.16, 0.5, 0.84], show_titles=True, title_kwargs={"fontsize": 12}, truths=list(res.params.valuesdict().values()))




Sorry about these questions, I am really banging my head against a wall here.

Andrew Nelson

Nov 3, 2016, 9:46:02 PM
to lmfit-py
I suspect the answer might be in your objective function somewhere. Does it work when you find the max likelihood solution?
If it helps I've reproduced the emcee line fit example, but using lmfit instead: https://gist.github.com/andyfaff/373cc989d3fa2de8260b5b2aa22eb60e

It's hard to follow the abridged code in your email. Note that it's easy to get caught out here. Your objective function can return chi2, but if it does then you have to use `float_behavior='chi2'`. This tells the emcee method that your objective function is returning chi2. If your objective function returns log-probability, then you need to use `float_behavior='posterior'`. If you get this wrong then nothing will work, because log-probability needs to be maximised and chi2 needs to be minimised. Thus, if you set up the Minimizer with a log-probability objective function then you shouldn't use the minimize method.
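For illustration, the sign convention is the whole story (made-up residuals):

```python
import numpy as np

resid = np.array([-1.0, 0.5, 2.0])

# objective returning chi2 -> mini.emcee(..., float_behavior='chi2')
chi2 = np.sum(resid**2)

# objective returning log-probability -> float_behavior='posterior'
lnprob = -0.5 * chi2

# the better fit has SMALLER chi2 but LARGER log-probability, which is
# why mixing up the flag sends the sampler walking the wrong way
```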

HTH,
Andrew.





--
_____________________________________
Dr. Andrew Nelson


_____________________________________

Albin Nilsson

Nov 4, 2016, 3:54:15 AM
to lmfit-py
The fit report that I generate looks something like this:

Median of posterior probability distribution
---------------------------------------------
[[Variables]]
    M:     1.06150727 +/- 0.259849 (24.48%) (init= 1)
    Om:    0.30300722 +/- 0.000694 (0.23%) (init= 0.302)
    mu0:   43.2984210 +/- 0.540611 (1.25%) (init= 42.3)
[[Correlations]] (unreported correlations are <  0.100)
    C(M, mu0)                    =  0.992 
Maximum Likelihood Estimation
------------------------------
Parameters([('M', <Parameter 'M', 1.0, bounds=[0:3]>), ('Om', <Parameter 'Om', 0.302, bounds=[0.2:0.4]>), ('mu0', <Parameter 'mu0', 42.3, bounds=[-inf:inf]>)])
M 2 sigma spread 0.444438067702

I realised that I was constraining the parameter space too much, and the plots look better now. What did you mean by 'does it work when you find the max likelihood solution'?

I have constructed my objective function (the posterior) in the following way, as you suggested in a previous post. func_mod is my model.

def generative(p):
    v = p.valuesdict()
    return func_mod(zeds, v['M'], v['Om'], v['mu0'])

def lnprob(p):
    mod = generative(p)
    mod = (mod - mus) / errs
    mod *= mod
    mod += np.log(2 * np.pi * errs**2)
    return -0.5 * np.sum(mod)




And then I run the chains by:

mini = lmfit.Minimizer(lnprob, p)
print 'Running MCMC...'
res = mini.emcee(burn=300, steps=600, thin=10, nwalkers=200, params=p, float_behavior='posterior', is_weighted=True)

From what I can gather from your advice and the docs, this should be correct. Am I right?

Thanks again


Albin Nilsson

Nov 15, 2016, 10:57:18 AM
to lmfit-py
When I initialise Minimizer, do I want to pass the float_behavior='posterior' and is_weighted arguments to it or to emcee? In other words, is this correct?

mini = lmfit.Minimizer(lnprob, p)
print 'Running MCMC...'
res = mini.emcee(burn=300, steps=600, thin=10, nwalkers=200, params=p, float_behavior='posterior', is_weighted=True)




Albin Nilsson

Nov 15, 2016, 11:08:40 AM
to lmfit-py
Just found this in the docs, sorry

