[SciPy-User] Unexpected covariance matrix from scipy.optimize.curve_fit


Christoph Deil

Aug 30, 2011, 4:15:17 PM
to SciPy Users List
I noticed that scipy.optimize.curve_fit returns parameter errors that don't scale with sigma, the standard deviation of ydata, as I expected.

Here is a code snippet to illustrate my point, which fits a straight line to five data points:
import numpy as np
from scipy.optimize import curve_fit
x = np.arange(5)
y = np.array([1, -2, 1, -2, 1])
sigma = np.array([1, 2, 1, 2, 1])
def f(x, a, b):
    return a + b * x
popt, pcov = curve_fit(f, x, y, p0=(0.42, 0.42), sigma=sigma)
perr = np.sqrt(pcov.diagonal())
print('*** sigma = {0} ***'.format(sigma))
print('popt: {0}'.format(popt))
print('perr: {0}'.format(perr))

I get the following result:
*** sigma = [1 2 1 2 1] ***
popt: [  5.71428536e-01   1.19956213e-08]
perr: [ 0.93867933  0.40391117]

Increasing sigma by a factor of 10,
sigma = 10 * np.array([1, 2, 1, 2, 1])
I get the following result:
*** sigma = [10 20 10 20 10] ***
popt: [  5.71428580e-01  -2.27625699e-09]
perr: [ 0.93895295  0.37079075]

The best-fit values stayed the same as expected.
But the error on the slope b decreased by 8% (the error on the offset a didn't change much).
I would have expected the fit parameter errors to increase with increasing errors on the data!?
Is this a bug?

Looking at the source code I see that scipy.optimize.curve_fit multiplies the pcov obtained from scipy.optimize.leastsq by a factor s_sq:
https://github.com/scipy/scipy/blob/master/scipy/optimize/minpack.py#L438
    if (len(ydata) > len(p0)) and pcov is not None:
        s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))
        pcov = pcov * s_sq
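Here is a minimal numerical sketch of why this rescaling makes the reported errors invariant under an overall scaling of sigma, for the straight-line example above (my own illustration, not SciPy code; `scaled_pcov` is a hypothetical helper): the raw covariance inv(J'J) scales like sigma**2, while s_sq scales like 1/sigma**2, so the product is unchanged.

```python
import numpy as np

x = np.arange(5.0)
y = np.array([1, -2, 1, -2, 1.0])
sigma = np.array([1, 2, 1, 2, 1.0])

def scaled_pcov(sigma):
    # weighted Jacobian of f(x) = a + b*x: rows [1, x_i] / sigma_i
    J = np.column_stack([np.ones_like(x), x]) / sigma[:, None]
    raw_pcov = np.linalg.inv(J.T @ J)           # scales like sigma**2
    popt, *_ = np.linalg.lstsq(J, y / sigma, rcond=None)
    resid = y / sigma - J @ popt                # weighted residuals
    s_sq = (resid ** 2).sum() / (len(y) - 2)    # scales like 1/sigma**2
    return raw_pcov * s_sq                      # the two factors cancel

perr = np.sqrt(np.diag(scaled_pcov(sigma)))
print(perr)                                     # ~[0.9861, 0.3889]
print(np.allclose(scaled_pcov(sigma), scaled_pcov(10 * sigma)))  # True
```

The resulting perr agrees with the WLS standard errors quoted later in this thread.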

Is it possible to add an explanation to
http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
that pcov is multiplied by this s_sq factor, and why that gives correct errors?

After I noticed this issue I saw that this s_sq factor is mentioned in the cov_x return parameter description of leastsq,
but I think it should be explained in curve_fit, where it is applied, maybe leaving a reference in the leastsq cov_x description.

Also it would be nice to mention the full_output option in the curve_fit documentation; I only realized after looking at the source code that this was possible.

Christoph

josef...@gmail.com

Aug 30, 2011, 5:25:21 PM
to SciPy Users List

No bug in the formulas. I tested all of them when curve_fit was added.

However in your example the numerical cov lacks quite a bit of
precision. Trying your example with different starting values, I get a
0.05 difference in your perr (std of parameter estimates).

Trying smaller xtol and ftol doesn't change anything. (?)

Since it's linear

>>> import scikits.statsmodels.api as sm
>>> x = np.arange(5.)
>>> y = np.array([1, -2, 1, -2, 1.])
>>> sigma = np.array([1, 2, 1, 2, 1.])
>>> res = sm.WLS(y, sm.add_constant(x, prepend=True), weights=1./sigma**2).fit()
>>> res.params
array([ 5.71428571e-01, 1.11022302e-16])
>>> res.bse
array([ 0.98609784, 0.38892223])

>>> res = sm.WLS(y, sm.add_constant(x, prepend=True), weights=1./(sigma*10)**2).fit()
>>> res.params
array([ 5.71428571e-01, 1.94289029e-16])
>>> res.bse
array([ 0.98609784, 0.38892223])

rescaling doesn't change parameter estimates nor perr

Josef


> Looking at the source code I see that scipy.optimize.curve_fit multiplies
> the pcov obtained from scipy.optimize.leastsq by a factor s_sq:
> https://github.com/scipy/scipy/blob/master/scipy/optimize/minpack.py#L438
>
>     if (len(ydata) > len(p0)) and pcov is not None:
>         s_sq = (func(popt, *args)**2).sum()/(len(ydata)-len(p0))
>         pcov = pcov * s_sq
>
> If so is it possible to add an explanation to
> http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
> that pcov is multiplied with this s_sq factor and why that will give correct
> errors?
> After I noticed this issue I saw that this s_sq factor is mentioned in the
> cov_x return parameter description of leastsq,
> but I think it should be explained in curve_fit where it is applied, maybe
> leaving a reference in the cov_x leastsq description.
>
> Also it would be nice to mention the full_output option in the curve_fit
> docu, I only realized after looking at the source code that this was
> possible.
> Christoph

> _______________________________________________
> SciPy-User mailing list
> SciPy...@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>

Charles R Harris

Aug 30, 2011, 11:19:36 PM
to SciPy Users List

Five points minus two parameters doesn't give you much accuracy in estimating the variance; look at the chi^2 distribution for three degrees of freedom. Generally, you would like a few hundred points for this sort of thing.
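To put rough numbers on this point (my own illustration, not from the thread): for Gaussian residuals the residual-based scale estimate s_sq behaves like a chi^2(ndf)/ndf variable, whose relative standard deviation is sqrt(2/ndf), i.e. about 80% for ndf = 3 but only about 8% for ndf = 300.

```python
import numpy as np

# simulate the spread of the residual-based variance-scale estimate
rng = np.random.default_rng(0)
for ndf in (3, 300):
    s_sq = rng.chisquare(ndf, size=100000) / ndf
    print(ndf, s_sq.std())  # roughly sqrt(2/ndf): ~0.82 and ~0.082
```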

Note that the leastsq documentation about the cov is incorrect: it needs to be multiplied by the variance of the residuals, not the standard deviation.

Not to say that there isn't a bug here, just that the evidence is thin.

Chuck

Christoph Deil

Aug 31, 2011, 11:09:20 AM
to SciPy Users List
Making ftol very small (ftol = 1e-15), I get a different wrong result:
popt: [  5.71428580e-01  -2.27625699e-09]
perr: [ 0.92582011  0.59868281]

What do I have to do to get a correct answer (say to 5 significant digits) from curve_fit for this simple example?


Since it's linear

import scikits.statsmodels.api as sm
x = np.arange(5.)
y = np.array([1, -2, 1, -2, 1.])
sigma = np.array([1,  2, 1,  2, 1.])
res = sm.WLS(y, sm.add_constant(x, prepend=True), weights=1./sigma**2).fit()
res.params
array([  5.71428571e-01,   1.11022302e-16])
res.bse
array([ 0.98609784,  0.38892223])

res = sm.WLS(y, sm.add_constant(x, prepend=True), weights=1./(sigma*10)**2).fit()
res.params
array([  5.71428571e-01,   1.94289029e-16])
res.bse
array([ 0.98609784,  0.38892223])

rescaling doesn't change parameter estimates nor perr

This is what I don't understand.
Why don't the parameter estimate errors increase with increasing errors sigma on the data points?
If I have less precise measurements, the model parameters should be less constrained?! 

I was using MINUIT before I learned SciPy, and neither the error definition for a chi2 fit given in the MINUIT User Guide
nor the example results here
mention the factor s_sq that is used in curve_fit to scale pcov.

Is the error definition in the MINUIT manual wrong?
Can you point me to a web resource that explains why the s_sq factor needs to be applied to the covariance matrix?


Josef



PS: I've attached a script to fit the two examples using statsmodels, scipy and minuit (applying the s_sq factor myself).

Here are the results I get (who's right for the first example? why does statsmodels only return one parameter value and error?):

    x = np.array([1  , 2  , 3  , 4  ])
    y = np.array([1.1, 2.1, 2.4, 4.3])
    sigma = np.array([0.1, 0.1, 0.2, 0.1])

statsmodels.api.WLS
popt: [ 1.04516129]
perr: [ 0.0467711]
scipy.optimize.curve_fit
popt: [  8.53964011e-08   1.04516128e+00]
perr: [ 0.27452122  0.09784324]
minuit
popt: [-4.851674617611934e-14, 1.0451612903225629]
perr: [ 0.33828315  0.12647671]

    x = np.arange(5)
    y = np.array([1, -2, 1, -2, 1])
    sigma = 10 * np.array([1, 2, 1, 2, 1])

statsmodels.api.WLS
popt: [  5.71428571e-01   7.63278329e-17]
perr: [ 0.98609784  0.38892223]
scipy.optimize.curve_fit
popt: [  5.71428662e-01  -8.73679511e-08]
perr: [ 0.97804034  0.3818681 ]
minuit
popt: [0.5714285714294132, 2.1449508835758024e-13]
perr: [ 0.98609784  0.38892223]

chi2_example.py

josef...@gmail.com

Aug 31, 2011, 12:10:52 PM
to SciPy Users List

It's standard textbook information, but Wikipedia seems to be lacking
a bit here.

for the linear case
http://en.wikipedia.org/wiki/Ordinary_least_squares#Assuming_normality

cov_params = sigma^2 (X'X)^{-1}

for the non-linear case with leastsq, X is replaced by the Jacobian;
otherwise everything is the same.
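To make the two conventions concrete for the straight-line example, here is a sketch (my own illustration, not MINUIT or statsmodels internals): the inverse (Gauss-Newton) Hessian inv(J'J) of chi^2/2 takes the supplied sigma at face value, while the OLS-style formula multiplies it by s_sq, the residual-based estimate of sigma^2.

```python
import numpy as np

x = np.arange(5.0)
y = np.array([1, -2, 1, -2, 1.0])
sigma = np.array([1, 2, 1, 2, 1.0])

X = np.column_stack([np.ones_like(x), x])  # Jacobian of a + b*x
J = X / sigma[:, None]                     # weighted Jacobian

# "MINUIT-style": inverse of the (approximate) Hessian J'J of chi^2/2
hess_cov = np.linalg.inv(J.T @ J)
popt = np.linalg.solve(J.T @ J, J.T @ (y / sigma))

# "curve_fit-style": rescale by s_sq, the residual estimate of sigma^2
s_sq = (((y - X @ popt) / sigma) ** 2).sum() / (len(y) - 2)
ols_cov = hess_cov * s_sq

print(np.sqrt(np.diag(hess_cov)))  # errors taking sigma at face value
print(np.sqrt(np.diag(ols_cov)))   # errors after the s_sq rescaling
```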

However, in your minuit links I saw only the Hessian mentioned (from
very quickly skimming the pages).

With maximum likelihood, the inverse Hessian is the complete
covariance matrix, no additional multiplication is necessary.

Essentially, these are implementation details depending on how the
estimation is calculated, and there are various ways of numerically
approximating the Hessian.
That's why this is described for optimize.leastsq (incorrectly, as
Chuck pointed out) but not in optimize.curve_fit.

With least squares or maximum likelihood, rescaling both y and
f(x, params) has no effect on the parameter estimates; it's just like
changing the units of y, meters instead of centimeters.

I guess scipy.odr would work differently, since it is splitting up the
errors between y and x's, but I never looked at the details.


>
> Josef
>
>
>
> PS: I've attached a script to fit the two examples using statsmodels, scipy
> and minuit (applying the s_sq factor myself).
> Here are the results I get (who's right for the first example? why does
> statsmodels only return on parameter value and error?):
>     """Example from
> http://code.google.com/p/pyminuit/wiki/GettingStartedGuide"""
>     x = np.array([1  , 2  , 3  , 4  ])
>     y = np.array([1.1, 2.1, 2.4, 4.3])
>     sigma = np.array([0.1, 0.1, 0.2, 0.1])
> statsmodels.api.WLS
> popt: [ 1.04516129]
> perr: [ 0.0467711]
> scipy.optimize.curve_fit
> popt: [  8.53964011e-08   1.04516128e+00]
> perr: [ 0.27452122  0.09784324]

that's what I get with example 1 when I run your script;
I don't know why you get only one parameter in your case.
(full_output threw an exception in curve_fit with scipy.__version__ '0.9.0'.)

statsmodels.api.WLS
popt: [ -6.66133815e-16 1.04516129e+00]
perr: [ 0.33828314 0.12647671]


scipy.optimize.curve_fit
popt: [ 8.53964011e-08 1.04516128e+00]
perr: [ 0.27452122 0.09784324]


> minuit
> popt: [-4.851674617611934e-14, 1.0451612903225629]
> perr: [ 0.33828315  0.12647671]

statsmodels and minuit agree pretty well

>     """Example from
> http://mail.scipy.org/pipermail/scipy-user/2011-August/030412.html"""
>     x = np.arange(5)
>     y = np.array([1, -2, 1, -2, 1])
>     sigma = 10 * np.array([1,  2, 1,  2, 1])
> statsmodels.api.WLS
> popt: [  5.71428571e-01   7.63278329e-17]
> perr: [ 0.98609784  0.38892223]
> scipy.optimize.curve_fit
> popt: [  5.71428662e-01  -8.73679511e-08]
> perr: [ 0.97804034  0.3818681 ]
> minuit
> popt: [0.5714285714294132, 2.1449508835758024e-13]
> perr: [ 0.98609784  0.38892223]

statsmodels and minuit agree.

my guess is that the Jacobian calculation of leastsq (curve_fit) is
not very good in these examples. Maybe trying Dfun or the other
options, like epsfcn, will help.

I was trying to see whether I get better results calculating the
numerical derivatives in a different way, but had to spend the time
fixing bugs.
(NonlinearLS didn't work correctly with weights.)

Josef

josef...@gmail.com

Aug 31, 2011, 9:45:14 PM
to SciPy Users List

statsmodels.api.WLS
popt: [ -4.90926744e-16 1.04516129e+00]
perr: [ 0.33828314 0.12647671]
statsmodels NonlinearLS
popt: [ -3.92166386e-08 1.04516130e+00]
perr: [ 0.33828314 0.12647671]


Finally, I got some bugs out of the weights handling, but it's still not fully tested:

def run_nonlinearls():
    from scikits.statsmodels.miscmodels.nonlinls import NonlinearLS

    class Myfunc(NonlinearLS):

        def _predict(self, params):
            x = self.exog
            a, b = params
            return a + b*x

    mod = Myfunc(y, x, sigma=sigma**2)
    res = mod.fit(start_value=(0.042, 0.42))
    print('statsmodels NonlinearLS')
    print('popt: {0}'.format(res.params))
    print('perr: {0}'.format(res.bse))

The basic approach is the same as curve_fit using leastsq, but it uses complex-step
derivatives, which are usually numerically very good.
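The complex-step trick can be sketched in a few lines (an illustration of the idea only, not the NonlinearLS implementation; `complex_step_jacobian` is a hypothetical helper): for an analytic f, Im f(x + ih)/h approximates f'(x) with no subtractive cancellation, so h can be made tiny.

```python
import numpy as np

def complex_step_jacobian(f, params, h=1e-20):
    # one column per parameter: perturb it by i*h and take Im(f)/h
    params = np.asarray(params, dtype=float)
    cols = []
    for i in range(len(params)):
        p = params.astype(complex)
        p[i] += 1j * h
        cols.append(f(p).imag / h)
    return np.column_stack(cols)

x = np.arange(5.0)
model = lambda params: params[0] + params[1] * x  # straight-line model

J = complex_step_jacobian(model, [0.57, 0.0])
# J matches the exact Jacobian rows [1, x_i] to machine precision
```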

So it looks like the problems with curve_fit in your example are only
in the numerical derivatives that leastsq is using for the Jacobian.

If leastsq is using only forward differences, then it might be better
to calculate the final Jacobian with centered differences. Just a
guess.
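That guess is easy to motivate on a one-dimensional toy derivative (illustrative only, nothing to do with leastsq internals): forward differences have O(h) truncation error, central differences O(h^2).

```python
import numpy as np

f = np.exp          # true derivative of exp at 1.0 is e
x0, h = 1.0, 1e-5

fwd = (f(x0 + h) - f(x0)) / h              # forward difference
ctr = (f(x0 + h) - f(x0 - h)) / (2 * h)    # central difference

print(abs(fwd - np.e))  # ~ e*h/2, around 1e-5
print(abs(ctr - np.e))  # several orders of magnitude smaller
```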


>
> statsmodels and minuit agree pretty well
>
>>     """Example from
>> http://mail.scipy.org/pipermail/scipy-user/2011-August/030412.html"""
>>     x = np.arange(5)
>>     y = np.array([1, -2, 1, -2, 1])
>>     sigma = 10 * np.array([1,  2, 1,  2, 1])
>> statsmodels.api.WLS
>> popt: [  5.71428571e-01   7.63278329e-17]
>> perr: [ 0.98609784  0.38892223]
>> scipy.optimize.curve_fit
>> popt: [  5.71428662e-01  -8.73679511e-08]
>> perr: [ 0.97804034  0.3818681 ]
>> minuit
>> popt: [0.5714285714294132, 2.1449508835758024e-13]
>> perr: [ 0.98609784  0.38892223]

statsmodels.api.WLS
popt: [ 5.71428571e-01 1.94289029e-16]
perr: [ 0.98609784 0.38892223]
statsmodels NonlinearLS
popt: [ 5.71428387e-01 8.45750929e-08]
perr: [ 0.98609784 0.38892223]

Josef

Christoph Deil

Sep 1, 2011, 6:04:21 AM
to SciPy Users List

OK, now I understand, thanks for your explanations.
Roughly speaking, scipy.optimize.curve_fit applies a factor s_sq = chi^2 / ndf to the covariance matrix to account for possibly incorrect overall scale in the y errors "sigma" of the data.

I had simply not seen this factor being applied to chi^2 fits before. E.g. in many physics and astronomy papers, parameter errors from chi^2 fit results are reported without this factor. Neither the manual of the fitting package used by most physicists (MINUIT) nor the statistics textbook I use (Cowan, "Statistical Data Analysis") mentions it.

May I therefore suggest to explicitly mention this factor in the scipy.optimize.curve_fit docstring to avoid confusion?


>>
>>
>>>
>>> Josef
>>>
>>>
>>>
>>> PS: I've attached a script to fit the two examples using statsmodels, scipy
>>> and minuit (applying the s_sq factor myself).
>>> Here are the results I get (who's right for the first example? why does
>>> statsmodels only return on parameter value and error?):
>>> """Example from
>>> http://code.google.com/p/pyminuit/wiki/GettingStartedGuide"""
>>> x = np.array([1 , 2 , 3 , 4 ])
>>> y = np.array([1.1, 2.1, 2.4, 4.3])
>>> sigma = np.array([0.1, 0.1, 0.2, 0.1])
>>> statsmodels.api.WLS
>>> popt: [ 1.04516129]
>>> perr: [ 0.0467711]
>>> scipy.optimize.curve_fit
>>> popt: [ 8.53964011e-08 1.04516128e+00]
>>> perr: [ 0.27452122 0.09784324]
>>
>> that's what I get with example 1 when I run your script,
>> I don't know why you have one params in your case

I'll file a statsmodels issue on github.

>> (full_output threw an exception in curve_fit with scipy.__version__ '0.9.0'

It's there in HEAD.

I'm looking forward to using NonlinearLS once it makes its way into master.

Charles R Harris

Sep 1, 2011, 8:18:12 AM
to SciPy Users List

Note that most texts will tell you that \sigma^2 should be estimated independently of the data, i.e., from the known precision of the measurements and so on. That is because using the residuals to estimate \sigma^2 is not all that accurate, especially for small data sets. In practice, that recommendation is commonly ignored, but if you do use the residuals you should keep in mind the potential inaccuracy of the estimate.

I think curve_fit should probably return the scaled covariance and the estimate of \sigma^2 separately so that folks can follow the recommended practice if they have decent a priori knowledge of the measurement errors.

<snip>

Chuck
