minimize with covariance matrix


ANA ISABEL

Apr 10, 2016, 4:00:34 PM
to lmfit-py
Hi,

I want to do a fit to a function and I have a covariance matrix. Usually I minimize the chi^2, which is (data-model).T cov^-1 (data-model).
I think I know how to do it with an error array, but not with a covariance matrix. This is what I've done. Should I square the error or the covariance matrix? I do not know whether minimizing this is the same as minimizing the chi^2.

import numpy as np

def residual(pars, data, x, error, icov):
    a = pars['a'].value
    b = pars['b'].value
    c = pars['c'].value
    amp = pars['amp'].value
    thb = pars['thb'].value
    sigma = pars['sigma'].value
    background = a + b*pow(x, -c)                        # power-law background
    BAO = amp*np.exp(-(x-thb)*(x-thb)/(2*sigma*sigma))   # Gaussian bump
    model = background + BAO
    resid = model
    if data is not None:
        resid = model - data
    if error is not None:
        resid = resid / error
    if icov is not None:
        resid = resid*icov   # elementwise product with the inverse covariance
    return resid

Matt Newville

Apr 10, 2016, 5:54:45 PM
to ANA ISABEL, lmfit-py
Hi Ana

Hm, I do not know how to use the full covariance matrix in calculating the fit residual. The off-diagonal components of the covariance describe how parameters interact, but (AFAIK) do not alter the residual itself. But perhaps that's too simplistic... I hope someone else here might be able to shed some light on this.

If you do know the covariance well, I would imagine you can probably estimate the partial derivatives analytically.  If that is the case, perhaps supplying such a function would be helpful.

--Matt

ANA ISABEL

Apr 11, 2016, 4:59:05 AM
to lmfit-py, niti...@yahoo.es, newv...@cars.uchicago.edu
Hi,

When I minimize I do:
result = minimize(residual, pars, args=(y, x, None, cov))
What exactly does minimize do?
If it's just minimizing the residual function, then the answer is easy, because I just have to minimize the chi2:

def residual(pars, data, x, error, icov):
    a = pars['a'].value
    b = pars['b'].value
    c = pars['c'].value
    amp = pars['amp'].value
    thb = pars['thb'].value
    sigma = pars['sigma'].value
    background = a + b*pow(x, -c)
    BAO = amp*np.exp(-(x-thb)*(x-thb)/(2*sigma*sigma))
    model = background + BAO
    resid = model
    if data is not None:
        resid = np.dot(model - data, model - data)      # sum of squares
    if error is not None:
        resid = np.sum(((model - data)/error)**2)       # chi2 with per-point errors
    if icov is not None:
        resid = np.dot(model - data, np.dot(icov, model - data))   # generalized chi2
    return resid   # a scalar, not an array

What I'm not clear about is whether the chi**2 calculated by minimize would then be correct, but I can calculate it myself anyway.
How do you calculate the error of the parameters?

Thanks,

Ana

Matt Newville

Apr 11, 2016, 8:47:47 AM
to ANA ISABEL, lmfit-py
Hi Ana,


On Mon, Apr 11, 2016 at 3:59 AM, 'ANA ISABEL' via lmfit-py <lmfi...@googlegroups.com> wrote:
Hi,

When I minimize I do:
result = minimize(residual, pars, args=(y, x, None, cov))
What does exactly minimize do?

Sorry, that's either too vague or too big a topic. You've read the documentation, right? Was something there unclear?
 
If it's just minimizing the residual function, then the answer is easy, because I just have to minimize the chi2:
 
Well, in some sense it does just minimize the residual function.  It alters the Parameter values until the residual returned by the objective function has the smallest sum of squares.  If that's easy for you,  you may not need lmfit. 
 

def residual(pars, data, x, error, icov):
    a = pars['a'].value
    b = pars['b'].value
    c = pars['c'].value
    amp = pars['amp'].value
    thb = pars['thb'].value
    sigma = pars['sigma'].value
    background = a + b*pow(x, -c)
    BAO = amp*np.exp(-(x-thb)*(x-thb)/(2*sigma*sigma))
    model = background + BAO
    resid = model
    if data is not None:
        resid = np.dot(model - data, model - data)      # sum of squares
    if error is not None:
        resid = np.sum(((model - data)/error)**2)       # chi2 with per-point errors
    if icov is not None:
        resid = np.dot(model - data, np.dot(icov, model - data))   # generalized chi2
    return resid   # a scalar, not an array

What I do not have clear is if then the chi**2 calculated by minimize would be fine, but I can calculate it myself anyway.

Sorry, I am not sure I understand this. If you're asking what your objective function should return: it should normally just return `data-model` or `(data-model)/error` -- the least-squares algorithm does the rest. Again, the documentation and examples might help.
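For instance, here is a sketch of such an objective function, reusing your model (untested, for illustration only):

    import numpy as np

    def residual(pars, data, x, error):
        # same model as in your code: power-law background plus a Gaussian
        background = pars['a'].value + pars['b'].value * pow(x, -pars['c'].value)
        bao = pars['amp'].value * np.exp(-(x - pars['thb'].value)**2 /
                                         (2 * pars['sigma'].value**2))
        model = background + bao
        if data is None:
            return model                  # no data: return the model itself
        if error is None:
            return model - data           # plain residual, a 1-D array
        return (model - data) / error     # weighted residual, still a 1-D array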
 
How do you calculate the error of the parameters?

Well, as the algorithm is exploring how the values affect the fit, it is building up the Jacobian matrix (related to the Hessian), which can be used to calculate the covariance matrix, with uncertainties and correlations between the parameter values.
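As a rough sketch of that idea (standard least-squares error propagation, not lmfit's exact internals), with the Jacobian of the residual at the best fit:

    import numpy as np

    def param_uncertainties(jac, resid):
        # jac: (ndata, nvarys) Jacobian of the residual at the best-fit values
        # resid: residual array at the best fit
        ndata, nvarys = jac.shape
        redchi = np.dot(resid, resid) / (ndata - nvarys)  # reduced chi-square
        covar = np.linalg.inv(jac.T @ jac) * redchi       # parameter covariance
        return np.sqrt(np.diag(covar))                    # 1-sigma uncertainties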

But, you started this conversation saying you had the covariance matrix and asked how to use it with the residual.     So, I'm very confused by this question.   How do you determine your covariance matrix?

--Matt

ANA ISABEL

Apr 11, 2016, 9:25:11 AM
to lmfit-py
Hi,

I have a covariance matrix because the points in my function are correlated; I can calculate it theoretically, with simulations, or by doing a jackknife.
I'll look for another way to minimize my chi2.

Thank you.

Ana.

Matt Newville

Apr 11, 2016, 12:48:22 PM
to ANA ISABEL, lmfit-py
On Mon, Apr 11, 2016 at 8:25 AM, 'ANA ISABEL' via lmfit-py <lmfi...@googlegroups.com> wrote:
Hi,

I have a covariance matrix because the points in my function are correlated, I can calculate this theoretically or with simulations or doing jack-knife.

I'm having a hard time understanding how you have the covariance without the solution.   I think of jackknife as comparing solutions and variances of sub-sampled data, but again, that implies that you somehow get a solution in addition to the variance.   Perhaps we're using different terminology?

Sorry for not understanding what you're trying to do; I do not know of a way to include covariance information in the fit residual. I think of the covariance matrix as part of the *result* of a non-linear least-squares optimization, not part of the input. But I'm certainly willing to say there is much I do not know.

--Matt

ANA ISABEL

Apr 11, 2016, 4:29:13 PM
to lmfit-py
Hi,

I guess we are talking about different covariance matrices. If I have 30 points in my data, then I have a 30x30 covariance matrix which tells you the correlation between those points. I guess the one you are thinking of is the one that correlates the parameters of the fit, or something like that.

Thanks,

Ana.


Stefano Renzo Garcia Castañeda

May 30, 2018, 2:16:37 PM
to lmfit-py
Hi Matt,

I just found lmfit and I am very tempted to use it. However, I have the same need as Ana. My data has errors which are correlated. That is, the errors of my data are not independent. For instance, if the error of the parameter[1] happens to increase, the error of the parameter[3] will do as well (in a very particular way). And similar for the rest of the parameters.

This happens when the data to be fitted is not independent but it is correlated. This should affect the fit process somehow.

The way in which the data is correlated is captured in the correlation matrix of the data (different from the correlation matrix obtained as a result of fitting a model, which will depend on the particular fitted model).

The function curve_fit from SciPy allows us to account for this by means of the sigma parameter:


sigma : None or M-length sequence or MxM array, optional

Determines the uncertainty in ydata. If we define residuals as r = ydata - f(xdata, *popt), then the interpretation of sigma depends on its number of dimensions:

  • A 1-d sigma should contain values of standard deviations of errors in ydata. In this case, the optimized function is chisq = sum((r / sigma) ** 2).

  • A 2-d sigma should contain the covariance matrix of errors in ydata. In this case, the optimized function is chisq = r.T @ inv(sigma) @ r.

    New in version 0.19.

None (default) is equivalent of 1-d sigma filled with ones.

That is, instead of giving a simple 1D array containing the uncertainty of each data value, we can give a 2D array (the covariance matrix of the data, not the covariance matrix of the fit) to perform the fit.
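For instance, a minimal usage sketch (the model, data, and covariance here are toy stand-ins, just to show the call):

    import numpy as np
    from scipy.optimize import curve_fit

    def f(x, a, b):                        # toy model, just for illustration
        return a * x + b

    xdata = np.linspace(0, 1, 30)
    ydata = 2.0 * xdata + 1.0 + 0.05 * np.random.randn(30)
    cov = 0.05**2 * np.eye(30)             # stand-in NxN covariance of ydata

    # with a 2-D sigma, curve_fit minimizes r.T @ inv(cov) @ r  (scipy >= 0.19)
    popt, pcov = curve_fit(f, xdata, ydata, sigma=cov, absolute_sigma=True)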

My question: Is something equivalent possible in lmfit?

Please let me know,

Best!

Stefano

Matt Newville

May 30, 2018, 10:30:44 PM
to lmfit-py
Dear Stefano,

On Wed, May 30, 2018 at 1:16 PM Stefano Renzo Garcia Castañeda <stefa...@gmail.com> wrote:
Hi Matt,

I just found lmfit and I am very tempted to use it. However, I have the same need as Ana. My data has errors which are correlated. That is, the errors of my data are not independent. For instance, if the error of the parameter[1] happens to increase, the error of the parameter[3] will do as well (in a very particular way). And similar for the rest of the parameters.

As you no doubt know, there is plenty of room for confusion on this topic, as terminology overlaps. It's hard for us to know what the expertise of any person asking a question here is -- we know only what you wrote. Also, we're likely going to assume that a great many things are working.

That is all to say that precision and accuracy in terminology are extremely important here. You first say (more or less) that your *data* has uncertainties that are correlated. Then you say that "if the uncertainty in parameter[1] increases, the error in parameter[3] will increase". One of us is confused.

Do you mean that if the *VALUE* of Parameter1 changes, so will the VALUE of Parameter3? That implies that *Parameters* are correlated, which is not at all the same thing as uncertainties in the data being correlated.

Or do you actually mean that if the *UNCERTAINTY* of Parameter1 changes, so will the UNCERTAINTY of Parameter3?


This happens when the data to be fitted is not independent but it is correlated. This should affect the fit process somehow.

Correlations of the Parameters happen even if the uncertainties in the data are completely independent.

In some sense, the VALUES of the data have to be correlated to be modeled at all, as the whole point of the process is essentially to reduce a set of measured data to a smaller (typically much smaller) set of Parameters which can explain the data. If the data values can be modeled at all, there must be a correlation.


The way in which the data is correlated is captured in the correlation matrix of the data (different from the correlation matrix obtained as a result of fitting a model, which will depend on the particular fitted model).

The function curve_fit from SciPy allows us to account for this by means of the sigma parameter:


sigma : None or M-length sequence or MxM array, optional

Determines the uncertainty in ydata. If we define residuals as r = ydata - f(xdata, *popt), then the interpretation of sigma depends on its number of dimensions:

  • A 1-d sigma should contain values of standard deviations of errors in ydata. In this case, the optimized function is chisq = sum((r / sigma) ** 2).

  • A 2-d sigma should contain the covariance matrix of errors in ydata. In this case, the optimized function is chisq = r.T @ inv(sigma) @ r.

    New in version 0.19.

None (default) is equivalent of 1-d sigma filled with ones.

That is, instead of giving a simple 1D array containing the uncertainty of each data value, we can give a 2D array (the covariance matrix of the data, not the covariance matrix of the fit) to perform the fit.

My question: Is something equivalent possible in lmfit?

Currently, lmfit does not support using a 2-dimensional matrix for the uncertainties of the data. It looks to be possible (and probably not very hard) to do this. It might be possible to use `cholesky` and `solve_triangular` to wrap your objective function yourself. That might lead to a more obvious universal solution -- pull requests welcome.
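Something along these lines is what I have in mind -- an untested sketch of a generic wrapper, where `resid_func` is assumed to be an objective function returning the plain model - data array:

    from scipy.linalg import cholesky, solve_triangular

    def with_covariance(resid_func, C):
        # C is the NxN covariance of the data; factor it once: C = L @ L.T
        L = cholesky(C, lower=True)
        def wrapped(pars, *args, **kws):
            # whiten the residual so its sum of squares is r.T @ inv(C) @ r
            return solve_triangular(L, resid_func(pars, *args, **kws), lower=True)
        return wrapped

One would then call something like `result = minimize(with_covariance(residual, C), pars, args=(y, x))`.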

I have to admit that I've not ever tried this, nor have any sense of how to test whether it is working. I've never actually come across an experiment that produced a matrix of uncertainties and their correlations for data to be modeled. For the measurements I routinely make, it's hard enough to assign uncertainties to the data. Basically, if we can characterize the noise well enough to estimate its size accurately, we understand its origin well enough to reduce it by an order of magnitude, at which point we are again unsure what an accurate estimate of the noise sources would be.

If you do intend to make a pull request, please include an annotated, non-trivial example and documentation.

Cheers,

Matt

Stefano Renzo Garcia Castañeda

May 31, 2018, 7:12:26 AM
to lmfit-py
Dear Matt, 

Thank you for your prompt answer. You are right, I expressed myself confusingly. Please find below the corrected version of the paragraph that you quoted:

I just found lmfit and I am very tempted to use it. However, I have the same need as Ana. My data has errors which are correlated. That is, the errors of my data are not independent. For instance, if the error of data[1] happens to increase, the error of data[3] will do as well (in a very particular way). And similarly for the rest of the data.


Let me add one more piece of information, just in case it leads to another reply:

In a fit process, what is often minimized is the chi squared χ²:

    chi2 = sum_i ((y_i - f(x_i; p)) / sigma_i)**2 = sum_i (r_i / sigma_i)**2

Where:
  • y_i - is the data (measurements)
  • f(x_i; p) - are the model function and its parameters
  • sigma_i - is the uncertainty in the data
  • r_i = y_i - f(x_i; p) - is the residual

Usually, each measurement of our data set is independent and has no correlation with any other element of the dataset. Then sigma is simply a 1D array, where each sigma_i tells about each y_i. In fact, the sigma_i**2 are just the diagonal of the covariance matrix of our dataset, and the covariance matrix happens to be a purely diagonal matrix with all off-diagonal elements equal to 0.

If each measurement of our data set is not independent, then the covariance matrix C of our data set is not a diagonal matrix, and this can be accounted for by minimizing a so-called "generalized chi squared":

    chi2 = r.T @ inv(C) @ r

This is the χ² that I would like to obtain from lmfit.
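In numpy terms, that quantity is just (a sketch; `r` is the residual array and `C` the covariance matrix of the data):

    import numpy as np

    def generalized_chi2(r, C):
        # chi2 = r.T @ inv(C) @ r; solve() avoids forming inv(C) explicitly
        return r @ np.linalg.solve(C, r)

For a diagonal C this reduces to the usual sum((r / sigma)**2).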

By now, I will check what `cholesky` and `solve_triangular` are. If that does not suffice, I will try to make a pull request.

For your information:

My concrete problem is that I have observed data, but what I need to fit is a transformation of that data, which will lead to a non-diagonal covariance matrix for the new dataset. In particular, the covariance matrix of the transformed data will be non-zero on the diagonal and on the first off-diagonals.

Best,

Stefano

Stefano Renzo Garcia Castañeda

May 31, 2018, 7:36:37 AM
to lmfit-py
*Sorry, it seems the images did not load on my previous reply.*


Matt Newville

May 31, 2018, 8:21:38 AM
to lmfit-py
Hi Stefano, 

Thanks for clarifying, but yes, I could not see your images.   It does appear that you do have correlations for the input data that you want to use for weighting the sum-of-squares. 
 
Following what is done in `curve_fit()`, one could probably change Model._residual to handle the case of a 2-D (NxN) matrix of weights. For clarity, Model.fit() uses `weights`, which would just be 1/sigma here. A modification like this near the end of Model._residual might do it (needs testing of course):

    # assumes: from scipy.linalg import cholesky, solve_triangular
    if weights is not None:
        if weights.shape == (len(diff), len(diff)):
            # a square weights array is taken as the inverse covariance of
            # the data; recover the covariance with a matrix inverse
            sigma = np.linalg.inv(weights)
            diff = solve_triangular(cholesky(sigma, lower=True), diff, lower=True)
        else:
            diff *= weights

If you have a simple example problem to test this on, I would encourage you to submit a PR for this change (with an example and documentation).
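For what it's worth, a quick self-contained check of the identity behind that change (random stand-in data, just a sketch):

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    np.random.seed(0)
    n = 5
    A = np.random.randn(n, n)
    C = A @ A.T + n * np.eye(n)        # a random positive-definite "covariance"
    r = np.random.randn(n)             # a stand-in residual vector

    w = solve_triangular(cholesky(C, lower=True), r, lower=True)
    # whitened sum of squares equals the generalized chi-square
    assert np.allclose(w @ w, r @ np.linalg.solve(C, r))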

--Matt
 