Accounting for input data correlations within lmfit


goldst...@gmail.com

May 30, 2022, 12:28:47 PM
to lmfit-py
Dear lmfiters,

I have a question regarding fits on heavily-correlated data.

As far as I have seen, lmfit's most common methods fully ignore the correlation between consecutive points of your dataset (something extremely common in high-energy physics), assuming the points are "statistically independent".

Is there any method currently implemented that properly accounts for such correlation? Or is there any prescription for the best way to do it? A colleague at my institute suggested modifying the minimized function (adding data covariance matrix elements to it), so that the same chi-square minimization can be used while properly taking those correlations into account. (See here).

Is such a method implemented? Or would you feel implementing it within lmfit could be useful for others?

Best,
Tarek

Renee Otten

Jun 2, 2022, 10:51:41 AM
to lmfi...@googlegroups.com
Dear Tarek, 


Lmfit currently supports all the methods that are implemented in SciPy plus a few others (e.g., AMPGO). So if there is a method in SciPy that can do what you want, you should be all set ;)

If not, then it would be helpful if you could provide a bit more information; the link you added doesn’t work for me, so I don’t really know what method you are referring to or how they implement it.

Best, 
Renee



Tarek H.

Jun 3, 2022, 7:59:08 AM
to lmfi...@googlegroups.com
Hi Renee,

Argh, apologies for the dead link. The document is now attached. I am also attaching another paper that describes a similar method, using an input covariance matrix of the data as the weight.

One can take correlations into account by modifying the minimized function and "manually" weighting the data according to a covariance matrix given as input, calculated from your data. I was wondering whether any already-implemented method allows this, or whether it would make sense to incorporate it.

Best,
Tarek

cdf8661_chi2fit_w_corr_syst.pdf
jmsp.1994.1127.pdf

Matt Newville

Jun 3, 2022, 11:45:36 AM
to lmfit-py
Just to agree with Renee, I am not aware of a method that takes input correlations in observations (data) into account in the fitting.  
Honestly, I am not 100% sure how one would know these without knowing the response model, which is sort of what one would be fitting.  That is, if you are fitting data to a Gaussian function, you are sort of asserting that the data has that shape, and so knowing a few (x, y) values would let you accurately reproduce the other data...   For a Gaussian, that would be pretty smooth, but if you are fitting data to a decaying sine wave, the y values may appear to vary considerably, even though they are also predictable from a few parameters.

So, I guess I'm not sure how to encode that sort of correlation in data independently from a model for that data.  Do you know of analysis methods where one is expected to supply correlation values for the data?  I couldn't follow the link you sent.



Denis Leshchev

Jun 3, 2022, 3:25:45 PM
to lmfi...@googlegroups.com
Good day!

This question gets asked in this group with a certain periodicity (see the thread from 2020). Yet the lmfit authors continue to dismiss the existence of correlated noise and the generalized least squares method that handles it.

Correlated noise arises in many experiments. For example, consider a measurement with a 2D camera where there is spillover of charge between the pixels in the camera (the case for most CCD cameras). Here, the intensity measured by one pixel will affect the intensities in the adjacent pixels and vice versa. One ends up with not only the pixel-specific Poisson noise (typical in this application) but also correlated contributions to the noise due to the pixel cross-talk. When correlated noise arises, its full characterization entails estimating the covariance matrix of the observations, which, in turn, requires a lot of repetitions that may or may not be available in a given experiment.

Tarek, I happen to have worked on problems that involve correlations between the data points and, assuming you know your covariance matrix, this can be handled with relative ease in lmfit. I will assume that you know how to estimate the covariance matrix for your application and that it is a given in the fit you are trying to implement.

Some theoretical background: in the case of uncorrelated noise, the cost function (chi-squared) is defined as follows:
chi^2 = sum_i ((y_exp_i - y_model_i) / epsilon_i) ^ 2
For correlated noise with a covariance matrix C, chi-squared can be written in vector/matrix form:
chi^2 = (y_exp - y_model)^T C^-1 (y_exp - y_model)

Now, most minimization algorithms (including those implemented in lmfit) require the user to supply a function that computes the residuals. For uncorrelated noise, the residuals are defined as
r_i = (y_exp_i - y_model_i) / epsilon_i
For correlated noise you need to factorize the inverse of the covariance matrix as C^-1 = L L^T, which gives the residual vector
r = L^T (y_exp - y_model)
The matrix L^T here is called the whitening transformation. There are many ways to compute L^T, e.g. Cholesky decomposition or singular value decomposition. The Wikipedia link mentioned earlier shows the proof that this approach is correct.
This means that to account for correlated noise with a known covariance matrix, you need to supply lmfit with a residual vector that has been pre-multiplied by the whitening transformation, as described above.

Below is a working Python example that does exactly that. In this example I simulate a dataset with strong crosstalk between adjacent points, use a Cholesky decomposition to compute the whitening transformation, and then implement the residual computation as described above. The figure at the end of the script shows the results of the fit and the residuals. I hope this helps. Let me know if you have any questions.

_____

# imports
import numpy as np
from scipy import linalg
from lmfit import Minimizer, Parameters
import matplotlib.pyplot as plt

# set seed to 1
np.random.seed(1)

#%%
# create true data
def func(params, x):
    amp = params['amp']
    phaseshift = params['phase']
    freq = params['freq']
    decay = params['decay']
    offset = params['offset']
    return amp * np.sin(x*freq + phaseshift) * np.exp(-x*x*decay) + offset

params = Parameters()
params.add('amp', value=5)
params.add('decay', value=0.025)
params.add('phase', value=-0.1)
params.add('freq', value=2.0)
params.add('offset', value=10.0)

x = np.linspace(0, 15, 501)
data_true = func(params, x)

#%%
# create noisy data with covariance that has off-diagonal elements
correlation_width = 1e-1
base = np.exp(-(x/(np.sqrt(2)*correlation_width))**2)
A = linalg.toeplitz(base)
C = A @ A.T *1e-2 + np.eye(x.size)*1e-3
data = np.random.multivariate_normal(data_true, C)

# pre-compute inverse and whitening matrices to speed up calculations
C_inv = np.linalg.inv(C)
L = np.linalg.cholesky(C_inv)

#%%
# define the residual vector function
def residual_chol(params, x, data, L):
    dy = data-func(params, x)
    return L.T @ dy # multiplication by Cholesky factor is the key component in handling correlated noise

minimizer = Minimizer(residual_chol, params, fcn_args=(x, data, L))
result = minimizer.minimize()

# figures
plt.figure(1)
plt.clf()

plt.subplot(311)
plt.plot(x, data, 'k.')
plt.plot(x, func(result.params, x), 'r-')
plt.xlabel('x')
plt.ylabel('data')
plt.title('data/fit')

plt.subplot(312)
plt.plot(x, data - func(result.params, x), 'k-')
plt.xlabel('x')
plt.ylabel('$r$')
plt.title('raw residuals')

plt.subplot(313)
plt.plot(x, result.residual, 'k-')
plt.xlabel('x')
plt.ylabel('$L^Tr$')
plt.title('normalized residuals')

plt.tight_layout()





Matt Newville

Jun 3, 2022, 4:33:51 PM
to lmfit-py
On Fri, Jun 3, 2022 at 2:25 PM Denis Leshchev <leshche...@gmail.com> wrote:
Good day!

This question is being asked in this group with a certain periodicity (thread from 2020). Yet, lmfit authors continue to dismiss the existence of correlated noise and generalized least squares method that handles it.

Um, "continue to dismiss"? No.  Neither Renee nor I dismissed the question.  In fact, we both responded.  That, in and of itself, demonstrates interest and is a time commitment.  Some random person on the internet asked about fitting data and we responded.  We did not say "no we do not want to do that", just "we do not know how to do that".   

Yeah, you can "whiten data" or weight it any way you want -- good idea.  For image data, neighboring pixels may very well be correlated, so, weighting accordingly is a fine idea.  Is it general?  Nope.  Sometimes "correlation" might mean neighboring data bins, and sometimes it might mean harmonics or other ways of adding frequencies.  That is all very much "correlated" and also highly dependent on the nature of the signal, and on the representation of the data.   For one of the data types I work with (and I believe for one of the kinds of data Renee works on too), using Fourier or similar transforms of the data is really common: measure the data in time, wavenumber domain but transform to frequency or distance domain to do the fit.  In that case, all the data is correlated and things like sampling rates and Nyquist frequencies need to be considered.  

Is there a generalized method to do that that is independent of the model or form of the data?  I am not aware of one.  Your solution literally uses `minimize()`, and is probably a totally fine approach.  But `minimize()` does not **by itself** take correlations between data into account.

--Matt

Denis Leshchev

Jun 3, 2022, 5:07:16 PM
to lmfi...@googlegroups.com
Hi!

Um, "continue to dismiss"? No.  Neither Renee nor I dismissed the question.  In fact, we both responded.  That, in and of itself, demonstrates interest and is a time commitment.  Some random person on the internet asked about fitting data and we responded.  We did not say "no we do not want to do that", just "we do not know how to do that".
Yet whenever the question is asked you guys continue to insist that the only way to do the fit is to perform ordinary/weighted least squares and dismiss generalized least squares and/or insist that knowledge of correlations requires some kind of explicit model of noise. The only thing one needs to know is the covariance matrix.

For image data, neighboring pixels may very well be correlated, so, weighting accordingly is a fine idea.  Is it general?  Nope.  Sometimes "correlation" might mean neighboring data bins, and sometimes it might mean harmonics or other ways of adding frequencies.
The source of correlations and their manifestation is an interesting subject, but if you have a way to compute your covariance matrix (i.e., you have many measurements of the same thing), then any correlations can be taken into account, regardless of their nature and form. I used adjacent bins/pixels only as an example because it is easy to explain. Covariance matrices can take very peculiar forms, depending on the problem at hand.
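As an illustration, a rough sketch of estimating the covariance matrix from repeated measurements (the array `runs` below is a hypothetical stand-in for your own repeats) could look like:

import numpy as np

# Hypothetical example: `runs` stands in for n_rep repeated measurements of the
# same signal, stored as an array of shape (n_rep, n_points).
rng = np.random.default_rng(0)
runs = rng.normal(loc=10.0, scale=0.1, size=(200, 50))

# Sample covariance across the repeats (rows = observations, columns = data points).
C_est = np.cov(runs, rowvar=False)           # shape (n_points, n_points)

# With few repeats the estimate can be singular; a small diagonal "ridge"
# keeps it invertible before computing C^-1 and its Cholesky factor.
C_est += 1e-6 * np.eye(C_est.shape[0])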

For one of the data types I work with (and I believe for one of the kinds of data Renee works on too), using Fourier or similar transforms of the data is really common: measure the data in time, wavenumber domain but transform to frequency or distance domain to do the fit.  In that case, all the data is correlated and things like sampling rates and Nyquist frequencies need to be considered.  
If the uncertainties (the covariance matrix) are known in the time domain, propagating them into the frequency domain is straightforward using basic linear algebra. Again, the covariance matrix may have a funky structure, but the basic fitting algorithm does not change.
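As a sketch of that propagation (illustrative only, assuming a known time-domain covariance C_time and using a unitary DFT matrix as the linear transform):

import numpy as np

n = 64
F = np.fft.fft(np.eye(n)) / np.sqrt(n)       # unitary DFT matrix: F @ y == np.fft.fft(y) / sqrt(n)
C_time = 0.01 * np.eye(n)                    # assumed (known) time-domain covariance
C_freq = F @ C_time @ F.conj().T             # covariance of the transformed data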

Is there a generalized method to do that that is independent of the model or form of the data?  I am not aware of one. 
The link to the Wikipedia page that I posted explains exactly that generalized method. It is even called "generalized least squares"! Got data? Got model? Got covariance? You are good to go! :-)

Your solution literally uses `minimize()`, and is probably a totally fine approach. But `minimize()` does not **by itself** take correlations between data into account.
Fitting is the minimization of a cost function; of course minimize() is used! The correlations are taken into account by the whitening transformation, as explained in the previous thread from 2020, the wiki article, and my message above.

Cheers,
Denis.





Renee Otten

Jun 3, 2022, 11:30:42 PM
to lmfi...@googlegroups.com
Hi Tarek, Denis, 


Tarek: thank you for sending the PDF files, I’ll take a look at those in the next few days. 

Denis: I went back to the thread from 2020 and the earlier one from 2016 and quickly glanced over the discussion there. To me it seems that there is/was primarily confusion about the terminology used, plus the fact that neither Matt nor I have any experience in this specific area, in the sense that we don’t typically use experimental data for which we know the covariance of the actual data. Having said that, if the methodology you want to use is implemented in curve_fit, it’s very likely possible to do so in lmfit as well, and I don’t think anyone would object to that.

Is my reading of the discussion(s) correct that you do know how to accomplish this using the Minimizer interface, but would like to be able to supply the matrix with the covariance-in-the-data to the Model class? If that’s the case, then it shouldn’t be all that hard, especially if you guys are willing to do the testing and check the mathematical implementation. One could add an example to the gallery on the website showing how to write your own residual function to take care of that, and then also show how to use the Model interface to accomplish the same thing.

Again, I haven’t gone through the math in the threads, links, and PDF files in detail, but if there is interest and there are people who want to help out, there should be a way to get this done.

Best,
Renee


Tarek H.

Jun 4, 2022, 2:41:45 AM
to lmfi...@googlegroups.com
Hi Renee, all,

First of all, thank you for the careful consideration. I did search the lmfit mailing list history and could not find such a discussion, but apparently there is one and I missed it... Apologies.

My initial question was indeed vague, and I never really provided a specific example (sending PDFs is rather lazy!). Yesterday I was testing the implementation currently available in curve_fit: when the sigma input is given as a 2D matrix, it is assumed to be the covariance matrix of the data, and its inverse is used to weight the residuals. This method is rather standard, so I feel having it implemented in lmfit models would make sense (maybe it already is?).
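For reference, a minimal sketch of the kind of call I mean (a made-up exponential model and a placeholder banded covariance matrix C, purely for illustration):

import numpy as np
from scipy.optimize import curve_fit

def model(x, amp, tau):
    return amp * np.exp(-x / tau)

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
# placeholder covariance: correlated nearest neighbours, positive definite
C = 0.01 * np.eye(x.size) + 0.005 * (np.eye(x.size, k=1) + np.eye(x.size, k=-1))
y = model(x, 2.0, 3.0) + rng.multivariate_normal(np.zeros(x.size), C)

# a 2-D `sigma` is interpreted by curve_fit as the covariance matrix of the data
popt, pcov = curve_fit(model, x, y, p0=[1.0, 1.0], sigma=C, absolute_sigma=True)
print(popt, np.sqrt(np.diag(pcov)))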

To be honest, I'm not yet getting a reasonable result using this method. But as soon as I manage, I will be able to test an alternative implementation within lmfit models. I also have a script to simulate correlated data, so that could be perfect for testing the method in an example.

Long story short, I feel I can help validate the implementation and add an example. For the moment, implementing something equivalent to what curve_fit does makes sense to me, although there are other methods that could also be tested (I'll need some time to understand them better!).

Best, 
Tarek


Matt Newville

Jun 4, 2022, 9:01:18 AM
to lmfit-py
Hi Renee, Tarek, 

Yeah, thanks for that.  I agree that it seems completely possible to add this to the Model class, but some details might need to be worked out.  My recollection (probably dim) from 2 years ago was that there was not an elegant way to handle covariances that would lead to "negative effective uncertainties", which could happen mathematically, but have no clear meaning.  Like, I think the idea is to extend uncertainty `epsilon` to `sqrt(epsilon * [(1/covar) . epsilon])`  (if I have that correct -- going from memory here).  The Fortran programmer in me worries about that sqrt().  It is one thing to have that work in a carefully crafted example and quite another to support it for others.  And we would want to extend "weight" to "1/epsilon".  ;)

Maybe the solution is "well known" and we (or to be clear: I) just don't know it - that is certainly possible. For sure, one of the challenges for open-source scientific software is that agreeing on terminology can be challenging.  Everyone involved in the development and support has to be able to explain it and work with the code.  So, some random person on the internet pointing to a PDF or web page and saying "see it is obvious, you morons, you just don't care to solve the problem" actually does not understand the problem at all.  We need solutions in Python that are robust and that we can support.

The complaints from Denis about us dismissing the issue are completely absurd.  It is disappointing to see, especially from a scientist in a closely related field, but there seems no point in engaging with someone with that attitude.




--
--Matt Newville <newville at cars.uchicago.edu> 630-327-7411

Tarek H.

Jun 6, 2022, 3:31:38 AM
to lmfi...@googlegroups.com
Dear all,

I of course never intended to create conflict by raising this issue. I originally only asked 1) whether such a method exists "built-in" within lmfit and 2) whether it would be useful to include it. I feel the answers are 1) no and 2) maybe, but it may not be that simple.

To be honest, I don't consider myself "a random person from the internet" (a term used twice), especially because I gave my real name. I understand where the hostility is coming from, but when you build an open-source library I guess you are (or should be) open to input from "random people". In any case, I have used this library extensively and I really appreciate the work that has been done by the authors, so from my side you will only receive gratitude!

Denis: your example will be useful, thanks!

Best,
Tarek


Denis Leshchev

Jun 6, 2022, 5:05:56 PM
to lmfi...@googlegroups.com
Tarek, all,

Yeah, thanks for that.  I agree that it seems completely possible to add this to the Model class, but some details might need to be worked out.  My recollection (probably dim) from 2 years ago was that there was not an elegant way to handle covariances that would lead to "negative effective uncertainties", which could happen mathematically, but have no clear meaning.
The diagonal elements of the covariance matrix, the square roots of which give us the uncertainties (epsilon in lmfit notation), cannot be negative, as you correctly point out! However, correlations between data points can be negative; those will result in negative off-diagonal elements. An important condition the covariance matrix must satisfy is that it has to be positive semi-definite, which is not the same as not having any negative elements. So a covariance matrix with negative off-diagonal elements can be a valid one!
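A tiny numerical illustration of that point, using nothing beyond numpy:

import numpy as np

# two points with variance 1.0 and correlation -0.8: the off-diagonal elements
# are negative, yet the matrix is positive definite and thus a valid covariance
C = np.array([[1.0, -0.8],
              [-0.8, 1.0]])
print(np.linalg.eigvalsh(C))     # [0.2, 1.8] -> all eigenvalues positive
np.linalg.cholesky(C)            # succeeds, so Cholesky-based whitening works too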

Like, I think the idea is to extend uncertainty `epsilon` to `sqrt(epsilon * [(1/covar) . epsilon])`  (if I have that correct -- going from memory here).  The Fortran programmer in me worries about that sqrt().  It is one thing to have that work in a carefully crafted example and quite another to support it for others.  And we would want to extend "weight" to "1/epsilon".  ;)
The 2020 thread provides lengthy explanations with working examples showing that this is a bad idea. However, you have the right intuition that we should do something like taking the square root of the covariance matrix. Square-rooting matrices is not as straightforward as for scalars, though! The good news is that linear algebra does extend the concept of square roots into the matrix realm. The square roots of matrices can be computed in many ways (the wiki page on the matrix square root is a good resource to learn about it) and are not unique except in very special cases. This ambiguity does not matter in the context of optimization/fitting, though. In fact, the Cholesky decomposition I used in my solution is one of the definitions of the matrix square root. I used it for convenience since it requires only one line of code.
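As a quick illustrative check (not part of my original script) that the choice of factorization does not change the cost function, Cholesky and an eigendecomposition-based square root give the same chi-square for any residual vector:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 20))
C = A @ A.T + 1e-3 * np.eye(20)       # some positive-definite covariance
r = rng.normal(size=20)               # an arbitrary residual vector
C_inv = np.linalg.inv(C)

# Cholesky "square root": C_inv = L @ L.T, whitened residual = L.T @ r
L = np.linalg.cholesky(C_inv)
chi2_chol = np.sum((L.T @ r) ** 2)

# eigendecomposition-based square root: C_inv = V @ diag(w) @ V.T
w, V = np.linalg.eigh(C_inv)
W = np.diag(np.sqrt(w)) @ V.T
chi2_eig = np.sum((W @ r) ** 2)

# both reproduce r^T C^-1 r, so the fit's cost function is unchanged
print(chi2_chol, chi2_eig, r @ C_inv @ r)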

Maybe the solution is "well known" and we (or to be clear: I) just don't know it - that is certainly possible. For sure, one of the challenges for open-source scientific software is that agreeing on terminology can be challenging.  Everyone involved in the development and support has to be able to explain it and work with the code.  So, some random person on the internet pointing to a PDF or web page and saying "see it is obvious, you morons, you just don't care to solve the problem" actually does not understand the problem at all.  We need solutions in Python that are robust and that we can support.
In the entire script I enclosed, only two lines of code take care of the correlations. If an implementation of two lines of code using standard libraries (such as numpy) is not robust/simple, then I don’t know what is.
In fact, scipy’s curve_fit uses exactly the same approach, with a Cholesky decomposition of the covariance matrix. Their implementation is a bit different technically (they really take advantage of the Cholesky factor's triangular structure), and it most likely results in better computational performance. However, the methodology is the same.
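To sketch the triangular-solve idea (illustrative only, not scipy's actual source): one can factor the covariance itself, C = L_c L_c^T, and solve a triangular system instead of explicitly inverting C:

import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(3)
A = rng.normal(size=(10, 10))
C = A @ A.T + 1e-3 * np.eye(10)      # some positive-definite covariance
r = rng.normal(size=10)              # raw residual vector

L_c = cholesky(C, lower=True)              # factor the covariance itself: C = L_c @ L_c.T
z = solve_triangular(L_c, r, lower=True)   # z = L_c^-1 r, so z @ z == r^T C^-1 r

print(np.isclose(z @ z, r @ np.linalg.inv(C) @ r))   # True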

I have further extended my example to show that curve_fit yields exactly the same result as lmfit with the Cholesky factor:
# imports
import numpy as np
from scipy import linalg
from lmfit import Minimizer, Parameters
import matplotlib.pyplot as plt

from scipy.optimize import curve_fit


# set seed to 1
np.random.seed(1)

#%%
# create true data

def func(params, x):
    amp = params['amp']
    phase = params['phase']

    freq = params['freq']
    decay = params['decay']
    offset = params['offset']
    return amp * np.sin(x*freq + phase) * np.exp(-x*x*decay) + offset

def func_array(x, amp, phase, freq, decay, offset):
    params = {'amp' : amp,
              'phase' : phase,
              'freq' : freq,
              'decay' : decay,
              'offset' : offset}
    return func(params, x)


params = Parameters()
params.add('amp', value=5)
params.add('phase', value=-0.1)
params.add('freq', value=2.0)
params.add('decay', value=0.025)

params.add('offset', value=10.0)

x = np.linspace(0, 15, 501)
data_true = func(params, x)
p0_arr = [5, -0.1, 2.0, 0.025, 10.0]

#%%
# create noisy data with covariance that has off-diagonal elements
correlation_width = 1e-1
base = np.exp(-(x/(np.sqrt(2)*correlation_width))**2)
A = linalg.toeplitz(base)
C = A @ A.T *1e-2 + np.eye(x.size)*1e-3
data = np.random.multivariate_normal(data_true, C)

# pre-compute some matrices to speed up calculations

C_inv = np.linalg.inv(C)
L = np.linalg.cholesky(C_inv)

#%%

def residual_chol(params, x, data, L):
    dy = data-func(params, x)
    return L.T @ dy # multiplication by Cholesky factor is the key component in handling correlated noise

minimizer = Minimizer(residual_chol, params, fcn_args=(x, data, L))
result = minimizer.minimize()
# find residuals
residual = result.residual

p_arr, p_arr_cov = curve_fit(func_array, x, data, p0=p0_arr, sigma=C)
p_arr_err = np.sqrt(np.diag(p_arr_cov))


# figures
plt.figure(1)
plt.clf()

plt.subplot(311)
plt.plot(x, data, 'k.')
plt.plot(x, func(result.params, x), 'r-')
plt.plot(x, func_array(x, *p_arr), 'b:')

plt.xlabel('x')
plt.ylabel('data')
plt.title('data/fit')

plt.subplot(312)
plt.plot(x, data - func(result.params, x), 'k-')
plt.xlabel('x')
plt.ylabel('$r$')
plt.title('raw residuals')

plt.subplot(313)
plt.plot(x, result.residual, 'k-')
plt.xlabel('x')
plt.ylabel('$L^Tr$')
plt.title('normalized residuals')

plt.tight_layout()

print('\t\tlmfit\t\t\tcurve_fit')
for i, k in enumerate(params.keys()):
    print(k, 'value ', '\t', result.params[k].value,  '\t', p_arr[i])    
    print(k, 'stderr', '\t', result.params[k].stderr, '\t', p_arr_err[i])

The result prints out the following:
amp value        5.112655779597885       5.112665047701475
amp stderr       0.1295239096465614      0.12952611761549376
phase value      -0.08243341946021458    -0.08243446730640175
phase stderr     0.02497059590587641     0.024970471047126793
freq value       1.99514253241486        1.9951432017813735
freq stderr      0.009654727425451334    0.009655074417609226
decay value      0.028541209815362276    0.028541424039505475
decay stderr     0.0016738967218192092   0.001674024833945412
offset value     9.99514278795249        9.995142838118364
offset stderr    0.03720335936338103     0.03720337660121335

It is disappointing to see, especially from a scientist in a closely related field, but there seems no point in engaging with someone with that attitude.
It is also disappointing to have conversations where other parties (including scientists from related or unrelated fields) completely dismiss proposed solutions, explanations, code, examples and just state that the person “actually does not understand the problem at all” with no attempts to actually look into what they are trying to say. Despite this I still love lmfit and hope to get to the bottom of this one day! ;-)

Cheers,
Denis.



Matt Newville

Jun 7, 2022, 2:36:24 PM
to lmfit-py
On Mon, Jun 6, 2022 at 4:05 PM Denis Leshchev <leshche...@gmail.com> wrote:
Tarek, all,

Yeah, thanks for that.  I agree that it seems completely possible to add this to the Model class, but some details might need to be worked out.  My recollection (probably dim) from 2 years ago was that there was not an elegant way to handle covariances that would lead to "negative effective uncertainties", which could happen mathematically, but have no clear meaning.
The diagonal elements of the covariance matrix, the square roots of which give us the uncertainties (epsilon in lmfit notation), cannot be negative, as you correctly point out! However, correlations between data points can be negative; those will result in negative off-diagonal elements. An important condition the covariance matrix must satisfy is that it has to be positive semi-definite, which is not the same as not having any negative elements. So a covariance matrix with negative off-diagonal elements can be a valid one!

Yep, absolutely agree.


Like, I think the idea is to extend uncertainty `epsilon` to `sqrt(epsilon * [(1/covar) . epsilon])`  (if I have that correct -- going from memory here).  The Fortran programmer in me worries about that sqrt().  It is one thing to have that work in a carefully crafted example and quite another to support it for others.  And we would want to extend "weight" to "1/epsilon".  ;)
The 2020 thread provides lengthy explanations with working examples showing that this is a bad idea.

um... sorry... what is a bad idea?   Your read of past messages might be different from mine.  Be precise.

However, you have the right intuition that we should do something like taking the square root of the covariance matrix. Square-rooting matrices is not as straightforward as for scalars, though! The good news is that linear algebra does extend the concept of square roots into the matrix realm. The square roots of matrices can be computed in many ways (the wiki page on the matrix square root is a good resource to learn about it) and are not unique except in very special cases. This ambiguity does not matter in the context of optimization/fitting, though. In fact, the Cholesky decomposition I used in my solution is one of the definitions of the matrix square root. I used it for convenience since it requires only one line of code.

Yes, right, I see that now.  Yes, you are completely right.  Cholesky decomposition is exactly the thing to use.  I'm in favor of doing that.

Maybe the solution is "well known" and we (or to be clear: I) just don't know it - that is certainly possible. For sure, one of the challenges for open-source scientific software is that agreeing on terminology can be challenging.  Everyone involved in the development and support has to be able to explain it and work with the code.  So, some random person on the internet pointing to a PDF or web page and saying "see it is obvious, you morons, you just don't care to solve the problem" actually does not understand the problem at all.  We need solutions in Python that are robust and that we can support.
In the entire script I enclosed, only two lines of code take care of the correlations. If an implementation of two lines of code using standard libraries (such as numpy) is not robust/simple, then I don’t know what is.
In fact, scipy’s curve_fit uses exactly the same approach, with a Cholesky decomposition of the covariance matrix. Their implementation is a bit different technically (they really take advantage of the Cholesky factor's triangular structure), and it most likely results in better computational performance. However, the methodology is the same.

A PR is welcome.  And, well, we do need "weights" to be the inverse of the covariance.  Complaints about ignoring issues, not so much.

--Matt

Renee Otten

Jun 11, 2022, 10:30:16 PM
to lmfi...@googlegroups.com
I started to look at this a bit more and plan to make a PR to add support for using the covariance matrix to fit correlated data.

I’ve added the example from Denis to show how to do this when using the Minimizer class directly. No change in the code is needed here; it is just a demonstration of how to use the Cholesky factor to take the correlation of the data into account in the residual function.

For the Model class, I *think* the easiest/cleanest way to implement this is to add another argument (for example, “cov_data” or something similar) to the .fit() method. Although re-using “weights” is possible in theory (i.e., by checking the dimensions of the array, as scipy.optimize.curve_fit does for sigma), I’d rather not do that. Matt and others: do you agree with this? Just checking before starting to make the actual changes.

Renee

Matt Newville

Jun 12, 2022, 12:39:13 PM
to lmfit-py
Hi Renee,

On Sat, Jun 11, 2022 at 9:30 PM Renee Otten <otten...@gmail.com> wrote:
I started to look at this a bit more and plan to make a PR to add support for using the covariance matrix to fit correlated data.

I’ve added the example from Denis to show how to do this when using the Minimizer class directly. No change in the code is needed here; it is just a demonstration of how to use the Cholesky factor to take the correlation of the data into account in the residual function.

For the Model class, I *think* the easiest/cleanest way to implement this is to add another argument (for example, “cov_data” or something similar) to the .fit() method. Although re-using “weights” is possible in theory (i.e., by checking the dimensions of the array, as scipy.optimize.curve_fit does for sigma), I’d rather not do that. Matt and others: do you agree with this? Just checking before starting to make the actual changes.


Okay, thanks for that.  I guess an outside contribution here is unlikely.

It seems that the use of `weights` is somewhat confusing.  I kind of think it will not be possible to eliminate the chance of confusing anyone entirely; trying not to confuse anyone is probably hopeless, but we can try to come up with a coherent strategy for what makes the most sense for us.

How would `cov_data` interact with `weights`?  That is, adding `cov_data` might add to the confusion about `weights` -- can we come up with an approach that reduces that chance?

Could a user simply pass a diagonal (or 1-D) `cov_data` and have the inverse of that used (in place of? in addition to?) `weights`?  Maybe you could call that `epsilon` and allow it to be 1-D or 2-D?   Or maybe we leave it as is and consider it a documentation problem?   I guess I don't have a strong opinion on whether using `epsilon` makes more sense than `weights`, but extending `epsilon` to a covariance does seem slightly more straightforward than extending `weights`.
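For what it's worth, until any `cov_data` or `epsilon` argument exists, a user can already express the whitening trick with the current Model interface by fitting whitened data to a whitened model. A hypothetical sketch (the model, covariance, and helper names below are made up for illustration and are not part of lmfit's API):

import numpy as np
from lmfit import Model

def gaussian(x, amp, cen, wid):
    return amp * np.exp(-(x - cen) ** 2 / (2 * wid ** 2))

rng = np.random.default_rng(2)
x = np.linspace(-5, 5, 101)
# made-up smooth covariance plus a small diagonal term so it stays invertible
C = 0.02 * np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.5) + 1e-3 * np.eye(x.size)
y = gaussian(x, 3.0, 0.2, 1.1) + rng.multivariate_normal(np.zeros(x.size), C)

Lt = np.linalg.cholesky(np.linalg.inv(C)).T     # whitening transform L^T

def whitened_gaussian(x, amp, cen, wid):
    # whiten the model exactly the same way the data is whitened
    return Lt @ gaussian(x, amp, cen, wid)

wmodel = Model(whitened_gaussian)
result = wmodel.fit(Lt @ y, x=x, amp=1.0, cen=0.0, wid=1.0)
print(result.fit_report())

That at least leaves `weights` untouched; whether something like it should live behind a new argument to .fit() or stay a documented recipe is exactly the design question here.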

--Matt
