The last part of the sentence is incorrect since the the sign of the signal does not matter, but what matters is the sign of dy. So independent of the model, the np.sqrt of a negative value will give nan. One might think that dy * np.dot(np.linalg.inv(C), dy) is always positive. Unfortunately, this is not guaranteed as np.dot(np.linalg.inv(C), dy does not necessarily have the same sign as dy so multiplication of the two does not give only positive values. May be you can correct me, but I did not manage to make the code work with these residuals (see below).
One can try to make it work by adding abs before doing sqrt:
np.sqrt(np.abs(dy * np.dot(C_inv, dy)))
This operation while giving seemingly right result, does not transform the covariance matrix of residuals into identity and hence will not provide weighted residuals that are normally distributed, centered around 0, and have std of 1.
I wrote a quick example that illustrates the issues. I have also included comparison to the method where we ignore off-diagonal elements in the data covariance matrix and think of noise as of independent. For brevity I use the following notation:
method "sigma" - we ignore off-diagonal elements of covariance matrix
method "sqrtC" - is where we compute residuals as np.sqrt(np.abs(dy * np.dot(C_inv, dy)))
method "chol" - is where we compute residuals using cholesky factor
<code>
# imports
import numpy as np
from scipy import linalg, stats
from lmfit import Minimizer, Parameters
import matplotlib.pyplot as plt
# set seed to 1
np.random.seed(1)
# create true data
x = np.linspace(0, 15, 501)
data_true = (5.0 * np.sin(2.0*x - 0.1) * np.exp(-x*x*0.025)) +10
# create noisy data with covariance that has off-diagonal elements
corelation_width = 1e-1
base = np.exp(-(x/(np.sqrt(2)*corelation_width))**2)
A = linalg.toeplitz(base)
C = A @ A.T *1e-2 + np.eye(x.size)*1e-3
data = np.random.multivariate_normal(data_true, C)
# pre-compute some matrices to speed up calculations
C_inv = np.linalg.inv(C)
L = np.linalg.cholesky(C_inv)
eps_data = np.sqrt(np.diag(C))
def func(params, x):
amp = params['amp']
phaseshift = params['phase']
freq = params['freq']
decay = params['decay']
offset = params['offset']
return amp * np.sin(x*freq + phaseshift) * np.exp(-x*x*decay) + offset
def residual_sigma(params, x, data, sigma):
dy = data-func(params, x)
return dy/sigma
def residual_sqrtC(params, x, data, C_inv):
dy = data-func(params, x)
# return np.sqrt(dy * np.dot(np.linalg.inv(C), dy)) # <- this produces NaNs and does not work
return np.sqrt(np.abs(dy * np.dot(C_inv, dy))) # <- this does work
def residual_chol(params, x, data, L):
dy = data-func(params, x)
return L.T @ dy
params = Parameters()
params.add('amp', value=1)
params.add('decay', value=0.007)
params.add('phase', value=0.2)
params.add('freq', value=3.0)
params.add('offset', value=0.0)
minner_sigma = Minimizer(residual_sigma, params, fcn_args=(x, data, eps_data))
minner_sqrtC = Minimizer(residual_sqrtC, params, fcn_args=(x, data, C_inv))
minner_chol = Minimizer(residual_chol, params, fcn_args=(x, data, L))
result_sigma = minner_sigma.minimize()
result_sqrtC = minner_sqrtC.minimize()
result_chol = minner_chol.minimize()
# find residuals
resid_sigma = result_sigma.residual
resid_sqrtC = result_sqrtC.residual
resid_chol = result_chol.residual
# the part below is a bit of a dirty work, but gets the job done
# figures
plt.figure(1)
plt.clf()
plt.subplot(2,3,(1,2))
plt.plot(x, data, 'k.')
plt.plot(x, func(result_sigma.params, x), 'g-') # sigma
plt.plot(x, func(result_sqrtC.params, x), 'b-') # sqrt(abs)
plt.plot(x, func(result_chol.params, x), 'r-') # proper
plt.xlabel('x')
plt.ylabel('data')
plt.title('data/fit')
plt.subplot(2,3,(4,5))
#plt.plot(x, result_sigma.residual, 'g-') # sqrt(abs)
plt.plot(x, result_sqrtC.residual, 'b-', label='sqrtC')
plt.plot(x, result_chol.residual, 'r-', label='chol')
plt.plot(x, result_sigma.residual, 'g-', label='sigma')
plt.xlabel('x')
plt.ylabel('residuals')
plt.title('residuals')
plt.legend()
plt.subplot(2,3,(6))
plt.hist(resid_sigma, 25, orientation='horizontal', color='g', histtype='step');
plt.hist(resid_sqrtC, 25, orientation='horizontal', color='b', histtype='step');
plt.hist(resid_chol, 25, orientation='horizontal', color='r', histtype='step');
plt.xlabel('occurances')
plt.ylabel('residuals ')
plt.title('residual distribution')
plt.tight_layout()
print('')
print('resid sigma distribution mean:', np.mean(resid_sigma))
print('resid sqrtC distribution mean:', np.mean(resid_sqrtC))
print('resid chol distribution mean:', np.mean(resid_chol))
print('')
print('resid sigma distribution std:', np.std(resid_sigma))
print('resid sqrtC distribution std:', np.std(resid_sqrtC))
print('resid chol distribution std:', np.std(resid_chol))
print('')
# testing residual normality
print('D’Agostino normality test results:')
print('resid sigma normally distributed:', stats.normaltest(resid_sigma)[1]>0.05 )
print('resid sqrtC normally distributed:', stats.normaltest(resid_sqrtC)[1]>0.05 )
print('resid chol normally distributed:', stats.normaltest(resid_chol)[1]>0.05 )
print('')
print('Parameter values')
print('\t sigma \t\t sqrtC \t\t chol')
for key in params.keys():
print(key, '\t %0.6f' %result_sigma.params[key].value,
'\t %0.6f' %result_sqrtC.params[key].value,
'\t %0.6f' %result_chol.params[key].value,
)
print('')
print('Parameter std errors')
print('\t sigma \t\t sqrtC \t\t chol')
for key in params.keys():
print(key, '\t %0.6f' %result_sigma.params[key].stderr,
'\t %0.6f' %result_sqrtC.params[key].stderr,
'\t %0.6f' %result_chol.params[key].stderr,
)
<code>
This will produce the following figure:
On the top panel we can see that the fits are essentially the same.
The inspection of the residuals on the low-left panel shows that the sqrtC method (unsurprisingly) gives values that are all positive and are not centered around 0. If you ever use weighted residuals to check the fit quality (and you should) this will not be a good plot to look at.
Some of the output printouts indicate that both sigma and chol methods give residuals centered around zero; strikingly all the residuals have standard deviation of ~1, which means that some of the whitening actually works:
resid sigma distribution mean: -5.8269982730564494e-05
resid sqrtC distribution mean: 1.7250573130189921
resid chol distribution mean: 0.012187697573908895
resid sigma distribution std: 0.8727569387567908
resid sqrtC distribution std: 1.0494047460860807
resid chol distribution std: 0.9829735616819255
To further understand the normality of the residuals, we can use some classic normality test (scipy implementation of D’Agostino test works just fine):
D’Agostino normality test results:
resid sigma normally distributed: False
resid sqrtC normally distributed: False
resid chol normally distributed: True
Only the proper whitening operator gives the true normal distribution in the weighted residuals, making it a good indicator of goodness of fit.
Although all of these considerations are interesting, one can ask: why bother going through all the linear algebra, if we get the same fits?
Indeed, all 3 fits provide similar best-fit values:
Parameter values
sigma sqrtC chol
amp 4.743743 4.767942 4.723184
decay 0.022070 0.021600 0.021898
phase -0.121092 -0.107353 -0.104812
freq 2.002138 1.995802 1.998234
offset 10.054233 10.036156 10.049204
However, the key is that all of them but chol provide severely underestimated uncertainties (even after correction for chisq->1, that lmfit does automatically):
Parameter std errors
sigma sqrtC chol
amp 0.032110 0.000379 0.117574
decay 0.000340 0.000007 0.001262
phase 0.008500 0.000161 0.025366
freq 0.002570 0.000112 0.008460
offset 0.009648 0.000402 0.036033
To conclude, if you know your covariance matrix, you should incorporate it in the fitting. Using its cholesky factor to compute the residuals is not difficult and produces statistically sound results, while other methods may give you similar fit results while violating a bunch of assumptions and give incorrect uncertainty estimations which one may not even notice.
I hope this clarifies my earlier message,
Denis.