# Weights in lmfit

98 views

### p212121

Dec 12, 2023, 1:57:54 PM12/12/23

I apologize for the rather large minimal example and the lenghty explanation.

I try to fit a Mössbauer doublet which consists of two Lorentz functions. The fit should give two important parameters, the isomer shift ('ishift', which is in the middle of the two functions) and the quadropule splitting ('qsplit', which is distance of the two functions.) y is the intensity or transmittance, x is velocity. I don't will go into too much detail, but the energy of the gamma-ray source (which is moved back and forth) is changed due to the Doppler effect. The absorption of gamma rays of the sample is detected by a 512 channel analyzer. Since the source  is moved back and forth, the spectrum is seen twice. To improve the quality of data acquisition, the spectrum is folded almost in the middle (between channel 256 and 257). The intensities of the lefthand side and right hand side of the spectrum are added. From a calibration, one can determine the velocity and the folding point ('FP'). This are the constants FP,  v0, vmax in the script. All values and errors are normalized.

The result of the fit is great and is in agreement with other software. But chi-square and red. Chi-square are too low. So, it needs some 'weigths'. In theory, the intensity of the left hand side and right hand side spectra should be equal. But they are not, so this the error of the measurement. I multiplied them by two, and calculated the standard deviation. I used 1/stdev as 'weights', but the result was not good. It improved much, after utilizing the mean stdev (mean(stdev)) of all intensities. The latter one was calculated from the sqrt of the mean of the individual variances. With this 'weights', red chi-square is now close to one.

Now my questions.

Is 1/(mean(stdev)) a good approximation for 'weights'?

Why is R-Squared so different from 1? When I calculate R-squared (see end of the script) including mean(stdev)) it is fine.

Why are the residuals look fine in 'result.plot()' but not with ax.plot(result.residual)? Or, why do I have to multiply them with mean(stdev)) to get the true values of the residuals?

Thank you very much!

###################################

import numpy as np

from scipy import interpolate

from lmfit.models import Model, ConstantModel

from lmfit import Parameters

import matplotlib.pyplot as plt

raw_data= [225150, 225024, 225509, 224541, 225529, 225207, 224692, 225212, 224921, 225508, 225109, 225022, 224852, 224851, 224644, 224938, 224923, 225486, 225286, 225308, 225220, 225091,

224878, 225765, 224876, 224130, 224379, 225228, 225107, 224627, 224322, 225049, 224436, 225104, 225211, 224251, 225168, 224577, 224665, 224137, 224008, 225172, 224117, 224886,

225027, 223985, 224571, 225210, 224312, 224028, 224542, 224716, 224321, 224534, 224682, 223912, 224102, 224637, 223367, 223895, 223820, 223040, 223552, 222524, 221435, 221679,

220291, 218876, 218988, 216523, 216121, 214408, 214229, 214935, 216391, 216886, 218981, 220108, 222119, 221573, 222958, 223593, 223349, 223225, 224043, 223746, 224414, 224230,

224452, 224342, 223643, 224559, 225029, 224556, 224369, 224405, 224352, 225544, 224237, 224019, 224804, 225073, 224646, 224541, 225413, 223907, 224729, 224740, 224582, 224189,

225437, 224786, 224323, 225242, 224969, 224298, 224471, 224209, 224498, 224848, 224200, 223855, 223880, 223706, 224926, 224367, 223528, 223938, 223522, 223725, 222921, 222180,

222216, 221271, 220647, 218622, 217752, 216671, 214926, 215136, 216108, 215887, 216592, 218074, 218880, 219658, 221571, 222541, 222824, 222919, 223529, 223875, 223838, 224132,

224781, 225348, 223961, 224647, 223793, 224363, 224755, 225180, 225652, 224906, 224450, 224231, 224653, 224643, 226213, 223895, 224028, 224634, 224933, 224639, 224949, 224848,

223989, 224535, 225283, 224260, 225448, 224570, 224484, 224628, 224751, 224672, 225644, 225289, 225073, 224575, 225086, 225551, 225055, 225237, 224718, 225487, 225035, 225013,

224490, 224287, 224967, 225531, 224847, 224615, 225209, 224629, 224442, 224533, 225024, 224925, 225111, 225662, 224112, 225469, 225964, 224958, 224695, 225219, 225766, 225058,

225060, 224941, 224276, 224689, 225409, 225080, 225860, 224690, 223970, 225353, 224980, 225241, 224364, 224091, 224551, 225962, 224827, 224910, 225159, 225128, 225785, 224404,

225568, 224660, 224022, 224087, 225071, 224646, 224934, 225697, 224868, 225273, 224969, 224356, 224359, 224219, 225141, 224135, 226064, 225363, 224569, 224825, 225074, 225099,

225133, 224294, 225390, 224382, 225638, 224819, 224437, 224738, 225274, 224444, 224942, 225633, 224656, 224600, 224954, 225425, 225470, 224772, 224597, 224677, 225034, 224474,

225645, 224537, 225199, 224536, 225207, 225253, 224959, 225227, 224982, 224488, 224804, 224715, 225040, 224559, 224854, 224831, 224689, 225216, 224780, 224909, 224723, 225149,

224764, 224601, 224646, 224156, 224393, 224707, 224903, 225228, 225356, 224833, 225906, 224826, 224419, 224601, 224743, 224781, 224624, 224468, 225496, 224117, 224904, 224284,

224589, 225339, 225420, 225062, 225187, 224696, 225674, 225122, 224236, 224695, 225354, 225386, 224692, 224958, 224815, 224524, 224492, 224974, 223961, 224935, 224464, 224104,

224629, 225024, 224809, 223966, 223520, 225127, 223532, 223865, 223864, 223038, 223078, 222239, 221280, 220789, 221098, 220402, 218781, 217289, 215914, 215324, 215131, 214834,

215231, 217432, 218191, 219403, 220977, 221772, 221966, 221792, 223089, 222788, 223851, 223691, 223778, 223922, 224969, 224116, 224045, 224816, 224277, 224018, 224316, 223950,

224368, 224617, 224362, 224363, 224180, 224543, 224143, 224809, 224770, 224407, 224812, 224131, 225252, 224278, 224168, 223766, 224849, 224673, 224147, 223513, 224160, 224511,

224269, 223698, 224293, 224596, 224059, 224397, 223815, 224100, 224461, 223263, 223469, 223487, 223560, 222739, 222106, 220833, 219771, 219499, 218317, 217291, 215240, 214412,

214913, 214812, 216481, 217747, 219417, 219823, 221331, 220894, 222084, 223628, 223638, 222785, 224759, 224149, 224394, 224343, 223959, 224287, 224207, 224732, 224151, 224211,

225404, 224782, 224344, 224228, 224719, 224110, 224799, 223948, 225305, 224811, 224880, 224558, 224091, 224190, 224377, 224185, 224556, 224101, 225227, 225201, 224676, 224815,

224684, 224745, 225667, 225032, 225280, 224377, 224651, 223817, 225174, 224650, 224124, 224733, 225910, 224251, 224210, 224736, 224171, 223929, 225147, 224993, 224603, 225539,

223704, 224413, 223695, 225134, 224386, 225196]

def fold_data(N_chan,raw_data,FP,v0,vmax):

#folding ws5 raw data

#N_chan  = total number of channels

#raw_data = (raw) intensity data from measurement, WissEl

#FP      = folding point

#v0      = channel with velocity = 0

#vmax    = maximum velocity

#

#to add to left hand side (lhs) channels

#'(FP - 256.5)*2' for 512 channels, if channel 1 is 1 (and not zero)

folding_diff = (FP - (int(N_chan/2)+0.5))*2

#calc velocity per channel from vmax

chan_lhs = np.linspace(1, int(N_chan/2), int(N_chan/2))

#velocity left hand side (lhs) should be the same as right hand side (rhs)

#so its only lhs

velocity_lhs = vmax - (vmax + vmax)/(N_chan/2-1)*(chan_lhs + (N_chan/2-1)/2 - v0)

#interpolate channels, to operate with channel floating point numbers (xxx.xx)

all_chan = np.linspace(1, int(N_chan), int(N_chan))

ws5_ichan = interpolate.interp1d(all_chan, raw_data, bounds_error=True,

kind = 'linear')

#lhs channels (note that it goes from high to low)

lhs_chan = np.linspace(int(N_chan/2), 1, int(N_chan/2))

#rhs channels

rhs_chan = np.linspace(int(N_chan/2) + 1, N_chan, int(N_chan/2))

#add the intensities of lhs + folding difference and rhs channels pairwise

folded_intens = (np.add(ws5_ichan(lhs_chan + folding_diff), ws5_ichan(rhs_chan)))

#error from folding

#assuming that lhs intensity should be equal to rhs intensity

#since the sum of lhs intensity and rhs intensity is utilized, lhs and rhs intensity

#are doubled (*2)

lhs_i_2x = ws5_ichan(lhs_chan + folding_diff) * 2

rhs_i_2x = ws5_ichan(rhs_chan) * 2

#stddev for error bar plot in ax0

stdev_fold_i = np.std([lhs_i_2x, rhs_i_2x], axis = 0)

#normalized stddev

stdev_fold_i_norm = stdev_fold_i / max(folded_intens)

#mean stddev (one value) from sqrt of the mean of variances of

#lhs and rhs intensity * 2

#the mean stddev is used as weight in the fit and for the residuals

mean_stdev_fold_i = np.sqrt(np.mean(np.var([lhs_i_2x, rhs_i_2x], axis = 0)))

#normalized mean stddev

mean_stdev_fold_i_norm = mean_stdev_fold_i / max(folded_intens)

x = velocity_lhs

y = folded_intens

#x = velocity lhs

#y = intensities = intensity lhs + intensity rhs

return x, y , stdev_fold_i_norm, mean_stdev_fold_i_norm

def lorentzdoublet(x,area1,ishift,qsplit,fwhm):

#function for the Lorentz doublet

#area1  = area

#ishift = I.S. or δ in mm/s

#qsplit = Q.S. or ΔEQ in mm/s

#fwhm   = fwhm

doublet = ((area1/np.pi)*((fwhm/2)/((ishift-abs(qsplit)/2-x)**2+(fwhm/2)**2))) + \

((area1/np.pi)*((fwhm/2)/((ishift+abs(qsplit)/2-x)**2+(fwhm/2)**2)))

return doublet

def normalize_y(y):

#normalize y values

normalized_y_data = y/max(y)

return normalized_y_data

N_chan = 512

FP     = 256.790

v0     = 125.690

vmax   = -4.6927

x, y, stdev_fold_i_norm, mean_stdev_fold_i_norm = fold_data(N_chan,raw_data,FP,v0,vmax)

#normalize data

y = normalize_y(y)

#prepare fit

params = Parameters()

doubmodel = Model(lorentzdoublet, prefix='d_')

params.update(doubmodel.make_params(area1 = {'value':0.1,

'min':-1.00001, 'max':1e-5,

'vary':True},

ishift = {'value':0.9,

'min':-3, 'max':3,

'vary':True},

qsplit = {'value':2.5,

'min':1e-5, 'max':5,

'vary':True},

fwhm   = {'value':0.3,

'min':1e-5, 'max':3,

'vary':True}))

bg = ConstantModel()

bg_params = bg.make_params(c = {'value':1, 'min': 0.8, 'max':1.2, 'vary':True})

curve = doubmodel + bg

#fit

result = curve.fit(y, params + bg_params, x=x)

print('####################################')

print('no weights:')

print(result.fit_report())

print('Residuals: ',result.residual[:3])

print('####################################')

result = curve.fit(y, params + bg_params, x=x, weights = 1 / mean_stdev_fold_i_norm, scale_covar = True)

print('weights, mean stdev, scale_covar=True:')

print(result.fit_report())

print('Residuals: ',result.residual[:3])

print('####################################')

result = curve.fit(y, params + bg_params, x=x, weights = 1 / mean_stdev_fold_i_norm, scale_covar = False)

print('weights, mean stdev, scale_covar=False:')

print(result.fit_report())

print('Residuals: ',result.residual[:3])

print('####################################')

result = curve.fit(y, params + bg_params, x=x, weights = 1 / stdev_fold_i_norm, scale_covar = True)

print('weights, stdev, scale_covar=True:')

print(result.fit_report())

print('Residuals: ',result.residual[:3])

print('####################################')

result = curve.fit(y, params + bg_params, x=x, weights = 1 / mean_stdev_fold_i_norm, scale_covar = True)

print('weights, mean stdev, scale_covar=True, R-squared:')

print(result.fit_report())

print('R-squared with weights: ', 1-(result.residual * mean_stdev_fold_i_norm).var()/np.var(y))

print('Residuals: ',result.residual[:3])

print('Residuals * weigths: ',result.residual[:3] * mean_stdev_fold_i_norm)

result.plot(yerr = stdev_fold_i_norm)

fig, ax = plt.subplots()

ax.plot(result.residual)

plt.show()

### Matt Newville

Dec 12, 2023, 3:27:54 PM12/12/23
On Tue, Dec 12, 2023 at 12:57 PM p212121 <p21...@gmail.com> wrote:

I apologize for the rather large minimal example and the lenghty explanation.

I try to fit a Mössbauer doublet which consists of two Lorentz functions. The fit should give two important parameters, the isomer shift ('ishift', which is in the middle of the two functions) and the quadropule splitting ('qsplit', which is distance of the two functions.) y is the intensity or transmittance, x is velocity. I don't will go into too much detail, but the energy of the gamma-ray source (which is moved back and forth) is changed due to the Doppler effect. The absorption of gamma rays of the sample is detected by a 512 channel analyzer. Since the source  is moved back and forth, the spectrum is seen twice. To improve the quality of data acquisition, the spectrum is folded almost in the middle (between channel 256 and 257). The intensities of the lefthand side and right hand side of the spectrum are added. From a calibration, one can determine the velocity and the folding point ('FP'). This are the constants FP,  v0, vmax in the script. All values and errors are normalized.

The result of the fit is great and is in agreement with other software. But chi-square and red. Chi-square are too low. So, it needs some 'weigths'. In theory, the intensity of the left hand side and right hand side spectra should be equal. But they are not, so this the error of the measurement. I multiplied them by two, and calculated the standard deviation. I used 1/stdev as 'weights', but the result was not good. It improved much, after utilizing the mean stdev (mean(stdev)) of all intensities. The latter one was calculated from the sqrt of the mean of the individual variances. With this 'weights', red chi-square is now close to one.

Now my questions.

Is 1/(mean(stdev)) a good approximation for 'weights'?

I think so.  To be clear, I (think that I) understand the basics of Mossbauer spectroscopy, and I have no idea what  your fold_data() is *really* doing -- it seems reasonable and it seems to be returning decent looking data.  I'll trust that it is right, but "use 1/mean(stdev) as weights" sort of assumes that this calculation is right.

Why is R-Squared so different from 1? When I calculate R-squared (see end of the script) including mean(stdev)) it is fine.

I don't see what you got for R-squared.  But I will bet that you are seeing a bug (for R-squared being mis-reported for models with weights) that was recently fixed. Can you try the code in the current github master and see if that works better for you?

Why are the residuals look fine in 'result.plot()' but not with ax.plot(result.residual)? Or, why do I have to multiply them with mean(stdev)) to get the true values of the residuals?

I don't know what you are seeing.  What are you seeing that you think is not correct?

Do not expect us to run your code and get the same results as you. Show us explicitly what you get, including the text of the fit report that you get.

--Matt

### Hans Hansen

Dec 13, 2023, 8:33:30 AM12/13/23
to lmfit-py
Sorry, I didn't realize that outputs could be different.

Thank you for your considerations regarding the weights.

With the recent (github) version of lmfit, R-squared is exactly as expected.

There is still a difference of the residuals. To get reasonable values, I have to multiply them with mean(stddev).
For some reason with 'result.plot()' (internal lmfit plot function), residuals are as expected (see fit_res.png). But when I plot with 'result.residual' they are not (see residuals.png) (have to multiply them with  mean(stddev) to get the same).

Thank you very much,
Sebastian

fit_res.png
residuals.png

### Matt Newville

Dec 13, 2023, 10:18:56 AM12/13/23
On Wed, Dec 13, 2023 at 7:33 AM Hans Hansen <p21...@gmail.com> wrote:
Sorry, I didn't realize that outputs could be different.

Thank you for your considerations regarding the weights.

With the recent (github) version of lmfit, R-squared is exactly as expected.

Great -- we should probably release a new version soon!

There is still a difference of the residuals. To get reasonable values, I have to multiply them with mean(stddev).
For some reason with 'result.plot()' (internal lmfit plot function), residuals are as expected (see fit_res.png). But when I plot with 'result.residual' they are not (see residuals.png) (have to multiply them with  mean(stddev) to get the same).

Ah, yeah, this is sort of confusing.

The ModelResult.residual array is the weighted difference between data and model.  This is the actual array minimized in the least-squares sense.   For a Model, `residual = weights*(data-model)`.

The ModelResult.plot() routines are plotting the unweighted `data-model` as the residual.

Both weighted and unweighted residuals are useful to have a see.  Maybe we should be naming different things better.  I'm not sure I have a definite answer to what (if anything) needs changing -- perhaps changing the label on the automated plot of the "residual"?

At a minimum, the docs could be clearer!

### Hans Hansen

Dec 13, 2023, 2:07:01 PM12/13/23
to lmfit-py
Thank you for the clarification.

### Antonio Bulgheroni (toto)

Feb 15, 2024, 7:37:38 AMFeb 15
to lmfit-py
Hi! I'm also affected by the R-Squared bug when using weights. The github master code is producing reasonable output. Are you planning to make a new release anytime soon?

Cheers,
toto

### Matt Newville

Feb 16, 2024, 7:50:36 PMFeb 16
Hi Antonio, All,

Sorry for the delay on this.  Things have been unusually busy for the past few months.

I have a branch I am still working on to try to clean up one issue and PR (working on the topic of Github #925 and #926).  I think this will be done this upcoming week and then hope to be able to push out a release in the next few weeks.

--
You received this message because you are subscribed to the Google Groups "lmfit-py" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lmfit-py+u...@googlegroups.com.

### Antonio Bulgheroni

Feb 17, 2024, 12:59:29 AMFeb 17
Thanks!

Take it easy and thanks again for all your hard work!

Cheers,
toto

-----------------------------------------
Antonio Bulgheroni, PhD

You received this message because you are subscribed to a topic in the Google Groups "lmfit-py" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/lmfit-py/vRFRZAbPK04/unsubscribe.
To unsubscribe from this group and all its topics, send an email to lmfit-py+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lmfit-py/CA%2B7ESbotuN7ARwb9UaJp4J0xfMkBNDFiKC4iyNM61g_8EvEysQ%40mail.gmail.com.