fitting multiple gaussians (240x) and to find fwhm

Rocky

unread,

Nov 11, 2014, 11:38:40 PM11/11/14

to lmfi...@googlegroups.com

Hello All,

I am quite new to python and I came across this wonderful imfit routines. Please find the test.txt which is a image file.

https://drive.google.com/file/d/0B6sUnnbyNGuOQUF3VFBjbnNvSjg/view?usp=sharing

I was writing a code that could extract 24 horizontal profiles and calculate 10 fwhm for gaussian shaped peaks for each profile. The true value of fwhm is close to 1.0.

In the following code, there is some redundancy as I could use for-loops for repetitive stuff.

I am looking how to optimize the code for so many Gaussian fits and extract the true value.

Any advise/help greatly appreciated.


import numpy as np
from lmfit.models import LorentzianModel, GaussianModel, VoigtModel, StepModel, LinearModel
import matplotlib.pyplot as plt
from StringIO import StringIO


#Read and display the image
fob= open('test.txt', 'r')  
data_read= fob.read()      
data = np.genfromtxt(StringIO(data_read), delimiter=",")[:]
plt.imshow(data)
plt.show()

# to extract a profile from the image
def profile(x_value,y_start,y_end):
    x = np.array(range(len(np.arange(y_start,y_end))), dtype=np.float)
    y = np.array(data[x_value,y_start:y_end], dtype=np.float)
    return x, y

# Here I took just one profile but actually I need to perform fit on 24 profiles 
# x = np.linspace(46, 964, 24) 

x, y  =  profile(485, 60, 950)
plt.plot(x,y)
plt.show()


MODEL = 'gauss'
#MODEL = 'loren'
#MODEL = 'voigt'
#gamma_free = False
gamma_free = True

if MODEL.lower().startswith('g'):
    mod1  = GaussianModel(prefix='g1_')
    mod2  = GaussianModel(prefix='g2_')
    mod3  = GaussianModel(prefix='g3_')
    mod4  = GaussianModel(prefix='g4_')
    mod5  = GaussianModel(prefix='g5_')
    mod6  = GaussianModel(prefix='g6_')
    mod7  = GaussianModel(prefix='g7_')
    mod8  = GaussianModel(prefix='g8_')
    mod9  = GaussianModel(prefix='g9_')
    mod10  = GaussianModel(prefix='g10_')
    mod11  = GaussianModel(prefix='g11_')
    mod12  = GaussianModel(prefix='g12_')

elif MODEL.lower().startswith('l'):
    mod1  = LorentzianModel(prefix='l1_')
    mod2  = LorentzianModel(prefix='l2_')
    mod3  = LorentzianModel(prefix='l3_')
    mod4  = LorentzianModel(prefix='l4_')
    mod5  = LorentzianModel(prefix='l5_')
    mod6  = LorentzianModel(prefix='l6_')
    mod7  = LorentzianModel(prefix='l7_')
    mod8  = LorentzianModel(prefix='l8_')
    mod9  = LorentzianModel(prefix='l9_')
    mod10  = LorentzianModel(prefix='l10_')
    mod11  = LorentzianModel(prefix='l11_')
    mod12  = LorentzianModel(prefix='l12_')     
elif MODEL.lower().startswith('v'):
    mod1  = VoigtModel(prefix='v1_')
    mod2  = VoigtModel(prefix='v2_')
    mod3  = VoigtModel(prefix='v3_')
    mod4  = VoigtModel(prefix='v4_')
    mod5  = VoigtModel(prefix='v5_')
    mod6  = VoigtModel(prefix='v6_')
    mod7  = VoigtModel(prefix='v7_')
    mod8  = VoigtModel(prefix='v8_')
    mod9  = VoigtModel(prefix='v9_')
    mod10  = VoigtModel(prefix='v10_')
    mod11  = VoigtModel(prefix='v11_')
    mod12  = VoigtModel(prefix='v12_')     

def index_of(arrval, value):
    "return index of array *at or below* value "
    if value < min(arrval):  return 0
    return max(np.where(arrval<=value)[0])

ix1 = index_of(x, 45)
ix2 = index_of(x, 100)
ix3 = index_of(x, 180)
ix4 = index_of(x, 240)
ix5 = index_of(x, 340)
ix6 = index_of(x, 420)
ix7 = index_of(x, 500)
ix8 = index_of(x, 580)
ix9 = index_of(x, 660)
ix10 = index_of(x,740 )
ix11 = index_of(x,820 )
ix12 = index_of(x,900 )


pars1 = mod1.guess(y[:ix1], x=x[:ix1])
pars2 = mod2.guess(y[ix1:ix2], x=x[ix1:ix2])
pars3 = mod3.guess(y[ix2:ix3], x=x[ix2:ix3])
pars4 = mod4.guess(y[ix3:ix4], x=x[ix3:ix4])
pars5 = mod5.guess(y[ix4:ix5], x=x[ix4:ix5])
pars6 = mod6.guess(y[ix5:ix6], x=x[ix5:ix6])
pars7 = mod7.guess(y[ix6:ix7], x=x[ix6:ix7])
pars8 = mod8.guess(y[ix7:ix8], x=x[ix7:ix8])
pars9 = mod9.guess(y[ix8:ix9], x=x[ix8:ix9])
pars10 = mod10.guess(y[ix9:ix10], x=x[ix9:ix10])
pars11 = mod11.guess(y[ix10:ix11], x=x[ix10:ix11])
pars12 = mod12.guess(y[ix11:ix12], x=x[ix11:ix12])


pars = pars1 + pars2 + pars3 + pars4 + pars5 + pars6 + pars7 + pars8 + pars9 + pars10 + pars11 + pars12
mod = mod1 + mod2 + mod3 + mod4 + mod5 + mod6 + mod7 + mod8 + mod9 + mod10 + mod11 + mod12

out = mod.fit(y, pars, x=x)

print(out.fit_report(min_correl=0.5))

plt.plot(x, y)
plt.plot(x, out.init_fit, 'k--')
plt.plot(x, out.best_fit, 'r-')
plt.show()

Matt Newville

unread,

Nov 12, 2014, 11:07:52 AM11/12/14

to Rocky, lmfit-py

Hi Rocky,

On Tue, Nov 11, 2014 at 10:38 PM, Rocky <xraysp...@googlemail.com> wrote:
> Hello All,
>
> I am quite new to python and I came across this wonderful imfit routines.
> Please find the test.txt which is a image file.
> https://drive.google.com/file/d/0B6sUnnbyNGuOQUF3VFBjbnNvSjg/view?usp=sharing
>
> I was writing a code that could extract 24 horizontal profiles and calculate
> 10 fwhm for gaussian shaped peaks for each profile. The true value of fwhm
> is close to 1.0.
>
> In the following code, there is some redundancy as I could use for-loops for
> repetitive stuff.
>
> I am looking how to optimize the code for so many Gaussian fits and extract
> the true value.
>
> Any advise/help greatly appreciated.
>
>
> import numpy as np
> from lmfit.models import LorentzianModel, GaussianModel, VoigtModel,
> StepModel, LinearModel
> import matplotlib.pyplot as plt
> from StringIO import StringIO
>
>
> #Read and display the image
> fob= open('test.txt', 'r')
> data_read= fob.read()
> data = np.genfromtxt(StringIO(data_read), delimiter=",")[:]
> plt.imshow(data)
> plt.show()

I think this could be just

data = np.loadtxt('text.txt', delimiter=',')

> # to extract a profile from the image
> def profile(x_value,y_start,y_end):
> x = np.array(range(len(np.arange(y_start,y_end))), dtype=np.float)
> y = np.array(data[x_value,y_start:y_end], dtype=np.float)
> return x, y
>
> # Here I took just one profile but actually I need to perform fit on 24
> profiles
> # x = np.linspace(46, 964, 24)
>
> x, y = profile(485, 60, 950)
> plt.plot(x,y)
> plt.show()
>
>
> MODEL = 'gauss'
> #MODEL = 'loren'
> #MODEL = 'voigt'
> #gamma_free = False
> gamma_free = True
>
> if MODEL.lower().startswith('g'):
> mod1 = GaussianModel(prefix='g1_')
> mod2 = GaussianModel(prefix='g2_')

Here, you might consider replacing the if-elif switch with

modelmaker = GaussianModel
#modelmaker = LorentzianModel

peak1 = modelmaker(prefix='p1_')
peak2 = modelmaker(prefix='p2_')
...

I also think that using a bit of structure to add peaks to a model
might simplify your code. Perhaps something like (warning: untested
sketch, you can probably improve this):

peak_bounds = [(0, 45), (45, 100), (100, 180), (180, 240), ....]

model = None
params = None

for i, bounds in enumerate(peak_bounds):
prefix = 'p%d_' % (i+1)
xmin = index_of(x, bounds[0])
xmax = index_of(x, bounds[1])
mod = modelmake(prefix=prefix)
par = mod.guess(y[xmin:ymax], x=x[xmin:xmax])
if model is None:
model = mod
params = par
else:
model = model + mod
params = params + par

You could probably turn that into an "add_peak" function.

Other comments:

1. Are you sure that you should be treating this data as a set of 240
independent peaks, or is there some relation between the peaks?
Fitting 240 independent peaks in one fit will certainly be
challenging!!

2. We don't yet have models for 2D peaks, but that might be what you
actually want.

3. It seems like some of the peaks are saturated at 65536 or that the
data rolled over from low values to ~65500 -- that could be a data
collection / storage issue. But it might make it hard to get good
fits.

--Matt

Rocky

unread,

Nov 12, 2014, 5:07:38 PM11/12/14

to lmfi...@googlegroups.com, xraysp...@googlemail.com, newv...@cars.uchicago.edu

Thank so much Matt for your prompt reply. I mistakenly wrote the true fwhm value as 1,0 but it is about 40.

Other comments:

1. Are you sure that you should be treating this data as a set of 240
independent peaks, or is there some relation between the peaks?
Fitting 240 independent peaks in one fit will certainly be
challenging!!

- I need to fit 240 independent peaks but not necessarily in a single fit and there is no relationship between those peaks.

Do you think it makes sense if I extract just a single peak and then fit a model and do this for 240 peaks to determine fwhm ? for example using doc_model1.py or doc_peakmodels.py. Is there a way to introduce some additional variables to control the shape to optimize the fit.

from numpy import sqrt, pi, exp, linspace, loadtxt

from lmfit import Model

import matplotlib.pyplot as plt

x, y = profile(110, 115, 180)

def gaussian(x, amp, cen, wid):

"1-d gaussian: gaussian(x, amp, cen, wid)"

return (amp/(sqrt(2*pi)*wid)) * exp(-(x-cen)**2 /(2*wid**2))

gmod = Model(gaussian)

result = gmod.fit(y, x=x, amp=38000, cen=40, wid=65)

print(result.fit_report())

plt.plot(x, y, 'bo')

plt.plot(x, result.init_fit, 'k--')

plt.plot(x, result.best_fit, 'r-')

plt.show()

Also, I need to find the absolute position of fwhm ? Is there a way to do it

2. We don't yet have models for 2D peaks, but that might be what you
actually want.

- I have feeling that I don't need 2d models as i will be extract data just in one direction (along x axis).

3. It seems like some of the peaks are saturated at 65536 or that the
data rolled over from low values to ~65500 -- that could be a data
collection / storage issue. But it might make it hard to get good
fits.

- Thats bit weird. I think its more data storage issue but that can be resolve.

Thank you again for all your inputs !

Matt Newville

unread,

Nov 13, 2014, 10:31:10 AM11/13/14

to Rocky, lmfit-py

Hi,

On Wed, Nov 12, 2014 at 4:07 PM, Rocky <xraysp...@googlemail.com> wrote:
> Thank so much Matt for your prompt reply. I mistakenly wrote the true fwhm
> value as 1,0 but it is about 40.
>
>> Other comments:
>>
>> 1. Are you sure that you should be treating this data as a set of 240
>> independent peaks, or is there some relation between the peaks?
>> Fitting 240 independent peaks in one fit will certainly be
>> challenging!!
>
>
> - I need to fit 240 independent peaks but not necessarily in a single fit
> and there is no relationship between those peaks.
> Do you think it makes sense if I extract just a single peak and then fit a
> model and do this for 240 peaks to determine fwhm ? for example using
> doc_model1.py or doc_peakmodels.py. Is there a way to introduce some
> additional variables to control the shape to optimize the fit.

If the peaks can be isolated and the fits can be done independently,
that is probably wise.

>
> from numpy import sqrt, pi, exp, linspace, loadtxt
> from lmfit import Model
> import matplotlib.pyplot as plt
>
> x, y = profile(110, 115, 180)
>
> def gaussian(x, amp, cen, wid):
> "1-d gaussian: gaussian(x, amp, cen, wid)"
> return (amp/(sqrt(2*pi)*wid)) * exp(-(x-cen)**2 /(2*wid**2))
>
> gmod = Model(gaussian)
> result = gmod.fit(y, x=x, amp=38000, cen=40, wid=65)
>
> print(result.fit_report())
>
> plt.plot(x, y, 'bo')
> plt.plot(x, result.init_fit, 'k--')
> plt.plot(x, result.best_fit, 'r-')
> plt.show()
>
> Also, I need to find the absolute position of fwhm ? Is there a way to do
> it

If you use the built-in GaussianModel:

from lmfit.models import GaussianModel
gmod = GaussianModel()
result = gmod.fit(y, x=x, amplitude=38000, center=40, sigma=65)

(this has variable parameters called "amplitude", "center", and
"sigma" like your "amp", "cen", and "wid"), this calculates and
reports (including uncertainty) "fwhm".

>> 2. We don't yet have models for 2D peaks, but that might be what you
>> actually want.
>
> - I have feeling that I don't need 2d models as i will be extract data just
> in one direction (along x axis).

OK.

--Matt

Reply all

Reply to author

Forward