Composite model: fit() routine adding more prefixed parameters

31 views

Skip to first unread message

nama...@colorado.edu

unread,

Apr 10, 2019, 2:35:15 PM4/10/19

to lmfit-py

Hello,

I am having unexpected runtime behavior from the composite model functionality. My code is similar to the built-in example https://lmfit.github.io/lmfit-py/builtin_models.html#example-3-fitting-multiple-peaks-and-using-prefixes, in that I am summing the same model multiple times with different prefixes. However, with my example, the built-in lmfit eval() and fit() routines appear to double-up the prefixes to produce an unwanted larger set of fit parameters. This double-prefix problem doesn't seem to happen in the linked Example 3, and I don't understand what in my code is causing this difference.

My goal is to fit an arbitrary number of parallel spectroscopy models, where each model has the same governing equation and small set of floatable parameters, but calculated using a different large set of unique fixed parameters pulled from a database elsewhere on the computer. Therefore my lmfit model must handle some strings to locate and appropriately pull from the database file. My current strategy is to pass these strings in a particularly-formatted keyword argument. But this particular implementation backfires because of the new parameters with both the prefixes.

Example Python code:

It should run directly from the command-line, because this example doesn't actually pull from an external database file, but otherwise keeps the same architecture and shows the same bug.

-----------------------------------------------------------------

import numpy as np
from lmfit import Model

# Minimization model, to be added arbitrary number of times into composite model
def spectra_debug(xx, pressure, pathlength, **kwargs):
    '''
    Simplistic model with similar setup characteristics to hitran.org/hapi
    '''

    # Pass string information with each model.
    # In full version this string would point to data file
    # of thousands of fixed additional model parameters,
    # and relevant information for parsing subset of data file.
    for key, values in kwargs.items():
        strings = key.split('_')
        if 'hitran' in strings[1]:
            mol_id = int(strings[-2])

    # Toy model, where string information from kwargs influences model output.
    absorp = np.zeros(len(xx))
    if mol_id is 1:
        # make spectrum at 0
        absorp += pathlength * np.exp(-(xx - xx[0])**2 / pressure)
    elif mol_id is 6:
        # make spectrum at end
        absorp -= pathlength * np.exp(-(xx - xx[-1])**2 / pressure)

    return absorp

# Situation specific data and fitting parameters
x_data = np.linspace(7100,7110,1500)
y_debug = np.linspace(-1,1,len(x_data))
molec_list = [['H2O',
[1, 1],
[0.8289473684210527, False],
[100, False]],
['CH4',
[6, 1],
[0.02368421052631579, False],
[91.44, True]]]


# lmfit composite model setup
for ii, molec in enumerate(molec_list):
    name = molec[0]
    mol_id, iso = molec[1]
    p_atm, float_p = molec[2]
    pathlength, float_l = molec[3]
    if ii is 0:
        gmodel = Model(spectra_debug, prefix = name + '_')
        params = gmodel.make_params()
    else:
        model_new = Model(spectra_debug, prefix = name + '_')
        params.update(model_new.make_params())
        gmodel += model_new
    params.add(name + '_hitran_' + repr(mol_id) + '_' + repr(iso),value=0)

    # Update the guess + limits on parameters
    params[name + '_pressure'].set(value = p_atm,
                          min=0, max=1, vary = float_p)
    params[name + '_pathlength'].set(value = pathlength,
                          min=0, max=2*pathlength, vary = float_l)

# And run lmfit
out = gmodel.fit(y_debug, xx=x_data, params=params)
out.best_values # this has 2x as many parameters as input params

--------------------------------------------------------------------------------------------------------------------

This code produces the following input parameters

And a larger list of output parameters (new unwanted parameters shown in red brackets, with both of the prefixes)

And if I calculate the individual model components, I expect the 'H2O (1)' input to produce the positive signal and the 'CH4 (6)' component to produce the negative signal, but the opposite is true

--------------------------------------------

comps = gmodel.eval_components(xx=x_data, params=params)

import matplotlib.pyplot as plt

plt.figure(); plt.plot(x_data, comps['H2O_'],label = 'H2O')

plt.plot(x_data, comps['CH4_'],label = 'CH4'); plt.legend()

-------------------------------------

My questions:

1) What can I change to avoid creating these double-prefix parameters (and do they really exist)?

2) Do you have a different recommended method of passing strings to models to handle this external-database situation?

Thank you,

Nate

Using lmfit 0.9.13

Matt Newville

unread,

Apr 10, 2019, 10:25:36 PM4/10/19

to lmfit-py

Hi Nate,

Sorry for the trouble. I am having a hard time following what you are doing.

On Wed, Apr 10, 2019 at 1:35 PM <nama...@colorado.edu> wrote:

Hello,

I am having unexpected runtime behavior from the composite model functionality. My code is similar to the built-in example https://lmfit.github.io/lmfit-py/builtin_models.html#example-3-fitting-multiple-peaks-and-using-prefixes, in that I am summing the same model multiple times with different prefixes.

OK... your code seems pretty different from that example to me.

What is supposed to happen If `hitran` is not in `kwargs`? It looks like that case will fail. I think that means you really want `hitran` to not be buried in `kwargs` but a real parameter, probably a keyword parameter with a default value. I have no idea what `hitran` is meant to signify.

# Situation specific data and fitting parameters
x_data = np.linspace(7100,7110,1500)
y_debug = np.linspace(-1,1,len(x_data))
molec_list = [['H2O',
[1, 1],
[0.8289473684210527, False],
[100, False]],
['CH4',
[6, 1],
[0.02368421052631579, False],
[91.44, True]]]


# lmfit composite model setup
for ii, molec in enumerate(molec_list):
    name = molec[0]
    mol_id, iso = molec[1]
    p_atm, float_p = molec[2]
    pathlength, float_l = molec[3]

Why this structure for molec_list? Flat is better than nested. Or, with 7 entries use an object or a named tuple or something like

molec_list = [('H2O', 1, 1, 0.8289473684210527, False, 100, False),
('CH4', 6, 1, 0.02368421052631579, False, 91.44, True)]

gmodel = None

for name, mod_id, iso, p_atm, float_p, pathlength, float_l in molec_list:

if gmodel is None:

gmodel = Model(spectra_debug, prefix = name + '_')
params = gmodel.make_params()

else:

model_new = Model(spectra_debug, prefix = name + '_')
        params.update(model_new.make_params())
        gmodel += model_new

   if ii is 0:
        gmodel = Model(spectra_debug, prefix = name + '_')
        params = gmodel.make_params()
    else:
        model_new = Model(spectra_debug, prefix = name + '_')
        params.update(model_new.make_params())
        gmodel += model_new
    params.add(name + '_hitran_' + repr(mol_id) + '_' + repr(iso),value=0)

I think you would be better off using `mol_id` as a parameter to `spectra_debug` and just not varying that.

Do you need `hitran` for something else? I would avoid a) expecting `kwargs` to have any values, and b) turning

an int to a string only to parse it back to an it.

    # Update the guess + limits on parameters
    params[name + '_pressure'].set(value = p_atm,
                          min=0, max=1, vary = float_p)
    params[name + '_pathlength'].set(value = pathlength,
                          min=0, max=2*pathlength, vary = float_l)

# And run lmfit
out = gmodel.fit(y_debug, xx=x_data, params=params)
out.best_values # this has 2x as many parameters as input params
--------------------------------------------------------------------------------------------------------------------

This code produces the following input parameters
And a larger list of output parameters (new unwanted parameters shown in red brackets, with both of the prefixes)

Sorry, I don't really understand what you're doing here.

And if I calculate the individual model components, I expect the 'H2O (1)' input to produce the positive signal and the 'CH4 (6)' component to produce the negative signal, but the opposite is true

I have no idea...

--------------------------------------------
comps = gmodel.eval_components(xx=x_data, params=params)
import matplotlib.pyplot as plt
plt.figure(); plt.plot(x_data, comps['H2O_'],label = 'H2O')
plt.plot(x_data, comps['CH4_'],label = 'CH4'); plt.legend()
-------------------------------------

My questions:
1) What can I change to avoid creating these double-prefix parameters (and do they really exist)?

Sorry, but I don't understand what that is asking. I'd strongly suggest cleaning up the code, but then posting full examples.

2) Do you have a different recommended method of passing strings to models to handle this external-database situation?

Hm is there an external database? Maybe I'm missing some important background information or something...