Easiest way to specify multiple independent variables from a Pandas data frame?

Werner Robitza

Sep 22, 2015, 7:55:54 AM

to lmfit-py

I'm trying to fit a model

 model = Model(fun, independent_vars=['A', 'B'])

where A and B are Series in a Pandas DataFrame. The model should be fit against the Series "target" from that dataframe.

How do I need to prepare the actual data that I pass to the fit() function?

From the FAQ I see that I probably need to flatten it, so does this mean I need to first zip all the series, then flatten that array? Like:

data = zip(df['A'].tolist(), df['B'].tolist())
data = np.array(data).flatten()

But then my independent variable data is four times longer than the target value list.

Could someone please give a simple example on how to curve fit with multiple independent variables?

Thanks

Matt Newville

Sep 22, 2015, 4:54:11 PM

to Werner Robitza, lmfit-py

Hi Werner,

On Tue, Sep 22, 2015 at 6:55 AM, Werner Robitza <werner....@gmail.com> wrote:

I'm trying to fit a model

model = Model(fun, independent_vars=['A', 'B'])

where A and B are Series in a Pandas DataFrame. The model should be fit against the Series "target" from that dataframe.
How do I need to prepare the actual data that I pass to the fit() function?

The 'independent_vars' argument to Model identifies which arguments of the model function are independent variables. It doesn't specify (or care about) the types or shapes of those values. Identifying an argument as an independent variable essentially says "do not turn this function argument into a Parameter".

So it should be fine to identify 'A' and 'B' as independent variables and then pass in DataFrames for those values when you evaluate or fit the model.

The data you pass in to model.fit() needs to be "array like", but pandas Series should be OK in that sense.

From the FAQ I see that I probably need to flatten it, so does this mean I need to first zip all the series, then flatten that array? Like:

data = zip(df['A'].tolist(), df['B'].tolist())
data = np.array(data).flatten()

That should not be necessary. The **output** of Model's built-in objective function needs to be a 1-d array, but it does (simplified):

diff = model.eval(params, ...) - data
return numpy.asarray(diff).ravel()

which should take care of the need to flatten and convert the "data" to a numpy array.
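To make the flattening step concrete, here is a small numpy-only sketch. The `residual` helper is a hypothetical stand-in for the simplified objective above, not lmfit's actual code, and the arrays are made-up values.

```python
import numpy as np


def residual(model_values, data):
    """Simplified stand-in for the objective: flattened (model - data)."""
    diff = np.asarray(model_values) - np.asarray(data)
    return diff.ravel()


# Two-dimensional "data" and a model result of the same shape.
data = np.arange(6.0).reshape(2, 3)
model_values = data + 0.5

r = residual(model_values, data)
print(r.shape)  # (6,)
```

The point is only that the residual ends up 1-d no matter what shape the data and model share, so there should be no need to flatten the inputs by hand.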

Your model function should calculate a result that matches the data in shape and type. But I think it should be OK to calculate a numpy array or pandas Series. That is, I think those are close enough to subtract one from the other.

But then my independent variable data is four times longer than the target value list.
Could someone please give a simple example on how to curve fit with multiple independent variables?

It should be OK to have multidimensional data (and model) and use pandas Series for the data. If that's not working, giving more details of what you've tried would be helpful.

--Matt Newville

Werner Robitza

Sep 23, 2015, 3:54:14 AM

to lmfit-py, werner....@gmail.com, newv...@cars.uchicago.edu

Hi Matt,

Thanks for your reply. I think I understand now. Perhaps it was the lack of a concrete example that made me think in more complicated terms than necessary.

This worked for me. I really just had to pass the Series as keyword arguments to the fit function:

import pandas as pd
import numpy as np
from lmfit import Model

df = pd.DataFrame({
    'A': pd.Series([1, 1, 1, 2, 2, 2, 2]),
    'B': pd.Series([5, 4, 6, 6, 5, 6, 5]),
    'target': pd.Series([87.79, 40.89, 215.30, 238.65, 111.15, 238.65, 111.15]),
})


def fun(A, B, p1=1, p2=1):
    # A and B are the independent variables; p1 and p2 are the fit parameters
    return p1 * np.exp(A) + p2 * np.exp(B)


model = Model(fun, independent_vars=['A', 'B'])

# The data comes first; the independent variables are passed by keyword
fit = model.fit(df['target'], A=df['A'], B=df['B'])

# Evaluate the model at the best-fit parameter values
fit.eval()
