lmfit for social science data?


Roger Hart

Feb 14, 2025, 10:34:19 PM
to lmfit-py
I am writing to ask for assistance using lmfit with social science data. I am trying to find best-fit curves to project citation trends in quantum information science (QIS) into the short-term future, and I hope to publish in a journal with the highest standards. I have spent a considerable amount of time (weeks) trying to solve what I think should be a fairly simple problem: getting the best fit, as measured by R^2 and related metrics, to justify my projections. I have tried many functions and parameters in curve_fit, and over the past several days I have tried various models and parameters in lmfit. In the following MWE, I have used only ExponentialModel along with a logistic function, both with minimal parameters, but in general nothing has worked. Admittedly, the data for North America and Europe are not smooth, but I am having similar trouble with other data that are smooth. I would most sincerely appreciate any guidance on how best to model this data. Is this the best I can do? I have attached the MWE data, along with the resulting graph. Thanks so very much, Roger

MWE PROGRAM

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import ExponentialModel
from lmfit import Model

# Define custom logistic function
def logistic_func(x, K, a, b):
    return K / (1 + np.exp(-a * (x - b)))

# Create an lmfit model from custom function
LogisticModel = Model(logistic_func, independent_vars=['x'])

df = pd.read_csv("/Users/rhart/Documents/Quantum/Databases/QIST/output/QIS_MWE_data.csv")

def fit_and_plot_exponential(ax, years, shares, region):
    # Use LogisticModel for China, ExponentialModel otherwise
    if region == "China":
        model = LogisticModel
        params = model.make_params(K=max(shares), a=0.1, b=np.median(years))
    else:
        model = ExponentialModel()
        params = model.guess(shares, x=years)

    result = model.fit(shares, params, x=years, nan_policy='propagate')

    # Plot best fit line
    ax.plot(years, result.best_fit, "--")
    ax.plot(years, shares, label=region)
    years_extended = np.linspace(years.min(), 2032, 100)  # Extend projection to 2032
    ax.plot(years_extended, result.eval(x=years_extended), "--")

    return result

fig, ax = plt.subplots()
years = df["Year"].values
for region in df.columns[1:]:
    shares = df[region].values
    result = fit_and_plot_exponential(ax, years, shares, region)
    print(f'\n{region}\n')
    print(result.fit_report())

ax.legend()
plt.show()

OUTPUT

China

[[Model]]
    Model(logistic_func)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 26
    # data points      = 25
    # variables        = 3
    chi-square         = 38.0925752
    reduced chi-square = 1.73148069
    Akaike info crit   = 16.5285891
    Bayesian info crit = 20.1852165
    R-squared          = 0.95580629
[[Variables]]
    K:  31.3865536 +/- 7.20714671 (22.96%) (init = 20.95146)
    a:  0.12580945 +/- 0.02074981 (16.49%) (init = 0.1)
    b:  2018.51003 +/- 3.88125688 (0.19%) (init = 2012)
[[Correlations]] (unreported correlations are < 0.100)
    C(K, b) = +0.9951
    C(a, b) = -0.9486
    C(K, a) = -0.9368

Europe

[[Model]]
    Model(exponential)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 134
    # data points      = 25
    # variables        = 2
    chi-square         = 400.449145
    reduced chi-square = 17.4108324
    Akaike info crit   = 73.3427739
    Bayesian info crit = 75.7805255
    R-squared          = 0.51212748
[[Variables]]
    amplitude:  4.7900e+14 +/- 3.0058e+15 (627.51%) (init = 1.490743e+15)
    decay:      66.6497466 +/- 13.8653110 (20.80%) (init = 64.22144)
[[Correlations]] (unreported correlations are < 0.100)
    C(amplitude, decay) = -1.0000

India

[[Model]]
    Model(exponential)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 3495
    # data points      = 25
    # variables        = 2
    chi-square         = 4.12707795
    reduced chi-square = 0.17943817
    Akaike info crit   = -41.0326547
    Bayesian info crit = -38.5949030
    R-squared          = 0.89166040
[[Variables]]
    amplitude:  2.979e-125 +/- 8.101e-124 (2718.97%) (init = 1.093542e-89)
    decay:     -7.02162915 +/- 0.66163751 (9.42%) (init = -9.820983)
[[Correlations]] (unreported correlations are < 0.100)
    C(amplitude, decay) = -1.0000

North America

[[Model]]
    Model(exponential)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 221
    # data points      = 25
    # variables        = 2
    chi-square         = 449.485870
    reduced chi-square = 19.5428639
    Akaike info crit   = 76.2307148
    Bayesian info crit = 78.6684665
    R-squared          = 0.40977526
[[Variables]]
    amplitude:  6.4483e+13 +/- 4.5367e+14 (703.54%) (init = 1.024293e+13)
    decay:      71.2487149 +/- 17.7617672 (24.93%) (init = 76.19592)
[[Correlations]] (unreported correlations are < 0.100)
    C(amplitude, decay) = -1.0000
QIS_MWE_data.csv
lmfit_MWE.png

Jeremy DeJournett

Feb 14, 2025, 10:43:17 PM
to lmfi...@googlegroups.com

Dr. Hart,

Without additional context or the raw data, it would be tough to say that these fits are necessarily bad fits. The upper two curves (with the fits that look like lines of negative slope) appear to be a consequence of noisy data, nothing more. I'm not sure whether it's possible or appropriate to filter the incoming data, but even a simple weighted-average filter would smooth it into something with better fit characteristics.
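
For instance, a centered weighted moving average is only a few lines of numpy. This is just a minimal sketch (the triangular weights and the window length are assumptions you would tune, not a recommendation):

import numpy as np

def weighted_moving_average(y, window=5):
    # Triangular weights, e.g. [1, 2, 3, 2, 1] for window=5; window is assumed odd
    weights = np.concatenate([np.arange(1, window // 2 + 2),
                              np.arange(window // 2, 0, -1)]).astype(float)
    weights /= weights.sum()
    # mode='same' keeps the output the same length as the input;
    # the first and last few points are only partially weighted
    return np.convolve(y, weights, mode='same')

# e.g. smoothed = weighted_moving_average(shares, window=5), then fit to 'smoothed'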

Although, lacking context, it's hard to say what "better" really means here. What in particular are you unhappy about with these fits? What is your goal? Do you know for certain that they should be modeled, e.g., logistically (because, for example, you know some underlying distribution), or is that a guess based on the data you've seen? If the latter, you may be better off using something like a cubic spline interpolator for short-term predictive power, assuming some underlying continuity in the process you're observing.
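
To illustrate the spline idea, here is a minimal sketch with scipy's smoothing spline (the smoothing factor s and the two-year extrapolation are assumptions; extrapolating a cubic spline far beyond the data is risky):

import numpy as np
from scipy.interpolate import UnivariateSpline

# years and shares as 1-D arrays, with years strictly increasing
spline = UnivariateSpline(years, shares, k=3, s=len(shares))
years_future = np.linspace(years.min(), years.max() + 2, 100)
projection = spline(years_future)  # short-term projection past the last data point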

Additional detail would help!



Roger Hart

Feb 14, 2025, 11:42:15 PM
to lmfit-py
Dear Jeremy (if I may, and please call me Roger),

Thank you very much for your prompt response! As you likely guessed from my message, I am new to curve fitting (although I am trained in mathematics). My goal is simple: I wish to publish this research in one of the very best journals; my thesis is that Chinese QIS, and Chinese science in general, are advancing very rapidly, exponentially in the near term; and I wish to make projections into the near-term future. I therefore must provide the most accurate curve fits possible, and although my curve fits seem to me to be correct, I must justify my projections using R^2 along with other accepted metrics. The slightly modified MWE I have included here shows the results using Nature Index data sets, which are relatively smooth, at least compared to my QIS data. Yet, as you can see, the R^2 values range from 0.992 for China down to 0.06 for North America and lower. Again, I have tried many different functions and parameters, using curve_fit and now lmfit. My concern is this: visually the trends are clear and the curve fits look good, but the metrics, including R^2 and others, seem to me to be terrible. I am concerned that this will undermine my thesis and could even result in rejection of my article. I would tremendously appreciate any guidance you might offer. Again, thanks so very much, Roger

MWE PROGRAM (SLIGHTLY MODIFIED)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import ExponentialModel
from lmfit import Model

# Define custom logistic function
def logistic_func(x, K, a, b):
    return K / (1 + np.exp(-a * (x - b)))

# Create an lmfit model from custom function
LogisticModel = Model(logistic_func, independent_vars=['x'])

# df = pd.read_csv("/Users/rhart/Documents/Quantum/Databases/QIST/output/QIS_MWE_data.csv")
df = pd.read_csv("/Users/rhart/Documents/Quantum/Databases/QIST/output/lmfit_NI_MWE.csv")


def fit_and_plot_exponential(ax, years, shares, region):
    # Use LogisticModel for China, ExponentialModel otherwise
    if region == "China":
        model = LogisticModel
        params = model.make_params(K=max(shares), a=0.1, b=np.median(years))
    else:
        model = ExponentialModel()
        params = model.guess(shares, x=years)

    result = model.fit(shares, params, x=years, nan_policy='propagate')

    # Plot best fit line
    ax.plot(years, result.best_fit, "--")
    ax.plot(years, shares, label=region)
    years_extended = np.linspace(years.min(), 2028, 100)  # Extend projection to 2028

    ax.plot(years_extended, result.eval(x=years_extended), "--")

    return result

fig, ax = plt.subplots()
years = df["Year"].values
for region in df.columns[1:]:
    shares = df[region].values
    result = fit_and_plot_exponential(ax, years, shares, region)
    print(f'\n{region}\n')
    print(result.fit_report())

ax.legend()
plt.savefig("lmfit_NI_MWE", dpi=300)
plt.show()

OUTPUT

Summary of the fit reports for each region (9 data points each, method = leastsq):

Region               Model           R-squared   Parameters (value +/- stderr)
ASEAN                exponential     0.450       amplitude = 1.5874e-10 +/- 1.9539e-09,  decay = -69.41 +/- 29.35
China                logistic_func   0.993       K = 1.4977e+09 +/- 3.5076e+13,  a = 0.1453 +/- 0.0343,  b = 2100.3 +/- 161670
CIS                  exponential     0.131       amplitude = 9.8569e-12 +/- 3.1357e-10,  decay = -63.99 +/- 64.47
East Asia            exponential     0.003       amplitude = 1092.8 +/- 11221,  decay = -1321.8 +/- 8880.2
Europe               exponential     0.624       amplitude = 7.9876e-05 +/- 4.5057e-04,  decay = -104.76 +/- 30.64
India                exponential     0.877       amplitude = 7.0811e-54 +/- 1.2913e-52,  decay = -15.61 +/- 2.19
Latin America        exponential     0.563       amplitude = 1.0403e-20 +/- 1.8589e-19,  decay = -38.48 +/- 13.09
MENA                 exponential     0.763       amplitude = 3.5373e-14 +/- 2.8335e-13,  decay = -53.41 +/- 11.31
North America        exponential     0.067       amplitude = 98.90 +/- 754.84,  decay = -373.22 +/- 526.28
Oceania              exponential     0.529       amplitude = 4.1757e-12 +/- 4.9932e-11,  decay = -60.41 +/- 21.60
South Asia           exponential     0.330       amplitude = 6.4104e-46 +/- 3.6306e-44,  decay = -18.75 +/- 9.82
Sub-Saharan Africa   exponential     0.829       amplitude = 1.192e-126 +/- 6.336e-125,  decay = -6.85 +/- 1.23
Rest of World        exponential     0.738       amplitude = 1.3530e-63 +/- 4.6088e-62,  decay = -13.56 +/- 3.09

Every exponential fit reported C(amplitude, decay) = -1.0000; the China logistic fit reported C(K, b) = +1.0000, C(a, b) = -0.9876, and C(K, a) = -0.9876.
lmfit_NI_MWE.png
lmfit_NI_MWE.csv

Jeremy DeJournett

Feb 14, 2025, 11:49:45 PM
to lmfi...@googlegroups.com
I think it would be prudent to factor a certain amount of measurement noise into your model, perhaps using a Kalman filter backed by an appropriate a posteriori model to remove that noise, and then perform your fit on the filtered data. It's not unreasonable to assume there is some noise in your measurements; given that the data spans the better part of a generation, the measurement methods might have changed a bit from year to year (only speculating here).

The a posteriori model and the parameters of the noise (is it Gaussian, etc.) are things you likely have expertise in.

There are some fantastic resources out there for learning how to program your own Kalman filter; it's fairly simple linear algebra once you pick your covariance and related matrices.
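
For what it's worth, here is a minimal one-dimensional sketch with a random-walk state model (the process variance q and the measurement variance r are assumptions you would tune to your data):

import numpy as np

def kalman_smooth(y, q=0.5, r=4.0):
    # q: process variance (how fast the true level can drift year to year)
    # r: measurement variance (how noisy each yearly value is)
    x_est, p_est = y[0], 1.0                    # initial state estimate and its variance
    filtered = []
    for z in y:
        p_pred = p_est + q                      # predict: random walk adds uncertainty q
        k_gain = p_pred / (p_pred + r)
        x_est = x_est + k_gain * (z - x_est)    # update with the measurement z
        p_est = (1.0 - k_gain) * p_pred
        filtered.append(x_est)
    return np.array(filtered)

# filtered_shares = kalman_smooth(shares); then fit your model to filtered_shares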

Hope this helps,

Jeremy

Jeremy DeJournett

Feb 14, 2025, 11:58:34 PM
to lmfi...@googlegroups.com

Another possibility, if you understand your sources of noise well, is to run a Monte Carlo simulation as part of your fitness function. That way, even if you don't know the exact amount of noise in a given year, you can keep mixing it in as the solver runs. With enough samples, you might converge faster than expected; at the very least, you can cover a huge range of scenario variations this way. If you do this, I'd recommend adding the RNG seed as a parameter, so that when the solver completes you can rerun with exactly the same noise and see whether the result passes a sniff test.
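
As a rough sketch of that idea (the noise level sigma, the number of trials, and the seed are all assumptions), you could refit noise-perturbed copies of the data and look at the spread of the fitted parameters:

import numpy as np

def monte_carlo_fits(model, params, x, y, sigma=1.0, n_trials=200, seed=12345):
    rng = np.random.default_rng(seed)   # fixed seed so the run is reproducible
    samples = {name: [] for name in params}
    for _ in range(n_trials):
        y_noisy = y + rng.normal(0.0, sigma, size=y.shape)
        result = model.fit(y_noisy, params, x=x)
        for name, par in result.params.items():
            samples[name].append(par.value)
    # mean and spread of each parameter over the noisy refits
    return {name: (np.mean(vals), np.std(vals)) for name, vals in samples.items()}

# e.g. spread = monte_carlo_fits(LogisticModel, params, years, shares, sigma=1.0)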

I'm hoping someone smarter than me at this will chime in with a simpler way, but I've found this method pretty effective in the past.

Roger Hart

Feb 15, 2025, 12:13:51 PM
to lmfit-py
Dear Jeremy,

Thank you very much indeed for your very thorough and helpful suggestions! I very much appreciate it! I will certainly look into these solutions. I must confess, however, that what I was hoping to find was a very simple, straightforward solution: the most technically accurate, widely accepted approach, one that would produce results for which metrics such as R^2 would confirm the validity of the curve fits. I am a little concerned that the more I process the data, the less reliable the results may seem. So yes, I would very much appreciate any additional suggestions. And again, Jeremy, thanks so very much for your very detailed suggestions! I do appreciate it!

Very best, 

Roger

Matt Newville

Feb 15, 2025, 1:07:06 PM
to lmfi...@googlegroups.com

Hi Roger,

As Jeremy suggested, understanding and characterizing the fluctuations and uncertainties in the data would be helpful. For example, your first CSV file had 11 significant digits, which is almost certainly far more precision than the noise level supports. If I understand correctly, this data is "publication year" for articles published on some topic and then selected by "region". In my experience, such counts could easily be influenced by the timing of related conferences and funding cycles. I might also wonder how clear the meaning of "region" is. It sounds like you are aware of all these and other subtleties.
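
If you can estimate a per-point uncertainty, lmfit will take it as weights, so the reported statistics actually reflect the noise. A minimal sketch, reusing the variable names from your MWE (the sigma values here are a made-up, assumed noise estimate):

import numpy as np

# assumed 1-sigma uncertainty for each yearly value, same length as shares
sigma = np.full(len(shares), 2.0)

result = model.fit(shares, params, x=years, weights=1.0 / sigma)
# with realistic sigma, a reduced chi-square near 1 means the model describes
# the data to within its stated uncertainties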

For fitting itself, having a single model is vastly preferable to different models.  If you are trying to show that “Category C” is different from others, using a different model for that data might not be super-persuasive.   I suggest thinking about the Model for its ability to “explain the phenomenon”, which is a bit different from “heuristically match the data”.   That is, the Model should imply some Theory of why (not just how) the data is changing.

To me, Exponential implies compounding growth, which is different from Accelerating growth.   I might suggest a Quadratic model, implying “Offset, Velocity, and Acceleration” of change.    Exponential growth implies more of an explosion. It may be a popular notion that a field or technique (or App or Meme or whatever) is experiencing Exponential growth when it is “only” Accelerating (and may reach a constant velocity).

An advantage of a Quadratic model is that you can use Regression methods.  We’re all in for non-linear least-squares fitting, but some problems are linear (in the parameters) and so can use Regression.
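
For example, with lmfit's built-in QuadraticModel (a sketch only, reusing the variable names from your MWE):

from lmfit.models import QuadraticModel

qmodel = QuadraticModel()                # y = a*x**2 + b*x + c: acceleration, velocity, offset terms
qparams = qmodel.guess(shares, x=years)
qresult = qmodel.fit(shares, qparams, x=years)
print(qresult.fit_report())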

I would say that none of the data you show is convincingly Logistic, which would imply a Step-like change. That could be possible: Group X enters the funding of this field, or Discovery Z makes certain techniques 10x cheaper/faster. Those things do happen in research sometimes ;)

Anyway, good luck with your analysis and findings!

--Matt

Roger Hart

Feb 17, 2025, 3:25:10 PM
to lmfi...@googlegroups.com
Dear Matt and Jeremy,

Thank you very much indeed for all your expert help!!! I think what I am looking for is exponential growth with a variable exponent. My first attempt is below. It seems to give better results, but there is still considerable room for improvement. I would tremendously appreciate any further suggestions!

Thanks again,

Very best,

Roger

PROGRAM

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import lmfit

# Define the variable exponent exponential function
def variable_exp(x, A, alpha0, beta, t0):
    alpha_x = alpha0 / (1 + beta * (x - t0))  # Variable exponent
    return A * np.exp(alpha_x * (x - t0))


# Create an lmfit model
VariableExpModel = lmfit.Model(variable_exp, independent_vars=['x'])

# Load data
df = pd.read_csv("/Users/rhart/Documents/Quantum/Databases/QIST/output/QIS_MWE_data.csv").dropna()

# Function to fit and plot
def fit_and_plot(ax, years, shares, region):
    """Fits the variable exponential model and plots the results."""
   
    if len(years) < 5:  # Not enough data to fit
        print(f"Skipping {region}, insufficient data points.")
        return None

    # Define initial parameter estimates
    A_init = shares[0] if shares[0] > 0 else np.mean(shares)  # Start with first value
    alpha0_init = max(0.01, np.abs(np.gradient(np.log(shares + 1)).mean()))  # Estimate initial growth rate
    beta_init = 0.01  # Small decay factor

    # Define parameter constraints
    params = VariableExpModel.make_params(A=A_init, alpha0=alpha0_init, beta=beta_init, t0=years.min())
    params['A'].set(min=0)  # A must be positive
    # params['alpha0'].set(min=0)  # Growth rate must be positive
    params['beta'].set(min=0, max=1)  # Decay factor must be reasonable
    params['t0'].set(value=years.min(), vary=False)  # Fix t0 to prevent drift

    # Fit the model
    try:
        result = VariableExpModel.fit(shares, params, x=years, nan_policy='omit')
    except Exception as e:
        print(f"Fit failed for {region}: {e}")
        return None

    # Plot actual data and best fit
    ax.plot(years, shares, 'o', label=region)
    ax.plot(years, result.best_fit, "--", label=f"{region} Fit")

    # Extend projection to 2032

    years_extended = np.linspace(years.min(), 2032, 100)
    ax.plot(years_extended, result.eval(x=years_extended), "--", alpha=0.6)

    return result

# Setup plot
fig, ax = plt.subplots(figsize=(10, 6))

years = df["Year"].values

# Iterate over each region and fit model

for region in df.columns[1:]:
    shares = df[region].values
    result = fit_and_plot(ax, years, shares, region)
    if result:
        print(f'\n{region} Fit Report:\n')
        print(result.fit_report())

ax.legend()
ax.set_xlabel("Year")
ax.set_ylabel("Shares")
ax.set_title("Variable Exponential Fit with Decreasing Growth Rate")
plt.show()

RESULTS

China Fit Report:

[[Model]]
    Model(variable_exp)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 35
    # data points      = 25
    # variables        = 3
    chi-square         = 36.6713186
    reduced chi-square = 1.66687812
    Akaike info crit   = 15.5779779
    Bayesian info crit = 19.2346053
    R-squared          = 0.95745519
[[Variables]]
    A:       2.39023918 +/- 0.55894676 (23.38%) (init = 0.3092917)
    alpha0:  0.16630843 +/- 0.05183735 (31.17%) (init = 0.1373607)
    beta:    0.03474102 +/- 0.01687994 (48.59%) (init = 0.01)
    t0:      2000 (fixed)
[[Correlations]] (unreported correlations are < 0.100)
    C(alpha0, beta) = +0.9822
    C(A, alpha0)    = -0.9547
    C(A, beta)      = -0.8846

Europe Fit Report:

[[Model]]
    Model(variable_exp)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 58
    # data points      = 25
    # variables        = 3
    chi-square         = 400.449145
    reduced chi-square = 18.2022339
    Akaike info crit   = 75.3427739
    Bayesian info crit = 78.9994014
    R-squared          = 0.51212748
[[Variables]]
    A:       44.4825774 +/- 2.48562521 (5.59%) (init = 44.11792)
    alpha0: -0.01500366 +/- 0.01171201 (78.06%) (init = 0.02351537)
    beta:    5.1431e-13 +/- 0.00509658 (990953569712.16%) (init = 0.01)
    t0:      2000 (fixed)
[[Correlations]] (unreported correlations are < 0.100)
    C(alpha0, beta) = -0.9621
    C(A, alpha0)    = -0.8282
    C(A, beta)      = +0.6916

India Fit Report:

[[Model]]
    Model(variable_exp)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 111
    # data points      = 25
    # variables        = 3
    chi-square         = 4.12707796
    reduced chi-square = 0.18759445
    Akaike info crit   = -39.0326546
    Bayesian info crit = -35.3760271
    R-squared          = 0.89166040
[[Variables]]
    A:       0.14999824 +/- 0.11314413 (75.43%) (init = 0.1334867)
    alpha0:  0.14241291 +/- 0.08476284 (59.52%) (init = 0.07943498)
    beta:    1.1869e-12 +/- 0.00632034 (532515755435.27%) (init = 0.01)
    t0:      2000 (fixed)
[[Correlations]] (unreported correlations are < 0.100)
    C(alpha0, beta) = -0.9867
    C(A, alpha0)    = -0.9716
    C(A, beta)      = +0.9217

North America Fit Report:

[[Model]]
    Model(variable_exp)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 21
    # data points      = 25
    # variables        = 3
    chi-square         = 434.553944
    reduced chi-square = 19.7524520
    Akaike info crit   = 77.3861066
    Bayesian info crit = 81.0427341
    R-squared          = 0.42938253
[[Variables]]
    A:       43.4547080 +/- 3.20150939 (7.37%) (init = 41.81545)
    alpha0: -0.03129501 +/- 0.02964241 (94.72%) (init = 0.01542437)
    beta:    0.04975881 +/- 0.08146205 (163.71%) (init = 0.01)
    t0:      2000 (fixed)
[[Correlations]] (unreported correlations are < 0.100)
    C(alpha0, beta) = -0.9673
    C(A, alpha0)    = -0.8368
    C(A, beta)      = +0.7006





lmfit_variable_exponent.png

Jeremy DeJournett

Feb 19, 2025, 9:24:41 PM
to lmfi...@googlegroups.com
I will reiterate that you have to model the measurement noise, or you will not be able to get the R^2 you want with the data you have. This will involve producing noise-adjusted data for the final fit, which you will have to make a case for at publication time. In my experience, sufficient rigor in the characterization of the noise should pass peer review.

Best of luck!

Jeremy 

Roger Hart

Feb 20, 2025, 4:10:27 PM
to lmfit-py
Dear Jeremy,

Thank you very much for your helpful suggestions! I very much appreciate it! I do wonder whether there might be fundamental differences between modeling social science data and modeling natural science data. Could you possibly suggest preeminent experts in modeling social science data whom I might contact, or the best published works on the subject? I have looked, but I have not found much that was helpful. Again, I am new to this field, but I do have good mathematical training.

Again, thank you so very much,

Very best,

Roger