
Dec 22, 2023, 11:10:14 PM

to lmfit-py

Dear all,

I am getting obviously incorrect results from eval_uncertainty() in my application of lmfit with an expression model.

The best fit of the model for the predicted y compared with the observations appears to be a good result, but the prediction intervals or confidence intervals of the predicted y from eval_uncertainty() appear to be very wrong.

I am not sure if I am doing something wrong, or if it is a bug in lmfit. Any advice is appreciated.

Here are my questions:

How can I get correct results for the 95% prediction interval for this model?

How do I specify whether the output is prediction intervals or confidence intervals (I want 95% prediction intervals for the uncertainty of predicted y at new observations)?

Does eval_uncertainty() use the delta-method to estimate the confidence intervals or prediction intervals for the predicted y?

Best

Greg Pelletier

P.S. below is my python code that is also attached along with the example data file and output plots

# - - -

# Example of nonlinear regression for Nina's project using hard clam (Mercenaria mercenaria) data from Ries et al 2009

# using the lmfit Python package

# Greg Pelletier (gjpel...@gmail.com)

# - - -

# Non-Linear Least-Squares Minimization and Curve-Fitting for Python

# https://lmfit.github.io//lmfit-py/index.html

import matplotlib.pyplot as plt

# the following line is needed for making plots in ipython:

%matplotlib auto

import numpy as np

from numpy import loadtxt

from numpy import exp, linspace

import lmfit as fit

from lmfit.models import ExpressionModel

# import data

data = loadtxt('/mnt/c/nina/hanna/ries_hardclam.dat')

x = data[:, 0]

y = data[:, 1]

# find 25%tile y to use for initial b1

y25th = np.percentile(y,25)

# build the expression model and set the initial parameter values

mod = ExpressionModel('b1 + b2 * exp(b3 * x)')

pars = fit.Parameters()

pars['b1'] = fit.Parameter(name='b1', value=y25th, min=-np.inf, max=np.inf)

pars['b2'] = fit.Parameter(name='b2', value=0, min=-np.inf, max=np.inf)

pars['b3'] = fit.Parameter(name='b3', value=0, min=-np.inf, max=np.inf)

print(pars)

# Parameters([('b1', <Parameter 'b1', value=0.326622, bounds=[-inf:inf]>), ('b2', <Parameter 'b2', value=0, bounds=[-inf:inf]>), ('b3', <Parameter 'b3', value=0, bounds=[-inf:inf]>)])

# find the best-fit for the model

out = mod.fit(y, pars, x=x)

# - - -

# Using initial b1 = y25th (good result):

# These results are nearly identical to what I found with matlab using this data set

print(out.fit_report())

# [[Model]]

# Model(_eval)

# [[Fit Statistics]]

# # fitting method = leastsq

# # function evals = 1451

# # data points = 25

# # variables = 3

# chi-square = 2.0705e-08

# reduced chi-square = 9.4115e-10

# Akaike info crit = -516.793759

# Bayesian info crit = -513.137132

# R-squared = 0.82814534

# [[Variables]]

# b1: 6.9536e-05 +/- 1.6571e-05 (23.83%) (init = -1.665168e-05)

# b2: -675692.875 +/- 5434609.15 (804.30%) (init = 0)

# b3: -22.2179803 +/- 8.14731384 (36.67%) (init = 0)

# [[Correlations]] (unreported correlations are < 0.100)

# C(b2, b3) = +0.9999

# C(b1, b3) = +0.8711

# C(b1, b2) = +0.8671

# - - -

# Plot best-fit ypred prediction line at interpolated/extrapolated xpred values

# This result does a good job of matching the results of matlab for the best-fit parameters b1, b2, and b3

# x values to use for the smooth lines of the prediction interval plot

xpred = linspace(.95, 1.3, 100)

# extract parameter values from the best fit output and make user-defined function for the regression

d = out.params

b1 = d['b1'].value

b2 = d['b2'].value

b3 = d['b3'].value

param = [b1, b2, b3]

f = lambda param, xval: param[0] + param[1] * exp(param[2] * xval)

# predicted y values at xpred

ypred = f(param,xpred)

plt.figure()

plt.plot(x, y, '.', label='observations')

plt.plot(x, out.init_fit, '--', label='initial fit')

plt.plot(xpred, ypred, '-', label='interpolated/extrapolated best fit')

plt.legend()

plt.savefig('/mnt/c/nina/hanna/python_ries_hardclam_bestfit_goodresult_v06.png',dpi=300)

# - - -

# Plot prediction intervals

# This is my first attempt to plot prediction intervals using lmfit

# Obviously the result is not correct.

# I am not sure if I am doing something wrong, or if there is a bug in lmfit that gives incorrect results

plt.figure()

plt.plot(x, y, '.', label='observations')

plt.plot(xpred, ypred, '-', label='interpolated/extrapolated best fit')

dy = out.eval_uncertainty(x=xpred,sigma=2)

plt.fill_between(xpred, ypred-dy, ypred+dy, color="#ABABAB", label=r'2-$\sigma$ uncertainty band')

plt.legend()

plt.savefig('/mnt/c/nina/hanna/python_ries_hardclam_predictioninterval_badresult_v06.png',dpi=300)

Dec 23, 2023, 11:46:43 AM

to lmfi...@googlegroups.com

Hi Greg,

On Fri, Dec 22, 2023 at 10:10 PM Greg Pelletier <g.j.pe...@gmail.com> wrote:

>

> Dear all,

>

> I am getting obviously incorrect results from eval_uncertainty() in my application of lmfit with an expression model.

I am not sure that these results are obviously wrong. I think they are

just expressing very large uncertainties.

>

> The best fit of the model for the predicted y compared with observations appears to be a good result, but the prediction intervals or confidence intervals of the predicted y from eval_uncertainty() appears to be very wrong.

>

> I am not sure if I am doing something wrong, or if it is a bug in lmfit. Any advice is appreciated.

Well, you really only have 4 independent x values, and they don't span

a very large "dynamic range" that might be really helpful for fitting

an exponential. That is, even over one decade, exponentially decaying

data will often look a lot like a quadratic or 1/x function. The x

range here goes from 1.0 to 1.15, much less than even a factor of 2,

and the y values are pretty scattered. So a fit (probably any

analysis) will see this as effectively 4 measurements with large

uncertainties.

The fit is reporting the b2 scale of the exponential as (-6.7 +/- 54)

x 10^5. Yep, that is pretty uncertain. And that is what the plot

with the values from `eval_uncertainty()` is showing.

I could be wrong, but I don't think that any regression or analysis

method would give a substantially different result here. There just

is not enough range or certainty in this data to get a high-fidelity

measurement of exponential decay.

More data?


--Matt


Dec 23, 2023, 1:42:15 PM

to lmfit-py

Hi Matt,

Thanks for the reply, very much appreciated

The matlab nlpredci function gives a substantially different result compared with eval_uncertainty(). The matlab nlpredci results (attached to this reply) look much more reasonable. I think that matlab nlpredci is using the "delta-method".

Next I will try using the delta method in R to confirm the results from matlab nlpredci. I expect that the results using the delta method in R will most likely look very similar to the results of matlab nlpredci, and very different from my results of eval_uncertainty().

I agree that there is a lot of uncertainty in some parameters. However, I would expect that, whichever method is used, the envelope of uncertainty for the 95% prediction limits should not extend as far beyond the min and max observed y values as the results from eval_uncertainty() do.

Looking forward to your thoughts on the extremely different results of eval_uncertainty() compared with matlab nlpredci. I would also like your opinion on whether my python code for using eval_uncertainty() is correct. Also, do you know how to specify whether the output from eval_uncertainty() is confidence intervals for the regression line or prediction intervals for new observations?

Best

Greg

Dec 24, 2023, 10:29:10 AM

to lmfi...@googlegroups.com

Hi Greg,

Hmm, that's interesting. I should say that I have basically no

experience with Matlab and do not know what any of its functions do.

That said, I notice that the parameter values on the graph you

attached are remarkably close to those from lmfit (leastsq/MINPACK),

as is the reported R squared. Those parameter values don't include

uncertainties. Do you have those?

I am reasonably confident in the uncertainties reported by lmfit. In

our tests with NIST standard test cases (some of which are not too

hard, but others can trip up fitting algorithms), we generally match

their certified values and uncertainties. I take that to mean our

code for this cannot be too far off.

When looking at the attached graph from Matlab, I am sort of surprised

that the confidence band seems so constant (basically model+/-1) over

the whole range of x. From the lmfit result that b2 (exponential

amplitude) is 6e-5 +/- 54e-5, I think that the uncertainty band really

ought to include y=0 at x=0.95. Like, b2=0 is not ruled out...

I would assume that Matlab is doing everything correctly. But I also

know there is a lot of user-contributed code, which might be slightly

less "well-vetted". I do not know if that applies here.

That said, I can believe that our eval_uncertainty() could have a problem.

Cheers,



Dec 24, 2023, 11:04:02 AM

to lmfi...@googlegroups.com

Hi Matt,

Thanks for the reply. This is very helpful.

Can you tell me the python code for how to extract values of the following into arrays from the lmfit outputs:

- variance-covariance matrix of model parameters

- standard deviations of the model parameters

- correlation matrix of the model parameters

With these I will write a python function to apply the delta method using the lmfit outputs to see how that compares with eval_uncertainty and matlab and R

I also checked using the delta method in R and the results are very close to the matlab nlpredci, but with a little more spread in predicted y at low x values, though still fairly close to the range of observed y.

The nlpredci function is part of the official Matlab product from MathWorks.

Below are the std err stats of the parameters from R (note that R also found nearly identical parameter values):

nlm2 <- nls(calc_rate ~ b1 + b2 * exp(b3 * ta2dic), hardclam, start = c(b1=7e-5, b2=-7e5, b3=-22))

summary(nlm2)

# > summary(nlm2)

# Formula: calc_rate ~ b1 + b2 * exp(b3 * ta2dic)

# Parameters:

# Estimate Std. Error t value Pr(>|t|)

# b1 6.954e-05 1.657e-05 4.196 0.000374 ***

# b2 -6.757e+05 5.435e+06 -0.124 0.902195

# b3 -2.222e+01 8.149e+00 -2.726 0.012325 *

# ---

# Signif. codes:

# 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# Residual standard error: 3.068e-05 on 22 degrees of freedom

# Number of iterations to convergence: 4

# Achieved convergence tolerance: 2.123e-07


Best regards, and Merry Xmas!

Greg


Dec 27, 2023, 10:22:14 PM

to lmfi...@googlegroups.com

Hi Greg,

You have sent a lot of messages about this over the past few days. It

was a holiday for some of us ;)

I've looked into this in more detail. I think I better understand what

is happening, though I am not entirely sure why. I'll also say that I

am not entirely convinced what "the right answer" is for this case. I

think the case here is approximately "pathological", in that there

appear to be 25 observations, but only 4 independent "x" values, and

the model uses 3 variables. And what you are looking to do is to

extrapolate outside the extremely limited data range. The

uncertainties should explode outside of your data range.

Part of the problem could be related to the confusion about the

degrees of freedom of the fit. Almost every analysis package will

conclude that there are 22 degrees of freedom when in fact there is 1.

But the biggest difference you are seeing in comparing lmfit to your

other software packages is due to the step size taken for the

derivative when calculating the Jacobian. By default, lmfit

ModelResult.eval_uncertainty() uses a step size of

`0.333*Parameter.stderr`. If this step is lowered to 0.01 or lower

(say, 0.001 to be "safe"), the results are much more consistent with

what you see from other packages. I think those other packages must

also be taking small steps for the derivatives. I did not look very

closely at your code, but it looks like it uses much smaller steps

too. I think that should not be necessary - and have found that it is

not needed in the cases I've looked at.
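The step-size effect is easy to reproduce with a plain forward difference on the fitted exponential (numpy only; the parameter values are taken from the fit report above, and x = 1.0 is just an illustrative point):

```python
import numpy as np

def forward_diff(f, p, h):
    """Forward-difference estimate of df/dp at p, with step size h."""
    return (f(p + h) - f(p)) / h

# the model at a fixed x = 1.0 as a function of b3 only,
# with b1 and b2 held at their best-fit values from the report above
b1, b2 = 6.9536e-05, -675692.875
f = lambda b3: b1 + b2 * np.exp(b3 * 1.0)

b3, stderr = -22.218, 8.147
exact = b2 * np.exp(b3 * 1.0)                  # analytic df/db3 at x = 1
d_big = forward_diff(f, b3, 0.333 * stderr)    # the 0.333*stderr step discussed above
d_small = forward_diff(f, b3, 0.001 * stderr)  # the proposed smaller step
```

With this strongly curved model the small step matches the analytic derivative to a fraction of a percent, while the 0.333*stderr step overshoots it several-fold, consistent with the inflated uncertainty band.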

Anyway, we can certainly take smaller steps when calculating the

derivatives. I think that will improve consistency with other

approaches.

But I also kind of think those results are not super-reliable for this

data set.

There are sort of a lot of potential things to check here, so I put

these into a Jupyter notebook (attached, as a notebook and rendered

HTML). I also created a branch and PR that makes this change (and

added the notebook in the examples folder).

My conclusions are:

1. sure, let's make the default step size `0.001 * stderr`

instead of `0.333 * stderr` when making the derivatives. That will

make the results more consistent with other tools, even for

pathological cases.

2. extrapolating uncertainties outside a data range is dangerous.

--Matt


Dec 27, 2023, 11:32:54 PM

to lmfit-py

Hi Matt,

Thanks for the reply. This helps a lot. Yes, I am a bit pathological about this!

I completely agree with you that the data set I have been using is not the best for testing. I also agree that it would be good for you to switch to 0.001 * stderr; that may solve the problem for the confidence interval with problematic data sets like mine.

I have attached a Jupyter notebook with a much better data set using my python function for the delta-method. This data set is one of the R base package data sets. This is the data that Julien Chiquet used for his great lecture on how to apply the delta method (https://jchiquet.github.io/MAP566/docs/regression/map566-lecture-nonlinear-regression.html)

I found that my python delta method is able to exactly match the example that Julien Chiquet presents using R in section 5.1 at the link above, for both the confidence interval and the prediction interval.

I also attached a figure that compares the results of my python delta-method (panel a) with the lmfit eval_uncertainty function (panel b). I found that the eval_uncertainty method gives practically the same result as the delta-method for the 95% confidence interval. However, it appears that the eval_uncertainty function presently does not provide an estimate of the prediction interval.

I also attached a one-page description of what I mean by the difference between the confidence interval and the prediction interval, so we are on the same page about definitions of those terms. The confidence interval only accounts for variance in the model parameters, and it represents the 95% envelope of where the curve is expected to be. The prediction interval adds the variability of the residuals from the observations to the variability of the parameters, and it represents the 95% envelope of where predicted y-values for any new observations will occur, including any that are within the range of the observed x values.

I don't think that lmfit currently has the functionality to estimate the prediction intervals, is that right? I wonder if you could add that fairly easily. For example, maybe all you need to do is add the MSE to the parameter variance when you calculate the delta values...
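The distinction can be sketched with an ordinary linear regression, where both intervals have closed forms (synthetic data and a hardcoded t quantile; this illustrates the definitions, not lmfit's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

# ordinary least squares fit of y = b0 + b1*x
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = x.size - X.shape[1]
mse = float(resid @ resid) / dof            # residual variance (MSE)
covb = mse * np.linalg.inv(X.T @ X)         # parameter covariance matrix

# evaluate both bands on a prediction grid
xp = np.linspace(0.0, 10.0, 50)
Xp = np.column_stack([np.ones_like(xp), xp])
var_fit = np.einsum('ij,jk,ik->i', Xp, covb, Xp)  # variance of the fitted line

tval = 2.048  # two-sided 95% t quantile for 28 dof, i.e. stats.t.ppf(0.975, 28)
ci = tval * np.sqrt(var_fit)         # confidence half-width: parameter variance only
pi = tval * np.sqrt(var_fit + mse)   # prediction half-width: adds residual scatter
```

The prediction half-width adds the residual variance (MSE) inside the square root, which is exactly the kind of extra term suggested above, so the prediction band is strictly wider than the confidence band everywhere.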

Best regards,

Greg

Dec 28, 2023, 3:37:36 PM

to lmfi...@googlegroups.com

Hi Greg,

If I understand correctly, the "prediction interval" is the quadrature

sum of the confidence interval (`ModelResult.dely` returned by

`ModelResult.eval_uncertainty()`) and the reduced chi-square, all of them

multiplied by the same "scaling factor" for the sigma-level and

degrees of freedom of the fit:

scale = stats.t.ppf((1 + erf(sigma/np.sqrt(2)))/2.0, ndata - nvarys)

where `ndata` is the length of the data used in the fit. In

principle, we could add that to `eval_uncertainty()`. I would

probably have that be set as a ModelResult attribute

`dely_predicted`, and have the return value still be only

`dely`. That is, after the calculation of df2:

dely = scale * np.sqrt(df2)

dely_predicted = scale * np.sqrt(df2 + redchi)
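As a numeric sanity check of that relation (the `df2` values here are made up; `redchi` comes from the fit report earlier in the thread):

```python
import numpy as np
from math import erf
from scipy import stats

sigma, ndata, nvarys = 2, 25, 3   # matches the fit report: 25 points, 3 variables
scale = stats.t.ppf((1 + erf(sigma / np.sqrt(2))) / 2.0, ndata - nvarys)

df2 = np.array([1.0e-10, 4.0e-10, 9.0e-10])  # hypothetical per-point variances
redchi = 9.4115e-10                          # reduced chi-square from the fit report

dely = scale * np.sqrt(df2)                     # confidence band half-width
dely_predicted = scale * np.sqrt(df2 + redchi)  # prediction band half-width
```

The two bands differ only by `redchi` added in quadrature before the common scale factor is applied, so `dely_predicted**2 == dely**2 + scale**2 * redchi` point by point.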

--Matt


Dec 28, 2023, 5:20:33 PM

to lmfi...@googlegroups.com

Hi Matt,

Thanks for all of your work on this, it is very much appreciated.

It is great news that you may have a fairly easy way to add the option of outputting either "prediction intervals" or "confidence intervals" or both. This would be a great addition to lmfit.

I think you could use the analysis of "prediction intervals" in the Old Faithful data set by Julien Chiquet at this link as a test to see if you can reproduce his results:

Best

Greg


Dec 29, 2023, 2:49:45 PM

to lmfi...@googlegroups.com

Hi Greg,

Yes, the predicted interval is included in the code in the Pull

Request on Github (PR #933), and gives results that look very close to

the page you link to. I expect to merge this into the master branch

later today and hope to get a new release out in the next week or so.


--

--Matt Newville <newville at cars.uchicago.edu> 630-327-7411


Dec 29, 2023, 8:13:12 PM

to lmfit-py

Hi Matt,

This is excellent news! Thank you very much for your hard work making these additions.

Looking forward to trying it out

Best

Greg
