Uncertainties and reduced chi-sq

Ulla Vainio

Feb 12, 2021, 8:30:14 AM2/12/21
to lmfi...@googlegroups.com
Hi Everyone,

Thanks for this great library and the new releases. Lmfit enables
tremendous flexibility with fitting for me.

Now I have encountered a problem with reduced chi-square when
switching from 0.9.15 to 1.0.1 and 1.0.2. I cannot easily share a
minimal example, but I'll try to explain the problem I'm seeing, if
that's OK.

With 0.9.15 I had not noticed that my uncertainties were actually
never calculated for certain fits, like the one below (although this
is not what concerns me at this point):
[[Fit Statistics]]
# fitting method = least_squares
# function evals = 6
# data points = 742
# variables = 7
chi-square = 33777.2497
reduced chi-square = 45.9554417
Akaike info crit = 2847.09960
Bayesian info crit = 2879.36504
## Warning: uncertainties could not be estimated:
energy_d: at initial value
[[Variables]]
scaling: 1 (fixed)
energy_a: 0.9993022 (fixed)
energy_b: -0.06663236 (fixed)
energy_c: -0.02071173 (fixed)
energy_d: 0.05423630 (init = 0.0542363)


Now with 1.0.1 and 1.0.2, I get reduced chi-square = -inf every now
and then (but not always!). The uncertainties are still not
calculated, as before, but now the reduced chi-square is -inf as well:
[[Fit Statistics]]
# fitting method = least_squares
# function evals = 101
# data points = 1
# variables = 7
chi-square = 1.000e-250
reduced chi-square = -inf
Akaike info crit = -561.646273
Bayesian info crit = -575.646273
## Warning: uncertainties could not be estimated:
energy_d: at initial value
[[Variables]]
scaling: 1 (fixed)
energy_a: 0.9993022 (fixed)
energy_b: -0.06663236 (fixed)
energy_c: -0.02071173 (fixed)
energy_d: 0.05423630 (init = 0.0542363)

The "data points" has suddenly become 1 instead of 742, which was the
actual number of points in the fit, and the number of evaluations has
hit the maximum. The fitted result itself does not suffer and remains
exactly the same, but because "data points" is now 1, the reduced
chi-square becomes -inf, and that is a problem for me because I'd like
to see the actual reduced chi-square value.

The weird thing is that in both cases the maximum number of
evaluations was set to 100. When I increase it to 200, 0.9.15 and
1.0.1 perform the same: no -inf in chi-square, and the number of
evaluations is 7.

So I am just wondering: why do the number of fit evaluations and the
error calculation depend so strongly on the maximum number of
evaluations? And why does the number of data points suddenly become 1
instead of the original number?
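For reference, the quantity in question is the standard reduced
chi-square, i.e. chi-square divided by the degrees of freedom
(ndata - nvarys). A quick illustration (this is only the textbook
definition, not lmfit's actual source code):

```python
def redchi(chisqr, ndata, nvarys):
    """Reduced chi-square = chi-square / (ndata - nvarys)."""
    nfree = ndata - nvarys
    return chisqr / nfree

# Healthy 0.9.15 case, using the numbers from the report above:
print(redchi(33777.2497, 742, 7))   # ~45.9554, matching the report

# Broken case: ndata has collapsed to 1, so the degrees of freedom
# (1 - 7 = -6) are negative and the statistic is meaningless.
print(redchi(1.000e-250, 1, 7))
```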

Kind regards,

Ulla Vainio

Matt Newville

Feb 12, 2021, 11:19:52 AM2/12/21
to lmfit-py
Hi Ulla, 

Hm, that does seem odd.  Without seeing example code, it is very hard to guess, but it must have something to do with the number of data points being 1 instead of 742.  Is the data a simple 1-D numpy array or some other format or source?  It also seems strange to me that chi-square is 1.e-250, but maybe that is actually related too.  

I guess, what I would say is to try to figure out why the number of data points is 1...

--Matt

--
You received this message because you are subscribed to the Google Groups "lmfit-py" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lmfit-py+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lmfit-py/CALWqSNLqBjQFF61d2O5A9cTqCTnkrYXfWG3HygBe%3D%3D9rRYRAng%40mail.gmail.com.


--
--Matt Newville <newville at cars.uchicago.edu> 630-327-7411

Ulla Vainio

Feb 13, 2021, 1:15:46 AM2/13/21
to lmfi...@googlegroups.com
Hi Matt,

The data points are in a 1-D numpy array. I checked the number of data
points going into the fit, and it seemed to be correct. Maybe, if the
fit fails, the uncertainty calculation in certain cases gets 'ndata'
from the wrong place. If I read ModelResult.eval_uncertainty
correctly, 'ndata' seems to be defined in two different ways.

The fitted parameter values of the result are always as expected,
irrespective of what the uncertainty calculation shows, so this seems
to be a problem only in the uncertainty calculation.

But I have now managed to fix my code so that it does not run into
this problem so easily, by removing some parameters which do not
actually affect the fit. This also means that the same max_nfev works
again.

I can try to create a piece of code which makes the error happen
again, but it may take some time, because this was actually pretty
hard to spot since it happens only sometimes. With 30 similar datasets
of nearly identical values, it may happen in 7 of them. So maybe, for
it to happen, it needs just the right number of variables that do not
change at all from their initial values, or something like that.

Have a good weekend!

Kind regards,

Ulla

On Fri, 12 Feb 2021 at 18:19, Matt Newville
(newv...@cars.uchicago.edu) wrote:

Ulla Vainio

Feb 13, 2021, 3:15:01 PM2/13/21
to lmfi...@googlegroups.com
Hi again Matt,

What got me talking about eval_uncertainty... The weird thing with
ndata happens, of course, in _calculate_statistics() in class
MinimizerResult. Now I understand why you asked about the numpy array.
In that function, ndata is set to 1 if the residual is not a numpy
array. Is it possible that the residual is not calculated if the
maximum number of evaluations is reached, and that it is then left at
its default value, None?
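A minimal sketch of the fallback I mean (this is my reading of the
behaviour, not lmfit's actual code):

```python
import numpy as np

def ndata_from_residual(residual):
    """If the residual is a numpy array, ndata is its length;
    otherwise (e.g. it was left as None after an abort), it
    falls back to 1."""
    if isinstance(residual, np.ndarray):
        return len(residual)
    return 1

print(ndata_from_residual(np.zeros(742)))  # 742 -> sensible statistics
print(ndata_from_residual(None))           # 1 -> negative degrees of freedom downstream
```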

Is this a desired feature or a bug? In principle it might be nice to
get a reduced chi-square even if the fit fails. But if you want to be
really strict and want people to notice that the fit failed, then
returning -inf as the reduced chi-square is actually a pretty good way
to get the needed attention.

Kind regards,

Ulla

Matt Newville

Feb 13, 2021, 5:10:07 PM2/13/21
to lmfit-py
Hi Ulla,

Sorry, but I really do not understand what you are talking about. You have not posted any code that might point to a problem.

It does seem like whatever problem you are having may be related to hitting max_nfev function evaluations. If so, why are you setting that?

But really, no kidding, you will have to post a minimal but complete example if you want us to be able to help you.

--Matt

Ulla Vainio

Feb 14, 2021, 2:24:10 AM2/14/21
to lmfi...@googlegroups.com
Hi Matt,

I'm setting max_nfev because my function evaluations used to take very
long, so I wanted to be sure that I would not be stuck for hours with
some datasets. But maybe it was also partly because I was fitting so
many parameters which did not change, so this finally caught that bug
in my code, thanks.

But I actually just wanted to know: is it intended behaviour that if
max_nfev is reached, the residual is not calculated? In your commit
https://github.com/lmfit/lmfit-py/commit/98ec309cf7dbacba2e23185a3ddfbf1345251441
reaching max_nfev has become an AbortFitException, which is then
treated partly the same way as a user abort, for which statistics are
not calculated, if I understand correctly.

The reason I'm asking is that in previous versions of lmfit (0.9.15),
reaching the maximum number of evaluations did not have this effect on
the statistics calculation. So, was it intended as an improvement?

What this really shows me is that it's too easy to use code wrongly if
there are no real consequences for doing so. A tiny warning is easily
ignored, especially in automated data processing, but the -inf in the
reduced chi-square really got me to find out why I was reaching
max_nfev sometimes.
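For automated pipelines, one defensive pattern is to check the result
before trusting its statistics. (`success`, `nfev`, and `redchi` are
real lmfit MinimizerResult attributes; the SimpleNamespace stub below
just simulates them for illustration.)

```python
import math
from types import SimpleNamespace

def fit_is_trustworthy(result, max_nfev=None):
    """Refuse to use statistics from a fit that aborted, hit the
    evaluation limit, or produced a non-finite reduced chi-square."""
    if not getattr(result, "success", False):
        return False
    if max_nfev is not None and result.nfev >= max_nfev:
        return False
    return math.isfinite(result.redchi)

good = SimpleNamespace(success=True, nfev=7, redchi=45.96)
bad = SimpleNamespace(success=False, nfev=101, redchi=float("-inf"))
print(fit_is_trustworthy(good, max_nfev=100))  # True
print(fit_is_trustworthy(bad, max_nfev=100))   # False
```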

Thanks again for developing this great package and keep up the good
work even if the users are not always so great!

Kind regards,

Ulla


On Sun, 14 Feb 2021 at 00:10, Matt Newville
(newv...@cars.uchicago.edu) wrote:

Matt Newville

Feb 14, 2021, 8:40:50 AM2/14/21
to lmfit-py
Ulla,



If a fit exceeds the maximum number of function evaluations, the fit is not successful and may be left in a non-optimal and inconsistent state.  By setting max_nfev, the user is explicitly directing the program to stop working immediately if it exceeds that number of function evaluations. 

Again, if you do not provide example code, we cannot provide much help. If you keep providing no example code despite being told repeatedly to do so, we will not provide help.

--Matt