Small mistake in the documentation


Andres Zelcer

Sep 20, 2021, 9:16:12 AM
to lmfit-py
Hi,
    I think I've found a silly mistake in the example presented in the introduction to lmfit (https://lmfit.github.io/lmfit-py/intro.html). There, synthetic data including noise are generated for use as an example. The noise is drawn from a normal distribution of width 0.2 and stored in a variable called 'eps_data'. In the residual function, the difference between the data and the model is divided not by a typical uncertainty but by 'eps_data' itself, so each point is weighted by its own exact noise value. If you agree, I'll make the appropriate changes and issue a pull request.

           Thank you,

                    Andrés

Renee Otten

Sep 20, 2021, 9:35:43 AM
to lmfi...@googlegroups.com
Hi Andres, 


I am not sure what you mean by "typical uncertainty", but the residual function returns "(data - model) / eps_data", where "eps_data" is the uncertainty in the experimental value, so to me that all seems correct. What do you think it should be?

Best,
Renee


--
You received this message because you are subscribed to the Google Groups "lmfit-py" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lmfit-py+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lmfit-py/fe3c9bcd-c3ae-40ae-a6e9-e3f1b1e78edcn%40googlegroups.com.

Andres Zelcer

Sep 20, 2021, 11:24:30 AM
to lmfit-py
Hi Renee
   I think it should be 0.2. The value of "eps_data" is *not* the uncertainty in the experimental value (as it should be), but the random noise generated for each point. In other words, when using the same parameters that were used to generate the synthetic data, all the residuals would be exactly 1 instead of random.

    Best regards,

                  Andrés

Renee Otten

Sep 20, 2021, 12:24:04 PM
to lmfi...@googlegroups.com
Yes, since we are "simulating" an experiment here, "eps_data" is random noise for each point, and that corresponds to the uncertainty you would have in your measured experimental value (had you actually done the experiment). Again, to me there doesn't seem to be anything wrong with it.

Matt Newville

Sep 20, 2021, 12:50:36 PM
to lmfit-py
Hi Andres, 

Sorry, I don't understand what you mean.  What is the change you propose?  From what to what?  

In that example, both the data and the uncertainty at each data point are generated. Part of the intention of the example is to show that one can pass in an array of uncertainty values.

--Matt



Matt Newville

Sep 21, 2021, 11:54:34 AM
to lmfit-py
Hi Andres, 

Ah, yes, those values should not be negative! And I think I now better understand your point.


On Tue, Sep 21, 2021 at 8:31 AM Andres Zelcer <zelcer...@gmail.com> wrote:
Hi Matt.
   I hadn't noticed that the intention was to show that one can pass an array of uncertainty values.
   Maybe I am missing something, but what I find strange about using eps_data both as noise and as uncertainty is:

Well, those are the same thing, at least to my way of thinking.  We want to weight the difference between the model and data by the noise in the data.
a) The uncertainty values are random. Every measuring device I know has a well defined (I do not mean constant) uncertainty.

Uh, well, there will be fluctuations, and many measurements have noise that scales with intensity ("square root of intensity" is fairly common for pulse-counting measurements), or that scales with measurement time or other external factors. So there can be some variation in the uncertainty across a set of measurements. Granted, most of the time you will know an "uncertainty scale" well before you know individual uncertainties, but sometimes people know those uncertainties pretty well.
 
b) Some uncertainty values are negative (eps_data is distributed around 0)

That is a problem and should be fixed!
c) For each point, the values of the noise and uncertainty are *exactly* the same


Yeah, we're making up a case where we know what the uncertainties are. It seems completely reasonable to use a different array for the uncertainties so that it is not exactly the same as the noise we add.


I'd expect the residuals to be normally distributed around zero, but the output from this snippet is a uniform array of ones:
    # Initialize the parameters with the values used to generate the simulated data
    params = Parameters()
    params.add('amp', value=7.5)
    params.add('decay', value=0.01)
    params.add('phase', value=2.5)
    params.add('frequency', value=0.22)
    res = residual(params, x, data, eps_data)
    print("Sum of residuals ", res.sum())
    print(res)
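For readers following along, the observation above can be reproduced with plain numpy, without lmfit at all, since at the generating parameters the model cancels exactly. This is a sketch: the x grid and seed are assumptions for illustration, mirroring the values in the intro example.

```python
import numpy as np

# Assumed setup mirroring the lmfit intro example.
np.random.seed(0)
x = np.linspace(0, 15, 301)
eps_data = np.random.normal(size=x.size, scale=0.2)
data = 7.5 * np.sin(x * 0.22 + 2.5) * np.exp(-x * x * 0.01) + eps_data

# Model evaluated at the exact parameters used to generate the data.
model = 7.5 * np.sin(x * 0.22 + 2.5) * np.exp(-x * x * 0.01)

# The doc's residual, (data - model) / eps_data: since data - model
# is eps_data itself, every residual collapses to 1.
res = (data - model) / eps_data
print(res[:5])
```

Every entry of `res` is 1 (to floating-point precision), rather than scattering around zero as a properly weighted residual would.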

Supposing the data had been obtained from an instrument with a constant uncertainty, the change I propose would be to replace these two lines:
    eps_data = random.normal(size=x.size, scale=0.2)
    data = 7.5 * sin(x*0.22 + 2.5) * exp(-x*x*0.01) + eps_data
by these three lines:
    uncertainty = 0.2
    noise = random.normal(size=x.size, scale=uncertainty)
    data = 7.5 * sin(x*0.22 + 2.5) * exp(-x*x*0.01) + noise
And the line:
    out = minimize(residual, params, args=(x, data, eps_data))
by:
    out = minimize(residual, params, args=(x, data, uncertainty))

I'd have to think of a better example in order to show that one can pass an array of uncertainty values.

It could be something like (or better suggestions welcome):

     from numpy import random
     noise = random.normal(size=x.size, scale=0.2)
     uncertainty = abs(0.16 + random.normal(size=x.size, scale=0.05))

and use `noise` to add to data and `uncertainty` for the weighting?   

I think that would help the explanation and still allow that `uncertainty` can be an array with different values per measurement.   If you are up for making a PR that improves that more, and for adding some explanation to the doc, that would be great!
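Putting that suggestion together, a numpy-only sketch of how separate `noise` and `uncertainty` arrays would behave (seed and x grid are assumptions for illustration; the uncertainty array is kept strictly positive via abs, as suggested above):

```python
import numpy as np

# Assumed setup for illustration.
np.random.seed(1)
x = np.linspace(0, 15, 301)

# Noise added to the synthetic data...
noise = np.random.normal(size=x.size, scale=0.2)
# ...and a separate, strictly positive per-point uncertainty for weighting.
uncertainty = abs(0.16 + np.random.normal(size=x.size, scale=0.05))

data = 7.5 * np.sin(x * 0.22 + 2.5) * np.exp(-x * x * 0.01) + noise
model = 7.5 * np.sin(x * 0.22 + 2.5) * np.exp(-x * x * 0.01)

# Weighting by uncertainty rather than by the noise itself: the
# residuals now scatter around zero instead of being uniformly 1.
res = (data - model) / uncertainty
print(res.mean(), res.std())
```

With this change the example still demonstrates passing an array of uncertainty values to the residual, but the weights are no longer identical to the noise (and never negative).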

     Best regards,

    Andrés

Renee Otten

Sep 22, 2021, 12:57:00 AM
to lmfi...@googlegroups.com
Right, adding the random noise/uncertainty to the data is fine (we do that in a lot of the examples to generate synthetic data), and having random, non-constant uncertainties in experimental data is very common. But I overlooked the fact that we were using the same noise here to scale our residual function... so yes, you are right about that: these values shouldn't be negative. We should change it as Matt suggested soon, so that it makes it into the next release.

Best,
Renee

