How to reduce the influence of outliers

AD

unread,

Dec 2, 2017, 9:09:40 AM12/2/17

to lmfit-py

Using scipy.optimize.least_squares(), I could use the parameter loss='huber' (for example) to reduce the influence of outliers.
How would I do this with lmfit.Model.fit()

Matt Newville

unread,

Dec 2, 2017, 10:44:26 AM12/2/17

to lmfit-py

On Sat, Dec 2, 2017 at 8:09 AM, AD <anton....@gmail.com> wrote:

Using scipy.optimize.least_squares(), I could use the parameter loss='huber' (for example) to reduce the influence of outliers.
How would I do this with lmfit.Model.fit()

You can pass the fitting method used and keywords to that method, as with:

result = model.fit(ydata, params, ..., method='least_squares', fit_kws={'loss':'huber'})

Allow me to restate my amazement that when using a "loss" keyword other than "linear", the scipy.optimize method named "least_squares" is then most definitely **not** doing a least-squares fit. Also, as it turns out, when using `least_squares` with its default `trf` method that is required for using non-least-squares losses (wow, does least_squares() have too many interdependent options!!!), uncertainties will not be estimated.

You could also use one of the scalar minimizers and give a "reduce_fcn" as one of the "fit_kws" entries to Model.fit(). This argument specifies how to turn the ndarray residual (data - model)*weights into a scalar value. There isn't a huber-like reduce function built-in, but you can provide your own, approximately as (warning: untested code):

def huber_loss(z):

out = z * 1.0

out[np.where(z > 1)] = 2*np.sqrt(z) - 1

return out

result = model.fit(ydata, params, ...., method='powell', fit_kws={'reduce_fcn': huber_loss})

I guess that huber loss function assumes the residual is properly weighted.

Again, that approach will not estimate uncertainties in the parameters.

--Matt

AD

unread,

Dec 3, 2017, 11:45:52 AM12/3/17

to lmfit-py

Thanks, that works. I presume the least_squares code got additional features and options over the years.

Matt Newville

unread,

Dec 3, 2017, 12:30:02 PM12/3/17

to lmfit-py

On Sun, Dec 3, 2017 at 10:45 AM, AD <anton....@gmail.com> wrote:

Thanks, that works.

Great to hear it worked. As I look at the untested code for the scalar minimizer, that should be returning `out.sum()`, not the nd-array `out`.

I presume the least_squares code got additional features and options over the years.

Actually, it all came at once, rather recently, and well after `leastsq` had been the only least-squares wrapper. It just came as a really poor design, and now we're stuck with it.

It sort of points out one of the basic problems of scipy and why the 'scikit' packages (of which lmfit is one) are a good idea: no one can really pay attention to all the moving parts of scipy (and to be clear, it does an outstanding job on handling the compiled code parts), and the highest-level user interface often suffers or is left with significant differences in API.

--Matt

AD

unread,

Dec 3, 2017, 2:54:08 PM12/3/17

to lmfit-py

I admit I only tested the first option! Just getting familiar with lmfit--so far so good.

Reply all

Reply to author

Forward