Definition of Tukey Biweight (robust fitting)

Keld Lundgaard

Jun 14, 2013, 3:09:54 PM

to pystat...@googlegroups.com

Dear developers of StatsModels

I am a PhD candidate working at SLAC (Stanford), developing exchange-correlation functionals.

For this I wanted a Python implementation of ridge regression with a robust MM-estimator [Maronna 2011], and I have taken inspiration from the StatsModels code.

Here is my most urgent question:

Where does your definition of the Tukey bisquare rho-function come from?

- It is different from the one in [Maronna 2011], so I am a little confused about it.

- Unfortunately I do not have ready access to the books referenced in the header of norms, so it is hard for me to check.

Cheers,

Keld

Maronna 2011: doi:10.1198/TECH.2010.09114

josef...@gmail.com

Jun 14, 2013, 3:51:59 PM

to pystat...@googlegroups.com

Skipper said that he used mainly the Huber book when he rewrote RLM.

I just checked Yohai's book: the psi function is the same, but
the rho function for TukeyBiweight differs by a constant and a
scaling factor.

I don't know why the definitions differ. The differences are irrelevant
for optimization, but they may matter if you add a penalization term.

In Huber's book I can right now find only the psi function, which is
also the same.
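
To illustrate the point about penalization, here is a minimal sketch (not statsmodels code). It assumes a one-parameter location problem with a ridge-type penalty lam * b**2 and the normalized bisquare rho (equal to 1 outside the cutoff). The function names, the data, and the grid search are all made up for illustration; c = 4.685 is the conventional bisquare tuning constant.

```python
import numpy as np

def rho_bisquare(r, c=4.685):
    """Tukey bisquare rho, normalized so that rho = 1 for |r| > c."""
    u = np.clip(np.abs(r) / c, 0.0, 1.0)
    return 1.0 - (1.0 - u**2) ** 3

def penalized_argmin(y, lam, scale=1.0, shift=0.0):
    """Grid-search the b minimizing sum(scale * rho(y - b) + shift) + lam * b**2.

    `scale` and `shift` mimic a rho definition that differs by a scaling
    factor and an additive constant (hypothetical knobs for this sketch).
    """
    grid = np.linspace(-5.0, 5.0, 10001)
    resid = y[:, None] - grid[None, :]
    loss = (scale * rho_bisquare(resid) + shift).sum(axis=0) + lam * grid**2
    return grid[np.argmin(loss)]

y = np.array([1.8, 2.0, 2.1, 2.3, 9.0])  # made-up data with one outlier

# No penalty: rescaling or shifting rho leaves the minimizer where it is.
print(penalized_argmin(y, lam=0.0), penalized_argmin(y, lam=0.0, scale=10.0, shift=3.0))

# With a penalty: rescaling rho alone changes the balance between fit and penalty.
print(penalized_argmin(y, lam=1.0), penalized_argmin(y, lam=1.0, scale=10.0))
```

An additive constant never moves the minimizer. A scaling factor does once a penalty is present, unless the penalty weight is rescaled by the same factor.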

Josef

josef...@gmail.com

Jun 14, 2013, 4:17:30 PM

to pystat...@googlegroups.com

Because the constant 1 is dropped, the part outside the cutoff c is
zero, unlike in Yohai and Maronna.
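
A minimal sketch of the two conventions as described in this thread, to make the relation explicit. This is an illustration, not a copy of the statsmodels source: the function names are made up, the c**2 / 6 factor in the second form is an assumption based on the description here, and c = 4.685 is the conventional bisquare tuning constant.

```python
import numpy as np

def rho_normalized(z, c=4.685):
    """Bisquare rho with the constant 1 kept: 1 - (1 - (z/c)^2)^3, and 1 outside c."""
    u2 = np.minimum((np.asarray(z, dtype=float) / c) ** 2, 1.0)
    return 1.0 - (1.0 - u2) ** 3

def rho_dropped_constant(z, c=4.685):
    """Bisquare rho with the constant dropped and a c^2/6 factor: 0 outside c."""
    z = np.asarray(z, dtype=float)
    inside = np.abs(z) <= c
    return -((1.0 - (z / c) ** 2) ** 3) * inside * c**2 / 6.0

z = np.linspace(-8.0, 8.0, 9)
c = 4.685
# The two forms differ only by an additive constant and a scaling factor.
print(np.allclose(rho_dropped_constant(z, c), c**2 / 6.0 * (rho_normalized(z, c) - 1.0)))
```

Under these assumptions the second form equals (c**2 / 6) * (rho_normalized - 1), so both have the same minimizer and, up to the factor, the same derivative psi.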

As far as I can see from a code search, the rho function is only used
in deviance, and I didn't find any tests for the deviance.
Deviance is only used in the convergence check, and for that usage
the scaling and constant are also irrelevant.

Optimization and parameter estimation rely on psi and weights, I think.
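
As a rough sketch of why only psi and the weights matter for the estimate, here is an iteratively reweighted location estimate with bisquare weights. It is not the RLM implementation: the function names, the fixed scale, the fixed iteration count, and the data are assumptions for illustration, and psi is written for the rho that carries the c**2 / 6 factor.

```python
import numpy as np

def psi_bisquare(z, c=4.685):
    """Derivative of the bisquare rho: z * (1 - (z/c)^2)^2 inside c, 0 outside."""
    z = np.asarray(z, dtype=float)
    return z * (1.0 - (z / c) ** 2) ** 2 * (np.abs(z) <= c)

def weights_bisquare(z, c=4.685):
    """IRLS weights psi(z) / z: (1 - (z/c)^2)^2 inside c, 0 outside."""
    z = np.asarray(z, dtype=float)
    return (1.0 - (z / c) ** 2) ** 2 * (np.abs(z) <= c)

def irls_location(y, scale=1.0, c=4.685, n_iter=50):
    """Robust location by reweighted means; rho itself is never evaluated."""
    mu = np.median(y)  # robust starting value
    for _ in range(n_iter):
        w = weights_bisquare((y - mu) / scale, c)
        mu = np.sum(w * y) / np.sum(w)
    return mu

y = np.array([1.8, 2.0, 2.1, 2.3, 9.0])  # made-up data with one outlier
print(irls_location(y))
```

The outlier at 9.0 lies beyond the cutoff from the start, so it gets weight zero and the estimate stays close to the bulk of the data. Neither an additive constant nor a scaling of rho enters the update.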

robust RLM was (re)written by Skipper in his first GSOC and hasn't
been worked on much since then.
We haven't heard of any problems, but there could still be some in
code that is not usually used.

If you figure out Ridge M or MM, then you could also think about
donating the code, in case you make it open source. I'm sure there
will be wider interest for it.

Josef

josef...@gmail.com

Jun 14, 2013, 4:43:10 PM

to pystat...@googlegroups.com

If I read that correctly, that is

Here is another version:
http://research.microsoft.com/en-us/um/people/zhang/INRIA/Publis/Tutorial-Estim/node24.html
It has the c**2 / 6 factor but also keeps the constant 1, so the
factor appears in the outside range.

Josef

Keld Lundgaard

Dec 20, 2013, 6:56:23 PM

to pystat...@googlegroups.com

Hi Josef!

Thank you very much for your earlier replies.

I am in the process of writing a paper using the code I have written, which combines an MM-estimator with ridge regression (RR) and bootstrap resampling.

After I have finished this work in early 2014, I will be able to look into how to share the code.