MatLab Robustfit


Wesley Bowman

unread,
Jun 23, 2014, 9:21:26 AM6/23/14
to pystat...@googlegroups.com
Is there something in statsmodels that behaves exactly like MATLAB's robustfit? I need the same functionality, i.e. the same weight functions, and it has to produce the same results.

This comes from my work translating some MATLAB code into Python. The translation needs identical outputs: the MATLAB code is widely used, and to get people to convert, the Python version has to have identical functionality.

Thanks.

josef...@gmail.com

unread,
Jun 23, 2014, 9:46:59 AM6/23/14
to pystatsmodels
On Mon, Jun 23, 2014 at 9:21 AM, Wesley Bowman <wesley....@gmail.com> wrote:
Is there something in statsmodels that behaves exactly like MATLAB's robustfit? I need the same functionality, i.e. the same weight functions, and it has to produce the same results.

This comes from my work translating some MATLAB code into Python. The translation needs identical outputs: the MATLAB code is widely used, and to get people to convert, the Python version has to have identical functionality.

From the description in the robustfit documentation you linked:

I think it is the same algorithm as our RLM and both should produce the same results.
Maybe we don't have all of the same norms/weights, and the defaults might be different.

Another possible, but unlikely, difference might be in when the scale updating is done. That shouldn't matter in general, but the convergence behavior in some badly behaved cases might be different.

Also, the behavior might be different if more than half of the observations are identical or lie on a straight line (we can then fit half of the observations perfectly). There is an open PR for that case, but I'm still not convinced what the right behavior is supposed to be.

The best outcome for you and for us would be if you write some simple test cases, and we can compare whether we get the same results. If not, it should be possible to add some options.
If we add the test cases to the unit tests, that would make sure we don't change the behavior by accident.
The behavior in corner cases may still change; for regular cases RLM is stable.
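[Editor's sketch: one way to generate such cross-package test values is a bare-bones IRLS loop with Tukey's bisquare weights, which is robustfit's default norm and corresponds to RLM with TukeyBiweight. The tuning constant 4.685 and the MAD scaling 0.6745 follow the MATLAB documentation; everything else here (function name, toy data) is made up for illustration, not statsmodels' or MATLAB's actual code.]

```python
import numpy as np

def irls_bisquare(X, y, tune=4.685, tol=1e-8, maxiter=50):
    """Iteratively reweighted least squares with Tukey's bisquare weights.

    A bare-bones sketch of the algorithm robustfit and RLM both implement,
    useful for generating cross-package test values -- not a replacement
    for either.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting values
    for _ in range(maxiter):
        resid = y - X @ beta
        # robust scale: MAD of residuals, scaled for consistency at the normal
        s = np.median(np.abs(resid - np.median(resid))) / 0.6745
        r = resid / (tune * max(s, 1e-12))
        w = np.where(np.abs(r) < 1, (1 - r**2) ** 2, 0.0)  # bisquare weights
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(Xw.T @ X, Xw.T @ y)     # weighted LS step
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta

# toy test case: a clean line plus one gross outlier
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 0.01 * rng.standard_normal(30)
y[-1] += 10.0                        # gross outlier at the last point
beta = irls_bisquare(X, y)           # should be close to [1, 2]
```

Running the same toy data through robustfit and through RLM and comparing against a reference like this would pin down exactly where the packages diverge.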

I don't have MATLAB anymore, but this is essentially what I do when writing unit tests against Stata: I add options that make it possible to replicate Stata's results, with a few intentional exceptions and sometimes with different defaults.

Josef

josef...@gmail.com

unread,
Jun 23, 2014, 10:34:03 AM6/23/14
to pystatsmodels
Reading the "small print" more carefully, I see two differences:

"

The value r in the weight functions is

r = resid/(tune*s*sqrt(1-h))

where resid is the vector of residuals from the previous iteration, h is the vector of leverage values from a least-squares fit

"

RLM doesn't divide by sqrt(1-h), and I haven't seen that variant before, but it could be added as an option.
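[Editor's sketch of what that standardization looks like: the h in the quoted formula are the leverages, i.e. the diagonal of the hat matrix from an ordinary least-squares fit. Variable names and the toy data are mine, assuming the quoted robustfit formula.]

```python
import numpy as np

# small example design matrix and response
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(10), rng.standard_normal(10)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(10)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# leverages: diagonal of the hat matrix H = X (X'X)^{-1} X'
h = np.einsum('ij,ji->i', X, np.linalg.solve(X.T @ X, X.T))

# MATLAB-style standardized residuals r = resid / (tune * s * sqrt(1 - h))
tune = 4.685
s = np.median(np.abs(resid - np.median(resid))) / 0.6745
r = resid / (tune * s * np.sqrt(1 - h))
```

High-leverage points get a larger |r| and are therefore downweighted more aggressively than in the plain RLM standardization.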


scale=MAD 

"If there are p columns in X, the smallest p absolute deviations are excluded when computing the median."

We don't exclude any values from the median calculation, and IIRC I dropped the median completely in the PR, assuming that the expected value of the residual is zero (which it is supposed to be).
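[Editor's sketch to make the difference concrete: the two MAD variants side by side, based on my reading of the quoted documentation. Function names and the toy residuals are made up.]

```python
import numpy as np

def mad_plain(resid, c=0.6745):
    """Plain MAD scale of the residuals, as statsmodels uses."""
    return np.median(np.abs(resid)) / c

def mad_matlab(resid, p, c=0.6745):
    """robustfit-style MAD: drop the p smallest absolute deviations
    before taking the median (p = number of columns in X)."""
    d = np.sort(np.abs(resid))
    return np.median(d[p:]) / c

resid = np.array([0.0, 0.0, 0.1, -0.2, 0.3, -0.5, 1.0, -2.0])
s_plain = mad_plain(resid)           # median of all |resid|
s_matlab = mad_matlab(resid, p=2)    # median after dropping the 2 smallest
```

Dropping the p smallest deviations makes the MATLAB scale systematically a bit larger, especially in small samples, which alone is enough to prevent exact agreement.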


Which I guess means that you only get agreement to a few decimals in regular cases, and different behavior in corner or extreme cases.


What's the variance of [1, 1, 1, 1, 1, 1, 2, -1] ?

Do we have two outliers and variance = 0, or should we have a strictly positive variance?

It's only really possible to guess from the context, but I'm still not sure what the default assumption should be.
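[Editor's illustration of why this sample is ambiguous: the MAD about the median is exactly zero here, so a MAD-based robust scale collapses, while the classical standard deviation stays positive.]

```python
import numpy as np

x = np.array([1, 1, 1, 1, 1, 1, 2, -1], dtype=float)
med = np.median(x)                    # 1.0: six of eight points sit here
mad = np.median(np.abs(x - med))      # 0.0: the robust scale collapses
std = x.std(ddof=1)                   # strictly positive classical scale
```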


Josef

Virgile Fritsch

unread,
Jun 23, 2014, 12:14:35 PM6/23/14
to pystat...@googlegroups.com
Hi,

I remember having seen a Matlab code implementing (Huber's) robust regression once. I think it corresponded more or less to the paper [1], but I do not know how it relates to Matlab's robustfit.
But one thing I remember well is that the scale update was done with a MAD (divided by a factor similar to sqrt(1 - h), where h reflects the diagonal entries of the covariance matrix of the design... but that part I do not remember exactly).

In all the implementations I found and all the applied-statistics literature I browsed, the scale update was not done according to Huber's recommendations. Only statsmodels allows doing it correctly.
Of course, I have not been exhaustive in my research.

Hope this helps.

Virgile

josef...@gmail.com

unread,
Jun 23, 2014, 12:41:50 PM6/23/14
to pystatsmodels
On Mon, Jun 23, 2014 at 12:14 PM, Virgile Fritsch <virgile...@gmail.com> wrote:
Hi,

I remember having seen a Matlab code implementing (Huber's) robust regression once. I think it was more or less corresponding to the paper [1] and I do not know about its relationship with Matlab's robustfit.
But one thing I remember well is that the scale update was done with a MAD (divided by a factor similar to sqrt(1 - h), where h reflects the diagonal entries of the covariance matrix of the design... but that part I do not remember exactly).

Virgile, what is "paper [1]"?

h is the diagonal of the hat/projection matrix.
We use it in several other places, because it has better small sample properties for the residuals. But I haven't seen it in the context of robust M-estimators. 
 

In all the implementations I found and all the applied-statistics literature I browsed, the scale update was not done according to Huber's recommendations. Only statsmodels allows doing it correctly.
Of course, I have not been exhaustive in my research.


One problem with robust estimation is that there are too many choices/options. The other problem is that we need a reference model to interpret outliers, which is by default the normal distribution.
(and data like [1, 1, 1, 1, 1, 1, 2.5, -1.5]  definitely doesn't look "normal", maybe someone rounded too much.)

For example, we only have one norm for the M-estimation of scale, the Huber scale. I have thought about replicating what RLM does for location so that scale estimation also allows different "scale norms".
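[Editor's sketch of a Huber-type scale: my own fixed-point implementation of Huber's "proposal 2" with known location, not statsmodels' HuberScale (which differs in details such as the degrees-of-freedom correction). On the [1, 1, 1, 1, 1, 1, 2, -1]-style sample discussed above, it stays strictly positive where the MAD collapses.]

```python
import numpy as np
from scipy.stats import norm

def huber_scale(resid, c=2.5, maxiter=100):
    """Huber "proposal 2" scale with known location 0, by fixed-point
    iteration on s^2 = sum(min(r_i^2, c^2 s^2)) / (n * kappa)."""
    # kappa = E[min(Z^2, c^2)] for Z ~ N(0, 1), so the estimate is
    # consistent for the standard deviation under normality
    kappa = (2 * norm.cdf(c) - 1 - 2 * c * norm.pdf(c)
             + 2 * c**2 * norm.sf(c))
    n = resid.size
    s2 = resid.var() if resid.var() > 0 else 1.0
    for _ in range(maxiter):
        s2 = np.sum(np.minimum(resid**2, c**2 * s2)) / (n * kappa)
    return np.sqrt(s2)

x = np.array([1, 1, 1, 1, 1, 1, 2, -1], dtype=float)
resid = x - np.median(x)                    # [0,0,0,0,0,0,1,-2]
s_huber = huber_scale(resid)                # strictly positive
s_mad = np.median(np.abs(resid)) / 0.6745   # collapses to 0
```

The clipping at c*s bounds the influence of the two deviant points without letting the six identical values drive the scale all the way to zero.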

Josef

Virgile Fritsch

unread,
Jun 23, 2014, 1:05:51 PM6/23/14
to pystat...@googlegroups.com
Sorry, here is the missing reference:
Mark Woolrich, Robust group analysis using outlier inference, NeuroImage, Volume 41, Issue 2, June 2008, Pages 286-301, ISSN 1053-8119, http://dx.doi.org/10.1016/j.neuroimage.2008.02.042.
(http://www.sciencedirect.com/science/article/pii/S1053811908001778)

At the time I did my bibliographic work, I was surprised by the fact that people tend to use a version of the algorithm that has never been shown to converge, even though I understand the theoretical version (with the right scale update) is a bit trickier.
This is always the same problem with robustness: you have to be ready to lose something somewhere in order to avoid some mistakes in some cases. But this is far too general and can be applied in so many contexts that I think it is great to have tools like statsmodels that implement (almost) everything :)
It helps to find what suits a given application best.

Sturla Molden

unread,
Jun 24, 2014, 12:04:47 AM6/24/14
to pystat...@googlegroups.com
<josef...@gmail.com> wrote:

> One problem with robust estimation is that there are too many
> choices/options.

There is just one, integrate over nuisance parameters :-D

> The other problem is that we need a reference model to
> interpret outliers,

Not a problem for a Bayesian :-D

> which is by default the normal distribution.
> (and data like [1, 1, 1, 1, 1, 1, 2.5, -1.5] definitely doesn't look
> "normal", maybe someone rounded too much.)

Which is why scientists should start to use GLMs or GNMs when the least-squares model is inadequate.

In the 1930s the linear least-squares model was the only tractable one, when the available tools were slide rules and mechanical calculators (often with female operators called "computers"). Today, we have desktop computers with (in comparison) infinite computing power. It is really a shame that the applied statistics we teach to most students is stuck in the 1930s, or so it seems. There is a universe of possibilities beyond ANOVA and linear regression today. We don't have to model the world as linear and normally distributed.

Sturla

josef...@gmail.com

unread,
Jun 24, 2014, 7:59:54 AM6/24/14
to pystatsmodels
On Tue, Jun 24, 2014 at 12:04 AM, Sturla Molden <sturla...@gmail.com> wrote:
<josef...@gmail.com> wrote:

> One problem with robust estimation is that there are too many
> choices/options.

There is just one, integrate over nuisance parameters :-D

> The other problem is that we need a reference model to
> interpret outliers,

Not a problem for a Bayesian :-D

Even more of a problem for a Bayesian, who **always** needs to have a likelihood (and a prior), and whose entire analysis is committed to it being correctly specified.

We are just doing M-estimation, which gives us correct standard errors even if the norm/likelihood is misspecified.
IIRC all we need is a symmetric "inlier distribution" to get consistent estimates.

Users can throw things at it whether the outliers come from heavy or long tails, from recording errors, or from a mixture distribution.
If we knew the data came from a mixture distribution with given likelihood functions, then we could write the MLE for it or use scikit-learn's GMM (Gaussian mixture model, not our GMM, which is the generalized method of moments).

 

> which is by default the normal distribution.
> (and data like [1, 1, 1, 1, 1, 1, 2.5, -1.5]  definitely doesn't look
> "normal", maybe someone rounded too much.)

All I'm saying is that users need to decide, and that the default options that we, or other packages, pick might not be the most appropriate ones.
IIRC, in my pull request, scale=MAD gives you a perfect fit with variance zero in this example, while with scale=Huber the standard deviation will be strictly positive.
But I have no idea which is better for a sample that a user throws at the estimator when I don't know anything about the context.

 

Which is why scientists should start to use GLMs or GNMs when the least-squares model is inadequate.

What's GLM or GNM here?

 


In the 1930s the linear least-squares model was the only tractable one, when the available tools were slide rules and mechanical calculators (often with female operators called "computers"). Today, we have desktop computers with (in comparison) infinite computing power. It is really a shame that the applied statistics we teach to most students is stuck in the 1930s, or so it seems. There is a universe of possibilities beyond ANOVA and linear regression today. We don't have to model the world as linear and normally distributed.

Most of robust statistics has been developed in the last 20 to 30 years (although the first Huber M-estimator article is from 1964).

However, just because something was invented a long time ago doesn't mean it's no longer useful.

The point of RLM is that it's not "Least-Squares", but it's least-something.

Josef


Sturla Molden

unread,
Jun 24, 2014, 8:38:02 AM6/24/14
to pystat...@googlegroups.com
<josef...@gmail.com> wrote:

> What's GLM or GNM here?

Generalized (non)linear models (cf. glm and gnm in R).

josef...@gmail.com

unread,
Jun 24, 2014, 3:32:18 PM6/24/14
to pystatsmodels
We have GLM, and I was just trying to figure out which bonus features we want to add to it.
I've never seen an example of generalized nonlinear models, but it would be relatively easy to add with a bit of work.

But they won't make RLM obsolete, and I have started to look at how to add (outlier-)robust estimation to GLM and the discrete models.

Josef

Sturla Molden

unread,
Jun 24, 2014, 7:31:19 PM6/24/14
to pystat...@googlegroups.com
<josef...@gmail.com> wrote:

> We have GLM, and I was just trying to figure out which bonus features we
> want to add to it.
> But, I've never seen an example for generalized nonlinear models, but it
> would be relatively easy to add with a bit of work.

GNM and GAM can be used on many of the same problems, but GNM allows us to
specify the function instead of just using a lowess (or loess?) smoother.

For example, we can have repeated measures over a period of time, and we might know that the shape of the response-against-time curve should be logistic. Many researchers in biology and medicine will routinely use repeated-measures ANOVA to analyse these data sets, i.e. they will look for a group x time interaction, but that model is usually misspecified. GNMs can deal with these problems much more appropriately.
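[Editor's sketch of that idea: fit a logistic response-versus-time curve directly by nonlinear least squares, with scipy's curve_fit standing in for a full GNM. The data, the parameterization, and all names are invented for illustration.]

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, top, rate, t50):
    """Logistic response curve: approaches `top`, half-maximal at t50."""
    return top / (1.0 + np.exp(-rate * (t - t50)))

# simulated repeated measures: a logistic trend plus a little noise
rng = np.random.default_rng(2)
t = np.linspace(0, 10, 50)
y = logistic(t, top=3.0, rate=1.5, t50=5.0) + 0.05 * rng.standard_normal(t.size)

# fit the curve's own parameters instead of a group x time interaction
popt, pcov = curve_fit(logistic, t, y, p0=[2.0, 1.0, 4.0])
top_hat, rate_hat, t50_hat = popt
```

The fitted parameters (plateau, growth rate, half-time) are directly interpretable, which a cell-means ANOVA of the same data would not give you.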

> But they won't make RLM obsolete, and I had started to look at how to add
> (outlier) robust estimation to GLM and the discrete Models.

Since they allow us to specify a probability family, they can better handle
a long-tailed error distribution.

Sturla

josef...@gmail.com

unread,
Jun 24, 2014, 8:10:04 PM6/24/14
to pystatsmodels
On Tue, Jun 24, 2014 at 7:31 PM, Sturla Molden <sturla...@gmail.com> wrote:
<josef...@gmail.com> wrote:

> We have GLM, and I was just trying to figure out which bonus features we
> want to add to it.
> But, I've never seen an example for generalized nonlinear models, but it
> would be relatively easy to add with a bit of work.

GNM and GAM can be used on many of the same problems, but GNM allows us to
specify the function instead of just using a lowess (or loess?) smoother.

For example, we can have repeated measures over a period of time, and we might know that the shape of the response-against-time curve should be logistic. Many researchers in biology and medicine will routinely use repeated-measures ANOVA to analyse these data sets, i.e. they will look for a group x time interaction, but that model is usually misspecified. GNMs can deal with these problems much more appropriately.

OK, that sounds like a useful application.
For now, we would have to use polynomials or splines, which are linear combinations of basis functions.
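[Editor's sketch of that workaround in minimal form: expand the regressor in a polynomial basis and fit by ordinary linear least squares. The cubic basis and the toy data are illustrative choices, not a recommendation.]

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-1, 1, 40)
y = np.sin(2 * x) + 0.05 * rng.standard_normal(x.size)  # nonlinear truth

# cubic polynomial basis: columns 1, x, x^2, x^3
B = np.vander(x, N=4, increasing=True)
coef, *_ = np.linalg.lstsq(B, y, rcond=None)            # linear LS in the basis
fitted = B @ coef
```

The model stays linear in the coefficients, so everything from OLS (or RLM) carries over; the nonlinearity lives entirely in the basis.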

 

> But they won't make RLM obsolete, and I had started to look at how to add
> (outlier) robust estimation to GLM and the discrete Models.

Since they allow us to specify a probability family, they can better handle
a long-tailed error distribution.

Ok, depends on what you assume you know about the underlying distribution, and about the outliers.

Large parts of the literature on robust GLMs or discrete models are about robustness to "x-outliers", i.e. influential observations. We don't have anything in master yet for those cases, not even for the linear model.

There is a TLinearModel in statsmodels for heavy-tailed distributions, but I don't think (m)any of the GLM families allow for heavy tails.

For example what would you use for Poisson with a few outliers (large y)?

Some of the RLM norms correspond to a likelihood model, not just least squares.
Soon we will need some overview documentation to compare similar but not identical models.

(RLM calculates the standard errors under the assumption that the norm/loss/likelihood is misspecified.
GLM assumes correctly specified heteroscedasticity and no correlation for the standard errors, but that will soon change.)

Josef


Sturla Molden

unread,
Jun 25, 2014, 1:00:51 AM6/25/14
to pystat...@googlegroups.com
On 25/06/14 02:10, josef...@gmail.com wrote:

> For example what would you use for Poisson with a few outliers (large y)?

Poisson is a bit tricky. I would probably use a mixture of two Poissons.
One would model the noise as a nuisance parameter. And then the mixing
proportions for each point (1-noise and noise) would also be nuisance
parameters (or latent variables, depending on perspective).
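[Editor's sketch of that suggestion: an EM loop for a two-component Poisson mixture, with the component responsibilities playing the role of the nuisance/latent variables. The data, rates, and starting values are made up for illustration.]

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(4)
# mostly "signal" counts around rate 3, plus a few "noise" counts around 20
y = np.concatenate([rng.poisson(3.0, 180), rng.poisson(20.0, 20)])

lam = np.array([2.0, 15.0])   # starting rates (signal, noise)
pi = 0.5                      # starting mixing proportion of the signal part
for _ in range(200):
    # E-step: posterior probability that each count came from the signal part
    p0 = pi * poisson.pmf(y, lam[0])
    p1 = (1 - pi) * poisson.pmf(y, lam[1])
    g = p0 / (p0 + p1)
    # M-step: responsibility-weighted means update the rates and proportion
    lam = np.array([np.sum(g * y) / np.sum(g),
                    np.sum((1 - g) * y) / np.sum(1 - g)])
    pi = g.mean()
```

The responsibilities g behave exactly like the IRLS weights mentioned below: observations the noise component claims contribute little to the signal rate.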

This is actually conceptually similar to the re-weighting we do to fit
M-estimators. You can consider the weights as nuisance parameters.

If it were Gaussian, a rough method is just to use a t-distribution instead.

Sturla


Sturla Molden

unread,
Jun 25, 2014, 1:24:00 AM6/25/14
to pystat...@googlegroups.com
On 25/06/14 07:00, Sturla Molden wrote:

> Poisson is a bit tricky. I would probably use a mixture of two Poissons.
> One would model the noise as a nuisance parameter. And then the mixing
> proportions for each point (1-noise and noise) would also be nuisance
> parameters (or latent variables, depending on perspective).

I would probably also use a Beta(0.5,0.5) prior for the noise parameter,
just to make sure it is unlikely to be 0.5, but rather something closer
to 0 or 1.

There is a similar example here, but Jake Vanderplas does not use a prior for the noise parameter, which is the same as saying its prior is uniform over [0, 1].

http://jakevdp.github.io/blog/2014/06/06/frequentism-and-bayesianism-2-when-results-differ/


Sturla

