LTS - least trimmed squares - robust estimation


josef...@gmail.com

Jul 18, 2012, 6:01:27 PM
to pystatsmodels
I started to look at robust estimation again.

I implemented an LTS estimator, which drops the most likely outliers
from the regression. The advantage of LTS is that it is robust even
when almost 50% of the observations are outliers.

The problem is that an exact solution requires checking a huge number
of subsets of possible outliers, so there are far too many
possibilities even for moderate sample sizes. To handle larger
datasets, approximate randomized algorithms are used. However, there
are many competing approximate algorithms.

My question right now is which LTS algorithms to implement. Does
anyone know of a good reference for what would be currently considered
to be the best algorithm?
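To make the combinatorics concrete, here is a minimal sketch (not the statsmodels implementation, and `exact_lts` is a made-up name) of the exact LTS estimator, which minimizes the sum of the h smallest squared residuals over every size-h subset of observations:

```python
from itertools import combinations
import numpy as np

def exact_lts(y, X, h):
    """Brute-force LTS: best OLS fit over all size-h observation subsets."""
    best_obj, best_beta = np.inf, None
    for idx in map(list, combinations(range(len(y)), h)):
        # OLS on this subset, then score it by its trimmed sum of squares
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        obj = ((y[idx] - X[idx] @ beta) ** 2).sum()
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta, best_obj
```

With n = 50 and h = 25 there are already about 1.26e14 subsets, which is why the brute force is only usable for tiny n and approximate algorithms are needed.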


Right now I think I have implemented (based on second-hand references):
Rousseeuw PJ, van Driessen K (2006), "Computing LTS regression for large
data sets," Data Min Knowl Discov 12:29–45;
available as a working paper via a Google search.


LTS has a high breakdown point (which means it is robust to a larger
fraction of outliers) but low efficiency (which means the estimates are
not very good if there are no outliers). But it should be useful as a
starting point for robust regression in both the linear and the
non-linear case (Divyanshu's GSoC project).

Josef

Skipper Seabold

Jul 18, 2012, 6:06:46 PM
to pystat...@googlegroups.com
Heh, I implemented fast LTS last week. I wrote an e-mail, but didn't
send it. I agree it's a good starting point, but we need
MM-estimators.

Decent starting point paper

http://creates.au.dk/fileadmin/site_files/filer_oekonomi/subsites/creates/Seminar_Papers/2011/ELTS.pdf

R code for advanced robust estimators (newer than the 1960s...) is in
the robustbase package.

Recipe for random subset indices you might want to use

import random
import numpy as np
from scipy.special import comb  # comb was missing from the original snippet

def random_subset(sample, k, niter=1000, seed=12345):
    """Return a generator of random size-k subsets of sample."""
    random.seed(seed)
    # don't draw more subsets than actually exist
    niter = min(int(np.round(comb(len(sample), k))), niter)
    for i in range(niter):
        yield random.sample(sample, k)

Skipper

Skipper Seabold

Jul 18, 2012, 6:20:18 PM
to pystat...@googlegroups.com
On Wed, Jul 18, 2012 at 6:06 PM, Skipper Seabold <jsse...@gmail.com> wrote:
> On Wed, Jul 18, 2012 at 6:01 PM, <josef...@gmail.com> wrote:
>> I started to look at robust estimation again.
>>
>> I implemented a LTS estimation, which drops the most likely outliers
>> from the regression. The advantage of LTS is that it is robust to
>> having almost 50% of the observations being outliers.
>>
>> The problem is that to get an exact solution, we would have to check a
>> large number of permutations of possible outliers, and, therefore,
>> there are way too many possibilities for moderate sample size. To be
>> able to handle larger datasets approximate random solutions are used.
>> However, there are a large number of competing approximate algorithms.
>>
>> My question right now is which LTS algorithms to implement. Does
>> anyone know of a good reference for what would be currently considered
>> to be the best algorithm?
>>
>>
>> right now I think I have implemented (based on second hand references)
>> Rousseeuw PJ, van Driessen K (2006) Computing LTS regression for large
>> data sets. Data Min Knowl Discov 12:29–45
>> available as a working paper with google searsh.
>>

Actually, re-reading your e-mail: yes, I think this is the current
algorithm used, AFAIK. It's referenced in the high-level paper I sent.

josef...@gmail.com

Jul 18, 2012, 6:40:10 PM
to pystat...@googlegroups.com
Thanks for the reference; Doornik is always good (he tries to get the
best algorithms for his statistical package, Ox).
I will go through it tomorrow.

My current version for random sub-sampling without replacement (using
boolean indexing):

import numpy as np

def subsample(n, k, max_nrep=20):
    idx = np.ones(n, bool)
    idx[:(n - k)] = False
    for i in range(max_nrep):
        np.random.shuffle(idx)
        yield idx

I'm getting good results for the stackloss data (same result as [1] in
a few iterations), but I just started and don't have the difficult
cases yet.

[1] Douglas M. Hawkins, 1994, "The feasible solution algorithm for
least trimmed squares regression"

I looked briefly at robustbase::nlrob as background for Divyanshu's work.

Josef


>>
>> Skipper

Skipper Seabold

Jul 18, 2012, 6:43:58 PM
to pystat...@googlegroups.com
I still need to do some cleanup. I will put these in the statsmodels
org repo, but...

https://github.com/jseabold/tutorial/blob/master/robust_models.ipynb
https://github.com/jseabold/tutorial/blob/master/robust_models.py

Skipper

PS.

Going through pysal now for spatial econometrics. Serge said y'all
have spoken before. There's significant overlap and we would greatly
benefit from collaborating on parts. I ran a svn2git pass over their
code.

https://github.com/jseabold/pysal/

Skipper

josef...@gmail.com

Jul 18, 2012, 7:12:20 PM
to pystat...@googlegroups.com
Looks good. Is there any LTS in there? I didn't see any.
I need to run it properly to figure out the details.

It's good to get a tutorial for the different parts of statsmodels.

>
> Skipper
>
> PS.
>
> Going through pysal now for spatial econometrics. Serge said y'all
> have spoken before. There's significant overlap and we would greatly
> benefit from collaborating on parts. I ran a svn2git pass over their
> code.
>
> https://github.com/jseabold/pysal/

They started to expand the econometrics part in the spring release,
especially the spatial GMM, but I haven't looked at it for a few
months. There's potentially a big overlap, and they have quite a lot
of resources; statsmodels doesn't have much panel/spatial data
analysis yet, and pysal didn't have a lot of econometrics the last
time I looked.
(Their BSD license makes it one of my favorite packages to keep track
of. A big push for econometrics and economic data analysis in
Python.)

Keep us updated if you see something interesting.

Josef

>
> Skipper

justin

Jul 20, 2012, 3:12:42 AM
to pystat...@googlegroups.com
Hi group,

I have a non-empirical-likelihood question related to my GSoC project. It is mostly theoretical.

The model I am working with is an AFT; it is a model with random right censoring:

http://en.wikipedia.org/wiki/Accelerated_failure_time_model

The model is a simple single-variable least squares y = a + Bx, and there is a vector, d, that indicates whether or not the response is right censored.

Now here is my issue. The way to proceed with estimating such a model is using WLS, where the weights are a function of whether or not the observation is censored and of the Kaplan-Meier estimate. Anyway, I am dealing with 3 papers. The first 2 are EXACTLY the same; in fact, it seems as if one is almost directly copied from the other. For reference, they are:

Li and Wang. "Empirical Likelihood Regression Analysis for Right Censored Data." Statistica Sinica, 13, 51-68.

Qin and Jing. "Empirical Likelihood for Censored Linear Regression." Scandinavian Journal of Statistics, Vol. 28, 661-663.

Zhou, Kim, Bathke. "Empirical Likelihood Analysis for the Heteroskedastic Accelerated Failure Time Models." Statistica Sinica, 22 (2012), 295-316.


The first two are exactly the same. The third is different (and the third was also written by the person who wrote the censored EL regression library in R).

The difference between the first two and the third is that in the first two, the authors suggest a formula for weighting the Y's only, and then performing OLS on the new Y data, not performing WLS. To me this seems strange, as I am unfamiliar with this approach. The third paper suggests weighting the observations (not just the response variable) by the same formula that the other two papers used to weight the endogenous variable.

Now, the only way I am able to replicate the parameter estimates for 'B' in R is if I follow the third paper. To add another layer of confusion, in the third paper the authors acknowledge that there are 2 ways of weighting observations to estimate the parameters a and B, and they use the other way in the emplike package in R.

And one last element of frustration: even though I am able to replicate the results in R, the results are not the same as the results reported in the first paper (the data was publicly available). In other words, using the data from the Li and Wang paper and running it through the R program designed to estimate AFT model parameters did not produce the same results as the authors of the original paper.

Now, to finally ask a concrete question: is it acceptable to proceed in the way that is consistent with the R package and the Zhou paper, even though there are discrepancies in the other papers? Also, the more I explore this, the more I realize that there doesn't seem to be a unified way to estimate a randomly censored regression model. Maybe all of the methods are correct in some way?
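For concreteness, here is a hedged sketch of my reading of the weighting scheme (`km_weights` and `stute_wls` are made-up names, and this is not verified against the papers): compute the Kaplan-Meier jumps of the response distribution and use them as WLS weights. A sanity check is that with no censoring the weights collapse to 1/n, so the WLS fit reduces to plain OLS.

```python
import numpy as np

def km_weights(y, d):
    """Kaplan-Meier jump weights for right-censored data.

    Sort by y; the i-th ordered observation (0-indexed) gets the jump
    of the KM estimator: surv_before * d_i / (n - i), where surv is
    updated by the factor ((n - i - 1) / (n - i)) ** d_i.
    """
    order = np.argsort(y)
    d_s = np.asarray(d)[order]
    n = len(y)
    w = np.zeros(n)
    surv = 1.0
    for i in range(n):
        w[i] = surv * d_s[i] / (n - i)           # KM jump at this point
        surv *= ((n - i - 1) / (n - i)) ** d_s[i]  # update survival curve
    return order, w

def stute_wls(y, X, d):
    """WLS with Kaplan-Meier jump weights (observations sorted by y)."""
    order, w = km_weights(y, d)
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X[order] * sw[:, None], y[order] * sw,
                               rcond=None)
    return beta
```

Censored observations (d_i = 0) get zero weight here; the mass they would carry is redistributed to the larger uncensored observations through the survival factor.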

Thanks for any advice you might have,

Justin

P.S. Just to be clear again, this is not an empirical likelihood question, but simply a question on how to estimate randomly censored data and what the conventions are for matching results with other programs.

josef...@gmail.com

Jul 20, 2012, 5:45:40 AM
to pystat...@googlegroups.com
On Fri, Jul 20, 2012 at 3:12 AM, justin <jg3...@student.american.edu> wrote:
> Hi group,
>
> I have a non-Empirical likelihood question related to my GSOC project. It
> is mostly theoretical.
>
> The model I am working with is an AFT; it is a model with random right
> censoring:
>
> http://en.wikipedia.org/wiki/Accelerated_failure_time_model
>
> The model is a simple single variable least squares y = a +Bx, and there is
> a vector, d, that indicates whether or not the response is right censored.
>
> Now here is my issue. The way to proceed with estimating such a model is
> using WLS where the weights are a function of whether or not the model is
> censored and the Kaplan-Meier estimate. Anyway, I am dealing with 3 papers.
> The first 2 are EXACTLY the same. In fact it seems as one is almost
> directly copied from the other. For reference, they are:
>
> Li and Wang. "Empirical Likelihood Regression Analysis for Right Censored
> Data." Statistica Sinica, 13, 51-68.
>
> Qin and Jing. "Empirical Likelihood for Censored Linear Regression".
> Scandinavian Journal of Statistics. Vol 28 661-663

They are referring to Koul et al., published in The Annals of Statistics,
Vol. 9, No. 6 (Nov. 1981); cited 363 times.

>
> Zhou, Kim, Bathke, "Empirical Likelihood Analysis for the Heteroskedastic
> Accelerated Failure time Models" Statistica Sinica 22 (2012), 295-316
>
>
> The first two are exactly the same. The third is different (and the third
> was also written by the person who wrote the censored EL regression library
> in R).
>
> The difference between the first two and the third is that in the first two,
> the authors suggest a formula for weighting the Y's only, and then
> performing OLS on the new Y data, not performing WLS. To me this seems
> strange as I am unfamiliar with this approach. The third paper suggests
> weighting observations (not just the response variable) by the same formula
> that the other two papers suggested weighting the endogenous variables.
>
> Now, the only way I am able to replicate the parameter estimates for 'B' in
> R is if I follow the third paper. To add another layer of confusion, in the
> third paper the authors acknowledge that there are 2 ways of weighting
> observations to estimate the parameters a and B and they use the other way
> in the emplike package in R.
>
> And one last element of frustration, even though I am able to replicate the
> results in R, the results are not the same as the results reported in the
> first paper (the data was publicly available). In other words, using the
> data from the Li and Wang paper and running it through the R program
> designed to estimate AFT model parameters did not produce the same results
> as the authors of the original paper.
>
> Now, to finally ask a concrete question, is it acceptable to proceed in the
> way that is consistent with the R package and the Zhou paper even though
> there are discrepancies in the other papers? Also, the more I explore
> this, the more I realize that there doesn't seem to be a unified way to
> estimate a randomly censored regression model. Maybe all of the methods are
> correct in some way?

My guess is that these are different estimators that are all "correct".

The best would be to find a reference on which is "better" (or for
which cases one or the other is better), and implement both or the
better one.

Matching an R package is fine; the quality and usefulness depend a
bit on which R package has it.

I think you could proceed with the R equivalent, and commit whatever
you have implemented to the example folder or the sandbox, so
whenever we want to expand to different methods for AFT or censored
regression we have a place to start.

Skipper has Tobit mostly finished, but from only a quick look at some
of your references, I cannot tell how much overlap there is. (Skipper
implemented maximum likelihood, but as far as I remember there are
also some methods that use linear models with transformed data.)

Josef

josef...@gmail.com

Jul 20, 2012, 5:56:39 AM
to pystat...@googlegroups.com
On Fri, Jul 20, 2012 at 5:45 AM, <josef...@gmail.com> wrote:
> On Fri, Jul 20, 2012 at 3:12 AM, justin <jg3...@student.american.edu> wrote:
>> Hi group,
>>
>> I have a non-Empirical likelihood question related to my GSOC project. It
>> is mostly theoretical.
>>
>> The model I am working with is an AFT; it is a model with random right
>> censoring:
>>
>> http://en.wikipedia.org/wiki/Accelerated_failure_time_model
>>
>> The model is a simple single variable least squares y = a +Bx, and there is
>> a vector, d, that indicates whether or not the response is right censored.
>>
>> Now here is my issue. The way to proceed with estimating such a model is
>> using WLS where the weights are a function of whether or not the model is
>> censored and the Kaplan-Meier estimate. Anyway, I am dealing with 3 papers.
>> The first 2 are EXACTLY the same. In fact it seems as one is almost
>> directly copied from the other. For reference, they are:
>>
>> Li and Wang. "Empirical Likelihood Regression Analysis for Right Censored
>> Data." Statistica Sinica, 13, 51-68.
>>
>> Qin and Jing. "Empirical Likelihood for Censored Linear Regression".
>> Scandinavian Journal of Statistics. Vol 28 661-663
>
> they are referring to Koul, ... published in The Annals of Statistics,
> Vol. 9, No. 6 (Nov., 1981), cited 363 times
>
>>
>> Zhou, Kim, Bathke, "Empirical Likelihood Analysis for the Heteroskedastic
>> Accelerated Failure time Models" Statistica Sinica 22 (2012), 295-316

This might be a version of the Buckley-James estimator.

Maybe there is more on linear least squares approaches to
semiparametric censored regression:
http://www.jstor.org/stable/20441267 (I haven't looked at it yet.)
Tobit is fully parametric, usually with normally distributed errors;
your references look like they do not use a distributional assumption
and instead estimate the distribution non-parametrically.
Given the estimated distribution, the methods might then be very
similar to Tobit or related methods.

I'm a bit slow today; this should have been obvious from the title and
the reference to Kaplan-Meier.

Josef

justin

Jul 20, 2012, 11:51:56 AM
to pystat...@googlegroups.com
On 07/20/2012 05:56 AM, josef...@gmail.com wrote:
> On Fri, Jul 20, 2012 at 5:45 AM, <josef...@gmail.com> wrote:
>> On Fri, Jul 20, 2012 at 3:12 AM, justin <jg3...@student.american.edu> wrote:
>>> Hi group,
>>>
>>> I have a non-Empirical likelihood question related to my GSOC project. It
>>> is mostly theoretical.
>>>
>>> The model I am working with is an AFT; it is a model with random right
>>> censoring:
>>>
>>> http://en.wikipedia.org/wiki/Accelerated_failure_time_model
>>>
>>> The model is a simple single variable least squares y = a +Bx, and there is
>>> a vector, d, that indicates whether or not the response is right censored.
>>>
>>> Now here is my issue. The way to proceed with estimating such a model is
>>> using WLS where the weights are a function of whether or not the model is
>>> censored and the Kaplan-Meier estimate. Anyway, I am dealing with 3 papers.
>>> The first 2 are EXACTLY the same. In fact it seems as one is almost
>>> directly copied from the other. For reference, they are:
>>>
>>> Li and Wang. "Empirical Likelihood Regression Analysis for Right Censored
>>> Data." Statistica Sinica, 13, 51-68.
>>>
>>> Qin and Jing. "Empirical Likelihood for Censored Linear Regression".
>>> Scandinavian Journal of Statistics. Vol 28 661-663
>> they are referring to Koul, ... published in The Annals of Statistics,
>> Vol. 9, No. 6 (Nov., 1981), cited 363 times

I've actually come across that, as well as several other papers that
reference that model.
>>
>>> Zhou, Kim, Bathke, "Empirical Likelihood Analysis for the Heteroskedastic
>>> Accelerated Failure time Models" Statistica Sinica 22 (2012), 295-316
> might be a version of Buckley-James estimator
They mention Stute (1993), "Consistent Estimation Under Random
Censorship when Covariables Are Present," Journal of Multivariate
Analysis, Vol. 45, Issue 1.
This is what I gathered too. I think I was just thrown off by how many
different estimation methods there are for the same model.
>>
>> The best would be to find a reference to which is "better" (or for
>> which cases one or the other is better), and implement both or the
>> better.
It wouldn't be difficult at all to implement both, so I am going to go
ahead and do that. However, only one of the models can be tested. My
next step would be to look for papers that use the Koul method and then
try to match the paper's results, so that one of the models wouldn't
sit in the sandbox forever.
>>
>> Matching an R package is fine, the quality and usefulness depends a
>> bit on which R package has it.
>>
>> I think you could proceed with the R equivalent, and commit to
>> whatever you have implemented into the example folder or the sandbox,
>> so whenever we want to expand to different methods for AFT or censored
>> regression we have a place to start.
>>
>> Skipper has Tobit mostly finished, but from only a quick look at some
>> of your references, I cannot tell how much overlap there is. (Skipper
>> implemented maximum likelihood, but as far as I remember there are
>> also some methods that use linear models with transformed data.)
> Tobit is fully parametric, with usually normal distributed errors,
> your references look like they are not using a distributional
> assumption, and estimate the distribution non-parametrically.
> Given the estimated distribution the methods might be very similar
> then to Tobit or related methods.
>
> I'm a bit slow today,
I know the feeling. Thank you as always for the feedback.

josef...@gmail.com

Jul 20, 2012, 12:42:46 PM
to pystat...@googlegroups.com
If you also finish the other model, then there is no reason to let it
sit in the sandbox. For large samples the differences should be
relatively small across models, and we can use approximate tests if we
don't find any directly verifiable case.

Josef

josef...@gmail.com

Jul 22, 2012, 7:03:59 AM
to pystat...@googlegroups.com
On Wed, Jul 18, 2012 at 6:06 PM, Skipper Seabold <jsse...@gmail.com> wrote:
How far did you get with fast-LTS?

I have a working version, enough to understand what's going on and see
how I can use it for non-linear estimation, but it's very slow
compared to R.

I finally started to read more details in Rousseeuw and van Driessen,
and there are still many "tricks" to speed up the calculations, most
of them I have not implemented yet.

It's starting to look like "work" and I'd rather not duplicate any
more of what you have already done.

If you don't have a fast Fast-LTS yet, then I'll keep going and finish
implementing Rousseeuw and van Driessen.
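For reference, the core of Rousseeuw and van Driessen's algorithm is the concentration step (C-step): refit OLS on the h observations with the smallest squared residuals under the current fit, which never increases the objective. A minimal sketch (`c_steps` is a made-up name, not the statsmodels code):

```python
import numpy as np

def c_steps(y, X, h, beta, maxiter=50):
    """Iterate C-steps from a starting fit until the coefficients stabilize."""
    for _ in range(maxiter):
        resid2 = (y - X @ beta) ** 2
        keep = np.argsort(resid2)[:h]        # h smallest squared residuals
        new_beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        if np.allclose(new_beta, beta):      # subset no longer changes
            break
        beta = new_beta
    obj = np.sort((y - X @ beta) ** 2)[:h].sum()
    return beta, obj
```

Fast-LTS runs a couple of C-steps from many random starts and iterates only the most promising candidates to convergence, which is where most of the remaining speed tricks live.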

Josef

Skipper Seabold

Jul 22, 2012, 11:05:02 AM
to pystat...@googlegroups.com
Just the basics to see that it works. It's not optimized.

> I have a working version, enough to understand what's going on and see
> how I can use it for non-linear estimation, but it's very slow
> compared to R.

For what settings? The LTS estimation in robustbase is written in
Fortran, so it's going to be hard to beat.

josef...@gmail.com

Jul 22, 2012, 11:20:52 AM
to pystat...@googlegroups.com
Default settings with the modified wood data (nobs=20):
robustbase is instantaneous, mine takes noticeable time (a second or a few?).

It's not just the Fortran: better starting points and early termination
of bad paths reduce the number of OLS calls and increase accuracy.
For large datasets (nobs > 600 or nobs > 1500), Rousseeuw and van
Driessen split the dataset into groups in the first part.

Josef

Skipper Seabold

Jul 22, 2012, 11:23:32 AM
to pystat...@googlegroups.com
They have two code paths in the C/Fortran for small n and large n too.

Skipper Seabold

Jul 23, 2012, 1:00:45 PM
to pystat...@googlegroups.com
Looking some more at this. You're talking about ltsReg and not lmrob,
correct? This is my non-expert impression. The former doesn't do any
sophisticated sub-sampling as far as I can tell. It's just all
Fortran, and there are no OLS calls; they're doing the decompositions
by hand, in Fortran for the former or C for the latter. The doc
comment for the subroutine rfrdraw is
cc Draws ngroup nonoverlapping subdatasets out of a dataset of size n,
cc such that the selected case numbers are uniformly distributed from 1 to n.

If n is small enough for a given number of regressors <= 6, they do an
exhaustive search; if not, they do the sub-sampling. As best I can
tell, they just throw out the result if they get a rank-deficient
subset in this one. I think the Fortran makes this cheap.

For lmrob, they're doing the constrained sub-sampling by default.
Again, as best I can tell, they're doing the decomposition by hand
here and are checking whether there are any k x k square subsets of
the entire sub-sample that are full rank; if so, they're fine. If not,
they make some modification to make sure it's full rank, I think. They
reference some algorithm in Golub and Van Loan's Matrix Computations,
though they don't mention which one. I assume they've modified some
method for efficient matrix subset selection to fit their problem. I
think this is usually used for selecting subsets of columns for
variable selection / data mining.
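The simpler of the two strategies above (draw, check rank, throw out and redraw) can be sketched like this (`full_rank_subset` is a made-up name; robustbase's repair scheme is more involved):

```python
import numpy as np

def full_rank_subset(X, k, rng, max_tries=100):
    """Draw a size-k row subset of X; redraw if it is rank deficient."""
    n, p = X.shape
    for _ in range(max_tries):
        idx = rng.choice(n, size=k, replace=False)
        # keep the draw only if the subset supports a unique OLS fit
        if np.linalg.matrix_rank(X[idx]) == p:
            return idx
    raise RuntimeError("no full-rank subset found in max_tries draws")
```

With dummy variables or duplicated design points, rank-deficient draws are common enough that either rejection (as here) or an explicit repair step is needed.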

Might get some help on the numpy list if you ask about submatrix
selection. Seems to be a "fun" computational linear algebra problem...
though I don't know how you can use any existing linear algebra
routines for this without modifying the source unless someone has one
already somewhere.

Skipper

Virgile Fritsch

Sep 6, 2012, 10:48:12 AM
to pystat...@googlegroups.com
Some time ago, I implemented a Fast-MCD algorithm in scikit-learn to compute a robust estimator of covariance.
The code has been sped up recently thanks to Vlad Niculae, but I think it runs into the same kinds of problems that you may encounter with your LTS implementation.

I do not know if my experience can be helpful, but here are some thoughts I had regarding the use of "c-step-based" algorithms such as Fast-LTS or Fast-MCD:
  • Instead of randomly drawing initial subsamples, I considered projecting the data using one-dimensional random projections and each time taking the 50% most concentrated observations around the projected data median as an initial sample. I ended up using that initialization scheme systematically.
  • One other initialization method that I use a lot is to run the Fast-MCD algorithm on only one sample, which is determined as the 50% most concentrated data points according to a nonparametric Parzen window density estimator. This can only be applied, of course, when the data are unimodal.

I am currently reading your LTS code, just in case I can give you useful feedback.

Virgile

josef...@gmail.com

Sep 6, 2012, 11:14:37 AM
to pystat...@googlegroups.com
On Thu, Sep 6, 2012 at 10:48 AM, Virgile Fritsch
<virgile...@gmail.com> wrote:
> Some times ago, I implemented a Fast-MCD algorithm in scikit-learn to
> compute a robust estimator of covariance.
> The code has been speeded up recently thanks to Vlad Niculae but I think it
> is concerned by the same kind of problems that you may encounter with your
> LTS implementation.
>
> I do not know if my experience can be helpful but here are some thoughts I
> had regarding the use of "c-step-based" algorithms such as Fast-LTS or
> Fast-MCD:
>
> Instead of randomly drawing initial subsamples, I considered projecting the
> data using one-dimensional random projections and each time taking the 50%
> most concentrated observation around the projected data median as an initial
> sample. I ended up using that initialization scheme systematically.
> One other initialization method that I use a lot is to run the Fast-MCD
> algorithm on only one sample, which is determined as the 50% most
> concentrated data points according to a nonparametric Parzen window density
> estimator. This can only be applied, of course, when the data are unimodal.

I need to think about whether or how this can be used for regression.
The main difference from MCD is that which observations are the most
"inlying" depends on the regression parameters. And, in general, we
cannot just assume that (y, x) are jointly normally, or even
unimodally, distributed; x could have trends or categorical variables.

However, in the R package for similar estimation they mention the use
of MCD for the covariance matrix, but I haven't figured out the
connection yet.

>
> I am currently reading your LTS code, just in case I can give you useful
> feedback.

Thank you, I appreciate any feedback.

My main target after coding LTS (and maximum trimmed likelihood, MTL)
was to get estimators that are more efficient (in the statistical
sense): efficient LTS and MM-estimators.
The advantage of MTL (and generic LTS) is that we can throw other
models at it, like the discrete models: Logit, Poisson, ...

So there are still possibilities to speed up LTS for the purely linear case.

Josef

josef...@gmail.com

Sep 6, 2012, 11:31:44 AM
to pystat...@googlegroups.com
I opened a WIP-PR, to make commenting easier

https://github.com/statsmodels/statsmodels/pull/452

Josef

Virgile Fritsch

Sep 6, 2012, 11:57:01 AM
to pystat...@googlegroups.com
However, in the R package for similar estimation they mention the use
of MCD for the covariance matrix, but I haven't figured out the
connection yet.

A first (weak) connection is that performing an LTS regression with only an intercept gives an estimate of location which is the same as the location obtained by averaging the 50% most concentrated data points found via MCD.

What I understand from Rousseeuw's book is that performing an MCD on the data np.hstack((Y, X)) would point out leverage points, without making it possible to distinguish between good and bad ones. The robust MCD-based Mahalanobis distances still provide some insight about the data.
I would say that in the absence of good leverage points, the robust covariance associated with the regression parameters (the_fit.bcov_scaled) would be (very close to) the MCD...

Rousseeuw's book says that the algorithms for MVE and LTS are so similar that they could run "at the same time". I think that means they could be implemented using generic code with a different objective function.
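The intercept-only connection is easy to check numerically: for a location estimate, the optimal h-subset is a contiguous block of the sorted sample, so a short sweep suffices (a sketch with a made-up name, not library code):

```python
import numpy as np

def lts_location(y, h):
    """Intercept-only LTS: among contiguous windows of the sorted data
    (the optimum is always contiguous), pick the window whose mean
    minimizes the trimmed sum of squares."""
    ys = np.sort(y)
    best_obj, best_loc = np.inf, None
    for i in range(len(ys) - h + 1):
        w = ys[i:i + h]
        obj = ((w - w.mean()) ** 2).sum()   # within-window sum of squares
        if obj < best_obj:
            best_obj, best_loc = obj, w.mean()
    return best_loc
```

The returned value is the mean of the h most concentrated observations, matching the MCD-location description above.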

Hasliza

Sep 14, 2015, 7:22:54 AM
to pystatsmodels

Hi.

I would like to ask about robust regression using least trimmed squares. I am still learning this topic. I am using SAS to analyze data, and there was no output of standard errors and confidence intervals when using least trimmed squares estimation. Is there any specific comment on this? Sorry if I am off topic. I hope I get a reply soon.

Thank you. 

josef...@gmail.com

Sep 14, 2015, 8:56:15 AM
to pystatsmodels
On Mon, Sep 14, 2015 at 5:20 AM, Hasliza <hasliza...@gmail.com> wrote:

Hi.

I would like to ask about robust regression using least trimmed squares. I am still learning this topic. I am using SAS to analyze data, and there was no output of standard errors and confidence intervals when using least trimmed squares estimation. Is there any specific comment on this? Sorry if I am off topic. I hope I get a reply soon.


You might get a better answer about SAS on a SAS-related forum or on stats.stackexchange.

I haven't looked at this in some time, and don't have time right now to get back into this (I'm trying to get back into GAM and splines right now).


From what I remember:

Standard LTS by itself is not very efficient (in terms of asymptotic standard errors) and is mostly used as a pre-estimator for more efficient estimators like MM.

Efficient LTS with data-dependent cutoffs has high efficiency in the normal-distribution case and can be used as a final estimator.

I don't remember any theoretical reason not to calculate standard errors for the parameter estimates in LTS. AFAIR it still converges to the asymptotic values at the standard rate (although with larger standard errors).
I think one can use the standard OLS results for the reduced sample, with the scale fixed at the LTS estimate.
(My guess is that trimmed t-tests are just a special case, and those also use standard asymptotic results.)
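That last suggestion can be sketched as plain OLS inference on the retained observations (an approximation: it ignores the selection effect and any consistency correction for the trimmed scale; `lts_ols_se` is a made-up name and `keep_idx` would be the h observations LTS kept):

```python
import numpy as np

def lts_ols_se(y, X, keep_idx):
    """Naive standard errors: classical OLS covariance computed on the
    observations LTS retained."""
    Xs, ys = X[keep_idx], y[keep_idx]
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    resid = ys - Xs @ beta
    n, k = Xs.shape
    sigma2 = resid @ resid / (n - k)              # residual variance
    cov = sigma2 * np.linalg.inv(Xs.T @ Xs)       # classical OLS covariance
    return beta, np.sqrt(np.diag(cov))
```

This is the kind of quick-and-dirty inference SAS apparently declines to print; a proper treatment would adjust the scale for the trimming.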


Josef


 

Thank you. 
