
Mahalanobis distance, SPSS, and multivariate outlier question


Rick Holigrocki

Jul 3, 1995
In preparation for the MANOVA I am running, I am checking for
multivariate outliers. Tabachnick and Fidell recommend using Mahalanobis
distance as a method of examining cases for multivariate outliers. They
provide an example in BMDPAM.

I am having difficulty finding how to do this in SPSS. The linear
regression menu, where Mahalanobis is accessed, provides me with a text
box in which I must enter the Independent and Dependent variable.

There are 4 variables that I am interested in testing for multivariate
outliers.

If you could provide me with a suggestion as to my next step, I would
appreciate it. In my study, all of these variables are dependent
variables and I only have 1 independent variable (instructional condition
with four levels). The design is a mixed design profile analysis. My
dependent variables are gain scores PrepostA PrepostB PostFu-A PostFu-B.

rick

--
Rick Holigrocki Please address your eMailto: g...@uwindsor.ca
Department of Psychology Home Page: http://server.uwindsor.ca:8000/~gk2/
University of Windsor
Windsor ON Canada N9B 3P4

Ismail Parsa

Jul 3, 1995
|> From: Rick Holigrocki <g...@UWINDSOR.CA> writes:

|> In preparation for the Manova I am running, I am checking for
|> multivariate outliers. Tabachnick and Fidell recommend using Mahalanobis
|> distance as a method of examining cases for multivariate outliers. They
|> provide an example in BMDPAM.
|>
|> I am having difficulty finding how to do this in SPSS. The linear
|> regression menu, where Mahalanobis is accessed, provides me with a text
|> box in which I must enter the Independent and Dependent variable.
|>
|> There are 4 variables that I am interested in testing for multivariate
|> outliers.
|>
|> If you could provide me with a suggestion as to my next step, I would
|> appreciate it. In my study, all of these variables are dependent
|> variables and I only have 1 independent variable (instructional condition
|> with four levels). The design is a mixed design profile analysis. My
|> dependent variables are gain scores PrepostA PrepostB PostFu-A PostFu-B.

I have not used SPSS for some time now, so I will outline the pseudo
code for multivariate outlier detection. If you have access to SAS,
there is a macro written by Dr. Michael Friendly. This macro is freely
available in several places on the Internet, including Dr. Friendly's
own home page; you could conduct an archie search for the file called
_outlier.sas_. The macro and the associated caveats in detecting
multivariate outliers are laid out in his book _SAS System for Statistical
Graphics_, p. 450, published by the SAS Institute. The original idea (of
using components) has been attributed to Warren S. Sarle, a regular
contributor to this list, and F. W. Young.

In a nutshell, the Mahalanobis distance is equivalent to the Euclidean
distance for a standardized bivariate normal under independence. In other
words, once the data are rotated to principal components and standardized,
ordinary Euclidean distance *is* the Mahalanobis distance.

So:

1) Transform your data to STANDARDIZED principal component scores;
2) Calculate the sum of squares of the principal component scores
   to get the Mahalanobis distance (D^2). You could implement this using
   one of SPSS's standard functions. In SAS this corresponds to using the
   USS() function on the components.

To detect the outliers visually, you could plot D^2 against chi-square
quantiles.
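
The two-step recipe above can be sketched in Python (numpy/scipy stand
in for the SPSS/SAS functions; the data are simulated placeholders for
the four gain scores, not anything from the original study):

```python
import numpy as np
from scipy import stats

# Simulated placeholder for the four gain scores (hypothetical data)
rng = np.random.default_rng(0)
S_true = np.array([[1.0, 0.5, 0.2, 0.1],
                   [0.5, 1.0, 0.3, 0.2],
                   [0.2, 0.3, 1.0, 0.4],
                   [0.1, 0.2, 0.4, 1.0]])
X = rng.multivariate_normal(np.zeros(4), S_true, size=200)
n, p = X.shape

Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)

# Textbook Mahalanobis distance: D^2 = x' S^{-1} x
D2_direct = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(S), Xc)

# The component route: standardized principal component scores,
# then the sum of their squares (what USS() does in SAS)
vals, vecs = np.linalg.eigh(S)
Z = (Xc @ vecs) / np.sqrt(vals)   # standardized PC scores
D2_pca = (Z ** 2).sum(axis=1)

assert np.allclose(D2_direct, D2_pca)

# Visual check: sorted D^2 against chi-square quantiles (p df);
# points far above the 45-degree line are candidate outliers
q = stats.chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)
pairs = np.column_stack([q, np.sort(D2_direct)])   # plot these
flagged = np.where(D2_direct > stats.chi2.ppf(0.999, df=p))[0]
```

The assertion confirms that the sum of squared standardized component
scores reproduces the quadratic-form D^2 exactly.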

Much more is implemented in Dr. Friendly's macro and explained in his book
(like iterative multivariate trimming...)

*-----------------------------*
| Ismail Parsa |
| Epsilon Data Management |
| 50 Cambridge Street |
| Burlington MA 01803 USA |
| |
| E-MAIL: s...@epsilon.com |
| V-MAIL: (617) 273-0250*6734 |
| FAX: (617) 272-8604 |
| |
| The Usual Caveat Applies |
*-----------------------------*

Duchesne Pierre

Jul 4, 1995
Ismail Parsa <s...@EPSILON.COM> writes:

>|> From: Rick Holigrocki <g...@UWINDSOR.CA> writes:

>In a nutshell, the Mahalanobis distance is equivalent to the Euclidean
>distance for a standardized bivariate normal under independence.

>1) Transform your data to STANDARDIZED principal component scores;


>2) Calculate the sum of squares of the principal component scores
>to get the Mahalanobis distance (D^2). You could implement this using
>one of SPSS's standard functions. In SAS this corresponds to using the
>USS() function on the components.

>To detect the outliers visually, you could plot D^2 against chi-square
>quantiles.

Hello,

Is the Mahalanobis distance constructed with the sample mean and
sample variance-covariance matrix? If so, it is not the best way to
find multivariate outliers. The Mahalanobis distance defined in the
usual way is a function of the leverage h(i,i), and such distances are
not a good measure of outlyingness. They suffer from the masking
problem: if you have many outliers, one outlier may mask another, or
you may not flag any outlier at all!

A good approach is presented in Rousseeuw and van Zomeren (JASA, 1991),
where a robust Mahalanobis distance is defined. For that, replace the
sample mean and covariance by robust estimates, for example the
MVE (minimum volume ellipsoid).
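
The MVE itself is hard to code compactly, so here is a minimal stand-in
using the closely related MCD idea: repeatedly re-estimate the mean and
covariance from the half-sample with the smallest current distances.
This is only a sketch of the robust-distance principle, not Rousseeuw's
actual estimator (it omits the consistency factor, among other things):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p = 200, 2
X = rng.standard_normal((n, p))
X[:10] += 6.0          # plant a cluster of ten outliers

def mahal2(X, loc, cov):
    """Squared Mahalanobis distances of rows of X from (loc, cov)."""
    Xc = X - loc
    return np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(cov), Xc)

# Crude MCD-style "concentration" steps (a stand-in for the MVE):
h = (n + p + 1) // 2
d = mahal2(X, np.median(X, axis=0), np.cov(X, rowvar=False))
for _ in range(10):
    idx = np.argsort(d)[:h]                     # h currently closest points
    loc = X[idx].mean(axis=0)
    cov = np.cov(X[idx], rowvar=False)
    d = mahal2(X, loc, cov)

RD2 = d                               # robust distances
cut = stats.chi2.ppf(0.975, df=p)     # the cutoff used by R & vZ
outliers = np.where(RD2 > cut)[0]
```

Because the planted cluster cannot drag the robust mean and covariance
toward itself, the ten planted points get huge robust distances and no
masking occurs; the classical distances on the same data would be much
less clear-cut.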

If you are working in a regression context, a plot of robust residuals
(obtained with LMS, for example) versus robust Mahalanobis distances
can be useful. With that plot, each point can be classified into one
of four categories:
- good observation
- good leverage point
- bad leverage point
- vertical outlier (large residual but small Mahalanobis distance)
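
A rough, self-contained illustration of that classification, with a
crude random-subset LMS fit and a MAD-based robust distance in a single
carrier (the 2.5 and sqrt(chi^2_{1,0.975}) cutoffs are conventional
choices; this is a sketch, not Rousseeuw's actual LMS algorithm):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 100
x = rng.standard_normal(n)
y = 2.0 * x + 0.5 * rng.standard_normal(n)
x[0], y[0] = 6.0, 12.0   # good leverage point: unusual x, on the line
x[1], y[1] = 6.0, -5.0   # bad leverage point: unusual x, off the line
x[2], y[2] = 0.0, 8.0    # vertical outlier: typical x, big residual

# Crude LMS fit: among many 2-point lines, keep the one with the
# smallest median squared residual (robust to both planted points)
best = (np.inf, 0.0, 0.0)
for _ in range(500):
    i, j = rng.choice(n, 2, replace=False)
    if x[i] == x[j]:
        continue
    b = (y[j] - y[i]) / (x[j] - x[i])
    a = y[i] - b * x[i]
    m = np.median((y - a - b * x) ** 2)
    if m < best[0]:
        best = (m, a, b)
_, a, b = best

def mad(v):
    """Median absolute deviation, scaled to be consistent for a normal."""
    return 1.4826 * np.median(np.abs(v - np.median(v)))

res = y - a - b * x
big_res = np.abs(res) / mad(res) > 2.5
big_lev = np.abs(x - np.median(x)) / mad(x) > np.sqrt(stats.chi2.ppf(0.975, 1))

kind = np.where(big_res & big_lev, 'bad leverage',
       np.where(big_lev, 'good leverage',
       np.where(big_res, 'vertical outlier', 'good observation')))
```

Points 1 and 2 should land in the "bad leverage" and "vertical outlier"
cells, point 0 in "good leverage", and the bulk in "good observation";
plotting |res|/mad(res) against the robust distance reproduces the
diagnostic display described above.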

hope this helps

Pierre Duchesne
duch...@nord.stat.umontreal.ca

--

pierre duchesne
etudiant UdM statistique, bureau 4257, fax: 343-5700
e-mail: duch...@brise.ere.umontreal.ca

Duchesne Pierre

Jul 4, 1995
After verification, the exact reference is:

Rousseeuw, P. J. and van Zomeren, B. C. (1990). Unmasking multivariate
outliers and leverage points (with discussion). Journal of the American
Statistical Association, 85, 633-651.

There is also an accessible paper with some examples:

Rousseeuw, P. J. (1991). A diagnostic plot for regression outliers and
leverage points. Computational Statistics and Data Analysis, 11, 127-129.


Bye
