I have a problem involving a non-Gaussian distribution in a 15-dimensional space. I know
nothing about the underlying distribution of the data except that it may have several
modes. Is there a method for finding the modes that are present? In other words, even for
a case as simple as a bimodal Gaussian distribution, is there a nice mathematical
representation (I am not sure whether I should be using mixture distributions), or is
there an elegant method for finding the modes in a data set drawn from an arbitrary
distribution? Any help would be appreciated. I have tried clustering, but it did not give
good results.
Regards,
Sreeram.
--
Dept of Electrical Engineering, P.O.Box 4400,
University of New Brunswick, Fredericton,
NB., Canada E3B 5A3. Tel #: (506) 452 6137
Mixture models and the usual clustering methods may or may not find
modes, but there are methods using nonparametric density estimation
for identifying modes in multivariate distributions. Here are some
references:
Koontz, W.L.G. and Fukunaga, K. (1972), "Asymptotic Analysis of a
Nonparametric Clustering Technique," IEEE Transactions on Computers,
C-21, 967-974.
Koontz, W.L.G., Narendra, P.M., and Fukunaga, K. (1976), "A
Graph-Theoretic Approach to Nonparametric Cluster Analysis," IEEE
Transactions on Computers, C-25, 936-944.
Minnotte, M.C. (1992), "A Test of Mode Existence with
Applications to Multimodality," Ph.D. thesis, Rice University,
Department of Statistics.
Mizoguchi, R. and Shimura, M. (1980), "A Nonparametric Algorithm for
Detecting Clusters Using Hierarchical Structure," IEEE Transactions on
Pattern Analysis and Machine Intelligence, PAMI-2, 292-300.
Mueller, D.W. and Sawitzki, G. (1991), "Excess Mass Estimates and Tests
for Multimodality," Journal of the American Statistical Association,
86, 738-746.
Polonik, W. (1993), "Measuring Mass Concentrations and Estimating
Density Contour Clusters--An Excess Mass Approach," Technical Report,
Beitraege zur Statistik Nr. 7, Universitaet Heidelberg.
SAS Institute Inc. (1993), _SAS/STAT Software: The MODECLUS Procedure_,
SAS Technical Report P-256, Cary, NC: SAS Institute Inc., included in
the Version 7 SAS/STAT User's Guide.
Silverman, B.W. (1986), _Density Estimation for Statistics and Data
Analysis_, New York: Chapman and Hall.
Wong, M.A. (1982), "A Hybrid Clustering Method for Identifying
High-Density Clusters," Journal of the American Statistical
Association, 77, 841-847.
Wong, M.A. and Lane, T. (1983), "A _k_th Nearest Neighbor Clustering
Procedure," _Journal of the Royal Statistical Society_, Series B, 45,
362-368.
Wong, M.A. and Schaack, C. (1982), "Using the _k_th Nearest Neighbor
Clustering Procedure to Determine the Number of Subpopulations,"
_American Statistical Association 1982 Proceedings of the Statistical
Computing Section_, 40-48.
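
As a rough illustration of what mode finding by nonparametric density
estimation can look like, here is a minimal sketch of mean shift, which
moves each point uphill on a Gaussian kernel density estimate until it
settles at a local maximum. This is one common technique of this kind and
is not claimed to be the algorithm of any particular reference above. The
function name, the bandwidth and merge_radius parameters, and the synthetic
2-dimensional data are all assumptions made for the example; the choice of
bandwidth in particular strongly affects how many modes are reported.

```python
import numpy as np

def mean_shift_modes(data, bandwidth, n_iter=200, tol=1e-6, merge_radius=None):
    """Move every point uphill on a Gaussian kernel density estimate."""
    data = np.asarray(data, dtype=float)
    points = data.copy()
    for _ in range(n_iter):
        # Squared distances from each moving point to each original observation.
        diff = points[:, None, :] - data[None, :, :]
        sq_dist = np.sum(diff ** 2, axis=2)
        # Gaussian kernel weights; the bandwidth is an assumed smoothing parameter.
        weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)
        # Mean shift step: each point moves to the weighted mean of the data.
        new_points = weights @ data / weights.sum(axis=1, keepdims=True)
        largest_move = np.max(np.linalg.norm(new_points - points, axis=1))
        points = new_points
        if largest_move < tol:
            break
    # Points that ended up close together are treated as the same mode.
    if merge_radius is None:
        merge_radius = bandwidth / 2.0
    modes = []
    for p in points:
        if all(np.linalg.norm(p - m) > merge_radius for m in modes):
            modes.append(p)
    return np.array(modes)

# Hypothetical test data: two well-separated Gaussian clusters in 2 dimensions.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 2)),
    rng.normal(8.0, 1.0, size=(200, 2)),
])
modes = mean_shift_modes(data, bandwidth=1.0)
print(modes)
```

With too small a bandwidth the estimate breaks up into many spurious
modes, and with too large a bandwidth real modes are smoothed away, so the
result should be checked over a range of bandwidths.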
--
Warren S. Sarle, SAS Institute Inc., SAS Campus Drive, Cary, NC 27513, USA
sas...@unx.sas.com, (919) 677-8000
The opinions expressed here are mine and not necessarily those of SAS Institute.
>I have a problem involving a non-Gaussian distribution in a 15-dimensional space. I know
>nothing about the underlying distribution of the data except that it may have several
>modes. Is there a method for finding the modes that are present? In other words, even for
>a case as simple as a bimodal Gaussian distribution, is there a nice mathematical
>representation (I am not sure whether I should be using mixture distributions), or is
>there an elegant method for finding the modes in a data set drawn from an arbitrary
>distribution? Any help would be appreciated. I have tried clustering, but it did not give
>good results.
The first thing that should be taught to anyone wanting to use
statistics is that, in the real world, NOTHING is normally
distributed. For SOME purposes, normality may be a reasonable
approximation. For other purposes, such as regression, the
consequences of the assumption being false are not likely to be great.
In 15 dimensions, without a model or good clustering, it is not
likely that much can be done without an enormous number of
observations. In 1 dimension, with enough data, one can use density
estimation methods (several thousand observations are likely to be
needed if anything more than a crude estimate of the distribution is
wanted). But these methods rapidly become worse with increasing
dimensionality.
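
To give a feel for how quickly local methods run out of data as the
dimension grows, here is a small sketch under assumed conditions: points
drawn uniformly from the cube [-1, 1]^d, with a count of the fraction that
falls in the central cube [-0.5, 0.5]^d. The exact fraction is 0.5^d, so
in 15 dimensions only about 3 points in 100,000 land in a neighbourhood
that holds half the data in 1 dimension. The function name, sample size,
and seed are illustrative choices and not part of the discussion above.

```python
import numpy as np

def fraction_in_central_cube(dim, n=100_000, seed=0):
    """Fraction of uniform points in [-1, 1]^dim that lie in [-0.5, 0.5]^dim."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(n, dim))
    inside = np.all(np.abs(x) <= 0.5, axis=1)
    return inside.mean()

for dim in (1, 2, 5, 15):
    # Print the simulated fraction next to the exact value 0.5 ** dim.
    print(dim, fraction_in_central_cube(dim), 0.5 ** dim)
```

A density estimate at a point can only use the observations near that
point, so the sample size needed to keep a fixed number of neighbours grows
roughly like 2^d in this setting.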
--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
hru...@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558