
check for Gaussianity in high dimensional data sets


Yan Tian

Dec 23, 2009, 2:25:46 PM
Hi,

I have a sample data matrix X with d features (rows) and n samples
(columns). I want to check whether X comes from a Gaussian
distribution with unknown mean and covariance. The number of
features is several orders of magnitude larger than the number of
samples, so the sample covariance is rank deficient.

Does anybody know of tests to check for Gaussianity in
high-dimensional data sets? How do I know if my sample follows a
Gaussian distribution?

Thanks.

Rich Ulrich

Dec 23, 2009, 7:04:19 PM
On Wed, 23 Dec 2009 11:25:46 -0800 (PST), Yan Tian
<tiany...@gmail.com> wrote:

>
>I have a sample data matrix X with d features (rows) and n samples
>(columns). I want to check if X comes from a Gaussian

- not really important, but almost everyone describes
features in columns and separate cases in separate rows...

>
>distribution with unknown mean and covariance. The dimension of the
>features is several orders of magnitude larger than the columns (sample
>covariance is rank deficient).
>
>Does anybody know some tests to check for Gaussianity in high
>dimensional data sets? How do I know if my sample follows a Gaussian
>distribution?
>

Extract principal components and apply univariate normality tests
to the component scores.
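
A minimal sketch of that suggestion, in Python with NumPy and SciPy
(the choice of tools is an assumption, nothing in this thread
specifies them). It takes X as laid out in the original post, with d
features in rows and n samples in columns. The helper name
pc_normality_tests, the number of components kept, and the use of
Shapiro-Wilk as the univariate test are all illustrative choices.

```python
import numpy as np
from scipy import stats

def pc_normality_tests(X, n_components=5):
    """Shapiro-Wilk test on the leading principal component scores.

    X is d x n (features in rows, samples in columns), as in the
    original post. Hypothetical helper, for illustration only.
    """
    X = np.asarray(X, dtype=float)
    d, n = X.shape
    # Centre each feature across the samples.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Thin SVD: with d >> n this avoids forming the d x d covariance.
    # After centring there are at most n - 1 non-zero components.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, n - 1)
    # Row j holds the scores of the n samples on component j.
    scores = s[:k, None] * Vt[:k]
    results = []
    for j in range(k):
        w, p = stats.shapiro(scores[j])
        results.append((j, w, p))
    return results

# Simulated data standing in for the real matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 40))   # d = 2000 features, n = 40 samples
for j, w, p in pc_normality_tests(X):
    print(f"PC{j + 1}: W = {w:.3f}, p = {p:.3f}")
```

Because the components are estimated from the same data that are
then tested, and several tests are run at once, the p-values are
better read as rough screening values than as exact ones.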

The really obvious question here is, "What is your purpose?"

Almost nothing is really Gaussian (Normal) if your N can be
large enough for a powerful test. Fortunately, that hardly
ever matters.

Backing up a step, do you really need Normality, or do you
want to know that you have robustness for statistical testing?
- "outliers" are the most common destroyer of robustness, followed
by other sources of heterogeneity of variance.
- The requirement for "normality" in OLS testing applies to the
residuals. Normality in the variables themselves is more likely to
be a convenience along the way than a technical requirement.
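
The point about residuals can be illustrated with a small simulation,
sketched here in Python with NumPy and SciPy (an assumed toolset). All
numbers are made up for the illustration: the predictor is drawn from
an exponential distribution, so it is clearly non-Normal, while the
errors around the regression line are Normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200
# Deliberately non-Normal predictor ...
x = rng.exponential(scale=2.0, size=n)
# ... with Normal errors around a linear trend.
y = 1.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

# Ordinary least squares fit with an intercept column.
A = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
residuals = y - A @ beta

# Shapiro-Wilk on the predictor and on the residuals.
w_x, p_x = stats.shapiro(x)
w_r, p_r = stats.shapiro(residuals)
print(f"predictor: W = {w_x:.3f}, p = {p_x:.3g}")
print(f"residuals: W = {w_r:.3f}, p = {p_r:.3g}")
```

The predictor fails the test badly, yet it is the residuals whose
distribution the usual OLS tests depend on.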

Are you looking for outliers to drop, or for recommendations for
transformations, or is this critical to your theory in some other
fashion? Plotting important Components or variables against
each other is often informative if you are accustomed to looking
at such scatterplots.
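
One possible way to produce such plots, sketched in Python with NumPy
and matplotlib (assumed tools). The helper names pc_scores and
plot_pc_pairs are hypothetical, X is taken to be d x n as in the
original post, and the simulated data include one artificially
shifted sample so that there is something to see.

```python
import numpy as np

def pc_scores(X, k=3):
    """Scores of the n samples on the first k principal components.

    X is d x n (features in rows, samples in columns).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=1, keepdims=True)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(k, X.shape[1] - 1)
    return (s[:k, None] * Vt[:k]).T      # shape (n, k)

def plot_pc_pairs(scores):
    """Scatterplot of every pair of component scores (needs k >= 2)."""
    # Imported here so that pc_scores can be used without matplotlib.
    import matplotlib.pyplot as plt
    from itertools import combinations
    pairs = list(combinations(range(scores.shape[1]), 2))
    fig, axes = plt.subplots(1, len(pairs),
                             figsize=(4 * len(pairs), 4), squeeze=False)
    for ax, (i, j) in zip(axes[0], pairs):
        ax.scatter(scores[:, i], scores[:, j], s=12)
        ax.set_xlabel(f"PC{i + 1}")
        ax.set_ylabel(f"PC{j + 1}")
    return fig

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 60))   # d = 500 features, n = 60 samples
X[:, 0] += 8.0                       # one artificially shifted sample
scores = pc_scores(X, k=3)
```

Calling plot_pc_pairs(scores) and then showing or saving the returned
figure gives one panel per pair of components. In this simulation the
shifted sample sits far from the rest along PC1.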

--
Rich Ulrich
