Verification that variables are normal?

Axel Boldt

Feb 7, 2003, 7:14:51 PM
Hi,

I am looking for a reference to a book or article where somebody
checked that
several common variables, such as human male adult heights, IQ scores,
measurement errors etc., are actually (approximately) normally
distributed.

Any help is appreciated.

Axel

Gus Gassmann

Feb 9, 2003, 1:58:18 PM

Axel Boldt wrote:

> Hi,
>
> I am looking for a reference to a book or article where somebody
> checked that
> several common variables, such as human male adult heights, IQ scores,
> measurement errors etc., are actually (approximately) normally
> distributed.

If you cannot find such a reference, I suspect you did not look very hard.

Hoping that I am not doing your work for you: My standard reference is
Johnson and Wichern, Applied multivariate statistical analysis, chapter 4.

Herman Rubin

Feb 9, 2003, 3:01:47 PM
In article <40200384.03020...@posting.google.com>,

>Any help is appreciated.

Except for IQ scores, I do not believe that they are
normally distributed, or even that the normal distribution
is a good approximation.

The reason for the IQ scores being an exception is that
the raw numbers are often massaged to get a normal
distribution. This is why many IQ tests do not give
the very high scores which are really earned; the sample
size is not large enough for it.

There may well be other places where the data is made
more normal artificially; this practice should never
have been started.
--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hru...@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558

Axel Boldt

Feb 10, 2003, 5:59:25 PM
Gus Gassmann <hgas...@mgmt.dal.ca> wrote

That chapter goes over the well-known procedures to test a given
variable for normality. The only sentence relevant to the discussion
at hand is in 4.1: "The importance of the normal distribution rests on
its dual role as both population model for certain natural phenomena
and approximate sampling distribution for many statistics." Like in
virtually all other statistics text books, no reference is provided
for the first claim.

Axel

Rich Ulrich

Feb 11, 2003, 4:33:16 PM
On 7 Feb 2003 16:14:51 -0800, axel...@yahoo.com (Axel Boldt) wrote:

> Hi,
>
> I am looking for a reference to a book or article where somebody
> checked that
> several common variables, such as human male adult heights, IQ scores,
> measurement errors etc., are actually (approximately) normally
> distributed.

Try Stigler, The history of statistics; or his other book.

One of his books talks about the 19th century devotion to
'normal', in the course of discussing why it got that name.
I think people figured out the error around the time 'curiosity'
started becoming admirable (1860s and later).


"What's made up of a bunch of tiny deviations will tend
toward the normal"; there's a theory about that. That's all
I get after I stare at your question for a while.
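The theory Rich is reaching for is the central limit theorem. A minimal simulation (my own illustration, using NumPy; nothing here comes from the thread) shows sums of many small, non-normal deviations drifting toward normality:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each observation is the sum of 50 small, decidedly non-normal
# "deviations" (uniform on [-0.5, 0.5]).  By the central limit
# theorem the sums are approximately normal with mean 0 and
# variance 50 * (1/12).
sums = rng.uniform(-0.5, 0.5, size=(100_000, 50)).sum(axis=1)

print(sums.mean())   # near 0
print(sums.var())    # near 50/12 ~ 4.17
# A normal distribution has skewness 0; the sample skewness
# of the sums should be close to that.
skew = ((sums - sums.mean()) ** 3).mean() / sums.std() ** 3
print(skew)          # near 0
```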

--
Rich Ulrich, wpi...@pitt.edu

http://www.pitt.edu/~wpilib/index.html

Stan Thomas

Feb 17, 2003, 1:06:19 PM
Rich, et al.,

I have a similar problem. I am measuring a physical quantity. I
want to test the hypothesis that the physical quantity is normally
distributed. A normal distribution should be a good approximation.

Unfortunately, I do not have access to the real, physical quantity. What
I have are integer numbers measured by an ADC (Analog to Digital
Converter).

My first thoughts are to perform a log-likelihood fit for the gaussian
parameters, or just compute the sample mean and variance. Once I have my
parameters, I would compute the Chi^2/dof. I am not sure exactly how I
would do this ... thinking ...
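One way to carry that plan out for integer ADC counts is a sketch along these lines (not from the thread; the simulated channel and the sparse-cell handling are my own simplifications): take the sample moments as the Gaussian MLEs, then compare the count observed at each integer value k against the probability mass the fitted normal assigns to [k - 0.5, k + 0.5).

```python
import numpy as np
from scipy import stats

def normal_chi2_per_dof(x):
    """Chi-square per degree of freedom for integer data vs. a fitted normal."""
    x = np.asarray(x)
    mu, sigma = x.mean(), x.std()          # Gaussian MLEs
    lo, hi = int(x.min()), int(x.max())
    ks = np.arange(lo, hi + 1)
    observed = np.bincount(x - lo, minlength=ks.size)
    # Probability mass the fitted normal assigns to [k - 0.5, k + 0.5).
    probs = stats.norm.cdf(ks + 0.5, mu, sigma) - stats.norm.cdf(ks - 0.5, mu, sigma)
    expected = probs * x.size
    # Proper practice pools sparse cells into their neighbours; this
    # sketch simply drops cells with expected count below 1.
    keep = expected >= 1
    chi2 = ((observed[keep] - expected[keep]) ** 2 / expected[keep]).sum()
    dof = int(keep.sum()) - 3              # cells - 1 - 2 fitted parameters
    return chi2 / dof

rng = np.random.default_rng(1)
x = np.rint(rng.normal(512, 20, size=10_000)).astype(int)
print(normal_chi2_per_dof(x))              # near 1 for a well-behaved channel
```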

The real problem gets a bit more complicated. I get some contamination
of the data with noise triggers. That is, values near the baseline that
do not belong to the "real" distribution. If I am measuring a low signal
level it may be difficult to simply cut out the noise data with an x>cut
test.

I also get saturation. That is when the physical quantity is larger than
can be measured by the ADC. Sometimes this results in maximum readings
(1024, 2048, or something like that). In some experiments it can result
in worse cases like negative values or values near the baseline.

I have many thousands of these data channels to analyze, so I need a
fully automated procedure. That is, I can't afford to examine each
distribution by hand.

Any thoughts?

Thanks,

Stan

Rich Ulrich

Feb 17, 2003, 3:55:01 PM

a) Why do you care if they are not normal?
If there is a *property* that matters, try to check for that.

b) When there has been a "problem", what is the form
that the distribution has taken, instead of normal?
Does the past give useful guidance?

c) Given that Extremes are what most often matters,
you could be worried by either skewness or kurtosis,
or both; and given that you have limits to the instruments,
your circumstance suggests to me a simple count
of the top OVERFLOW value or a couple more; and
the bottom value or two.

You can be warned of too-many extremes if the sum of the
two counts is more than some cutoff; you can be warned
of skewness if the difference of the counts (raw? binomial
test?) is over a cutoff or fails a test.
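The two warnings can be sketched as follows (the ADC rail codes and cutoffs are placeholders I made up, and scipy.stats.binomtest stands in for the binomial test mentioned above):

```python
import numpy as np
from scipy.stats import binomtest

def extreme_value_warnings(x, low=0, high=1023, max_extremes=20, alpha=0.01):
    """Warn on too many pinned readings, and on asymmetric extremes.

    x            : integer ADC samples for one channel
    low, high    : the ADC's underflow/overflow codes (placeholders)
    max_extremes : cutoff on the total count of pinned readings
    alpha        : significance level for the skewness (binomial) test
    """
    x = np.asarray(x)
    n_low = int((x <= low).sum())
    n_high = int((x >= high).sum())
    total = n_low + n_high
    too_many = total > max_extremes
    # Under a symmetric distribution each extreme is equally likely
    # to sit at either rail; a lopsided split suggests skewness.
    skewed = False
    if total > 0:
        skewed = binomtest(n_high, total, 0.5).pvalue < alpha
    return too_many, skewed
```

Treating each pinned reading as a coin flip between the two rails lets the binomial test flag skewness even when the total extreme count is modest.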

Hope this helps, or is suggestive.

Stan Thomas

Feb 17, 2003, 6:10:15 PM

It matters that they are "approximately" normal because I am computing
another statistic that depends on the validity of the normality
assumption. The actual statistic I am computing is

X = constant * sample_mean^2 / sample_variance

This statistic represents a physical quantity: the number of
photoelectrons produced at the photocathode of a photomultiplier tube.
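A minimal sketch of this estimator (the constant is detector-specific and left as a placeholder; the simulated channel is my own illustration):

```python
import numpy as np

def photoelectron_estimate(x, constant=1.0):
    """Large-sample estimator X = constant * sample_mean^2 / sample_variance.

    `constant` is detector-specific (a placeholder here); `x` holds the
    ADC samples for one channel.
    """
    x = np.asarray(x, dtype=float)
    return constant * x.mean() ** 2 / x.var(ddof=1)

# Simulated channel: for a gaussian with mean m and std s the
# statistic converges to m^2 / s^2, here (100/10)^2 = 100.
rng = np.random.default_rng(2)
x = rng.normal(100.0, 10.0, size=100_000)
print(photoelectron_estimate(x))
```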

> b) When there has been a "problem", what is the form
> that the distribution has taken, instead of normal?
> Does the past give useful guidance?
>

Possibly. There are various problems that can arise. I described two but
there are numerous hardware problems that can cause the distribution to
become distorted and cause the value of X above to become an invalid
estimator for the physical quantity. The actual, unbiased estimator I am
using is a bit more complicated than I described above but for large
sample sizes the above estimator is fine. That is, as long as the
distribution is not distorted in some way. It is also true under these
circumstances that the distribution should be close to a normal
distribution. I want to test this in order to verify that these
conditions are valid. If the distribution is not normal then there is a
problem with the channel that would invalidate the estimate for X.

> c) Given that Extremes are what most often matters,
> you could be worried by either skewness or kurtosis,
> or both; and given that you have limits to the instruments,
> your circumstance suggests to me a simple count
> of the top OVERFLOW value or a couple more; and
> the bottom value or two.
>

I am doing some tests along these lines. I am planning on simulating the
whole process along with various modes of hardware problems to study the
effectiveness of my tests.

> You can be warned of too-many extremes if the sum of the
> two counts is more than some cutoff; you can be warned
> of skewness if the difference of the counts (raw? binomial
> test?) is over a cutoff or fails a test.
>
> Hope this helps, or is suggestive.
>

Your suggestions are always helpful. But please, feel free to keep
thinking about this.

One stupid idea, which did not really work, was to bin the data into a
histogram and fit a gaussian to it using an off-the-shelf tool. The
problem is that when you bin discrete data like this into a histogram
you tend to get binning effects that give artificially high Chi^2/dofs
for the fits. This tends to catch bad gaussians but it will also throw
out many good gaussians.
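One likely source of those binning effects is bin edges that split integer ADC codes unevenly between adjacent bins. Aligning the edges to half-integers avoids that (a sketch, assuming NumPy; not something proposed in the thread):

```python
import numpy as np

def integer_histogram(x, width=1):
    """Histogram integer data with bin edges on half-integers.

    Edges at k - 0.5 guarantee every bin covers a whole number of
    integer values, so no ADC code is split between two bins -- the
    binning artifact that inflates Chi^2/dof in a naive histogram fit.
    """
    x = np.asarray(x)
    edges = np.arange(x.min() - 0.5, x.max() + 0.5 + width, width)
    counts, edges = np.histogram(x, bins=edges)
    return counts, edges
```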

Small sample sizes are not a real problem. I need large samples in any
case to get a good estimate. The standard deviation of my estimator X
divided by its mean is sqrt(2/(N-5)) for large N. It's a bit uglier for
small N. This assumes that the sample distribution is gaussian. I have
been trying, unsuccessfully, to compute the distribution for X without
the gaussian approximation (i.e. for low numbers of photo electrons and
for general amplification statistics).

You can see from this that even for a sample size of 1000 the one sigma
"uncertainty" for the photo electron estimate is still 4.5%. It takes a
sample size of 10,000 to bring this down to 1.4%.
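Those percentages are just sqrt(2/(N-5)) evaluated at the two sample sizes; a quick check:

```python
from math import sqrt

for n in (1_000, 10_000):
    rel = sqrt(2 / (n - 5))
    print(f"N = {n}: one-sigma relative uncertainty = {100 * rel:.1f}%")
# prints 4.5% for N = 1000 and 1.4% for N = 10000
```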

This is for real valued samples. Since my samples are integer the
situation is not quite as good.

Thanks,

Stan

Beliavsky

Feb 25, 2003, 8:48:07 PM
> I have a similar problem. I am measuring a physical quantity. I
>want to test the hypothesis that the physical quantity is normally
>distributed. A normal distribution should be a good approximation.
>
>Unfortunately, I do not have access to the real, physical quantity. What
>I have are integer numbers measured by an ADC (Analog to Digital
>Converter).
>
>My first thoughts are to perform a log likelihood fit for the gaussian
>parameters or just compute the sample mean and variance. Once I have my
>parameters, compute the Chi^2/dof. I am not sure exactly how I would do
>this ... thinking ...

The Jarque-Bera test, based on the sample skewness and kurtosis, is often used
to test normality.
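For reference, JB = n/6 * (S^2 + (K-3)^2 / 4), where S is the sample skewness and K the sample kurtosis; under normality it is asymptotically chi-square with 2 degrees of freedom. SciPy ships it as scipy.stats.jarque_bera (the simulated data below are my own illustration):

```python
import numpy as np
from scipy.stats import jarque_bera

rng = np.random.default_rng(3)
normal_data = rng.normal(0.0, 1.0, size=5_000)
skewed_data = rng.exponential(1.0, size=5_000)

stat_n, p_n = jarque_bera(normal_data)
stat_s, p_s = jarque_bera(skewed_data)
print(stat_n, p_n)   # statistic ~ chi-square(2) scale for normal data
print(stat_s, p_s)   # enormous statistic, tiny p-value for exponential data
```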
