
Feb 7, 2003, 7:14:51 PM

Hi,

I am looking for a reference to a book or article where somebody

checked that

several common variables, such as human male adult heights, IQ scores,

measurement errors etc., are actually (approximately) normally

distributed.

Any help is appreciated.

Axel

Feb 9, 2003, 1:58:18 PM

Axel Boldt wrote:

> Hi,

>

> I am looking for a reference to a book or article where somebody

> checked that

> several common variables, such as human male adult heights, IQ scores,

> measurement errors etc., are actually (approximately) normally

> distributed.

If you cannot find such a reference, I suspect you did not look very hard.

Hoping that I am not doing your work for you: My standard reference is

Johnson and Wichern, Applied multivariate statistical analysis, chapter 4.

Feb 9, 2003, 3:01:47 PM

In article <40200384.03020...@posting.google.com>,

>Any help is appreciated.

Except for IQ scores, I do not believe that they are

normally distributed, or even that the normal distribution

is a good approximation.

The reason for the IQ scores being an exception is that

the raw numbers are often massaged to get a normal

distribution. This is why many IQ tests do not give

the very high scores which are really earned; the sample

size is not large enough for it.

There may well be other places where the data is made

more normal artificially; this practice should never

have been started.

--

This address is for information only. I do not claim that these views

are those of the Statistics Department or of Purdue University.

Herman Rubin, Department of Statistics, Purdue University

hru...@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558

Feb 10, 2003, 5:59:25 PM

Gus Gassmann <hgas...@mgmt.dal.ca> wrote

That chapter goes over the well-known procedures to test a given

variable for normality. The only sentence relevant to the discussion

at hand is in 4.1: "The importance of the normal distribution rests on

its dual role as both population model for certain natural phenomena

and approximate sampling distribution for many statistics." As in

virtually all other statistics textbooks, no reference is provided

for the first claim.

Axel

Feb 11, 2003, 4:33:16 PM

On 7 Feb 2003 16:14:51 -0800, axel...@yahoo.com (Axel Boldt) wrote:

> Hi,

>

> I am looking for a reference to a book or article where somebody

> checked that

> several common variables, such as human male adult heights, IQ scores,

> measurement errors etc., are actually (approximately) normally

> distributed.

Try Stigler, The history of statistics; or his other book.

One of his books talks about the 19th century devotion to

'normal', in the course of discussing why it got that name.

I think people figured out the error, about the time 'curiosity'

started becoming admirable (1860s and later).

"What's made up of a bunch of tiny deviations will tend

toward the normal"; there's a theory about that. That's all

I get after I stare at your question for a while.

--

Rich Ulrich, wpi...@pitt.edu

Feb 17, 2003, 1:06:19 PM

Rich, et al.,

I have a similar problem. I am measuring a physical quantity. I

want to test the hypothesis that the physical quantity is normally

distributed. A normal distribution should be a good approximation.

Unfortunately, I do not have access to the real, physical quantity. What

I have are integer numbers measured by an ADC (Analog to Digital

Converter).

My first thoughts are to perform a log likelihood fit for the gaussian

parameters or just compute the sample mean and variance. Once I have my

parameters, compute the Chi^2/dof. I am not sure exactly how I would do

this ... thinking ...
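A minimal sketch of that procedure (Python with NumPy/SciPy assumed; synthetic ADC data stands in for a real channel, and the specific numbers are illustrative): estimate the gaussian parameters from the sample, bin the data, and compare observed to expected counts.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-in for one channel: a normal signal quantized by the ADC.
samples = np.round(rng.normal(loc=512.0, scale=20.0, size=5000)).astype(int)

# The maximum-likelihood fit of a normal distribution is just the sample
# mean and standard deviation, so no iterative fit is needed.
mu, sigma = samples.mean(), samples.std(ddof=1)

# Bin the data and compare observed counts to the expected normal counts.
edges = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 25)
observed, _ = np.histogram(samples, bins=edges)
expected = len(samples) * np.diff(stats.norm.cdf(edges, loc=mu, scale=sigma))

# Drop sparse tail bins so every expected count is at least about 5.
keep = expected >= 5
chi2 = ((observed[keep] - expected[keep]) ** 2 / expected[keep]).sum()
dof = keep.sum() - 1 - 2   # kept bins, minus 1, minus the 2 fitted parameters
print(chi2 / dof)
```

Here the quantization is harmless because the width (20 counts) is much larger than 1 LSB; narrow distributions need more care, as discussed below.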

The real problem gets a bit more complicated. I get some contamination

of the data with noise triggers. That is, values near the baseline that

do not belong to the "real" distribution. If I am measuring a low signal

level it may be difficult to simply cut out the noise data with an x>cut

test.

I also get saturation. That is when the physical quantity is larger than

can be measured by the ADC. Sometimes this results in maximum readings

(1024, 2048, or something like that). In some experiments it can result

in worse cases like negative values or values near the baseline.

I have many thousands of these data channels to analyze, so I need a
fully automated procedure. That is, I can't afford to examine each
distribution by hand.

Any thoughts?

Thanks,

Stan

Feb 17, 2003, 3:55:01 PM

a) Why do you care if they are not normal?

If there is a *property* that matters, try to check for that.

b) When there has been a "problem", what is the form

that the distribution has taken, instead of normal?

Does the past give useful guidance?

c) Given that Extremes are what most often matters,

you could be worried by either skewness or kurtosis,

or both; and given that you have limits to the instruments,

your circumstance suggests to me a simple count

of the top OVERFLOW value or a couple more; and

the bottom value or two.

You can be warned of too-many extremes if the sum of the

two counts is more than some cutoff; you can be warned

of skewness if the difference of the counts (raw? binomial

test?) is over a cutoff or fails a test.
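A minimal sketch of this rail-count check (Python assumed; the 10-bit ceiling, the 1% cutoff, and the 0.05 level are illustrative assumptions, not recommendations):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
adc_max = 1023   # hypothetical 10-bit ADC ceiling
samples = np.clip(np.round(rng.normal(900.0, 60.0, 5000)), 0, adc_max).astype(int)

# Count readings pinned at (or one step from) each rail.
top = int((samples >= adc_max - 1).sum())
bottom = int((samples <= 1).sum())

# Too many extremes in total -> suspect channel (1% cutoff is an assumption).
too_many_extremes = (top + bottom) > 0.01 * len(samples)

# Asymmetry between the rails -> suspect skewness.  Under symmetry each
# extreme reading is equally likely to sit at either rail (p = 0.5).
if top + bottom > 0:
    p = stats.binomtest(top, n=top + bottom, p=0.5).pvalue
else:
    p = 1.0
skewed = p < 0.05
print(top, bottom, too_many_extremes, skewed)
```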

Hope this helps, or is suggestive.

Feb 17, 2003, 6:10:15 PM

It matters that they are "approximately" normal because I am computing

another statistic that depends on the validity of the normality. The

actual statistic I am computing is

X = constant * sample_mean^2 / sample_variance

This statistic represents a physical quantity. The number of

photoelectrons produced at the photo cathode of a photo multiplier tube.
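For what it's worth, the reason mean^2/variance counts photoelectrons under pure Poisson statistics (ignoring the gain fluctuations that make the real estimator more complicated) is that a Poisson count has mean and variance both equal to N_pe, so the ratio recovers N_pe with constant 1. A toy check, with hypothetical numbers:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy channel: 50 photoelectrons per event on average, unit gain,
# pure Poisson statistics assumed (hypothetical numbers).
n_pe_true = 50.0
counts = rng.poisson(n_pe_true, size=10_000)

# For a Poisson variable the mean and the variance are both N_pe,
# so mean^2 / variance estimates N_pe with constant = 1.
x = counts.mean() ** 2 / counts.var(ddof=1)
print(x)   # close to 50
```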

> b) When there has been a "problem", what is the form

> that the distribution has taken, instead of normal?

> Does the past give useful guidance?

>

Possibly. There are various problems that can arise. I described two but

there are numerous hardware problems that can cause the distribution to

become distorted and cause the value of X above to become an invalid

estimator for the physical quantity. The actual, unbiased estimator I am

using is a bit more complicated than I described above but for large

sample sizes the above estimator is fine. That is, as long as the

distribution is not distorted in some way. It is also true under these

circumstances that the distribution should be close to a normal

distribution. I want to test this in order to verify that these

conditions are valid. If the distribution is not normal then there is a

problem with the channel that would invalidate the estimate for X.

> c) Given that Extremes are what most often matters,

> you could be worried by either skewness or kurtosis,

> or both; and given that you have limits to the instruments,

> your circumstance suggests to me a simple count

> of the top OVERFLOW value or a couple more; and

> the bottom value or two.

>

I am doing some tests along these lines. I am planning on simulating the

whole process along with various modes of hardware problems to study the

effectiveness of my tests.

> You can be warned of too-many extremes if the sum of the

> two counts is more than some cutoff; you can be warned

> of skewness if the difference of the counts (raw? binomial

> test?) is over a cutoff or fails a test.

>

> Hope this helps, or is suggestive.

>

Your suggestions are always helpful. But please, feel free to keep

thinking about this.

One stupid idea, that did not really work, was to bin the data into a

histogram and fit a gaussian to it using an off-the-shelf tool. The

problem is that when you bin discrete data like this into a histogram

you tend to get binning effects that give artificially high Chi^2/dofs

for the fits. This tends to catch bad gaussians but it will also throw

out many good gaussians.
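One common source of those binning effects is evaluating the fitted density at each bin centre instead of integrating it across the integer bin. A sketch isolating the artifact (Python assumed; the true parameters are used instead of fitted ones so that only the binning differs):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu, sigma = 100.0, 1.5   # true parameters, known here for illustration
n = 100_000
# Integer ADC data whose width is only a few LSB, so quantization matters.
samples = np.round(rng.normal(mu, sigma, n)).astype(int)
values, observed = np.unique(samples, return_counts=True)

# (a) Naive: density at the bin centre times the bin width (1 LSB).
expected_naive = n * stats.norm.pdf(values, mu, sigma)

# (b) Integrate the model over each integer bin [k - 0.5, k + 0.5].
expected_int = n * (stats.norm.cdf(values + 0.5, mu, sigma)
                    - stats.norm.cdf(values - 0.5, mu, sigma))

def chi2_per_dof(expected):
    keep = expected >= 5   # drop sparse tail bins
    chi2 = ((observed[keep] - expected[keep]) ** 2 / expected[keep]).sum()
    return chi2 / (keep.sum() - 1)

# (a) is inflated even though the data really are rounded-normal; (b) is not.
print(chi2_per_dof(expected_naive), chi2_per_dof(expected_int))
```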

Small sample sizes are not a real problem. I need large samples in any

case to get a good estimate. The standard deviation of my estimator X

divided by its mean is sqrt(2/(N-5)) for large N. It's a bit uglier for

small N. This assumes that the sample distribution is gaussian. I have

been trying, unsuccessfully, to compute the distribution for X without

the gaussian approximation (i.e. for low numbers of photo electrons and

for general amplification statistics).

You can see from this that even for a sample size of 1000 the one sigma

"uncertainty" for the photo electron estimate is still 4.5%. It takes a

sample size of 10,000 to bring this down to 1.4%.
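Those figures follow directly from the sqrt(2/(N-5)) formula:

```python
import math

def rel_sigma(n):
    # One-sigma relative uncertainty of X: sigma_X / mean_X = sqrt(2/(N-5))
    return math.sqrt(2.0 / (n - 5))

print(f"{rel_sigma(1000):.3f}")    # 0.045, i.e. 4.5%
print(f"{rel_sigma(10_000):.3f}")  # 0.014, i.e. 1.4%
```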

This is for real valued samples. Since my samples are integer the

situation is not quite as good.

Thanks,

Stan

Feb 25, 2003, 8:48:07 PM

> I have a similar problem. I am measuring a physical quantity. I

>want to test the hypothesis that the physical quantity is normally

>distributed. A normal distribution should be a good approximation.

>

>Unfortunately, I do not have access to the real, physical quantity. What

>I have are integer numbers measured by an ADC (Analog to Digital

>Converter).

>

>My first thoughts are to perform a log likelihood fit for the gaussian

>parameters or just compute the sample mean and variance. Once I have my

>parameters, compute the Chi^2/dof. I am not sure exactly how I would do

>this ... thinking ...


The Jarque-Bera test, based on the sample skewness and kurtosis, is often used

to test normality.
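A sketch of the test (Python assumed): the statistic combines the sample skewness S and excess of the kurtosis K over 3 as JB = n/6 * (S^2 + (K-3)^2/4), and is compared against a chi-squared distribution with 2 degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, 5000)   # sample under test

n = len(x)
z = x - x.mean()
m2 = (z ** 2).mean()
skew = (z ** 3).mean() / m2 ** 1.5
kurt = (z ** 4).mean() / m2 ** 2          # a normal distribution gives 3
jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
pvalue = stats.chi2.sf(jb, df=2)          # JB is asymptotically chi^2(2)
print(jb, pvalue)
```

SciPy ships this as scipy.stats.jarque_bera, which computes the same statistic.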
