Normal probability plots

Stan Brown

Nov 10, 2012, 6:27:38 PM

My students have TI-83 or TI-84 calculators, which can make normal
probability plots. The idea is to test whether the data (probably)
came from a normal distribution; the closer the plot is to a straight
line, the more likely that they did.

But it can be hard with a small sample to see whether the line is
straight. Minitab (which we don't have) plots boundary curves, and
if the points are all inside those bounds then we say that the data
were (probably) normal.

1. I've done quite a lot of Googling, but have been unable to
discover how Minitab computes those bounds. Can anyone state clearly
how they are computed?

2. Is there any theoretical justification for the bounds that Minitab
computes?

3. Some authors instead suggest computing the correlation coefficient
of the plot, and comparing it to a critical value. If the correlation
coefficient is below the critical value, we reject the hypothesis of
normality. The trouble is that different authors give different
critical values. Two examples are at
http://www.itl.nist.gov/div898/handbook/eda/section3/eda3676.htm
and
http://www.minitab.com/uploadedFiles/Shared_Resources/Documents/Articles/normal_probability_plots.pdf (on page 6).
I *think* that they are giving critical values for the same
computation, but are they? Ryan and Joiner (1976), the second
reference, say that their critical values come from Monte Carlo
simulations; the NIST (first reference) refers to simulations by
Filliben and Devaney. How is one to choose which to use (if either)?

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com
Shikata ga nai...

Bruce Weaver

Nov 11, 2012, 1:43:46 PM

Hi Stan. I don't have an answer to your question. I'm just wondering
*why* you want to test for normality. As George Box said,

“…the statistician knows…that in nature there never was a normal
distribution, there never was a straight line, yet with normal and
linear assumptions, known to be false, he can often derive results which
match, to a useful approximation, those found in the real world.” (JASA,
1976, Vol. 71, 791-799)

So if you're working with real data (as opposed to simulated data), the
population is *not* normally distributed.

HTH.

--
Bruce Weaver
bwe...@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."

Rich Ulrich

Nov 11, 2012, 5:08:20 PM

On Sat, 10 Nov 2012 18:27:38 -0500, Stan Brown
<the_sta...@fastmail.fm> wrote:

>My students have TI-83 or TI-84 calculators, which can make normal
>probability plots. The idea is to test whether the data (probably)
>came from a normal distribution; the closer the plot is to a straight
>line, the more likely that they did.
>
>But it can be hard with a small sample to see whether the line is
>straight. Minitab (which we don't have) plots boundary curves, and
>if the points are all inside those bounds then we say that the data
>were (probably) normal.
>
>1. I've done quite a lot of Googling, but have been unable to
>discover how Minitab computes those bounds. Can anyone state clearly
>how they are computed?
>
>2. Is there any theoretical justification for the bounds that Minitab
>computes?
>
>3. Some authors instead suggest computing the correlation coefficient
>of the plot, and comparing it to a critical value.

That sounds to me like the essence of the Shapiro-Wilk statistic
for normality. That's a test that has a very good reputation
for its overall generality and power.

> If the correlation
>coefficient is below the critical value, we reject the hypothesis of
>normality. The trouble is that different authors give different
>critical values. Two examples are at
>http://www.itl.nist.gov/div898/handbook/eda/section3/eda3676.htm
>and
>http://www.minitab.com/uploadedFiles/Shared_Resources/Documents/Articles/normal_probability_plots.pdf (on page 6).
>I *think* that they are giving critical values for the same
>computation, but are they? Ryan and Joiner (1976), the second
>reference, say that their critical values come from Monte Carlo
>simulations; the NIST (first reference) refers to simulations by
>Filliben and Devaney. How is one to choose which to use (if either)?

I might trust the name-fame of NIST over Minitab, speaking
as a person who knows very little about either.
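
For concreteness, here is a rough Python sketch of the correlation
being discussed. It assumes (my reading of the two references, not
something confirmed in this thread) that Ryan and Joiner use Blom's
plotting positions while the NIST page uses Filliben's medians of
uniform order statistics. The function names and the sample data are
made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean


def blom_positions(n):
    # Blom's approximation: p_i = (i - 3/8) / (n + 1/4)
    return [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]


def filliben_positions(n):
    # Filliben's approximate medians of uniform order statistics
    p = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    p[-1] = 0.5 ** (1.0 / n)   # largest order statistic
    p[0] = 1.0 - p[-1]         # smallest, by symmetry
    return p


def pearson_r(x, y):
    # Ordinary Pearson correlation, written out by hand
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def plot_correlation(data, positions):
    # Correlate the sorted data with the normal scores of the positions
    xs = sorted(data)
    z = [NormalDist().inv_cdf(p) for p in positions(len(xs))]
    return pearson_r(xs, z)


# Hypothetical sample, for illustration only
sample = [4.1, 5.3, 5.9, 6.2, 6.8, 7.0, 7.4, 8.1, 8.8, 10.2]
r_blom = plot_correlation(sample, blom_positions)
r_filliben = plot_correlation(sample, filliben_positions)
print(r_blom, r_filliben)
```

If that reading is right, the two statistics are nearly but not exactly
the same computation, so slightly different critical values would not
be surprising.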

The problem with any simulation is the same factor that creates
the gain: The usefulness depends on whether the given
set of alternatives (simulated) is a match for your data.
However, I do not find a reference for evaluating the S-W
test except for the original S-W 1965 paper (which, I now
guess, used simulations). Simulations in 1965 used smaller Ns.
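
As a sketch of what "critical values from Monte Carlo simulation"
means here: draw many normal samples of size n, compute the plot
correlation for each, and take a low percentile. The code below assumes
Blom's plotting positions; the function names, the seed, and the number
of replications are arbitrary choices of mine, not anything taken from
Ryan and Joiner or Filliben.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def normal_scores(n):
    # Normal scores from Blom's plotting positions (an assumption)
    return [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
            for i in range(1, n + 1)]


def plot_r(data, z):
    # Pearson correlation between the sorted data and the normal scores
    xs = sorted(data)
    mx, mz = mean(xs), mean(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(xs, z))
    sxx = sum((a - mx) ** 2 for a in xs)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / sqrt(sxx * szz)


def simulated_critical_value(n, alpha=0.05, reps=5000, seed=1):
    # Simulate r under normality and return roughly its alpha quantile
    rng = random.Random(seed)
    z = normal_scores(n)
    rs = sorted(plot_r([rng.gauss(0.0, 1.0) for _ in range(n)], z)
                for _ in range(reps))
    return rs[int(alpha * reps)]


crit_10 = simulated_critical_value(10)
print(round(crit_10, 3))
```

The result wobbles with the seed and the number of replications, which
may be part of why published tables do not agree to the last digit.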

The Wikipedia page on tests of normality includes tests and
authorities that I'm not familiar with, but S-W is still rated high.

--
Rich Ulrich

Stan Brown

Nov 14, 2012, 8:21:20 PM

Thanks to the two of you who responded.

Agreed that no real data are ever exactly normally distributed, or at
least that an exact normal distribution would warrant a very hard
look. I think the real criterion is "close enough to normal that we
can use a normal approximation without throwing the p-value off by
very much."

Regarding Shapiro-Wilk, when I was googling before posting it looked
different. Instead of plain order statistics like a normal probability
plot, it uses coefficients generated from "means, variances, and
covariances" of the order statistics, according to NIST. But given
Rich's remark about power, I guess I should get hold of S-W's paper
and look into their method in more detail.

Bruce Weaver

Nov 14, 2012, 9:17:10 PM

On 14/11/2012 8:21 PM, Stan Brown wrote:
>
> Thanks to the two of you who responded.
>
> Agreed that no real data are ever exactly normally distributed, or at
> least that an exact normal distribution would warrant a very hard
> look. I think the real criterion is "close enough to normal that we
> can use a normal approximation without throwing the p-value off by
> very much."

--- snip ---

And of course, in linear models (including t-tests, ANOVA, etc), it is
the *errors* that are assumed to be normal, not the dependent variable.
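
A small simulated illustration of that point, using made-up group means
and a hand-rolled plot correlation (Blom's positions assumed): two
groups with normal errors but well-separated means give a bimodal
dependent variable, while the residuals about the group means still
look normal.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def plot_r(data):
    # Normal probability plot correlation, Blom's positions assumed
    xs = sorted(data)
    n = len(xs)
    z = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
         for i in range(1, n + 1)]
    mx, mz = mean(xs), mean(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(xs, z))
    sxx = sum((a - mx) ** 2 for a in xs)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / sqrt(sxx * szz)


rng = random.Random(7)  # arbitrary seed
group_a = [rng.gauss(0.0, 1.0) for _ in range(30)]  # hypothetical group means
group_b = [rng.gauss(6.0, 1.0) for _ in range(30)]

# The raw dependent variable, ignoring group membership
pooled = group_a + group_b

# Residuals: each observation minus its own group mean
mean_a, mean_b = mean(group_a), mean(group_b)
residuals = [x - mean_a for x in group_a] + [x - mean_b for x in group_b]

r_pooled = plot_r(pooled)
r_resid = plot_r(residuals)
print(r_pooled, r_resid)
```

So for a t-test or ANOVA, any normality check belongs on the residuals,
not on the pooled outcome.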