Finding a P value from a t distribution

gregc...@bigpond.com

Jan 14, 2006, 3:27:26 PM

Hi

I am trying to find the P value from the following:

x bar is 224
mu is 224.05
standard error is 0.1545
the t value I get is -3.1108

I am having trouble finding the corresponding P value. Could anyone
help?

Greg

Luis Amaral Afonso

Jan 14, 2006, 6:13:05 PM

Greg said.
***Hi
I am trying to find the P value from the following:

I am having trouble finding the corresponding P value. Could anyone help? Greg ***

My response:

I guess you are referring to the Student t distribution. If so, one parameter must be known: the degrees of freedom. Because you didn't say what it is, I cannot (no one can) answer your request.
Furthermore, the standard deviation is useless to the purpose.

______licas (Luis A. Afonso)

Richard Ulrich

Jan 14, 2006, 7:21:27 PM

You are having trouble computing the t-value, unless
you have mis-labeled the SD as SE.

Difference divided by standard error:
-0.05/ 0.1545 is about -0.32.

The normal table (for z) will be pretty accurate for a
t-value in the middle of the distribution, like 0.32.
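
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the two steps above: difference divided by standard error, then a standard normal left-tail lookup. The function names are made up for illustration, and the normal CDF is built from the standard library's math.erfc rather than read from a table.

```python
import math

def t_statistic(xbar, mu0, se):
    """One-sample t statistic: (sample mean - hypothesized mean) / standard error."""
    return (xbar - mu0) / se

def normal_cdf(z):
    """Standard normal left-tail probability P(Z <= z), via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Values as first posted: x bar 224, mu 224.05, standard error 0.1545.
t_as_posted = t_statistic(224.0, 224.05, 0.1545)
print(round(t_as_posted, 4))            # about -0.32
print(round(normal_cdf(-0.32), 3))      # normal approximation to the left tail
```

The erfc form is used instead of 1 - erf so that the same helper stays accurate far out in the tails.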

--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html

Luis Amaral Afonso

Jan 14, 2006, 7:57:32 PM

Concerning the later post:

Values of P(t <= -0.32):

  d.f.     P
    1      0.401
   10      0.378
   20      0.376
   50      0.375

Whereas P(z <= -0.32) = 0.374, with z distributed as N(mu=0, sigma=1).

________licas (Luis A. Afonso)

gregc...@bigpond.com

Jan 14, 2006, 11:20:40 PM

There is no mention of the degrees of freedom

gregc...@bigpond.com

Jan 14, 2006, 11:22:07 PM

Standard error is .01545 my mistake.

gregc...@bigpond.com

Jan 14, 2006, 11:28:17 PM

Sorry, degrees of freedom = 15

Reef Fish

Jan 14, 2006, 11:32:33 PM

Greg,

Two other people tried to help, but they both missed a key element
of a P value: you CANNOT determine a P value unless you
know the Alternative Hypothesis, whether it is one-tailed or
two-tailed, and if one-tailed, WHICH tail.

This topic had an extended discussion in sci.stat.math, about the
definition of a "p-value" as one being "more extreme" than the observed
value of the statistic. You cannot determine which direction is "more
extreme" unless you know if the alternative hypothesis is:

1. mu .NE. 224.05
2. mu > 224.05
3. mu < 224.05

I think you meant your observed xbar = 224.05 while the mu being
tested is 224 (but I kept it the way you had stated) because it would
seem rather odd that your observed value of xbar is exactly 224,
while the hypothesized value is exactly 224.05.

But if what you stated had their roles reversed, the answer can
be fixed accordingly.

Richard Ulrich was correct in pointing out that you had mislabeled
the standard deviation as SE, because if it were the SE of
xbar, the observed t would have been (224 - 224.05)/.1545 which
is -.3236, and not your value -3.1108.

I just saw the OP's correction that the SE was .01545, which would
make the observed t-value -3.236 for the values given.

Luis Afonso was correct in pointing out that you need to know the
"degrees of freedom" of your t-distribution, but erred in assuming
that your alternative hypothesis is of type (3).

Thus, for the appropriate d.f. of the t, the desired P-value would be

Pr( abs(T) > 3.236) for alternative (1)
Pr( T > -3.236) for alternative (2),
Pr( T < -3.236) for alternative (3).
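
The three cases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses df = 15 as Greg gave it, computes the t CDF by Simpson integration of the density instead of calling a statistics library, and the names t_pdf, t_cdf, p_value and the alternative labels are made up for the example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2.0 * math.log1p(x * x / df))

def t_cdf(x, df, step=0.005):
    """P(T <= x): 0.5 plus a composite Simpson integral of the density over [0, x]."""
    if x == 0.0:
        return 0.5
    n = 2 * max(1, math.ceil(abs(x) / (2.0 * step)))  # even number of intervals
    h = x / n  # negative when x < 0, so the integral comes out negative as needed
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3.0

def p_value(t_obs, df, alternative):
    """P value for alternative (1) 'two-sided', (2) 'greater', or (3) 'less'."""
    if alternative == "two-sided":
        return 2.0 * t_cdf(-abs(t_obs), df)
    if alternative == "greater":
        return 1.0 - t_cdf(t_obs, df)
    if alternative == "less":
        return t_cdf(t_obs, df)
    raise ValueError("unknown alternative: " + alternative)

for alt in ("two-sided", "greater", "less"):
    print(alt, round(p_value(-3.236, 15, alt), 4))
```

Note that for alternative (2) the result is close to 1, which is exactly why the direction of "more extreme" has to be settled before any number is reported.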

-- Bob.

gregc...@bigpond.com

Jan 14, 2006, 11:56:09 PM

It's two-tailed because my hypothesis is:
Ho: μ = 224.05 mm
HA: μ ≠ 224.05 mm

Reef Fish

Jan 15, 2006, 12:14:44 AM

Then, your P-value would be Pr( abs(T) > 3.236 ) = P(T<-3.236) +
P(T>3.236)

or 2 times P(T < -3.236) for the appropriate d.f. for your T.

-- Bob.

Luis Amaral Afonso

Jan 15, 2006, 3:47:57 AM

I agree with all the later remarks.

To achieve a conclusion it is enough to choose a confidence level.

__df=5
___level=5%______critical value=2.571
which, being less than 3.236, leads to accepting H1 (rejecting H0).
___level=1%______critical value=4.032
which, being greater than 3.236, leads to accepting H0.

At the 1% level, reading from the Tables:
____df= 9____c.v.=3.250 (greater than 3.236: accepting H0)
____df=10____c.v.=3.169 (less than 3.236: rejecting H0)
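
The table lookups can be reproduced without tables. The sketch below is an assumption-laden illustration, not anyone's official routine: it integrates the t density with Simpson's rule to get the CDF, then finds the two-sided critical value by bisection and compares it with |t| = 3.236. All function names are invented for the example; df = 15 is included because that is the value Greg reported.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2.0 * math.log1p(x * x / df))

def t_cdf(x, df, step=0.005):
    """P(T <= x): 0.5 plus a composite Simpson integral of the density over [0, x]."""
    if x == 0.0:
        return 0.5
    n = 2 * max(1, math.ceil(abs(x) / (2.0 * step)))  # even number of intervals
    h = x / n
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3.0

def two_sided_critical_value(alpha, df):
    """Value c with P(|T| > c) = alpha, located by bisection on the CDF."""
    def tail(c):
        return 2.0 * (1.0 - t_cdf(c, df))
    lo, hi = 0.0, 1.0
    while tail(hi) > alpha:          # widen the bracket until it contains c
        lo, hi = hi, 2.0 * hi
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

t_obs = 3.236
for df in (5, 9, 10, 15):
    c = two_sided_critical_value(0.01, df)
    verdict = "reject H0" if t_obs > c else "do not reject H0"
    print(df, round(c, 3), verdict)
```

The decision rule is the usual one for a two-sided test: reject H0 when |t| exceeds the critical value for the chosen level and degrees of freedom.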

_________licas (Luis A. Afonso)

Art Kendall

Jan 15, 2006, 9:28:30 AM

I am assuming that you want to compare the mean you showed to a constant
(the most commonly used constant is zero), and that, since you gave the
sign, you are interested in the left tail.

You can find the probability associated with a t (or with other stats)
with the CDF.* functions in SPSS.

This is example syntax; I threw in a few other t's and df's to demo.

new file.
data list list /t (f7.4) df (f7.3).
begin data.
-3.1108 15
3.1108 15
2 15
-2 15
1.96 800.5
end data.
compute p = cdf.t(t,df).
formats p (f5.3).
list.

t df p

-3.1108 15.000 .004
3.1108 15.000 .996
2.0000 15.000 .968
-2.0000 15.000 .032
1.9600 800.500 .975

Art
A...@DrKendall.org
Social Research Consultants

Luis Amaral Afonso

Jan 15, 2006, 11:28:01 AM

***Not quite correct.
A frequentist would say the probability of heads is 9/9 = 1. If he threw a tenth time and got a tail, the frequentist should say the probability was 9/10, which is the maximum likelihood solution for p.***

Are you not confusing this with the EMPIRICIST point of view? What is, in your opinion, wrong in my analysis to estimate a lower *credible* limit of p – the probability to get Head in the 10th flip?

________licas (Luis A. Afonso)

Reef Fish

Jan 15, 2006, 12:09:30 PM

Luis Amaral Afonso wrote:
> I agree with all the later remarks.
>
> To achieve a conclusion it is enough to choose a confidence level.

However, you still missed the point that a P-value depends on the
Alternative Hypothesis which defines the meaning of "more extreme".

The examples you gave for your 1-1 correspondence between the
use of a confidence interval and p-value for hypothesis testing
are applicable ONLY to the two-sided alternative, ONE of the
THREE possible alternative hypotheses for the OP's test.

In that respect, the probability of your response being correct is
at most 1/3.

-- Bob.

Bruce Weaver

Jan 15, 2006, 2:59:40 PM

Here is a nice piece of software you can use to obtain the p-value:

http://www.cytel.com/Products/StaTable/default.asp

You can use the java program online, or download the free Windows version.

--
Bruce Weaver
bwe...@lakeheadu.ca
www.angelfire.com/wv/bwhomedir

Reef Fish

Jan 15, 2006, 4:37:19 PM

Bruce Weaver wrote:
> Reef Fish wrote:
> > gregc...@bigpond.com wrote:
> >
> >>Its two tailed because my hypothesis is:
> >>Ho: μ = 224.05 mm
> >>HA: μ ≠ 224.05 mm
> >
> >
> > Then, your P-value would be Pr( abs(T) > 3.236 = P(T<-3.236) +
> > P(T>3.236)
> >
> > or 2 times P(T < -3.236) for the appropriate d.f. for your T.
> >
> > -- Bob.
> >
>
> Here is a nice piece of software you can use to obtain the p-value:
>
> http://www.cytel.com/Products/StaTable/default.asp

Uh, definitely NOT recommended.

Apart from giving p-values only for Z (and not T), whereas Z could
have given as a limiting case of T with infinite degrees of freedom,
and give the p-values for any T with any degree of freedom,

It's making the SAME error most folk make on computing p-values,
not taking into consideration the THREE possible alternative
hypotheses.

The table into program gives only TWO of the THREE, the two-
tail case and the left tail case.

Thus, it can be said that the program, even for Z, is correct only
2/3 of the time.

-- Reef Fish Bob.

Bruce Weaver

Jan 15, 2006, 8:15:40 PM

Reef Fish wrote:
> Bruce Weaver wrote:
>
>>Reef Fish wrote:
>>
>>>gregc...@bigpond.com wrote:
>>>
>>>
>>>>Its two tailed because my hypothesis is:
>>>>Ho: μ = 224.05 mm
>>>>HA: μ ≠ 224.05 mm
>>>
>>>
>>>Then, your P-value would be Pr( abs(T) > 3.236 = P(T<-3.236) +
>>>P(T>3.236)
>>>
>>>or 2 times P(T < -3.236) for the appropriate d.f. for your T.
>>>
>>>-- Bob.
>>>
>>
>>Here is a nice piece of software you can use to obtain the p-value:
>>
>> http://www.cytel.com/Products/StaTable/default.asp
>
>
> Uh, definitely NOT recommended.

I disagree, and stand by my recommendation. See below.

>
> Apart from giving p-values only for Z (and not T),

Look again, Bob. In the java version, "Continuous Student's t" is 4th
from the top in the pop-up list of distributions.

> whereas Z could
> have given as a limiting case of T with infinite degrees of freedom,
> and give the p-values for any T with any degree of freedom,
>
> It's making the SAME error most folk make on computing p-values,
> not taking into consideration the THREE possible alternative
> hypotheses.

If the program is, so are virtually all of the tables of critical values
in the backs of textbooks. They typically give only the positive
critical values for symmetrical sampling distributions.

>
> The table into program gives only TWO of the THREE, the two-
> tail case and the left tail case.

I think the good folks at Cytel probably figured that most of us could
work out the right tail case, given the left tail case and a symmetrical
sampling distribution. ;-)

>
> Thus, it can be said that the program, even for Z, is correct only
> 2/3 of the time.
>
> -- Reef Fish Bob.

Cheers,
Bruce

Reef Fish

Jan 16, 2006, 12:35:57 AM

Bruce Weaver wrote:
> Reef Fish wrote:
> > Bruce Weaver wrote:
> >
> >>Reef Fish wrote:
> >>
> >>>gregc...@bigpond.com wrote:
> >>>
> >>>
> >>>>Its two tailed because my hypothesis is:
> >>>>Ho: μ = 224.05 mm
> >>>>HA: μ ≠ 224.05 mm
> >>>
> >>>
> >>>Then, your P-value would be Pr( abs(T) > 3.236 = P(T<-3.236) +
> >>>P(T>3.236)
> >>>
> >>>or 2 times P(T < -3.236) for the appropriate d.f. for your T.
> >>>
> >>>-- Bob.
> >>>
> >>
> >>Here is a nice piece of software you can use to obtain the p-value:
> >>
> >> http://www.cytel.com/Products/StaTable/default.asp
> >
> >
> > Uh, definitely NOT recommended.
>
> I disagree, and stand by my recommendation. See below.
>
> >
> > Apart from giving p-values only for Z (and not T),
>
>
> Look again, Bob. In the java version, "Continuous Student's t" is 4th
> from the top in the pop-up list of distributions.

I stand corrected and withdraw my objection (which was minor anyway)
because I wasn't aware of the tab for distributions.

>
>
> > whereas Z could
> > have given as a limiting case of T with infinite degrees of freedom,
> > and give the p-values for any T with any degree of freedom,
> >
> > It's making the SAME error most folk make on computing p-values,
> > not taking into consideration the THREE possible alternative
> > hypotheses.
>
> If the program is, so are virtually all of the tables of critical values
> in the backs of textbooks. They typically give only the positive
> critical values for symmetrical sampling distributions.

That does NOT excuse the program from making the same ERROR
of not considering what is "more extreme" in the Alternative
Hypothesis.
How would anyone ever get a p-value GREATER THAN 0.5 using this
program? That was one of the essential points in the discussion
thread about p-values -- that it CAN have values greater than 0.5
when the Alternative Hypothesis calls for the "more extreme"
direction. Of course even in textbooks giving only the smaller
tails, it should have given in the text the explanation of how to use
the smaller tail by adding 0.5 if the alternative dictates that
p-value.

In a computer PROGRAM, the designer should have simply ASKED
for the option of the Alternative: "<", ">", or "unequal" and give the
correct answer accordingly.

Failing to do so, it was an unmistakable sign that either the programmer
is UNAWARE of the correct definition of p-value, or is simply sloppy
in implementing something that's prone to giving erroneous answers
for the ">" Alternative.

>
> >
> > The table into program gives only TWO of the THREE, the two-
> > tail case and the left tail case.
>
> I think the good folks at Cytel probably figured that most of us could
> work out the right tail case, given the left tail case and a symmetrical
> sampling distribution. ;-)

I am highly skeptical of your conjecture about "most of us could work
out the right tail". Having been around this group for awhile, I am
unfortunately led to the belief that "most of us" don't even know the
definition of a p-value; and I even suspect your good folks at Cytel
may not know it themselves, else they would have programmed the
routine correctly. ;-)

>
> > Thus, it can be said that the program, even for Z, is correct only
> > 2/3 of the time.

In light of the foregoing discussion, that statement stands. :-)

Jerry Dallal

Jan 16, 2006, 8:07:41 AM

Reef Fish wrote:
> Bruce Weaver wrote:
>>>>Here is a nice piece of software you can use to obtain the p-value:
>>>>
>>>> http://www.cytel.com/Products/StaTable/default.asp
>>>

> That does NOT excuse the program from making the same ERROR

> of not considering what is "more extreme" in the Alternative
> Hypothesis.

It should be noted that StaTable is not presented as a P value
calculator. Rather, it is described as providing "immediate access to
the twenty-five most commonly used statistical distributions. With just
a few keystrokes, the tail area or percentage point you want appears in
a pop-up window. StaTable eliminates hunting for books of tables,
interpolation, and the possibility of errors in calculation." Users
should be their own lookout for what they choose to do with it.

The one thing I find lacking is that it can't handle tail areas
less than 0.001.

That said, it would have been nice if all three values were available.
I won't second guess them, but StaTable handles discrete distributions
as well as continuous, which raise the question,

If the left tail is P(X<=k), should the right tail be P(X>k) or P(X>=k)?

At least the left tail has convention going for it, that is, it is
usually P(X<=k). I wish the displays were clearer. All the Java
version says is "left tail", leaving some uncertainty about how it might
be defined (<k or <-k). However, the accompanying manual is explicit
about what is being calculated. The Windows version is only marginally
better, as the graphic is small.

> Failing to do so, it was an unmistable sign that either the programmer
> is UNAWARE of the correct definition of p-value, or is simply sloppy
> in implementing something that's prone to giving erroneous answers
> for the ">" Alternative.

FWIW, I suspect it was more of an interface design issue. While no one
is perfect, I would put *very low* on my list the possibility that the
folks at Cytel are unaware of the definition of a P value or how to
calculate one properly, since they have a slew of papers in refereed
statistics journals on the subject and that P values are the reason why
StatXact and LogXact (their flagship products) exist.

--Jerry

Bruce Weaver

Jan 16, 2006, 12:32:29 PM

Jerry Dallal wrote:

> It should be noted that Statable is not presented as a P value
> calculator. Rather, it is described as providing "immediate access to
> the twenty-five most commonly used statistical distributions. With just
> a few keystrokes, the tail area or percentage point you want appears in
> a pop-up window. StaTable eliminates hunting for books of tables,
> interpolation, and the possibility of errors in calculation." Users
> should be their own lookout for what they choose to do with it.
>
> The one thing I find lacking is that it can't handle handle tail areas
> less than 0.001.

--- snip the rest ---

Just to clarify what Jerry is saying, you can enter a tail area and
solve for z, t, or whatever (e.g., using the standard normal, entering
0.05 as the two-tailed area returns z = 1.96). But, the program will
not accept a tail area less than 0.001 or greater than 0.999. (I agree
that this is a curious limitation of the program.)

But, when you enter a z or t or whatever, and solve for the tail area,
it displays the result to 6 decimals. E.g., using the standard normal,
and entering z = 3.5, I get the following values.

Left tail: 0.999767
Two tailed: 0.000465

And if I wanted the right tail area for z = 3.5, I could always use the
left-tail value for z = -3.5, which is 0.000233. ;-)
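
The same three numbers can be checked with nothing but Python's standard library. This is a small sketch with invented helper names; it relies on the standard identity that the normal upper tail equals 0.5 * erfc(z / sqrt(2)).

```python
import math

def normal_left_tail(z):
    """P(Z <= z) for a standard normal Z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def normal_right_tail(z):
    """P(Z > z), computed directly so small tails keep their significant digits."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def normal_two_tail(z):
    """P(|Z| > |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

z = 3.5
print("left tail :", normal_left_tail(z))
print("two tailed:", normal_two_tail(z))
print("right tail:", normal_right_tail(z))
print("left tail at -z:", normal_left_tail(-z))
```

By symmetry the right tail at z and the left tail at -z are the same number, which is the trick described above. Computing it directly, rather than as 1 minus the left tail, avoids losing digits when the tail is tiny.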

Luis Amaral Afonso

Jan 16, 2006, 1:58:14 PM

Bob said:

***Luis Amaral Afonso wrote: > I agree with all the later remarks. >> To achieve a conclusion it is enough to choose a confidence level. However, you still missed the point that a P-value depends on the Alternative Hypothesis which defines the meaning of "more extreme". The examples you gave for your 1-1 correspondence between the use of a confidence interval and p-value for hypothesis testing are applicable ONLY to the two-sided alternative, ONE of the THREE possible alternative hypotheses for the OP's test. In that respect, the probability of your response being correct is at most 1/3. -- Bob. >>
__df=5___level=5%______critical value=2.571> which being lesser than 3.236 leads to accept H1 (rejecting H0).

___level=1%______critical value=4.032 greater than 3.236 leads to accept H0.
At this level, reading on the Tables:
____df=9____c.v.=3.250 (accepting H0)

____df=10_____ =3.169 (rejecting H0)

_________licas (Luis A. Afonso)

My response:

Greg (who started this thread) stated back on Jan 14, 2006, 11:56 PM that the test was a TWO-TAIL one. Therefore:
__Your Jan 15, 2006, 12:09 PM post is complete nonsense.
__The probability that Bob fails in his purpose is 1.
WORSE: the distribution is DISCRETE.

________licas (Luis A. Afonso)

Reef Fish

Jan 16, 2006, 2:18:56 PM

Bruce Weaver wrote:
> Jerry Dallal wrote:
>
> > It should be noted that Statable is not presented as a P value
> > calculator. Rather, it is described as providing "immediate access to
> > the twenty-five most commonly used statistical distributions.

Jerry pointed out the crux of the issue in our discussion. As a general
probability calculator, that's a separate issue. When you recommended
it for use as a P value calculator, that was the entirety of my objection:
that it didn't even consider the THREE Alternative hypotheses, which
is essential and fundamental in any p-value determination.

> > With just
> > a few keystrokes, the tail area or percentage point you want appears in
> > a pop-up window. StaTable eliminates hunting for books of tables,
> > interpolation, and the possibility of errors in calculation." Users
> > should be their own lookout for what they choose to do with it.
> >
> > The one thing I find lacking is that it can't handle handle tail areas
> > less than 0.001.
>
> --- snip the rest ---
>
> Just to clarify what Jerry is saying, you can enter a tail area and
> solve for z, t, or whatever (e.g., using the standard normal, entering
> 0.05 as the two-tailed area returns z = 1.96). But, the program will
> not accept a tail area less than 0.001 or greater than 0.999. (I agree
> that this is a curious limitation. of the program.)

I took a quick look at the program after seeing your comments, and
I quickly found many INCONSISTENCIES in its presentation of
probabilities OR the deviates corresponding to tail probabilities.

I would consider all those inconsistencies the work of "amateurs",
Cytel notwithstanding!

>
> But, when you enter a z or t or whatever, and solve for the tail area,
> it displays the result to 6 decimals. E.g., using the standard normal,
> and entering z = 3.5, I get the following values.
>
> Left tail: 0.999767
> Two tailed: 0.000465

That is another sign of amateurism. The program is CAPABLE
of calculating the probabilities to MORE than six significant
figures. But why present the two-tail probability as .000465
instead of .465258E-3, to be comparable in precision to the
6 significant digits in the left tail?

Actually the program is CAPABLE of accepting the value of z
greater than 6 digits and give the corresponding correct
probabilities.

For the left tail of 1.23456, it gives .891503
For the left tail of 1.23456789, it gives .891504

the latter of which is the rounded version of .89150431722664...

> And if I wanted the right tail area for z = 3.5, I could always use the
> left-tail value for z = -3.5, which is 0.000233. ;-)
>
> --
> Bruce Weaver
> bwe...@lakeheadu.ca
> www.angelfire.com/wv/bwhomedir

which should have been given as .232629E-3, to 6 significant figures.

But it's the z value look-up corresponding to a given tail probability
that the true amateurism of the program showed itself -- in fact,
I would characterize its performance as BAD amateur work, to
be WRONG in the digits shown!

Example, for the left tail value of .891504, it gave a z value of
1.2373. As we had seen, the correct value of z should have been
1.23456789, or any of its correctly rounded values.

But if you gave the corresponding two-tail value of .21699137,
it gives the correct z value, shown as 1.2345.

1.2373 (wrong in the 4th digit) and 1.2345 (correct to the 5th digit)
on the SAME z, depending on whether the tail or two-tail
probability was given as input.

Is that amateurism or what?

So, StaTable is excusable for its inappropriateness as
a P-value computer, given Jerry's explanation that the program
was meant to be just a probability calculator.

But as a probability calculator, the program also leaves MUCH to be
desired, as those few little examples above are meant to illustrate.

I'll have a much more substantive follow-up to Jerry's post on
computer programs that do a MUCH better job than the StaTable
is intended to do, and address a few related issues, in response
to Jerry's questions and comments, and I'll change the SUBJECT
to Computer-Oriented Probabilities for Statistical Distributions.

I'll get to that later in the day.

-- Bob.

Luis Amaral Afonso

Jan 16, 2006, 3:47:55 PM

John:

The Jaynes book *Probability Theory…* could be rather interesting but it is not my *business* (as I could grasp in a first reading), nor the Philosophy behind it.
I did a very simple thing:
___To suppose that a random trial gives rise to a one of two outputs: Yes or No.
___And that the trial can be infinitely repeated with the same probability p of Yes´s.

I illustrated (by simulation) that, based on the proportion of Yes in the course of n trials, I could have access (almost always) to an interval containing the true probability p.
Not more, not less.
You can claim (reasonably) that this procedure is a careless way to solve the problem of finding probabilities, but …if you give us a clue?
Meanwhile, thank you for your concern.

________licas (Luis A. Afonso)

Reef Fish

Jan 16, 2006, 7:19:13 PM

Jerry Dallal wrote:
> Reef Fish wrote:
> > Bruce Weaver wrote:
> >>>>Here is a nice piece of software you can use to obtain the p-value:
> >>>>
> >>>> http://www.cytel.com/Products/StaTable/default.asp
> >>>
>
> > That does NOT excuse the program from making the same ERROR
> > of not considering what is "more extreme" in the Alternative
> > Hypothesis.
>
> It should be noted that Statable is not presented as a P value
> calculator. Rather, it is described as providing "immediate access to
> the twenty-five most commonly used statistical distributions.

I had commented on this key issue, pointed out by Jerry, which may
indeed excuse StaTable for its inadequacies as a program for computing
p-values -- which was my sole objection up to that point, that it was
missing the key ingredient for any p-value computation.

In this post, I am addressing some of the related issues raised by
Jerry, relative to StaTable's pros and cons.

> StaTable eliminates hunting for books of tables,
> interpolation, and the possibility of errors in calculation."

That's what the "computer-oriented" programs for the
computation of probabilities from statistical distributions are
intended to do: (1) eliminate the use of tables, (2) eliminate the
need for interpolation (all computations can be "exact"), and
(3) LESSEN the probability of errors in calculation, but
certainly not eliminate them.

The errors in (3) are the result of the programmer using
inexact methods when exact methods are available, or
programming errors, which include unnecessary loss of
precision from roundoffs.

> Users should be their own lookout for what they choose to do
> with it.

This includes looking for inconsistencies within the program, as
well as erroneous results, such as those examples I gave in my follow-up
to Bruce Weaver's follow-up to Jerry's post.

>
> The one thing I find lacking is that it can't handle handle tail areas
> less than 0.001.

That's a defect carried over from textbook tables. Completely
unnecessary. More importantly, in any numerical computation,
it's not the number of decimal places that matters but the number
of significant figures. 0.001 has only ONE significant figure --
in textbooks, that implies a probability less than or equal to .001
presumably too small to be worth the bother.

But that could also imply a loss of all accuracy for studying
random deviates that have tail probabilities < .001, not to
mention one may want to know the extreme percentile points
of certain distributions.

For example, if we want the Z probabilities for small right tails,
you can find it from the National Bureau of Standards Handbook
of Mathematical Functions (NBS); or compute it from a program
I had on my laptop since 1989, named ET (Electronic Tables),
given to me by the developer, when he saw some papers I had
published on the approximation of tail probabilities; or from the
computing package Speakeasy that had those functions since
the 1960s, and I did a detailed review of the micro-Speakeasy
version for the American Statistician in 1987 (41, No.1, 71-76).

p         Z (NBS)    ET              Speakeasy       StaTable?
.0001     3.71912    3.7190164...    3.7190164...
1(-5)     4.26489    4.2648907...    4.2648907...
1(-6)     4.75342    4.7534243...    4.7534243...
...
1(-10)    6.36134    6.3613409...    6.3613409...
1(-20)    9.26234    9.2623400...    9.2623400...
These are what I consider "professional" products, in terms of
reliability and accuracy. All of them have been around for
DECADES. So, why are statisticians still using inferior
products? My guess is probably better than yours. :-)

Notice, I accidentally discovered that the NBS handbook
probably had a typo for the .0001 case (NBS has thousands
of typos) because the other two completely independent
products had the identical results for all the p's tabulated
above.

What the above tells us is just how SHORT a tail the Gaussian
distribution has. You only have to go slightly beyond 9
standard deviations above the mean to have reached the
99.999999999999999999th percentile of the distribution.

StaTable, ET, and Speakeasy (as do its later-generation
relatives S, S Plus, etc.) have built-in tail-distribution calculations
for most of the commonly used continuous distributions.

You can check to see how well (or poorly) they do on the
examples above, and other examples in probability calculations.

But there is a rarely known FACT (still not widely known) that
the cdf of most discrete distributions has an EXACT relation
to the tails of continuous distributions. Thus, given the fact that
we know how to compute the exact probabilities for cdfs of
continuous distributions, there is absolutely NO NEED for any
tables of discrete distributions that are usually found in
textbooks -- which waste the time of instructors and students
learning how to use those INEXACT and inadequately tabulated
values.
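
One standard identity of this kind (offered as an illustration, not necessarily the form given in the papers cited below) is that the binomial cdf P(X <= k), for X ~ Binomial(n, p) with 0 <= k < n, equals the area from p to 1 under a Beta(k + 1, n - k) density. The Python sketch checks it numerically; the function names are invented, and the beta tail is obtained by Simpson integration rather than an exact series.

```python
import math

def binomial_cdf_direct(k, n, p):
    """P(X <= k) by summing the binomial probabilities."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k + 1))

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x in [0, 1]."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm) * x**(a - 1) * (1.0 - x)**(b - 1)

def binomial_cdf_via_beta(k, n, p, intervals=2000):
    """P(X <= k) as the Beta(k + 1, n - k) area from p to 1 (requires 0 <= k < n)."""
    a, b = k + 1, n - k
    h = (1.0 - p) / intervals            # intervals must be even for Simpson's rule
    total = beta_pdf(p, a, b) + beta_pdf(1.0, a, b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * beta_pdf(p + i * h, a, b)
    return total * h / 3.0

print(binomial_cdf_direct(3, 10, 0.3))
print(binomial_cdf_via_beta(3, 10, 0.3))
```

The two printed values should agree to many decimal places, which is the point: a routine for a continuous tail is enough to recover the discrete cdf for any n, k and p, with no table needed.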

For that reason, I submitted a paper to the American Statistician,
which appeared in 1992 (February, Vol 46, No. 1, 53-54), titled
"Just Say No to Binomial (and other Discrete Distributions) Tables".

Relevance to

> StaTable eliminates hunting for books of tables,

All continuous probability calculators eliminate hunting for tables
of commonly used discrete distributions which are grossly
incomplete.

That was the only paper I've ever submitted to any journal that
was accepted, in its original submission, without a single word
of suggested correction, except changing the TITLE from a more
mundane/boring one to its actual (more eye-catching) one, which
was the Editor's idea, suggested by the ever-popular
saying of those days, "Just Say No to Drugs" to the acid heads. :-)

The identities between continuous distribution tails and discrete
distribution tails are so obscure that they are not found in the
Johnson & Kotz bibles on continuous or discrete distributions;
they were not found in the Encyclopedia of Statistical Sciences;
but I found them in a 1968 JASA paper by Pratt and Peizer;
and found them to be extremely useful in my own JASA publications
in 1978 (73, 274-283, "A Study of the Accuracies of Some
Approximations for ...") and later in 1984 (79, 49-60, with John
Pratt, "The Accuracy of Peizer Approximations in the Hypergeometric
Distributions").

Those approximation papers were in the dinosaur age of
statistical computation. Today, the power of the micro computer
makes them obsolete to program most of those approximations
because the EXACT cdfs can be computed quickly by using finite
or infinite series, which I used to check the accuracies of the
various approximations.

The foregoing historical notes are just my comment on the fact that
most of the "computing statisticians" are still using dinosaurs and
inaccurate software products when science and technology for
much better products had already been around for 3 to 4 DECADES.

>
> That said, it would have been nice if all three values were available.
> I won't second guess them, but StaTable handles discrete distributions
> as well as continuous, which raise the question,
>
> If the left tail is P(X<=k), should the right tail be P(X>k) or P(X>=k)?

This issue matters only for DISCRETE distributions. By standard
convention of cdf, the left tail is always the "<=" variety.

In another standard convention, e.g., NBS (26.2.2 for the normal), where
the left tail, the cdf, is denoted by P(X), the right tail, denoted by Q(X), is
defined as the complement of P(X), or (1 - P(X)). This notion can
certainly be adapted to the discrete r.v., so that the right tail would be
P(X>k). And since the right tail, no matter how you define it, can
always be related to the left tail, and the left tail, the cdf, can always
be related to the left-or-right tail of a continuous pdf, that would seem
to be the natural way to resolve all potential ambiguity as to what's
the right tail and what's the left tail of a discrete distribution.

> At least the left tail has convention going for it, that is, it is
> usually P(X<=k).

That's the cdf convention I spoke about in the preceding paragraphs.

> I wish the displays were clearer. All the Java
> version says is "left tail", leaving some uncertainty about how it might
> be defined (<k or <-k).

Actually there's NO ambiguity in StaTable about that. For any
given value k, the left tail is always P( < k) -- which is why it is
so misleading and inappropriate for p-values!

In the 3.236 example of the OP, the left tail is given as .999394.
If you use the value -3.236, then you get the left tail as 6.06E-4,
using Z.

> However, the accompanying manual is explicit
> about what is being calculated. The Windows version is only marginally
> better, as the graphic is small.

Speaking of manuals, software products are now so user-UNfriendly
that a 3-inch thick manual usually has its better-selling manual
"Such and Such for Dummies" in the bookstores.

I still like this analogy of what a computer product SHOULD BE --
in terms of user-friendliness, which I attribute to Paul Velleman,
John Tukey's student, a developer of some user-friendly stat.
software products himself, "Have you EVER seen a kid playing
one of those game machines with a MANUAL in his hand?".

That was why I had developed the IDA (Interactive Data Analysis)
in 1972, DESIGNED to be used without ANY manual -- and it
proved to be usable that way, because it had been used in
dozens of universities WITHOUT a manual of any kind, until
the first Manual appeared in 1982, with this preface: "this manual
is different from ANY OTHER MANUAL in one major respect:
this manual is dispensable, i.e., one can learn about, and make
proficient use of the system IDA without having to read this
manual at all." And so it was proven true by tens of thousands
of students who had used it between 1972 and 1982. The
ease-of-use features of the system made it so highly demanded
by other universities, by word of mouth, that its popularity killed
it. :-) (The U of Chicago couldn't handle the sales and
maintenance, and so sold it to a competitor who promptly shelved it.)
The system has been moribund (nobody has sold it) since about 1983.

>
> > Failing to do so, it was an unmistakable sign that either the programmer
> > is UNAWARE of the correct definition of p-value, or is simply sloppy
> > in implementing something that's prone to giving erroneous answers
> > for the ">" Alternative.

I think this question had been resolved. It was never intended to
be a p-value evaluator, but simply a program to calculate certain
tail probabilities.

>
> FWIW, I suspect it was more of an interface design issue.

The design issue is TRIVIAL. If it was intended for p-value use,
any low-level programmer could have put in a p-value option and
simply asked WHICH of the three options is the Alternative,
and the answer would have also been trivial, for anyone who can
calculate what the package is capable of calculating.

> While no one
> is perfect, I would put *very low* on my list the possibility that the
> folks at Cytel are unaware of the definition of a P value or how to
> calculate one properly, since they have a slew of papers in refereed
> statistics journals on the subject and that P values are the reason why
> StatXact and LogXact (their flagship products) exist.
>
> --Jerry

I have not used either of those products or read any of their slews
of papers on P values. It is quite possible they ALWAYS considered
two-tailed tests and ONLY the left tail version in those papers;
then of course they wouldn't err in those cases, and by the same
token, they would not have shed any light on how p-values are
DEFINED (and executed). I recall we had this extended discussion
about this subject in which you embraced Doksum and Bickel's
NON-definition, which muddled the entire SIMPLE idea of using
the Alternative Hypothesis to determine what is "more extreme" in
the definition of a p-value.

Likewise, you appealed to the names of Doksum and Bickel as
well-known statisticians who are unlikely to err; but I simply
would not take that as a valid argument for their MUDDLING in the
explanation of what a p-value is, with respect to some alpha.
The value of alpha has NOTHING to do with a p-value. You
don't have to know anything about alpha to determine the p-value
as long as you know what the Alternative Hypothesis is!

Readers interested in my allegation of that muddle can go back
to the original tedious thread on p-values and judge for themselves
why a p-value is INDEPENDENT of any alpha in a hypothesis test
problem, and depends only on the "observed statistic" and
whether the Alternative is "<", ">", or "unequal".
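
As an illustration of that point, here is a minimal sketch of a p-value routine for a Z statistic. The function names and the "<", ">", "!=" labels are assumptions made for this example; it is not code from StaTable or any other package. Note that alpha does not appear anywhere in the signature: the observed statistic and the direction of the Alternative are the only inputs.

```python
from math import erfc, sqrt


def normal_cdf(z):
    """Standard normal cdf, P(Z <= z)."""
    return 0.5 * erfc(-z / sqrt(2.0))


def p_value(z_observed, alternative):
    """p-value for an observed Z statistic.

    alternative is one of "<", ">", "!=" (labels assumed for this sketch).
    """
    if alternative == "<":
        # "More extreme" means smaller: lower tail.
        return normal_cdf(z_observed)
    if alternative == ">":
        # "More extreme" means larger: upper tail, by symmetry.
        return normal_cdf(-z_observed)
    if alternative == "!=":
        # "More extreme" means farther from zero in either direction.
        return 2.0 * normal_cdf(-abs(z_observed))
    raise ValueError("alternative must be '<', '>' or '!='")


z = -3.236  # the statistic discussed in this thread, treated here as Z
for alt in ("<", ">", "!="):
    print(f"Alternative {alt:>2}: p = {p_value(z, alt):.6g}")
```

The same statistic gives a tiny p-value under "<" and a p-value near 1 under ">", which is the error a bare "left tail" readout invites when the Alternative is not taken into account.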

At any rate, Jerry's comments gave me the opportunity to bring
out several closely related issues in statistical computing and how
"inadequate", "inaccurate", and "unfriendly" many of the products
currently on the market are, including some of the best-sellers.

JMO, supported by some historical facts and observations.

-- Reef Fish Bob.

Jerry Dallal

Jan 16, 2006, 8:22:38 PM

Reef Fish wrote:

> These are what I consider "professional" products, in terms of
> reliability and accuracy. All of them have been around for
> DECADES. So, why are statisticians still using inferior
> products? My guess is probably better than yours. :-)
>

Maybe, maybe not, or we're both thinking the same thing.

People want free and convenient. Usually free trumps convenient.
StaTable is free. Another freebie is Jerry Hintze's Probability
Calculator. http://www.ncss.com/download.html

Anyone who wants to write a high-quality probability calculator and give
it away will likely see it used widely.

FWIW, I need extreme tail areas because journals are willing to allow
investigators to publish results about differences without reporting the
SD of the difference. The SDs for the two measurements are reported,
instead. However, the results are often accompanied by a P value for
the difference, which allows me to work backwards.
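
A minimal sketch of that backwards calculation follows. It assumes the reported P value is two-sided and uses a normal approximation; with small samples the t distribution with the appropriate degrees of freedom would be the natural choice instead. The function name and the numbers are hypothetical.

```python
from statistics import NormalDist


def se_from_two_sided_p(mean_difference, p_two_sided):
    """Recover the standard error of a difference from its reported
    two-sided P value, under a normal approximation (an assumption)."""
    if not 0.0 < p_two_sided < 1.0:
        raise ValueError("P value must lie strictly between 0 and 1")
    # Invert the lower tail directly: computing 1 - p/2 first would
    # throw away precision for very small P values.
    z = -NormalDist().inv_cdf(p_two_sided / 2.0)
    return abs(mean_difference) / z


# Hypothetical published result: difference of 2.4 units, P = 0.0003.
se = se_from_two_sided_p(2.4, 0.0003)
print(f"implied SE of the difference: {se:.4f}")
# With n pairs, the SD of the differences would then be se * sqrt(n).
```

Inverting p/2 in the lower tail, rather than 1 - p/2 in the upper tail, is what keeps the calculation usable for the extreme tail areas mentioned above.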