Who "invented" the null hypothesis?

Dieter Folz

Apr 28, 2006, 2:39:21 AM

Hi everyone!

Talking about stats, at some point during the explanations a friend
asked me what the origins and the fundamental logic of the null
hypothesis were. I didn't really know the answer. In fact I started to
sweat as I explained that you want to prove that, let's say, two groups
differ significantly from each other in a measure (body weight, for
example) (= your hypothesis, called H_1). Therefore, due to Aristotelian
logic(?), (Hegelian?) dialectic(?) or Popperian falsificationism(?),
you choose a null hypothesis (H_0) which says the contradictory thing
to H_1 (here: no difference). Then your finding (which of course should
prove your real hypothesis H_1) must be SO unlikely under the
assumption that H_0 is true that, allowing for an error of x% (mostly
5%), we have to say H_0 can't be true, because it would then be
immensely unlikely that we would have found effects of this kind.
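The procedure described above can be made concrete with a small permutation test, which asks directly how unusual the observed difference would be if H_0 (no difference) were true. This is only a minimal Python sketch: the function name, the body-weight numbers and the 10,000 reshuffles are hypothetical choices for illustration, and a permutation test is just one way to test this H_0 (a two-sample t-test is the usual textbook alternative).

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test of H_0: 'no difference between the groups'.

    Under H_0 the group labels are arbitrary, so we shuffle them many times and
    count how often a shuffled mean difference is at least as large (in absolute
    value) as the observed one. Returns a Monte Carlo estimate of the p-value.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabelling, which H_0 says is harmless
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return (at_least_as_extreme + 1) / (n_resamples + 1)

# Hypothetical body weights in kg, invented for illustration only.
group_a = [72.1, 68.4, 75.0, 70.2, 73.8, 69.5, 71.9, 74.3]
group_b = [66.0, 64.8, 69.1, 65.7, 67.3, 63.9, 68.2, 66.5]

alpha = 0.05
p = permutation_p_value(group_a, group_b)
print(f"p = {p:.4f}:", "reject H_0" if p < alpha else "do not reject H_0")
```

Note that the output is "reject" or "do not reject" H_0; the code never computes anything that would "prove" H_1.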

Apart from the accusation that it is cheating to choose an H_0 just to
pretend to do a falsification, I had more and more problems explaining
the concept clearly, and more and more the impression that I don't
understand it correctly or sufficiently myself. Especially, I have no
idea what the real reason or basic idea of the null hypothesis is, or
what its exact, specific and detailed ideas and its mathematical and
logical background are.

Who "invented" the null hypothesis, where does it come from, which
concepts are the basis for it? Hope somebody can help me out.

Cheers, Dieter

P.S.: Of course I tried to clip(?) talking about effect size, power,
sample size, confidence intervals and stuff ... but that is not the
reality of stats and, as far as I understand it, you need the H_0
too?!

Karl Ove Hufthammer

Apr 28, 2006, 4:38:15 AM

Dieter Folz wrote:

> Apart from the accusation that it is cheating to choose an H_0 just to
> pretend to do a falsification, I had more and more problems explaining
> the concept clearly, and more and more the impression that I don't
> understand it correctly or sufficiently myself. Especially, I have no
> idea what the real reason or basic idea of the null hypothesis is, or
> what its exact, specific and detailed ideas and its mathematical and
> logical background are.
>
> Who "invented" the null hypothesis, where does it come from, which
> concepts are the basis for it? Hope somebody can help me out.

See this article (it's much better than the same author's earlier article
on the same topic in American Statistician):

Author(s): Hubbard, R
Title: Blurring the distinctions between p's and alpha's in psychological
research
Source: THEORY & PSYCHOLOGY, 14 (3): 295-327 JUN 2004

Through your university, school or library, you may have online access to it
in PDF format from either:

http://www.swetswise.com/eAccess/viewAbstract.do?articleID=20094688&titleID=201074
http://tap.sagepub.com/cgi/reprint/14/3/295
or
http://md3.csa.com/ids70/linking.php?linktype=FulltextPDF&docid=sage-set-c%2FTAP_2004_14_3_cln.wais+38631+sagepsyc-set-c&SID=19ebc298f25d2322b4a076ae1db24802

--
Karl Ove Hufthammer

Karl Ove Hufthammer

Apr 28, 2006, 4:44:16 AM

Karl Ove Hufthammer wrote:

> See this article (it's much better than the same author's earlier article
> on the same topic in American Statistician):
>
> Author(s): Hubbard, R
> Title: Blurring the distinctions between p's and alpha's in psychological
> research
> Source: THEORY & PSYCHOLOGY, 14 (3): 295-327 JUN 2004

I'll also recommend a few more papers dealing with P-values and
significance testing that you'll be glad to have read. :)

1. An Applied Statistician's Creed
Marks R. Nester
Applied Statistics, Vol. 45, No. 4 (1996), pp. 401-410

2. The Insignificance of Statistical Significance Testing
Johnson, Douglas H. 1999.
Journal of Wildlife Management 63(3):763-772.
Jamestown, ND: Northern Prairie Wildlife Research
Center Home Page.
http://www.npwrc.usgs.gov/resource/1999/statsig/statsig.htm

The paper's available on the Web page above. Johnson even got an
award for this one. And he deserved it.

3. The Earth is Round (p < .05)
Cohen, Jacob. 1994.
American Psychologist 49(12):997-1003.

A classic and a must-read!

If you are hooked up to a University or research network, you may
have access to the articles through JSTOR "http://www.jstor.org/".

--
Karl Ove Hufthammer

David Jones

Apr 28, 2006, 5:26:59 AM

Dieter Folz wrote:
>
> Who "invented" the null hypothesis, where does it come from, which
> concepts are the basis for it? Hope somebody can help me out.
>
>

See http://members.aol.com/jeff570/mathword.html under H for
hypothesis testing.

The following are extracts ...

"Although the use of probability in testing hypotheses is almost as
old as the study of probability, the modern terminology was largely
created by R. A. Fisher and the team of J. Neyman and E. S. Pearson in
the 1920s and -30s. "

"Null hypothesis appears in 1935 in Fisher's The Design of
Experiments. He writes, "[W]e may speak of this hypothesis as the
'null hypothesis,' and it should be noted that the null hypothesis is
never proved or established, but is possibly disproved, in the course
of experimentation." "

"This entry was contributed by John Aldrich. See also ASYMPTOTIC
RELATIVE EFFICIENCY, CHI-SQUARE, CRITICAL REGION, NEYMAN-PEARSON
(FUNDAMENTAL) LEMMA, NUISANCE PARAMETER, P-VALUE, POWER, SIGNIFICANCE,
SIMILAR REGION, SIZE, TYPE I ERROR, STUDENT'S t-DISTRIBUTION"

You should see the whole article, and possibly follow up on the other
keywords listed in the 3rd extract.

David Jones

Reef Fish

Apr 28, 2006, 11:11:56 AM

Dieter Folz wrote:
> Hi everyone!
>
> Talking about stats, at some point during the explanations a friend
> asked me what the origins and the fundamental logic of the null
> hypothesis were. I didn't really know the answer.
>

> Who "invented" the null hypothesis, where does it come from, which
> concepts are the basis for it? Hope somebody can help me out.

Your questions are excellent ones, and the subject is sufficiently
important (and oft misunderstood) that I am cross-posting it to the
three groups, because I had found the material below in an
elementary textbook we had adopted as the textbook of a very
large enrollment course.

The textbook was excellent in most respects, except the author(s)
were just as confused about the hypothesis testing issue and the
role of the Null and Alternative hypotheses as you are.

So, I sent the publisher an analogy I had used in my courses,
for HIM to relate to, or give to, the author(s) so that they
would revise their textbook and not make the same mistakes in
setting up the direction(s) of Ho vs H1.

The analogy is that Hypothesis Testing works as an
EXACT analogue of how the (criminal) courts of law work.

Lo and behold, in the next edition of the textbook, I saw what was
virtually the identical diagram (analogy) as I am showing below.
It was a clear case of plagiarism on their part, not so much because
their diagram looked exactly like what I had given the publisher
(which almost anyone could have come up with independently),
but because even after they stole my material, they LEFT THE
SAME MISTAKES in the textbook that were contrary to what they
said in their text, stolen from me! :-)

Here it is:

Since the alignment of two 2 x 2 tables may be bad here, I'll
show each 2 x 2 separately to draw the analogy:

Law: the person on trial is the Defendant. There are two possible
true states (Guilty or Not Guilty), and the Jury's verdict is
"Guilty" or "Not Guilty" (ignore hung jury and other
anomalies).

So, the 2 x 2 table would look like this:

LAW (Defendant)          "Not Guilty" verdict    "Guilty" verdict

true state: Not Guilty   Correct verdict         Incorrect verdict

true state: Guilty       Incorrect verdict       Correct verdict

In statistical Hypothesis Testing, the corresponding 2 x 2 table is

Ho (Null Hyp)    Accept Ho        Reject Ho

Ho TRUE          Correct          Type I Error

Ho NOT true      Type II Error    Correct

These are the KEY elements of the analogy:

1. In a court of law, a defendant is INNOCENT until PROVEN guilty.
In Hyp testing, Ho is assumed to be TRUE until proven otherwise

Defendant = Ho
Guilty = Reject Ho

2. When a "Not Guilty" verdict is rendered, it does NOT prove the
innocence of the defendant; it simply means the evidence is
not sufficiently strong to warrant a "Guilty" verdict.

(Think of the O.J. murder trial. In the criminal court, in a trial
much publicized on national TV, O.J. was acquitted ("not guilty")
even though the evidence was strongly against him. The
criminal court requires a "guilty" verdict to be rendered ONLY
IF the evidence of guilt is BEYOND REASONABLE DOUBT
(some phrase it as "beyond a shadow of a doubt"). In other
words, the "not guilty" verdict simply means the evidence was
not strong enough, by criminal court standards, to convict
O.J.; it DID NOT support (and was not supposed to support)
O.J.'s innocence. O.J. would have been declared "Guilty"
in Judge Wapner/Judy TV/civil courts and other lower courts.)

KEY: In a court of law, you CANNOT strengthen the belief
of innocence. You can only accept the assumed
innocence of the Defendant, or PROVE GUILT.

The analogy in Hypothesis testing is clear.

The verdict language is either: "Ho is accepted" or "Ho is
rejected". It is NOT proper to state any decision in terms
of H1. To say Ho is accepted does NOT strengthen the
truth of Ho. It requires different strengths of evidence to
"reject Ho", just as it requires different strengths of evidence
to render a "Guilty" verdict in different kinds of courts.

The Court standards are: "Beyond a reasonable doubt" in
criminal courts, and "A preponderance of the evidence is
against the Defendant" in Civil courts (the TV Wapner type).

For more details about courts of LAW and standards, see

http://groups.google.com/group/rec.scuba.locations/msg/61f47f08ea2fe7ec?hl=en&

or (same) see: Law for Dummies http://tinyurl.com/e7lzm

We FINALLY come to the PROBABILITY and ASYMMETRY part
of the analogy.

In criminal court, because of the "beyond a reasonable doubt"
standard, most judges equate the strength of the evidence to a 90
percent or higher probability of guilt. That's why real criminals are
declared "not guilty" most of the time, under the Western Law
principle of putting high priority on NOT convicting the innocent.

In the URL cited above, I had these facts from a published article
in The American Statistician, by Joseph Gastwirth:
======================
The FOUR standards are:

1. Preponderance of the evidence
2. Clear and convincing evidence
3. Clear, convincing, and unequivocal evidence
4. Evidence beyond a reasonable doubt

There were 10 judges in the Gastwirth panel, each of whom gave
PROBABILITY OF GUILT required to convict a defendant under each of
those FOUR standards.

On average, these 10 judges rated (1) to be 50+%; (2) to be
around 60%; (3) to be around 75%; and (4) to be about 90%.
=======================

In hypothesis testing,

alpha = Pr (Type I Error) = Pr (rejecting Ho when Ho is true).

Thus, the 90% in Law roughly corresponds to the alpha = .05
used in hypothesis testing, or "statistical law". ;)

beta = Pr (Type II Error) = Pr (Accepting Ho when Ho is false)

The analogy is complete.
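To make the alpha/beta definitions above concrete, here is a small Python simulation sketch. It assumes a one-sample, two-sided z-test of Ho: mu = 0 with known sigma = 1 and n = 25, and a hypothetical true mean of 0.5 for the "Ho false" case; none of these numbers come from the post. Rejections when Ho is true estimate alpha; non-rejections when Ho is false estimate beta.

```python
import math
import random

Z_CRIT = 1.959964  # two-sided 5% critical value of the standard normal

def rejection_rate(true_mu, n=25, sigma=1.0, trials=10_000, seed=1):
    """Fraction of simulated samples in which Ho: mu = 0 is rejected by a
    two-sided z-test with known sigma, when the data really have mean true_mu."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(true_mu, sigma) for _ in range(n)) / n
        z = xbar / (sigma / math.sqrt(n))
        if abs(z) > Z_CRIT:
            rejections += 1
    return rejections / trials

# Ho true (mu = 0): every rejection is a Type I error, so this estimates alpha.
alpha_hat = rejection_rate(true_mu=0.0)
# Ho false (hypothetical true mu = 0.5): every non-rejection is a Type II error.
beta_hat = 1.0 - rejection_rate(true_mu=0.5)
print(f"estimated alpha = {alpha_hat:.3f}, estimated beta = {beta_hat:.3f}")
```

With these settings the estimated alpha should land close to 0.05, and beta near the normal-theory value of roughly 0.29 (one minus the power, Phi(2.5 - 1.96)). Increasing n or the true difference shrinks beta, while alpha stays fixed by the choice of critical value.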

Let's say you want to compare the means of two populations.
If you want to PROVE that mu1 is greater than mu2, you must
put that as the Alternative hypothesis: H1: mu1 < mu2
while the null Hyp is Ho: mu2 = mu1.

Relation to p-value in a test.

For most hypothesis tests of this kind, I told my students to use
"=" in the Ho ALWAYS. Then there are THREE possible
alternative hypotheses: "<" , ">", or "not equal", and the
direction of the ALTERNATIVE hypothesis in turn determines
the p-value of a test. The p-value is the probability, computed
assuming Ho is true, of observing a value of the test statistic
MORE EXTREME than the one observed, and "more extreme" means
Pr(Z < Z*), or Pr(Z > Z*), or Pr(Z < -|Z*| or Z > |Z*|), if Z is
the test statistic and Z* is its observed value in the test.
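The three p-value definitions above can be written out directly. This Python sketch assumes the test statistic Z is standard normal under Ho (e.g. a z-test with known variance); the function names and the example value Z* = 1.8 are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_value(z_obs, alternative):
    """P-value of an observed z statistic Z* under Ho: mu1 = mu2.

    alternative = "less"      (H1: mu1 < mu2):  Pr(Z < Z*)
    alternative = "greater"   (H1: mu1 > mu2):  Pr(Z > Z*)
    alternative = "two-sided" (H1: mu1 != mu2): Pr(Z < -|Z*| or Z > |Z*|)
    """
    if alternative == "less":
        return normal_cdf(z_obs)
    if alternative == "greater":
        return 1.0 - normal_cdf(z_obs)
    if alternative == "two-sided":
        return 2.0 * (1.0 - normal_cdf(abs(z_obs)))
    raise ValueError(f"unknown alternative: {alternative!r}")

z_star = 1.8  # hypothetical observed value of the test statistic
for alt in ("less", "greater", "two-sided"):
    print(f"{alt:>9}: p = {p_value(z_star, alt):.4f}")
```

For the same Z* the two-sided p-value is twice the one-sided "greater" p-value, which is one reason the direction of H1 has to be fixed before looking at the data.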

-- Bob.

Kevin E. Thorpe

Apr 28, 2006, 11:45:59 AM

Reef Fish wrote:

> The verdict language is either: "Ho is accepted" or "Ho is
> rejected". It is NOT proper to state any decision in terms
> of H1. To say Ho is accepted does NOT strengthen the
> truth of Ho. It requires different strengths of evidence to
> "reject Ho", just as it requires different strengths of evidence
> to render a "Guilty" verdict in different kinds of courts.

I have tended to avoid the phrase "Ho is accepted" in favour
of "Ho is not rejected" to drive home the point that we
have not proved the null hypothesis is true. I am concerned
about acceptance being equated with truth.

--
Kevin E. Thorpe
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto

Reef Fish

Apr 28, 2006, 1:28:41 PM

Kevin E. Thorpe wrote:
> Reef Fish wrote:
>
> <snip very clear description>
>
> > The verdict language is either: "Ho is accepted" or "Ho is
> > rejected". It is NOT proper to state any decision in terms
> > of H1. To say Ho is accepted does NOT strengthen the
> > truth of Ho. It requires different strengths of evidence to
> > "reject Ho", just as it requires different strengths of evidence
> > to render a "Guilty" verdict in different kinds of courts.
>
> I have tended to avoid the phrase "Ho is accepted" in favour
> of "Ho is not rejected" to drive home the point that we
> have not proved the null hypothesis is true. I am concerned
> about acceptance being equated with truth.
>
> --
> Kevin E. Thorpe

Excellent point. As a matter of fact, I PREFER the form
"Ho is not rejected". It is in fact a more direct analogy
between "Ho is not rejected" and "the Defendant is
not guilty".

The essential point of the analogy is that "Ho" is the
"Defendant" in a court of law.

I may have indeed used that in my 2 x 2 slides. I'll check
on that when I take the time to look at my own handout. :-)

I wrote the entire post from memory.

> Assistant Professor, Department of Public Health Sciences
> Faculty of Medicine, University of Toronto

When you are ready to get promoted, ask your Committee Chair
to request me for a letter of recommendation. :-)

One of my co-authors was denied Full Professorship promotion
in a VERY prestigious Ivy League University. I thought it was a
clear case of injustice caused by an unfriendly Committee Chair,
upheld by the Faculty and Dean (as the usual rubber stamp of
the Chair's recommendation). So, I took his academic credentials,
requested letters of recommendation from a dozen or so ASA
Fellows, and wrote MY strong recommendation and nominated
him to be a Fellow. He was elected ASA Fellow. I told him to
show my nomination dossier to the Dean, who promptly
reversed the promotion decision (as a FIRST in that university's
history, to the best of our knowledge), and retroactively promoted
him to Full Professor, about two years after the initial denial of
promotion.

I found that effort of mine MUCH more rewarding than publishing
a few papers to add to the published pollution in academia already.

-- Bob.

Jerry Dallal

Apr 28, 2006, 1:34:08 PM

Kevin E. Thorpe wrote:
> Reef Fish wrote:
>
> <snip very clear description>
>
>> The verdict language is either: "Ho is accepted" or "Ho is
>> rejected". It is NOT proper to state any decision in terms
>> of H1. To say Ho is accepted does NOT strengthen the
>> truth of Ho. It requires different strengths of evidence to
>> "reject Ho", just as it requires different strengths of evidence
>> to render a "Guilty" verdict in different kinds of courts.
>
> I have tended to avoid the phrase "Ho is accepted" in favour
> of "Ho is not rejected" to drive home the point that we
> have not proved the null hypothesis is true. I am concerned
> about acceptance being equated with truth.
>

Agreed. I deduct points when students "accept the null hypothesis". :-)

In fact, Bob got it right with the verdict. It is NOT "guilty" or
"innocent" ("reject" or "accept"). It is "guilty" or "not guilty"
("reject" or "fail to reject").

Jerry Dallal

Apr 28, 2006, 1:35:24 PM

I see Bob's and my notes crossed. I believe his beat mine by a minute
or two.

Reef Fish

Apr 28, 2006, 2:06:22 PM

Jerry Dallal wrote:
> Kevin E. Thorpe wrote:
> > Reef Fish wrote:
> >
> > <snip very clear description>
> >
> >> The verdict language is either: "Ho is accepted" or "Ho is
> >> rejected". It is NOT proper to state any decision in terms
> >> of H1. To say Ho is accepted does NOT strengthen the
> >> truth of Ho. It requires different strengths of evidence to
> >> "reject Ho", just as it requires different strengths of evidence
> >> to render a "Guilty" verdict in different kinds of courts.
> >
> > I have tended to avoid the phrase "Ho is accepted" in favour
> > of "Ho is not rejected" to drive home the point that we
> > have not proved the null hypothesis is true. I am concerned
> > about acceptance being equated with truth.
> >
>
> Agreed. I deduct points when students "accept the null hypothesis". :-)

Fascist! :-))

It would make no difference what my students write as long
as they UNDERSTAND what that means or implies. It's 100%
Open Book and Notes. :)

>
> In fact, Bob got it right with the verdict. It is NOT "guilty" or
> "innocent" ("reject" or "accept"). It is "guilty" or "not guilty"
> ("reject" or "fail to reject").

Would you like to hang the judge and jury who ACQUITTED
the Defendant, thereby implying the status quo of the Defendant's
"Innocent" status? <BG>

-- Bob.

Old Mac User

Apr 28, 2006, 2:18:24 PM

Thanks for your link to the Johnson paper "The Insignificance of
Statistical Significance".
It's excellent. I've not used the expressions "null hypothesis" or
"alternative hypothesis" or even "statistical significance" since
about 1963. I teach stat courses for many companies and other
organizations, but I march to a different drummer. Now... if we can
get the word out to textbook publishers (ha ha) and to academics....

Reef Fish

Apr 28, 2006, 2:39:08 PM

Or preferably, "Do not reject Ho" and "Reject Ho", as
pointed out by Kevin, Jerry, and myself.

I've just noticed a SERIOUS typo that must be corrected! This
was the reason for THIS post:

> Let's say you want to compare the means of two populations.
> If you want to PROVE that mu1 is greater than mu2, you must
> put that as the Alternative hypothesis: H1: mu1 < mu2

Of course I meant H1: mu1 > mu2. Then the rejection of Ho
would have supported H1. That was actually not so much a
typo as changing ">" to "<" and forgetting to change "greater
than" to "less than". :-) The change to "<" was because in
the p-value section, I consistently used the order "<", ">" and
"not equal".

At any rate, that gives one more opportunity to emphasize
if you want to PROVE something, DON'T put it in the null
hypothesis Ho, but in the Alternative, H1.

-- Bob.

Bruce Weaver

Apr 28, 2006, 2:58:12 PM

Here are a couple of interesting usenet posts on the hypothesis testing
debate by one of those academics. ;-)

www.angelfire.com/wv/bwhomedir/notes/mcleanht.html

--
Bruce Weaver
bwe...@lakeheadu.ca
www.angelfire.com/wv/bwhomedir

David A. Heiser

Apr 28, 2006, 8:26:17 PM

"Reef Fish" <Large_Nass...@Yahoo.com> wrote in message
news:1146249548.7...@y43g2000cwc.googlegroups.com...

>
> Reef Fish wrote:
>> Dieter Folz wrote:
>> > Hi everyone!

See previous messages for all the dialog left out here.

> At any rate, that gives one more opportunity to emphasize
> if you want to PROVE something, DON'T put it in the null
> hypothesis Ho, but in the Alternative, H1.
>
> -- Bob.

This is the essential point. The only two strong logical inferences are
modus ponens and modus tollens. If the world is binary (a hypothesis test
is based on a binary world), then when the evidence contradicts what the
null hypothesis predicts, the only valid conclusion (modus tollens) is
that the null hypothesis is "destroyed", and the statement that the
alternate hypothesis is true is then the only logical conclusion.
However, our world is not binary, and other alternate hypotheses may
also be true or false. Under Popper, we are left open, unable to claim
that the alternate hypothesis is really true. This is the essential
philosophical problem of science here.
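The two inference patterns in this post can be stated in Lean 4 as a small sketch. It only captures the deductive skeleton: a real significance test is probabilistic, so "rejecting Ho" is not a deductive refutation. The second theorem shows that getting from "not Ho" to "H1" needs the extra premise Ho ∨ H1, which is exactly the "binary world" assumption. The theorem names are hypothetical; `Or.resolve_left` is from Lean 4 core.

```lean
-- Modus tollens: from (P → Q) and ¬Q, conclude ¬P.
theorem modus_tollens {P Q : Prop} (h : P → Q) (hnq : ¬Q) : ¬P :=
  fun hp => hnq (h hp)

-- Concluding H1 from ¬H0 is only valid with the extra premise that
-- H0 and H1 exhaust the possibilities (the "binary world").
theorem alt_of_not_null {H0 H1 : Prop} (hcover : H0 ∨ H1) (hn : ¬H0) : H1 :=
  hcover.resolve_left hn
```

Without `hcover`, the second theorem cannot be proved, which is the formal version of "other alternate hypotheses may also be true or false".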

DAH

David A. Heiser

Apr 28, 2006, 9:16:51 PM

"Dieter Folz" <diete...@gmx.de> wrote in message
news:1146206361....@u72g2000cwu.googlegroups.com...
Hi everyone!

Cheers, Dieter
++++++++++++++++++++++++++++++++++++++++++++++
The book "What If There Were No Significance Tests?" by Harlow, Mulaik and
Steiger (Erlbaum 1997) is recommended for background on the essential
problems in the use of significance tests today.

Harlow points out that Karl Pearson in 1901 laid the groundwork for testing
a scientific hypothesis with sample data. Fisher then proposed a set of
methods for testing (ca 1930) based on p values, and Neyman, of course,
came out with the basic method (1928). The book "The Lady Tasting Tea"
describes the uproar in the Royal Statistical Society when Neyman proposed
his theory and was challenged. The controversy between Fisher and Neyman
in the 1930s is part of the interesting history here.

Hubbard and Bayarri's TAS article ("Confusion Over Measures of Evidence
(p's) Versus Error (alpha) in Classical Statistical Testing") in the Aug
2003 issue of TAS describes the differences between the Fisher view,
Neyman's views, and the horrible mixup of ideas that is the current
hypothesis test. Serlin's article in JASM, vol 1, no. 2 puts it all in the
historical context of the pursuit of a philosophy of science.

Neyman's original view was that hypothesis testing was only to be used as a
tool for quality control in manufacturing processes. His terminology and
usage can only be properly understood in this context.

DAH

Anon.

Apr 29, 2006, 10:10:47 AM

I guess it might be more accurate to say "guilty" or "not proven" (not
proven is a possible verdict in Scottish law).

Bob

--
Bob O'Hara
Department of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland

Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax: +358-9-191 51400
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: www.jnr-eeb.org

Reef Fish

Apr 29, 2006, 10:51:39 AM

Anon. wrote:
> Jerry Dallal wrote:
> > Kevin E. Thorpe wrote:
> >> Reef Fish wrote:
> >>
> >> <snip very clear description>
> >>
> >>> The verdict language is either: "Ho is accepted" or "Ho is
> >>> rejected". It is NOT proper to state any decision in terms
> >>> of H1. To say Ho is accepted does NOT strengthen the
> >>> truth of Ho. It requires different strengths of evidence to
> >>> "reject Ho", just as it requires different strengths of evidence
> >>> to render a "Guilty" verdict in different kinds of courts.
> >>
> >> I have tended to avoid the phrase "Ho is accepted" in favour
> >> of "Ho is not rejected" to drive home the point that we
> >> have not proved the null hypothesis is true. I am concerned
> >> about acceptance being equated with truth.
> >>
> >
> > Agreed. I deduct points when students "accept the null hypothesis". :-)
> >
> > In fact, Bob got it right with the verdict. It is NOT "guilty" or
> > "innocent" ("reject" or "accept"). It is "guilty" or "not guilty"
> > ("reject" or "fail to reject").
>
> I guess it might be more accurate to say "guilty" or "not proven" (not
> proven is a possible verdict in Scottish law).

> --
> Bob O'Hara

Anon Bob O'Hara,

What is "not proven"? If you meant to say that the non-rejection of
a Null Hypothesis or the acceptance of a Null Hypothesis is
synonymous with "Ho is not proven", then you have missed the
boat, the harbour, and the ocean of this thread.

The Null Hypothesis is not MEANT to be proven, and it can NEVER
be proven in a Hypothesis Testing setting.

The correctness of a Null Hypothesis or the Innocence of a Defendant
cannot be strengthened by a statistical or law verdict.

While we are on the subject of Law, I should have mentioned that the
analogy I wrote about applies primarily to the practice of law in the
USA.

In countries that have laws under the Napoleonic Code, such as France,
Spain, and Mexico, a Defendant is GUILTY until proven innocent --
a completely opposite scenario from the US courts of law.

-- Reef Fish Bob.

Anon.

Apr 29, 2006, 3:05:57 PM

You either reject the null hypothesis ("guilty"), or fail to reject it
("not proven"). This is apt because in practice the null hypothesis is
a straw man, so hypothesis testing can be seen as little more than
asking if you have enough data to reject a hypothesis that you know is
wrong anyway.
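The point that a point null is usually known to be wrong, so that the test mainly measures sample size, can be illustrated in a few lines of Python. This sketch assumes a hypothetical tiny true effect of 0.02 standard deviations and, for simplicity, that the sample mean lands exactly on the true effect, so the z statistic is 0.02 * sqrt(n).

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal statistic: Pr(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

true_effect = 0.02  # hypothetical tiny, but nonzero, true difference (in units of sigma)
sigma = 1.0         # hypothetical known standard deviation

for n in (100, 10_000, 1_000_000):
    # Simplification: pretend the sample mean equals the true effect exactly.
    z = true_effect * math.sqrt(n) / sigma
    print(f"n = {n:>9,}  z = {z:5.2f}  two-sided p = {two_sided_p(z):.3g}")
```

The same negligible effect goes from clearly "not significant" at n = 100 to p of roughly 0.046 at n = 10,000 and to an astronomically small p at n = 1,000,000.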

robert...@gmail.com

Apr 29, 2006, 3:18:08 PM

> > Who "invented" the null hypothesis, where does it come from, which
> > concepts are the basis for it? Hope somebody can help me out.

The term "null hypothesis" seems to have been invented by Fisher,
although Pearson and Neyman floated similar terms for similar concepts
around the same time (1925--1935). The earliest example of a test in
the style of Fisher et al. (that I know of) is "An Argument for Divine
Providence, taken from the constant Regularity observed in the Births
of both Sexes," by John Arbuthnott, Philosophical Transactions of the
Royal Society of London 27 (1710-1712), 186-190.
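Arbuthnott's argument is, in modern terms, a one-sided sign test. The numbers are not given in this post; the Python sketch below assumes the commonly cited figures of 82 consecutive years (1629-1710) of London christening records, each showing more male than female births. Under the null hypothesis that a male or a female majority is equally likely in any year, independently across years, the probability of 82 male-majority years in a row is (1/2)^82.

```python
from fractions import Fraction

# Assumed figures (commonly cited, not stated in this thread): 82 consecutive
# years of London christenings, 1629-1710, each with more boys than girls.
years = 82

# Ho: in any year, a male majority and a female majority are equally likely (1/2),
# independently from year to year. One-sided test: probability that every year
# shows a male majority.
p_exact = Fraction(1, 2) ** years
print(f"p = (1/2)^{years} = {float(p_exact):.3e}")
```

The result is about 2.07e-25, which Arbuthnott took as evidence against "chance" as the explanation.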

Although Fisher debated various details, noisily and at great length,
with P & N, they did agree that probabilities can't be assigned
to hypotheses; hypothesis and/or significance testing (call it what
you will, depending on whose camp you follow) is a work-around
for the lack of probabilities of hypotheses. It is quite unnatural,
hence proponents must go to great lengths to first relieve students
of their unhealthy longing for P(H), and then to implant the sacred
teachings. Luckily, it doesn't always take hold.
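For contrast, here is roughly what a probability of a hypothesis, P(H), looks like in a Bayesian treatment. This Python sketch is an illustration under stated assumptions, not a standard recipe: it compares a point null H0 (success probability exactly 0.5) with an alternative H1 under which the success probability has a Uniform(0, 1) prior, and the data (60 successes in 100 trials) are hypothetical. The answer depends on the prior probability assigned to H0.

```python
from math import comb

def posterior_null(k, n, prior_null=0.5):
    """Posterior probability of H0 after k successes in n independent trials.

    H0: the success probability is exactly 0.5.
    H1: the success probability is unknown, with a Uniform(0, 1) prior.
    prior_null is the prior probability P(H0); P(H1) = 1 - prior_null.
    Fine for moderate n (comb(n, k) must fit in a float).
    """
    like_h0 = comb(n, k) * 0.5 ** n  # P(data | H0)
    like_h1 = 1.0 / (n + 1)          # P(data | H1): binomial likelihood averaged over the uniform prior
    numerator = prior_null * like_h0
    return numerator / (numerator + (1.0 - prior_null) * like_h1)

# Hypothetical data: 60 successes in 100 trials, under two different priors on H0.
for prior in (0.5, 0.9):
    print(f"P(H0) = {prior}: P(H0 | data) = {posterior_null(60, 100, prior):.3f}")
```

With 60 of 100 and a 50:50 prior, the posterior probability of H0 comes out slightly above one half, even though a conventional two-sided test of the same data gives a p-value around 0.05; the two numbers answer different questions.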

For more on the history of statistics, see anything by Stephen Stigler.
A very interesting on-line resource is "Earliest Known Uses of Some of
the Words of Mathematics" (http://members.aol.com/jeff570) by Jeff
Miller.

HTH
Robert Dodier

illywhacker

Apr 29, 2006, 7:43:52 PM

Dear Dieter,

"Dieter Folz" <diete...@gmx.de> wrote in message
news:1146206361....@u72g2000cwu.googlegroups.com...


---------------

I think you had problems because the null hypothesis idea makes no sense as
a general procedure. To quote Jeffreys: "an hypothesis [H0] that may be true
is rejected because it has failed to predict observable results that have
not occurred. This seems a remarkable procedure. On the face of it, the
evidence might more reasonably be taken as evidence for the hypothesis, not
against it."
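Jeffreys' objection is that a p-value adds up the probabilities of outcomes more extreme than the one actually seen. A small Python sketch with hypothetical data (60 successes in 100 trials, Ho: success probability 0.5) shows the difference between the probability of the observed result and the one-sided (upper-tail) p-value, which also counts the unobserved outcomes 61 to 100.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, observed = 100, 60  # hypothetical data; Ho: success probability 0.5

prob_observed = binom_pmf(observed, n)
# Upper-tail p-value: the observed outcome PLUS every more extreme outcome
# (61, 62, ..., 100), none of which was actually observed.
p_upper = sum(binom_pmf(k, n) for k in range(observed, n + 1))

print(f"Pr(X = {observed} | Ho)  = {prob_observed:.4f}")
print(f"Pr(X >= {observed} | Ho) = {p_upper:.4f}")
```

Here the observed outcome contributes only about 0.011 of the roughly 0.028 tail probability; the rest comes from results that did not occur.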

I suggest you forget this bizarre procedure and instead read a book on
Bayesian methods.

illywhacker.

Lou Pecora

Apr 30, 2006, 10:44:26 AM

In article <1146338288....@i39g2000cwa.googlegroups.com>,
robert...@gmail.com wrote:

> > > Who "invented" the null hypothesis, where does it come from, which
> > > concepts are the basis for it? Hope somebody can help me out.
>
> The term "null hypothesis" seems to have been invented by Fisher,
> although Pearson and Neyman floated similar terms for similar concepts
> around the same time (1925--1935). The earliest example of a test in
> the style of Fisher et al. (that I know of) is "An Argument for Divine
> Providence, taken from the constant Regularity observed in the Births
> of both Sexes," by John Arbuthnott, Philosophical Transactions of the
> Royal Society of London 27 (1710-1712), 186-190.
>
> Although Fisher debated various details, noisily and at great length,
> with P & N, they did agree that probabilities can't be assigned
> to hypotheses; hypothesis and/or significance testing (call it what
> you will, depending on whose camp you follow) is a work-around
> for the lack of probabilities of hypotheses. It is quite unnatural,
> hence proponents must go to great lengths to first relieve students
> of their unhealthy longing for P(H), and then to implant the sacred
> teachings. Luckily, it doesn't always take hold.

What is the alternative? This is just the problem of the prior
probability in Bayesian approaches, right?

-- Lou Pecora (my views are my own) REMOVE THIS to email me.

Dieter Folz

Apr 30, 2006, 12:25:25 PM

Hi all!

David Jones wrote:

> "Although the use of probability in testing hypotheses is almost as
> old as the study of probability, the modern terminology was largely
> created by R. A. Fisher and the team of J. Neyman and E. S. Pearson in
> the 1920s and -30s. "
>
> "Null hypothesis appears in 1935 in Fisher's The Design of
> Experiments. [...]

Thanx, that did help a lot! Due to that, I remembered (very)
vaguely that I had had to learn that many many years ago. At that time
I wasn't interested of course ;-).

I was in fact also quite aware of the actual problems with NHST,
Cohen etc. Nevertheless, my interest was only in the theoretical
ideas behind the null hypothesis. But especially judging by Reef Fish's
post, it seems that it just doesn't have a deep philosophical explanation
but is some kind of "smart workaround" based more on common sense than on
ideas about science by Aristotle, Hegel or Popper.

The point about "proving" or "accepting" the H_0 was new to me in
that context, because I never learned it that way. We always said that
we "retain the null hypothesis" and that this is *only* the proposition
of the statistical testing (the *approach* of that statistical decision
making is of course Bayesian). That *statistical decision* has, for
now, nothing to do with how we actually decide about proof or disproof,
and proof or disproof refers *only* to our research hypothesis.

But I was curious about this point and looked it up in my (very)
old German stats textbook -- I had consulted it before on my question
above but didn't find answers there. But on the question of what an
H_0 *means*, it says that the null hypothesis is an "empty" hypothesis.
If it were a hypothesis derived from a theory, it couldn't be a null
hypothesis; it would have to be another alternative hypothesis (which
would have to have its own H_0). It is just a complementary
"statistical statement" (sorry for these inappropriate translations).
And NOW I think I know why my teacher back at school made such a great
fuss about "translating the research hypothesis into a statistical
hypothesis", "then formulating the H_0", and about strictly separating
those two levels (the research hypothesis and the statistical
hypothesis for the statistical decision) from one another.

Thx again to all, and cheers,
Dieter

Dieter Folz

Apr 30, 2006, 12:30:46 PM

illywhacker wrote:

> Dear Dieter,
[...]

> I suggest you forget this bizarre procedure and instead read a book on
> Bayesian methods.

Any recommendations (especially introductory texts, because I only know
the basic theorem from standard stats books)?

Dieter

Reef Fish

Apr 30, 2006, 1:17:29 PM

Dieter Folz wrote:
> Hi all!
>
>
> David Jones wrote:
>
>
> > "Although the use of probability in testing hypotheses is almost as
> > old as the study of probability, the modern terminology was largely
> > created by R. A. Fisher and the team of J. Neyman and E. S. Pearson in
> > the 1920s and -30s. "
> >
> > "Null hypothesis appears in 1935 in Fisher's The Design of
> > Experiments. [...]
>
>
> Thanx, that did help a lot! Due to that, I remembered (very)
> vaguely that I had had to learn that many many years ago. At that time
> I wasn't interested of course ;-).
>
>
> I was in fact also quite aware of the actual problems with NHST,
> Cohen etc. Nevertheless, my interest was only about the theoretical
> ideas behind the null hypotheses. But esp. in reference to Reef Fish it
> seems that it just doesn't have a deep philosophical explanation but is
> some kind of "smart workaround" based more on common sense, than on
> ideas about science by Aristotle, Hegel or Popper.

Well, it wasn't even a "smart workaround". It has many known DEFECTS,
so the idea of "science and deep philosophy" had been disregarded
LONG, long ago -- before you started many, many years ago.

One of the known defects is that of testing a "sharp" hypothesis. If
you want to do a hypothesis test to find out if there is any difference
between the means of two (normal) populations, say, you would test

Ho: mu1 - mu2 = 0 vs H1: mu1 - mu2 .NE. 0.

using one of the well-known t-tests, say. That's the kind of case (among
others) Anon Bob O'Hara must have in mind when he says that Ho is known
to be false before you even collect any data. :-)
Give me ANY two populations, and you can ALWAYS reject Ho with a
sufficiently large sample.
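Reef Fish's claim that a sharp null can always be rejected with enough data can be illustrated with a small Python sketch (an illustration added here, not something from the thread). It assumes two normal populations with a known common standard deviation and a practically negligible true difference of 0.01 standard deviations, and it evaluates the z-test at the expected value of the statistic instead of simulating data, so the output is deterministic. The function name and all numbers are hypothetical.

```python
import math


def expected_two_sided_p(delta: float, sigma: float, n: int) -> float:
    """Two-sided p-value of a z-test of Ho: mu1 - mu2 = 0, evaluated at the
    outcome where the observed difference of sample means equals the true
    difference delta.

    Assumes two independent samples of size n each and a known common sigma.
    """
    se = sigma * math.sqrt(2.0 / n)        # standard error of x1bar - x2bar
    z = abs(delta) / se                    # z statistic at the "typical" outcome
    return math.erfc(z / math.sqrt(2.0))   # equals 2 * (1 - Phi(z))


# Hypothetical, practically negligible true difference: 0.01 standard deviations.
for n in (100, 10_000, 1_000_000):
    print(n, expected_two_sided_p(delta=0.01, sigma=1.0, n=n))
```

Under these assumptions the p-value is roughly 0.94 at n = 100 per group, roughly 0.48 at n = 10,000, and far below 0.05 at n = 1,000,000, even though the difference is too small to matter in most applications.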

Some of the more subtle DEFECTS of the Neyman-Pearson
(sometimes called Classical) approach are discussed in the
Edwards, Lindman, and Savage paper, "Bayesian Statistical
Inference for Psychological Research" in Psychological
Review (1963).

Jimmie (L.J.) Savage wrote: "No aspect of classical statistics
has been so popular with psychologists and other scientists
as hypothesis testing, though some classical statisticians
agree with us that the topic has been overemphasized. A
statistician of great experience told us, 'I don't know much
about tests, because I have never had occasion to use one.'"

Indeed, in the Bayesian approach (L.J. Savage has often been
called the "founder" of that branch of Statistics in the USA),
all of the KNOWN defects of classical statistics, both in
hypothesis testing AND in the use of Confidence Intervals,
disappear. There is never a NEED for any hypothesis test of
the "sharp null hypothesis" type. And the Bayesian confidence
interval (called a "credible interval" to distinguish it from
the classical C.I.) can NEVER produce the silly situation in the
example Hogg and Craig give in their textbook, where a 95%
confidence interval for a parameter turns out, once a sample is
drawn, to contain the unknown parameter 100% of the time,
because of the notion of a realization of a random interval.
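The thread does not reproduce Hogg and Craig's example, so the sketch below shows only the general point about realized intervals, under assumed values (normal data, known sigma = 1, n = 25, mu = 10, and an arbitrary seed). Over many repetitions the random interval covers mu about 95% of the time; any single realized interval either contains mu or does not.

```python
import random
import statistics

Z_975 = 1.959964  # standard normal 97.5th percentile, for a 95% interval


def z_interval(sample, sigma):
    """95% confidence interval for a normal mean when sigma is known."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    half_width = Z_975 * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width


def coverage(mu=10.0, sigma=1.0, n=25, reps=20_000, seed=1):
    """Long-run fraction of realized intervals that contain mu (assumed values)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma)
        # Once drawn, this particular interval either contains mu or it does not.
        if lo <= mu <= hi:
            hits += 1
    return hits / reps


print(coverage())  # a long-run proportion near 0.95
```

The 95% describes the procedure over repeated samples, not any one interval after the data are in.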

Thus, if anyone is viewing Statistics from a scientific and
philosophical angle, the Classical approach of Neyman-Pearson
couldn't be a WORSE place to look.

On the other hand, the Edwards, Lindman, and Savage
paper is quite elementary in mathematical statistics,
but it contains some VERY DEEP ideas about Statistics
from the science and philosophy angle, in its 50 or so
pages written for psychologists.

-- Bob.

Anon.

Apr 30, 2006, 1:39:31 PM

I guess the book that's most used is "Bayesian Data Analysis" by Gelman
et al. It has a homepage here: <http://www.stat.columbia.edu/~gelman/book/>

There are others as well, for example David Spiegelhalter has co-written
a book for the medical sciences:
(http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0471499757,subjectCd-ST11.html)

Anon.

Apr 30, 2006, 1:56:19 PM

It's a Bayesian problem, yes. But it's not the problem of probability
of the prior.

The problem is that you would like to calculate P(H|D) (H=Hypothesis,
D=Data). Fisher had formalised the idea of using the likelihood
(P(D|H)) in inference, but that's not what is needed, hence all these
strange hoops frequentists leap through.

Bayesians can calculate P(H|D) from Bayes' rule:

P(H|D) = P(H).P(D|H)/P(D)

where
P(H): Prior
P(D|H): likelihood
P(D): Prior predictive distribution

P(D) is just a normalising constant, so in practice it is ignored (all
the estimation is done numerically, and there are techniques that avoid
the need to calculate P(D)). The problem is P(H): the prior. This is
the probability of the hypothesis before you see the data. It is
inevitably subjective, which causes some people all sorts of problems.
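A minimal numerical sketch of the Bayes' rule formula above, under assumptions not from the thread: two simple hypotheses about a coin's probability of heads (0.5 versus 0.7), equal prior probabilities, and hypothetical data of 14 heads in 20 tosses. It also shows why P(D) acts only as a normalising constant: rescaling every likelihood by the same factor leaves the posterior unchanged.

```python
from math import comb


def posterior(priors, likelihoods):
    """P(H|D) = P(H) * P(D|H) / P(D), where P(D) = sum over H of P(H) * P(D|H)."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    p_data = sum(joint.values())          # P(D), the normalising constant
    return {h: j / p_data for h, j in joint.items()}


heads, tosses = 14, 20                    # hypothetical data
hypotheses = {"fair (p=0.5)": 0.5, "biased (p=0.7)": 0.7}
priors = {h: 0.5 for h in hypotheses}     # P(H): a subjective choice
likelihoods = {                           # P(D|H): binomial likelihood
    h: comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)
    for h, p in hypotheses.items()
}
print(posterior(priors, likelihoods))
```

Changing `priors` changes the answer, which is exactly the subjectivity described above.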

Reef Fish

Apr 30, 2006, 2:09:26 PM

Anon. wrote:
> Dieter Folz wrote:
> > illywhacker wrote:
> >
> >> Dear Dieter,
> > [...]
> >> I suggest you forget this bizarre procedure and instead read a book on
> >> Bayesian methods.
> >
> >
> > Any recommendations (especially introductory texts, because I only know
> > the basic theorem from standard stats books)?

I have used some introductory textbooks on Bayesian statistics, as well
as advanced books on Bayesian statistics.

These books tend to be either low-level cookbooks on HOW to do the
elementary problems in binomial and normal situations or high level
books in "matrix slinging" such as those by Arnold Zellner, who would
apply one or two summary statistics to some mathematical gobbledygook
and call that his PRIOR distribution (or personal belief), and
then start his matrix slinging in conjugate priors, non-informative
priors, and every kind of prior except what a conscientious Bayesian
would use.

None of those books is appropriate for anyone who wants to learn
the Bayesian APPROACH (without getting bogged down by its
nasty details, as in the books by Raiffa and Schlaifer, which have been
around for a long time).

The article by Edwards, Lindman, and Savage is head and shoulders
above ANY book or expository article on Bayesian statistics; I say that
having learned the subject from Savage himself, and having taught the
subject from Bayesian textbooks.

> >
> I guess the book that's most used is "Bayesian Data Analysis" by Gelman
> et al. It has a homepage here: <http://www.stat.columbia.edu/~gelman/book/>

Gelman was still a grad student at Harvard when I was Visiting Full
Professor there in 1990. I have never seen this text and don't doubt
that it would be a useful textbook for a Bayesian course, but I
seriously doubt that it has nearly as much coverage, or nearly as DEEP
a coverage, of the philosophy of the Bayesian approach as the Edwards
et al. paper (which was unmistakably written by Savage, judging by
Savage's writing style). Savage himself used that paper as a reference
"textbook" in his graduate course on Bayesian Statistics at the Yale
Statistics Department.

-- Bob.

Anon.

Apr 30, 2006, 2:36:32 PM

Reef Fish wrote:
> Anon. wrote:
>> Dieter Folz wrote:
>>> illywhacker wrote:
>>>
>>>> Dear Dieter,
>>> [...]
>>>> I suggest you forget this bizarre procedure and instead read a book on
>>>> Bayesian methods.
>>>
>>> Any recommendations (especially introductory texts, because I only know
>>> the basic theorem from standard stats books)?
>
> I have used some introductory textbooks on Bayesian statistics, as well
> as advanced books on Bayesian statistics.
>
> These books tend to be either low-level cookbooks on HOW to do the
> elementary problems in binomial and normal situations or high level
> books in "matrix slinging" such as those by Arnold Zellner, who would
> apply one or two summary statistics to some mathematical gobbledygook
> and call that his PRIOR distribution (or personal belief), and
> then start his matrix slinging in conjugate priors, non-informative
> priors, and every kind of prior except what a conscientious Bayesian
> would use.
>
> None of those books is appropriate for anyone who wants to learn
> the Bayesian APPROACH (without getting bogged down by its
> nasty details, as in the books by Raiffa and Schlaifer, which have been
> around for a long time).
>

There has been a large change in the way Bayesian stats is approached
over the last 10 or so years. Thanks to MCMC we can now fit models that
frequentists have trouble with. Most modern Bayesian textbooks avoid
both of these problems, and show how complex models can be developed and
fitted to data.
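Anon's point about MCMC, together with the earlier remark that P(D) never has to be computed, can be sketched with a random-walk Metropolis sampler, which needs only the unnormalised posterior P(H)·P(D|H). The model is an assumption chosen so the answer can be checked: a binomial likelihood with a Beta(2, 2) prior and hypothetical data of 14 successes and 6 failures, for which the exact posterior is Beta(16, 8). The step size, iteration count, burn-in, and seed are arbitrary.

```python
import math
import random


def log_unnormalised_posterior(theta, heads, tails, a=2.0, b=2.0):
    """log[P(theta) * P(D|theta)] up to an additive constant; P(D) is never computed.

    Assumes a Beta(a, b) prior and a binomial likelihood.
    """
    if not 0.0 < theta < 1.0:
        return -math.inf
    return (a - 1 + heads) * math.log(theta) + (b - 1 + tails) * math.log(1 - theta)


def metropolis(heads, tails, iters=50_000, step=0.1, burn_in=5_000, seed=42):
    rng = random.Random(seed)
    theta = 0.5
    current = log_unnormalised_posterior(theta, heads, tails)
    draws = []
    for i in range(iters):
        proposal = theta + rng.gauss(0.0, step)   # symmetric random-walk proposal
        candidate = log_unnormalised_posterior(proposal, heads, tails)
        # Accept with probability min(1, ratio); the unknown P(D) cancels in the ratio.
        # 1.0 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        if i >= burn_in:
            draws.append(theta)
    return draws


draws = metropolis(heads=14, tails=6)
print(sum(draws) / len(draws))  # should be close to the exact Beta(16, 8) mean, 16/24
```

The conjugate prior is used here only so the sampler can be checked against a closed form; nothing in the sampler depends on conjugacy.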

Reef Fish

Apr 30, 2006, 2:51:21 PM

Some new approaches have been introduced, but all pseudo-
Bayesian.

No progress has been made, AFAIK, in the area of eliciting one's
OWN prior distribution realistically, even in the BIVARIATE cases,
let alone any "complicated" multivariate cases.

> Thanks to MCMC we can now fit models that
> frequentists have trouble with.

MCMC was already around when I was teaching graduate courses
in Monte Carlo methods. Being able to fit certain models to data
does not remove the basic FLAW of NOT being able to express
your OWN (Bayesian) prior (opinion) in the form of a prior
distribution that is informative, non-conjugate, and non-diffuse.

> Most modern Bayesian textbooks avoid
> both of these problems, and show how complex models can be developed and
> fitted to data.

Only in a very superficial Bayesian-like way. I lumped that into
the class of "pseudo" or "quasi" Bayesian methods, worse in
principle and in practice, than non-Bayesian methods.

> --
> Bob O'Hara

-- Reef Fish Bob.

Lou Pecora

Apr 30, 2006, 5:40:47 PM

In article <e32toe$cht$1...@phys-news4.kolumbus.fi>,
"Anon." <bob....@NOSPAMhelsinki.fi> wrote:

Well, it sounds like my novice understanding of the Bayesian approach is
at least in line with yours. I understand and agree with you (from my
outside viewpoint). I thought Robert's point was that all the stuff
Fisher was trying to do with hypothesis rejection (concentrating on
P(D|H) instead of P(H|D) ) was precisely because there was no way to get
at P(H). Is my take right?

Herman Rubin

Apr 30, 2006, 9:46:40 PM

In article <1146206361....@u72g2000cwu.googlegroups.com>,

Dieter Folz <diete...@gmx.de> wrote:
>Hi everyone!

> Talking about stats, some time during the explanations I was
>practically asked by a friend what the origins and the fundamental
>logic of the null hypothesis was.

It almost certainly originated in the 17th or early 18th
century. This was the time that physics was rapidly
developing, and from the success of the simple theories,
it was felt that those simple theories, or not too
complicated modifications of them, were the TRUTH.

With this point of view, is it at all surprising that the
idea of the null hypothesis came about? Occam's Razor
only adds to the argument.
--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hru...@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558

Anon.

May 1, 2006, 12:01:58 AM

Yes. You want to invert the conditioning (i.e. go from P(D|H) to
P(H|D)), and mathematically the only way to do that is through Bayes'
rule. Fisher didn't like the consequence that you would have to
introduce subjectivity, so he tried to finesse the problem.
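Anon's point that P(D|H) and P(H|D) are different quantities can be made concrete with the phenomenon usually called Lindley's paradox. The sketch below is an illustration under assumptions not taken from the thread: a point null Ho: mu = 0 with prior probability 1/2, an alternative under which mu ~ N(0, tau^2) with tau = 1, known sigma = 1, and data whose z statistic is held at 1.96, so the two-sided p-value stays near 0.05 while n grows.

```python
import math


def normal_pdf(x, var):
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)


def posterior_prob_null(z, n, sigma=1.0, tau=1.0, prior_null=0.5):
    """P(Ho | xbar) for Ho: mu = 0 vs H1: mu ~ N(0, tau^2), with known sigma."""
    xbar = z * sigma / math.sqrt(n)                      # sample mean giving this z
    like_null = normal_pdf(xbar, sigma**2 / n)           # P(D | Ho)
    like_alt = normal_pdf(xbar, sigma**2 / n + tau**2)   # P(D | H1), mu integrated out
    num = prior_null * like_null
    return num / (num + (1 - prior_null) * like_alt)


p_value = math.erfc(1.96 / math.sqrt(2))  # two-sided p-value for z = 1.96
print("p-value:", round(p_value, 4))
# The same "just significant" result at increasing sample sizes.
for n in (10, 100, 10_000, 1_000_000):
    print(n, round(posterior_prob_null(1.96, n), 3))
```

Under these assumptions P(Ho | data) rises from about 0.37 at n = 10 to above 0.99 at n = 1,000,000, although every one of these results is "significant at 5%".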

Anon.

May 1, 2006, 12:38:42 AM

I'm not sure people bother with multivariate priors, because they're
difficult to elicit for psychological reasons.

There has been work on the elicitation of priors, for example by Tony
O'Hagan, and Mary Kynn has a nice package, Elicitor:
http://www.maths.qut.edu.au/~whateley

>
>> Thanks to MCMC we can now fit models that
>> frequentists have trouble with.
>
> MCMC was already around when I was teaching graduate courses
> in Monte Carlo methods. Being able to fit certain models to data
> does not remove the basic FLAW of NOT being able to express
> your OWN (Bayesian) prior (opinion) in the form of a prior
> distribution that is informative, non-conjugate, and non-diffuse.
>

But, and read this carefully, that flaw does not exist. We can and do
fit informative, non-conjugate priors.
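For a one-parameter model, Anon's claim can be illustrated even without MCMC. The sketch below assumes a binomial likelihood (hypothetical data: 14 successes in 20 trials) and an informative normal prior on the log-odds, logit(theta) ~ N(0, 0.5^2), which is not conjugate to the binomial; the posterior is approximated on a grid. All numbers are illustrative.

```python
import math


def grid_posterior(successes, trials, prior_mean=0.0, prior_sd=0.5, points=2000):
    """Grid approximation to the posterior of theta under a normal prior on logit(theta)."""
    thetas = [(i + 0.5) / points for i in range(points)]
    log_weights = []
    for t in thetas:
        eta = math.log(t / (1 - t))  # log-odds
        # Density of theta implied by eta ~ N(prior_mean, prior_sd^2),
        # including the Jacobian 1 / (t * (1 - t)).
        log_prior = -0.5 * ((eta - prior_mean) / prior_sd) ** 2 - math.log(t * (1 - t))
        log_like = successes * math.log(t) + (trials - successes) * math.log(1 - t)
        log_weights.append(log_prior + log_like)
    top = max(log_weights)
    weights = [math.exp(w - top) for w in log_weights]  # stabilise before normalising
    total = sum(weights)                                # plays the role of P(D)
    return thetas, [w / total for w in weights]


thetas, post = grid_posterior(14, 20)
post_mean = sum(t * p for t, p in zip(thetas, post))
print(round(post_mean, 3))  # pulled from the sample proportion 0.7 toward the prior's 0.5
```

Grids only scale to a few parameters, which is where MCMC takes over; the prior itself can be any density one can evaluate.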

>> Most modern Bayesian textbooks avoid
>> both of these problems, and show how complex models can be developed and
>> fitted to data.
>
> Only in a very superficial Bayesian-like way. I lumped that into
> the class of "pseudo" or "quasi" Bayesian methods, worse in
> principle and in practice, than non-Bayesian methods.
>

What makes them superficial?

Dhandapani

May 2, 2006, 1:03:24 AM

> > So, if type I error is more important is the reason given.
>
> Reef Fish Bob never said that! Either you or the other Bob said it.
> That was why I asked for you to be more specific.

I never meant that you or the other Bob had said it. What I wanted to
convey was that the same example is quoted to argue that type I error is
more serious than type II error. So my question remains the same: is it
correct to say type I error is more serious than type II error?

Dhandapani

Art Kendall

May 2, 2006, 8:42:37 AM

By convention only, Type I error is more serious than type II error.

In legal systems based on English common law, a false conviction is
considered worse than a false failure to convict.

By convention, it is worse to mistakenly go with the new (alternative,
exploratory) explanation, theory, treatment, or practice, than to
mistakenly stay with the old (standard, customary, default) explanation,
theory, treatment, or practice.

There is no _math_ reason to use the .20 and .05 convention. However,
even many very experienced mathematicians cannot easily compare studies
in which more than one of the alpha, beta, effect-size triad is varied.
Therefore, I usually recommend leaving alpha and beta at conventional
levels and planning or presenting in terms of effect sizes. This is
analogous to the convention in chemistry that studies are done at STP,
i.e., standard temperature and pressure.
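As a rough illustration of the alpha, beta, effect-size triad, here is a sketch using the standard normal-approximation formula for a two-sided, two-sample comparison of means with equal group sizes: n per group ≈ 2((z_{1-alpha/2} + z_{1-beta}) / d)^2, where d is the standardized effect size. Holding alpha = .05 and beta = .20 at their conventional values leaves effect size as the only thing that changes. The function name is hypothetical, and the formula ignores the small-sample t correction, so exact t-based calculations give slightly larger n.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, beta=0.20):
    """Approximate n per group for a two-sided two-sample z-test (normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_beta = z(1 - beta)        # about 0.84 for beta = .20 (power .80)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

With alpha = .05 and power = .80, this gives about 393, 63, and 25 per group for d = 0.2, 0.5, and 0.8.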

Conventions often arise to simplify communication. In reporting the
results of surveys, for example, it is easier for readers to
compare and interpret results when the confidence level is kept constant at
.95 and the width of the confidence interval is allowed to vary.

Art
A...@DrKendall.org
Social Research Consultants

Dhandapani wrote:
>>So, if type I error is more important is the reason given.

> <snip>
>
>
> Dhandapani
>

Jerry Dallal

May 2, 2006, 8:53:03 AM

Anon. wrote:

> You either reject the null hypothesis ("guilty"), or fail to reject it
> ("not proven"). This is apt because in practice the null hypothesis is
> a straw man, so hypothesis testing can be seen as little more than
> asking if you have enough data to reject a hypothesis that you know is
> wrong anyway.

"Hypothesis testing can be seen as asking if you have enough data to
reject a hypothesis that you know is wrong" is one of those statistical
jokes like Jeffreys', "What the use of P implies, therefore, is that a
hypothesis that may be true may be rejected because it has not predicted
observable results that have not occurred."

They are true at one level, but they miss the point entirely. To put a
more positive spin on it, they are like koans. Anyone who sees only
what's on the surface will be puzzled, but anyone who gets to the heart
of them will become enlightened.

So, think about it. It is *true* that hypothesis testing can be seen as
asking if you have enough data to reject a hypothesis that you know is
wrong. Yet, it is done routinely in situations where the consequences
can be enormous. Why? If it is *nothing more* than asking if you have
enough data to reject a hypothesis that you know is wrong, then why do
it at all? Even more important, why does science *continue* to do it?
The answer is obvious if one merely thinks about how significance
testing is used in practice, but anyone who knows only the mathematics
will probably never see it.

If one had to put it in words--solve a koan?--the best answer I've seen
can be found in the writings of John Tukey, in something he actually
published! (not a Bell Labs tech report or a set of lecture notes).

Jerry Dallal

May 2, 2006, 8:54:59 AM

Dhandapani wrote:

> is it correct
> to say type I error is more serious than type II error?

Forget statistics. You are asking whether it is a more serious error to
claim to see something when nothing is there than to miss something that
really is there. Context is everything.

Reef Fish

May 2, 2006, 12:26:25 PM

Art Kendall wrote:
> By convention only, Type I error is more serious than type II error.
>
> In legal systems based on English common law, a false conviction is
> considered worse than a false failure to convict.

Only in non-civil courts. In civil court, the "preponderance of the
evidence" standard puts the seriousness of Type I and Type II errors at 50-50.

In Criminal Court, where "beyond a reasonable doubt" is required
for conviction, Type I is considered more serious to the tune of
somewhere from 10-1 to 19-1.
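These ratios can be turned into a decision threshold with a little decision theory. As a sketch under stated assumptions (a two-action choice, convict or acquit, a fixed loss for each kind of error, and no loss for correct decisions), minimizing expected loss means convicting only when P(guilty | evidence) exceeds loss_I / (loss_I + loss_II). A 1-to-1 ratio gives 0.5, matching a preponderance standard, while 10-to-1 and 19-to-1 give about 0.909 and 0.95. The function name is hypothetical.

```python
def conviction_threshold(loss_type_1: float, loss_type_2: float) -> float:
    """Minimum P(guilty | evidence) at which convicting has lower expected loss.

    Type I error = convicting an innocent person (loss_type_1);
    Type II error = acquitting a guilty person (loss_type_2).
    Convict when (1 - p) * loss_type_1 < p * loss_type_2.
    """
    return loss_type_1 / (loss_type_1 + loss_type_2)


for ratio in (1, 10, 19):
    print(f"{ratio}-to-1: convict if P(guilty) > {conviction_threshold(ratio, 1):.3f}")
```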

When I finally realized that Dhandapani addressed his question
to me, your response above gave me the perfect platform to
answer Dhandapani, relative to LAW.

In hypothesis testing in statistics, there is no equivalent to the
Civil Court of Law. In that sense, the probability of Type I error
is always set to be small, because that sets the strength of the
evidence of "statistical proof", in the testing setup.

-- Bob.

Reef Fish

May 2, 2006, 1:01:33 PM

Jerry Dallal wrote:
> Anon. wrote:
>
> > You either reject the null hypothesis ("guilty"), or fail to reject it
> > ("not proven"). This is apt because in practice the null hypothesis is
> > a straw man, so hypothesis testing can be seen as little more than
> > asking if you have enough data to reject a hypothesis that you know is
> > wrong anyway.
>
> "Hypothesis testing can be seen as asking if you have enough data to
> reject a hypothesis that you know is wrong" is one of those statistical
> jokes like Jeffreys', "What the use of P implies, therefore, is that a
> hypothesis that may be true may be rejected because it has not predicted
> observable results that have not occurred."
>
> They are true at one level, but they miss the point entirely. To put a
> more positive spin on it, they are like koans.

Your first paragraph is a good one!

I just knew that if I stuck around this group long enough, I would
learn something NEW. :-)

In this case it wasn't statistics but "koans". When I saw the word the
first time, I dismissed it as some analogy I was not familiar with, but
when I saw it again, I had to go to Google, and among other pages, I
found this billing:

*> Pointing at the Moon - One Hundred Zen Koans from Chinese Masters

Which immediately reminded me of a saying (source unknown)
which I have used on some discussants (I haven't done that in the
sci.stat.math groups yet), though in retrospect, I can see its
applicability to a few:

Confucius say, "Man point finger at Moon. Idiot see Finger."

So I had to click through to the web page

http://www.ciolek.com/WWWVLPages/ZenPages/KoanStudy.html

to realize that I am as hopeless as one to practice Zen as some in
the sci.stat groups are hopeless in their practice of statistics.

Will that statement qualify as a koan? :-) If not, at least it rhymes
with the Cone Hat the dunce wears.

>
> So, think about it. It is *true* that hypothesis testing can be seen as
> asking if you have enough data to reject a hypothesis that you know is
> wrong. Yet, it is done routinely in situations where the consequences
> can be enormous. Why? If it is *nothing more* than asking if you have
> enough data to reject a hypothesis that you know is wrong, then why do
> it at all? Even more important, why does science *continue* to do it?
> The answer is obvious if one merely thinks about how significance
> testing is used in practice, but anyone who knows only the mathematics
> will probably never see it.

Actually, as DZ and I have both pointed out with different examples,
it is NOT true that the hypotheses Ho being tested are always known to
be true.

They are ASSUMED to be true, until proven by the evidence otherwise.

But if the truth is such that Ho can never be rejected by data
(p < 1/(1 followed by a googolplex of zeros) will do for "never")
then no matter how long you sample or how large a dataset
you have, you CANNOT reject the null hypothesis.

In the example I gave, the true state of nature is mu1 << mu2,
in the sense that the chance of obtaining sample means with
x1bar > x2bar is less than 1/googolplex.

If you then set up H1: mu1 > mu2 against Ho: mu1 = mu2, as when
a salesman of Zen (mu1) is trying to prove it's better than some
other mystic's (mu2), that Zen salesman can never succeed in
rejecting mu1 = mu2 against the false one-tailed alternative.
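This counterexample can be sketched the same way as a one-sided z-test evaluated at the expected statistic, under assumptions of known sigma = 1, equal sample sizes, and a true difference mu1 - mu2 = -0.5 (so mu1 < mu2), testing H1: mu1 > mu2. As n grows the p-value moves toward 1, not 0, so the wrong-direction alternative is never supported. The values and the function name are illustrative.

```python
from statistics import NormalDist


def one_sided_p_upper(delta: float, sigma: float, n: int) -> float:
    """Upper-tail p-value for H1: mu1 > mu2 at the expected z statistic.

    delta is the true mu1 - mu2; two samples of size n, known common sigma.
    """
    z = delta / (sigma * (2.0 / n) ** 0.5)
    return 1.0 - NormalDist().cdf(z)


for n in (10, 100, 1_000):
    print(n, one_sided_p_upper(delta=-0.5, sigma=1.0, n=n))
```

Under these assumptions the one-sided p-value is about 0.87 at n = 10 and essentially 1 by n = 1,000.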

>
> If one had to put it in words--solve a koan?--the best answer I've seen
> can be found it the writings of John Tukey, in something he actually
> published! (not a Bell Labs tech report or a set of lecture notes).

Tukey has written and published many things that might fit that bill.
His comments about hypothesis testing in his AMS paper "On the
Future of Statistics" (1961) certainly qualifies, when he drew the
parallel that the use of significance tests or Hypothesis testing is
like how a drunk uses a lamp post -- for support rather than for
enlightenment.

Which formed a monumental base for his brand of Statistics,
to be identified with "Data Analysis".

-- Bob.

Jerry Dallal

May 2, 2006, 1:51:04 PM

This is apt, although I would substitute 'The unenlightened' for 'Idiot'.
The two statements (the one about false null hypotheses and the other
about P values) are fingers pointing at the moon.

> so, I had to click the web
>
> http://www.ciolek.com/WWWVLPages/ZenPages/KoanStudy.html
>
> to realize that I am as hopeless as one to practice Zen as some in
> the sci.stat groups are hopeless in their practice of statistics.
>
> Will that statement qualify as a koan? :-) If not, at least it
> rhymes
> with the Cone Hat the dunce wears.
>
>
>> So, think about it. It is *true* that hypothesis testing can be seen as
>> asking if you have enough data to reject a hypothesis that you know is
>> wrong. Yet, it is done routinely in situations where the consequences
>> can be enormous. Why? If it is *nothing more* than asking if you have
>> enough data to reject a hypothesis that you know is wrong, then why do
>> it at all? Even more important, why does science *continue* to do it?
>> The answer is obvious if one merely thinks about how significance
>> testing is used in practice, but anyone who knows only the mathematics
>> will probably never see it.
>
> Actually, as DZ and I have both pointed out, with different examples,
> that it is NOT true that hypotheses Ho tested are always known to
> be true.

Either you've misread me or mistyped your response. What I claimed as
true (assuming *some* hyperbole is allowed) was that "hypothesis testing
can be seen as asking if you have enough data to reject a hypothesis
that you know is WRONG [emphasis added]".

> They are ASSUMED to be true, until proven by the evidence otherwise.
>
> But if the truth is such that Ho can never be rejected by data
> (p < 1/(1 followed by a googolplex of zeros) will do for "never")
> then no matter how long you sample or how large a dataset
> you have, you CANNOT reject the null hypothesis.
>
> The example I gave was the true state of nature is mu1 << mu2
>
> in the sense that the chance of obtaining any sample means
> such that P( x1bar > x2bar ) < 1/(1 googolplex),
>
> then setting up the H1: mu1 > mu2 against Ho mu1 = mu2,
>
> such as a salesman of Zen (mu1) is trying to prove it's better
> than some other mystic's (mu2), that Zen salesman can never
> succeed to reject mu1 = mu2 , against the false one-tailed
> alternative.

There are all sorts of artificial examples that one can construct. My
claim is that

* despite a great deal of truth to the statement that most null
hypotheses are false, science continues to test them
* it makes perfect sense in the context of real data, *if* one
understands what is really going on.

>> If one had to put it in words--solve a koan?--the best answer I've seen
>> can be found it the writings of John Tukey, in something he actually
>> published! (not a Bell Labs tech report or a set of lecture notes).
>
> Tukey has written and published many things that might fit that bill.
> His comments about hypothesis testing in his AMS paper "On the
> Future of Statistics" (1961) certainly qualifies, when he drew the
> parallel that a use of significance tests or Hypothesis testing is
> like how a drunk uses a lamp post -- for support rather than for
> enlightenment.
>
> Which formed a monumental base for his brand of Statistics,
> to be identified with "Data Analysis".
>
> -- Bob.
>

The passage from Tukey that I'm thinking of explains why significance
tests make sense even though most null hypotheses are false.

Reef Fish

May 2, 2006, 3:20:13 PM

No argument there. I am never good at conventional euphemisms
that are more politically correct and less offensive sounding to the
ear.

In the cases of posters I have in mind in this group, I think 'idiot'
is a much more descriptive and unambiguous word than 'unenlightened',
because many supremely intelligent folks can be unenlightened about
many things, but they learn and become enlightened quickly.
Only the 'idiots' would argue and argue, and argue some more
about the Finger, not necessarily the middle one, but the one
pointing to the Moon. :-))

Thank you for pointing out that "typo", which was more of a verbal slip,
saying the opposite of what was meant (unfortunately I do that
non-negligibly often). What I indeed meant was WRONG, as that
was Anon Bob's point, and DZ and I produced counterexamples to
Anon Bob's assertion, as shown in the CONTEXT of my example
below.

In the hypothesis-testing setting one CANNOT always disprove a WRONG
hypothesis that is assumed to be true in Ho, no matter how
much data is available.

> > They are ASSUMED to be true, until proven by the evidence otherwise.
> >
> > But if the truth is such that Ho can never be rejected by data
> > (p < 1/(1 followed by a googolplex of zeros) will do for "never")
> > then no matter how long you sample or how large a dataset
> > you have, you CANNOT reject the null hypothesis.
> >
> > The example I gave was the true state of nature is mu1 << mu2
> >
> > in the sense that the chance of obtaining any sample means
> > such that P( x1bar > x2bar ) < 1/(1 googolplex),
> >
> > then setting up the H1: mu1 > mu2 against Ho mu1 = mu2,
> >
> > such as a salesman of Zen (mu1) is trying to prove it's better
> > than some other mystic's (mu2), that Zen salesman can never
> > succeed to reject mu1 = mu2 , against the false one-tailed
> > alternative.
>
> There are all sorts of artificial examples that one can construct.

DZ and I constructed two. I "pointed" to the fact that there are an
unlimited number of counterexamples like those.

> My claim is that
>
> * despite a great deal of truth to the statement that most null
> hypotheses are false, science continues to test them
> * it makes perfect sense in the context of real data, *if* one
> understands what is really going on.

That is both "true" AND "false", in the enlightened sense.
It is TRUE, I had argued, in the sense of testing a "sharp" null
hypothesis that is known a priori to be not "literally true" to
the infinitesimals. Those sharp hypotheses can always be
rejected if sufficient data is obtained.

It is FALSE in the sense of my counterexample, and DZ's.

>
>
> >> If one had to put it in words--solve a koan?--the best answer I've seen
> >> can be found it the writings of John Tukey, in something he actually
> >> published! (not a Bell Labs tech report or a set of lecture notes).
> >
> > Tukey has written and published many things that might fit that bill.
> > His comments about hypothesis testing in his AMS paper "On the
> > Future of Statistics" (1961) certainly qualifies, when he drew the
> > parallel that a use of significance tests or Hypothesis testing is
> > like how a drunk uses a lamp post -- for support rather than for
> > enlightenment.
> >
> > Which formed a monumental base for his brand of Statistics,
> > to be identified with "Data Analysis".
> >
> > -- Bob.
> >
>
> The passage from Tukey that I'm thinking of explains why significance
> tests make sense even though most null hypotheses are false.

That would be a different passage, then. There Tukey would be
explaining an enlightened use of hypothesis testing, for enlightenment,
as OPPOSED to the drunk's use of a lamp post.

It works BOTH ways, without any contradiction of each other.

-- Bob.

Richard Ulrich

May 2, 2006, 5:42:38 PM

Right. Here are examples where Type II matters more.

For a guard on sentry duty, it is important to
react -- over-react -- to every hint of 'something'
because lives are at stake if any infiltrator is *missed*.

Some people who are "anti-environmentalist" wrongly imagine
that they are being scientific, when they want an argument
that shows that "the hazard" (climate change; killing all the
whales) is established with a 5% test (which says there is
a change). The situation is parallel to the sentry's.

The logic seems easy for sentries, but the generalization
seems tough to establish, for some people.
It is hard to convince them to use a multi-test correction
that works in the opposite direction of the usual Bonferroni
correction. We don't divide the 5% by k, to protect from hazards.

If there are a dozen respectable hazards at the *50%* test size,
it is a bad decision policy that says we have to ignore them.

--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html

Robert Dodier

May 3, 2006, 12:41:37 AM

Jerry Dallal wrote:

> "Hypothesis testing can be seen as asking if you have enough data to
> reject a hypothesis that you know is wrong" is one of those statistical
> jokes like Jeffreys', "What the use of P implies, therefore, is that a
> hypothesis that may be true may be rejected because it has not predicted
> observable results that have not occurred."

I guess calling it a joke obviates the need to address it.

> They are true at one level, but they miss the point entirely. To put a
> more positive spin on it, they are like koans. Anyone who sees only
> what's on the surface will be puzzled, but anyone who gets to the heart
> of them will become enlightened.

Well, that's convenient, isn't it. You needn't bother to explain
what's up with hypothesis testing; it's well known that mere words
cannot enlighten.

> So, think about it. It is *true* that hypothesis testing can be seen as
> asking if you have enough data to reject a hypothesis that you know is
> wrong. Yet, it is done routinely in situations where the consequences
> can be enormous.

Yeah. I wish they'd cut that out.

> Why? If it is *nothing more* than asking if you have enough data
> to reject a hypothesis that you know is wrong, then why do it at all?

"The boss said so" is a perfectly valid reason. "I don't know what
else to do" is a little weaker.

> Even more important, why does science *continue* to do it?

In problems with weak prior information and a lot of noise,
the result from a conventional hypothesis test is probably not too
different from what you'd get from a decision-theoretic approach.

With strong prior information or low noise, I doubt if people
bother with significance tests.

> The answer is obvious if one merely thinks about how significance
> testing is used in practice, but anyone who knows only the mathematics
> will probably never see it.

What are we supposed to see here? Oh, sorry, I'm not enlightened.
Never mind.

> If one had to put it in words--solve a koan?--the best answer I've seen
> can be found in the writings of John Tukey, in something he actually
> published! (not a Bell Labs tech report or a set of lecture notes).

Is this a puzzle of some kind? Or maybe you can just give a citation.

Robert Dodier

Anon.

May 3, 2006, 1:01:02 AM

Reef Fish wrote:
> Jerry Dallal wrote:
>> Reef Fish wrote:
>>> Jerry Dallal wrote:

<snip>

I just want to defend myself by pointing out that Reef Fish is setting
up something of a straw man. My comments were about what happens _in
practice_. You can produce all sorts of weird counter-examples, but if
you never see them in practice, then they are not relevant to my point.

Reef Fish used an example of a one-tailed test, where the direction was
wrong. In practice, I would expect that someone would notice that
something was amiss, and start to question what they were doing.

Reef Fish

May 3, 2006, 1:42:16 AM

I didn't realize you were from the Wizard of Oz!

> My comments were about what happens _in
> practice_. You can produce all sorts of weird counter-examples, but if
> you never see them in practice, then they are not relevant to my point.

The example I gave was contrived only for the purpose of showing
something counter to your exaggerated claim.

In practice, more often than not, the tests of equality of means
are NOT rejected, and they are NOT rejected no matter how
large a sample, if you had set up the incorrect alternative Hyp.

That's the reality, Anon Bob.

> Reef Fish used an example of a one-tailed test, where the direction was
> wrong. In practice, I would expect that someone would notice that
> something was amiss, and start to question what they were doing.

What you are saying is that there doesn't exist any REAL test in
which the p-value is greater than .5? Greater than .9?
Most of the time, the tester doesn't even KNOW what the
definition of p-value is, and assumes it's always the smaller
probability on either tail.

That's just another one of the inter-related faux pas many users
like yourself unknowingly commit.

How many times have you seen a p value GREATER than .9?

Did you even know that a p-value can be greater than .5?
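Below is a sketch of the situation described above. It assumes a one-sided z-test of H0: mu1 - mu2 = 0 against the upper alternative mu1 - mu2 > 0, known sigma, equal group sizes, and a true difference that is actually negative (-0.2, chosen only for illustration). To keep the output deterministic, it plugs the true difference in for the estimate, so there is no sampling noise. Under these assumptions the upper-tail p-value is above .5 from the start and climbs toward 1 as n grows, so the null is never rejected.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def upper_tail_p(z: float) -> float:
    """One-sided p-value for the alternative 'difference > 0'."""
    return 1.0 - normal_cdf(z)


def noise_free_z(delta: float, sigma: float, n_per_group: int) -> float:
    """Two-sample z statistic if the estimate equaled the true difference."""
    se = sigma * sqrt(2.0 / n_per_group)
    return delta / se


if __name__ == "__main__":
    # Assumed: true difference -0.2 and sigma 1; the alternative points the wrong way.
    for n in (10, 100, 1000):
        z = noise_free_z(-0.2, 1.0, n)
        print(n, round(z, 3), round(upper_tail_p(z), 4))
```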

Work on those first, before you come back to defend yourself,
and claim you know the real world and I don't.

Reef Fish Bob.

Anon.

May 3, 2006, 3:48:02 AM

Is this true? It would actually be interesting to see some data on this.

> and they are NOT rejected no matter how
> large a sample, if you had set up the incorrect alternative Hyp.
>
> That's the reality, Anon Bob.
>

But in practice, how often do people set up an incorrect alternative
hypothesis in the way you describe? That's the practice I was thinking
of, so if people never do that, then it's not an issue _in practice_.

As far as I'm aware, most tests of equality of location are two-tailed
(or they're ANOVAs, which amount to the same thing, in the sense that no
direction of difference is hypothesised).
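For comparison, here is a sketch of the one-tailed and two-tailed versions side by side. It assumes a z statistic with a standard normal null distribution, which is a normal approximation rather than an exact t-test. The two-sided p-value is twice the smaller tail, so a large negative statistic counts as evidence against H0 rather than producing a p-value near 1.

```python
from math import erfc, sqrt


def upper_tail_p(z: float) -> float:
    """One-sided p-value for the alternative 'difference > 0'."""
    return 0.5 * erfc(z / sqrt(2.0))


def two_sided_p(z: float) -> float:
    """Two-sided p-value: twice the smaller tail area."""
    return erfc(abs(z) / sqrt(2.0))


if __name__ == "__main__":
    for z in (-3.0, 0.0, 3.0):
        print(z, round(upper_tail_p(z), 5), round(two_sided_p(z), 5))
```

At z = -3 the one-sided upper-tail p-value is about 0.9987, while the two-sided p-value is about 0.0027.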

>
>>Reef Fish used an example of a one-tailed test, where the direction was
>>wrong. In practice, I would expect that someone would notice that
>>something was amiss, and start to question what they were doing.
>
>
> What you are saying is that there doesn't exist any REAL test in
> which the p-value is greater than .5? Greater than .9?

No, I'm not saying this, and I've absolutely no idea where you got the
idea that I am.

If you were doing a one-tailed test and got a t-value of -30 when you
expected it to be positive, wouldn't that give you pause for thought?

Bob

--
Bob O'Hara

Dept. of Mathematics and Statistics

P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland

Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax: +358-9-191 51400
WWW: http://www.RNI.Helsinki.FI/~boh/

Journal of Negative Results - EEB: http://www.jnr-eeb.org

Herman Rubin

May 3, 2006, 11:00:40 AM

In article <5183@2528016987.575813402.22776.21473.24241>,
DZ <12935@2884614598.2765014178.8093.20204.28986> wrote:

>Anon. <bob....@NOSPAMhelsinki.fi> wrote:
>> You either reject the null hypothesis ("guilty"), or fail to reject it
>> ("not proven"). This is apt because in practice the null hypothesis is
>> a straw man, so hypothesis testing can be seen as little more than
>> asking if you have enough data to reject a hypothesis that you know is
>> wrong anyway.

>This is only true for the point null. Consider H0 : |m1 - m2| < epsilon.
>Then for a fixed p-value = alpha there is correspondence with
>Pr(H0 | data) in that as more data is collected, this may approach
>alpha (section 2.3 in Berger and Delampady 1987 Testing precise
>hypotheses. Stat Sci 2:317-352).
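DZ's distinction between point and interval nulls can be sketched numerically. The following values are assumptions and do not come from the post: a two-sample z statistic with known sigma = 1 and n observations per group, a true difference of 0.1, and epsilon = 0.2. The true difference is plugged in for the estimate so the output is deterministic. Against the point null, the statistic grows like sqrt(n). Against the interval null, the statistic below measures how far |estimate| sits beyond epsilon, and it drifts the other way.

```python
from math import sqrt


def standard_error(sigma: float, n_per_group: int) -> float:
    """Standard error of a difference of two means with equal group sizes."""
    return sigma * sqrt(2.0 / n_per_group)


def point_null_z(estimate: float, sigma: float, n: int) -> float:
    """z statistic for H0: m1 - m2 = 0."""
    return estimate / standard_error(sigma, n)


def interval_null_z(estimate: float, eps: float, sigma: float, n: int) -> float:
    """Distance of |estimate| beyond eps, in standard errors, for H0: |m1 - m2| < eps.

    Large positive values favour rejecting the interval null. This simple
    boundary statistic is chosen for illustration only; it is not the
    procedure analysed by Berger and Delampady.
    """
    return (abs(estimate) - eps) / standard_error(sigma, n)


if __name__ == "__main__":
    delta, eps, sigma = 0.1, 0.2, 1.0  # assumed values
    for n in (100, 10_000, 1_000_000):
        print(n, round(point_null_z(delta, sigma, n), 2),
              round(interval_null_z(delta, eps, sigma, n), 2))
```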

One can consider the problem from a Bayesian approach,
including a prior Bayesian approach, which is what can be
obtained from principles of consistency, and does not
involve the notion of belief. As far as I know, the
only decision-theoretic approach is in my paper in the
First Purdue Symposium.

If the width of the null is rather small compared to the
precision of the usual estimate, it can be considered a
point null, with its mass being the integrated loss-prior
product over the null, and loss 1. In this case, the
problem is approximately what Sethuraman and I considered
in 1965, and which is quite good for moderate size samples.

If the width of the null is an order of magnitude larger
than the precision of the estimator, just see if the
estimate falls in the interval.

In the intermediate case, which certainly can occur, the
form of the loss-prior combination matters, at least for
the loss-prior combinations tried. This is unavoidable.
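The three cases above can be summarised as a rough triage rule. In this sketch, `null_width` is the full width of the null interval and `std_error` is the standard error of the usual estimate. The cut-offs `small` and `large` are hypothetical: the post says only "rather small" and "an order of magnitude larger". The sketch does not attempt the loss-prior calculation needed in the intermediate case.

```python
from enum import Enum


class Regime(Enum):
    POINT_NULL = "treat as a point null"
    CHECK_INTERVAL = "check whether the estimate falls in the null interval"
    INTERMEDIATE = "answer depends on the loss-prior combination"


def classify_null(null_width: float, std_error: float,
                  small: float = 0.1, large: float = 10.0) -> Regime:
    """Rough triage following the three cases described in the post.

    `small` and `large` are hypothetical cut-offs on null_width / std_error.
    """
    ratio = null_width / std_error
    if ratio <= small:
        return Regime.POINT_NULL
    if ratio >= large:
        return Regime.CHECK_INTERVAL
    return Regime.INTERMEDIATE


def in_null_interval(estimate: float, lower: float, upper: float) -> bool:
    """The 'just see if the estimate falls in the interval' rule."""
    return lower < estimate < upper


if __name__ == "__main__":
    for width in (0.05, 1.0, 20.0):
        print(width, classify_null(width, 1.0).value)
```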

Reef Fish

May 3, 2006, 11:18:53 AM

Anon. wrote:
> Reef Fish wrote:
> > Anon. wrote:
> >>Reef Fish wrote:
> >>>Jerry Dallal wrote:
> >>>>Reef Fish wrote:
> >>>>>Jerry Dallal wrote:
> >><snip>
> >>

> >>I just want to defend myself by pointing out that Reef Fish is setting
> >>up something of a straw man.
> >
> > I didn't realize you were from the Wizard of Oz!
> >
> >
> >>My comments were about what happens _in
> >>practice_. You can produce all sorts of weird counter-examples, but if
> >>you never see them in practice, then they are not relevant to my point.
> >
> > The example I gave was contrived only for the purpose of showing
> > something counter to your exaggerated claim.
> >
> > In practice more often than not, the test of equality of means
> > are NOT rejected,
>
> Is this true? It would actually be interesting to see some data on this.

I used to see data of this ALL the time, on Ph.D. committees of
Physical Education and other Education departments where a
typical doctoral dissertation consists of comparing n "treatments"
to look for differences and superior ones.

These dissertation directors and candidates didn't know anything
about ANOVA techniques for eliminating the invalidity of
"significant difference" claims when every possible pairwise
comparison is made, and those FEW that turned out to
be "statistically significant" according to T-tests were written
up as the findings from their doctoral dissertation "research".

Of such pairwise T-tests of Ho: mu(i) = mu(j), almost ALL are
ACCEPTED (or NOT rejected). Typically, if there are 15 different
methods of muscle building, there would be 105 possible T-tests
for pairwise equality of means, and only the few of those that are
rejected get reported as the "findings" of such theses, generated
by massive pairwise comparisons by T-test.
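The arithmetic behind the 105 t-tests can be sketched as follows. This sketch assumes a 5% level for each test and that every null is true, meaning all 15 methods are equally effective. The expected number of rejections is m times alpha regardless of dependence. The "at least one" probability additionally assumes independent tests, which pairwise tests sharing groups are not, so treat that figure as a rough illustration only.

```python
from math import comb


def n_pairwise_tests(k: int) -> int:
    """Number of distinct pairwise comparisons among k treatments."""
    return comb(k, 2)


def expected_false_rejections(m: int, alpha: float) -> float:
    """Expected number of rejections among m tests when every null is true."""
    return m * alpha


def prob_any_rejection_independent(m: int, alpha: float) -> float:
    """P(at least one rejection) if the m tests were independent.

    Pairwise t-tests sharing groups are NOT independent, so this is only a
    rough illustration of the multiplicity problem.
    """
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    m = n_pairwise_tests(15)
    print(m, expected_false_rejections(m, 0.05),
          round(prob_any_rejection_independent(m, 0.05), 4))
```

With 15 treatments this gives 105 tests and about 5.25 rejections expected by chance alone, which is enough to supply a handful of "findings".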

> > That's the reality, Anon Bob.

> >
> But in practice, how often do people set up an incorrect alternative
> hypothesis in the way you describe? That's the practice I was thinking
> of, so if people never do that, then it's not an issue _in practice_.
>
> As far as I'm aware, most tests of equality of location are two-tailed
> (or they're ANOVAs, which amount to the same thing, in the sense that no
> direction of difference is hypothesised).

The T-tests in Ph.D. dissertations example cited above directly
contradicted your preceding paragraph. Those T-tests WERE
two-tailed. Most of them were NOT rejected.

You started with a ridiculous assertion that all Ho are known to
be wrong and rejected given sufficient data; and DZ and I showed
you some counterexamples that ASSURED that the Ho cannot be
rejected.

Then you complained about its lack of reality as if your own had
reality.

So, I finally cited you some doctoral dissertations, in which I've
sat on committees, or viewed on the sideline (such as the
doctoral dissertation of the Athletic director of the college I
went to, who hired me and another student to do the hand
calculation of the T-tests, because that was before the days
when all colleges had COMPUTERS). He got his Ed.D.
doctoral degree in Physical Education from the University
of Kentucky.

In short, instead of discussing the proper and improper use
of Hypothesis Testing terminology, and the proper
interpretation of the results of such tests in the Neyman
Pearson setting, you chose to make some unsupportable
remarks as if they are common occurrences in practice.

It's pointless to discuss how often statistics are used properly
and improperly. This newsgroup is a perfect example to show
that the MAJORITY of the posters and discussants, especially
Bob O'Hara, are known to make faulty statements about
Linear models, regression methods, and hypothesis
testing terminology. THAT's still another REALITY, Anon
Bob O'Hara.

-- Reef Fish Bob.

Anon.

May 3, 2006, 12:57:07 PM

And haven't yet shown that the null hypotheses in the tests would still
have been accepted with more data. I'm afraid you need that to complete
your refutation: the "given sufficient data" is important.
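The role of "given sufficient data" can be sketched with the approximate power of a two-sided two-sample z-test at the 5% level. The following values are assumptions for illustration: known sigma = 1, equal group sizes, and a true difference of 0.1 (any fixed nonzero value behaves the same way). Power rises toward 1 as n grows, which is why a point null that is even slightly wrong is eventually rejected. A true difference of exactly zero keeps the rejection rate at 5%.

```python
from math import erf, sqrt

Z_CRIT = 1.959963984540054  # two-sided 5% critical value of N(0, 1)


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def power_two_sided(delta: float, sigma: float, n_per_group: int) -> float:
    """Approximate power of a two-sided two-sample z-test at alpha = 0.05."""
    shift = abs(delta) / (sigma * sqrt(2.0 / n_per_group))
    return normal_cdf(shift - Z_CRIT) + normal_cdf(-shift - Z_CRIT)


if __name__ == "__main__":
    for n in (100, 1000, 5000):
        print(n, round(power_two_sided(0.1, 1.0, n), 3))
```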

<snip>

Bob

Reef Fish

May 3, 2006, 1:53:44 PM

I have shown your counterproductive statement and when you wanted
to talk about "reality", gave you the EVIDENCE in reality that
contradicted your ridiculous assertion, and now you are back to
your original "non-reality" and want to start all over again.

Sorry, Bob O'Hara, you'll have to do your SOLO DANCE on your
noise from now on, on your Null Hypothesis theme, which comes
from your own lack of statistical knowledge AND experience.

-- Reef Fish Bob.

Anon.

May 3, 2006, 3:17:26 PM

Can you give me (us?) complete references to these, please. I'm
interested because similar problems have come up in genetics, so it
could be quite relevant.

Bob

--
Bob O'Hara
Department of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland

Telephone: +358-9-191 51479