Thais had posted this problem in sci.stat.consult and had received
replies over several rounds. I wonder why he is starting anew here?
> This is the "arbitrary units" problem, which the use of standardized
> effect size indices (Cohen's d, etc) can solve if the standardizing
> variability is in some sense a natural component of the situation,
That is as much nonsense as Cohen's d, whatever it is!
This is a problem in distinguishing the meaning of "statistically
significant" (which has a very explicit meaning in Frequentist
statistics) from the term "practical USEFULNESS".
Any statistician worth his salt would know that a "highly
significant" result can be completely worthless from the practical
point of view of the usefulness of the result.
Conversely, a statistical result that is not statistically significant
at some .05 or .10 level can be very useful.
The two concepts are TOTALLY different in terms of knowing how
to apply statistics sensibly and usefully.
-- Reef Fish Bob.
> as opposed to representing some sort of experimental error. The drawback
> is that the CI must then incorporate uncertainty in the estimate of
> variability. This both widens the CI and makes it more difficult to
---- snip ----
>> This is the "arbitrary units" problem, which the use of standardized
>> effect size indices (Cohen's d, etc) can solve if the standardizing
>> variability is in some sense a natural component of the situation,
> That is as much nonsense as Cohen's d, whatever it is!
              Xbar(1) - Xbar(2)
Cohen's d  =  -----------------
                      s

(where s is the pooled standard deviation of the two groups)
Thanks, Bruce, for providing the reference.
As I said, whatever Cohen used, it's IRRELEVANT to the question of
"practical significance" or "practical usefulness", which is a completely
different concept from "statistical significance".
Cohen's d appears to be nothing more than a TEST STATISTIC
used to determine the "statistical significance" of a test.
There are hundreds of thousands of such test statistics in the
subject of Statistics, but NONE tells you when a result is of any
PRACTICAL value, in the commonsense meaning of the word.
The above is an important AXIOM in my Data Analysis course:
to know the difference between "statistical significance" and
"practical usefulness".
The best example is that of correlation!! The one single statistic
that is abused by more users than any other.
At a .05 significance level, the Pearson correlation is statistically
significant when its ABSOLUTE value is approximately greater
than 2/sqrt(n), for large n.
That result was posted by me somewhere in sci.stat.math.
Just found it, in a November 2005 post:
The 2 came from 1.96 for Z.
What that says is that a correlation coefficient that is greater
than .02 is statistically significant for n = 10,000.
===== excerpt of the PROOF of the asymptotic result
RF> r is significant (two-tailed) if
RF> |R| * sqrt((n-2)/(1 - R*R)) > t(1-alpha/2; (n-2)),
RF> or equivalently, if |R| > t / sqrt((n-2) + t*t),
RF> where t is the critical value at alpha/2 for t with (n-2) df.
Since sqrt((n-2) + t*t) is approximately sqrt(n) for large n,
an easy mnemonic device (using the asymptotic approximation)
is to think of the standard error of r as 1/sqrt(n).
Thus, r is statistically significant at the .05 level if
|r| > 2/sqrt(100) = 0.2 if n = 100,
and |r| > 2/sqrt(10000) = 0.02 if n = 10,000,
and so on.
================== end excerpt
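As a sanity check on the asymptotic rule in the excerpt, here is a small
Python sketch (mine, not from the thread) comparing the exact cutoff
|r| > t/sqrt((n-2) + t^2) against the 2/sqrt(n) mnemonic, using the
large-sample normal quantile 1.96 in place of the exact t critical value:

```python
import math

def exact_cutoff(n, t):
    """Exact two-tailed cutoff from the excerpt: |r| > t / sqrt((n-2) + t^2)."""
    return t / math.sqrt((n - 2) + t * t)

def mnemonic_cutoff(n):
    """Asymptotic mnemonic: |r| > 2 / sqrt(n)."""
    return 2.0 / math.sqrt(n)

# For large n, the two-tailed t critical value at alpha = .05 is
# essentially the standard normal quantile 1.96.
T_CRIT = 1.959964

for n in (100, 10_000):
    print(n, round(exact_cutoff(n, T_CRIT), 4), round(mnemonic_cutoff(n), 4))
# prints: 100 0.1942 0.2
#         10000 0.0196 0.02
```

Already at n = 100 the mnemonic is within .006 of the exact cutoff, which
is why the 2/sqrt(n) rule of thumb works.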
Has anyone seen scatterplots of correlation r = .2, or .4,
let alone .02? They are completely indistinguishable from a
RANDOM scatter with 0 correlation.
A correlation of .02 is what I call practically USELESS.
A correlation of .98 MAY be USELESS.
There is that SPSS Multiple Regression example I did from the
1975 Manual where the Multiple R exceeded .98, I believe, but
the result was completely USELESS.
These are the ideas embedded in the notions of "statistical
significance" vs. "practical significance".
-- Reef Fish Bob.
I agree with most of your answer, especially the
distinction between significance and practical usefulness. In French, I
usually make the distinction between two quite similar words:
"Significativité" (= significance) and "Signification" (= meaning). But
please let me disagree with you on one or two points.
> As I said, whatever Cohen used it's IRRELEVANT to the quesiton of
> "practical significance" or practical usefulness" which is a completely
> different concept from "statistical significance".
> Cohen's d appears to be nothing more than a TEST STATISTIC
> used to determine the "statistical significance" or a test.
First, your interpretation of Cohen's d seems erroneous to me.
Cohen's d is a measure of effect size, just as r, R², Partial Eta², or
Proportional Reduction in Error (Judd & McClelland, 1988) are. It provides
no information concerning the statistical significance of the
corresponding effects. Cohen's (1988) book on statistical power provides
rules of thumb for the interpretation of what can be called a "Small",
"Medium", or "Large" effect size:
Large effect size:  Cohen's d = .8 (approx. r = .38),
Medium effect size: Cohen's d = .5 (approx. r = .25),
Small effect size:  Cohen's d = .2 (approx. r = .15).
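For concreteness, here is a small Python sketch (mine, not from the
thread) of how Cohen's d is computed, together with the equal-group-size
conversion r = d/sqrt(d² + 4); note that this conversion gives values
close to, but not identical to, the r values quoted above:

```python
import math

def cohens_d(group1, group2):
    """Cohen's d: difference of means divided by the pooled
    standard deviation of the two groups."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

def d_to_r(d):
    """Convert d to a point-biserial r, assuming equal group sizes."""
    return d / math.sqrt(d * d + 4)

# d = 1.0 when the means differ by exactly one pooled SD:
print(cohens_d([2, 3, 4], [1, 2, 3]))   # 1.0
print(round(d_to_r(0.8), 2))            # 0.37 for a "large" effect
```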
However, although these indications of effect size remain independent
of the question of practical usefulness, they help one understand that
statistical significance is not a reliable indication of the importance
of a predictor.
> To know the difference between "statistical significance" and
> "practical usefulness".
> The best example is that of correlation!! The one single statistic
> that is abused by more users than any other.
> At a .05 significance level, the Pearson correlation is statistically
> significant when its ABSOLUTE value is approximately greater
> than 2/sqrt(n), for large n.
> A correlation of .02 is what I call practically USELESS.
> A correlation of .98 MAY be USELESS.
Second, a correlation of .02 may be useless, but, depending on the
field of research, it can be very useful. Let me quote Rosenthal (1990,
p. 775):
"The Physician's Aspirin Study
At a special meeting held December 18, 1987, it was decided to end
prematurely a randomized double blind experiment on the effects of
aspirin on reducing heart attacks (Steering Committee of the
Physician's Health Study Research Group, 1988). The reason for the
unusual termination of this experiment was that it had become so clear
that aspirin prevented heart attacks (and deaths from heart attacks)
that it would be unethical to continue to give half of the physician
research subjects a placebo. Now what do you suppose was the magnitude
of the experimental effect that was so dramatic as to call for the
termination of this research? Was r² .90, or .80, or .70, or .60, so
that the corresponding rs would have been .95, .89, .84, or .77? No. Well,
was r² .50, .40, .30, or even .20, so that the corresponding rs would
have been .71, .63, .55, or .45? No. Actually, what r² was, was .0011,
with a corresponding r of .034."
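The r of .034 in that quote is the phi coefficient (the Pearson r of two
binary variables) of the study's 2x2 outcome table. A quick Python sketch
using the heart-attack counts commonly reported for the Physicians'
Health Study (treat the exact numbers as assumptions here, not as data
from this thread):

```python
import math

def phi_coefficient(a, b, c, d):
    """Pearson r of two binary variables, from a 2x2 table:
                 event   no event
       group 1     a        b
       group 2     c        d
    """
    num = a * d - b * c
    den = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return num / den

# Counts as commonly reported (assumed here for illustration):
# aspirin: 104 heart attacks among 11,037 physicians
# placebo: 189 heart attacks among 11,034 physicians
r = phi_coefficient(104, 11_037 - 104, 189, 11_034 - 189)
print(round(abs(r), 3))   # 0.034
```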
So, you are right in insisting on the distinction between usefulness
and significance. In some research, an effect size (r²) of .50 may be
large but meaningless, whereas in another field, an effect size of .001
can be very small, but very meaningful.
What has evolved from this "dichotomy" is not encouraging.
In psychology, and related "social" sciences, effect sizes have totally
dominated the scene. The statistics textbooks used really downplay
statistics, logical thinking, and mathematics, and focus entirely on
effect sizes. In economics and in the natural life sciences, statistical
significance has dominated, and practical significance has been
downplayed. So we keep seeing articles arguing on one hand that
"statistical significance" does not indicate "practical significance",
and that the value of papers should be judged on practical grounds instead.
Then on the other hand, "practical significance" has no "standard", so we
are deluged by papers reporting all kinds of invented tests, weird ideas,
invented ways to get something out of a PCA, etc. Oh well, it keeps all the
periodical publishers happy with all the subscribers who pay for this
nonsense.
I read all newsgroups from Google. Today, Google has been constipated
since this morning and posts are coming out several hours late.
But one ADVANTAGE of such delay is that David Heiser has already
said many of the things I would have said, especially this:
DH> Then on the other hand, "practical significance" has no
DH> "standard", so we are deluged by papers reporting all kinds
DH> of invented tests, weird ideas, invented ways to get something
DH> out of a PCA, etc. Oh well, it keeps all the periodical
DH> publishers happy with all the subscribers who pay for this nonsense.
"Practical usefulness", by its very nature, is a highly subjective notion
that is not subject to any quantification "standards" -- those are
contrived "nonsense", and that's a good technical term for it. :-)
I don't really have much to add ... but I'll do a quick read through
to see your disagreements.
> > Cohen's d appears to be nothing more than a TEST STATISTIC
> > used to determine the "statistical significance" or a test.
It looked like a test statistic, but I KNOW it's nonsense from general
consideration of my paragraphs and David's paragraph above.
> Cohen's (1988) book on statistical power provides
> rules of thumb for the interpretation of what can be called a "Small",
> "Medium", or "Large" effect size.
Do you REALLY think those rules have any PRACTICAL usefulness?
> Large effect size : Cohen's d = .8 (approx. to r = .38),
> Medium effect size : Cohen's d = .5 (approx. r = .25),
> Small effect size : Cohen's d = .2 (approx. r = .15).
Can I write a paper to refine that? I think if the two digits are the
same, such as .33, .44, .55, they are much more useful effect
sizes because they can be correlated to SHOE sizes of 3, 4, 5,
etc. I've never seen any shoe size of .38, have you?
Then there are PRIME sizes, such as .13, .17, .41, etc. that
are clearly more practically useful than numbers that can be
factored into other numbers.
I think Cohen has really opened up a new field, which may be
called Numerological Statistics, and which goes hand-in-hand with
his other theories -- I heard he called the Type II error a probability. :-)
What have you been smoking? Are you saying some study was called
off because the correlations were like .02 and that makes .02 useful
because it stops further wasted efforts?
When it comes to correlation, ALL correlations are practically USELESS.
It doesn't really matter if it's .02, or .2, or .98; there are ALWAYS
more USEFUL ways of expressing the same information.
For that reason, the more I look at the ABUSE in the use of
correlations, the more I appreciate Tukey's saying something to the
effect that using correlations is like "sweeping dirt under the rug
WITH A VENGEANCE" -- it is far worse than hiding dirt.
So, we'll forever disagree on our opinion about correlations.
> So, you are right in insisting on the distinction between usefulness
> and significance. In some research, an effect size (r²) of .50 may be
> large but meaningless, whereas in another field, an effect size of .001
> can be very small, but very meaningful.
That is saying something a little different from the "size for
usefulness", which has no standard and no scale.
Your .5 and .001 are in terms of an ABSOLUTE scale. In
that respect, at least correlation is bounded between -1 and 1, so
that it has a "relative scale" of sorts, but it still cannot be used to
judge practical usefulness.
On the other hand, for measures that have NO size limits, any
statement about what's big and what's small is nothing short of
INSANE. Is 100,000,000 large? It could be. But it's
negligible in the budget of the USA. Congress has just
approved the squandering of $7 BILLION in next year's budget
to fight the war in Iraq and Afghanistan: 7,000,000,000, which
is 70 times that number, for just one tiny, tiny portion of the
national annual budget.
Is .0001 small? It could be astronomically large if it's in
cm units for measuring microscopic organisms under a micron in size.
But those who make up national, state, and city budgets will
know what is PRACTICALLY significant or not in THEIR
budgets, and the microbiologists will know how small is HUGE.
As for Cohen, he doesn't know ANYTHING, but contrived
nonsense, to pander to the gullible sociologists.
-- Reef Fish Bob.
> As for Cohen, he doesn't know ANYTHING, but contrived
> nonsense, to pander to the gullible sociologists.
A few weeks ago, Reef Fish wrote several screens of diatribe
against Jacob Cohen, whose book on the subject was the single
main impetus for statistical power analysis. I suppose that the
book built a practical conclusion from Mosteller's entertaining,
"counter-intuitive" demonstrations about how long a World Series
would have to be to show which team is a bit better.
There is as yet no internal evidence that I have noticed that Reef
Fish has ever browsed or even seen either of Cohen's
much-respected and much-cited textbooks. "Data-free analysis."
And apparently that's where Richard Ulrich learned his statistics.
Cohen's so-called power analysis is well within the framework of
statistics, EXCEPT that he made blunders such as calling the
Type II error a probability -- which I even defended, saying he couldn't
have done it, when Richard Ulrich said he did, until Jerry Dallal
testified that Cohen did the same thing (at least once) in his book.
That is sufficient for me to characterize the much-hoopla'd book by
Cohen (I had never heard of him until Ulrich mentioned it) as "contrived
nonsense" for those in the subculture of sociologists.
> I suppose that the
> book built a practical conclusion from Mosteller's entertaining,
> "counter-intuitive" demonstrations about how long a World Series
> would have to be to show which team is a bit better.
The merit of Mosteller's work stands on its own. No amount of
Richard Ulrich innuendo or smearing will matter in the least.
If Richard Ulrich had learned just a LITTLE bit from the Mosteller and
Tukey book on Regression, he wouldn't be making all those blunders
on the subject that he learned from sociologists.
> There yet is no internal evidence that I have noticed, that Reef
> Fish had ever browsed or even seen either of Cohen's
> much-respected and much-cited textbooks. "Data-free analysis."
I can tell you definitively that I have NEVER read, and will not EVER
read, any of Cohen's books, nor anything else he has written,
except second-hand, via Ulrich, Dallal, and a few others. Based
on what I heard there, it was more than sufficient for ME to
decide that he is no "statistician", nor even one with any good
statistical sense.
Richard, clip that paragraph and you can quote me any time,
and save that "there is no internal evidence" that I ever browsed
Cohen's writing. There are so many great statisticians whose
work I haven't had time to read YET, on topics secondary
to my interests and to the mainstream of Statistics, that the ONLY
way I would read Cohen would be if my cruise ship sank and I
were stranded on an uninhabited Pacific island and Cohen's
book floated ashore. Nah ... I've changed my mind. I wouldn't
read it even then, in favor of using it to light a fire. :-)
I hope I didn't beat around the bush above in my assessment of Cohen.
But to be on the serious side, if anyone had cited ANYTHING
by Cohen that was shown to have some value in statistics, I
would have at least gladly considered it. As it was, the only
things Ulrich managed to cite were errors or nonsense written by
Cohen, and the number of times his book had been cited in
Google Scholar (a piece of absolute JUNK in Google, as I
had documented): it found more citations of my
publications on several subjects in which Tukey and Mosteller
are much better known and much better scholars than I am,
and it missed my entry on "Interactive Data
Analysis" in the Encyclopedia of Statistical Sciences altogether.
But that's the kind of "junk research" Ulrich excels in -- relying
on the worst of information from Google, while having missed his
statistical EDUCATION from the mainstream textbooks and
papers by, and for, statisticians.