chi-square test goodness of fit

Alexis Gatt

Nov 4, 2004, 7:54:08 AM

Hi guys,

I have a very basic question regarding the goodness-of-fit statistic
chi-square. The chi-square measure between the observed data and those given
by the model I implemented equals 24, and there are 28 degrees of freedom.
According to a chi-square table I found in a book, the equivalent
probability value is between 0.5 and 0.75.

And this is where I am confused. Does this mean that the model predicts the
data well or not? And why?

Many thanks

Alexis

Bruce Weaver

Nov 4, 2004, 8:10:40 AM

Alexis Gatt wrote:

Pearson's chi-square is based on the discrepancy between Observed and
Expected frequencies. The more O and E differ, the larger the
chi-square value becomes, and the smaller the p-value becomes.
Therefore, smaller values of chi-square (and larger values of p)
represent better fits.
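The statistic described above is Pearson's X^2, the sum of (O - E)^2 / E over the bins. A minimal Python sketch, with a hypothetical function name and made-up counts, showing that counts closer to the expected values give a smaller statistic:

```python
def pearson_chi_square(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over all bins."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Two bins with 50 expected in each: the closer O is to E, the smaller X^2.
print(pearson_chi_square([52, 48], [50, 50]))  # 0.16
print(pearson_chi_square([60, 40], [50, 50]))  # 4.0
```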

--
Bruce Weaver
bwe...@lakeheadu.ca
www.angelfire.com/wv/bwhomedir

George Kahrimanis

Nov 5, 2004, 5:00:05 PM

"Alexis Gatt" <alexism...@yahoo.co.uk> wrote in message
news:<cmd8or$b20$1...@iss-nntp.leeds.ac.uk>...
> [...]

> According to a chi-square table I found in a book, the equivalent
> probability value is between 0.5 and 0.75.
> And this is where I am confused. Does this mean that the model
> predicts the data well or not? And why?

I know the feeling. You had better struggle with this question
a couple more days, but if you are in a hurry, read on now.

A goodness-of-fit score of about 50% means that your fit is of
average "goodness". A 100% goodness of fit means that the curve
goes through the points; a perfect fit, but people will suspect
that you have massaged the data *and* that you are obsessive-
compulsive. Consequently, not only is it nonsense to say that
100% g.o.f. is twice as good as a 50% g.o.f., but it may be less
desirable.

~ George Kahrimanis

D. Touie

Dec 23, 2004, 11:58:36 PM

I apologize for being so late with this response. I only now noticed
your original posting.

I do not agree with Bruce Weaver's response to you.

Your model does predict your data reasonably well. Your chi-square sum
of 24 with 28 degrees of freedom evaluates to about 0.318 probability.
This means your data is about 0.182 less than the optimum fit
probability of 0.50. In general, probabilities of 0.10 or less, or
their equivalents of 0.90 or more, are where conclusions of non-fit
usually begin.
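The 0.318 figure and the 0.5 to 0.75 range from the original poster's table are the left and right tails of the same chi-square distribution. A minimal stdlib-only sketch, assuming an even number of degrees of freedom (28 here), for which P(X > x) equals the probability that a Poisson(x/2) count is at most df/2 - 1; the function name is hypothetical:

```python
import math


def chi2_sf_even_df(x, df):
    """Right-tail P(X > x) for a chi-square variable with even df.

    For df = 2k, P(X > x) = P(N <= k - 1) with N ~ Poisson(x / 2),
    so the tail is a finite sum of Poisson probabilities.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    lam = x / 2.0
    term = math.exp(-lam)  # Poisson probability of 0 events
    total = 0.0
    for i in range(df // 2):
        total += term
        term *= lam / (i + 1)  # next Poisson probability
    return total


upper = chi2_sf_even_df(24, 28)  # right tail, the CHIDIST-style value
lower = 1.0 - upper              # left tail, the cumulative probability
print(round(lower, 3), round(upper, 3))  # 0.318 0.682
```

The two values add to 1; which one a table or function reports is a convention, not a different answer.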

BTW, I did this particular evaluation using Microsoft's Excel
spreadsheet function "Chidist()." This function has to be modified by
subtracting it from 1 to get a correct answer. Their function is
implemented exactly backward. I find it amusing that all of the other
spreadsheets I have tried over the past years implement the equivalent
function the same way. I think this is an example of standardized legacy
code marching to the drumbeat of Microsoft.

Your chi-square table seems to suffer from the same malady. Maybe it
is the same one Microsoft programmers consulted when deciding how to
implement "Chidist()." Hmm.

Richard Ulrich

Dec 24, 2004, 4:10:42 PM

On Fri, 24 Dec 2004 04:58:36 GMT, D. Touie <dto...@tscnet.com> wrote:

[snip]

> BTW, I did this particular evaluation using Microsoft's Excel
> spreadsheet function "Chidist()." This function has to be modified by
> subtracting it from 1 to get a correct answer. Their function is
> implemented exactly backward. I find it amusing that all of the other
> spreadsheets I have tried over the past years implement the equivalent
> function the same way. I think this is an example of standardized legacy
> code marching to the drumbeat of Microsoft.
>
> Your chi-square table seems to suffer from the same malady. Maybe it
> is the same one Microsoft programmers consulted when deciding how to
> implement "Chidist()." Hmm.

I think you are complaining that computer software
developers regularly use "cumulative distribution functions"
which take the integral from zero (or minus infinity) up
to the observed value.

I believe the explanation is that computer programmers
(in this sort of case) have written in such a way that
they will retain some respect from trained statisticians
- instead of trying to please amateur data-analyst hackers
who like to see their "p < 0.05" .

--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html

Jack Tomsky

Dec 24, 2004, 9:50:57 PM

On Fri, 24 Dec 2004 04:58:36 GMT, D. Touie wrote:

>BTW, I did this particular evaluation using Microsoft's Excel
>spreadsheet function "Chidist()." This function has to be modified by
>subtracting it from 1 to get a correct answer. Their function is
>implemented exactly backward. I find it amusing that all of the other
>spreadsheets I have tried over the past years implement the equivalent
>function the same way. I think this is an example of standardized legacy
>code marching to the drumbeat of Microsoft.
>
>Your chi-square table seems to suffer from the same malady. Maybe it
>is the same one Microsoft programmers consulted when deciding how to
>implement "Chidist()." Hmm.

I have found that Excel's rule for the probability tail differs
according to the distribution. Each time I use it, I have to check
whether I need to modify their result.

NORMSDIST(x) = P(X < x)
TDIST(x,n,1) = P(X > x), undefined for x < 0
TDIST(x,n,2) = P(|X| > x) = 2*P(X > x), undefined for x < 0
CHIDIST(x,n) = P(X > x)
FDIST(x,m,n) = P(X > x)
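Two rows of the list above describe the same event from opposite tails. For 1 degree of freedom, X = Z^2, so the right tail CHIDIST(z^2, 1) should equal 2*(1 - NORMSDIST(z)) for z > 0. A minimal Python sketch of that identity; normsdist and chidist_df1 are hypothetical stand-ins that follow the conventions listed above, not Excel's own code:

```python
import math
from statistics import NormalDist


def normsdist(z):
    """Left tail P(Z < z) for the standard normal (NORMSDIST-style convention)."""
    return NormalDist().cdf(z)


def chidist_df1(x):
    """Right tail P(X > x) for chi-square with 1 df (CHIDIST-style convention)."""
    # X = Z**2, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


for z in (0.5, 1.0, 1.96, 3.0):
    right = chidist_df1(z * z)          # right-tail convention
    both = 2.0 * (1.0 - normsdist(z))   # two-sided tail built from the left tail
    print(z, round(right, 4), round(both, 4))
```

Whichever tail a given function returns, converting between conventions is a matter of complements and, for symmetric distributions, a factor of 2.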

Jack

D. Touie

Dec 24, 2004, 10:05:20 PM

From: Richard Ulrich <Rich....@comcast.net>
Newsgroups: sci.stat.math
Subject: chi-square test goodness of fit
Date: Fri, 24 Dec 2004 16:10:42 -0500

On Fri, 24 Dec 2004 04:58:36 GMT, D. Touie <dto...@tscnet.com> wrote:

[snip]
> BTW, I did this particular evaluation using Microsoft's Excel
> spreadsheet function "Chidist()." This function has to be modified by
> subtracting it from 1 to get a correct answer. Their function is
> implemented exactly backward. I find it amusing that all of the other
> spreadsheets I have tried over the past years implement the equivalent
> function the same way. I think this is an example of standardized legacy
> code marching to the drumbeat of Microsoft.
>
> Your chi-square table seems to suffer from the same malady. Maybe it
> is the same one Microsoft programmers consulted when deciding how to
> implement "Chidist()." Hmm.

Fascinating argument.

I am guessing you equate "trained statisticians" with "mathematical
statisticians." I am, of course, a mere applied statistician.

I did do some due diligence research before posting my "complaint."

Left over from the dozen or so statistics texts I used to keep on
hand, I have two basic books left. Both have Chi-square probability
transformation tables. One reads from right-to-left, the other
left-to-right. I am thoroughly chagrined to find out at this late date
they were written only for us "amateurs."

I also consulted my nearly new TI-89 Titanium calculator. Its
Chi-square functions work as "cumulative distribution functions"
rather than as tables. It produces the same probability
transformations as my two basic book tables.

Since I posted my "complaint," I thought of a simple experiment. I
tried it out this morning in my Excel worksheet. I entered 10 made-up
coin-flipping results into 2 two-bin Chi-square vectors. In the first
vector I made all 10 results heads. In the second I made it 5 heads
and 5 tails. I then evaluated the two vectors with the uncorrected
Excel "Chidist()" function at 1 degree of freedom.

10 heads = Near zero probability.

5 heads & 5 tails = Exactly 1 probability.
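Both results follow from the Pearson statistic and a right-tail (P(X > x)) convention. A minimal sketch, assuming two equally likely bins (hence 1 degree of freedom) and the identity P(X > x) = erfc(sqrt(x/2)) for 1 df; the function names are hypothetical:

```python
import math


def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over the bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf_df1(x):
    """Right-tail P(X > x) for chi-square with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


expected = [5, 5]  # fair coin, 10 flips, two bins (heads, tails)
for label, observed in (("10 heads", [10, 0]), ("5 heads & 5 tails", [5, 5])):
    stat = pearson_chi_square(observed, expected)
    print(label, stat, chi2_sf_df1(stat))
```

All heads gives X^2 = 10 and a right-tail probability of roughly 0.0016; the even split gives X^2 = 0 and a right-tail probability of exactly 1.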

Are these the results you non-amateurs expect?

Possibly there is a properly "trained statistician" in this news group
who could straighten this hiccup out for us interested "amateurs."

Mack

Dec 25, 2004, 11:50:15 AM

On Sat, 25 Dec 2004 03:05:20 GMT, D. Touie <dto...@tscnet.com> wrote:

>From: Richard Ulrich <Rich....@comcast.net>
>Newsgroups: sci.stat.math
>Subject: chi-square test goodness of fit
>Date: Fri, 24 Dec 2004 16:10:42 -0500
>
>On Fri, 24 Dec 2004 04:58:36 GMT, D. Touie <dto...@tscnet.com> wrote:
>
[snip]
>

>Since I posted my "complaint," I thought of a simple experiment. I
>tried it out this morning in my Excel worksheet. I entered 10 made-up
>coin-flipping results into 2 two-bin Chi-square vectors. In the first
>vector I made all 10 results heads. In the second I made it 5 heads
>and 5 tails. I then evaluated the two vectors with the uncorrected
>Excel "Chidist()" function at 1 degree of freedom.
>
>10 heads = Near zero probability.
>
>5 heads & 5 tails = Exactly 1 probability.
>
>Are these the results you non-amateurs expect?

Even as an amateur those results make sense.
The results are the probability that the chi square
will have a larger value.

In the even distribution case the chi square is zero.
Of course, the probability is 1 that any other value
will be larger.

In the all heads case the result is not quite zero
because of the way the function is calculated.
The result expected is near zero. It isn't zero
because a larger sample could result in a larger
value. Under the randomness assumption this is
extremely unlikely.

If the results were near one for the all-heads
case and zero for the even case, it would
still make sense. The results would then be the
probability that the chi square has a smaller value
rather than a larger one.

IMHO the probability that the chi square will have
a larger value is a better statement. The statistics
book I have uses this definition. The statistics
function I use for programming also uses this
definition.

The program help files/manuals should state one
way or the other. If the function differed from the way
the programmers described it, that would be an error;
otherwise it is just an annoyance. In the Excel
case the documentation is correct.

CHIDIST is calculated as CHIDIST = P(X>x)

>
>Possibly there is a properly "trained statistician" in this news group
>who could straighten this hiccup out for us interested "amateurs."

Leslie 'Mack' McBride

Richard Ulrich

Dec 26, 2004, 9:08:27 PM

This note was written so strangely that it is hard to respond to.

It seems to be a response to my note, but it deleted 100%
of my words, while saving what I quoted -- I have replaced
what I quoted, with what I replied.

Also, what D. Touie is describing seems confused to me.

On Sat, 25 Dec 2004 03:05:20 GMT, D. Touie <dto...@tscnet.com> wrote:

> From: Richard Ulrich <Rich....@comcast.net>
> Newsgroups: sci.stat.math
> Subject: chi-square test goodness of fit
> Date: Fri, 24 Dec 2004 16:10:42 -0500
>

[ snip original, and restoring my reply ]
RU.

" I think you are complaining that computer software
developers regularly use "cumulative distribution functions"
which take the integral from zero (or minus infinity) up
to the observed value.

" I believe the explanation is that computer programmers
(in this sort of case) have written in such a way that
they will retain some respect from trained statisticians
- instead of trying to please amateur data-analyst hackers
who like to see their "p < 0.05" ."

DT >

> Fascinating argument.
>
> I am guessing you equate "trained statisticians" with "mathematical
> statisticians." I am, of course, a mere applied statistician.

I started work as a computer programmer who knew a
little statistics. I became a data-analyst with a math
background and some extra knowledge of statistics,
and eventually I was trained as a statistician -- So, I
was exposed, over and over, to PDFs and CDFs, and
other terminology particular to statistics.

I'm not sure that I use the term "applied statistician".
Right this minute, it seems to me to describe people
like myself, statisticians who do have a lot of ability
to actually analyze data.

I *don't* think I should call someone an "applied
statistician" who happens to know a little statistics
and analyzes data, but I'm not sure what to call someone
who has learned from books on "applied statistics."
Well, I've said "data-analyst" before, and I can't think of
anything I like better.

>
> I did do some due diligence research before posting my "complaint."
>
> Left over from the dozen or so statistics texts I used to keep on
> hand, I have two basic books left. Both have Chi-square probability
> transformation tables. One reads from right-to-left, the other
> left-to-right.

"Transformation tables"? Okay, not necessarily
distribution tables.

I don't understand right-to-left and left-to-right, but that
sounds like opposite approaches.

> I am thoroughly chagrined to find out at this late date
> they were written only for us "amateurs."

- at least, not for professionals ....

>
> I also consulted my nearly new TI-89 Titanium calculator. Its
> Chi-square functions work as "cumulative distribution functions"
> rather than as tables. It produces the same probability
> transformations as my two basic book tables.

Confusion? Aren't the two books different?
TI-89 gives a label of "cumulative distribution function"
and yet, approaches zero for large values?

>
> Since I posted my "complaint," I thought of a simple experiment. I
> tried it out this morning in my Excel worksheet. I entered 10 made-up
> coin-flipping results into 2 two-bin Chi-square vectors. In the first
> vector I made all 10 results heads. In the second I made it 5 heads
> and 5 tails. I then evaluated the two vectors with the uncorrected
> Excel "Chidist()" function at 1 degree of freedom.
>
> 10 heads = Near zero probability.
>
> 5 heads & 5 tails = Exactly 1 probability.
>
> Are these the results you non-amateurs expect?
>
> Possibly there is a properly "trained statistician" in this news group
> who could straighten this hiccup out for us interested "amateurs."

1) I have no objection to software that provides extra
functions that deal with "test results" if you want; I've
written routines like that myself, for obvious reasons.
2) Excel does not appeal to statisticians. It is hard to
use Excel as evidence for proper practice. -- Especially,
considering this --

Jack Tomsky posted an earlier reply to DT's first note,

"I have found that Excel's rule for the probability tail differs
according to the distribution. Each time I use it, I have to check
whether I need to modify their result. "

Surely, even non-statisticians will detect a problem here....

If you will use groups.google on sci.stat.* and ask for EXCEL,
you will find 800+ threads mentioning the spreadsheet.
A vast number of these will object to using Excel for statistics.

(My own concern is mainly that users don't have decent
documentation on the routines, and that they can't readily
screen and modify their variables -- Screening is hard enough
to do with 'real stat packages' so it is really unfortunate that
people who are struggling worst to understand data are stuck
with the worst tools.)

Actually, statisticians don't seem to like any spreadsheets, but
Excel is the one most often named. Microsoft did not invent the
spreadsheet, but it bundled Excel and made it a cheap standard.

D. Touie

Dec 27, 2004, 5:06:37 AM

On Sun, 26 Dec 2004 21:08:27 -0500, Richard Ulrich
<Rich....@comcast.net> wrote:

>On Sat, 25 Dec 2004 03:05:20 GMT, D. Touie <dto...@tscnet.com> wrote:

UR

This note was written so strangely that it is hard to respond to.

It seems to be a response to my note, but it deleted 100%
of my words, while saving what I quoted -- I have replaced
what I quoted, with what I replied.

DT
Yes, I did delete your words. I was probably tired. Seen now in pure
message isolation it seems odd to me too.

UR

Also, what D. Touie is describing seems confused to me.

<snip>

DT

> I did do some due diligence research before posting my "complaint."
>
> Left over from the dozen or so statistics texts I used to keep on
> hand, I have two basic books left. Both have Chi-square probability
> transformation tables. One reads from right-to-left, the other
> left-to-right.

UR

"Transformation tables"? Okay, not necessarily distribution tables.

DT
For me the tables serve the primary purpose of transforming
(converting?) accumulated chi-square sums into probability amounts. Of
course the tables are based on the chi-square distribution, so they
could also be termed distribution tables.

UR

I don't understand right-to-left and left-to-right, but that
sounds like opposite approaches.

DT
They both express the same relationship between accumulated chi-square
sums and their probability amounts. They merely do it in opposite
directions on their pages.

DT

> I also consulted my nearly new TI-89 Titanium calculator. Its
> Chi-square functions work as "cumulative distribution functions"
> rather than as tables. It produces the same probability
> transformations as my two basic book tables.

UR

Confusion? Aren't the two books different?
TI-89 gives a label of "cumulative distribution function"
and yet, approaches zero for large values?

DT
As I hope you can see by now, my book tables and my TI-89 evaluate
larger chi-square sums as higher probability amounts. This is opposite to
what Excel does, and what you suggest is proper practice.

> Since I posted my "complaint," I thought of a simple experiment. I
> tried it out this morning in my Excel worksheet. I entered 10 made-up
> coin-flipping results into 2 two-bin Chi-square vectors. In the first
> vector I made all 10 results heads. In the second I made it 5 heads
> and 5 tails. I then evaluated the two vectors with the uncorrected
> Excel "Chidist()" function at 1 degree of freedom.
>
> 10 heads = Near zero probability.
>
> 5 heads & 5 tails = Exactly 1 probability.
>
> Are these the results you non-amateurs expect?

Another poster took me to task about this question. If this Excel
evaluation reflects current proper chi-square distribution tail
direction, my nit pick is the 5 heads & 5 tails probability amount
should be near 1, not exactly 1.

I did some more looking. This time on the Internet. I found four
sites, including NIST's, that plot the chi-square distribution tails
your (and Excel's) way.

My conclusion is I can evaluate either way with impunity. But it does
seem to me that what is proper practice here is wholly dependent on
what authority I rely on.

UR

2) Excel does not appeal to statisticians. It is hard to
use Excel as evidence for proper practice. -- Especially,
considering this --

DT
I used Excel in this instance to do a quick and dirty answer to the
original poster's question.

UR

Jack Tomsky posted an earlier reply to DT's first note,

"I have found that Excel's rule for the probability tail differs
according to the distribution. Each time I use it, I have to check
whether I need to modify their result. "

Surely, even non-statisticians will detect a problem here....

DT
I entirely agree.

Richard Ulrich

Dec 27, 2004, 1:18:09 PM

- Trying anew to be explicit and exact -
DT confuses me more.

On Mon, 27 Dec 2004 10:06:37 GMT, D. Touie <dto...@tscnet.com> wrote:
>
[snip, various, from him and from me.]

> DT
> As I hope you can see by now, my book tables and my TI-89 evaluate
> larger chi-square sums as higher probability amounts.

CDF -- Larger chi-square sums have larger (higher) p-values,
increasing eventually to approach 1.0.
Okay, that would be CORRECT for a "cumulative distribution
function." That's not what I thought was said before.

Or -- this post otherwise is consistent with previous posts
if "higher probability amounts" is supposed to denote p-values
that are *smaller*. Forgive me if I find that confusing, again.

> This is opposite to
> what Excel does, and what you suggest is proper practice.

And, I thought we established that Excel goes both ways.

For the *first* example from Excel --
I was explaining that "real statisticians" who have seen
a lot of theory will be very comfortable with functions
that show increasing p with increasing X^2 (or whatever),
and this is well known as the CDF.

A package with good coherency over many functions will
probably see fit to implement the CDF, especially if they
want the respect of "statisticians." Subroutine packages like
IMSL always had the CDFs, and I think that's what is
implemented in SPSS and SAS (for instance).
What *else* they implement is fine, so long as it is labeled.
What *else* is in some text book is fine with me,
so long as it is labeled. If the TI-89 clearly labels something
as the simple "cumulative distribution function" and it
approaches 0.0 as the X^2 value gets larger, well, that is
simply wrong.

By the term "statistician," I refer to people who know some
of the theory of probability and statistics. Data analysts
who use some statistical procedures are worth respecting for
what they know about real data and real problems; I was
a data analyst before I became a statistician.
- But someone else can have another preference for terms.

Bruce Weaver

Jan 3, 2005, 11:35:49 AM

D. Touie wrote:

> On Thu, 4 Nov 2004 12:54:08 -0000, "Alexis Gatt"
> <alexism...@yahoo.co.uk> wrote:
>
>
>>Hi guys,
>>
>>I have a very basic question regarding the goodness-of-fit statistic
>>chi-square. The chi-square measure between the observed data and those given
>>by the model I implemented equals 24, and there are 28 degrees of freedom.
>>According to a chi-square table I found in a book, the equivalent
>>probability value is between 0.5 and 0.75.
>>
>>And this is where I am confused. Does this mean that the model predicts the
>>data well or not? And why?
>>
>>Many thanks
>>
>>Alexis
>
>
> I apologize for being so late with this response. I only now noticed
> your original posting.
>
> I do not agree with Bruce Weaver's response to you.

My response was:

"Pearson's chi-square is based on the discrepancy between Observed and
Expected frequencies. The more O and E differ, the larger the
chi-square value becomes, and the smaller the p-value becomes.

Therefore, smaller values of chi-square (and larger values of p)
represent better fits."

What do you disagree with? I was not suggesting that smaller values of
chi-square and larger values of p *than reported by the OP* represent
good fits. It was simply a general statement: the smaller the value of
chi-square (and the larger the value of p), the better the fit.

Cheers,
Bruce

D. Touie

Jan 4, 2005, 2:12:03 AM

You seem to believe Pearson's Chi-square is a one-sided measurement
construct. It is two-sided. A probability of .01 is the same as a
probability of .99. These two probabilities usually have a different
causality, but the exact same probability meaning.

I think part of the confusion here is the traditional misnomer
"expected" value. It could more properly be described as a
"comparative" value.

For instance, if I flip a fair coin 100 times and record the outcomes
faithfully, I do not "expect" 50 heads and 50 tails to result. That
would be an unexpectedly rare result. What I would expect is about 45
heads and 55 tails to result, or equally 45 tails and 55 heads.

Bruce Weaver

Jan 4, 2005, 8:44:48 AM

D. Touie wrote:

> On Mon, 03 Jan 2005 11:35:49 -0500, Bruce Weaver
> <bwe...@lakeheadu.ca> wrote:

>>
>>What do you disagree with? I was not suggesting that smaller values of
>>chi-square and larger values of p *than reported by the OP* represent
>>good fits. It was simply a general statement: the smaller the value of
>>chi-square (and the larger the value of p), the better the fit.
>
>
> You seem to believe Pearson's Chi-square is a one-sided measurement
> construct. It is two-sided. A probability of .01 is the same as a
> probability of .99. These two probabilities usually have a different
> causality, but the exact same probability meaning.

Can you provide any examples of this? I'm afraid you lost me.

>
> I think part of the confusion here is the traditional misnomer
> "expected" value. It could more properly be described as a
> "comparative" value.
>
> For instance, if I flip a fair coin 100 times and record the outcomes
> faithfully, I do not "expect" 50 heads and 50 tails to result. That
> would be an unexpectedly rare result. What I would expect is about 45
> heads and 55 tails to result, or equally 45 tails and 55 heads.
>

You've lost me again. If the coin is fair (which is the hypothesis
being tested), the probability of exactly 50 heads and 50 tails would be
about 0.08, which is not *that* rare. And it is the single outcome with
the highest probability. The probability of exactly 45 heads (and 55
tails), for example, is about 0.05. We all know that the probability of
any single outcome will not be extremely high (especially as N gets
large). But the expected value of a binomial distribution is still N*p.
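These figures can be checked directly from the binomial probability C(n, k) p^k (1 - p)^(n - k). A minimal sketch, assuming a fair coin and Python 3.8+ for math.comb; binom_pmf is a hypothetical helper name:

```python
from math import comb


def binom_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 100
print(round(binom_pmf(50, n), 4))  # exactly 50 heads
print(round(binom_pmf(45, n), 4))  # exactly 45 heads

# The single most likely count coincides with the expected value N*p.
mode = max(range(n + 1), key=lambda k: binom_pmf(k, n))
print(mode, n * 0.5)
```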

Richard Ulrich

Jan 4, 2005, 1:57:23 PM

Well, to some extent you can extrapolate and say that any
distribution does have two ends.

However, that's pretty misleading to pretend that chisquared
isn't used in a particular way in Goodness of Fit testing,
where it is almost always "one-sided" in the given sense --
0.999 of the time, or 4 nines, or 5 nines?

"The better the fit" is still true at the low-value extreme of
the chi-squared, where the (extremely rare) hypothesis
would ask whether the observed fit is TOO GOOD to
be accidental.
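The "too good" question looks at the left tail rather than the right. A minimal sketch for 28 degrees of freedom, using the even-df Poisson-sum form of the chi-square tail (so it only covers even df); the function name is hypothetical, and the statistic values 10 and 45 are made up for illustration:

```python
import math


def chi2_sf_even_df(x, df):
    """Right-tail P(X > x) for chi-square with even df, via a Poisson sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    lam = x / 2.0
    term, total = math.exp(-lam), 0.0
    for i in range(df // 2):
        total += term
        term *= lam / (i + 1)
    return total


df = 28
for stat in (10.0, 24.0, 45.0):
    upper = chi2_sf_even_df(stat, df)  # small upper tail: poor fit (the usual test)
    lower = 1.0 - upper                # small lower tail: fit suspiciously good
    print(stat, round(lower, 4), round(upper, 4))
```

A statistic far below the degrees of freedom gives a tiny left tail, which is the rare case where one asks whether the fit is too good to be accidental.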

>
> I think part of the confusion here is the traditional misnomer
> "expected" value. It could more properly be described as a
> "comparative" value.
>
> For instance, if I flip a fair coin 100 times and record the outcomes
> faithfully, I do not "expect" 50 heads and 50 tails to result. That
> would be an unexpectedly rare result. What I would expect is about 45
> heads and 55 tails to result, or equally 45 tails and 55 heads.

Is there a point here worth making?

D. Touie

Jan 5, 2005, 1:47:50 AM

On Tue, 04 Jan 2005 13:57:23 -0500, Richard Ulrich
<Rich....@comcast.net> wrote:

>On Tue, 04 Jan 2005 07:12:03 GMT, D. Touie <dto...@tscnet.com> wrote:

>> You seem to believe Pearson's Chi-square is a one-sided measurement
>> construct. It is two-sided. A probability of .01 is the same as a
>> probability of .99. These two probabilities usually have a different
>> causality, but the exact same probability meaning.

>Well, to some extent you can extrapolate and say that any
>distribution does have two ends.

The word "magnanimous" comes to me at this moment.

>However, that's pretty misleading to pretend that chisquared
>isn't used in a particular way in Goodness of Fit testing,
>where it is almost always "one-sided" in the given sense --
>0.999 of the time, or 4 nines, or 5 nines?

It may be misleading for you. Apparently your usual Chi-square work is
one-sided, and you take that as a given. At this moment I am working
with a long-term project where both sides of the Chi-square curve are
in play. What a surprise that I see this differently than you.

>"The better the fit" is still true at the low-value extreme of
>the chi-squared, where the (extremely rare) hypothesis
>would ask whether the observed fit is TOO GOOD to
>be accidental.

The "optimal fit" is found at a probability of .50 in every instance. I fully
agree with your final clause above. Can you see how the two clauses in
the above sentence, taken together, would confound me?
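The point where both tails equal .50 is the median of the chi-square distribution, which for 28 degrees of freedom is a value of the statistic a little below 28. A minimal bisection sketch, assuming an even df so the Poisson-sum tail applies; both function names are hypothetical:

```python
import math


def chi2_sf_even_df(x, df):
    """Right-tail P(X > x) for chi-square with even df, via a Poisson sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    lam = x / 2.0
    term, total = math.exp(-lam), 0.0
    for i in range(df // 2):
        total += term
        term *= lam / (i + 1)
    return total


def chi2_median_even_df(df, tol=1e-10):
    """Statistic value where the left and right tails both equal 0.5 (bisection)."""
    lo, hi = 0.0, 10.0 * df + 100.0  # the median lies well inside this bracket
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf_even_df(mid, df) > 0.5:
            lo = mid  # more than half the mass lies above mid: move right
        else:
            hi = mid
    return (lo + hi) / 2.0


median = chi2_median_even_df(28)
print(round(median, 2))
```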

>> I think part of the confusion here is the traditional misnomer
>> "expected" value. It could more properly be described as a
>> "comparative" value.

>> For instance, if I flip a fair coin 100 times and record the outcomes
>> faithfully, I do not "expect" 50 heads and 50 tails to result. That
>> would be an unexpectedly rare result. What I would expect is about 45
>> heads and 55 tails to result, or equally 45 tails and 55 heads.

>Is there a point here worth making?

If you cannot see it, then I think it does not exist for you.

D. Touie

Jan 5, 2005, 12:51:53 PM

On Tue, 04 Jan 2005 08:44:48 -0500, Bruce Weaver
<bwe...@lakeheadu.ca> wrote:

>D. Touie wrote:

>> On Mon, 03 Jan 2005 11:35:49 -0500, Bruce Weaver
>> <bwe...@lakeheadu.ca> wrote:

>>>What do you disagree with? I was not suggesting that smaller values of
>>>chi-square and larger values of p *than reported by the OP* represent
>>>good fits. It was simply a general statement: the smaller the value of
>>>chi-square (and the larger the value of p), the better the fit.

>> You seem to believe Pearson's Chi-square is a one-sided measurement
>> construct. It is two-sided. A probability of .01 is the same as a
>> probability of .99. These two probabilities usually have a different
>> causality, but the exact same probability meaning.

>Can you provide any examples of this? I'm afraid you lost me.

The problem posed by the original poster in this thread: Chi-square
sum of 24 with 28 degrees of freedom = about .68 probability evaluated
the "professional" way. This result is precisely equivalent to 1 - .68
= about .32 probability for goodness-of-fit testing purposes. In a
goodness-of-fit test, we aim for the middle ground surrounding .50
probability; not the probability extremes, either high or low.

>> I think part of the confusion here is the traditional misnomer
>> "expected" value. It could more properly be described as a
>> "comparative" value.
>>
>> For instance, if I flip a fair coin 100 times and record the outcomes
>> faithfully, I do not "expect" 50 heads and 50 tails to result. That
>> would be an unexpectedly rare result. What I would expect is about 45
>> heads and 55 tails to result, or equally 45 tails and 55 heads.

>You've lost me again. If the coin is fair (which is the hypothesis
>being tested), the probability of exactly 50 heads and 50 tails would be
>about 0.08, which is not *that* rare. And it is the single outcome with
>the highest probability. The probability of exactly 45 heads (and 55
>tails), for example, is about 0.05. We all know that the probability of
>any single outcome will not be extremely high (especially as N gets
>large). But the expected value of a binomial distribution is still N*p.

Apparently you buy into that fair coin teaching nonsense from
beginning college statistics. I do not. I would not think of testing
the hypothesis you hypothesize above.

I see this kind of testing as a fundamental "ordinary sample, or
unusual sample" opportunity. Also it provides me with useful grounding
in my larger goal of measuring randomness.

I made a table of all the outcomes from a 100 coin-flips trial. Here
is a partial summary of that table:

49/51 split = .156 probability
48/52 " = .147
47/53 " = .133
46/54 " = .116
45/55 " = .097

50/50 " = .080

44/56 " = .078

Sixth most common predicted result is rare enough for government work.

I did the 45/55 split estimate off the top of my head from a very old
memory. It turns out I was one off in that estimate. A 46/54 split is
the typical mid-value split I would anticipate from an ordinary 100
coin-flips trial.
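The figures in this table are reproduced if every unequal split pools both orders (49/51 and 51/49) while the 50/50 split is counted once. A minimal sketch with a hypothetical helper name, assuming a fair coin and 100 flips:

```python
from math import comb


def split_probability(heads, n=100):
    """P(heads/tails split) for a fair coin, pooling a/b with b/a as the table does."""
    p = comb(n, heads) / 2**n
    return p if 2 * heads == n else 2 * p  # 50/50 has no mirror-image split


rows = [(k, split_probability(k)) for k in range(50, 43, -1)]
for k, prob in sorted(rows, key=lambda row: row[1], reverse=True):
    print(f"{k}/{100 - k} split = {prob:.3f}")
```

Counted per individual outcome (without pooling), exactly 50 heads remains the single most likely result; the ordering in the table comes from the doubling.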

Mack

Jan 5, 2005, 6:47:54 PM

A 50/50 split might not be common but an 8% chance is still pretty
high. If I was testing if a coin was fair, a 50/50 split would
certainly not be negative evidence. A chi square value of zero
indicates perfect fit with the estimated distribution. In some cases a
perfect fit would be suspicious; in others it is not. In the coin toss
case a perfect fit is not suspicious.

The table you use above assumes that 49 tails/51 heads has the
same meaning as 51 tails/49 heads, which is a bit misleading.
To see why this is so extrapolate to the case of 1/99. This is
a strong indication of bias, but also in a particular direction.
50 tails/50 heads is still more common than either 49 tails/51 heads
or 51 tails/49 heads, although both of those have the same chi square.
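With two bins, the Pearson statistic is the square of a signed z-score, so squaring throws away the direction of the deviation. A minimal sketch, assuming a fair coin and n = 100; coin_stats is a hypothetical name:

```python
import math


def coin_stats(heads, n=100):
    """Pearson X^2 (1 df) and signed z-score for a heads count under a fair coin."""
    expected = n / 2.0
    tails = n - heads
    chi2 = (heads - expected) ** 2 / expected + (tails - expected) ** 2 / expected
    z = (heads - expected) / math.sqrt(n * 0.25)  # the sign shows the direction of bias
    return chi2, z


for heads in (49, 51, 1, 99):
    chi2, z = coin_stats(heads)
    print(heads, chi2, z)
```

49 and 51 heads give the same X^2 but z-scores of opposite sign; the same holds for 1 and 99 heads.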

50/50 is the expected mean. You seem to be expecting a
specific number rather than a range. Since there are two bins,
there is one degree of freedom, and a chi-square of 3.84 or larger
would indicate that the coin is not fair (at the .05 significance level).
Since there are only two bins, each bin contributes half, so
the value from one bin is 1.92. Since the expected count in each bin
is 50, we can multiply 1.92 by 50 and take the square root to get a
limit of about 9.8. If fewer than 41 heads or more than 59 heads are
tossed, we have evidence against the null hypothesis that the coin is
fair.
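A minimal Python sketch of the two-bin calculation above, assuming a fair-coin null with 50 expected heads and 50 expected tails out of 100, and using the 3.84 critical value for 1 d.f. at the .05 level. For 1 d.f. the upper-tail probability has the closed form erfc(sqrt(x/2)), so only the standard library is needed.

```python
from math import erfc, sqrt

N = 100                 # tosses
EXPECTED = N / 2        # expected heads (and tails) under a fair coin
CRITICAL_1DF = 3.84     # chi-square critical value, 1 d.f., alpha = .05

def coin_chi_square(heads: int, n: int = N) -> float:
    """Pearson chi-square for the two bins (heads, tails) against a fair coin."""
    e = n / 2
    tails = n - heads
    return (heads - e) ** 2 / e + (tails - e) ** 2 / e

def p_value_1df(x: float) -> float:
    """Upper-tail probability of a chi-square with 1 d.f."""
    return erfc(sqrt(x / 2))

# Each bin carries half of the critical value (1.92); solve (O - E)^2 / E = 1.92.
limit = sqrt(CRITICAL_1DF / 2 * EXPECTED)
rejected = [h for h in range(N + 1) if coin_chi_square(h) >= CRITICAL_1DF]
low = max(h for h in rejected if h < EXPECTED)
high = min(h for h in rejected if h > EXPECTED)
print(f"limit = {limit:.2f}; reject when heads <= {low} or heads >= {high}")
print(f"p-value for a 50/50 split = {p_value_1df(coin_chi_square(50)):.2f}")
```

40 or 60 heads give a chi-square of 4.0, just past 3.84, while 41 or 59 give 3.24, which matches the "fewer than 41 or more than 59" rule. A 50/50 split gives a chi-square of 0 and a p-value of 1.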

There are a number of other ways to calculate this, and this certainly
isn't the best. A chi-square value isn't proof, only evidence. My
statistics professor used to say you can't prove anything with
statistics, but you can draw conclusions from it.

This is equivalent to counting the ones in a bit stream. There will
be a certain mean and a deviation from the mean. If you are testing
randomness, then you would expect a normal distribution about
the mean and test for that, rather than relying on a single value.
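A sketch of looking at the whole distribution of ones-counts rather than a single value, assuming Python's `random.getrandbits` as the bit source and blocks of 100 bits (both choices are ours, for illustration). For unbiased, independent bits the count per block follows a binomial distribution, approximately normal, with mean n/2 = 50 and standard deviation sqrt(n)/2 = 5; as a first step the sketch compares the observed mean and spread against those values.

```python
import random
from statistics import mean, pstdev

def ones_counts(bits_source, n_bits: int = 100, n_blocks: int = 10_000) -> list[int]:
    """Count the 1 bits in each of n_blocks blocks of n_bits bits."""
    return [bin(bits_source(n_bits)).count("1") for _ in range(n_blocks)]

rng = random.Random(2005)  # fixed seed so the run is repeatable
counts = ones_counts(rng.getrandbits)

n_bits = 100
print(f"observed mean = {mean(counts):.2f} (expected {n_bits / 2})")
print(f"observed sd   = {pstdev(counts):.2f} (expected {n_bits ** 0.5 / 2})")
```

A fuller test would bin the counts and run a chi-square goodness-of-fit against the binomial frequencies; the mean and standard deviation are the quickest places where a bad generator shows up.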

Certain PRNGs fail this test miserably if the low bits are used, since
the deviation is too low. Real random number generators usually
have bias, while PRNGs by definition have correlation. The tests best
able to detect specific problems will vary. The coin toss case
is a physical system, and we can safely assume no memory of the
previous toss (no correlation). However, we could easily suspect bias
and test for that. Some research on coin tossing, which is actually a
chaotic system rather than a source of quantum randomness, concluded
that there is bias depending on the coin and the tossing conditions.
For example, with a mechanical coin flipper they could consistently
make the coin land a certain way.
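A sketch of the low-bit failure, using a 32-bit linear congruential generator with multiplier 1664525 and increment 1013904223 (constants often quoted from Numerical Recipes; chosen here only for illustration). Because the multiplier and increment are both odd and the modulus is a power of two, the lowest bit simply alternates 0, 1, 0, 1, so every 100-bit block holds exactly 50 ones: the deviation is zero, a fit that is "too good".

```python
from statistics import pstdev

def lcg(seed: int, a: int = 1664525, c: int = 1013904223, m: int = 2 ** 32):
    """Yield successive states of a linear congruential generator."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def low_bit_counts(seed: int, n_bits: int = 100, n_blocks: int = 1_000) -> list[int]:
    """Count ones in blocks built from the lowest bit of each LCG output."""
    gen = lcg(seed)
    counts = []
    for _ in range(n_blocks):
        counts.append(sum(next(gen) & 1 for _ in range(n_bits)))
    return counts

counts = low_bit_counts(seed=12345)
print(f"distinct counts: {sorted(set(counts))}, sd = {pstdev(counts):.2f}")
```

Any odd multiplier and odd increment modulo a power of two behave the same way in the lowest bit, which is why the high bits of such generators are preferred.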

Aleks Jakulin

Jan 6, 2005, 6:42:27 AM1/6/05

D. Touie:

> You seem to believe Pearson's Chi-square is a one-sided measurement
> construct. It is two-sided. A probability of .01 is the same as a
> probability of .99. These two probabilities usually have a different
> causality, but the exact same probability meaning.

Although you have a point, I'd express it differently:

The chi-squared X^2 *statistic* is a kind of distortion measure
which doesn't distinguish between observing 0.6 instead of the
expected 0.5 and observing 0.4 instead of the expected 0.5.

Note that this is different with G^2.
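A small Python sketch of that difference, using the usual definitions X^2 = sum (O - E)^2 / E and G^2 = 2 sum O ln(O / E). Per cell, X^2 charges the same 2.0 for 60 or 40 observed against 50 expected, while the G^2 terms differ in size and sign. With two symmetric bins the G^2 totals still coincide, so the second comparison uses unequal expected counts (20/80, our own example) to show the totals separating.

```python
from math import log

def x2_term(o: float, e: float) -> float:
    """Pearson contribution of one cell."""
    return (o - e) ** 2 / e

def g2_term(o: float, e: float) -> float:
    """Likelihood-ratio contribution of one cell; 0 * ln(0) is taken as 0."""
    return 2 * o * log(o / e) if o > 0 else 0.0

def x2(observed, expected):
    return sum(x2_term(o, e) for o, e in zip(observed, expected))

def g2(observed, expected):
    return sum(g2_term(o, e) for o, e in zip(observed, expected))

# Single cell: 60 vs 40 observed where 50 was expected.
print(x2_term(60, 50), x2_term(40, 50))                       # identical
print(round(g2_term(60, 50), 2), round(g2_term(40, 50), 2))   # different

# Whole table, expected 20/80: observe 30/70 vs 10/90 (both 10 off).
expected_counts = (20, 80)
for obs in ((30, 70), (10, 90)):
    print(obs, "X2 =", round(x2(obs, expected_counts), 3),
          "G2 =", round(g2(obs, expected_counts), 3))
```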

--
mag. Aleks Jakulin
http://www.ailab.si/aleks/
Artificial Intelligence Laboratory,
Faculty of Computer and Information Science,
University of Ljubljana, Slovenia.

Aleks Jakulin

Jan 6, 2005, 6:47:05 AM1/6/05

> B.Weaver wrote:
>>
>>What do you disagree with? I was not suggesting that smaller values
>>of chi-square and larger values of p *than reported by the OP*
>>represent good fits. It was simply a general statement: the
>>smaller the value of chi-square (and the larger the value of p), the
>>better the fit.
>
> You seem to believe Pearson's Chi-square is a one-sided measurement
> construct. It is two-sided. A probability of .01 is the same as a
> probability of .99. These two probabilities usually have a different
> causality, but the exact same probability meaning.

I guess the disagreement is because D. Touie takes 'p' to be the
probability, while B. Weaver takes 'p' to be the p-value...

Aleks

Richard Ulrich

Jan 6, 2005, 8:15:05 PM1/6/05

On Wed, 05 Jan 2005 17:51:53 GMT, D. Touie <dto...@tscnet.com> wrote:

[snip]

>
> The problem posed by the original poster in this thread: Chi-square
> sum of 24 with 28 degrees of freedom = about .68 probability evaluated
> the "professional" way. This result is precisely equivalent to 1 - .68
> = about .32 probability for goodness-of-fit testing purposes.

Now that you have heard of it, can you start using "CDF" to
make your posts intelligible? I don't know whether <the above>
is right in any sense at all, or all turned around again.

The expected value of chi-squared is its d.f., and for large d.f.
the median is close to the d.f. as well.
Thus, a value of the test statistic that is less than the d.f. is
on the low side of the cumulative distribution function;
I assume that this is what is meant by the "professional" way,
since Touie was offended when I pointed out that statistics
"professionals" are not surprised by stat-software that
provides the (unambiguous) CDF instead of a dubious p-level.
(Excel confusions apparently just struck again.)

Thus, the GOF p-value (upper tail) is going to be 0.68, and that is
the value 99.99% of testers would care about and use, ignoring
the low end of chi-squared.
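For readers without tables at hand, a Python sketch that computes the OP's numbers exactly. It relies on a standard identity that holds only for an even number of degrees of freedom k: the upper tail P(X > x) equals the probability that a Poisson variable with mean x/2 is at most k/2 - 1. With x = 24 and 28 d.f. this gives a CDF of about 0.32 and an upper-tail p-value of about 0.68.

```python
from math import exp

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square with an even, positive d.f.

    Uses P(X > x) = sum_{j=0}^{df/2 - 1} e^(-x/2) (x/2)^j / j!
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form needs an even, positive number of d.f.")
    half = x / 2.0
    term = exp(-half)  # j = 0 term
    total = term
    for j in range(1, df // 2):
        term *= half / j  # (x/2)^j / j! built up incrementally
        total += term
    return total

p = chi2_sf_even_df(24.0, 28)
print(f"chi-square = 24 on 28 d.f.: CDF = {1 - p:.2f}, upper-tail p-value = {p:.2f}")
```

For odd d.f. a library routine (or the regularized incomplete gamma function) is needed instead.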

> In a
> goodness-of-fit test, we aim for the middle ground surrounding .50
> probability; not the probability extremes, either high or low.
>

That's glib, and practically wrong. Usually, there's no reason
to worry about the goodness of fit being "too good", so
the closer the CDF is to 0.0 (the p-value to 1.0), the better.

- I know that I have never seen anyone formalize a concern
about the fit being "too good" in the absence of explanatory
variables.

- I do know that one *reason* for a fit that is "too good"
is that the GOF problem may have been set up wrong.

For instance, I have seen someone try to interpret the
test "with 28 degrees of freedom" that the computer
program gives, even though half of those contrasts were
fixed at zero. My point is that an informed statistician
would have known that the "24.0" was owing to (say) only 14 d.f.,
and so did potentially indicate lack of fit.
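A Monte Carlo sketch of why the d.f. matter here: the same statistic of 24.0 is unremarkable against 28 d.f. but lands in the upper tail against 14 d.f. It draws chi-square variates with Python's `random.gammavariate(df / 2, 2)`, since a chi-square with k d.f. is a gamma distribution with shape k/2 and scale 2; the number of draws and the seed are arbitrary choices.

```python
import random

def upper_tail_mc(stat: float, df: int, n_draws: int = 100_000, seed: int = 1) -> float:
    """Estimate P(X >= stat) for X ~ chi-square(df) by simulation."""
    rng = random.Random(seed)
    hits = sum(rng.gammavariate(df / 2, 2) >= stat for _ in range(n_draws))
    return hits / n_draws

for df in (28, 14):
    print(f"24.0 on {df} d.f.: estimated upper-tail p = {upper_tail_mc(24.0, df):.3f}")
```

With 28 d.f. the estimate sits near .68; with 14 d.f. it falls near .05, which is the sense in which the same 24.0 could indicate lack of fit.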

--
Rich Ulrich, wp...@pitt.edu
http://www.pitt.edu/~wpilib/index.html