Scope of Fisher's Exact test


Margaret

Jan 11, 2006, 1:19:57 PM1/11/06
to MedStats
Dear all

I am a little confused by seemingly conflicting messages I am receiving
concerning when to use Fisher's Exact test. I am clear on the
condition for the Chi-Square test of association relating to the need
for at least 80% of the expected frequencies to be of size 5 or more
and the usefulness of Fisher's Exact test where this condition is
violated. However, my query is specifically on whether the latter test
can be generalized to cover cases where the contingency table is larger
than 2 x 2. I had assumed that this was okay, as SPSS appears to output a
p-value for Fisher's Exact test for these larger tables. Furthermore, on
page 64 of 'Medical Statistics at a Glance' (2004), I read 'Some computer
packages will compute Fisher's exact P-values for larger contingency
tables', where 'larger' appears
to mean larger than 2 x 2. Nevertheless, all other references which I
have found thus far stipulate that Fisher's Exact test is specifically
for the 2 x 2 case.

I would greatly value some assistance here.

Many thanks

Best wishes

Margaret

Ted Harding

Jan 11, 2006, 2:57:53 PM1/11/06
to MedS...@googlegroups.com

Fisher's exact test, in its original form, was for 2x2 tables
(and in that case it is relatively simple to grasp). It was
subsequently generalised to larger tables (I don't think it
suddenly burst on the world, but Mehta and Patel (see below)
seem to attribute the definitive exposition to Freeman and
Halton (1951)).[1]

In all cases, however, the sample space with respect to which
the probabilities of possible outcomes are calculated is the
set of all possible re-arrangements of the items within the
cells such that the marginal totals are preserved.

In this respect, even for 2x2 tables, it is conceptually,
and not merely procedurally, different from the usual
chi-squared test (which does not condition on the marginal
totals).

In the 2x2 case, the test statistic for the Fisher test
is (or at any rate is equivalent to) the value of the number
of cases in cell (1,1), all others being thereby determined.
Also, this is equivalent to the odds-ratio (a*d)/(b*c) in
the data (since as a increases, so does d; while b decreases
and so does c; hence increasing a increases the OR). And
the distribution of "a" depends on one parameter, the odds
ratio R = (pA*pD)/(pB*pC) in the parent population; the null
hypothesis is that R = 1; and from the data you can derive a
confidence interval for R. The straightforward correspondence
between sample odds-ratio and population odds-ratio makes
interpretation simple.

The measure of discrepancy between the data and the null
hypothesis can be taken as the difference between (a*d)/(b*c)
and 1; and the significance of this value is the probability
in the tail[s] of the null distribution at and beyond this
value of discrepancy.

(Here of course "a" and "pA" denote number, and population
probability, for cell (1,1); "b", "pB" for (1,2); "c" and
"pC" for (2,1); and "d" and "pD" for (2,2)).

Once you get above 2x2, however, this is not so straightforward.
First, with the marginals fixed, the table no longer depends
on fixing the value of a single cell. E.g. take the 2x3 table:

1 3 4

4 1 2

with column margins 5 4 6 and row margins 8 7.

You can now vary (e.g.) both (1,1) and (1,2) independently
of each other (within limits). For example, (a) change (1,1)
but keep (1,2) fixed, preserving marginals:

2 3 3

3 1 3

and (b) with the new value of (1,1) now change (1,2):

2 2 4

3 2 2

So there are two dimensions to the departure from the NH
of independence. So what, in this case, are you going to
adopt as your measure of discrepancy between the data and
the null hypothesis? There are infinitely many possibilities,
depending on how you choose to combine the departure of
N(1,1) from its expected value under independence, and
the departure of N(1,2) from its expectation (or, at your
choice, any other two cells). Since the P-value is a single
number, this must correspond to a single dimension of discrepancy;
and your choice of discrepancy measure corresponds to how
you will calculate the P-value (and therefore will determine
the P-value you get from the one data set).

The general principle is that the set of all possible contingency
tables that are consistent with the fixed marginal totals is
ranked according to a measure of discrepancy, and the "P-value"
is the sum of the probabilities of the tables that are at
or beyond the rank of the observed table.

There are therefore a lot of choices for this ranking. You
can download a technical paper from the Cytel website (vendors
of the StatXact package) which gives a good theoretical survey
of this issue, and lists possible approaches. A lot depends
on the nature of the categories represented by the rows and
columns -- for instance, do they have a natural ordering
(ordinal variables)?; are they essentially unordered (purely
categorical)?; are you interested in trends along ordered
categories? and so on.

See

Exact Inference for Categorical Data (PDF 231KB)
Cyrus R. Mehta and Nitin R. Patel
Harvard University and Cytel Inc.
January 1, 1997

http://www.cytel.com/Papers/sxpaper.pdf

The P-value you get will depend on which of these or similar
options you have adopted.

As to the computation, while technically difficult (efficient
algorithms for working through the possible tables consistent
with fixed margins are tricky), the strategy is straightforward:
For each possible table, evaluate its probability under the
null hypothesis, and also its "discrepancy value" relative
to the NH. When that is done, arrange the tables in order
according to discrepancy, find the observed one, and add up
the probabilities for the observed one and all tables more
discrepant. The result is the P-value.
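That strategy can be sketched in a few dozen lines of Python (a brute-force illustration of my own, feasible only for small tables; production packages such as StatXact use far cleverer network algorithms). Here the discrepancy measure is the table's own null probability, i.e. the Freeman-Halton ranking; the function names are mine, not from any package.

```python
from math import factorial

def table_prob(table, row_sums, col_sums, n):
    """Null probability of a table conditional on its margins
    (the multiple hypergeometric formula)."""
    num = 1.0
    for r in row_sums:
        num *= factorial(r)
    for c in col_sums:
        num *= factorial(c)
    den = factorial(n)
    for row in table:
        for x in row:
            den *= factorial(x)
    return num / den

def rows_with_sum(total, caps):
    """All non-negative integer rows summing to `total`, cell i <= caps[i]."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield (total,)
        return
    for x in range(min(total, caps[0]) + 1):
        for rest in rows_with_sum(total - x, caps[1:]):
            yield (x,) + rest

def all_tables(row_sums, col_rem):
    """Every table with the given row sums and (remaining) column totals."""
    if len(row_sums) == 1:
        yield (tuple(col_rem),)     # last row is forced by the margins
        return
    for first in rows_with_sum(row_sums[0], col_rem):
        rem = [c - x for c, x in zip(col_rem, first)]
        for rest in all_tables(row_sums[1:], rem):
            yield (first,) + rest

def exact_rxc_pvalue(observed):
    """Sum the null probabilities of every table (same margins) that is
    no more probable than the observed one."""
    row_sums = [sum(r) for r in observed]
    col_sums = [sum(c) for c in zip(*observed)]
    n = sum(row_sums)
    p_obs = table_prob(observed, row_sums, col_sums, n)
    total = 0.0
    for t in all_tables(row_sums, col_sums):
        p = table_prob(t, row_sums, col_sums, n)
        if p <= p_obs * (1 + 1e-9):   # at or beyond the observed rank
            total += p
    return total
```

Run on the 2x3 table above, exact_rxc_pvalue([[1, 3, 4], [4, 1, 2]]) enumerates every table with margins (8, 7) and (5, 4, 6) and returns the corresponding exact p-value; substituting chi-squared (or any other statistic) for the probability ranking gives a different member of the family of exact tests.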

As to what options would be available to you when you embark
on the computation, well that depends on what your software
offers you. (Doing it by hand, except for very tiny tables,
is NOT recommended ... ).

To come back more specifically to your original question:
What options does SPSS offer you?

A commonly adopted discrepancy is to rank the tables by
their very probabilities under the NH (the smaller the probability,
the more discrepant); another is to evaluate chi-squared,
and rank according to this (but get the P-value by the exact
method), and so on. But you can use analogues of other tests
too. See the Mehta-Patel paper above for details.

But the short answer is: Yes, there are RxC versions of the
Fisher Exact 2x2 table test for R and/or C greater than 2;
and there are lots of them (all exact in the sense that
the P-value is exact in terms of the probabilities implied
by the null hypothesis).

Hoping this helps,
Ted.

[1] Freeman GH, Halton JH (1951). Note on an exact treatment
of contingency, goodness of fit and other problems of
significance.
Biometrika 38:141-149.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 11-Jan-06 Time: 19:57:48
------------------------------ XFMail ------------------------------

Margaret

Jan 12, 2006, 5:51:49 AM1/12/06
to MedStats
Many thanks, Ted for this tremendous and highly generous reply.
Presumably, where we read, 'Fisher's Exact Test' in the row of a table
of output obtained by running SPSS we are obtaining what you describe
as a p-value found by ranking tables "by their very probabilities
under the NH" (particularly, since the table
row above this one already provides a p-value for the Pearson
Chi-Square test both in the usual way and by means of an Exact method).
It is a little bit of a concern to me that when you right-click on one
of these tables for a greater than 2 x 2 case, you read that Fisher's
Exact test is for the 2 x 2 case. It would be good to know for sure
that when SPSS generates such a table for a 3 x 4 case, say, the
p-value generated for Fisher's Exact test is truly reliable and is
derived using the 'ranking tables' method you competently describe.

Many thanks

Best wishes

Margaret

Timot...@iop.kcl.ac.uk

Jan 12, 2006, 6:12:56 AM1/12/06
to MedS...@googlegroups.com
My belief is that the 'ranking table' method as described by Ted is what
you obtain if you specify the 'Exact' option in your SPSS crosstabulation,
and the measure of discrepancy, I believe, is the normal Pearson X2
statistic. The Fisher's exact test, in a 2 x 2 case, very often matches
this probability exactly, but not always. For tables larger than 2 x 2, I
have less experience, but the Fisher-Freeman-Halton test does have a very
complicated formula, according to the Stata manual, and usually does not
match that obtained by the Exact method.

My opinion in choosing these tests is - don't worry too much. Choose one
and stick to it. There's nothing worse than using Fisher's on one
comparison, and switching to Pearson on the other, and using the exact on
the third. The reason I said this is that the choice between Fisher's and
Pearson on a 2 x 2 table is already a century old debate concerning
whether you should condition on the margins in the 'usual' observational
study case. As a result, I'm pretty sure there is no 'absolutely right'
way of analysing an r x c table.

Tim


"Margaret" <Margaret....@ed.ac.uk>
Sent by: MedS...@googlegroups.com
12/01/2006 10:51
Please respond to
MedS...@googlegroups.com


To
"MedStats" <MedS...@googlegroups.com>
cc

Subject
MEDSTATS: Re: Scope of Fisher's Exact test

Ted Harding

Jan 12, 2006, 7:26:22 AM1/12/06
to MedS...@googlegroups.com
On 12-Jan-06 Timot...@iop.kcl.ac.uk wrote:
>
> My belief is that the 'ranking table' method as described
> by Ted is what you obtain if you specify the 'Exact' option
> in your SPSS crosstabulation, and the measure of discrepancy,
> I believe, is the normal Pearson X2 statistic.

I suspect so, too, but I'm not familiar with SPSS nor do I
have the documentation, so cannot check. It would certainly
be the most straightforward and computationally lightest!

> The Fisher's exact test, in a 2 x 2 case, very often matches
> this probability exactly, but not always. For tables larger
> than 2 x 2, I have less experience, but the Fisher-Freeman-Halton
> test does have a very complicated formula, according to the Stata
> manual, and usually does not match that obtained by the Exact method.
>
> My opinion in choosing these tests is - don't worry too much.
> Choose one and stick to it. There's nothing worse than using
> Fisher's on one comparison, and switching to Pearson on the
> other, and using the exact on the third.

That's probably pretty sound pragmatic advice!

> The reason I said this is that the choice between Fisher's
> and Pearson on a 2 x 2 table is already a century old debate
> concerning whether you should condition on the margins in the
> 'usual' observational study case.

Well, make it 80 years, since Fisher proposed his approach
in 1925! But yes, that is one of the big issues (as I said
previously, the Fisher method is "conceptually different"
from the chi-squared, for precisely this reason).

Indeed, Fisher originally devised his method (using the
probability of the table conditional on the margins) to
remove the "nuisance parameters" inherent in the situation,
namely the marginal probabilities (which the exact distribution
of, say, Pearson's chi-squared depends on, though this dependency
fades away asymptotically in large samples). From that point
of view, it is an ingenious solution.

But, in his exposition, he was obliged to acknowledge that data
obtained in a way that enforces fixed marginals is likely to be
uncommon, and came up with his celebrated example of the
"tea-tasting experiment", on the lines of:

A lady claims to be able to tell by tasting whether
the milk is poured first into a cup of tea, and then
the tea, or tea first and then milk.

So an experiment is devised to test this claim.
10 cups of tea are prepared, 5 with milk first
and 5 with tea first, and they are presented in
random order to the lady, who has been told that
there are 5 of each. For each cup, she must decide
either "milk first" or "tea first" (no "undecided").

The data can therefore be represented as a 2x2 table,
according to the true situation (the rows) and the
lady's decision (the columns).

The row-margins are certainly fixed, at 5 each, because
that is how the experiment is designed. On the assumption
that the lady will use her knowledge of the design in
making her decisions, one can expect that she will also
decide on 5 of each. Hence (ideally ... ) the column
margins are also fixed at 5 each. In this situation,
the conditions underlying Fisher's exact test will be
satisfied in all such experiments, and so the sample
space of Fisher's test is realistic.
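On that conditional sample space, the number of "milk first" cups the lady identifies correctly follows a hypergeometric distribution, which is easy to tabulate (a quick illustration of my own, not from the thread):

```python
from math import comb

# The lady picks 5 of the 10 cups to call "milk first"; under the null
# hypothesis every one of C(10,5) = 252 selections is equally likely.
# P(k correct among the 5 true milk-first cups) is hypergeometric:
dist = {k: comb(5, k) * comb(5, 5 - k) / comb(10, 5) for k in range(6)}

# One-sided exact p-value for a perfect score (all 10 cups right):
p_perfect = dist[5]   # 1/252, about 0.004
```

So even a faultless performance only just reaches conventional significance, which is part of what made the example memorable.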


> As a result, I'm pretty sure there is no 'absolutely
> right' way of analysing a r x c table.

And the debate rumbles on! I like the 2x2 table problem,
since it is almost the very simplest statistical problem
that you could possibly state. Yet the worms which emerge
from the can when you try to open it are some of the
largest, slimiest and most wriggly[1] in the whole of the
theory of statistical inference.

Best wishes to all,
Ted.

[1] This reminds me that the late (and prematurely deceased)
Michael Sampford used to delight in telling how, once
upon a time (pre-1960s) the Royal Statistical Society
was run by two formidably efficient secretaries whose
names were Miss Wrigley and Miss Crawley.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861

Date: 12-Jan-06 Time: 12:26:18
------------------------------ XFMail ------------------------------

John Whittington

Jan 12, 2006, 9:52:00 AM1/12/06
to MedS...@googlegroups.com
At 12:26 12/01/06 +0000, Ted Harding wrote (in small part):

> So an experiment is devised to test this claim.
> 10 cups of tea are prepared, 5 with milk first
> and 5 with tea first, and they are presented in
> random order to the lady, who has been told that
> there are 5 of each. For each cup, she must decide
> either "milk first" or "tea first" (no "undecided").

As a total aside to the discussion in hand, my attention was grabbed by
this description of Fisher's "celebrated tea-tasting experiment".

As described by Ted, that experimental design would appear to be
appreciably affected by the constraint resulting from telling the lady that
there were '5 of each'. Particularly if, as seems to be implied, the cups
of tea were presented serially, with no option for subsequent modification
of a decision once made, the constraint would clearly remove all choice in
relation to the last cup of tea tasted, and possibly remove all choice in
relation to one or more other cups.

My first reaction was to feel that this constraint must surely reduce the
'power' of the experiment to assess the hypothesis under test, but in
trying to formalise my thoughts about this, I have managed to confuse
myself! I think that much of the confusion arises because, although this
particular 'complicating factor' would obviously not exist in an experiment
with no such constraints (i.e. one in which the number of 'milk first' cups
could be anything from 0 to 10), one also expects that the power of the
experiment to test the hypothesis will diminish if there are too few of one
or other of the 'types of tea'. That leads me to suspect that the best
experiment would probably be one in which the true proportion of one type
of tea was, indeed, 50%, but the assessor had been told that it could be
anything between 0 and 100% - but leaves me wondering how much effect (if
any!) the constraint, if present, actually has.

Can someone help me get my mind around this question?

Kind Regards,


John

----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: Joh...@mediscience.co.uk
Buckingham MK18 4EL, UK medis...@compuserve.com
----------------------------------------------------------------

Margaret

Jan 12, 2006, 2:36:54 PM1/12/06
to MedStats
Dear Timothy and Ted

On the basis of what you have said I think an alternative possibility
is more likely.

In particular, in the row for the 'normal' Chi-Square test, I
automatically obtain a 'normal' and an Exact p-value alongside the
regular Chi-Square statistic. I suspect that the Exact p-value is
found as you describe. However, once I select the option 'Exact', I
gain an additional row, where appropriate, with results for Fisher's
Exact test. This row has a new test statistic with a markedly
different value, and is therefore not, as Timothy suggests,
the usual Chi-Square statistic. However, it is accompanied by an Exact
p-value and this value is not the same as the Exact p-value for the
ordinary Chi-Square test. Can you please suggest why this is the case?
It would be good to know which method is being used where.

Many thanks

Regards

Margaret

Ted Harding

Jan 12, 2006, 4:38:51 PM1/12/06
to MedS...@googlegroups.com
On 12-Jan-06 Margaret wrote:
>
> Dear Timothy and Ted
>
> On the basis of what you have said I think an alternative possibility
> is more likely.
>
> In particular, in the row for the 'normal' Chi-Square test, I
> automatically obtain a 'normal' and an Exact p-value alongside the
> regular Chi-Square statistic. I suspect that the Exact p-value is
> found as you describe. However, once I select the option 'Exact', I
> gain an additional row, where appropriate, with results for Fisher's
> Exact test. This row has a new test statistic with a markedly
> different value and is therefore not calculated as Timothy suggests as
> the usual Chi-Square statistic. However, it is accompanied by an Exact
> p-value and this value is not the same as the Exact p-value for the
> ordinary Chi-Square test. Can you please suggest why this is the case.
> It would be good to know which method is being used where.

Can I suggest that you post (or send privately, off-list, if
preferred) the data for the example in question? I could then
run it past an implementation of Fisher's exact test where I
know what's going on (I think ... ).

That might help to resolve the issue.

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861

Date: 12-Jan-06 Time: 21:38:48
------------------------------ XFMail ------------------------------

Timot...@iop.kcl.ac.uk

Jan 13, 2006, 4:39:37 AM1/13/06
to MedS...@googlegroups.com
I'm using SPSS 12.0.1 for Windows, and if I don't specify the 'Exact'
option, I get only one column of p-values, that for the asymptotic
results. If I specify the 'Exact' option, I get 3 additional columns. The
first column (2-tailed) contains, I believe, the exact p-values obtained by the
table-ranking method. In addition, there is also an extra row of results,
and that's the Fisher's Exact. I don't know how this p-value is
calculated. Maybe it's the p-value based on another discrepancy measure
using the same table-ranking technique. For all the above I'm using a 3 x
3 cross tab.

If it is a 2 x 2, I get 6 rows of results whether or not I specify the
Exact option. The Fisher's exact test is given regardless. The only
difference is the other rows now also have exact probabilities.

Hope that helps.
Tim


Margaret

Jan 14, 2006, 6:03:37 AM1/14/06
to MedStats
Dear Ted

Many thanks for this kind offer, which is much appreciated. I shall be
in touch with you shortly.

Regards

Margaret
