[R] Fisher's Test 5x4 table


paul brett

Aug 27, 2015, 4:24:11 PM
to r-h...@r-project.org
Dear all,
I am trying to run Fisher's exact test on a 5x4 table in R. I have
already done a chi-squared test on this data set in Minitab, getting a
result of (1, N = 165.953, DF 12, p>0.001), yet these results (even though
they are excellent) may not be suitable for publication. I have tried
numerous other statistical packages in the hope of doing this test, yet
each one offers only the 2x2 version.
I am struggling to adapt the template Fisher's test in R to fit my table
(according to the R book it is possible, yet I cannot get it to work). The
template given in the R documentation and the R book is for a 2x2 Fisher
test. What do I need to change to get this to work? I have attached the
data to this email so you can see what I mean. Or do I have to write my
own code to compute this?

Yours Sincerely,
Paul Brett
w.txt

Gerrit Eichner

Aug 28, 2015, 2:58:26 AM
to paul brett, r-h...@r-project.org
Dear Paul,

quoting the email-footer: "PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented,
minimal, self-contained, reproducible code."

So, what exactly did you try and what was the actual problem/error
message?

Besides that, have you noticed that two of your data rows have the same name?


Have you read the online help page of fisher.test():

?fisher.test


Have you tried anything like the following?

W <- as.matrix( read.table( "w.txt", header = TRUE)[-1])

fisher.test( W, workspace = 1e8)
# For workspace look at the help page, but it presumably
# won't work because of your sample size.


set.seed( 20150828) # for reproducibility
fisher.test( W, simulate.p.value = TRUE, B = 1e5)
# For B look at the help page.
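
As a self-contained illustration of the steps above: the attachment w.txt is
not reproduced in this thread, so the counts, the row labels and the site
column names below are invented. Only the column name Traps is taken from a
later message. This is a sketch under those assumptions, not the poster's
actual analysis.

```r
# Hypothetical 5x4 count data; NOT the poster's w.txt.
txt <- "
Traps  siteA siteB siteC siteD
trap1     12     3     0     7
trap2      5    14     2     1
trap3      0     6    11     4
trap4      9     1     8    10
trap5      2     7     5    13
"
dat <- read.table(text = txt, header = TRUE)

# Duplicated labels could not serve as row names, so check for them first.
stopifnot(anyDuplicated(dat$Traps) == 0)

# Drop the label column and keep the counts as a matrix.
W <- as.matrix(dat[-1])
rownames(W) <- dat$Traps

set.seed(20150828)  # for reproducibility of the simulated p-value
res <- fisher.test(W, simulate.p.value = TRUE, B = 1e4)
res$p.value
```

The check with anyDuplicated() is there because of the duplicated row name
mentioned above; with unique labels it passes silently.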


Finally: Did Minitab really report "p > 0.001"? ;-)

Hth -- Gerrit
______________________________________________
R-h...@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Gerrit Eichner

Aug 28, 2015, 9:55:29 AM
to paul brett, R-Help
Paul,

as the error messages of your first three attempts (see below) tell you -
in an admittedly rather cryptic way - your table or its sample size,
respectively, are too large, so that either the "largest (hash table) key"
is too large, or your (i.e., R's) workspace is too small, or your
hardware/os cannot allocate enough memory to calculate the p-value of
Fisher Exact Test exactly by means of the implemented algorithm.

One way out of this is to approximate the exact p-value through
simulation, but apparently there occurred a typo in your (last) attempt to
do that (Error: unexpected '>' in ">").
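
One possible way to combine the two approaches is to attempt the exact
computation and switch to simulation only if it fails. The helper name
fisher_with_fallback and the small table are hypothetical, introduced here
for illustration only; this is a sketch, not something fisher.test() does
on its own.

```r
# Try the exact computation first; on any error, fall back to simulation.
# `fisher_with_fallback` is a hypothetical helper, not part of R.
fisher_with_fallback <- function(tab, B = 1e5, seed = 20150828) {
  tryCatch(
    list(method = "exact", result = fisher.test(tab)),
    error = function(e) {
      message("Exact computation failed: ", conditionMessage(e))
      set.seed(seed)  # make the simulated p-value reproducible
      list(method = "simulated",
           result = fisher.test(tab, simulate.p.value = TRUE, B = B))
    }
  )
}

# Hypothetical small table, for which the exact algorithm should succeed.
small <- matrix(c(3, 1, 0,
                  1, 4, 2,
                  0, 2, 5), nrow = 3, byrow = TRUE)
out <- fisher_with_fallback(small)
out$method
out$result$p.value
```

Note that the fallback branch calls set.seed() and therefore changes the
state of the global random number generator.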


So, for me the following works (and it should also for you) and gives the
shown output (after a very short while):

> Trapz <- as.matrix( read.table( "w.txt", header = TRUE, row.names = "Traps"))

> set.seed( 20150828) # For the sake of reproducibility.
> fisher.test( Trapz, simulate.p.value = TRUE,
+ B = 1e5)

Fisher's Exact Test for Count Data with simulated p-value (based on
1e+05 replicates)

data: Trapz
p-value = 1e-05
alternative hypothesis: two.sided



Or for a higher value for B if you are patient enough (with a computing
time of several seconds) :

> set.seed( 20150828)
> fisher.test( Trapz, simulate.p.value=TRUE, B = 1e7)

Fisher's Exact Test for Count Data with simulated p-value (based on
1e+07 replicates)

data: Trapz
p-value = 1e-07
alternative hypothesis: two.sided


Hth -- Gerrit

(BTW, you don't have to specify arguments (in function calls) whose
default values you don't want to change.)



On Fri, 28 Aug 2015, paul brett wrote:

> Hi Gerrit,
> I spotted that, it was a mistake on my own part, it should
> read 1.trap.2.barrier. I have corrected it on the file attached.
>
> So I have done these so far:
> > fisher.test(Trapz, workspace = 200000, hybrid = FALSE, control = list(),
> or = 1, alternative = "two.sided", conf.int = TRUE, conf.level =
> 0.95,simulate.p.value = FALSE, B = 2000)
> Error in fisher.test(Trapz, workspace = 2e+05, hybrid = FALSE, control =
> list(), :
> FEXACT error 501.
> The hash table key cannot be computed because the largest key
> is larger than the largest representable int.
> The algorithm cannot proceed.
> Reduce the workspace size or use another algorithm.
>
>> fisher.test(Trapz, workspace = 2000, hybrid = FALSE, control = list(), or
> = 1, alternative = "two.sided", conf.int = TRUE, conf.level =
> 0.95,simulate.p.value = FALSE, B = 2000)
> Error in fisher.test(Trapz, workspace = 2000, hybrid = FALSE, control =
> list(), :
> FEXACT error 40.
> Out of workspace.
>> fisher.test(Trapz, workspace = 1e8, hybrid = FALSE, control = list(), or
> = 1, alternative = "two.sided", conf.int = TRUE, conf.level =
> 0.95,simulate.p.value = FALSE, B = 2000)
> Error in fisher.test(Trapz, workspace = 1e+08, hybrid = FALSE, control =
> list(), :
> FEXACT error 501.
> The hash table key cannot be computed because the largest key
> is larger than the largest representable int.
> The algorithm cannot proceed.
> Reduce the workspace size or use another algorithm.
>> fisher.test(Trapz, workspace = 2000000000, hybrid = FALSE, control =
> list(), or = 1, alternative = "two.sided", conf.int = TRUE, conf.level =
> 0.95,simulate.p.value = FALSE, B = 2000)
> Error: cannot allocate vector of size 7.5 Gb
> In addition: Warning messages:
> 1: In fisher.test(Trapz, workspace = 2e+09, hybrid = FALSE, control =
> list(), :
> Reached total allocation of 6027Mb: see help(memory.size)
> 2: In fisher.test(Trapz, workspace = 2e+09, hybrid = FALSE, control =
> list(), :
> Reached total allocation of 6027Mb: see help(memory.size)
> 3: In fisher.test(Trapz, workspace = 2e+09, hybrid = FALSE, control =
> list(), :
> Reached total allocation of 6027Mb: see help(memory.size)
> 4: In fisher.test(Trapz, workspace = 2e+09, hybrid = FALSE, control =
> list(), :
> Reached total allocation of 6027Mb: see help(memory.size)
>
> fisher.test(Trapz, workspace = 1e8, hybrid = FALSE, control = list(), or =
> 1, alternative = "two.sided", conf.int = TRUE, conf.level =
> 0.95,simulate.p.value = TRUE, B = 1e5)
> Error: unexpected '>' in ">"
>
> So the issue could be perhaps that R cannot compute my sample as the
> workspace needed is too big? Is there a way around this? I think I have
> everything set out correctly.
> Is my only other alternative to do a 2x2 fisher test for each of the
> variables?
>
> I attach as a pdf the Minitab result for the Chi squared test as proof (I
> know that getting a very low p value is highly unlikely, but sometimes it
> happens). Seeing is believing I suppose!
>
> Regards,
> Paul

paul brett

Aug 28, 2015, 10:28:40 AM
to Gerrit Eichner, r-h...@r-project.org
w.txt
Chi squared test in Minitab.pdf

paul brett

Aug 30, 2015, 6:10:29 PM
to Gerrit Eichner, R-Help
Hi Gerrit,
I tried both of your suggestions and got the exact same thing.
Fisher's Exact Test for Count Data with simulated p-value (based on 1e+05
replicates)

data: Trapz
p-value = 1e-05
alternative hypothesis: two.sided

I made a few changes myself, based on what the Details section says should
be used for a table larger than 2x2, and got exactly the same thing as
before. I removed or = 1 and conf.int = TRUE, added y = NULL and control =
list(30), and changed to simulate.p.value = TRUE.
> fisher.test( Trapz, y = NULL, workspace = 200000, hybrid = TRUE,control =
list(30), simulate.p.value = TRUE, B =1e5)
Fisher's Exact Test for Count Data with simulated p-value (based on 1e+05
replicates)

data: Trapz
p-value = 1e-05
alternative hypothesis: two.sided

> fisher.test( Trapz, y = NULL, workspace = 200000, hybrid = TRUE,control =
list(30), simulate.p.value = TRUE, B =1e7)

Fisher's Exact Test for Count Data with simulated p-value (based on 1e+07
replicates)

data: Trapz
p-value = 1e-07
alternative hypothesis: two.sided


Despite these changes, the changed call is not giving me the results
for the calculations. The changes I have made seem to satisfy what is in
the details section on R, and I don't have the issue of workspace in R.
What do I do to get the results of the fisher test?
Is there something simple that I am missing?

Regards,
Paul

peter dalgaard

Aug 31, 2015, 3:02:13 AM
to paul brett, R-Help

> On 30 Aug 2015, at 13:54 , paul brett <brett...@gmail.com> wrote:
>
> Fisher's Exact Test for Count Data with simulated p-value (based on 1e+07
> replicates)
>
> data: Trapz
> p-value = 1e-07
> alternative hypothesis: two.sided
>
>
> Despite these changes, the changed call is not giving me the results
> for the calculations. The changes I have made seem to satisfy what is in
> the details section on R, and I don't have the issue of workspace in R.
> What do I do to get the results of the fisher test?
> Is there something simple that I am missing?

The theory?

There is nothing more to Fisher's test than the calculation of the probability of obtaining a table as or less (im-)probable as the one observed. This is the p-value. You have done 10 million simulations and not found a single table that is less likely than the one observed. Hence, the p-value is 1/10 000 001 = ca. 1e-7, counting in the observed table.
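
The arithmetic above can be checked on a small scale. The table below is
hypothetical (not the poster's data) and chosen so that the association is
extremely strong. The by-hand part is a simplified sketch of the idea: it
assumes that, for fixed margins, a table's probability is proportional to
1/prod(x!), and it omits the small numerical tolerance that fisher.test()
applies internally.

```r
# Hypothetical table with a very strong association (NOT the poster's data).
tab <- diag(50L, 4)   # 50 on the diagonal, 0 elsewhere
B <- 1e4

set.seed(1)
p_sim <- fisher.test(tab, simulate.p.value = TRUE, B = B)$p.value

# Same idea by hand: draw B tables with the observed margins and count
# those that are as or less probable than the observed one.
stat <- function(m) -sum(lfactorial(m))   # smaller value = less probable table
set.seed(1)
sims <- r2dtable(B, rowSums(tab), colSums(tab))
n_extreme <- sum(vapply(sims, stat, numeric(1)) <= stat(tab))
p_hand <- (1 + n_extreme) / (B + 1)       # the "+ 1" counts the observed table

c(p_sim = p_sim, p_hand = p_hand, floor = 1 / (B + 1))
```

For a table like this no simulated table should be as improbable as the
observed one, so both p-values should sit at the floor 1/(B + 1). Raising B
lowers that floor, which is why B = 1e5 and B = 1e7 reported 1e-05 and 1e-07.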

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk Priv: PDa...@gmail.com

Gerrit Eichner

Aug 31, 2015, 3:32:15 AM
to paul brett, R-Help
Paul,

in addition to Peter's remark about the missing theory, you have also not
explained what you mean by "[it] is not giving me the results for the
calculations" or "[how] to get the results of the fisher test". They are
there in the output of R's fisher.test() (if you have an idea about the
theory).

And again:

> fisher.test( Trapz, simulate.p.value = TRUE, B = 1e5)

specifies enough arguments in the case of simulating to approximate the
p-value since workspace (quoting from the help page) is "Only used for
***non-simulated*** p-values [of] larger than 2 by 2 tables." (Similarly,
control and hybrid are not needed either here.)
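
A quick way to convince oneself of this, sketched on a hypothetical 3x3
table (not the poster's data): with the same seed, the simulated p-value
should not depend on whether workspace is supplied.

```r
# Hypothetical 3x3 table (NOT the poster's data).
tab <- matrix(c(8, 2, 1,
                3, 9, 2,
                1, 4, 7), nrow = 3, byrow = TRUE)

# Minimal call: only the arguments that matter for simulation.
set.seed(42)
p_short <- fisher.test(tab, simulate.p.value = TRUE, B = 1e4)$p.value

# Same call with workspace added; it is not used for simulated p-values.
set.seed(42)
p_long <- fisher.test(tab, workspace = 2e6,
                      simulate.p.value = TRUE, B = 1e4)$p.value

identical(p_short, p_long)
```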

Regards -- Gerrit

---------------------------------------------------------------------
Dr. Gerrit Eichner Mathematical Institute, Room 212
gerrit....@math.uni-giessen.de Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104 Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109 http://www.uni-giessen.de/cms/eichner
---------------------------------------------------------------------

paul brett

Aug 31, 2015, 12:04:19 PM
to peter dalgaard, Gerrit Eichner, R-Help
Hi Peter and Gerrit,
Sorry about my confusion with the results; I was not entirely sure what
they were. I was expecting some form of a table, and I didn't realize that
with a Fisher test one just gets a p-value. (I had tried 'estimate' and
'null.value', which gave me a null value; looking again, I don't need
them, but I know that now.)
Thanks very much for the help. This is the 5th statistical package I have
tried for this test, so I suppose I had a suspicious,
this-is-too-good-to-be-true reaction to the result. I wasn't entirely sure
what it was. Thanks for clearing this up for me.

Thanks again,
Paul
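
For readers who hit the same confusion, the components of the returned
object can be inspected directly. The two tables below are hypothetical and
serve only to contrast a larger table with a 2x2 one; this is a sketch of
what to look at, not output from the data in this thread.

```r
# Hypothetical tables (NOT the poster's data).
big   <- matrix(c(8, 2, 1,
                  3, 9, 2,
                  1, 4, 7), nrow = 3, byrow = TRUE)
small <- matrix(c(10, 3,
                   2, 12), nrow = 2, byrow = TRUE)

set.seed(1)
res_big   <- fisher.test(big, simulate.p.value = TRUE, B = 1e4)
res_small <- fisher.test(small)

names(res_big)             # components available for the larger table
res_big$p.value            # the p-value is the result of the test
is.null(res_big$estimate)  # no odds-ratio estimate beyond 2x2

names(res_small)           # 2x2 tables add conf.int, estimate and null.value
res_small$estimate         # conditional MLE of the odds ratio
```

Asking for res_big$estimate or res_big$null.value returns NULL simply
because those components are only produced for 2x2 tables.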
