Need a mathematical description of the Fisher contingency table.

ocha...@eng.ucsd.edu

Nov 18, 2018, 8:20:37 PM

to giggle

The paper gives the following description of how the Fisher test is constructed:

GIGGLE eliminates this complexity by estimating the significance and enrichment between the query intervals and each indexed interval file with a Fisher's Exact two-tailed test and the odds ratio of a 2 × 2 contingency table containing the number of intervals that are in (i) both the query and indexed file, (ii) solely the query file, (iii) solely the indexed file, and (iv) neither the query file nor the indexed file. The first three values are directly computed with a GIGGLE search, and the last value is estimated by the difference between the union of the two sets and the quotient of the mean interval size of both sets and the genome size.

I would like to be able to recreate the Fisher contingency table for given query and index BED files, but it's unclear how I would actually calculate (i)-(iv) above.

- For (i), is this the number of intervals in the query that overlap the index? The number of intervals in the index that overlap the query? The sum of both? I'm not sure how to handle a case where one query interval contains two or more index intervals; would this count as 1, 2, or 3 "intervals in both the query and indexed file"?

- For (ii) and (iii), this seems fairly easy: the intervals in the query that do not overlap the index, and the intervals in the index that do not overlap the query. Basically the output of bedtools intersect -v; please correct me if I'm wrong.

- For (iv), GIGGLE computes this as (genome_size / mean_interval_size) - ("union of the two sets"). Again, "union" is unclear when intervals can partially overlap; would this be the output of e.g. bedtools merge?

In short, I'd like to be able to recreate the contingency table, given two arbitrary bed files, thanks!
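To make the ambiguity concrete, here is a minimal sketch of one possible way to fill the four cells. It is not GIGGLE's or bedtools' actual implementation. The function names are hypothetical, and the conventions are assumptions: (i) counts query intervals that hit at least one index interval, the mean interval size is pooled over both files, the "union" is taken as the sum of cells (i)-(iii), and a negative (iv) is clamped to zero. Intervals are half-open (start, end) pairs on a single chromosome.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def contingency_table(query: List[Interval], index: List[Interval],
                      genome_size: int) -> Tuple[int, int, int, int]:
    """Return (both, query_only, index_only, neither) under the
    assumptions stated above; other conventions are equally plausible."""
    # (i) ASSUMPTION: count each query interval once, however many
    # index intervals it overlaps.
    both = sum(any(overlaps(q, d) for d in index) for q in query)
    # (ii) query intervals with no hit in the index
    query_only = len(query) - both
    # (iii) index intervals with no hit in the query
    index_only = sum(not any(overlaps(d, q) for q in query) for d in index)
    # (iv) ASSUMPTION: pooled mean size; "union" = (i) + (ii) + (iii)
    pooled = query + index
    mean_size = sum(end - start for start, end in pooled) / len(pooled)
    neither = max(0, int(genome_size / mean_size)
                  - (both + query_only + index_only))
    return both, query_only, index_only, neither


query = [(0, 100), (200, 300)]
index = [(10, 20), (30, 40), (500, 600)]
table = contingency_table(query, index, genome_size=10_000)
print(table)
```

Here the first query interval contains two index intervals, so counting overlapping pairs instead would give 2 for cell (i) rather than 1, which is exactly the choice the questions above are asking about.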

Ryan Layer

Nov 20, 2018, 5:57:24 AM

to ocha...@eng.ucsd.edu, giggle

Does this help?

https://bedtools.readthedocs.io/en/latest/content/tools/fisher.html

--
You received this message because you are subscribed to the Google Groups "giggle" group.
To unsubscribe from this group and stop receiving emails from it, send an email to giggle-discus...@googlegroups.com.
To post to this group, send email to giggle-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/giggle-discuss/2a561ed4-1319-4b65-ae3e-938df318b3d9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ocha...@eng.ucsd.edu

Nov 23, 2018, 10:49:21 PM

to giggle

Hi Ryan,

Thanks for the link. Does this mean that GIGGLE uses bedtools fisher under the hood for its Fisher tests, or is there some other software implementation? I ask because I get discordant results between a GIGGLE test and a bedtools fisher test on the same data.

Ryan Layer

Nov 23, 2018, 11:20:00 PM

to ocha...@eng.ucsd.edu, giggle

It’s a different implementation. There are some rounding issues, and some decisions have to be made about what counts as zero, etc., that can cause different results.

ocha...@eng.ucsd.edu

Nov 26, 2018, 2:43:30 PM

to giggle

This is definitely more than a rounding issue. I get a highly significant Fisher test with a large odds ratio from other tools (and by hand), whereas GIGGLE returns a Fisher result in the wrong direction (left vs. right) and a zero odds ratio. It seems to occur when there are more overlaps than regions in the query set, which is possible when the database regions are much smaller than the query regions, so that multiple database hits land on the same query region. In this case the hypergeometric test is inappropriate anyway (we'd probably want a binomial test instead), but we would expect it to return p=0, because it is impossible to draw n+1 or more white balls from an urn containing only n white balls. I'll get around to a repro on GitHub soon, but this is what I'm observing in my data if you want to take a look.
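As a rough illustration of the failure mode, the sketch below computes a right-tailed Fisher's exact p-value directly from the hypergeometric distribution using only the standard library. It is a hand-rolled reference, not GIGGLE's or bedtools' code, and the function name and the example counts are hypothetical. It assumes cell (ii) is derived as (number of query intervals) - (number of overlaps), which goes negative once overlaps outnumber query intervals.

```python
from math import comb


def fisher_right_tail(n11: int, n12: int, n21: int, n22: int) -> float:
    """P(X >= n11) for a 2x2 table with fixed margins (hypergeometric)."""
    if min(n11, n12, n21, n22) < 0:
        # No valid 2x2 table has a negative cell.
        raise ValueError("negative cell: counts do not form a valid table")
    row1 = n11 + n12          # e.g. all query intervals
    col1 = n11 + n21          # e.g. all index intervals
    total = n11 + n12 + n21 + n22
    upper = min(row1, col1)   # largest value the top-left cell can take
    tail = sum(comb(col1, k) * comb(total - col1, row1 - k)
               for k in range(n11, upper + 1))
    return tail / comb(total, row1)


# A well-formed table: 3 of 4 query intervals overlap the index.
p_valid = fisher_right_tail(3, 1, 1, 3)

# HYPOTHETICAL counts: 3 query intervals but 5 overlapping pairs,
# so "query only" = 3 - 5 = -2 and the table is rejected.
n_query, n_overlaps = 3, 5
try:
    fisher_right_tail(n_overlaps, n_query - n_overlaps, 4, 100)
    rejected = False
except ValueError:
    rejected = True
print(p_valid, rejected)
```

An implementation that silently clamps or wraps the negative cell instead of rejecting it could plausibly produce a tail in the wrong direction or a degenerate odds ratio, though whether that is what happens inside GIGGLE is not established here.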

O

Ryan Layer

Nov 27, 2018, 9:23:01 PM

to ocha...@eng.ucsd.edu, giggle

Can you send over the two bed files?
