about statistical significance of the overlap


Bogdan Tanasa

Sep 14, 2020, 5:49:48 PM
to bedtools-discuss
Dear all, could you please remind me:

-- which tool assesses the statistical significance of the overlap between 2 sets of peaks (or more than 2 sets) and reports a p-value?

-- and do we have to generate 1000 random sets of peaks if we want to compute a Z-score for the overlap?

Thanks a lot,

-- bogdan 


Aaron Quinlan

Sep 15, 2020, 10:28:54 AM
to bedtools...@googlegroups.com
Hi Bogdan,

-- which tool assesses the statistical significance of the overlap between 2 sets of peaks (or more than 2 sets) and reports a p-value?

The bedtools "fisher" command approximates this using Fisher's exact test. However, it only works with 2 datasets.
If you want to use it, I would clone the bedtools repository from GitHub, as John Marshall and others have recently added fixes to this tool that address some earlier numerical instability.
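To make the idea behind the test concrete, here is a minimal sketch of a one-sided (right-tail) Fisher's exact test on a 2x2 table of counts. It is a plain-Python illustration, not the bedtools implementation: the function name and the toy counts are assumptions, and the way bedtools fisher derives its table from the intervals and the genome file is not reproduced here.

```python
from math import comb


def fisher_right_tail(n11, n12, n21, n22):
    """Right-tail Fisher's exact test for the 2x2 table [[n11, n12], [n21, n22]].

    Returns P(X >= n11), where X is hypergeometric with the row and
    column totals of the table held fixed.
    """
    row1 = n11 + n12              # total of the first row
    col1 = n11 + n21              # total of the first column
    total = n11 + n12 + n21 + n22
    denom = comb(total, col1)     # number of ways to fill the first column
    upper = min(row1, col1)       # largest value the top-left cell can take
    p = 0.0
    for k in range(n11, upper + 1):
        p += comb(row1, k) * comb(total - row1, col1 - k) / denom
    return p


# Toy counts (hypothetical): 3 intervals in both sets, 1 only in A,
# 1 only in B, 3 in neither.
p_value = fisher_right_tail(3, 1, 1, 3)
print(p_value)
```

A small right-tail p-value suggests more overlap than expected by chance under the fixed margins of the table.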


-- and do we have to generate 1000 random sets of peaks if we want to compute a Z-score for the overlap?
For multiple sets, you could consider the GIGGLE approach: 

Or you could do a Monte Carlo simulation with the bedtools shuffle command. The challenge is often defining which subset of the genome is germane to your analysis.
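As a rough sketch of the Monte Carlo summary step: once you have the observed overlap count and one overlap count per shuffled set, a Z-score and an empirical p-value can be computed as below. The function names and the toy numbers are assumptions for illustration; the null counts would come from your own shuffles, restricted to whatever part of the genome you decide is relevant.

```python
from statistics import mean, stdev


def overlap_z_score(observed, null_counts):
    """Z-score of the observed overlap against the shuffled (null) overlaps."""
    mu = mean(null_counts)
    sd = stdev(null_counts)  # sample standard deviation of the null counts
    if sd == 0:
        raise ValueError("null counts have zero variance; Z-score is undefined")
    return (observed - mu) / sd


def empirical_p_value(observed, null_counts):
    """One-sided empirical p-value with the usual +1 correction."""
    hits = sum(1 for c in null_counts if c >= observed)
    return (hits + 1) / (len(null_counts) + 1)


# Toy example (hypothetical counts; in practice use e.g. 1000 shuffles).
null = [2, 4, 4, 4, 5, 5, 7, 9]
z = overlap_z_score(10, null)
p = empirical_p_value(10, null)
print(z, p)
```

The Z-score assumes the null counts are roughly normal; the empirical p-value does not, but its resolution is limited by the number of shuffles.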

Overall, this is what I would consider an unsolved problem. Recommended reading includes:



Aaron

Bogdan Tanasa

Sep 15, 2020, 7:58:00 PM
to bedtools-discuss
Dear Aaron, thanks a lot! Your suggestions are very helpful!

Klaas Vandepoele

Dec 7, 2020, 11:37:45 AM
to bedtools-discuss
Dear Aaron, thanks for this interesting thread. One question, or rather a feature request: to generate 1000 permutations, we currently run bedtools shuffle 1000 times on the same input file. Would it be possible to add an option to specify how many output files we want, so that the same input file does not have to be read 1000 times?

Thanks & kind regards,
Klaas 
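For reference, the current workflow of running bedtools shuffle repeatedly can be driven from a small script. The sketch below is only an illustration under assumptions: it expects bedtools on the PATH, the file names and function names are hypothetical, and it still re-reads the input on every iteration, so it does not provide the requested option. It uses the -seed option of bedtools shuffle so that each run is reproducible, and bedtools intersect -u to count the shuffled intervals that overlap the second set.

```python
import subprocess


def shuffle_command(bed, genome, seed):
    """Build the argument list for one reproducible bedtools shuffle run."""
    return ["bedtools", "shuffle", "-i", bed, "-g", genome, "-seed", str(seed)]


def null_overlap_counts(a_bed, b_bed, genome, n=1000):
    """Shuffle a_bed n times and count shuffled intervals overlapping b_bed.

    Requires bedtools to be installed; each iteration re-reads a_bed.
    """
    counts = []
    for seed in range(1, n + 1):
        shuffled = subprocess.run(
            shuffle_command(a_bed, genome, seed),
            check=True, capture_output=True, text=True,
        ).stdout
        # -u reports each shuffled interval once if it overlaps anything in b_bed.
        hits = subprocess.run(
            ["bedtools", "intersect", "-u", "-a", "stdin", "-b", b_bed],
            input=shuffled, check=True, capture_output=True, text=True,
        ).stdout
        counts.append(len(hits.splitlines()))
    return counts


# Hypothetical file names, shown only to illustrate the command that is built.
print(shuffle_command("peaks_a.bed", "genome.txt", 1))
```

Calling null_overlap_counts("peaks_a.bed", "peaks_b.bed", "genome.txt") would return one overlap count per shuffle, which can then be compared with the observed count.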

On Wednesday, September 16, 2020 at 01:58:00 UTC+2, Bogdan Tanasa wrote:

Aaron Quinlan

Dec 8, 2020, 11:37:24 AM
to bedtools...@googlegroups.com
This is an interesting idea. Let me have a look at the complexity and potential speed increases this might offer.
