Enhancers overlapping in at least 2/3 of datasets

Daiane Hemerich

Dec 11, 2021, 11:39:30 AM

to bedtools-discuss

Hi there,

I'd like to create a database of high confidence enhancers, by retrieving only those that overlap in at least 2/3 of my datasets.

For example, I have 6 datasets of heart enhancers, and would like to retrieve those that appear in at least 4.

I created a reference dataset of all enhancers and intersected it with each dataset using the -c option, to count the overlaps of each enhancer across all my datasets. I then retrieved those enhancers with # of overlaps > 4. However, -c counts overlapping intervals rather than datasets, so a large enhancer that overlaps several smaller enhancers is counted as high confidence even though it may appear in only one dataset.

Any help on how to fix this is much appreciated.

Thank you,

Aaron Quinlan

Dec 12, 2021, 10:32:13 AM

to bedtools...@googlegroups.com

I would recommend using the multiinter tool to find regions that are shared by at least M (in your case 4) of N datasets. Note that each input file must be sorted by chromosome, then by start position (sort -k1,1 -k2,2n for BED files).

https://bedtools.readthedocs.io/en/latest/content/tools/multiinter.html
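
A minimal sketch of how that recommendation could be put together, in case it helps. The helper name shared_regions and the file names in the usage note are hypothetical, and the sketch assumes bedtools is on your PATH and that the temporary directory path contains no spaces. It sorts each input as described above, runs multiinter, and keeps the sub-intervals whose fourth output column (the number of files covering that sub-interval) is at least the requested minimum.

```shell
# Hypothetical helper: print sub-intervals covered by at least MIN of the
# given BED files. Usage: shared_regions MIN file1.bed file2.bed ...
shared_regions() {
    min="$1"; shift
    tmpdir=$(mktemp -d) || return 1
    sorted=""
    i=0
    for f in "$@"; do
        i=$((i + 1))
        # multiinter expects each input sorted by chrom, then start.
        sort -k1,1 -k2,2n "$f" > "$tmpdir/$i.bed"
        sorted="$sorted $tmpdir/$i.bed"
    done
    # Column 4 of the multiinter output is the number of input files
    # covering the sub-interval; $sorted is left unquoted on purpose so
    # that it expands to one argument per file.
    bedtools multiinter -i $sorted | awk -v min="$min" '$4 >= min'
    rm -rf "$tmpdir"
}
```

For the six heart enhancer datasets in the question, a call might look like shared_regions 4 heart1.bed heart2.bed heart3.bed heart4.bed heart5.bed heart6.bed. The output is a set of sub-intervals rather than whole enhancers, so adjacent rows that pass the threshold may need to be joined afterwards, for example with bedtools merge.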

Please let me know if that does not address your question.

Aaron
