Hi there,
I'd like to create a database of high-confidence enhancers by keeping only those that are present in at least 2/3 of my datasets.
For example, I have 6 datasets of heart enhancers and would like to retrieve those that appear in at least 4 of them.
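The threshold itself is just "at least 2/3 of the datasets, rounded up". A small Python sketch of that arithmetic follows. The function name is hypothetical, and integer ceiling division is used so that floating-point rounding can't shift the cutoff.

```python
def min_datasets_required(n_datasets: int, numerator: int = 2, denominator: int = 3) -> int:
    """Smallest k such that k / n_datasets >= numerator / denominator."""
    # Ceiling division with integers only: -(-a // b) == ceil(a / b) for b > 0.
    return -(-numerator * n_datasets // denominator)


print(min_datasets_required(6))  # 4
print(min_datasets_required(5))  # 4 (2/3 of 5 is 3.33, rounded up)
```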
I created a reference dataset of all enhancers and intersected this reference file with each dataset using the -c parameter to get, for each enhancer, the number of overlaps with all my datasets. I then kept the enhancers with at least 4 overlaps. However, -c counts overlapping features, not distinct datasets. As a result, a large enhancer that overlaps several smaller enhancers can reach the threshold and be counted as high confidence, even though it may appear in only one dataset.
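One way to avoid this is to count, for each reference enhancer, how many datasets contain at least one overlapping enhancer, rather than how many overlapping features there are. Each dataset then contributes at most 1 to the count. Below is a minimal pure-Python sketch of that idea. It is not a bedtools command. It assumes intervals are (chrom, start, end) tuples in BED-style half-open coordinates and that datasets are held in a dict keyed by a hypothetical dataset name. The pairwise check is quadratic, so it is meant to illustrate the logic rather than run on genome-scale files.

```python
from typing import Dict, List, Tuple

# Assumed representation: (chrom, start, end), half-open like BED.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_supporting_datasets(
    reference: List[Interval], datasets: Dict[str, List[Interval]]
) -> Dict[Interval, int]:
    """Number of datasets with at least one enhancer overlapping each reference enhancer.

    A dataset adds 1 no matter how many of its enhancers fall inside the
    reference interval, so one dataset can never be counted several times.
    """
    counts: Dict[Interval, int] = {}
    for ref in reference:
        counts[ref] = sum(
            1
            for intervals in datasets.values()
            if any(overlaps(ref, iv) for iv in intervals)
        )
    return counts


def high_confidence(
    reference: List[Interval], datasets: Dict[str, List[Interval]], min_datasets: int
) -> List[Interval]:
    """Reference enhancers supported by at least `min_datasets` distinct datasets."""
    counts = count_supporting_datasets(reference, datasets)
    return [ref for ref in reference if counts[ref] >= min_datasets]


# Toy example (made-up coordinates): a large enhancer overlapped by five
# small enhancers, all from d1, plus a small enhancer found in d1-d4.
reference = [("chr1", 100, 1000), ("chr1", 2000, 2100)]
datasets = {
    "d1": [("chr1", 100, 200), ("chr1", 250, 300), ("chr1", 400, 450),
           ("chr1", 600, 650), ("chr1", 800, 900), ("chr1", 2000, 2100)],
    "d2": [("chr1", 2010, 2090)],
    "d3": [("chr1", 1990, 2050)],
    "d4": [("chr1", 2050, 2150)],
    "d5": [],
    "d6": [("chr2", 2000, 2100)],
}

print(count_supporting_datasets(reference, datasets))
# {('chr1', 100, 1000): 1, ('chr1', 2000, 2100): 4}
print(high_confidence(reference, datasets, min_datasets=4))
# [('chr1', 2000, 2100)]
```

With a per-feature count, the large chr1:100-1000 enhancer would score 6 (all six of d1's enhancers overlap it). Counting distinct datasets gives it 1, so it is correctly excluded at a threshold of 4.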
Any help with fixing this would be much appreciated.