Hi,
I have used MACS2 for peak calling with a p-value cutoff of 0.1. The ENCODE ATAC-seq pipeline mentions taking the top 500,000 peaks for IDR analysis. However, my samples only have between 160,363 and 241,300 peaks in total.
Is it best to sort the peaks by p-value (column 8 of the narrowPeak file) and then take only the top n peaks (e.g. 100,000 or 150,000) into IDR analysis, or should I keep all of the peaks for IDR? If it is best to take only the top n peaks, how would I decide on a suitable value for n, and should n be the same across all files? What is the purpose of filtering the peaks prior to IDR?
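To make the question concrete, here is a minimal Python sketch of the sorting step I have in mind. It assumes the standard tab-separated narrowPeak layout, with column 8 holding -log10(p-value) so that larger values are more significant. The function name `top_n_peaks` and the example records are made up for illustration and are not part of MACS2 or the ENCODE pipeline.

```python
def top_n_peaks(lines, n):
    """Return the n narrowPeak records with the largest column 8 value."""
    # Split each non-empty line into its tab-separated fields.
    records = [ln.rstrip("\n").split("\t") for ln in lines if ln.strip()]
    # Column 8 (index 7) is assumed to hold -log10(p-value):
    # a larger value means a more significant peak, so sort descending.
    records.sort(key=lambda fields: float(fields[7]), reverse=True)
    return ["\t".join(fields) for fields in records[:n]]


# Made-up records for illustration only (not real data).
example = [
    "chr1\t100\t200\tpeak_a\t50\t.\t3.2\t5.1\t2.0\t40",
    "chr1\t500\t650\tpeak_b\t90\t.\t7.9\t12.4\t9.8\t70",
    "chr2\t300\t420\tpeak_c\t20\t.\t1.5\t1.3\t0.1\t55",
]
top2 = top_n_peaks(example, 2)
for record in top2:
    print(record)
```

On a real file I would pass an open file handle instead of the example list, and n would be whatever cutoff turns out to be appropriate. If n is larger than the number of peaks in a file, all peaks are returned.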
Many thanks,
Lucy