Should I filter MACS2 peaks before IDR?

Lucy

Sep 23, 2018, 12:10:40 PM

to idr-discuss

Hi,

I have used MACS2 for peak calling with a pvalue cutoff of 0.1. In the ENCODE ATAC-seq pipeline, they mention taking the top 500,000 peaks for IDR analysis. However, my samples only have between 160,363 and 241,300 peaks total.

Is it best to sort the peaks by p-value (column 8 of the narrowPeak file) and then take the top n peaks (e.g. 100,000 or 150,000) into the IDR analysis, or should I keep all of the peaks for IDR? If it is best to take only the top n peaks, how would I decide on a suitable value for n, and should it be the same across all files? What is the purpose of filtering the peaks prior to IDR?

Many thanks,

Lucy

Anshul Kundaje

Sep 23, 2018, 12:21:52 PM

to idr-d...@googlegroups.com

Use all of them.

Sometimes MACS2 will call a huge number of peaks (> 500K). We suggest capping the list at a maximum of 500K peaks for speed, which does not lose any signal.

-A
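
[Editor's sketch, not part of the original reply] The capping step described above could look roughly like the following. It assumes GNU sort, bash, and a tab-separated narrowPeak file in which column 8 holds the -log10(p-value), so larger values rank first. The function name top_n_peaks and the file names are hypothetical and do not come from the ENCODE pipeline.

```shell
# Hypothetical helper: keep at most N narrowPeak rows, ranked by column 8
# (-log10 p-value, largest first). Assumes GNU sort and a tab-separated file.
top_n_peaks() {
    local peaks="$1"   # input narrowPeak file (uncompressed)
    local n="$2"       # maximum number of peaks to keep, e.g. 500000
    # -t sets the field separator to a tab; -k8,8gr sorts on column 8 only,
    # as general numeric values, in descending order.
    sort -t "$(printf '\t')" -k8,8gr "$peaks" | head -n "$n"
}
```

With fewer than N peaks in the file, as in the case asked about here, the call simply returns every peak in ranked order, e.g. `top_n_peaks rep1_peaks.narrowPeak 500000 > rep1.top.narrowPeak`.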

Lucy

Sep 23, 2018, 12:32:35 PM

to idr-discuss

Ok great, thanks very much.

Mariangela

Jan 10, 2019, 9:49:05 AM

to idr-discuss

Hi Anshul,

Can I be confident in using the MACS2 output produced by the following command as input for the subsequent IDR analysis?

macs2 callpeak -t ${aligDir}/${rep1} -c ${aligDir}/${ctrl} --format BAMPE --name "regular_noModel_rep1" --outdir "${peakDir}" -g hs --keep-dup=auto --bdg --nomodel --extsize 200 -q 0.05 --call-summits