IDR - Pooled Peak files

came...@cardiff.ac.uk

Sep 7, 2017, 11:47:01 AM

to idr-discuss

Hi all,

I'm currently writing a script for an ATAC-seq analysis, using the ATAC-seq pipeline v1 spec as a guide. I have 4 replicates, which I aligned with bowtie2 (to both hg19 and hg38) and called peaks on with MACS2. My question relates to IDR and, in particular, to the optional pooled peak file.

I want to perform an IDR analysis on the narrowPeak files I've generated using the command for true replicates on page 23 of the pipeline:

idr --samples ${REP1_PEAK_FILE} ${REP2_PEAK_FILE} --peak-list ${POOLED_PEAK_FILE} --input-file-type narrowPeak --output-file ${IDR_OUTPUT} --rank p.value --soft-idr-threshold ${IDR_THRESH} --plot --use-best-multisummit-IDR

The purpose of the pooled peak file parameter (--peak-list) in this command is unclear to me, and some explanation of what is gained or lost by including or excluding it would be helpful, i.e. why do we need it? My understanding is that this parameter is optional. If IDR is the most stringent method for assessing replicate concordance, what statistical benefit is gained by including pooled information? Additionally, if I have 4 replicates, should my pooled peak file contain information from all 4 replicates for each pairwise comparison? That is, do I need only one pooled peak file for all comparisons, or a bespoke pooled peak file for each pairwise comparison I wish to carry out?

Any explanation/advice would be most welcome.

Many Thanks,

Darren

Anshul Kundaje

Sep 7, 2017, 8:13:50 PM

to idr-d...@googlegroups.com

The pooled peaks provide a unified set of peak coordinates (obtained by leveraging the power from all replicates) against which to evaluate reproducibility between replicates. If you don't provide a pooled peak file, then some ad hoc strategy is needed to decide how to map a peak in one replicate to a peak in another. This requires an arbitrary decision on how to align and merge peaks between the replicates, because more often than not they do not have exactly the same coordinates in the two replicates. IDR can still work without the pooled peaks, but the output is not as interpretable, i.e. the coordinates are neither from rep1 nor rep2 nor the pooled set. Also, if the replicates differ appreciably (which happens more often than one might like), then the arbitrary peak-matching algorithms can do strange things.
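As a rough illustration of how such a pooled set might be produced, one option is to hand all replicate alignments to a single MACS2 call, since MACS2 pools multiple files given to -t. This is only a sketch under assumptions: the file names, the -f BAM format, the hs genome size and the relaxed -p 0.01 threshold are placeholders rather than values from this thread, and the settings should mirror whatever was used for the per-replicate calls. The helper name build_pooled_macs2_cmd is hypothetical, and it only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: build one MACS2 command that pools every replicate alignment.
# All file names and thresholds below are placeholders (assumptions).

build_pooled_macs2_cmd() {
    local name="$1"; shift
    # Remaining arguments are the per-replicate alignment files.
    # "$*" joins them with spaces, so paths containing spaces are not supported here.
    printf 'macs2 callpeak -t %s -f BAM -g hs -n %s -p 0.01\n' "$*" "$name"
}

# Print the command for inspection; pipe the output to bash to actually run it.
build_pooled_macs2_cmd pooled rep1.bam rep2.bam rep3.bam rep4.bam
```

Printing rather than executing keeps the sketch safe to try; the resulting narrowPeak file would be the one passed to --peak-list.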

-A

--
You received this message because you are subscribed to the Google Groups "idr-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to idr-discuss+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

came...@cardiff.ac.uk

Sep 8, 2017, 4:28:31 AM

to idr-discuss

That makes sense. I'll pool all 4 replicates for my IDR analysis.
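With 4 replicates there are 6 pairwise comparisons. A minimal sketch of how they could be enumerated, assuming a single pooled peak file is reused as --peak-list for every pair, is shown here. The idr flags are the ones from the command quoted earlier in the thread; the function name pairwise_idr_cmds, the file names, the output naming scheme and the 0.05 default threshold are illustrative assumptions. The function only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: print one idr command per replicate pair, all sharing one pooled peak file.
# File names, output names and the default threshold are placeholders (assumptions).

pairwise_idr_cmds() {
    local pooled="$1"; shift
    local reps=("$@")
    local thresh="${IDR_THRESH:-0.05}"   # placeholder default, override via IDR_THRESH
    local i j
    for ((i = 0; i < ${#reps[@]}; i++)); do
        for ((j = i + 1; j < ${#reps[@]}; j++)); do
            printf 'idr --samples %s %s --peak-list %s --input-file-type narrowPeak --output-file idr_rep%d_rep%d.txt --rank p.value --soft-idr-threshold %s --plot --use-best-multisummit-IDR\n' \
                "${reps[i]}" "${reps[j]}" "$pooled" "$((i + 1))" "$((j + 1))" "$thresh"
        done
    done
}

# Prints 6 commands for 4 replicates; pipe the output to bash to run them.
pairwise_idr_cmds pooled.narrowPeak rep1.narrowPeak rep2.narrowPeak rep3.narrowPeak rep4.narrowPeak
```

Reusing the same pooled file means every pairwise output is reported on the same coordinate set, which makes the comparisons easier to line up afterwards.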

Many thanks, Anshul, for the quick and detailed response.

Best,

Darren
