Interpreting IDR while running with multiple replicates

g.atla

Sep 21, 2016, 6:28:58 AM
to idr-discuss
Dear Al,

I am running the ATAC pipeline from the Kundaje lab GitHub page. I have around 15 replicates of ATAC-seq data. I got a conservative set of peaks from IDR (let's say between rep10 and rep11).

However, rep10 and rep11 were sequenced at a higher depth than the other samples. If I use the peaks from the high-depth samples, am I missing reproducible peaks from the low-depth samples, since they might be treated as noise in the high-depth samples? I would also like to know whether there is any normalisation that takes care of depth issues.

I would like to have a peak set to interrogate differential open chromatin regions between two conditions. Alternatively, I could overlap the conservative peaks across all possible pairs of replicates and keep peaks that are reproducible in at least 50% of comparisons (very arbitrary), so that I end up with most of the conservative peaks. Is this a good idea?

Hope I am clear. Thanks in advance.

Goutham A

Anshul Kundaje

Sep 22, 2016, 2:30:48 PM
to idr-d...@googlegroups.com
If you are performing differential analysis between 2 conditions you should not use IDR peaks. Instead do the following.

- For each of your conditions, first make sure all your replicates are reasonably equivalent, i.e. run pairwise comparisons using IDR, and if a replicate shows dramatically different results when paired with several other replicates, drop it.
- Pool reads from all your replicates for condition 1 and call narrow peaks with MACS2 using a relaxed p-value threshold of 0.01.
- Pool reads from all your replicates for condition 2 and call narrow peaks with MACS2 using a relaxed p-value threshold of 0.01.
- Take the union of the peaks from both conditions, i.e. a set of merged peak coordinates across both conditions.
- Now, for each of these merged regions, obtain read counts from all replicates of condition 1 (after matching their sequencing depth reasonably well). E.g. you can pool some of your low-depth samples into a single high-depth replicate so that it reasonably matches the other high-depth replicates.
- For each of these merged regions, obtain read counts from all replicates of condition 2 (after matching their sequencing depth reasonably well). E.g. you can pool some of your low-depth samples into a single high-depth replicate so that it reasonably matches the other high-depth replicates.
- Feed these counts into DESeq or edgeR to obtain differential calls.
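
For readers who want to see the "union of peaks" and "read counts per merged region" steps concretely, here is a minimal Python sketch. It is not the pipeline's code: the function names `merge_peaks` and `count_reads` are hypothetical, peaks are assumed to be `(chrom, start, end)` tuples with 0-based half-open coordinates, and each read is reduced to a single position, which is a simplification made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def merge_peaks(*peak_sets):
    """Union of several peak sets (hypothetical helper).

    Overlapping or directly adjacent intervals on the same chromosome
    are collapsed into one merged region.
    """
    by_chrom = defaultdict(list)
    for peaks in peak_sets:
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps (or touches) the current region: extend it.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


def count_reads(regions, read_positions):
    """Count reads falling in each region (hypothetical helper).

    read_positions maps chrom -> SORTED list of read positions
    (assumption: one coordinate per read).
    """
    counts = []
    for chrom, start, end in regions:
        positions = read_positions.get(chrom, [])
        # Half-open interval [start, end).
        counts.append(bisect_left(positions, end) - bisect_left(positions, start))
    return counts


# Toy peaks called on the pooled reads of each condition.
cond1_peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
cond2_peaks = [("chr1", 150, 250), ("chr2", 10, 50)]
union = merge_peaks(cond1_peaks, cond2_peaks)
print(union)  # [('chr1', 100, 250), ('chr1', 500, 600), ('chr2', 10, 50)]

# Toy read positions for one replicate.
rep_reads = {"chr1": [120, 180, 240, 250, 550], "chr2": [5, 20]}
print(count_reads(union, rep_reads))  # [3, 1, 1]
```

Running `count_reads` once per replicate gives one column each of a regions-by-replicates count matrix, which is the kind of input DESeq or edgeR expects.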

There are variations of this protocol that could work better or worse but this one should do well enough. It is not recommended to use replicates with very large differences in depth. Depth normalization strategies do not work in such situations where samples are far from saturation.
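
One possible reading of the "pool some of your low depth samples" suggestion, sketched in Python. The greedy grouping rule, the function name `pool_to_match_depth`, and the `tolerance` threshold are all assumptions made for illustration and are not part of the protocol above; the replicate names and depths are made up.

```python
def pool_to_match_depth(depths, tolerance=0.8):
    """Group replicates so that each group's total depth approaches the deepest one.

    depths: dict mapping replicate name -> number of reads.
    tolerance: a group counts as "matched" once its depth reaches
               tolerance * max depth (assumed threshold).
    Returns a list of (replicate_names, total_depth) tuples.
    """
    target = max(depths.values()) * tolerance
    groups = []
    pool, pool_depth = [], 0
    # Visit replicates from deepest to shallowest.
    for name, depth in sorted(depths.items(), key=lambda kv: -kv[1]):
        if depth >= target:
            # Already deep enough: keep as its own replicate.
            groups.append(([name], depth))
            continue
        pool.append(name)
        pool_depth += depth
        if pool_depth >= target:
            groups.append((pool, pool_depth))
            pool, pool_depth = [], 0
    if pool:
        # Leftover pool that never reached the target depth.
        groups.append((pool, pool_depth))
    return groups


depths = {
    "rep10": 50_000_000, "rep11": 48_000_000,
    "rep1": 20_000_000, "rep2": 15_000_000,
    "rep3": 12_000_000, "rep4": 10_000_000,
}
for names, total in pool_to_match_depth(depths):
    print(names, total)
```

With these made-up depths, rep10 and rep11 stay as they are, rep1 to rep3 are pooled into one pseudo-replicate of 47 million reads, and rep4 is left over below the target. A leftover like that is still far shallower than the rest, so per the advice above it is probably better dropped than used as is.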

-Anshul.



--
You received this message because you are subscribed to the Google Groups "idr-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to idr-discuss+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sylvia Shiah

Dec 8, 2016, 10:11:22 AM
to idr-discuss
Hi Anshul,

Sorry, this might be an obvious question, but if my samples are biological replicates, can I still pool reads from all three replicates for condition 1 and all three replicates for condition 2 and then run MACS2 on each pool? (And by pooling reads, do you mean something like combining the BAM files after alignment?) Also, do you have a pipeline or example code for running what you describe here, such as taking the union of the peaks from both conditions and then obtaining read counts for each merged region in each condition?

Best,
Sylvia

Anshul Kundaje

Dec 8, 2016, 11:59:35 AM
to idr-d...@googlegroups.com
You can use our automated pipeline implementation here: https://github.com/kundajelab/chipseq_pipeline

Sylvia Shiah

Dec 8, 2016, 2:03:55 PM
to idr-discuss
Hi Anshul,

Thank you for the quick reply! I thought you said that if I'm doing differential analysis between 2 conditions, I should not use the IDR peaks? Doesn't this pipeline give me the IDR peaks in the end?

Best,
Sylvia

Anshul Kundaje

Dec 8, 2016, 3:10:22 PM
to idr-d...@googlegroups.com
It will also give you relaxed peak sets.

-Anshul.
