Hello,
My goal in running the ChIP-seq pipeline was to identify the binding regions of BZW1, HA-BZW1 and HA-BZW2 and assign them to the nearest gene regions. For this purpose I have 3 biological replicates per sample and 3 Input samples (1 each). I would like to obtain a ranked list of gene regions per sample that can later be validated, and since I have replicates, I hoped to use IDR to generate reproducible peaks. I ran the ENCODE ChIP-seq pipeline in "tf" mode (no changes to defaults).
I have a few questions regarding the output and its interpretation:
1) The peak files obtained are suffixed with "regionPeak" rather than "narrowPeak". Could you help me understand why this is the case? Also, the output of call-reproducibility_idr has 10 columns, not the 9 mentioned in the format description for broad/region peak. Here is an excerpt from the output.
2) The IDR plots of all 3 samples show very few black dots compared with the plots in the tutorials I have found online. Why would this be the case? Is it because the regions found with this protocol are broad rather than narrow? I have attached a plot here for reference.
3) I understand that the conservative peaks come only from the replicates, whereas the optimal peaks come from both the replicates and the pooled samples. Which set should I use for downstream analysis, given that there are very few black dots in my IDR plot?
I have attached the QC report for one of the samples (BZW1) to this thread in case it is useful.
Thank you,
Asma