Pooled Peak List

erfan sharifi

unread,

Apr 28, 2022, 5:29:09 PM4/28/22

to idr-discuss

Hi,

I am working on Chip-seq data for evaluating the consistency between replicate using idr. I had some questions about ${POOLED_PEAK_FILE}.

Where can we find the information to generate the ${POOLED_PEAK_FILE} used as a peak-list for the IDR analysis (for TFs analysis)? We have not found the complete description of this file in the ENCODE3 pipeline v1 specifications (https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit#). Would this be a concatenation of ${REP1_PEAK_FILE} and ${REP2_PEAK_FILE} follow by the same procedure used for histone (described at the end of the pipeline):

# For narrowPeak files

# ======================

# Find pooled peaks that overlap Rep1 and Rep2 where overlap is defined as the fractional overlap wrt any one of the overlapping peak pairs >= 0.5

intersectBed -wo -a Pooled.narrowPeak.gz -b Rep1.narrowPeak.gz |

awk 'BEGIN{FS="\t";OFS="\t"}{s1=$3-$2; s2=$13-$12; if (($21/s1 >= 0.5) || ($21/s2 >= 0.5)) {print $0}}' | cut -f 1-10 | sort | uniq |

intersectBed -wo -a stdin -b Rep2.narrowPeak.gz |

awk 'BEGIN{FS="\t";OFS="\t"}{s1=$3-$2; s2=$13-$12; if (($21/s1 >= 0.5) || ($21/s2 >= 0.5)) {print $0}}' | cut -f 1-10 | sort | uniq > PooledInRep1AndRep2.narrowPeak.gz

And thus

Pooled.narrowPeak.gz = concatenation of ${REP1_PEAK_FILE} and ${REP2_PEAK_FILE}

PooledInRep1AndRep2.narrowPeak.gz = ${POOLED_PEAK_FILE}

??

We, also, generated ${POOLED_PEAK_FILE} using the cat command as following, but the output was not as we expected:

Cat {REP1_PEAK_FILE} {REP2_PEAK_FILE} > {POOLED_PEAK_FILE}

It would be a great help if you provided us more detail about the way we can generate ${POOLED_PEAK_FILE}.

Regards,

Erfan

Anshul Kundaje

unread,

Apr 29, 2022, 11:56:54 AM4/29/22

to idr-d...@googlegroups.com

You simply pool reads from all replicates (tagAligns/BAMs) and repeat the peak calling process on all the pooled reads using the same commands. That will give you a pooled peak list.

-Anshul.

--
You received this message because you are subscribed to the Google Groups "idr-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to idr-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/idr-discuss/d18f98b9-69ba-446c-8aa4-27f5bb6edb31n%40googlegroups.com.

erfan sharifi

unread,

Apr 30, 2022, 10:23:56 PM4/30/22

to idr-discuss

Thanks for your respond!

the other things that is a little bit strange is that when I run the idr with and without --use-best-multisummit-IDR, number of peaks passing IDR cutoff is the same!!

condition A(two replicates)

with --use-best-multisummit-IDR

Number of reported peaks - 26680/26680 (100.0%)
Number of peaks passing IDR cutoff of 0.05 - 9231/26680 (34.6%)

without --use-best-multisummit-IDR

Number of reported peaks - 26680/26680 (100.0%)
Number of peaks passing IDR cutoff of 0.05 - 9231/26680 (34.6%)

I also try this with a different condition( condition B) but using or not using --use-best-multisummit-IDR produce the same result!

condition B(two replicates)

with --use-best-multisummit-IDR

Number of reported peaks - 35682/35682 (100.0%)
Number of peaks passing IDR cutoff of 0.05 - 6910/35682 (19.4%)

without --use-best-multisummit-IDR

Number of reported peaks - 35682/35682 (100.0%)
Number of peaks passing IDR cutoff of 0.05 - 6910/35682 (19.4%)

The commands that use are as following,

nohup idr --samples replicate1 replicateb --peak-list pooledA&B.narrowPeak --input-file-type narrowPeak --rank p.value --output-file ConditionA.txt --plot --log-output-file condiionA.out --soft-idr-threshold 0.05 --use-best-multisummit-IDR &

I would appreciate if you have any idea why tool produces same output..

Regrads,

Erfan

Reply all

Reply to author

Forward