Pooled Peak List

138 views
Skip to first unread message

erfan sharifi

unread,
Apr 28, 2022, 5:29:09 PM4/28/22
to idr-discuss

Hi, 

I am working on Chip-seq data for evaluating the consistency between replicate using idr. I had some questions about ${POOLED_PEAK_FILE}.

Where can we find the information to generate the ${POOLED_PEAK_FILE} used as a peak-list for the IDR analysis (for TFs analysis)? We have not found the complete description of this file in the ENCODE3 pipeline v1 specifications (https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit#). Would this be a concatenation of ${REP1_PEAK_FILE} and  ${REP2_PEAK_FILE} follow by the same procedure used for histone (described at the end of the pipeline):

 

# For narrowPeak files

# ======================

# Find pooled peaks that overlap Rep1 and Rep2 where overlap is defined as the fractional overlap wrt any one of the overlapping peak pairs  >= 0.5

 

intersectBed -wo -a Pooled.narrowPeak.gz -b Rep1.narrowPeak.gz | 

awk 'BEGIN{FS="\t";OFS="\t"}{s1=$3-$2; s2=$13-$12; if (($21/s1 >= 0.5) || ($21/s2 >= 0.5)) {print $0}}' | cut -f 1-10 | sort | uniq | 

intersectBed -wo -a stdin -b Rep2.narrowPeak.gz | 

awk 'BEGIN{FS="\t";OFS="\t"}{s1=$3-$2; s2=$13-$12; if (($21/s1 >= 0.5) || ($21/s2 >= 0.5)) {print $0}}' | cut -f 1-10 | sort | uniq > PooledInRep1AndRep2.narrowPeak.gz

 

And thus

Pooled.narrowPeak.gz = concatenation of ${REP1_PEAK_FILE} and ${REP2_PEAK_FILE}

PooledInRep1AndRep2.narrowPeak.gz = ${POOLED_PEAK_FILE}

 ??

 

We, also, generated ${POOLED_PEAK_FILE} using the cat command as following, but the output was not as we expected:

Cat {REP1_PEAK_FILE} {REP2_PEAK_FILE} > {POOLED_PEAK_FILE}

It would be a great help if you provided us more detail about the way we can generate ${POOLED_PEAK_FILE}.

Regards,

 

Erfan

Anshul Kundaje

unread,
Apr 29, 2022, 11:56:54 AM4/29/22
to idr-d...@googlegroups.com
You simply pool reads from all replicates (tagAligns/BAMs) and repeat the peak calling process on all the pooled reads using the same commands. That will give you a pooled peak list.

-Anshul.

--
You received this message because you are subscribed to the Google Groups "idr-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to idr-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/idr-discuss/d18f98b9-69ba-446c-8aa4-27f5bb6edb31n%40googlegroups.com.

erfan sharifi

unread,
Apr 30, 2022, 10:23:56 PM4/30/22
to idr-discuss
Thanks for your respond!

the other things that is a little bit strange is that when I run the idr with and without --use-best-multisummit-IDR, number of peaks passing IDR cutoff is the same!!

condition A(two replicates)

with --use-best-multisummit-IDR

Number of reported peaks - 26680/26680 (100.0%)
Number of peaks passing IDR cutoff of 0.05 - 9231/26680 (34.6%)

without --use-best-multisummit-IDR
Number of reported peaks - 26680/26680 (100.0%)
Number of peaks passing IDR cutoff of 0.05 - 9231/26680 (34.6%)

I also try this with a different condition( condition B) but using or not using  --use-best-multisummit-IDR produce the same result!

condition B(two replicates)

with --use-best-multisummit-IDR
Number of reported peaks - 35682/35682 (100.0%)
Number of peaks passing IDR cutoff of 0.05 - 6910/35682 (19.4%)

without --use-best-multisummit-IDR
Number of reported peaks - 35682/35682 (100.0%)
Number of peaks passing IDR cutoff of 0.05 - 6910/35682 (19.4%)

The commands that use are as following, 

nohup idr --samples replicate1 replicateb --peak-list pooledA&B.narrowPeak --input-file-type narrowPeak --rank p.value --output-file ConditionA.txt --plot --log-output-file condiionA.out --soft-idr-threshold 0.05  --use-best-multisummit-IDR &

I would appreciate if you have any idea why tool produces same output..

Regrads, 

Erfan
Reply all
Reply to author
Forward
0 new messages