Low peak call for spp-IDR peak calling

116 views
Skip to first unread message

Paul Finn

unread,
Aug 28, 2017, 7:58:58 PM8/28/17
to idr-discuss
Hello,

I am slightly confused on the peak calling command:

Rscript run_spp_nodups.R -c=chipSampleRep1.tagAlign.gz -i=controlSampleRep0.tagAlign.gz -npeak=300000 -odir=/peaks/reps -savr -savp -rf -out=/stats/phantomPeakStatsReps.tab

I used the above command on my data but do not come anywhere near calling 150 - 300k peaks.  Here is example output:

################
ChIP data: /path/to/file/AP.tagAlign.gz
Control data: /path/to/file/Input.tagAlign.gz
strandshift(min): -500
strandshift(step): 5
strandshift(max) 1500
user-defined peak shift NA
exclusion(min): 10
exclusion(max): NaN
num parallel nodes: 4
FDR threshold: 0.01
NumPeaks Threshold: 3e+05
Output Directory: /path/to/folder/spp/peaks/reps
narrowPeak output file name: NA
regionPeak output file name: /path/to/file/spp/peaks/reps/AP.tagAlign_VS_Input.tagAlign.regionPeak
Rdata filename: NA
plot pdf filename: /path/to/file/spp/peaks/reps/AP.tagAlign.pdf
result filename: /path/to/file/spp/stats/_AP-phantomPeakStatsReps.tab
Overwrite files?: TRUE Decompressing ChIP file
Decompressing control file
Loading required package: caTools
Reading ChIP tagAlign/BAM file /path/to/file/AP.tagAlign.gz
opened /tmp/RtmpzJyRih/AP.tagAlign10e3418b6efa
done. read 13635712 fragments
ChIP data read length 50
[1] TRUE
Reading Control tagAlign/BAM file /path/to/file/Input.tagAlign.gz
opened /tmp/RtmpzJyRih/Input.tagAlign10e35aee9044
done. read 32294683 fragments
Control data read length 50
Loading required namespace: Rmpi
4 slaves are spawned successfully. 0 failed.
Calculating peak characteristics
[1] 1
Minimum cross-correlation value 0.04405169
Minimum cross-correlation shift 1500
Top 3 cross-correlation values 0.0576014765287295,0.0575764175222504
Top 3 estimates for fragment length 185,195
Window half size 300
Phantom peak location 50
Phantom peak Correlation 0.05390492
Normalized Strand cross-correlation coefficient (NSC) 1.307588
Relative Strand Cross correlation Coefficient (RSC) 1.375162
Phantom Peak Quality Tag 1
null device
1
Removing read stacks
4 slaves are spawned successfully. 0 failed.
Finding peaks
finding background exclusion regions ... done
determining peaks on provided 1 control datasets:
using reversed signal for FDR calculations
bg.weight= 32.95003 excluding systematic background anomalies ... done
determining peaks on real data:
bg.weight= 0.03034898 excluding systematic background anomalies ... done
calculating statistical thresholds
FDR 0.99 threshold= 2.000461
Detected 15288 peaks


I'm not sure how to proceed, are enough peaks called to move forward for IDR analysis?  Are there other options to call to increase the number of peaks?  

When I modified the command to include -fdr=0.1 However, I get the same output with one exception: line 11: FDR Threshold: 0.1

Also, I am confused by the -npeak threshold option.  Does this option indicate the max number of peaks to be called or does it tell the program to call 300k peaks.   

Any help would be great!

Thanks,

Paul 

Anshul Kundaje

unread,
Aug 28, 2017, 8:09:10 PM8/28/17
to idr-d...@googlegroups.com
Npeak 300K is the maximum number of peaks if it's able to find them. You should go ahead with IDR and check how many peaks you end up with. Your data looks decent from the QC scores. Your TF may just have few binding sites. 

Anshul 

Anshul 

--
You received this message because you are subscribed to the Google Groups "idr-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to idr-discuss+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Anshul Kundaje

unread,
Aug 28, 2017, 8:09:48 PM8/28/17
to idr-d...@googlegroups.com
Btw what genome is this.

Anshul 

Paul Finn

unread,
Aug 29, 2017, 8:16:29 PM8/29/17
to idr-discuss
Hi Anshul,

Thanks for the reply!

I aligned reads to hg19 with bowtie -m 1 -- best

Paul
To unsubscribe from this group and stop receiving emails from it, send an email to idr-discuss...@googlegroups.com.

Paul Finn

unread,
Aug 30, 2017, 12:53:00 PM8/30/17
to idr-discuss
Hi Anshul,

Should I expect to have a large difference between the number of peaks called depending on if the sample is a Rep, PseudoRep, Pooled or PseudoPool?  Or should the number of peaks be similar?  

Note: All peaks are called against the same pooled input file from my two replicates.

Example data: # Peaks called

Replicates

Rep1:  15,288
Rep2:  14,839
IDR (0.05):  145

PseudoReplicates

Rep1.pr1:  25,965
Rep1.pr2:  26,219
IDR (0.05):  850

Rep2.pr1:  30,334
Rep2.pr2:  29,929
IDR (0.05):  693

Pooled

AP_Pooled:  69,385

PseudoPool  

AP_Pooled.pr1:  111,696 
AP_Pooled.pr2:  111,247
IDR (0.01):  1339

Thanks,

Paul 

Anshul Kundaje

unread,
Aug 30, 2017, 12:58:52 PM8/30/17
to idr-d...@googlegroups.com
This is all consistent. The raw peaks are all in reasonably similar range and the number of IDR peaks are also in similar range. The TF seems to have very few binding sites. Not sure if this is its biology or some issue with the experiment. 

What do the signal tracks look like at these peaks and elsewhere in the genome.

Also was this single end or paired end and what was the read length. 

Anshul 

To unsubscribe from this group and stop receiving emails from it, send an email to idr-discuss+unsubscribe@googlegroups.com.

Anshul Kundaje

unread,
Aug 30, 2017, 1:06:04 PM8/30/17
to idr-d...@googlegroups.com
Also worth running this data through Macs2 + IDR using the same procedure. If your TF has more diffuse peaks SPP will not do a good job with them. It's designed for point binding behavior. 

If both peak callers give very few peaks post IDR, that's a problem with the dataset or a TF with very few binding sites (less likely).

Anshul 
Reply all
Reply to author
Forward
0 new messages