Aberrations in peak calling for deeply sequenced replicate data sets

Sara Knaack

Aug 18, 2014, 11:51:12 AM
to mosaics_u...@googlegroups.com
Dear MOSAiCS users, 

I have a quick question about an aberration I'm seeing when calling peaks on two replicate data sets. Both have high sequencing depth (1.8 reads/bp) and come from a yeast experiment (a genome of about 12.5 Mbp). There are two points that I find strange about the MOSAiCS results I'm getting.

1. The numbers of peaks called in the two replicate data sets differ nearly two-fold: 318 vs. 564 peaks. The data otherwise appear to be very similar (correlation of 0.98 or better for 200 bp bin read counts). The BIC scores are 652125.6 and 687859.1, respectively. I'm using the probTrunc=0.08 option in the fit function and the bgEst="rMOM" option, as recommended in the manual. Somehow the fit results are different enough that the two called peak sets simply don't agree well; they are only consistent to the level of an f-score of 0.57.
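
For readers who want to reproduce this kind of comparison, here is a minimal sketch of one way an f-score between two peak sets could be computed. It rests on assumptions not stated in the post: peaks are treated as (chromosome, start, end) half-open intervals, two peaks "agree" if they overlap by at least one base, and replicate 1 is arbitrarily taken as the reference set. The function names and the toy peaks are hypothetical, and the 0.57 figure above may have been obtained with a different matching rule.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_matched(query, reference):
    """Fraction of peaks in `query` that overlap at least one peak in `reference`."""
    if not query:
        return 0.0
    hits = sum(any(overlaps(q, r) for r in reference) for q in query)
    return hits / len(query)


def peak_f_score(rep1, rep2):
    """Harmonic mean of recall (rep1 peaks found in rep2) and precision (rep2 peaks found in rep1)."""
    recall = fraction_matched(rep1, rep2)
    precision = fraction_matched(rep2, rep1)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy peak sets, invented purely for illustration.
rep1 = [("chrI", 100, 300), ("chrI", 1000, 1200), ("chrII", 50, 250)]
rep2 = [("chrI", 150, 350), ("chrII", 400, 600),
        ("chrII", 900, 1100), ("chrIII", 10, 210)]

score = peak_f_score(rep1, rep2)
print(f"f-score: {score:.3f}")
```

The all-pairs overlap check is quadratic in the number of peaks, which is fine for a few hundred peaks per replicate, as here.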

Is there a way to seed the fit of the second replicate with the better fit results from the first replicate, to see whether its results can be brought closer to those of the first? Or, given that I've used the same fit options, should the fit for the second replicate already have converged to an equally good result if it were able to? In other words, how likely is it that I can reconcile this discrepancy in the fits?

2. One aberrant feature of these data is an anomalously high number of reads on a specific chromosome compared to the rest of the genome. The excess is spread roughly evenly along the chromosome rather than being associated only with large, significant peaks. This seems to be the case in both the ChIP and Input data sets of both replicate experiments. When this particular chromosome is excluded, the BIC scores become 581460.6 and 592073.5, respectively. The general picture is the same, and in fact the numbers of peaks become 152 and 462, respectively, so the difference in the peak calls is even worse, even though the data seem to be quite different.
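
A quick way to quantify such a chromosome-wide excess is to compare each chromosome's read density (reads per bp) with the genome-wide median density. The sketch below is only an illustration under assumptions: the function name, the 1.5-fold threshold, and all chromosome names, lengths, and read counts are made up, and the same check would need to be run separately on the ChIP and Input samples.

```python
from statistics import median


def flag_dense_chromosomes(read_counts, lengths, ratio_threshold=1.5):
    """Return {chrom: fold over median density} for chromosomes above the threshold.

    read_counts: mapped reads per chromosome; lengths: chromosome lengths in bp.
    """
    density = {chrom: read_counts[chrom] / lengths[chrom] for chrom in read_counts}
    typical = median(density.values())
    return {
        chrom: d / typical
        for chrom, d in density.items()
        if d / typical > ratio_threshold
    }


# Hypothetical lengths (bp) and read counts, invented for illustration only.
lengths = {"chrA": 1_000_000, "chrB": 800_000, "chrC": 500_000, "chrD": 600_000}
read_counts = {"chrA": 1_800_000, "chrB": 1_440_000, "chrC": 900_000, "chrD": 2_160_000}

flagged = flag_dense_chromosomes(read_counts, lengths)
for chrom, fold in flagged.items():
    print(f"{chrom}: {fold:.2f}x the median read density")
```

A similar fold excess in both ChIP and Input would be consistent with the flat, non-peak-associated enrichment described above.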

Has anyone encountered such a single-chromosome aberration, and if so, how did you address it? I could, for example, call peaks on that chromosome and on the rest of the genome separately. But right now, even the remaining genome on its own produces quite different peak sets across the two replicates.

I'd greatly appreciate any shared experiences or insight into how one can resolve this kind of aberration.

Thanks very much,
