Negative peak shift in positive control region

KL

unread,

Aug 14, 2013, 11:28:31 AM8/14/13

to chi...@googlegroups.com

Hi Anton,

I've got a rather specific question regarding QuEST and results with my dataset. I've been very happy using QuEST to analyze my ChIP-Seq data in bacteria, as I think that being able to optimize the parameters for my data is ideal and I appreciate the documentation and support you have provided via the Google Group.

With respect to my data, I am looking at a large number of datasets so I won't be able to validate all the peaks, but I have validated a small number and have seen good correlation between RT-PCR and ChIP-Seq. Additionally, we have very few (almost no) regions to serve as a positive control for enrichment- that is to say, we don't actually have much data about where DNA-associated proteins are located on the chromosome.

In order to be rigorous with respect to my data analysis, I perform each ChIP-Seq experiment in biological triplicate, align with bowtie2, and use QuEST to find peaks. Because I am working with the tiny chromosome of a bacterium and getting millions of reads back, we see significant background over the entire genome. As a result, I keep the requirement for fold enrichment low, (1.25 seeding, 1.5 fold over background, 1.25 extension). I then look through the peaks (from peak_caller.ChIP.out.with_metrics, ignoring the "region" data), and require that they have a positive peak shift, positive strand correlations, and a -log10q value >= 2. If a peak matches those parameters in 2/3 samples, I consider it a called peak.

So while I have used QuEST to analyze all my data, and that data seem consistent with what we would expect (and there has been some validation), I have just realized that one of my few potential positive control regions was not called as a peak. That is to say, we know a particular protein regulates expression of itself and would expect it to be found at its own promoter. ChIP-Seq of this protein does result in enrichment at the promoter site, but the issue is that the peak shift is negative (note that the average peak shift in my experiments tends to be around 130). These are the results for the peak region called around that promoter region, in three biological replicates:

A:
R-24 xxxxxx 424810-425186 ChIP: 15.8725 control: 1.4801 max_pos: 424988 ef: 10.7239 ChIP_tags: 5373 background_tags: 6419 tag_ef: 5.4187 ps: 120 cor: 0.356835 -log10_qv: 5883.77 -log10_pv: 5886.19 qv_rank: 36
P-24-1 xxxxxx 424988 ChIP: 15.8725 control: 1.4801 region: 424810-425187 ef: 10.7239 ps: 119 cor: 0.548702 -log10_qv: 6678.5 -log10_pv: 6680.92 qv_rank: 24
B:
R-25 xxxxxx 424807-425187 ChIP: 14.6417 control: 1.45236 max_pos: 424979 ef: 10.0813 ChIP_tags: 4031 background_tags: 6702 tag_ef: 4.94983 ps: -163 cor: 0.271332 -log10_qv: 3923.51 -log10_pv: 3925.91 qv_rank: 38
P-25-1 xxxxxx 424979 ChIP: 14.6417 control: 1.45236 region: 424807-425188 ef: 10.0813 ps: -158 cor: 0.465295 -log10_qv: 4628.29 -log10_pv: 4630.68 qv_rank: 25
C:
R-25 xxxxxx 424781-425199 ChIP: 14.9932 control: 1.50569 max_pos: 424990 ef: 9.95771 ChIP_tags: 5959 background_tags: 7012 tag_ef: 5.22367 ps: 132 cor: 0.229571 -log10_qv: 6125.94 -log10_pv: 6128.38 qv_rank: 34
P-25-1 xxxxxx 424990 ChIP: 14.9932 control: 1.50569 region: 424781-425200 ef: 9.95771 ps: -181 cor: 0.414566 -log10_qv: 6029.09 -log10_pv: 6031.53 qv_rank: 25

It is surprising to me that such good enrichment would be an artifact- which is of course, possible. But when I look at the distribution of the forward and reverse sequencing reads (image attached, around regions 24 or 25, red lines are forward reads, blue are reverse), I don't understand why sample A has a positive peak shift and samples B and C don't- qualitatively, the relative abundances of the forward and reverse reads look similar to me!

Can you possibly suggest a reason QuEST is calling a negative peak shift in this region or a way to modify this parameter? I think I may have to use a different peak calling algorithm if one of the only control regions we have is not called, and I would much rather not switch at this point! I appreciate any assistance you could offer.

PeakShift2.jpg

Anton

unread,

Aug 14, 2013, 12:21:11 PM8/14/13

to chi...@googlegroups.com

Hi KL,

The ps is mainly intended as a filter, to reduce the false positive rate. It's hard to tell exaclty why the peak shift is negative here, since I don't see separately + and - read densities. Because the ps for peak/region is calculated after the peak is called, it has no effect on the global peak shift which is what is important for proper peak calling. As long as the global peak shift is correct, the local peak shift doesn't matter for the peak calling. By changing the "region size" in the QuEST configuration, you may be able to get a more suitable region size for calculation of local peak shift, and that may help.

I would suggest to just ignore the local peak shift, which shouldn't be a problem if you have good control data.

KL

unread,

Aug 14, 2013, 3:09:07 PM8/14/13

to chi...@googlegroups.com

Hi Anton,

Thanks for the quick reply. We've found the peakshift filter to be quite effective and would like to keep using it. I've attached another image which has the read density on both positive (green) and negative (blue) strands for samples A (positive peak shift) and B (negative peak shift). To my eye, it seems quite clear that in order to maximize overlap between reads, this region should have a positive peak shift for both samples, not just sample A. I'm just finding it very troubling that what appears to be obvious by eye isn't registering mathematically. Given these separate read densities, do you have an idea of what is going on?

Thanks!

PeakShift3.tiff

Anton

unread,

Aug 16, 2013, 11:41:00 AM8/16/13

to chi...@googlegroups.com

KL,

this may indeed be a bug of some sort. Can you share with me (in an email) a QuEST file with alignment coordinates via dropbox?

I'll try to get that fixed.

Reply all

Reply to author

Forward