v-phaser strand bias test - general question

Richard Orton

23 May 2014, 9:21:06 AM

To: viral-to...@googlegroups.com

Hi,

We have been using v-phaser to detect variants in viral samples.

We have a general question about the strand bias test.

We have a variant that sort of looks like this:

Position 100

Reference Base T: Forward 800 Reverse 100

Variant Base C: Forward 100 Reverse 80

This is only an illustrative example (I couldn't find the exact numbers), but a variant like this fails the V-Phaser strand bias test, even though the variant base does not look strand-biased.

Is it because the strand bias test (Fisher's?) compares the bias in the variant base to the bias in the reference base? In this case, although the variant does not appear strand-biased, the reference base is, and because the variant's strand distribution does not match the reference's, it fails. If the variant were more strand-biased it would pass, as it would more closely match the reference bias.

Hope I've explained that well enough, just checking we understood what was going on, as the variant base did not appear strand biased.

Cheers,

Richard

Xiao Yang

23 May 2014, 12:11:55 PM

To: viral-to...@googlegroups.com

Hi Richard,

Sorry, I did not understand your question. Do you mean this example is strand-biased or not strand-biased? V-Phaser 2 just uses a simple Fisher's exact test plus FDR correction. Entering your example into an online Fisher's exact test tool, I got back "The two-tailed P value is less than 0.0001", so this appears to be strand-biased without FDR correction.
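For readers who want to reproduce the online-tool result, here is a minimal sketch of a two-sided Fisher's exact test using only the Python standard library. The function name `fisher_exact_two_sided` and the table layout (reference row first, forward column first) are assumptions made for illustration; this is not V-Phaser 2's code, and it leaves out the FDR correction step.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are alleles (reference, variant); columns are strands (forward, reverse).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def weight(x):
        # Unnormalised hypergeometric weight of the table whose top-left cell is x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the weights of every table that is no more likely than the observed one.
    tail = sum(w for w in (weight(x) for x in range(lowest, highest + 1)) if w <= observed)
    return tail / comb(total, col1)

# Counts from the first message: reference T 800 fwd / 100 rev, variant C 100 fwd / 80 rev.
p = fisher_exact_two_sided(800, 100, 100, 80)
print(f"two-sided p = {p:.3g}")
```

The weights are kept as exact integers until the final division, which avoids floating-point ties when deciding which tables count as "at least as extreme".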

--
You received this message because you are subscribed to the Google Groups "Broad Viral Tool Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to viral-tool-use...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
- Xiao

Richard Orton

2 June 2014, 10:49:15 AM

To: viral-to...@googlegroups.com

Hi,

Sorry, my explanation was pretty bad. Here are some examples.

Example 1

Pos: 1234: Reference Base: 1000, 1000 (Forward, Reverse)

Pos: 1234: Variant Base: 100, 10 (Forward, Reverse)

In this example, the variant base shows a clear strand bias (100 forward vs. 10 reverse), so it would fail the test.

Example 2:

Pos: 4321: Reference Base: 1000, 200 (Forward, Reverse)

Pos: 4321: Variant Base: 100, 100 (Forward, Reverse)

In this example, the variant base is clearly not biased: it has an equal number of reads in the forward and reverse directions. However, it will fail the strand bias test, even though it isn't strand-biased. It seems that the test compares the bias in the variant base to the bias in the reference base, and if they are significantly different the variant fails. For example, if I take Example 2 and make the variant base more biased, it will now pass the strand bias test.

Example 3:

Pos: 4321: Reference Base: 1000, 200 (Forward, Reverse)

Pos: 4321: Variant Base: 100, 20 (Forward, Reverse)

The variant will now pass the strand bias test, even though it is more strand biased than example 2.
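To make the comparison between the three examples concrete, the sketch below prints the forward-read fraction of each base and the odds ratio of the 2x2 table. The helper name `strand_summary` is hypothetical, and the odds ratio is used here only as a simple descriptive measure of how far the two strand ratios disagree; it is not claimed to be what V-Phaser reports.

```python
def strand_summary(ref_fwd, ref_rev, var_fwd, var_rev):
    """Return (reference forward fraction, variant forward fraction, odds ratio)."""
    ref_frac = ref_fwd / (ref_fwd + ref_rev)
    var_frac = var_fwd / (var_fwd + var_rev)
    # An odds ratio of 1 means both bases share the same forward:reverse ratio.
    odds_ratio = (ref_fwd * var_rev) / (ref_rev * var_fwd)
    return ref_frac, var_frac, odds_ratio

# Counts taken from Examples 1-3: (ref fwd, ref rev, var fwd, var rev).
examples = {
    "Example 1": (1000, 1000, 100, 10),
    "Example 2": (1000, 200, 100, 100),
    "Example 3": (1000, 200, 100, 20),
}

for name, counts in examples.items():
    ref_frac, var_frac, odds = strand_summary(*counts)
    print(f"{name}: ref fwd {ref_frac:.2f}, var fwd {var_frac:.2f}, odds ratio {odds:.2f}")
```

Only Example 3 has matching forward fractions for the two bases, which is consistent with it being the one that passes a test comparing the two strand ratios.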

We were just trying to get our heads around what was going on, as we had a high-frequency variant that didn't appear to be strand-biased failing the strand bias test (although it passed in other tools such as LoFreq). It seems it was failing because the reference base was biased and the variant was not.

Does this explanation make sense? And is this just a side effect of using Fisher's test?

Xiao Yang

3 June 2014, 11:35:28 AM

To: viral-to...@googlegroups.com

Hi Richard,

I think I understand what you mean now. I believe this comes down to a difference in how we each understand strand bias.

By strand bias, we mean the bias that actually results from sequencing, not a deviation from what the strand ratio is supposed to be. In your second example:

Example 2:

Pos: 4321: Reference Base: 1000, 200 (Forward, Reverse)

Pos: 4321: Variant Base: 100, 100 (Forward, Reverse)

You said the variant base is not biased. That is correct if you assume sequencing has a 50%/50% chance of reading either strand. In other words, you are comparing against what the sequencing process is supposed to produce, regardless of the bias actually introduced during sequencing.

We are measuring something different. We do not assume the two strands are sequenced 50% vs. 50%; instead, we measure the relative bias during sequencing. So, using your example, let's assume A and T are observed at pos 4321 as follows: A: 1000, 200 and T: 100, 100.

If A is observed 1000 fwd and 200 rev, then ideally you would observe T as 100 fwd and 20 rev. Vice versa, if you observe T as 100:100, then you are supposed to observe 1000:1000 for A. So we do not treat A or T as reference or variant, since these are relative terms. As long as the strand output is not consistent between the bases (sequencing or amplification could each introduce bias), we report it as strand bias. Here, we use Fisher's exact test plus FDR correction to measure the extent of the bias.
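The idea of "consistent strand output" can be illustrated by computing the counts each base would be expected to show if both bases shared one pooled forward:reverse ratio. This is a sketch under that pooled-ratio assumption (the usual independence expectation for a contingency table); the function name `expected_counts` is hypothetical, and the pooled expectation is not necessarily the quantity V-Phaser 2 computes internally.

```python
def expected_counts(observed):
    """Expected (fwd, rev) counts per base if all bases shared the pooled strand ratio.

    `observed` maps base -> (forward count, reverse count). No base is treated
    as reference or variant; the calculation is symmetric in the bases.
    """
    fwd_total = sum(fwd for fwd, _ in observed.values())
    rev_total = sum(rev for _, rev in observed.values())
    total = fwd_total + rev_total
    expected = {}
    for base, (fwd, rev) in observed.items():
        depth = fwd + rev
        expected[base] = (depth * fwd_total / total, depth * rev_total / total)
    return expected

inconsistent = {"A": (1000, 200), "T": (100, 100)}  # Example 2
consistent = {"A": (1000, 200), "T": (100, 20)}     # Example 3

for label, table in (("Example 2", inconsistent), ("Example 3", consistent)):
    print(label)
    for base, (exp_fwd, exp_rev) in expected_counts(table).items():
        obs_fwd, obs_rev = table[base]
        print(f"  {base}: observed {obs_fwd}:{obs_rev}, expected {exp_fwd:.1f}:{exp_rev:.1f}")
```

In Example 3 the observed counts coincide with the expected ones, while in Example 2 both bases deviate from the pooled ratio, whichever one is labelled the reference.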

Hope this helps.

Xiao

