Hi,
I am using vphaser2 and feeding its output into vparser with the -noendvariant=10 -nt -codon options. The codon-level analysis is really useful for me, as I am sequencing HIV and there can be quite a lot of variation from the reference sequence.
After verifying some of the SNPs I find by Sanger sequencing, I see that the Fisher's exact test for strand bias omits some of the real SNPs in the codon analysis. Hence, I am now using the raw SNP output, although perhaps different statistics could help here (e.g., confidence intervals).
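In case it is useful, the confidence-interval idea can be sketched as follows. This is illustrative only: the counts are back-calculated from the 3.78% frequency at 14495x coverage in the example, and the choice of a Wilson score interval (rather than whatever vphaser uses internally) is my own assumption:

```python
import math

def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion k/n."""
    if n == 0:
        return (0.0, 0.0)
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (max(0.0, center - half), min(1.0, center + half))

# Illustrative: a codon seen in ~548 of 14495 reads (~3.78%).
lo, hi = wilson_interval(548, 14495)
```

One could then require the lower bound to clear some minimum frequency before accepting a variant, instead of (or in addition to) the strand-bias test.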
However, I also noticed that false positives sometimes occur when a codon has many reads but a high fraction of them are low quality (e.g., >70% LowQual).
For example, the GGG here is not observed by Sanger and has a very high LowQual fraction:
1467-1469 (coverage of 14495):
Accepted codons:
GGG (G) 3.78% (LowQual : 97.62%)
GGT (G) 96.22% (LowQual : 3.49%)
Rejected codons:
GAT (D) 2 count (2 HQ reads)
GG- (-) 1 count (1 HQ reads)
GTT (V) 2 count (0 HQ reads)
GGA (G) 2 count (1 HQ reads)
CGT (R) 2 count (0 HQ reads)
AGT (S) 2 count (0 HQ reads)
TGT (C) 3 count (1 HQ reads)
GGC (G) 5 count (3 HQ reads)
GCT (A) 4 count (0 HQ reads)
--- (-) 7 count (7 HQ reads)
My questions are:
First, can you let me know the basis for calling a read low quality (is this based on quality scores or on other metrics)?
Second, is there a way to exclude mutation calls that stem from codons with a high percentage of low-quality reads (e.g., in vphaser or vparser), or do I need to do this manually / write a new script?
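If it has to be a manual post-filter, something as simple as this is what I have in mind (a minimal sketch only; the codon entries are copied from the example above, and the 70% cutoff is just the illustrative threshold from my earlier paragraph, not a recommended value):

```python
# Reject accepted codons whose supporting reads are mostly low quality.
LOWQUAL_MAX = 70.0  # assumed cutoff: drop codons with >70% LowQual reads

# (codon, frequency %, LowQual %) as reported in the codon output above
accepted = [
    ("GGG", 3.78, 97.62),
    ("GGT", 96.22, 3.49),
]

kept = [codon for codon, freq, lowqual in accepted if lowqual <= LOWQUAL_MAX]
# GGG (97.62% LowQual) is filtered out; GGT is retained.
```

But if vphaser or vparser already exposes such a cutoff as an option, I would much rather use that.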
Thanks again,
Ron