Vparser: LowQual reads leading to false positives


RG

Jan 7, 2014, 7:10:36 AM
to viral-to...@googlegroups.com
Hi,
I am using vphaser2 and feeding its output into vparser with the -noendvariant=10 -nt -codon options. The codon-level analysis is really useful for me, as I am sequencing HIV, which can vary quite a lot from the reference sequence.
After verifying some of the SNPs I find by Sanger sequencing, I see that the Fisher's test for strand bias omits some of the real SNPs in the codon analysis. Hence, I am now using the raw SNP output, although perhaps different statistics could help here (e.g. confidence intervals).
However, I also noticed that false positives sometimes occur when there are many reads for a particular codon but a high fraction of them are low-quality reads (e.g. >70% LowQual).

For example, the GGG codon here is not observed by Sanger and has a very high LowQual fraction:

1467-1469 (coverage of 14495):
Accepted codons:
GGG (G) 3.78% (LowQual : 97.62%)
GGT (G) 96.22% (LowQual : 3.49%)
Rejected codons:
GAT (D) 2 count (2 HQ reads)
GG- (-) 1 count (1 HQ reads)
GTT (V) 2 count (0 HQ reads)
GGA (G) 2 count (1 HQ reads)
CGT (R) 2 count (0 HQ reads)
AGT (S) 2 count (0 HQ reads)
TGT (C) 3 count (1 HQ reads)
GGC (G) 5 count (3 HQ reads)
GCT (A) 4 count (0 HQ reads)
--- (-) 7 count (7 HQ reads)
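
To put rough numbers on this example, here is a quick Python check. It assumes that the percentage after each accepted codon is its share of the 14495 reads at this position, and that LowQual is the share of that codon's reads flagged as low quality. Both are my reading of the report, not documented behaviour, and the helper name `hq_reads` is just for illustration.

```python
# Values copied from the report above for position 1467-1469.
coverage = 14495
accepted = {"GGG": (3.78, 97.62), "GGT": (96.22, 3.49)}  # (frequency %, LowQual %)

def hq_reads(freq_pct, lowqual_pct, coverage):
    """Estimate the number of high-quality reads supporting a codon.

    Assumption: freq_pct is relative to total coverage and lowqual_pct
    is relative to the reads carrying that codon.
    """
    total = round(coverage * freq_pct / 100)
    return round(total * (1 - lowqual_pct / 100))

hq = {codon: hq_reads(f, lq, coverage) for codon, (f, lq) in accepted.items()}

# Frequency of GGG when only the estimated high-quality reads are counted.
hq_freq_ggg = hq["GGG"] / sum(hq.values())

print(hq)
print(f"GGG frequency among HQ reads: {100 * hq_freq_ggg:.3f}%")
```

Under these assumptions GGG is supported by about 13 high-quality reads against roughly 13460 for GGT, so its frequency among high-quality reads drops from 3.78% to about 0.1%.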



My questions are:
First, what is the basis for calling a read low quality (Qual scores or some other metric)?
Second, is there a way to exclude mutation calls that stem from codons with a high percentage of low-quality reads (e.g. in vphaser or vparser), or do I need to do this manually or write a new script?
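
In case it has to be a separate script, below is a minimal sketch of the kind of post-filter I have in mind. It assumes the codon report always looks like the excerpt pasted above ("Accepted codons:" / "Rejected codons:" sections, with "(LowQual : NN%)" after each accepted codon). The function name `flag_lowqual_codons` and the 70% cut-off are my own choices, not part of vphaser or vparser.

```python
import re

# Matches accepted-codon lines such as: "GGG (G) 3.78% (LowQual : 97.62%)"
ACCEPTED_LINE = re.compile(
    r"^\s*([ACGTN-]{3})\s+\((.)\)\s+([\d.]+)%\s+\(LowQual\s*:\s*([\d.]+)%\)\s*$"
)

def flag_lowqual_codons(report, max_lowqual=70.0):
    """Return accepted codons whose LowQual percentage exceeds max_lowqual.

    Each result is (codon, amino_acid, frequency_pct, lowqual_pct).
    The report layout is assumed from the excerpt in this post.
    """
    flagged = []
    in_accepted = False
    for line in report.splitlines():
        stripped = line.strip()
        if stripped.startswith("Accepted codons"):
            in_accepted = True
            continue
        if stripped.startswith("Rejected codons"):
            in_accepted = False
            continue
        if not in_accepted:
            continue
        match = ACCEPTED_LINE.match(line)
        if match:
            codon, aa, freq, lowqual = match.groups()
            if float(lowqual) > max_lowqual:
                flagged.append((codon, aa, float(freq), float(lowqual)))
    return flagged

example = """1467-1469 (coverage of 14495):
Accepted codons:
GGG (G) 3.78% (LowQual : 97.62%)
GGT (G) 96.22% (LowQual : 3.49%)
Rejected codons:
GAT (D) 2 count (2 HQ reads)
"""

print(flag_lowqual_codons(example))
```

On the excerpt this flags only GGG, which is the codon Sanger did not confirm.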

Thanks again, 
Ron