GQ for vcf

586 views
Skip to first unread message

Emily Fountain

unread,
Apr 6, 2015, 5:34:09 PM4/6/15
to stacks...@googlegroups.com
Hi,

Does anyone know how to calculate GQ for the vcf output?

I found the equation GQ=-10xLog10P(genotype is wrong) but am not sure what value should be used for "genotype is wrong" and where I could find this value.

Cheers,
Emily

Mark Ravinet

unread,
Apr 7, 2015, 2:48:16 AM4/7/15
to stacks...@googlegroups.com
Hi Emily,

This value is the probability of an error - i.e. 0.001 is 1 in 1000 chance that the genotype is incorrect. In this case GQ values are 40, 30 20 for 0.0001, 0.001 and 0.01 respectively.

See here for more info:

http://en.wikipedia.org/wiki/Phred_quality_score

All the best

Mark

Emily Fountain

unread,
Apr 7, 2015, 10:19:06 AM4/7/15
to stacks...@googlegroups.com

Thanks for the reply, Mark. I shamefully admit that throwing a "G" in front of the Q  in the equation was my downfall.

So I guess my ultimate question is, I know where to get the Phred score on my raw data, but how do I get the Phred score on each individual genotype without PL?

My best guess is using GL somehow?

Cheers,
Emily

--
Stacks website: http://creskolab.uoregon.edu/stacks/
---
You received this message because you are subscribed to a topic in the Google Groups "Stacks" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stacks-users/bvXkb1DjHPA/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stacks-users...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mark Ravinet

unread,
Apr 7, 2015, 9:28:56 PM4/7/15
to stacks...@googlegroups.com
Hi Emily,

As far as I know, Stacks does not provide Phred likelihood scores for genotype calls. VCFs produced by Stacks only show the GL field and this is not encoded in the standard way for the VCF format (http://samtools.github.io/hts-specs/VCFv4.2.pdf) - i.e. only a single likeilhood is provided rather than the likelihood of each alternative genotype.

This likelihood should correspond to the genotype called by ustacks or pstacks using the multinomial SNP model. Since Stacks calls SNPs based on a log-likelihood ratio test (see Hohenloe et al 2010 PLOS Genetics for a detailed description), it assess only the two most likely genotypes (i.e. het vs hom) rather than assessing all three. I think this is one of the reasons that GQ is not provided (see here for more details: https://www.broadinstitute.org/gatk/guide/article?id=1268) but I am not certain which of these likelihoods the VCF records. Presumably it is the likelihood for the displayed genotype.

Irrespective of this, the critical value you set for the SNP calling model should in principle act as a filter on genotype quality. By increasing the critical value to say, 0.01 or 0.001, you are increasing the stringency of the calling method. Although not a direct equivalent, this is similar to allowing genotypes that have a 1% or 0.1% probability of being incorrect.

Please be aware this is just my interpretation!

All the best

Mark

Emily Fountain

unread,
Apr 9, 2015, 11:04:01 AM4/9/15
to stacks...@googlegroups.com
Hi Mark,

I have a rough estimate of my genotyping error rate and so I am going to go with just guesstimating GQ from that for now. Since I just want to get an idea of what is going on with my data and it seems there is no direct way to get GQ without running another analyses like VCFtools, I think this should be enough. Thanks for your help.

Cheers,
Emily


--
--" People like what science gives them, but not the questions, no. Not the questions science asks."--
--Mr. Rzykruski, science teacher, Frankenweenie
Reply all
Reply to author
Forward
0 new messages