Re: [stacks] Genotype quality estimation question

689 views
Skip to first unread message

Nicolas Rochette

unread,
May 24, 2018, 1:05:10 PM5/24/18
to Stacks

Hi Emma,

You are correct, the GQ field in populations.snps.vcf is the same as the p-value for the likelihood ratio test for the values given in the GL field, i.e. with --gt-alpha=0.01 the GQs will be higher than 20.

--var-alpha impacts the calling of polymorphisms (whether a particular site is variable rather than fixed/invariant across all individuals). Note that for the marukilow model no multiple-testing correction is needed, but that for the marukihigh and snp models you should in principle use one or at least use a conservative threshold.

By AB do you mean allele frequencies? At the moment for simplicity we do not report marukilow's optimized allele frequencies, but you can just count the alleles (this will be skewed if your coverage is low, though).

Best,

Nicolas

On 5/15/18 2:50 AM, Emma wrote:
Hi,

I am in the process of filtering my reference aligned ddRAD data (stacks v. 2.0Beta9) and wanted to ask about the stacks vcf output and quality score estimation.

Can you please confirm how the genotype quality that is exported as part of the vcf file is calculated? The gt-alpha parameter sets a minimum threshold for the likelihood-ratio test statistic, but if the p-value for each genotype is calculated is this used to estimate the quality score, e.g., if the alpha value is 0.01 would the minimum quality score be equivalent to 20? Or are these just not comparable? Also can you please confirm that the genotype quality score in the populations.snps.vcf file is for the genotype not the individual SNP? Or is this only calculated for SNPs if you change the default genotyping model from marikulow to snp?

Additionally, information like allele balance (AB) per SNP are not exported in the vcf file, but presumably would be available? However, this information is integrated into the genotype calling algorithm of Maruki & Lynch so I guess that's why it's not exported?

thanks for building and maintaining such a great pipeline!
Emma


--
Stacks website: http://catchenlab.life.illinois.edu/stacks/
---
You received this message because you are subscribed to the Google Groups "Stacks" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stacks-users...@googlegroups.com.
Visit this group at https://groups.google.com/group/stacks-users.
For more options, visit https://groups.google.com/d/optout.

Carol

unread,
Jul 15, 2018, 5:11:18 AM7/15/18
to Stacks
Dear all,

I'm trying to understand how is the genotype likelihood estimated, I have read the Maruki & Lynch 2015/17 but still not very clear. 

Could you please explain a bit more how does the alpha value of 0.01 correspond to minimun GL of 20?

Hope you can give me a hand!

Cheers

Carol

Nicolas Rochette

unread,
Jul 16, 2018, 12:53:01 PM7/16/18
to Stacks

Hi Carol,

Yes, a gt-alpha of 0.01 corresponds to a GQ of ≥20. And the p-value that is compared to the alpha is derived from a likelihood ratio test on the likelihoods of the two most likely genotypes. In the standard GQ is defined somewhat loosely because no particular statistical framework is assumed but in our case it is –log10(likelihood ratio test p-value)

Best,

Nicolas

Emma

unread,
Jul 23, 2018, 5:31:49 AM7/23/18
to Stacks
Hi Nicolas,

Thanks for your response. By allele balance I meant number of reads per allele in each individual. Some researchers recommend filtering loci if the allele balance has a strong skew, but as the Maruki & Lynch (2017) algorithm takes into account depth of coverage per allele, this seems unnecessary for stacks-generated loci. 

Just so I am clear, what Maruki & Lynch call 'significant polymorphic sites' are identified, and the significance is set with var-alpha, and then the genotype is called using these loci, with significance determined by the gt--alpha call? 

cheers
Emma

Nicolas Rochette

unread,
Jul 23, 2018, 1:20:04 PM7/23/18
to Stacks

For per-allele coverage, the AD (allele depth) field is available and more informative that the lone allele balance figure.

Regarding your second point, you are correct.

Regards,

Nicolas

Carol Buitrago

unread,
Jul 29, 2018, 8:08:54 AM7/29/18
to stacks...@googlegroups.com
Hi Nicolas,

Thanks for the reply, but to be completely honest I don't understand how the GQ of 20 corresponds to gt-alpha of 0.01. If you don't mind it will be really helpful if you provide a more detail explanation. Also, if using GQ to filter out SNPs, what is recommended as threshold?

Thanks in advance,

Carol 

You received this message because you are subscribed to a topic in the Google Groups "Stacks" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stacks-users/VP4qAMaBg3U/unsubscribe.
To unsubscribe from this group and all its topics, send an email to stacks-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/stacks-users/73458e47-82d3-cbd8-79f5-a8224d46ce6b%40illinois.edu.
Reply all
Reply to author
Forward
0 new messages