Setting minimum coverage depth for SNP calls

467 views
Skip to first unread message

Ira Herniter

unread,
Feb 14, 2022, 4:30:05 PM2/14/22
to Stacks
Hi,

I've been trying to figure out how to make use of only SNPs that have deep enough coverage to eliminate artifacts in my trait mapping. I've been having trouble figuring out how I should go about doing so.

In the page for the ref_map.pl command there isn't anything mentioning read depth,but there is a "-X" option saying it's for "extra options" but doesn't spell it out. Is there a way I can set a minimum threshold for calling a SNP?

Thanks,
Ira

Catchen, Julian

unread,
Feb 15, 2022, 9:38:18 AM2/15/22
to stacks...@googlegroups.com

Hi Ira,

 

There is not an explicit method to set a minimum depth of coverage as the software relies on the SNP-calling model to weigh the evidence. You can require the model to use a more strict alpha value, which will require more evidence, i.e. more reads, to reach a level of statistical significance. You can change the default alpha from 0.05 to, e.g. 0.01 using the --gt-alpha flag to gstacks.

 

I would also recommend reviewing the overall depth of coverage for all your samples and excluding any samples that have very low coverage from the analysis entirely.

 

Best,

 

julian

Ira Herniter

unread,
Feb 16, 2022, 5:23:21 PM2/16/22
to Stacks
Hi  Julian,

I'll try changing the alpha value. 
Is there an easy way to check the coverage for all the accessions? 

Thanks,
Ira

Catchen, Julian

unread,
Feb 17, 2022, 2:57:47 AM2/17/22
to stacks...@googlegroups.com

Hi Ira,

 

See our newest protocol chapter that lays out how to do it:

 

https://www.biorxiv.org/content/10.1101/2021.11.02.466953v1.full

 

(the chapter is in press).

Andrea Bernard

unread,
Apr 5, 2022, 2:13:52 PM4/5/22
to Stacks
Hi Everyone,
I've run gstacks using three different values for gt-alpha (0.05, 0.01, and 0.005) to assess variance in genotyping stringency; however, when I used the "stacks-dist-extract" utility to assess coverage across values, all of the values are the same in the output table (n_loci, n_used_fw_reads, mean_cov, mean_cov_ns). Shouldn't these values change based on the value of gt-alpha? I'm just wondering if there is something I'm missing here.  Any help would be appreciated.

fyi: All other gstacks parameters are set to default values/options.

Thanks,
Andrea

Catchen, Julian

unread,
Apr 6, 2022, 11:08:30 AM4/6/22
to stacks...@googlegroups.com

Hi Andrea,

 

Changing the genotyping parameters will not affect the assembly of loci in gstacks. Assembly happens independently of genotyping, and depth of coverage is a result of assembly. However, while you should get the same number of loci, no matter what you set gt-alpha to, you should see more or less numbers of called SNPs. If you run the populations program, which will summarize the SNP calls, you should see that making gt-alpha more stringent reduces the number of SNP calls (as calls can only be made with sufficient coverage/evidence). If you look at say, the VCF export from populations, you should see the reported allele depths shift with different values of gt-alpha.

 

Best,

 

juilan

Andrea Bernard

unread,
Apr 13, 2022, 5:47:47 PM4/13/22
to Stacks
Hi Julian,
Thanks for your response - it is very much appreciated.
In case anyone has the same type of question ...
I did as suggested and ran the populations module using the output from all three gstacks gt-alpha values (0.05, 0.01, and 0.005) and with each increase in stringency, the number of SNPs written to the vcf file decreased, and the respective mean coverage (over individuals and sites) increased (as estimated via vcftools).   
Many thanks,
Andrea
Reply all
Reply to author
Forward
0 new messages