Hi Julian,
Many thanks for creating and maintaining this fantastic software. It's the foundation of my PhD research :)
I ran the Stacks de novo pipeline on a dataset of stingrays from different locations. I exported the results to VCF format from populations and then filtered SNPs and individuals using a custom R script. I then exported a whitelist of the retained SNPs so that populations could produce a new VCF containing only the SNPs of interest. The whitelist is specified as CHROM<tab>POS.
Weirdly, when I run populations, most of the whitelist SNPs are filtered out, even though I did not specify any filtering criteria beyond the whitelist. My command is here (minus the directories):
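To make the whitelist format concrete, here is a minimal sketch of the CHROM<tab>POS export. My actual script is in R; this Python version is only an illustrative equivalent, and the function name `vcf_to_whitelist` and the example records are made up for the illustration.

```python
def vcf_to_whitelist(vcf_lines):
    """Return one 'CHROM<tab>POS' string per VCF record.

    Header lines (starting with '#') and blank lines are skipped.
    The first two VCF columns are copied verbatim, with no coordinate
    adjustment of any kind.
    """
    entries = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, pos = line.split("\t")[:2]
        entries.append(f"{chrom}\t{pos}")
    return entries


# Hypothetical records, only to show the shape of the input and output.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "12\t45\t.\tA\tG\t.\tPASS\t.",
    "12\t88\t.\tC\tT\t.\tPASS\t.",
    "301\t7\t.\tG\tA\t.\tPASS\t.",
]

whitelist = vcf_to_whitelist(example_vcf)
```

Each resulting entry is written as one line of the whitelist file, so the values are exactly what the filtered VCF reports in its first two columns.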
populations -t 20 -P <stacks_directory> -O <out_directory> --popmap whitelist_popmap.txt -W whitelist_loci.txt --hwe --radpainter --vcf
The odd lines from the log file are here:
Removed 201387 loci that did not pass sample/population constraints from 208418 loci.
Kept 7031 loci, composed of 6629259 sites; 0 of those sites were filtered, 205 variant sites remained.
The 7031 value is the number of loci, not the number of SNPs; there were >10,000 SNPs distributed over these loci, but that number doesn't show up anywhere in the log file, only the number of loci. Not sure if that's relevant...
I also double-checked that the whitelist SNPs ARE SNPs, i.e., there is variation among the samples at these positions. Why would populations filter out over 10,000 SNPs when no filters are set beyond the whitelist?
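In case it helps to quantify the loss, this is roughly how the whitelist could be compared against the VCF that populations writes. It is a sketch, not part of my actual pipeline: the helper names `read_sites` and `compare_whitelist_to_vcf` are hypothetical, the example data are invented, and it assumes both files use the same values in their first two tab-separated columns.

```python
def read_sites(lines):
    """Collect (column 1, column 2) pairs from tab-separated lines.

    Works for both the whitelist and a VCF body: lines starting with
    '#' and blank lines are ignored.
    """
    sites = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        sites.add((fields[0], fields[1]))
    return sites


def compare_whitelist_to_vcf(whitelist_lines, vcf_lines):
    """Report how many whitelisted sites appear in the output VCF."""
    wanted = read_sites(whitelist_lines)
    found = read_sites(vcf_lines)
    return {
        "whitelisted": len(wanted),
        "retained": len(wanted & found),
        "missing": sorted(wanted - found),
    }


# Invented toy data, only to show the shape of the report.
toy_whitelist = ["12\t45", "12\t88", "301\t7"]
toy_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "12\t45\t.\tA\tG\t.\tPASS\t.",
]

report = compare_whitelist_to_vcf(toy_whitelist, toy_vcf)
```

The `missing` list would show exactly which whitelist entries did not make it into the new VCF, which might reveal a pattern in what is being dropped.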
For reference, I'm using Stacks version 2.3d.
Thanks for your time and consideration and for any insight you can provide.
Best,
John