Different number of sites when rerunning the populations module with Stacks v2.62 and v2.65

Kate Rick

Nov 3, 2023, 2:55:02 AM
to Stacks
Hello,

I have re-run the populations module on one population using both Stacks v2.62 and v2.65, and the two runs report very different numbers of sites. This in turn affects my other diversity metrics, including heterozygosity.

The catalogue file, popmap and code are all the same as far as I can tell. Can someone please explain why the results are so different?

Population summary statistics for v2.62
Removed 692569 loci that did not pass sample/population constraints from 723739 loci.
Kept 31170 loci, composed of 3937110 sites; 13406 of those sites were filtered, 16857 variant sites remained.
    3932339 genomic sites, of which 4570 were covered by multiple loci (0.1%).
Mean genotyped sites per locus: 126.12bp (stderr 0.02).
Population summary statistics (more detail in populations.sumstats_summary.tsv):
  GUNT_Period3: 15 samples per locus; pi: 0.19875; all/variant/polymorphic sites: 3931221/16857/11788; private alleles: 1150
  GUNT_Period1: 7 samples per locus; pi: 0.22164; all/variant/polymorphic sites: 3931176/16857/11570; private alleles: 1208
  GUNT_Period2: 7 samples per locus; pi: 0.23771; all/variant/polymorphic sites: 3931153/16857/13545; private alleles: 2950
Populations is done.

Population summary statistics for v2.65
Removed 692569 loci that did not pass sample/population constraints from 723739 loci.
Kept 31170 loci, composed of 3937110 sites; 3909647 of those sites were filtered, 16857 variant sites remained.
    40653 genomic sites, of which 19 were covered by multiple loci (0.0%).
Mean genotyped sites per locus: 1.30bp (stderr 0.02).
Population summary statistics (more detail in populations.sumstats_summary.tsv):
  GUNT_Period1: 7 samples per locus; pi: 0.22164; all/variant/polymorphic sites: 40575/16857/11570; private alleles: 1208
  GUNT_Period2: 7 samples per locus; pi: 0.23771; all/variant/polymorphic sites: 40552/16857/13545; private alleles: 2950
  GUNT_Period3: 15 samples per locus; pi: 0.19875; all/variant/polymorphic sites: 40620/16857/11788; private alleles: 279090
Populations is done.
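
The per-population lines in the two summaries above can be compared field by field. The following is only a minimal sketch in Python: the regular expression is written against the log lines exactly as pasted in this post, and the names `LINE`, `parse`, `v262` and `v265` are made up for illustration rather than part of Stacks.

```python
import re

# Pattern for the per-population summary lines as they appear in this post.
LINE = re.compile(
    r"^\s*(?P<pop>\S+): (?P<n>[\d.]+) samples per locus; pi: (?P<pi>[\d.]+); "
    r"all/variant/polymorphic sites: (?P<all>\d+)/(?P<var>\d+)/(?P<poly>\d+); "
    r"private alleles: (?P<priv>\d+)"
)

def parse(text):
    """Map population name -> dict of the reported fields (kept as strings)."""
    out = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            out[m["pop"]] = {k: m[k] for k in ("pi", "all", "var", "poly", "priv")}
    return out

# Lines copied from the two summaries above.
v262 = """
  GUNT_Period3: 15 samples per locus; pi: 0.19875; all/variant/polymorphic sites: 3931221/16857/11788; private alleles: 1150
  GUNT_Period1: 7 samples per locus; pi: 0.22164; all/variant/polymorphic sites: 3931176/16857/11570; private alleles: 1208
  GUNT_Period2: 7 samples per locus; pi: 0.23771; all/variant/polymorphic sites: 3931153/16857/13545; private alleles: 2950
"""
v265 = """
  GUNT_Period1: 7 samples per locus; pi: 0.22164; all/variant/polymorphic sites: 40575/16857/11570; private alleles: 1208
  GUNT_Period2: 7 samples per locus; pi: 0.23771; all/variant/polymorphic sites: 40552/16857/13545; private alleles: 2950
  GUNT_Period3: 15 samples per locus; pi: 0.19875; all/variant/polymorphic sites: 40620/16857/11788; private alleles: 279090
"""

a, b = parse(v262), parse(v265)
# For each population, list the fields whose values differ between versions.
diffs = {pop: [k for k in a[pop] if a[pop][k] != b[pop][k]] for pop in sorted(a)}
for pop, fields in diffs.items():
    print(pop, fields)
```

For the lines pasted here, pi and the variant and polymorphic site counts match between versions; the count of all sites differs for every population, and the private allele count differs for GUNT_Period3.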


I have also uploaded the populations.log files for comparison.

Thank you for your assistance.
Kate
May2023GUNT_populations.log
Nov2023GUNT_populations.log

Catchen, Julian

Nov 3, 2023, 5:54:37 PM
to stacks...@googlegroups.com

Hi Kate,

In earlier versions of the populations program, filters were not applied to fixed sites, since those sites were not exported from the program. However, we recently added a --vcf-all export, which exports fixed sites along with variant sites. To make this export more accurate, we updated the code to apply filters to fixed sites as well. If you examine the populations.sumstats.tsv files from your two runs, they should be the same, since this file contains all the variant sites in the data set, and filters should have been applied to those sites without change between the two Stacks versions.
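
One way to check this is to compare the data rows of the two populations.sumstats.tsv files directly. The following is a minimal sketch in Python, assuming only that the files are tab-separated and that header or comment lines start with `#`. The helper names `read_rows` and `compare` and the toy rows in the demo are hypothetical and do not reflect the real column layout; for real files, pass handles from `open(...)` instead.

```python
import csv
import io

def read_rows(handle):
    """Return the set of data rows in a TSV, skipping lines that start with '#'."""
    rows = set()
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue
        rows.add(tuple(fields))
    return rows

def compare(old_handle, new_handle):
    """Count rows shared by both files and rows unique to each, ignoring row order."""
    old_rows, new_rows = read_rows(old_handle), read_rows(new_handle)
    return {
        "shared": len(old_rows & new_rows),
        "only_old": len(old_rows - new_rows),
        "only_new": len(new_rows - old_rows),
    }

# Toy demo with made-up rows; real sumstats files have many more columns.
old = io.StringIO("# toy header\n1\tun\t10\n2\tun\t20\n")
new = io.StringIO("# toy header\n2\tun\t20\n1\tun\t10\n")
result = compare(old, new)
print(result)
```

Rows are compared as exact strings, so any change in number formatting between versions would show up as a difference even if the underlying values agree.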

 

Best,
julian
