How are summary statistics calculated with --min-gt-depth?


Yi Fang "Clare" Tan

Feb 28, 2026, 2:53:39 PM
to Stacks
Hi all,

I'm writing to ask for clarification on the --min-gt-depth argument in the populations module. I would like to run populations to calculate summary statistics on a minimally filtered dataset, applying only minimum genotype depth and genotype quality filters. My dataset has a mix of high- and low-quality data, so while I would like to include both invariant and variant sites in my summary statistics calculations, I think some baseline quality filtering is also necessary.

To do this, I first tried exporting the raw, unfiltered VCF into SNPfiltR to run the minimum genotype depth and genotype quality filters, but when I tried running populations with this filtered VCF, there were formatting errors that caused Stacks to think there were 83 samples instead of the 82 in my dataset (see pic at end of paragraph). I switched gears and instead tried to whitelist the filtered data, but the resulting summary statistics were identical to the sumstats calculated with just the raw, unfiltered data. I reasoned that this is because SNPfiltR is not dropping sites, only genotype calls. vcftools also does not seem to drop sites.

filtered.png
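
One way to narrow down a sample-count mismatch like the 83 vs. 82 above is to count the sample columns in the VCF's `#CHROM` header line directly. The following is a minimal sketch, not a diagnosis: the toy header, the sample names, and the trailing-tab scenario are all assumptions made for illustration, and the actual cause in the filtered file may be something else entirely.

```python
import io

# A VCF body has 9 fixed columns before the per-sample columns:
# CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
FIXED_COLUMNS = 9


def sample_names(vcf_lines):
    """Return the sample column names from the #CHROM header line."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # Strip only the newline so stray trailing tabs stay visible.
            fields = line.rstrip("\n").split("\t")
            return fields[FIXED_COLUMNS:]
    raise ValueError("no #CHROM header line found")


# Hypothetical header with a trailing tab after the last sample name.
toy_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind_01\tind_02\t\n"
)

names = sample_names(io.StringIO(toy_vcf))
print(len(names), names)
```

In this toy case the trailing tab yields an empty third "sample" name, which is one hypothetical way a parser could end up counting one sample too many. Pointing `sample_names` at an open file handle for the real filtered VCF would show whether its header has an extra or empty column.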

I then tried the --min-gt-depth argument that's built into populations and did end up getting different results than the previous runs, so I am curious if this argument is dropping sites within the dataset. I would also appreciate any suggestions for how I might incorporate a genotype quality argument into the populations module (perhaps I just need to tweak some of the formatting for the filtered vcf that I put into populations?). Thanks for reading!

populations.png
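
The distinction between dropping genotype calls and dropping sites can be illustrated with a small sketch. It uses made-up toy data and a hypothetical depth threshold, and it does not describe how populations, SNPfiltR, or vcftools are implemented internally; it only shows why masking calls leaves the site list unchanged.

```python
MIN_DEPTH = 5  # hypothetical threshold, for illustration only

# Toy data: each site maps to one (genotype, depth) pair per sample.
sites = {
    ("locus_1", 10): [("0/0", 12), ("0/1", 3), ("0/0", 8)],
    ("locus_1", 25): [("0/1", 2), ("1/1", 4), ("0/0", 1)],
}


def mask_low_depth(calls, min_depth):
    """Set genotypes below min_depth to missing; the site itself is kept."""
    return [(gt if dp >= min_depth else "./.", dp) for gt, dp in calls]


def n_called(calls):
    """Count the non-missing genotype calls at a site."""
    return sum(1 for gt, _ in calls if gt != "./.")


# Genotype-level filtering: every site is still present afterwards.
masked = {site: mask_low_depth(calls, MIN_DEPTH) for site, calls in sites.items()}

# A separate, optional step: drop sites left with no called genotypes.
non_empty = {site: calls for site, calls in masked.items() if n_called(calls) > 0}

print(len(sites), len(masked), len(non_empty))
```

Here the masked dataset still lists both sites, so a whitelist built from its site list would contain the same loci as the unfiltered data, which is consistent with the identical summary statistics described above. A site only disappears if a second step removes sites whose calls are all missing.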

Clare