Hi Giulia,
The core Stacks pipeline (ustacks/cstacks/sstacks/gstacks) is agnostic
when it comes to population structure. It assembles all the loci across
the dataset without much thought to specific biology (with the exception
of SNP calling). So running the wrappers vs. running the individual
components will not affect private alleles.
It is the populations program that adds the population frame to the data,
and this holds for private alleles as well. These alleles are defined
based on your population map -- if you have a single population in the
analysis then the definition is as you would expect. However, if you
have multiple populations then private alleles are specific to each
population. If you change the population map (say moving from a
geographically based population definition to populations based on sex),
you will see different alleles identified as private.
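
To make the dependence on the population map concrete, here is a minimal Python sketch. It is not how populations is implemented; the sample names, genotypes, and the private_alleles helper are all made up for illustration. It simply labels an allele as private when it is observed in exactly one population, and applies two different maps to the same genotypes.

```python
from collections import defaultdict

def private_alleles(genotypes, popmap):
    """Return {population: {(locus, allele), ...}} for alleles seen in one population only."""
    seen = defaultdict(set)  # (locus, allele) -> populations in which it was observed
    for sample, loci in genotypes.items():
        pop = popmap[sample]
        for locus, alleles in loci.items():
            for allele in alleles:
                seen[(locus, allele)].add(pop)
    private = defaultdict(set)
    for key, pops in seen.items():
        if len(pops) == 1:
            private[next(iter(pops))].add(key)
    return dict(private)

# Hypothetical toy data: four diploid samples genotyped at two loci.
genotypes = {
    "s1": {"loc1": "AA", "loc2": "CT"},
    "s2": {"loc1": "AG", "loc2": "CC"},
    "s3": {"loc1": "AA", "loc2": "CC"},
    "s4": {"loc1": "AA", "loc2": "CC"},
}

# Two hypothetical population maps over the same samples.
by_site = {"s1": "north", "s2": "north", "s3": "south", "s4": "south"}
by_sex = {"s1": "female", "s2": "male", "s3": "female", "s4": "male"}

print(private_alleles(genotypes, by_site))
print(private_alleles(genotypes, by_sex))
```

With the geographic map both rare alleles are private to the same population; with the sex-based map they are split between the two groups, even though the genotypes have not changed.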
Of course, as your sample size increases you will also see more private
alleles, but this is related to your power to detect them, that is, the
probability of sampling a low-frequency allele in the population.
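
As a rough illustration of the sampling effect, the sketch below assumes diploid samples and that each of the 2n sampled gene copies is an independent draw from the population (a simplification; the function name and the 1% frequency are just examples). Under that assumption, the chance of seeing an allele at frequency p at least once is 1 - (1 - p)^(2n).

```python
def detection_probability(freq, n_diploid):
    """Chance that an allele at frequency `freq` appears at least once
    among 2 * n_diploid independently sampled gene copies."""
    return 1.0 - (1.0 - freq) ** (2 * n_diploid)

# How detection of a 1% allele improves as more individuals are sampled.
for n in (5, 10, 25, 50, 100):
    print(n, round(detection_probability(0.01, n), 3))
```

The probability climbs steadily with the number of individuals, which is why larger datasets tend to report more private alleles.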
Relatedly, the other parameter that can affect this definition is the
minor allele frequency (MAF) filter (and the related minor allele count,
MAC, filter). Private alleles are often at low frequency, so if you filter
out all low-frequency alleles, you will see a commensurate drop in private
alleles.
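
A small sketch of why the filter matters, assuming biallelic loci and a MAF computed over all samples together. The allele counts and function names are invented for illustration, and this is not the populations code itself.

```python
def minor_allele_frequency(counts):
    """MAF of a biallelic locus given {allele: count}; 0.0 if monomorphic or empty."""
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0
    return min(counts.values()) / total

def filter_by_maf(allele_counts, min_maf):
    """Keep only loci whose minor allele frequency is at least min_maf."""
    return {
        locus: counts
        for locus, counts in allele_counts.items()
        if minor_allele_frequency(counts) >= min_maf
    }

# Hypothetical counts over 20 diploid samples (40 gene copies per locus).
allele_counts = {
    "loc1": {"A": 39, "G": 1},   # G is a singleton, a typical private-allele candidate
    "loc2": {"C": 30, "T": 10},
}

print(sorted(filter_by_maf(allele_counts, 0.05)))
```

The locus carrying the singleton allele falls below the threshold and is dropped, so any private allele it held disappears from the results along with it.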
Best,
julian
Giulia Trauzzi wrote on 6/8/21 5:02 AM: