Hi,
Stacks has not been specifically adapted for polyploid organisms. Doing
so would mainly mean deploying a SNP calling model that can handle
allele depths at a site in proportion to the ploidy level of the
organism being examined. Currently, Stacks' SNP calling model is written
with the expectation of a diploid organism, which means that at a
heterozygous site each allele is expected to make up 50% of the reads.
With higher levels of ploidy, you can have ratios such as 25%/75%, or
other combinations, and it can become difficult to distinguish
sequencing error from true alleles without high depth of coverage.
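
To make the depth problem concrete, here is a rough Python sketch (not
anything Stacks does internally). It lists the read fractions you would
expect at a biallelic site for a given ploidy, and uses a plain binomial
to compare how often a true 25% allele would be missed against how often
sequencing error alone would mimic one. The 1% error rate, the
two-read threshold, and the function names are assumptions made up for
illustration.

```python
from fractions import Fraction
from math import comb


def expected_allele_fractions(ploidy):
    """Expected read fractions for an allele present in 1..ploidy-1 copies."""
    return [Fraction(copies, ploidy) for copies in range(1, ploidy)]


def prob_at_least(k, depth, p):
    """P(X >= k) for X ~ Binomial(depth, p)."""
    return sum(
        comb(depth, i) * p**i * (1 - p) ** (depth - i)
        for i in range(k, depth + 1)
    )


# Diploids only ever give 1/2; tetraploids can also give 1/4 and 3/4.
for ploidy in (2, 4, 6):
    print(ploidy, [str(f) for f in expected_allele_fractions(ploidy)])

error_rate = 0.01  # assumed per-base sequencing error rate (illustrative)
min_reads = 2      # assumed: require at least 2 reads to believe an allele

for depth in (8, 40):
    # Chance that error alone produces >= min_reads reads of a wrong base.
    from_error = prob_at_least(min_reads, depth, error_rate)
    # Chance that a true allele at 25% shows up in fewer than min_reads reads.
    missed = 1 - prob_at_least(min_reads, depth, 0.25)
    print(f"depth={depth}: P(error looks real)={from_error:.4f}, "
          f"P(25% allele missed)={missed:.4f}")
```

Running it at a few depths of your own should show why low-frequency
alleles need much deeper coverage than the 50/50 diploid case.
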
However, you can use the bounded SNP model in ustacks to call SNPs at
sites where an allele's depth is below 50%. The bounded model tells
ustacks that the sequencing error rate is unlikely to exceed the bound,
so if an allele is found at less than 50% frequency but well above the
bound, it is likely a second allele and not sequencing error. You may
also consider increasing the value of the --max_locus_stacks parameter
to handle loci that contain more than one SNP (where each allele from
the collection of paralogous loci will appear as a separate stack).
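
As a starting point, a ustacks call along these lines could be assembled
as below. This is only a sketch: the option spellings follow the
underscore style used above (newer Stacks releases may spell them
differently, so check `ustacks --help` for your version), the input and
output paths are hypothetical, and the bound of 0.05 and the
--max_locus_stacks value of 4 are illustrative values, not
recommendations for any particular ploidy. The helper name
`build_ustacks_cmd` is made up for this example.

```python
import shlex


def build_ustacks_cmd(reads, out_dir, sample_id,
                      bound_high=0.05, max_locus_stacks=4):
    """Assemble a ustacks command line that uses the bounded SNP model.

    bound_high and max_locus_stacks are illustrative defaults only.
    """
    return [
        "ustacks",
        "-t", "fastq",                      # input file type
        "-f", reads,                        # input reads (hypothetical path)
        "-o", out_dir,                      # output directory
        "-i", str(sample_id),               # numeric sample ID
        "--model_type", "bounded",          # use the bounded SNP model
        "--bound_high", str(bound_high),    # upper bound on the error rate
        "--max_locus_stacks", str(max_locus_stacks),
    ]


cmd = build_ustacks_cmd("samples/sample_01.fq", "stacks_out", 1)
# Print the command for inspection; it is not executed here.
print(shlex.join(cmd))
```

Printing rather than running the command lets you check the options
against your installed version before committing to a full run.
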
It has been a while since I have looked into SNP models for polyploid
organisms, so if anyone out there has had good experience with a
particular model, please let me know.
Best,
julian