Stacks for polyploid organisms

geek_y

Apr 18, 2014, 11:47:57 AM
to stacks...@googlegroups.com
Hi,

I would like to know whether Stacks has been updated to handle polyploid organisms. If not, which parameters should I tweak to use Stacks with polyploid organisms?

Julian Catchen

Apr 19, 2014, 6:22:49 PM
to stacks...@googlegroups.com, goutha...@gmail.com
Hi,

Stacks has not specifically been altered for polyploid organisms, which
would mainly consist of deploying a SNP calling model that can handle
allele depths at a site in proportion to the ploidy level of the
organism being examined. Currently, Stacks' SNP calling model is written
with the expectation of a diploid organism, which means the expected
allele frequencies at a polymorphic site are 50% for each allele. With
higher levels of ploidy, you can have ratios such as 25%/75%, or other
combinations, and it can become difficult to distinguish sequencing
error from true alleles without high depth of coverage.
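To make the difference between the diploid and polyploid expectations concrete, here is a minimal sketch of a dosage-based genotype likelihood. It is not Stacks' model: it assumes a biallelic site, reads sampled independently (binomially) from the ploidy copies, and a symmetric per-read error rate. The names `dosage_log_likelihoods` and `best_dosage` and the default `error=0.01` are hypothetical.

```python
import math


def dosage_log_likelihoods(alt_reads, depth, ploidy, error=0.01):
    """Binomial log-likelihood of the observed alt-read count for each
    possible alt-allele dosage g = 0..ploidy (hypothetical helper)."""
    if not 0 <= alt_reads <= depth:
        raise ValueError("alt_reads must lie between 0 and depth")
    if not 0 < error < 0.5:
        raise ValueError("error must lie strictly between 0 and 0.5")
    ref_reads = depth - alt_reads
    log_liks = []
    for g in range(ploidy + 1):
        true_frac = g / ploidy
        # Expected alt fraction: correctly read alt copies plus ref reads
        # miscalled as alt (assumed symmetric error between the two alleles)
        p_alt = true_frac * (1 - error) + (1 - true_frac) * error
        # The binomial coefficient is omitted: it is the same for every dosage
        log_liks.append(alt_reads * math.log(p_alt)
                        + ref_reads * math.log(1 - p_alt))
    return log_liks


def best_dosage(alt_reads, depth, ploidy, error=0.01):
    """Return the alt-allele dosage with the highest likelihood."""
    lls = dosage_log_likelihoods(alt_reads, depth, ploidy, error)
    return max(range(len(lls)), key=lambda g: lls[g])
```

For 8 alternative reads out of 30, `best_dosage(8, 30, ploidy=4)` picks dosage 1 (expected alt fraction about 25%), while `best_dosage(15, 30, ploidy=4)` picks dosage 2. At low depth (for example 1 alt read out of 4) the log-likelihoods of neighbouring dosages sit much closer together, which is the coverage problem described above.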

However, you can use the bounded SNP model in ustacks to call SNPs at
sites where the minor allele is present at less than 50% depth. The
bounded model tells ustacks that sequencing error is unlikely to exceed
the bound, so if an allele is found at less than 50% frequency (but above
the bound), it is likely a second allele and not sequencing error. You
may also consider increasing the value of the --max_locus_stacks
parameter to handle loci with more than one SNP (where each allele from
the collection of paralogous loci will appear as a separate stack).

It has been a while since I have looked into SNP models for polyploid
organisms, so let me know if anyone out there has had good experience
with a particular model.

Best,

julian

geek_y

Apr 19, 2014, 11:51:27 PM
to stacks...@googlegroups.com
Thank you, Julian, for your prompt response. So if we have a coverage of 30, i.e. a minimum cluster size of 30, would that be enough to call SNPs in polyploid organisms with good confidence?

Julian Catchen

Apr 20, 2014, 1:03:34 PM
to stacks...@googlegroups.com
I can't say what the proper coverage is going to be. It will be specific to your organism and its level of ploidy, among other things. You will have to work with the data and see how it responds to different parameter values. I would not pass a minimum coverage of 30 to ustacks; this is much too high, and you will artificially drop lots of alleles from your dataset. I would not set this above 5 or 10, and would instead allow the rest of the pipeline to deal with the potential variation in coverage.
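A rough way to see why a high minimum depth drops alleles is to ask how often a single-copy allele collects enough reads on its own. The sketch below assumes reads are split binomially across the ploidy copies (ignoring sequencing error, PCR duplicates, and allele dropout) and that the minimum depth applies to each allele's stack separately; `prob_allele_retained` is a hypothetical helper, not part of Stacks.

```python
import math


def prob_allele_retained(locus_depth, ploidy, min_depth):
    """P(X >= min_depth) for X ~ Binomial(locus_depth, 1/ploidy): the chance
    that one single-copy allele gathers enough reads to pass a minimum
    per-stack depth (hypothetical helper, simplified assumptions)."""
    p = 1.0 / ploidy
    # Sum the probability of falling short (fewer than min_depth reads)
    below = sum(math.comb(locus_depth, k) * p**k * (1 - p)**(locus_depth - k)
                for k in range(min(min_depth, locus_depth + 1)))
    return 1.0 - below
```

At a locus depth of 40 in a tetraploid, a single-copy allele expects about 10 reads, so a minimum of 5 keeps it in roughly 98% of cases, while a minimum of 30 almost never does.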

julian

mariogiov

Oct 28, 2014, 4:05:24 AM
to stacks...@googlegroups.com, goutha...@gmail.com, jcat...@uoregon.edu
Hi Julian,

I've also been trying to figure out a good way to analyze polyploids using RADSeq, and it looks like it has been done. I'm afraid my statistical background is not very strong, so I'm not the one to try hacking this part of Stacks. However, I did see some interesting things that could help in implementing a new model in Stacks.

In this paper they propose a model for polyploid genotyping:


This abstract makes specific reference to using a "modified version of Stacks" for polyploid genotyping; I wonder whether you were aware of, or involved in, this:


Freebayes uses a model that can accommodate various ploidies:


I don't know if any of this is useful, and I suppose you may be too busy to rework this right now, but it would be a great feature to have in Stacks.



Best,

Mario