Hi Amir,
Sorry for the long delay; some answers are below.
On 1/29/13 10:39 AM, Amir wrote:
> My first question mainly is how we can tell /Stacks/ that we have a polyploid
> organism.
Unfortunately, you can't. Stacks is designed for diploid organisms and will also
work on haploid organisms. It can be used for polyploid organisms, but it is not
designed for this. For example, the SNP calling model is based on a diploid
individual, and it looks for alleles to be in a 50/50 ratio. In a polyploid
individual these ratios can differ quite a bit from 50/50 depending on how
duplicated a locus is, and it eventually becomes difficult to tell certain
alleles from sequencing error. You can use the bounded SNP calling model, which
can cause ustacks to call alleles at lower frequency and may help you capture
additional mappable loci.
We have long-term plans for supporting various levels of ploidy, but not in the
near term.
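To illustrate why the 50/50 expectation breaks down, here is a minimal Python
sketch. It is not part of Stacks; the function name is made up for
illustration, and it assumes an idealized locus where every chromosome copy is
sequenced equally often. It lists the read fraction you would expect for the
less common allele at each possible dosage.

```python
from fractions import Fraction

def expected_allele_fractions(ploidy):
    """Expected read fraction of the minor allele at a heterozygous locus,
    one entry per possible dosage (1 copy, 2 copies, ... up to half the ploidy).
    Idealized: ignores sequencing error and amplification bias."""
    return [Fraction(copies, ploidy) for copies in range(1, ploidy // 2 + 1)]

for ploidy in (2, 4, 6):
    fractions = expected_allele_fractions(ploidy)
    print(ploidy, [f"{float(f):.2f}" for f in fractions])
```

A diploid only ever gives 0.50, while a tetraploid can also give 0.25 and a
hexaploid 0.17, which moves real alleles closer to the frequencies at which
sequencing errors appear.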
> According to previous threads, I concluded that in case of having
> polyploid organism we need to change "--max_locus_stacks" in ustacks and
> increment it by 1 to allow unpredictable errors. Is it a valid conclusion? If
> so, since it is /not/ one of the parameters in 'denovo_map.pl' script, does it
> mean in such cases we need to run the whole process by hand? / Is
> 'denovo_map.pl' designed just for diploid organisms?
>
>
You can adjust the --max_locus_stacks parameter to allow more alleles to stack
up at a locus, depending on your ploidy level. You will need to run ustacks by
hand to add this parameter, or modify denovo_map.pl to insert this parameter
for you.
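If you go the by-hand route, a small wrapper can assemble the ustacks command
for each sample. The following Python sketch only builds and prints the
command; it does not run anything. Several things here are assumptions rather
than recommendations: the "ploidy + 1" rule is simply the reading of the
earlier threads quoted above, the file names and the function name are
hypothetical, and the -t, -f, -o and -i options should be checked against the
help output of your installed ustacks version before use.

```python
import shlex

def build_ustacks_command(sample_file, out_dir, sample_id, ploidy, min_depth=3):
    """Assemble a ustacks command line as a list of arguments (nothing is run)."""
    # Assumption taken from the thread: one stack per expected allele, plus 1
    # to leave room for error stacks. Tune this for your own data.
    max_locus_stacks = ploidy + 1
    return [
        "ustacks",
        "-t", "fastq",            # input file type (verify for your version)
        "-f", sample_file,        # input reads for one sample
        "-o", out_dir,            # output directory
        "-i", str(sample_id),     # numeric sample ID
        "-m", str(min_depth),     # minimum depth to form a stack
        "--max_locus_stacks", str(max_locus_stacks),
    ]

# Hypothetical tetraploid sample.
cmd = build_ustacks_command("samples/parent1.fq", "stacks_out", 1, ploidy=4)
print(shlex.join(cmd))
```

You would then run cstacks, sstacks and genotypes yourself on the ustacks
output, as denovo_map.pl would otherwise have done.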
> Second question is about markers in such organisms. As we know there are 10
> classes of mappable markers in Stacks, but in polyploid organisms isn't it
> possible to have more than that? For example, there are some loci in which one
> of the parents show more than two different haplotypes and since we do not have
> a marker like "abcd/ab" in those ten classes, Stacks does not report it in
> 'batch_X.markers.tsv' file and ignores it as a marker. Could not that be a
> marker in a polyploid organism? If it could, how can we tell Stacks to consider
> it and report it?
Again, since Stacks is designed for diploid organisms, it assumes that there
will not be more than two alleles at a locus, so if it finds more than two, it
considers it an error. My question to you would be: what linkage mapping
software will you use to handle these types of loci? I do not know of any
linkage mappers that explicitly handle polyploid markers but perhaps you do.
Stacks does export the raw haplotypes. It is fairly trivial to write a script,
or even just use find/replace in a program like Excel, to replace any
particular haplotype with the marker you want. This lets you handle any type of
haplotype, provided you know how it will be mapped in a linkage mapper.
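As a rough idea of what such a script could look like, here is a minimal
Python sketch. It assumes you have already pulled each parent's haplotypes for
a locus out of the exported file as a '/'-separated string; that layout, the
function name and the lettering scheme are all illustrative assumptions, not
Stacks output conventions. It labels haplotypes a, b, c, ... in order of first
appearance and builds a marker string from them.

```python
def marker_from_haplotypes(parent1, parent2):
    """Turn two '/'-separated haplotype strings into a marker label
    such as 'abcd/ab'. Letters are assigned in order of first appearance."""
    letters = {}

    def label(haplotypes):
        codes = []
        for hap in haplotypes.split("/"):
            if hap not in letters:
                # Next unused letter; supports up to 26 distinct haplotypes.
                letters[hap] = "abcdefghijklmnopqrstuvwxyz"[len(letters)]
            codes.append(letters[hap])
        return "".join(sorted(codes))

    return f"{label(parent1)}/{label(parent2)}"

# One parent with four haplotypes, the other sharing two of them.
print(marker_from_haplotypes("AC/AG/TC/TG", "AC/AG"))  # abcd/ab
```

The same mapping can then be applied to each progeny's haplotypes, using
whatever coding your linkage mapper expects.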
>
> I also have a general question. The number of reads obtained by summing up each
> Allele's depth for each sample is always less than the number of reads fed into
> 'denovo_map.pl' program at the beginning. Those reads that are not used are
> only removed by '-t' option of denovo_map.pl program that eliminates
> repetitive sequences or there are some other places that reads may be
> rejected?
>
Reads are removed chiefly because they don't initially form stacks (this is
controlled by the -m parameter). These reads are set aside until a later stage
when Stacks tries to align them back against assembled loci to increase locus
depth. If they don't align at this stage then they are discarded.
ustacks will output, and denovo_map.pl will record in denovo_map.log, the exact
number of used and unused reads.
Best,
julian