Hi Crispin,
Zoo Keeper wrote:
> I have a question very similar to a previous post
> (https://groups.google.com/forum/#!topic/stacks-users/hJr5g8gq-6c).
>
> Specifically, I'd like to better understand what prune_haplo does.
>
> In a reply to that previous post, Julian said, "As for the --prune_haplo
> filter, there are two algorithms to prune excess haplotypes from a
> locus. The first simply looks at haplotype frequencies at a particular
> locus across the population and tries to identify the haplotypes that
> occur least often (weighted by the read depth of each haplotype) and for
> each individual, keeps the two most frequent haplotypes. However, there
> are often ties when trying to decide which haplotypes to remove."
>
> With respect to the portion, "...and for each individual, keeps the two
> most frequent haplotypes", does this mean that prune_haplo assumes that,
> over the entire population, there should be only 2 haplotypes? (As a
> side-question, I'm unsure whether, in this context, a haplotype is
> equivalent to an allele?)
>
The --prune_haplo option to the rxstacks program tries to prune out
excess haplotypes in individuals by looking at the population-level
frequencies. In the algorithm you cite, it keeps the two most
frequent haplotypes in each individual, not two across the whole
population.
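
To make the idea concrete, here is a rough Python sketch of that first algorithm as described above. It is not the actual rxstacks code: the function name, the input layout (sample -> haplotype -> read depth at one locus), the example depths, and the alphabetical tie-break are all assumptions made for illustration.

```python
from collections import Counter


def prune_haplotypes(individuals, keep=2):
    """Keep the `keep` haplotypes per individual that are most common
    across the population, weighting each haplotype by read depth.

    `individuals` maps sample name -> {haplotype: read depth} for ONE locus.
    This layout is hypothetical, not the Stacks file format.
    """
    # Population-level weight of each haplotype (sum of read depths).
    pop_depth = Counter()
    for haps in individuals.values():
        for hap, depth in haps.items():
            pop_depth[hap] += depth

    pruned = {}
    for sample, haps in individuals.items():
        # Rank this individual's haplotypes by population-level weight.
        # Ties are broken alphabetically here; that is an assumption of
        # this sketch only, real ties need a more careful rule.
        ranked = sorted(haps, key=lambda h: (-pop_depth[h], h))
        pruned[sample] = {h: haps[h] for h in ranked[:keep]}
    return pruned


# Made-up depths for one locus with two SNPs.
locus = {
    "sample_1": {"AC": 20, "GT": 18, "AT": 2},
    "sample_2": {"AC": 25, "GC": 22},
    "sample_3": {"GT": 19, "GC": 17, "GA": 1},
}
pruned = prune_haplotypes(locus)
for sample, haps in pruned.items():
    print(sample, sorted(haps))
```

In this toy data the low-depth haplotypes AT and GA are dropped, each sample ends up with two haplotypes, and three distinct haplotypes (AC, GC, GT) remain across the population.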

In this case, a haplotype spans the full length of the RAD locus.
Typically that is the length of the sequencing read, say 100bp. If there
is one SNP in the RAD locus, then the haplotype is equivalent to the
allele at that single nucleotide polymorphism. But if there are multiple
SNPs in the locus, then you can get various combinations of the SNP
alleles, giving multiple haplotypes across the population (but still
only two haplotypes per individual).
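
A small Python sketch may help show how SNP combinations become haplotypes. The 10bp sequences, sample names, and helper functions below are invented for illustration (a real RAD locus would be closer to 100bp), and this is not how Stacks stores its data.

```python
def variable_sites(sequences):
    """Return the positions where more than one nucleotide is observed."""
    return [i for i, column in enumerate(zip(*sequences)) if len(set(column)) > 1]


def haplotype(sequence, sites):
    """Reduce a full-length locus sequence to its nucleotides at the SNP sites."""
    return "".join(sequence[i] for i in sites)


# Hypothetical diploid individuals: two copies of the same locus each.
individuals = {
    "sample_1": ("ACGTACGTAC", "ACATACGCAC"),
    "sample_2": ("ACGTACGCAC", "ACATACGTAC"),
    "sample_3": ("ACGTACGTAC", "ACGTACGTAC"),  # homozygous
}

all_sequences = [seq for pair in individuals.values() for seq in pair]
snp_sites = variable_sites(all_sequences)

# Haplotypes seen in each individual, and across the whole population.
per_individual = {
    sample: {haplotype(seq, snp_sites) for seq in pair}
    for sample, pair in individuals.items()
}
population = set().union(*per_individual.values())

print("SNP positions:", snp_sites)
print("Per individual:", {s: sorted(h) for s, h in per_individual.items()})
print("Population:", sorted(population))
```

With two SNPs in the locus, this toy population carries four haplotypes (AC, AT, GC, GT), yet no individual has more than two.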
Best,
julian