prune_haplo - please clarify what it is doing?

Zoo Keeper

Jun 12, 2015, 9:24:36 AM
to stacks...@googlegroups.com
Hello,

I have a question very similar to a previous post (https://groups.google.com/forum/#!topic/stacks-users/hJr5g8gq-6c).

Specifically, I'd like to better understand what prune_haplo does.

In a reply to that previous post, Julian said, "As for the --prune_haplo filter, there are two algorithms to prune excess haplotypes from a locus. The first simply looks at haplotype frequencies at a particular locus across the population and tries to identify the haplotypes that occur least often (weighted by the read depth of each haplotype) and for each individual, keeps the two most frequent haplotypes. However, there are often ties when trying to decide which haplotypes to remove."

With respect to the portion, "...and for each individual, keeps the two most frequent haplotypes", does this mean that prune_haplo assumes that, over the entire population, there should be only 2 haplotypes?  (As a side-question, I'm unsure whether, in this context, a haplotype is equivalent to an allele?)

I ask because I'm using ddRAD to look at introgression between 2 species.  Given enough divergence between the species, it is possible (but not common) to have >2 alleles at a particular base pair. More generally, we expect a fair amount of diversity (say, >5% for our species) across each RAD locus (117 bp, in our case).

It would be useful to know how prune_haplo would affect our data, to understand whether it is appropriate to use it.

Thank you for your help!

Crispin

Julian Catchen

Jun 16, 2015, 7:13:06 PM
to stacks...@googlegroups.com, zoo.keep...@gmail.com
Hi Crispin,

Zoo Keeper wrote:
> I have a question very similar to a previous post
> (https://groups.google.com/forum/#!topic/stacks-users/hJr5g8gq-6c).
>
> Specifically, I'd like to better understand what prune_haplo does.
>
> In a reply to that previous post, Julian said, "As for the --prune_haplo
> filter, there are two algorithms to prune excess haplotypes from a
> locus. The first simply looks at haplotype frequencies at a particular
> locus across the population and tries to identify the haplotypes that
> occur least often (weighted by the read depth of each haplotype) and for
> each individual, keeps the two most frequent haplotypes. However, there
> are often ties when trying to decide which haplotypes to remove."
>
> With respect to the portion, "...and for each individual, keeps the two
> most frequent haplotypes", does this mean that prune_haplo assumes that,
> over the entire population, there should be only 2 haplotypes? (As a
> side-question, I'm unsure whether, in this context, a haplotype is
> equivalent to an allele?)
>

The --prune_haplo option to the rxstacks program tries to prune out
excess haplotypes in individuals by looking at the population-level
frequencies. In the algorithm you cite, it keeps the two most
frequent haplotypes in each individual; the limit of two applies per
individual, not to the population as a whole.
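
As a rough illustration of that idea (this is not the rxstacks code, just a minimal Python sketch under stated assumptions), the pruning at one locus could look like the following. The function name prune_haplotypes, the input layout (sample -> haplotype -> read depth), and the tie-breaking rule are all hypothetical choices made for the example; how rxstacks actually resolves ties is not shown here.

```python
from collections import Counter


def prune_haplotypes(locus, keep=2):
    """Keep at most `keep` haplotypes per individual at one locus.

    `locus` maps sample name -> {haplotype string: read depth}.
    This layout is an assumption for the sketch, not a Stacks file format.
    """
    # Population-level weight of each haplotype: read depth summed over
    # all individuals that carry it.
    pop_weight = Counter()
    for haps in locus.values():
        pop_weight.update(haps)

    pruned = {}
    for sample, haps in locus.items():
        # Rank this individual's haplotypes by population weight.
        # Tie-breaking by the individual's own depth, then alphabetically,
        # is an assumption made only to keep the sketch deterministic.
        ranked = sorted(haps, key=lambda h: (-pop_weight[h], -haps[h], h))
        pruned[sample] = {h: haps[h] for h in ranked[:keep]}
    return pruned


# Made-up read depths for three individuals at a single locus.
locus = {
    "ind1": {"AC": 20, "GT": 18, "AT": 2},  # three haplotypes: one is excess
    "ind2": {"AC": 25, "GC": 15},
    "ind3": {"GT": 30},
}

pruned = prune_haplotypes(locus)
remaining = {h for haps in pruned.values() for h in haps}
print(pruned)
print(sorted(remaining))
```

In this toy data, ind1 loses its low-depth third haplotype, while three distinct haplotypes still remain across the population after pruning.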

In this case, a haplotype spans the length of the RAD locus. Typically
that will be the length of the sequencing read, say 100 bp. If there is
one SNP in the RAD locus, then the haplotype is equivalent to that single
nucleotide polymorphism. But if there are multiple SNPs in the
haplotype, then you can get various combinations of the SNPs, giving
multiple haplotypes across the population (but still only two haplotypes
per individual).
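
To make the counting concrete, here is a small Python sketch with made-up data. It assumes diploid individuals and a locus with two biallelic SNPs; the sample names and alleles are purely illustrative.

```python
from itertools import product

# Two SNPs in one RAD locus, each with two alleles (hypothetical values).
snp_alleles = [("A", "G"), ("C", "T")]

# Every combination of alleles across the SNPs is a possible haplotype.
possible = ["".join(combo) for combo in product(*snp_alleles)]

# Each diploid individual carries two haplotypes (identical if homozygous).
individuals = {
    "ind1": ("AC", "GT"),
    "ind2": ("AT", "AT"),
    "ind3": ("GC", "AC"),
}

# Distinct haplotypes seen across the whole population.
observed = {h for pair in individuals.values() for h in pair}

# Largest number of distinct haplotypes carried by any one individual.
max_per_individual = max(len(set(pair)) for pair in individuals.values())

print(possible)
print(sorted(observed), max_per_individual)
```

Here the population holds four distinct haplotypes, yet no individual carries more than two of them.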

Best,

julian