Hi Kelly,
Almost all Fst calculations are susceptible to negative values in
certain instances. However, our original Fst implementation is more
susceptible than other methods, typically when the sample sizes differ
greatly. For this reason, we implemented the AMOVA Fst, and our
smoothing/bootstrapping algorithms now rely on this method, which is
much less likely to give negative values. In other words, we consider
the AMOVA Fst to be the best measure to use and have kept the previous
implementation only for historical purposes.
Just for the record, we do not calculate SNP-based Fst values for loci
that have more than two alleles present (in fact, we do not calculate any
SNP-based summary statistics for these loci; they are filtered out).
The p-value, odds ratio, and confidence limits are calculated from a 2x2
contingency table of allele counts; they are not calculated from the Fst
value:
     | allele1 | allele2
-----+---------+--------
pop1 | Cnt1    | Cnt2
pop2 | Cnt3    | Cnt4
We use Fisher's exact test with the null hypothesis that the allele
frequencies in the two populations being compared are the same. A small
p-value indicates that the allele frequencies differ, and the odds
ratio (i.e., the effect size) gives an indication of how different they
are, with the confidence limits around that measure.
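
To make the test above concrete, here is a minimal Python sketch of a
two-sided Fisher's exact test on the 2x2 table of allele counts. It is
not the Stacks code: the function names (fisher_exact_2x2,
odds_ratio_ci) are made up for illustration, and the Woolf (log-based)
interval is only one common way to put confidence limits on an odds
ratio, assumed here for the example.

```python
from math import comb, exp, log, sqrt


def fisher_exact_2x2(cnt1, cnt2, cnt3, cnt4):
    """Two-sided Fisher's exact test for the table
    [[cnt1, cnt2], [cnt3, cnt4]] (rows: pop1, pop2; cols: allele1, allele2).
    Hypothetical helper for illustration, not the Stacks implementation."""
    row1 = cnt1 + cnt2          # alleles sampled in pop1
    row2 = cnt3 + cnt4          # alleles sampled in pop2
    col1 = cnt1 + cnt3          # total copies of allele1
    denom = comb(row1 + row2, col1)

    def prob(a):
        # Hypergeometric probability of seeing `a` copies of allele1
        # in pop1 when all row and column totals are held fixed.
        return comb(row1, a) * comb(row2, col1 - a) / denom

    p_obs = prob(cnt1)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the
    # observed one (small tolerance guards against float rounding).
    p = sum(prob(a) for a in range(lo, hi + 1)
            if prob(a) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def odds_ratio_ci(cnt1, cnt2, cnt3, cnt4, z=1.96):
    """Sample odds ratio with an approximate 95% Woolf interval.
    Assumption: the interval method is chosen for this example only.
    Returns (odds_ratio, None) when any cell is zero."""
    if 0 in (cnt1, cnt2, cnt3, cnt4):
        ratio = float("inf") if cnt2 * cnt3 == 0 else 0.0
        return ratio, None
    ratio = (cnt1 * cnt4) / (cnt2 * cnt3)
    se = sqrt(1 / cnt1 + 1 / cnt2 + 1 / cnt3 + 1 / cnt4)
    return ratio, (exp(log(ratio) - z * se), exp(log(ratio) + z * se))


if __name__ == "__main__":
    # Made-up allele counts: pop1 has 3 and 1, pop2 has 1 and 3.
    print(fisher_exact_2x2(3, 1, 1, 3))
    print(odds_ratio_ci(3, 1, 1, 3))
```

With counts this small the interval is very wide, which is why the
p-value and the confidence limits are best read together.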
If you want a p-value specifically for your Fst measure, you should use
the bootstrapping feature in the populations program.
You might also consider using haplotype measures of Fst (Phi_st, Fst')
which have recently been implemented in Stacks. These calculations
consider each RAD locus as a haplotype and each haplotype may have one
or more SNPs in it. We have had good experience with these measures in
our most recent work.
Best,
julian