Dear Stacks users,
I have recently been comparing various software packages for calculating FIS across loci in a population. I wanted to flag up that the result is quite different depending on the software package used, and also query the way that Stacks performs this calculation.
I can't write the following in proper mathematical notation, but hopefully it is clear:
Stacks:
FIS = average(1-(Ho/Hs)) across loci. If Hs=0 (common if the subpopulation is small and the alternate allele is not observed), then it evaluates Ho/Hs = 1 and hence 1-(Ho/Hs) = 0. So the mean FIS is decreased by including many 0's in the average.
hierfstat in R:
calculates (1-(Ho/Hs)) for each locus as above, but if Hs=0, it returns Ho/Hs=NaN and 1-(Ho/Hs) = NaN. Average across loci is not calculated natively but users are recommended to take FIS = average(1-(Ho/Hs)), excluding loci where the per-locus FIS is NaN. So the mean FIS is typically (considerably) higher than when Stacks performs this calculation.
GenoDive:
[I was using GenoDive because of its ability to handle polyploid data, but it turns out that even for diploids, FIS isn't comparable to the above two packages]
FIS across loci = 1-(average(Ho)/average(Hs)).
These calculation details are not at all clear from the respective software manuals, it took me some time to figure out exactly what they were all doing. So I wanted to post this partly for general interest (it's been asked at least once before on this group), and also to query with Julian why you think treating Ho/Hs = 1, where Hs=0, is the best approach?
Best regards, Harriet