FIS across loci - Stacks populations module vs. other software

264 views
Skip to first unread message

Harriet Hunt

unread,
Mar 11, 2024, 6:05:27 AM3/11/24
to Stacks
Dear Stacks users,

I have recently been comparing various software packages for calculating FIS across loci in a population. I wanted to flag up that the result is quite different depending on the software package used, and also query the way that Stacks performs this calculation.

I can't write the following in proper mathematical notation, but hopefully it is clear:

Stacks: 
FIS = average(1-(Ho/Hs)) across loci. If Hs=0 (common if the subpopulation is small and the alternate allele is not observed), then it evaluates Ho/Hs = 1 and hence 1-(Ho/Hs) = 0. So the mean FIS is decreased by including many 0's in the average.

hierfstat in R: 
calculates (1-(Ho/Hs)) for each locus as above, but if Hs=0, it returns Ho/Hs=NaN and 1-(Ho/Hs) = NaN. Average across loci is not calculated natively but users are recommended to take FIS = average(1-(Ho/Hs)), excluding loci where the per-locus FIS is NaN. So the mean FIS is typically (considerably) higher than when Stacks performs this calculation.

GenoDive:
[I was using GenoDive because of its ability to handle polyploid data, but it turns out that even for diploids, FIS isn't comparable to the above two packages]
FIS across loci = 1-(average(Ho)/average(Hs)).

These calculation details are not at all clear from the respective software manuals, it took me some time to figure out exactly what they were all doing. So I wanted to post this partly  for general interest (it's been asked at least once before on this group), and also to query with Julian why you think treating Ho/Hs = 1, where Hs=0, is the best approach?
 
Best regards, Harriet
Reply all
Reply to author
Forward
0 new messages