FIS across loci - Stacks populations module vs. other software

265 views

Skip to first unread message

Harriet Hunt

unread,

Mar 11, 2024, 6:05:27 AM3/11/24

to Stacks

Dear Stacks users,

I have recently been comparing various software packages for calculating FIS across loci in a population. I wanted to flag up that the result is quite different depending on the software package used, and also query the way that Stacks performs this calculation.

I can't write the following in proper mathematical notation, but hopefully it is clear:

Stacks:

FIS = average(1-(Ho/Hs)) across loci. If Hs=0 (common if the subpopulation is small and the alternate allele is not observed), then it evaluates Ho/Hs = 1 and hence 1-(Ho/Hs) = 0. So the mean FIS is decreased by including many 0's in the average.

hierfstat in R:

calculates (1-(Ho/Hs)) for each locus as above, but if Hs=0, it returns Ho/Hs=NaN and 1-(Ho/Hs) = NaN. Average across loci is not calculated natively but users are recommended to take FIS = average(1-(Ho/Hs)), excluding loci where the per-locus FIS is NaN. So the mean FIS is typically (considerably) higher than when Stacks performs this calculation.

GenoDive:

[I was using GenoDive because of its ability to handle polyploid data, but it turns out that even for diploids, FIS isn't comparable to the above two packages]

FIS across loci = 1-(average(Ho)/average(Hs)).

These calculation details are not at all clear from the respective software manuals, it took me some time to figure out exactly what they were all doing. So I wanted to post this partly for general interest (it's been asked at least once before on this group), and also to query with Julian why you think treating Ho/Hs = 1, where Hs=0, is the best approach?