Hello,
I am calculating diversity measures from a VCF table with over 60K SNPS. This data came from WGS methods and does have a large amount of missing data compared to RADseq.
Originally I used Hierfstat to calculate Ho, He, and FIS, among other stats. I then came across the DaRt package and used the the gl.report.heterozygosity function. The calculations for FIS and Ho are the same, but “He” is lower for each sample. Do you know why this would be and if the methods differ at all in their estimation of He? I can see if missing data is handled differently between the two calculations it could explain the differences.
If you have any insight into why the two methods calculate different results I would appreciate it. Thanks, Sean
Here is a simplified code and results I used.
#hierfstat
BS2 <- gl.basic.stats(Genlight, digits = 4)
Ho <- apply(BS2$Ho, MARGIN = 2, FUN = mean, na.rm = TRUE) %>%
round(digits = 2)
He <- apply(BS2$Hs, MARGIN = 2, FUN = mean, na.rm = TRUE) %>%
round(digits = 2)
Fis <- apply(BS2$Fis, MARGIN = 2, FUN = mean, na.rm = TRUE) %>%
round(digits = 2)
#gl.report.heterozygosity:
data <- gl.report.heterozygosity(
Genlight,
method = "pop")
Results:
Hobs He_Hierfstat He_Gl_report
0.31 0.3 0.27
0.27 0.27 0.24
0.26 0.27 0.24
0.29 0.26 0.24
0.26 0.25 0.23
0.29 0.25 0.23
0.27 0.23 0.18
0.25 0.22 0.18
0.26 0.25 0.23
0.26 0.27 0.24
0.25 0.27 0.22
0.26 0.26 0.22
0.27 0.24 0.23
0.3 0.28 0.24
0.27 0.2 0.19