Differing measures of Heterozygosity

Sean Streich

May 26, 2022, 3:10:23 PM

to dartR

Hello,

I am calculating diversity measures from a VCF file with over 60K SNPs. The data came from whole-genome sequencing (WGS) and have a large amount of missing data compared to RADseq.

Originally I used hierfstat to calculate Ho, He, and FIS, among other stats. I then came across the dartR package and used the gl.report.heterozygosity function. The values for FIS and Ho are the same, but "He" is lower for each population. Do you know why this would be, and whether the two methods differ at all in how they estimate He? If missing data is handled differently between the two calculations, I can see how that could explain the differences.

If you have any insight into why the two methods calculate different results I would appreciate it. Thanks, Sean

Here is the simplified code I used, followed by the results.

# hierfstat (via gl.basic.stats)
library(dartR)
library(dplyr)  # provides %>%

BS2 <- gl.basic.stats(Genlight, digits = 4)

# Average the per-locus values within each population (columns)
Ho  <- apply(BS2$Ho,  MARGIN = 2, FUN = mean, na.rm = TRUE) %>% round(digits = 2)
He  <- apply(BS2$Hs,  MARGIN = 2, FUN = mean, na.rm = TRUE) %>% round(digits = 2)
Fis <- apply(BS2$Fis, MARGIN = 2, FUN = mean, na.rm = TRUE) %>% round(digits = 2)

# gl.report.heterozygosity
data <- gl.report.heterozygosity(Genlight, method = "pop")

Results:

Hobs    He_Hierfstat    He_Gl_report
0.31    0.3             0.27
0.27    0.27            0.24
0.26    0.27            0.24
0.29    0.26            0.24
0.26    0.25            0.23
0.29    0.25            0.23
0.27    0.23            0.18
0.25    0.22            0.18
0.26    0.25            0.23
0.26    0.27            0.24
0.25    0.27            0.22
0.26    0.26            0.22
0.27    0.24            0.23
0.3     0.28            0.24
0.27    0.2             0.19

Jose Luis Mijangos

May 27, 2022, 1:53:34 AM

to dartR

Hi,

I think the differences come from how missing data and sample size are treated.

In the latest version of dartR, the function gl.report.heterozygosity() was updated to include a correction for sample size, which also accounts for missing data (i.e. unbiased heterozygosity).
So the statistic you should look at in the output of the function is "uHe".
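
As a rough illustration of what a sample-size correction does when genotypes are missing, here is a minimal sketch in base R. It is not the actual dartR or hierfstat code; it assumes biallelic SNPs coded as 0/1/2 with NA for missing, and applies Nei's (1978) correction, uHe = He * 2n / (2n - 1), where n is the number of individuals genotyped at that locus. The names geno and het_per_locus are made up for this example.

```r
# Toy genotype matrix (hypothetical data): rows = individuals, columns = loci.
# Genotypes are counts of the alternate allele (0, 1, 2); NA = missing call.
geno <- matrix(
  c(0,  1,  2, NA,
    1,  1, NA, NA,
    2,  0,  1,  0,
    1, NA,  1,  2),
  nrow = 4, byrow = TRUE
)

# Per-locus heterozygosity, using only the non-missing genotypes.
het_per_locus <- function(g) {
  g   <- g[!is.na(g)]
  n   <- length(g)                    # individuals actually genotyped at this locus
  p   <- sum(g) / (2 * n)             # alternate allele frequency
  he  <- 2 * p * (1 - p)              # expected heterozygosity, no correction
  uhe <- he * (2 * n) / (2 * n - 1)   # small-sample correction (Nei 1978)
  c(n = n, Ho = mean(g == 1), He = he, uHe = uhe)
}

# One row per locus, one column per statistic.
res <- t(apply(geno, 2, het_per_locus))
print(round(res, 3))
```

The correction factor 2n / (2n - 1) grows as n shrinks, so loci with more missing calls are adjusted upwards more strongly. With a lot of missing data, an uncorrected He and a corrected one can therefore differ noticeably even when Ho is identical.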

To install the latest version of dartR:

> devtools::install_github("green-striped-gecko/dartR")

You could also try removing all the missing data and testing all the functions again to see if you get the same results, for example:

test <- gl.filter.callrate(your_gl, threshold = 1)

Cheers,
Luis

Sean Streich

May 27, 2022, 7:15:19 PM

to dartR

Thanks for the response.

Unfortunately, with low-coverage whole-genome sequencing I do not have any samples or loci without missing data. I will keep both sets of results and talk with some other population geneticists as I progress with my research. The results from dartR are closer to what I expected to see in my populations, so if gl.report.heterozygosity handles missing data differently, I can see why its He estimates would be lower than those from hierfstat.

Thanks,
Sean

David Tork

Jan 27, 2025, 3:54:01 PM

to dartR

Hello Sean,

I was wondering if you gained any insights into the differences observed for He/Hs between dartR and hierfstat. My situation is almost identical to yours, except that my FIS estimates also differ very slightly between the two packages (by 0.001 to 0.008).