Hi there,
I am interested in estimating expected heterozygosity for each variant site of my dataset.
Since I am also interested in the F coefficient, I tested two options in plink2: --hardy and --freq.
I noticed that the expected heterozygosity, computed as 2*p*(1-p), is different between the two approaches.
I dug a bit further and figured out that the difference between the two may be due to --hardy ignoring individuals with missing data at a given variant site, so that it computes heterozygosity based only on the individuals that are called at that site.
I concluded that --freq does not do the same.
Did I understand correctly?
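To make the question concrete, here is a minimal Python sketch of the calculation as I picture it. It is not plink2 code and makes no claim about what either option does internally. The function name `expected_het`, the toy genotype list, and the "all samples in the denominator" variant are assumptions made up for illustration. It only shows that 2*p*(1-p) changes depending on whether individuals with missing calls are excluded from the allele count.

```python
from typing import Optional, Sequence


def expected_het(genotypes: Sequence[Optional[int]]) -> float:
    """Expected heterozygosity 2*p*(1-p) from called genotypes only.

    Each genotype is the alternate-allele count (0, 1 or 2),
    or None when the call is missing at this site.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this site")
    # Allele frequency over the 2 * n_called observed alleles.
    p = sum(called) / (2 * len(called))
    return 2 * p * (1 - p)


# Toy site: 8 individuals, 2 of them missing (hypothetical data).
genotypes = [0, 1, 2, None, 1, None, 0, 0]

# Missing individuals excluded from the denominator.
called_only = expected_het(genotypes)

# Hypothetical contrast: all 8 individuals kept in the denominator,
# which implicitly treats missing calls as carrying no alternate allele.
p_all = sum(g for g in genotypes if g is not None) / (2 * len(genotypes))
all_samples = 2 * p_all * (1 - p_all)

print(f"called only: {called_only:.4f}")
print(f"all samples: {all_samples:.4f}")
```

With this toy site the two denominators give different values, which is the kind of discrepancy I am seeing between the two outputs.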
Thank you,
Gabriele