Expected heterozygosity: --hardy vs --freq

Gabriele Sgarlata

Oct 7, 2025, 5:32:04 AM
to plink2-users
Hi there,

I am interested in estimating expected heterozygosity for each variant site of my dataset.
Since I am also interested in the F coefficient, I tested two plink2 options: --hardy and --freq.

I noticed that the expected heterozygosity, computed as 2*p*(1-p), is different between the two approaches. 

I dug a bit further and figured out that the differences between the two are perhaps due to the fact that --hardy ignores individuals with missing data at a given variant site. Thus, it computes heterozygosity based only on the individuals that are called at that site.
I concluded that --freq does not do the same.

Did I understand correctly?

Thank you,
Gabriele

Chris Chang

Oct 7, 2025, 9:09:57 AM
to Gabriele Sgarlata, plink2-users
Please post full .log file(s) when asking for troubleshooting help.

In this case, you should also post the .hardy and .afreq output for one variant that illustrates what you’re talking about.

Gabriele Sgarlata

Oct 7, 2025, 10:24:07 AM
to plink2-users
Thank you Chris!

I am posting below the results of the --hardy option (on the left of the red vertical line) and of the --freq option (on the right of the red vertical line), which I imported into R.
I have added four fields: N_tot = HOM_A1_CT + HET_A1_CT + TWO_AX_CT, A1_freq = (2*HOM_A1_CT + HET_A1_CT) / (2*N_tot), Vx_hardy = 2*A1_freq*(1-A1_freq), and Vx_afreq = 2*ALT_FREQS*(1-ALT_FREQS).
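
As a minimal sketch, the derived fields above can be written out in Python as follows. The function names and the example numbers are hypothetical and are not taken from the attached output; only the column names (HOM_A1_CT, HET_A1_CT, TWO_AX_CT, ALT_FREQS) come from the description above.

```python
def hardy_style_fields(hom_a1_ct, het_a1_ct, two_ax_ct):
    """Derive N_tot, A1_freq and Vx_hardy from the three genotype counts."""
    # Number of samples that contribute a genotype count at this variant.
    n_tot = hom_a1_ct + het_a1_ct + two_ax_ct
    # Each homozygote carries two A1 copies, each heterozygote one.
    a1_freq = (2 * hom_a1_ct + het_a1_ct) / (2 * n_tot)
    # Expected heterozygosity 2*p*(1-p).
    vx_hardy = 2 * a1_freq * (1 - a1_freq)
    return n_tot, a1_freq, vx_hardy


def afreq_style_field(alt_freq):
    """Derive Vx_afreq from a single ALT_FREQS value."""
    return 2 * alt_freq * (1 - alt_freq)


# Hypothetical values for one variant, for illustration only.
n_tot, a1_freq, vx_hardy = hardy_style_fields(hom_a1_ct=10, het_a1_ct=20, two_ax_ct=70)
vx_afreq = afreq_style_field(alt_freq=0.21)
print(n_tot, a1_freq, vx_hardy, vx_afreq)
```

Whenever the frequency behind ALT_FREQS differs from the count-based A1_freq, the two expected heterozygosity values differ as well.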

Essentially, Vx_hardy reproduces the expected heterozygosity "E(HET_A1)" obtained from --hardy, whereas Vx_afreq does not reproduce these results.
I suspect that this is due to the fact that --freq also includes the individuals with missing genotypes.

I am also sending you the log files of the --hardy and --freq analyses.

Thanks,
Gabriele

Screenshot 2025-10-07 at 15.04.46 (2).png
test_hardy.log
test_AF.log

Chris Chang

Oct 7, 2025, 10:48:13 AM
to Gabriele Sgarlata, plink2-users
--freq uses dosages when available; --hardy only looks at hardcalls.
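
A toy Python illustration of why the two frequencies can diverge is sketched below. It is not plink2's implementation: the per-sample values are hypothetical, and which samples lack a hardcall is simply assumed rather than derived from any plink2 rule.

```python
# Hypothetical per-sample data for one biallelic variant.
# dosages:   expected ALT allele count per sample, between 0.0 and 2.0
# hardcalls: discrete ALT allele count, or None where no hardcall exists (assumed)
dosages = [0.0, 1.0, 2.0, 0.5, 1.4]
hardcalls = [0, 1, 2, None, None]


def alt_freq_from_dosages(values):
    """Allele frequency from dosages: every sample with a dosage contributes."""
    return sum(values) / (2 * len(values))


def alt_freq_from_hardcalls(calls):
    """Allele frequency from hardcalls: samples without a hardcall are skipped."""
    called = [g for g in calls if g is not None]
    return sum(called) / (2 * len(called))


def expected_het(p):
    """Expected heterozygosity 2*p*(1-p)."""
    return 2 * p * (1 - p)


p_dosage = alt_freq_from_dosages(dosages)
p_hardcall = alt_freq_from_hardcalls(hardcalls)
print(p_dosage, expected_het(p_dosage))
print(p_hardcall, expected_het(p_hardcall))
```

In this toy case the two samples without a hardcall still contribute their dosages to the first estimate but are left out of the second, so the frequencies and the resulting 2*p*(1-p) values differ.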

Gabriele Sgarlata

Oct 7, 2025, 10:59:28 AM
to plink2-users
Ok, thanks. 
This clarifies my doubts.

Best,
Gabriele
