
The expected heterozygosity is calculated under Hardy-Weinberg equilibrium as 2pq, where p and q are the frequencies of the two alleles at a nucleotide position and p + q = 1. The product is largest when the two alleles are equally frequent (p = q = 0.5), so the maximum value for expected heterozygosity is 2 * 0.5 * 0.5 = 0.5. Here is a random slide from Google (https://slideplayer.com/slide/15034095/), one of many.
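As a quick check of that ceiling, here is a minimal Python sketch that evaluates 2pq over a grid of allele frequencies. The function name expected_het and the 0.01 step size are my own choices for illustration, not anything taken from Stacks.

```python
def expected_het(p: float) -> float:
    """Expected heterozygosity 2pq at a biallelic site, with q = 1 - p."""
    q = 1.0 - p
    return 2.0 * p * q

# Scan allele frequencies from 0 to 1 in steps of 0.01 (illustrative grid).
freqs = [i / 100 for i in range(101)]

# Find the frequency that gives the largest expected heterozygosity.
best_p = max(freqs, key=expected_het)
max_het = expected_het(best_p)

print(best_p, max_het)  # prints: 0.5 0.5
```

Any frequency other than 0.5 gives a smaller value; for example, p = 0.8 gives 2 * 0.8 * 0.2 = 0.32.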

If you have collapsed paralogous loci, you might have an observed heterozygosity that is higher than expected (nucleotide positions that are being called as SNPs but are really fixed differences between two or more collapsed loci). You might consider applying the --max-obs-het filter in the populations program to remove these potentially confounded SNPs.
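To illustrate the idea behind a maximum observed heterozygosity cutoff, here is a rough Python sketch. It is not how the populations program implements --max-obs-het; the records, the field names (site, p, obs_het), and the 0.7 threshold are all hypothetical values made up for the example.

```python
def expected_het(p: float) -> float:
    """Expected heterozygosity 2pq at a biallelic site, with q = 1 - p."""
    return 2.0 * p * (1.0 - p)

# Hypothetical per-site records (made-up values, not real Stacks output).
sites = [
    {"site": "A", "p": 0.80, "obs_het": 0.30},
    # Fixed difference between collapsed paralogs: every individual looks
    # heterozygous, so observed heterozygosity is 1.0 while 2pq is only 0.5.
    {"site": "B", "p": 0.50, "obs_het": 1.00},
    {"site": "C", "p": 0.60, "obs_het": 0.45},
]

MAX_OBS_HET = 0.7  # illustrative cutoff, chosen only for this example

def split_by_obs_het(records, max_obs_het):
    """Separate sites at or below the cutoff from those above it."""
    kept = [r for r in records if r["obs_het"] <= max_obs_het]
    dropped = [r for r in records if r["obs_het"] > max_obs_het]
    return kept, dropped

kept, dropped = split_by_obs_het(sites, MAX_OBS_HET)

for r in dropped:
    # Show observed next to expected for each site that was removed.
    print(r["site"], r["obs_het"], expected_het(r["p"]))
```

With these made-up numbers only site B is dropped, since it is the only one whose observed heterozygosity sits well above what 2pq allows.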
From:
stacks...@googlegroups.com <stacks...@googlegroups.com> on behalf of Peter G <pgray...@gmail.com>
Date: Wednesday, October 20, 2021 at 8:00 AM
To: Stacks <stacks...@googlegroups.com>
Subject: [stacks] Heterozygosity data
I feel this topic may have come up before, but I'm unable to find anything in the search that covers it.
After using Stacks, we have been pulling the heterozygosity data from sumstats.tsv and comparing observed against expected. The distributions for 3 species are shown below.

I'm wondering why the expected values are restricted to between roughly 0.09 and 0.5, and whether we should be doing something with the observed values so that they fall within these bounds too.
We've been pulling every data point from the sumstats.tsv file, but perhaps we should be conditionally filtering out some rows first?
Any advice would be appreciated.