FIS gives inverse relationship with He


Frederik Van Daele

Jan 9, 2023, 1:56:16 PM
to dartR
Dear DartR admin,

When I plot the mean individual inbreeding coefficient (beta.dosage) against the expected heterozygosity (gl.report.heterozygosity), I get the relationship I expected. However, the FIS reported per population by gl.report.heterozygosity shows the inverse relationship. The expected heterozygosities themselves are consistent between packages. Is it possible that something is coded wrongly in the background when FIS is calculated by gl.report.heterozygosity?
[Attached figure: relationship_FIS_he.png]

Thank you for checking!

With kind regards,
Frederik

Jose Luis Mijangos

Jan 10, 2023, 2:45:05 AM
to dartR
Hi Frederik,

You are comparing two different things, which is understandable because they have similar names.

In your first plot, you are estimating individual inbreeding coefficients, i.e. the probability that the two alleles at any locus in an individual are identical by descent. This is estimated at the individual level, and a negative correlation between heterozygosity and individual inbreeding coefficients is expected.

In your second plot, you are estimating Wright's inbreeding coefficient (FIS), which is a measure of departure from Hardy–Weinberg proportions within populations, i.e. the deviation between observed heterozygosity and expected heterozygosity. FIS is calculated at the population level. If you want to estimate FIS using hierfstat, you could use the following code:

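library(dartR)      # provides the example genlight object platypus.gl
library(hierfstat)  # provides fs.dosage()
t1 <- platypus.gl
# as.matrix() turns the genlight object into a 0/1/2 dosage matrix
res <- fs.dosage(as.matrix(t1), pop = pop(t1))
# Assuming the Fs element described in the hierfstat documentation:
# its first row should hold population-specific and overall FIS
res$Fs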

Prof. Bill Sherwin commented:

"Every decade or so, starting with Jaccard in the 1950s, good population geneticists have told us not to confuse:

Hardy-Weinberg expected heterozygosity He (also called gene diversity). E.g. for a 2-allele SNP, where p and q are proportions of the two alleles, A1 and A2, Gene diversity/Heterozygosity: He=2pq

With

Inbreeding coefficient (FIS, also measures effect of other things such as selection for or against heterozygotes). Inbreeding coefficient: FIS=1-(Ho/He) where Ho is the observed proportion of heterozygotes.

You can see from the equations that there will not be a simple linear relationship between FIS and He. Or you can see the same thing from these two examples of different populations, both with p=q=0.5:

EG1: if there is total inbreeding, with 50% of families having only A1 homozygotes, and the other families only having A2 homozygotes, so that He=0.5, Ho=0 then FIS=1-(0/0.5)=1.

EG2: Alternatively if all individuals are heterozygotes, He=0.5, Ho=1 then FIS=1-(1/0.5)=-1.

Those are extreme examples, but they show that with the same He, you can get a very wide range of FIS values. So, it is not reasonable to expect any tight relationship between the two measures – they measure different things, so they behave independently.

MORAL: NEVER say that a population with low He is ‘inbred’ – there may be zero consanguineous matings occurring in this population; there is just low gene diversity – most of the A1 or A2 alleles have been lost by drift (which is NOT inbreeding)."
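
The point of the two examples can be checked numerically. The following is only a base R sketch for a single biallelic locus: the function name fis_from_counts and the genotype counts are made up for illustration, and it uses the plain He=2pq without any small-sample correction, so it will not exactly match what a package reports.

```r
# Wright's FIS for one biallelic locus, computed from genotype counts.
# n_hom1, n_het, n_hom2: numbers of A1A1, A1A2 and A2A2 individuals.
# (Hypothetical helper, not part of dartR or hierfstat.)
fis_from_counts <- function(n_hom1, n_het, n_hom2) {
  n  <- n_hom1 + n_het + n_hom2
  p  <- (2 * n_hom1 + n_het) / (2 * n)  # frequency of allele A1
  he <- 2 * p * (1 - p)                 # expected heterozygosity, He = 2pq
  ho <- n_het / n                       # observed heterozygosity
  c(p = p, He = he, Ho = ho, FIS = 1 - ho / he)
}

eg1 <- fis_from_counts(50, 0, 50)   # only homozygotes, p = q = 0.5
eg2 <- fis_from_counts(0, 100, 0)   # only heterozygotes, p = q = 0.5
hwe <- fis_from_counts(25, 50, 25)  # Hardy-Weinberg proportions, p = q = 0.5

# Same He in all three rows, very different FIS
print(rbind(eg1, eg2, hwe))
```

All three populations share He=0.5, yet FIS spans the whole range from +1 to -1, which is why no tight relationship between He and FIS should be expected.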

It is a bit strange that you see a correlation between heterozygosity and FIS in your dataset, because it is not expected in "normal" circumstances, as Bill commented above. However, in clonal populations this pattern appears to be common, because these populations accumulate many mutations, which results in an accumulation of heterozygosity at all loci (observed heterozygosity is higher than expected heterozygosity and therefore FIS is negative). This accumulation occurs because once a homozygous site has mutated, it becomes heterozygous and has very little chance of becoming homozygous again (reverse mutation is unlikely). Also, in clonal populations FIS directly reflects the size of the population. See for example:

Koffi, Mathurin, et al. "Population genetics and reproductive strategies of African trypanosomes: revisiting available published data." PLoS neglected tropical diseases 9.10 (2015): e0003985.

Cheers,
Luis

Frederik Van Daele

Jan 10, 2023, 5:54:30 AM
to dartR
Dear Luis,

Thank you very much for the insights! I read that negative inbreeding coefficients, and the excess observed heterozygosity behind them, could also mean that some sites were erroneously mapped during SNP calling (https://gatk.broadinstitute.org/hc/en-us/articles/360035531992-Inbreeding-Coefficient). Do you perhaps know how I could determine whether excess observed heterozygosity is caused by badly mapped sites or by the accumulation of mutations in clonal populations (my species displays clonal propagation)?

Thanks again!

With kind regards,
Frederik

Berry, Olly (NCMI, IOMRC Crawley)

Jan 10, 2023, 6:21:44 AM
to da...@googlegroups.com
Hi Frederik,
Less theoretically interesting, but something to check (and you may have already, so apologies if I’m telling you something you know already) is whether the excess Ho is spread across all individuals in a sample/population or is concentrated across loci in a few individuals. Sample cross-contamination can create Ho excess, and it can even appear to have a geographical basis if poor sampling technique was used in some pops but not others. The good news is that you can weed out problematic individuals with elevated Ho :-).
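
One way to run this check is to compute observed heterozygosity per individual and look for outliers. The following is only a base R sketch on simulated data: the toy matrix, the individual names and the cut-off of median + 3 MADs are all arbitrary assumptions. With real data, the 0/1/2 dosage matrix could presumably come from as.matrix() on the genlight object, as in the fs.dosage call earlier in the thread.

```r
# Toy dosage matrix: rows = individuals, columns = loci,
# entries = copies of the alternate allele (0, 1, 2); NA would mean missing.
set.seed(1)
n_ind <- 20
n_loc <- 500
p <- runif(n_loc, 0.1, 0.9)                           # allele frequency per locus
geno <- sapply(p, function(pl) rbinom(n_ind, 2, pl))  # n_ind x n_loc matrix
rownames(geno) <- paste0("ind", seq_len(n_ind))

# Mimic a cross-contaminated sample: ind3 is scored heterozygous at 400 loci
contam <- sample(n_loc, 400)
geno["ind3", contam] <- 1

# Observed heterozygosity per individual = proportion of loci scored as 1
ho_ind <- rowMeans(geno == 1, na.rm = TRUE)

# Flag individuals far above the rest (arbitrary rule: median + 3 MADs)
cutoff  <- median(ho_ind) + 3 * mad(ho_ind)
flagged <- names(ho_ind)[ho_ind > cutoff]
print(sort(ho_ind, decreasing = TRUE)[1:5])
print(flagged)  # ind3 should be among the flagged individuals
```

If the excess is carried by a handful of individuals like this, contamination is a plausible explanation; if it is spread evenly over all individuals in a population, it is more likely biological or locus-specific.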
Cheers,
Olly

Jose Luis Mijangos

Jan 11, 2023, 12:22:29 AM
to dartR
Hi Frederik,

One reason that loci can show higher heterozygosity than expected is that raw reads from different parts of the genome are erroneously called together during bioinformatic processing because the reads are very similar, either because they are paralogs, repetitive elements or otherwise very much alike. These are infrequent artifacts that might not greatly influence analyses that use the complete dataset, such as genetic structure and mean FIS. In the case of DArT data, most of these artifacts are filtered out during DArT's bioinformatic pipeline.

It seems that in your case the excess of heterozygotes is systematic. You could run a PCA and check whether the patterns make sense given the geographic sampling or other biological or ecological features of your species.

Another possible option would be to filter out loci with high heterozygosity using the filter "filter.excess.het" described in the article below.
You could also filter out loci based on read depth (gl.filter.rdepth), because these kinds of artifacts tend to have a very high read depth. If you have DArT data, you could filter out loci that have very similar sequence tags using the filter "gl.filter.hamming".
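
To illustrate the idea of filtering on locus heterozygosity, here is a base R sketch on simulated data. It is not the "filter.excess.het" function mentioned above and it does not call gl.filter.rdepth or gl.filter.hamming. The helper name locus_het, the toy matrix and the Ho > 0.75 threshold are assumptions made for the example.

```r
# Toy dosage matrix: rows = individuals, columns = loci (0, 1, 2 copies)
set.seed(42)
n_ind <- 100
n_loc <- 200
p <- runif(n_loc, 0.2, 0.8)
geno <- sapply(p, function(pl) rbinom(n_ind, 2, pl))
colnames(geno) <- paste0("loc", seq_len(n_loc))

# Mimic five collapsed paralogs: every individual is scored heterozygous
geno[, 1:5] <- 1

# Per-locus observed and expected heterozygosity and FIS
# (hypothetical helper; He = 2pq without small-sample correction)
locus_het <- function(g) {
  q  <- colMeans(g, na.rm = TRUE) / 2    # alternate-allele frequency
  ho <- colMeans(g == 1, na.rm = TRUE)   # proportion of heterozygotes
  he <- 2 * q * (1 - q)
  data.frame(locus = colnames(g), Ho = ho, He = he, FIS = 1 - ho / he)
}

stats   <- locus_het(geno)
suspect <- stats$locus[stats$Ho > 0.75]  # arbitrary threshold
geno_filtered <- geno[, !(colnames(geno) %in% suspect), drop = FALSE]

print(head(stats[order(stats$FIS), ]))   # most negative FIS first
print(dim(geno_filtered))
```

For a biallelic locus He cannot exceed 0.5, so loci with Ho close to 1 (FIS close to -1) stand out clearly. Comparing mean FIS before and after removing them shows how much of the heterozygote excess is driven by a few loci rather than being systematic.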

Cheers,
Luis