FIS gives inverse relationship with He

174 views
Skip to first unread message

Frederik Van Daele

unread,
Jan 9, 2023, 1:56:16 PM1/9/23
to dartR
Dear DartR admin,

When I calculate the relation between the mean inbreeding coëfficient (beta.dosage) and the expected heterozygosity (gl.report.heterozygosity), I get the expected relationship between both variables. However, the FIS as reported by gl.report.heterozygosity (by population) gives an inverse relationship. The expected heterozygosities are however stable between packages. Is it possible that something is coded wrong in the background when calculating FIS with gl.report.heterozygosity?
relationship_FIS_he.png

Thank you for checking!

With kind regards,
Frederik

Jose Luis Mijangos

unread,
Jan 10, 2023, 2:45:05 AM1/10/23
to dartR
Hi Frederik,

You are comparing two different things which is understandable because they have similar names.

In your first plot, you are estimating individual inbreeding coefficients which is the probability that two alleles at any locus in an individual are identical by descent and it is estimated at the individual level. Here it is expected a negative correlation between heterozygosity and individual inbreeding coefficients.

In your second plot, you are estimating Wright's inbreeding coefficient (FIS) which is a measure of departure from Hardy–Weinberg proportions within populations, ie the deviation between observed heterozygosity and expected heterozygosity. FIS is calculated at the population level. If you want to estimate FIS using hierfstat, you could use the following code:

> library(dartR)
> library(hierfstat)
> t1 <- platypus.gl
> res <- fs.dosage(as.matrix(t1),pop=pop(t1))

Prof. Bill Sherwin commented:

" Every decade or so, starting with Jaccard in the 1950s, good population geneticists have told us not to confuse:

Hardy-Weinberg expected heterozygosity He (also called gene diversity). Eg for a 2-allele SNP, where p and q are proportions of the two alleles, A1 and A2, Gene diversity/Heterozygosity: He=2pq

With

Inbreeding coefficient (FIS, also measures effect of other things such as selection for or against heterozygotes). Inbreeding coefficient: FIS=1-(Ho/He) where Ho is the observed proportion of heterozygotes.

You can see from the equations that there will not be a simple linear relationship between FIS  and He.  Or you can see the same thing from these two examples of different populations, both with p=q=0.5:

EG1: if there is total inbreeding, with 50% of families having only A1 homozygotes, and the other families only having A2 homozygotes, so that He=0.5, Ho=1 then FIS =0.5.

EG2: Alternatively if all individuals are heterozygotes, He=0.5, Ho=0 then FIS=1.

Those are extreme examples, but they show that with the same He, you can get a very wide range of FIS values. So, it is not reasonable to expect any tight relationship between the two measures – they measure different things, so they behave independently.

MORAL: NEVER say that a population with low He is ‘inbred’ – there may be zero consanguineous matings occurring in this population; there is just low gene diversity – most of the A1 or A2 alleles have been lost by drift (which is NOT inbreeding)."

It is a bit strange that you see a correlation between heterozygosity and FIS in your dataset because it is not expected in "normal" circumstances, as Bill commented above. However, in clonal populations this pattern seems to be usual because these populations accumulate a lot of mutations which results in an accumulation of heterozygosity at all loci (observed heterozygosity is higher than expected heterozygosity and therefore FIS is negative). This accumulation of mutations in clonal populations occurs because once a homozygous site has experienced mutation, it becomes heterozygous and has very little chance of becoming homozygous again (reverse mutation is unlikely). Also in clonal populations, FIS directly reflects the size of the population. See for example:

Koffi, Mathurin, et al. "Population genetics and reproductive strategies of African trypanosomes: revisiting available published data." PLoS neglected tropical diseases 9.10 (2015): e0003985.

Cheers,
Luis

Frederik Van Daele

unread,
Jan 10, 2023, 5:54:30 AM1/10/23
to dartR
Dear Luis,

Thank you very much for the insights! I read that negative values of Inbreeding Coefficients and the resulting excess observed heterozygosity could also mean that there was some erronous mapping of specific sites during SNP calling (https://gatk.broadinstitute.org/hc/en-us/articles/360035531992-Inbreeding-Coefficient). Do you perhaps know how I could determine if excess observed heterozygosity is caused by sites with bad mapping or by accumulation of mutations in clonal populations (my species displays clonal propagation)?

Thanks again!

With kind regards,
Frederik

Berry, Olly (NCMI, IOMRC Crawley)

unread,
Jan 10, 2023, 6:21:44 AM1/10/23
to da...@googlegroups.com
Hi Frederik,
Less theoretically interesting, but something to check (and you may have already, so apologies if I’m telling you something you know already) is whether the excess Ho is across all individuals in a sample/population or is strong across loci in some individuals.. Sample cross contamination can create Ho excess and it can even appear to have a geographical basis if poor sampling technique was deployed in some pops but not others. The good news is that you can weed out problematic individuals with elevated Ho :-).
Cheers,
Olly

From: da...@googlegroups.com <da...@googlegroups.com> on behalf of Frederik Van Daele <ree...@gmail.com>
Sent: Tuesday, January 10, 2023 6:54 pm
To: dartR <da...@googlegroups.com>
Subject: [dartR] Re: FIS gives inverse relationship with He
 
--
You received this message because you are subscribed to the Google Groups "dartR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dartr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dartr/6990b63b-ad66-4e4c-8c80-048dd3bf1775n%40googlegroups.com.

Jose Luis Mijangos

unread,
Jan 11, 2023, 12:22:29 AM1/11/23
to dartR
Hi Frederik,

One of the reasons that loci have higher heterozygosity than expected is when raw reads from different parts of the genome are erroneously call together during the bioinformatic processing because the reads are very similar, either because they are paralogs, repetitive elements or otherwise very much alike. These are infrequent artifacts that might not influence greatly in analyses that use the complete dataset such as genetic structure and mean FIS. In case of DArT data, most of these artifacts are filtered out during their bioinformatic pipelines.

It seems that in your case the excess of heterozygotes is systematic. You could run a PCA and check whether the patterns make sense with the geographic sampling or other biological or ecological feature of your species.

Another possible option would be to filter out loci with high heterozygosity using the filter "filter.excess.het" described in the article below. 
You could also filter out loci based on read depth (gl.filter.rdepth) these kind of artifacts have a very high read depth. If you have DArT data,  you could filter out loci that have very similar sequence tags using the filter "gl.filter.hamming".

Cheers,
Luis

Manuela Cascini

unread,
Oct 20, 2023, 3:33:36 AM10/20/23
to dartR
Dear Luis,

I am interested in checking relatedness between individuals using the beta.dosage function in hierfstat package. 

Ideally, I would like calculate it separately for each population. 

Could you please suggest a way to run it in a loop for each of the pop in my genlight dataset? This may be trivial, but I could not come up with a useful code.

Best,
Manuela

Jose Luis Mijangos

unread,
Oct 20, 2023, 4:03:16 PM10/20/23
to dartR
Hi Manuela,

Check code below:

library(dartR)
library(hierfstat)
t1 <- gl.filter.callrate(platypus.gl,threshold = 1)
t2 <- seppop(t1)
t3 <- lapply(t2,as.matrix)
res <- lapply(t3, beta.dosage,inb=TRUE,Mb=FALSE,MATCHING=FALSE)

Cheers,
Luis 

Manuela Cascini

unread,
Oct 23, 2023, 8:41:45 PM10/23/23
to da...@googlegroups.com
Thanks Jose, it works well!

Cheers,
Manuela

--
You received this message because you are subscribed to a topic in the Google Groups "dartR" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/dartr/nNC5__wYg1o/unsubscribe.
To unsubscribe from this group and all its topics, send an email to dartr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dartr/9b551805-df4b-4699-8f3e-6811985b2ddan%40googlegroups.com.

Manuela Cascini

unread,
Nov 5, 2023, 7:13:56 PM11/5/23
to da...@googlegroups.com
Hi Luis, 

I am trying to get a better understanding of the relationship between FIS and IBD kinship.

I am working on a dataset showing moderate-high identity by descent kinship between most individuals at each site (k between 0.125 and 0.35) but negative FIS. 
I would expect subpopulations with moderate-high relatedness to have some degree of inbreeding and a positive FIS. 
How can I explain the opposite pattern (moderate kinship and low FIS)?

I would really appreciate it if you could share your insights on this. 

Regards,
Manuela

Carlo Pacioni

unread,
Nov 5, 2023, 9:49:29 PM11/5/23
to da...@googlegroups.com
Hi Manuela!
one situation where that is possible in my mind is when you have immigrants from a different genetic cluster in your dataset. Immigrants will have alleles that are not commonly present in the 'resident' individuals and will be considered related by most analyses (because the probability of two individuals to have independently rare alleles is low if these are truly coming from a panmictic pop) but your observed Ho is generally higher than expected because these rare alleles didn't have enough time to mix in the pop and most individuals (and definitely the offspring of the immigrants X resident individuals) will be heterozygous at these loci.

I'm sure Luis will have more to add.

Ciao,
carlo



You received this message because you are subscribed to the Google Groups "dartR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dartr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dartr/CAA9bSb6-EDsRYv_tRZvZfCPzad-8ivhkLvj--5EuGKrD52Ezng%40mail.gmail.com.

Manuela Cascini

unread,
Nov 6, 2023, 9:19:31 PM11/6/23
to da...@googlegroups.com, carlo....@gmail.com
Ciao Carlo,

Thank you for your reply, I really appreciate it.
I should have mentioned (sorry) that the species in question is Isopogon fletcheri, a fire-tolerant plant that can resprout after fire.
One other thing to consider is that I calculated kinship and FIS separately for each subpopulation. 
Do you think the immigrant hypothesis can still hold despite this, and considering short distance seed dispersal?

Best,
Manuela


Message has been deleted

Manuela Cascini

unread,
Nov 8, 2023, 11:55:44 PM11/8/23
to da...@googlegroups.com, Jose Luis Mijangos
Hi Luis,

Thank you so much for this. 

Would you be able to send me the lines of code you used to produce your estimates? So I can try to reproduce them.

Yes, the small sample size can explain my odd results. I used the function "basicStats" from the package "diveRsity" which may not be suitable for small sample size. The manual does not mention correction.

Best,
Manuela


Il giorno gio 9 nov 2023 alle ore 15:27 Jose Luis Mijangos <luis.m...@gmail.com> ha scritto:
Hi Manuela,

Based on the data you sent me, I found that FIS is actual positive (see attached). 
- What method did use to estimate FIS?
- You have a small sample size, heterozygosity estimates are not accurate with such a small sample sizes, There should be at least ten individuals per population.
- In dartR, heterozygosity estimates have a correction for sample size. 

Cheers,
Luis 

Jose Luis Mijangos

unread,
Nov 13, 2023, 12:40:09 AM11/13/23
to dartR

Hi Manuela,

 

Based on the data you sent me, two of your populations (Mount_Solitary and Waterfall_Creek) are very different from the other populations. They might be different subspecies.

 

That’s one of the reasons you have a lot of missing data in your dataset. This missing data is due to null alleles arising from a mutation at one or both restriction enzyme recognition sites. The species has a low genetic variation that, together with the filtering you are using (MAF, call rate, and reproducibility), amplifies the effect you observe in FIS.

 

One usually filters by call rate, reproducibility, and MAF to remove potential sequencing artefacts; however, what is driving missing data and allele frequencies in your dataset is the large genetic differentiation between populations.

 

Some ideas about what you could do:

  1. Perform a fixed difference analysis to investigate genetic divergence between populations (see this tutorial).

TechNote_fixed_difference_analysis_25-Feb-22.pdf (biomatix.org)

  1. Ask DArT to re-analyze your dataset separately for each group of genetically different populations.
  2. SNPs reported by DArT are already filtered, so for many applications, it is not necessary to filter the dataset.
  3. SilicoDart data might be useful to perform phylogenetic analyses.

 

Cheers,

Luis

Manuela Cascini

unread,
Nov 13, 2023, 2:32:57 PM11/13/23
to da...@googlegroups.com
  Hi Luis,

Thank you so much for your great suggestions!

I ran a fixed difference analysis and all the p-values are exactly zero after agglomeration (see R screenshot below). 

In D5 I agglomerated Anvil Rock to Birrabang_Walls+ because the fixed differences between the two pops were only 2. 
I was expecting the fd between the merged Anvil Rock+Birrabang_Walls+ vs Mount_Solitary to increase in D5, instead it is still 32. Do you know why?

I understand that in allopatry a fixed difference analysis with significant p-values is not enough for a subspecies status. 

I will perform the analysis you suggested in point 4 and see if it can provide additional support to the subspecies hypothesis.

Thanks again,
Manuela


image.png




Reply all
Reply to author
Forward
0 new messages