Rare alleles homozygosity

31 views
Skip to first unread message

Manuela Cascini

unread,
May 18, 2024, 7:49:44 PMMay 18
to dartR
Hi Luis,

I would like to calculate the homozygosity of rare alleles in my data set and compare it between groups. 
Could you please suggest a way to do that?
Thank you,
Manuela

Bernd.Gruber

unread,
May 19, 2024, 12:00:22 AMMay 19
to da...@googlegroups.com
Good one 

Just to clarify. quick question do you want the homozgosity if the rare allele only  or homozygosity of the rare loci (which is high due to the low allele frequency 

Cheers Bernd

---------


On 19 May 2024, at 09:49, Manuela Cascini <manuela...@gmail.com> wrote:

Hi Luis,
--
You received this message because you are subscribed to the Google Groups "dartR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dartr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dartr/329b9aff-7808-40cf-8ccf-ba7c53dca911n%40googlegroups.com.

Manuela Cascini

unread,
May 21, 2024, 6:39:14 PMMay 21
to da...@googlegroups.com
Hi Bernd, your question raises a good point.
The initial idea was rare alleles only, but now I'm contemplating whether it's worth looking at rare loci as well.
I am studying two groups of plants from the same species: one subjected to deliberate outcrossing and the other left to open pollination.
Interestingly, while growth data indicate a significant size disparity favoring the outcrossed plants, genetic diversity and inbreeding metrics are comparable between the two groups.
My hypothesis is that variations in homozygosity at rare alleles/loci might elucidate the observed differences in fitness, as measured by growth.
Best,
Manuela

Jose Luis Mijangos

unread,
May 28, 2024, 2:55:33 AMMay 28
to dartR
Hi Manuela,

You can try the code below.

Cheers,
Luis 

library(dartRverse)
library(reshape)
t1 <- platypus.gl
# filtering loci with all missing data
t1 <- gl.filter.allna(t1)
# filtering monomorphs
t1 <- gl.filter.monomorphs(t1)
# recalculating metrics
t1 <- gl.recalc.metrics(t1)
# converting MAF to minor allele count (MAC)
MAC <- t1$other$loc.metrics$maf * (nInd(t1) * 2)
t1$other$loc.metrics$MAC <- MAC
# getting loci with less than 5 alleles
t1_rare <- gl.filter.locmetric(t1,
                               metric = "MAC",
                               upper = 5,
                               lower = 0,
                               keep = "within")
# Calculating heterozygosity
res_div <- utils.basic.stats(t1_rare)
Ho_by_pop <- res_div$Ho
# converting to homozygosity
hom_by_pop <- 1 - Ho_by_pop
# formating data for plotting
plot_hom <- melt(Ho_by_pop)
# plot
ggplot(plot_hom,aes(y= value, color = variable))+
  geom_boxplot()



Manuela Cascini

unread,
May 28, 2024, 4:38:36 AMMay 28
to da...@googlegroups.com
Hi Luis, thank you so much!

I have just tried your script and there are a few things that I am not sure I am interpreting correctly.

  1. Why are the MAC not integers? Because the function is taking missing data into account ? 
  2. About the bit of code # getting loci with less than 5 alleles: Is it filtering for loci that have one of the two alleles occurring less than 5 times in the data set?
  3. The results for the homozygosity by pop showed that most values equal 1. Does that mean that the majority of rare alleles are found in homozygosity? My expectation was that since they are rare, they are less likely to occur in homozygosity. What am I missing?
Thank you




Jose Luis Mijangos

unread,
May 29, 2024, 1:46:33 AMMay 29
to dartR
Hi Manuela,

1. The reason of why MAC are not integers using the code I shared is because MAF has only 4 decimals, we would need to use an infinite number of decimals to get integers... Alternatively, we could just count the number of alleles....
2. Yes!
3. Because heterozygosity/homozygosity is calculated only taking in account loci with rare alleles and it is calculated by population, it is reasonable that homozygosity is 1 in those populations in which the rare allele is not present. Heterozygosity/homozygosity is the probability of two alleles drawn at random from the gene pool are heterozygous/homozygous, it does not provide information about which of the two alleles is heterozygous/homozygous. 

I think what you want to know is the percentage of individuals that are homozygous for the rare allele by population, for that see code below.

Cheers,
Luis 

library(dartRverse)
library(reshape)
t1 <- platypus.gl
# filtering loci with all missing data
t1 <- gl.filter.allna(t1)
# filtering monomorphs
t1 <- gl.filter.monomorphs(t1)
# recalculating metrics
t1 <- gl.recalc.metrics(t1)
# converting MAF to minor allele count (MAC)
MAC <- t1$other$loc.metrics$maf * (nInd(t1) * 2)
t1$other$loc.metrics$MAC <- MAC
# getting loci with less than 5 alleles
t1_rare <- gl.filter.locmetric(t1,
                               metric = "MAC",
                               upper = 5,
                               lower = 0,
                               keep = "within")
# calculate frequencies of the reference and alternative alelles
allele_freq <- gl.allele.freq(t1_rare,by="loc")
 # Get the loci where the rare allele is the reference allele
ref_rare <- locNames(t1_rare)[which(allele_freq$frequency > 0.5 )]
# separating datasets based on whether the rare allele is the reference or the alternative
t1_rare_ref <- gl.keep.loc(t1_rare,loc.list = ref_rare)
t1_rare_alt <- gl.drop.loc(t1_rare,loc.list = ref_rare)
#separating populations
t1_rare_ref_bypop <- seppop(t1_rare_ref)
t1_rare_alt_bypop <- seppop(t1_rare_alt)
#converting to matrix
t1_rare_ref_bypop_mat <- lapply(t1_rare_ref_bypop, as.matrix)
t1_rare_alt_bypop_mat <- lapply(t1_rare_alt_bypop, as.matrix)
#counting the number of individuals that are homozygous for the reference allele
ind_ref <- lapply(t1_rare_ref_bypop_mat, function(y){
  unlist(unname(apply(y,2,function(x){
    length(which(x==0))
  })))
})
#counting the number of individuals that are homozygous for the alternative allele
ind_alt <- lapply(t1_rare_alt_bypop_mat, function(y){
  unlist(unname(apply(y,2,function(x){
    length(which(x==2))
  })))
})
# merging number of individuals that are hmozygous for the alternative and
# the reference allele
ind_all <- mapply("c",ind_ref,ind_alt,SIMPLIFY = F)
# number of individuals per population
nInd_bypop <- as.numeric(table(pop(t1_rare)))
# getting the percentage of individuals that are homozygous
perc_ind <- as.data.frame(mapply("/", ind_all, nInd_bypop))
#mean by pop
perc_ind_mean <- data.frame(hom=colMeans(perc_ind),
                            pop= colnames(perc_ind))
# plot
ggplot(perc_ind_mean,aes(x = pop,y= hom,fill=pop))+
  geom_col() +
  labs(y= "Percentage of individuals that \n are homozygous for the rare allele (MAC < 5)")

Manuela Cascini

unread,
May 29, 2024, 2:29:02 AMMay 29
to da...@googlegroups.com
Thanks Luis, I understand it now. 
I thought that the results for the homozygosity were showing only the genotypes for the alternate allele.
It seems that the new code you sent will do that! 
Thanks again,
Manuela

Reply all
Reply to author
Forward
0 new messages