Error found in gl.report.heterozygosity

179 views
Skip to first unread message

stephan...@gmail.com

unread,
Mar 10, 2021, 2:37:06 AM3/10/21
to dartR
Hello,
I thought id let you know about a couple of mistakes I found in the gl.report.heterozygosity function that result in incorrect output. 

for example:

 # Calculate heterozygosity for each population in the list
  Ho <- data.frame(lapply(sgl, function(x) mean(colMeans(as.matrix(x, na.rm=TRUE)==1), na.rm=TRUE) ))
>> The na.rm=TRUE argument should be for colMeans() instead of for as.matrix(). Currently it fails to include loci with any missing data in Ho calculations.

There are a couple of other similar problems with the way the number of loci and the number of individuals are calculated - see attached.

I have only checked the method = pop option, not method = ind

let me know if you have any questions.

Cheers,
Stephanie

stephan...@gmail.com

unread,
Mar 10, 2021, 2:44:31 AM3/10/21
to dartR
Apologies I couldnt seem to attach the edited code file, so here is a link to it: https://drive.google.com/file/d/140cSlwFNAuX2g_wwNpVXVlGTSuV2CaFU/view?usp=sharing

Jose Luis Mijangos

unread,
Mar 13, 2021, 3:31:12 AM3/13/21
to dartR
Hi Stephanie,

Thank you very much for taking the time in checking the function. This problem is already solved in the latest release of dartR. By the way, in the latest release many cool functions were added.
Users that wish to update dartR, can type:

> install.packages("dartR")

It would be a good idea to restart your R session after the update.

I found that you actually reported the same problem back in May 2020, see link below :-)


Thank you,
Luis

Stephanie Todd

unread,
Mar 23, 2021, 9:18:46 PM3/23/21
to da...@googlegroups.com
Ahaha whoops, I have a terrible memory!! Apologies for the double up, and thanks for letting me know about the update ill check it out

--
You received this message because you are subscribed to a topic in the Google Groups "dartR" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/dartr/3zyXXoh3U98/unsubscribe.
To unsubscribe from this group and all its topics, send an email to dartr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dartr/95ddd721-5344-4c67-b29c-57d2cbddc9f2n%40googlegroups.com.

Gabriella Scatà

unread,
Aug 30, 2021, 1:36:23 AM8/30/21
to dartR
Hi,
I am new to SNPs filtering and to DartR...but I have a big doubt on the calculations made by the following functions: report.heterozygosity, filter.heterozygosity and filter & report.HHW.
In the "filter.hw" function description for example, in the PDF manual, it says "Input is a genlight adegenet object containing SNP genotypes (0 homozygous for reference SNP, 1
heterozygous, 2 homozygous for alternate SNP)."
However, when I look at the "metadata" info from the SNPs dataset obtained from DarT the heterozygote is coded as "2" not "1" (from DarT metadata sheet info: "description: SNP 1 Row Mapping Format: "0" = Reference allele homozygote, "1" = SNP allele homozygote, "2"= heterozygote and "-" = double null/null allele homozygote (absence of fragment with SNP in genomic representation)".

I have found a similar problem in the function filter./repor.heterozygosity: the fucntion source code calculates individuals observed heterozygosity as "c.hets[i] <- sum(m[i,]==1,na.rm=TRUE)/(nInd(x)-c.na[i])" where i from 1 to nInd is the number of individuals in the dataset...

It just seems like there is an inconsistency in the genotypes coding between the dataset rpovided by DarT and the DartR functions...
Could someone help me understand what I am missing?
Thanks a lot
Gabriella

Jose Luis Mijangos

unread,
Aug 30, 2021, 2:46:56 AM8/30/21
to dartR

Hi Gabriella,

You are right in your observation: DArT uses 0 for the reference homozygote, 1 for SNP homozygote and 2 for heterozygote. Conversely, dartR uses 0 for the reference homozygote, 1 for heterozygote and 2 for SNP homozygote.
When you read your DArT report into dartR using for example the function gl.read.dart(), the numbers are for each genotype are changed, this is so for at least two reasons:
(a) It is the coding schema used by the package Adegenet, so it is difficult to move from this while maintaining access to all the Adegenet features;
(b) It allows summing the codes to obtain the frequency of SNP's, which simplifies greatly the computational side of things.

This information is documented in the manuals and vignettes of dartR.

Please let us know whether this information solved your question.

Cheers,
Luis

Gabriella Scatà

unread,
Aug 31, 2021, 9:32:37 PM8/31/21
to dartR
Hi Luis,
thank you so much for this explanation. I realized I had read about how the data is re-coded in the genlight object but I had totally forgot about it....sorry about that!
Thanks a lot for your explanation!

Anyways, I have another question about the individual heterozygosity values reported by the 2 functions filter.heterozygosity & report.heterozygosity.
In my data, the histogram for report.heterozygosity (method=ind) gives an histogram with values from 0.08 to 0.016 on the x-axis, but if I set as thresholds 0.10 and  0.15 (within the range of the reported values from the report.heterozygosity function) for the filter.heterozygosity function then i get the error "subscript out of bound" and the function tells me min individual heterozygosity is 3.5 and max is 9, so very different values.

I looked up the source code for the 2 functions and I  noticed that:
- in report.heterozygosity: individual heetrozygosity is calculated as % of heterozygous loci for each individual over all individuals dived by the number of loci (" c.hets[i] <- sum(m[i,]==1,na.rm=TRUE)/(nLoc(x)-c.na[i]) ")
- in filter.heterozygosity: individual heterozygosity is calculated as % of heterozygous loci for each individual over all individuals BUT divided by the number of individuals (" c.hets[i] <- sum(m[i,]==1,na.rm=TRUE)/(nInd(x)-c.na[i])" )

How come there is this difference in the calculation? Shouldn't the 2 function report the same thing or am I missing something?
Thanks a lot!
Gabriella
Reply all
Reply to author
Forward
0 new messages