gl.report.heterozygosity assumes SNP data are from SilicoDArT?

275 views
Skip to first unread message

John Swenson

unread,
Jul 22, 2021, 1:47:14 PM7/22/21
to dartR
Hi all,

I am using dartr to analyze data from ~30,000 SNPs. My process involves importing a vcf file and converting it to a genlight object which I've creatively called genlight

When I try to evaluate heterozygosity with the function gl.report.heterozygosity(genlight), I get the following error:

Error in gl.report.heterozygosity(genlight) : 
    Processing  Presence/Absence (SilicoDArT) data, heterozygosity can only be calculated for SNP data

My data are definitely SNP data and I'm not sure why the function is assuming they're not? I don't see an option to specify the type of data in the function and haven't seen this question answered elsewhere in this group (sorry if I missed it). I would appreciate any insight about how I can convince the function to accept my data.

Many thanks for creating and maintaining this wonderful package.

Sincerely,
John

Arthur Georges

unread,
Jul 22, 2021, 5:18:50 PM7/22/21
to da...@googlegroups.com
Hi John,

Can you try running

genlight <- gl.compliance.check(genlight)
all(ploidy(genlight)==2)

and get back to me.

A

--
You received this message because you are subscribed to the Google Groups "dartR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dartr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dartr/41edcfbe-2fbc-49c3-ac08-b68180f8c888n%40googlegroups.com.

John Swenson

unread,
Jul 30, 2021, 10:46:11 AM7/30/21
to dartR
Hi Arthur,

Apologies for the delay. I just joined this group and apparently I had notifications turned off so didn't see your response before (notifications are on now).

Here's the output when I run those commands, as well as some information about the genlight object:

> genlight2 <- gl.compliance.check(genlight)
Starting gl.compliance.check 
  Processing Presence/Absence (SilicoDArT) data
  Checking coding of Tag P/A data
    Tag P/A data (SilicoDArT) scored 1, 0 (present or absent) confirmed
  Checking locus metrics and flags
  Recalculating locus metrics
  Checking for monomorphic loci
    No monomorphic loci detected
  Checking for individual metrics
    Individual metrics confirmed
  Checking for population assignments
  Population assignments confirmed
Spelling of coordinates checked and changed if necessary
Completed: gl.compliance.check 
> all(ploidy(genlight2)==2)
[1] TRUE

> head(genlight2)
 /// GENLIGHT OBJECT /////////

 // 6 genotypes,  21,101 binary SNPs, size: 3 Mb
 16554 (13.08 %) missing data

 // Basic content
   @gen: list of 6 SNPbin

 // Optional content
   @ind.names:  6 individual labels
   @loc.names:  21101 locus labels
   @chromosome: factor storing chromosomes of the SNPs
   @position: integer storing positions of the SNPs
   @pop: population of each individual (group size range: 6-6)
   @other: a list containing: loc.metrics  loc.metrics.flags  verbose  history 

I don't know why it confirms that the data are P/A when they're definitely not coded as 1s and 0s ... 

Thanks for your time and for any insight you're willing to provide. It would be great to get this program working for my SNP data :) 

Best,
John

Jose Luis Mijangos

unread,
Jul 31, 2021, 4:08:38 AM7/31/21
to dartR
Hi John,

I have updated some functions that might solve the problem. To use these updated functions you can install the developing version of dartR as described below. Note that for many analyses it is necessary that individuals are grouped into populations. You can add this information using a text file. I am attaching an example of this file ("strata.txt), in which rows correspond to individuals and columns to the fields with information for each individual like population, sex, etc. Then you can load this file and link it to genlight object, as described in the code below. Note that for the below code to work, this text file should be placed in your working directory.

Please let us know whether it worked for you.

Cheers,
Luis

# library required to install from github
library(devtools)
# installing developing version of dartR
install_github("green-striped-gecko/dartR@dev")

# if you are working with R-studio restart your session with menu Session -> Restart R

library(dartR)

# reading vcf
your_gl <- gl.read.vcf("your_data.vcf")
# loading information for each individual including population
strata_gl <- read.table(paste0(getwd(),"/strata.txt"))
your_gl$other$ind.metrics <- strata_gl
colnames(your_gl$other$ind.metrics) <-c("INDIVIDUALS" , "pop" ,"sex" ,  "Age")
your_gl <- gl.compliance.check(your_gl)
your_gl <- gl.recalc.metrics(your_gl)
pop(your_gl) <- your_gl$other$ind.metrics$pop

out <- gl.test.heterozygosity(your_gl, nreps=100)
strata.txt

Arthur Georges

unread,
Aug 1, 2021, 1:10:41 AM8/1/21
to dartR
Hi John,

Thank you for trying those two scripts I posted. From what I can see, the compliance check (and other scripts that check for the type of data) are finding only scores of 0 and 1, confirming the assumption that the data are presence absence data. In SNP datasets, one would expect a smattering of 0's (homozygous ref), 1's (heterozygotes) and 2's (homozygous alternate).

Then when the scripts check the ploidy of the dataset after the compliance check, they find ploidy=2, which suggests that the data are SNP data.

These conflicting signals are confusing our scripts.

I am unable to get to the root of the problem without having access to your data. Would you be willing to use saveRDS(genlight,"testset.Rdata") and post testset.Rdata for me to look at? Or send it to geo...@aerg.canberra.edu.au.

Arthur

John Swenson

unread,
Aug 3, 2021, 11:36:09 AM8/3/21
to dartR
Hi Luis and Arthur,

Many thanks for your replies. Interesting that the function is only finding 0s and 1s in the data. This makes me think there may be a formatting issue on my side of things that I've missed ... I'll look into this and see if that's the issue. If it is, then this error just saved me from drawing a bunch of erroneous conclusions! If it's not the issue, then I'll try installing the developer version and adding a population file as Luis suggested. Finally, if that doesn't work, I'll save the RDS file and send it to Arthur. Either way, I'll let you know how it goes :)

Thanks again for your responsiveness and for helping me get to the bottom of this!

Best regards,
John

Bernd.Gruber

unread,
Aug 3, 2021, 6:11:50 PM8/3/21
to da...@googlegroups.com
Hi John

If you run 

table(as.matrix(genlight))

You can see what’s in your imported data. 

Should be zero ,ones and twos in that table. 

Cheers Bernd

---------


On 4 Aug 2021, at 01:36, John Swenson <swenso...@gmail.com> wrote:

Hi Luis and Arthur,
Reply all
Reply to author
Forward
0 new messages