SNP filtering

192 views
Skip to first unread message

Jean Rodrigue Sangaré

unread,
Oct 9, 2024, 6:44:36 PM10/9/24
to dartR
Hi,
Please how to filter out using R sofware (or another software):
 - unknown  chromosome or  SNP with unkknown positition; 
- loci for which the missing data  is more than 80%
Regards, 
Jean

Jose Luis Mijangos

unread,
Oct 10, 2024, 3:25:37 AM10/10/24
to dartR
Hi Jean,

A good place to start is by reading our tutorials, see particularly tutorial 5:


Here are some examples:

library(dartRverse)
t1 <- platypus.gl
# assign chromosome
t1$chromosome <- t1$other$loc.metrics$Chrom_Platypus_Chrom_NCBIv1
# assign Snp position
t1$position <- t1$other$loc.metrics$ChromPos_Platypus_Chrom_NCBIv1
# Table of chromosomes
# the name of unknown chromosomes (SNPs that were not mapped to a reference genome)
# appear empty in the below table ie ""
table(t1$chromosome)
# get index of loci with unknown chromosomes
unknown_chr <- which(t1$chromosome == "")
# remove loci with unknown chromosomes
t2 <- gl.drop.loc(t1,loc.list = locNames(t1)[unknown_chr])
# check the first 10 SNP positions
# loci with unknown position have a 0 position
head(t1$position,10)
# get index of loci with unknown position
unknown_pos <- which(t1$position == 0)
# remove loci with unknown position
t3 <- gl.drop.loc(t1,loc.list = locNames(t1)[unknown_pos])
# remove loci for which the missing data  is more than 80%
t4 <- gl.filter.callrate(t1,threshold = 0.8)

Cheers,
Luis 

Jean Rodrigue Sangaré

unread,
Oct 21, 2024, 7:08:54 AM10/21/24
to dartR
Dear Luis,
Thank you very much for your response . Now I am able  to do the filtering thanks to the link you gave me. 
Now I would ask for your help again:
1- my study is about more than 500 rice landrace collected from several rice growing regions in Mali. It can happen that in such a collection, several genotypes may have the same name even though they are very different. Similarly, a genotype may have different (even though it's the same genotype) names depending on the locality where it is grown. So I would like to identify such duplicatted genotypes. 
2- When I Converted the  genlight object obtained  into hapmap or vcf format for his using in Tassel for GWAS analysis. Infortunattely Tassels could not open the file.
Thank you.

Jose Luis Mijangos

unread,
Oct 24, 2024, 1:31:23 AM10/24/24
to dartR
Hi,

Please try the code below:

library(dartRverse)
t1 <- platypus.gl
# identifying duplicated individuals
# you might need to use a higher threshold of perc_geno (the mimimum
# percentage of genotypes in which two individuals should be the same)
# if you have lower levels of genetic diversity or higher missing data
dup <- gl.report.replicates(t1,
                            perc_geno = 0.95)
# removing duplicates
t2 <- gl.drop.ind(t1,ind.list = dup$ind.list.drop)

# here is the chromosome information
t1$other$loc.metrics$Chrom_Platypus_Chrom_NCBIv1
# here is the SNP position information
t1$other$loc.metrics$ChromPos_Platypus_Chrom_NCBIv1
#filtering loci with all missing data
t1 <- gl.filter.allna(t1)
# writing VCF file
# plink binary should be in the working directory
gl2vcf(t1,
       plink.bin.path = getwd(),
       outpath = getwd(),
       outfile = "test_vcf",
       snp.pos = "ChromPos_Platypus_Chrom_NCBIv1",
        snp.chr = "Chrom_Platypus_Chrom_NCBIv1"
       )

in Tassel
Menu > File > Open As > Format > VCF > (select Sort Positions) > OK > select file > Open 


Cheers,
Luis 

Jean Rodrigue Sangaré

unread,
Dec 1, 2024, 11:27:31 AM12/1/24
to dartR
Dear Luis 
thank for responses 
The problem has still not been resolved :
I work on rice genome , so I supposed to have 12 chromosome (1 to 12)
but my SNP data has 14 chromosome  image 1
when runing the code for vcf writting this error occred (image 2)
Thanks
image 2.png
image 1.png

Jean Rodrigue Sangaré

unread,
Dec 1, 2024, 5:29:34 PM12/1/24
to da...@googlegroups.com
when I tried again

--
You received this message because you are subscribed to the Google Groups "dartR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dartr+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/dartr/62ab3ce9-8835-418b-8dc2-8defdf26f2f4n%40googlegroups.com.
image 3.png

Jose Luis Mijangos

unread,
Dec 1, 2024, 7:28:33 PM12/1/24
to dartR

Hi,

The issue seems to arise because your genlight object was created using dartR instead of dartRverse.

When you open your RStudio session with a genlight object created by dartR in your workspace/environment, RStudio automatically loads dartR, which overrides (or "masks") the functions from dartRverse. This means the functions you run will be from dartR, not dartRverse.

To resolve this issue, you can choose one of the following approaches:

  1. Read your input file using dartRverse functions

    • Clear your workspace.
    • Restart R.
    • Load dartRverse and read your input file (e.g., DArT report or any other format) using dartRverse's dedicated reading functions: gl.read.csv, gl.read.dart, gl.read.fasta, gl.read.PLINK, gl.read.silicodart, or gl.read.vcf.
  2. Change the class of your "genlight" object

    • Convert the class of your genlight object to be compatible with dartRverse as shown in the example code below:
Cheers,
Luis 

library(dartRverse)
# load genlight object created using dartR
t1 <- gl.load("test.rds")
# assign the class "dartR" to genlight object
class(t1) <- "dartR"
# saving "dartR" object
gl.save(t1,"test2.rds")
# 1. Clean your workspace: Menu > Session > Clear Worspace
# 2. Restart R: Menu > Session > Restart R
# 3. Load dartRverse
library(dartRverse)
# load dartR object
t2 <- gl.load("test2.rds")
# convert to VCF
gl2vcf(t2,

       plink.bin.path = getwd(),
       outpath = getwd(),
       outfile = "test_vcf",
       snp.pos = "ChromPosTag_Rice_RGAP_v7",
       snp.chr = "Chrom_Rice_RGAP_v7")
# read VCF
t3 <- gl.read.vcf("test_vcf.vcf")

Jean Rodrigue Sangaré

unread,
Dec 5, 2024, 6:52:50 PM12/5/24
to da...@googlegroups.com
Dear Luis 
thant you for you help . Now I converted the genlight object to a vcf file. but When try to load in tassels 5 following error message occurs  : java.lang.ArrayIndexOutofBoundsException
Regards

Tassel error message.png

Jose Luis Mijangos

unread,
Dec 9, 2024, 4:51:33 PM12/9/24
to dartR
Hi,

I didn't have any issues reading the vcf file into Tassel. See code below

Cheers,
Luis 

library(dartRverse)
t1 <- gl.load("finalimp.Rdata")
# convert to VCF

gl2vcf(t1,
       plink.bin.path = getwd(),
       outpath = getwd(),
       outfile = "test_vcf",
       snp.pos = "ChromPosTag_Rice_RGAP_v7",
       snp.chr = "Chrom_Rice_RGAP_v7")
# Open in Tassel 5 Version 20240820 (20240820)
# Tassel > Menu > File > Open as > Format > VCF > select "Sort Positions" > OK > Select "test_vcf.vcf" > Open

Reply all
Reply to author
Forward
0 new messages