gl2vcf - error?

293 views
Skip to first unread message

Chiara Duijser

unread,
May 16, 2023, 1:39:03 AM5/16/23
to dartR
Hi everyone,

I want to investigate the admixture coefficients between 4 populations in my dataset and plot those in a bar graph. My genlight object has 1920 SNPs after filtering, 23 individuals and 4 populations.

I have used the functions within the dartR package for almost everything, to calculate Fst values, network analysis, etc. But to my knowledge it doesn’t have a function to do admixture analysis. Am I right? Therefore, I converted my genlight object into a geno format to use the snmf function (sparse Non-Negative Matrix Factorization; Frichot et al., 2014) within the R Bioconductor package LEA (Frichot, 2015) to estimate the number of genetic clusters within the dataset. However, the results differ quite a lot between a regularization parameter (alpha) of 100 or 1000 within the snmf function. Because there’s no specific rule of choosing alpha I want to compare the admixture coefficients with a likelihood-based method like ADMIXTURE.

Unfortunately I’m having some trouble converting my genlight object into a vcf format that I need for imput in admixture. I’m using:

gl2vcf(gl5, plink_path = getwd(), outfile = "gl_vcf", outpath=getwd())

I have downloaded the Plink.exe files and placed those in my working directory but the gl2vcf() function gives me the error Error in system(..., intern = T) : “[it states the working directory here]” not found. Despite this error I do see two newly created files: gl_plink_temp.map and gl_plink_temp.ped in my working directory file, so it looks like it can find the working directory. For admixture analysis I believe I need the *.bed, *.bim and *.fam files. I was just wondering if this might be an error in gl2vcf function of if I'm doing something wrong here?

Thanks in advance,
Chiara

Jose Luis Mijangos

unread,
May 16, 2023, 3:19:34 AM5/16/23
to dartR
Hi Chiara,

Can you please try the updated version of the function using the developing version of dartR as shown below. Also, see two other alternative options for your analysis. . 

library(dartR)
# solution 1: Run structure
out_struc <- gl.run.structure(bandicoot.gl[,1:100], k.range = 2:5, num.k.rep = 3, exec = "./structure", noadmix=FALSE)
out_evanno <- gl.evanno(out_struc)
qmat <- gl.plot.structure(out_struc)
gl.map.structure(qmat, bandicoot.gl[,1:100],K=3)

# solution 2: convert to LEA format
gl2geno(testset.gl,outpath = getwd())

# soution 3: install developing version of dartR
devtools::install_github("green-striped-gecko/dartR@dev")
library(dartR)
gl2vcf(platypus.gl,outpath  = getwd())

Cheers,
Luis 

David Tork

unread,
May 17, 2023, 1:14:42 PM5/17/23
to da...@googlegroups.com
Hi Chiara,

I recently encountered the same error -- see here: https://groups.google.com/g/dartr/c/g-f7hBPLwyw/m/HjPpam0xAwAJ

In my case, the error was caused by a space in the filepath. If you still want to use gl2vcf(), I would recommend removing spaces (and maybe special characters) and see if that helps.

David

--
You received this message because you are subscribed to the Google Groups "dartR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dartr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dartr/dbeb5956-9923-4bc0-b7ad-dcb05fba259cn%40googlegroups.com.


--
Researcher
M.S. Plant Breeding and Molecular Genetics
Dept. of Horticultural Science, University of Minnesota

Chiara Duijser

unread,
May 18, 2023, 6:00:45 AM5/18/23
to dartR
Hi Luis & David,

Thank you for the suggestions and help! I will try the alternatives and will also see if the developer version of dartR overcomes the issue. 
It's much appreciated!

Kind regards,
Chiara

Op donderdag 18 mei 2023 om 03:14:42 UTC+10 schreef tork...@umn.edu:

Chiara Duijser

unread,
Jun 13, 2023, 5:33:01 AM6/13/23
to dartR

Hi Louis,


I have used gl.run.structure to assess the number of genetic clusters present in my dataset. I was wondering why you specifically select [,1:100] in your example with bandicoot.gl. Does this mean that you only want to look at the first 100 loci or is this just for the example to save computation time? 


The reason I ask is I tried for my own dataset (say gl5) with gl5 and gl5[,1:100] and with the same k.range and num.k.rep I do get different results when I plot the structure result.


out_struc <- gl.run.structure(gl5[,1:100], k.range=2:5, num.k.rep=10, exec="./structure/structure", noadmix=FALSE)

out_evanno <- gl.evanno(out_struc, plot.out=TRUE) #check deltaK for correct estimation of number of clusters (Evanno et al., 2005)

qmat <- gl.plot.structure(out_struc, colors_clusters=c("#A6CEE3", "#1F78B4", "#33A02C"), K=3)

gl.map.structure(qmat, gl5[,1:100], K=3)


Thank you!


Kind regards,

Chiara


Op dinsdag 16 mei 2023 om 17:19:34 UTC+10 schreef luis.m...@gmail.com:

Jose Luis Mijangos

unread,
Jun 13, 2023, 3:36:40 PM6/13/23
to dartR
Hi Chiara,

Yes, I used the first 100 loci as an example; ideally, you would use your complete dataset. Also, note that the default values of the function are intended to be used for testing purposes and not for a final run. Put special attention to the parameters k.range, num.k.rep, burning, numreps and noadmix, You can find more information about these parameters in the Structure manual (attached). 

Cheers,
Luis 

structure_doc.pdf
Reply all
Reply to author
Forward
0 new messages