genlight (dartR) to Plink for Admixture (.bed, .fam and .bim)


Gabriella Scatà

Oct 5, 2021, 9:20:43 PM
to dartR
Hi all,
I am trying to use a SNP dataset filtered with dartR (in genlight format) for population cluster analysis with the Admixture software.
Admixture takes a PLINK (.bed) file as input, but when I try the "gl2plink" function I obtain only a data frame, which does not seem to be a PLINK class object or a .bed file. I also seem to lose all the individual metadata, including population of origin, and I get odd numbers in the marker data, such as -9 along with the expected 0, 1 and 2.

In addition, Admixture needs the main input file in PLINK .bed format plus two associated files, the .bim (binary marker information file) and the .fam (pedigree stub file), and I haven't yet found a way to obtain those extra files in dartR or with any other R package (e.g. radiator) or software.
I need a way to convert the genlight object into the 3 file formats required by Admixture (.bed, .fam, .bim).
Any idea?
Thanks a lot as usual for all your help!
Cheers
Gabriella

Bernd.Gruber

Oct 5, 2021, 10:26:32 PM
to da...@googlegroups.com

Hi Gabriella,

Not sure if it helps, but Admixture also reads the Eigenstrat format, and it is fairly easy to convert a genlight object into that format. The write_snp function of the genio package is an example.

I do not have Admixture installed. If you manage to use the Eigenstrat format, can you please let me know? Then we can include a conversion function, gl2eigenstrat (gl2admixture), so we have a conversion that works.

Here is the link to the write_snp function (it also has an example that shows the structure of the tibble):

https://rdrr.io/cran/genio/man/write_snp.html

Cheers, Bernd

P.S. PLINK itself is a fairly complicated format, with all its family structure etc. We may look into creating a fully working PLINK output in dartR, but this might take some time.

--
You received this message because you are subscribed to the Google Groups "dartR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dartr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dartr/e925514c-1324-4785-8f9d-d5ef1d081c8en%40googlegroups.com.

Gabriella Scatà

Oct 7, 2021, 8:56:17 PM
to dartR
Hi Bernd,
I have looked at the write_snp function you suggested, but it does not seem to convert genlight into an Eigenstrat format.
I also contacted Alejandro Ochoa, who developed the "genio" package, and he confirmed that "write_snp" cannot accept a genlight object as input.

If I manage to convert genlight into Eigenstrat (.geno) format I can try to use that format file in Admixture and let you know how it goes. However, I read in the Admixture manual that if you have a .geno file, you need to also have an accompanying .map file (a variant information file accompanying the main file,  usually a .ped file - https://www.cog-genomics.org/plink/1.9/formats#map).

So overall, I am still really confused about how to obtain the proper format files for Admixture. I do not have chromosome location information for the loci, only SNP genotype information for each individual and the individual metadata (population of origin and sex).

I also have a major question on the format of the genlight object: 
In the dartR Tutorial, it states that "A genlight object can be considered to be a matrix containing the SNP data encoded in a particular way. The matrix entities (rows) are the individuals, and the attributes (columns) are the SNP loci. In the body of this individual x locus matrix are the SNP data, coded as 0 for homozygous reference state, 1 for heterozygous, and 2 for homozygous alternate (or SNP) state."

However, when I look at the structure of the genlight object, I see this:

> str(mySNPs)
Formal class 'genlight' [package "adegenet"] with 12 slots
  ..@ gen       :List of 188
  .. ..$ :Formal class 'SNPbin' [package "adegenet"] with 5 slots
  .. .. .. ..@ snp    :List of 2
  .. .. .. .. ..$ : raw [1:4838] 00 08 00 00 ...
  .. .. .. .. ..$ : raw [1:4838] 4c 09 42 c2 ...
  .. .. .. ..@ n.loc  : int 38693
  .. .. .. ..@ NA.posi: int [1:3120] 49 93 143 176 213 226 290 319 363 392 ...
  .. .. .. ..@ label  : NULL
  .. .. .. ..@ ploidy : int 2

So, for individual [[1]], which I assume is the first S4 object of class SNPbin, there are 2 lists with the SNP values. But in these lists the SNP values are not just 0, 1 or 2: there are other numbers such as 4 and 8, and even letters!

> mySNPs@gen[[1]]@snp
[[1]]
   [1] 00 08 00 00 00 00 00 00 82 10 86 01 00 00 20 00 00 00 00 80 00 05 28 02 00 01 02 00 00 00 00 00 08 20 00 0e 00 18
  [39] 00 04 00 90 40 30 21 00 40 80 20 00 81 00 00 01 00 40 12 00 08 00 01 00 00 00 00 04 60 01 00 8c 00 20 42 00 40 00
etc...

[[2]]
   [1] 4c 09 42 c2 04 02 08 02 f3 11 86 01 20 08 30 01 08 00 00 85 80 4d 2a 02 00 29 02 a0 00 01 00 44 08 27 e0 0e 48 58
  [39] 01 06 20 90 54 31 63 00 40 90 60 10 e1 00 14 81 02 41 16 00 08 01 01 05 01 04 02 07 64 21 14 bc 02 28 62 80 50 88
  etc...


What does this mean? Is it in these lists that the genotype for each individual for each locus is coded? And how is it coded if not in the form 0,1,2 ?
I am really confused at this point so I would really appreciate some clarifications.
Thanks a lot!
Gabriella

Arthur Georges

Oct 7, 2021, 9:35:06 PM
to dartR
In the dartR Tutorial, it states that "A genlight object can be considered to be a matrix containing the SNP data encoded in a particular way. The matrix entities (rows) are the individuals, and the attributes (columns) are the SNP loci. In the body of this individual x locus matrix are the SNP data, coded as 0 for homozygous reference state, 1 for heterozygous, and 2 for homozygous alternate (or SNP) state."

Just on that point Gabriella. The genlight object as described above looks like that if you use the adegenet accessors. It is not actually like that, but rather a very compact binary representation. If you poke into the genlight object without using the adegenet accessors, you will see the binary representation, which I expect would be very confusing.

Basically, if you jump outside of dartR and adegenet, the results can be unpredictable. Have a look at as.matrix(gl)[1:5,1:10] to see how the SNP coding works.
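To make the compact representation less mysterious, here is a minimal base-R sketch of the general idea: 0/1/2 genotypes can be stored as two layers of bits and recovered by adding the layers. It is an illustration only, not adegenet's actual internal code; the variable names and the bit order are assumptions.

```r
# Toy genotypes for one individual at 8 loci
# (number of copies of the alternate allele: 0, 1 or 2).
geno <- c(0L, 1L, 2L, 0L, 1L, 1L, 2L, 0L)

# One bit layer per allele copy: a locus is set in layer i if geno >= i.
layer1 <- geno >= 1L
layer2 <- geno >= 2L

# Pack each layer of 8 bits into a single raw byte. Raw bytes print as
# hexadecimal, which is why digits above 2 and letters can show up.
packed <- list(packBits(layer1, type = "raw"), packBits(layer2, type = "raw"))
print(packed)

# Decode: expand every byte back into bits and add the two layers.
unpack_bits <- function(r) as.integer(rawToBits(r))
decoded <- unpack_bits(packed[[1]]) + unpack_bits(packed[[2]])
print(decoded)
```

In practice there is no need to decode anything by hand: as.matrix(gl) returns the genotypes in the 0/1/2/NA coding.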

Leave it to Bernd to address the other issues.

Arthur

Bernd.Gruber

Oct 7, 2021, 10:51:06 PM
to da...@googlegroups.com

Hi Gabriella,

I guess Arthur already explained the coding issue. Basically, genlight objects are compressed into a byte format (to be efficient), and you need to use the as.matrix function to access the SNPs in the 0/1/2/NA format:

as.matrix(yourgl)[1:3,1:7]

for the first 3 individuals and 7 loci.

Then the problem with PLINK/Eigenstrat. I will have a look into it and aim to use the genio write_snp function, but I am really confused (and googling did not turn up an answer either) about why Admixture needs the genomic position. Their tutorial recommends filtering for LD to make sure loci are as unlinked as possible, so this cannot be the reason, and the introduction claims it is similar to STRUCTURE, which certainly does not use genomic position information either. Therefore my "feeling" is that the information in the snp file that codes for the chromosome position is not used.

I will try to create some example input files and will use different random values for the chromosome position, but if someone could shed some light here that would be good.

Basically the question is: does Admixture use the chromosomal position in the simulation?

Finally, is there any reason why you want to use Admixture? Running STRUCTURE is now fairly simple from within dartR if you are using Linux or Windows (though the tutorial is not done yet, because some of the STRUCTURE functions rely on another package which is about to be upgraded).

A final suggestion, if speed is an issue, is to use faststructure, which is faster, and both outputs for structure can be produced via dartR.

Cheers, Bernd

Peter Kriesner

Oct 11, 2021, 4:14:00 AM
to da...@googlegroups.com
Hi folks,

Sorry as I believe this has come up before in some form, but I'm still trying to reconcile the difference between report and filter functions in dartR, now for both HWE and heterozygosity.

I have SNP data for one single population, 48 individuals with effectively no pop'n substructure.

As an example of what seems confusing:

bd_WA <- gl.keep.pop(bandicoot.gl, pop.list = "WA",
            recalc = TRUE, mono.rm = TRUE, verbose = 3)
# 29 individuals in one population
gl.report.hwe(bd_WA, subset = "each", plot = TRUE,
            method = "ChiSquare", alpha = 0.05, bonf = TRUE, verbose = 3)
# reports two SNPs with significant departure from HWE at 0.01 < p < 0.05
#  (after Bonferroni Correction - presumably the red points showing on the ternary plot that's produced)

But:

x1 <- gl.filter.hwe(bd_WA, alpha = 0.05, basis = "any",
            bon = TRUE, verbose = 3)
# reports 0 loci with significant departure from HWE

Also:

gl.report.heterozygosity(bd_WA, method = "ind", plot = TRUE,
          boxplot = "adjusted", verbose = 3)
# reports one individual with Ho < 0.2, and one with Ho > 0.5
# to filter these two individuals out, trying:
bd_WA_filter1 <- gl.filter.heterozygosity(bd_WA, t.upper = 0.5,
          t.lower = 0.2, verbose = 3)
# but it errors, with the following output...
# Minimum individual heterozygosity 4.1429
# Maximum individual heterozygosity 17.4207
# Retaining individuals with heterozygosity in the range 0.2 to 0.5
# Error in x@gen[[1]] : subscript out of bounds

I'm using dartR 1.9.9.1
Do I need the latest development version for this?
(also R 4.1.1, but a very old version of RStudio, as I'm suddenly unable to start newer versions of RStudio on my current version of Windows 10)

Thanks!
Peter


Jose Luis Mijangos

Oct 12, 2021, 2:16:44 AM
to dartR
Hi Peter,

These issues have been solved in the beta version of dartR, which you can install as shown below.

# devtools is required to install from GitHub
library(devtools)
# install the beta version of dartR
install_github("green-striped-gecko/dartR@beta")
# if you are working in RStudio, restart your session via the menu Session -> Restart R
library(dartR)

Cheers,
Luis

Peter Kriesner

Oct 12, 2021, 2:33:23 AM
to da...@googlegroups.com
Great. Thanks Luis.

I thought I'd read that somewhere but couldn't remember where.

Cheers,
Peter


Jose Luis Mijangos

Oct 13, 2021, 2:36:16 AM
to dartR
Hi Gabriella,

We have been working on 4 new/updated conversion functions that might be of help:

- gl2gds converts a genlight object into gds format (package SNPRelate).
- gl2plink converts a genlight object into PLINK format: bed, bim, fam, ped and map.
- gl2vcf converts a genlight object into vcf format.
- gl2eigenstrat converts a genlight object into eigenstrat format.

Cheers,
Luis

Jose Luis Mijangos

Oct 13, 2021, 2:40:42 AM
to dartR
I forgot to say that to use these new functions you need to install the development version of dartR with the following command:

dartR::gl.install.vanilla.dartR(flavour="dev")

Cheers,
Luis

Gabriella Scatà

Oct 14, 2021, 9:17:41 AM
to dartR
Thank you Arthur, that was really helpful!
Cheers,
Gabriella

Gabriella Scatà

Oct 14, 2021, 9:19:49 AM
to dartR
Hi Luis,
thank you so much for this!
It is such a relief I do not have to figure out how to convert the dataset manually myself.
I will use dartR @ dev version and try the gl2plink function. Will let you know if I have any problem.
Thanks again!
Best,
Gabriella

Gabriella Scatà

Oct 18, 2021, 7:02:24 PM
to dartR
Hi Bernd,
thank you so much as well for your explanation.
I am aiming to use Admixture because it is faster with large datasets (over 15000 SNPs). I can also try faststructure if Admixture proves difficult, as I have also had trouble understanding how it works: the manual is not very informative, at least for me, and the developer hasn't yet replied to my questions. I will let you know if he does.

However, now that you have a genlight-to-PLINK function (gl2plink), I'll just try Admixture first and see how it goes.
Thanks again!
Gabriella

Gabriella Scatà

Oct 20, 2021, 3:27:47 AM
to dartR
Hi everyone,
and sorry for the additional post...and I already apologize if it's going to be another long post...

I have reinstalled the developer version of dartR and loaded it. I also downloaded Plink from their website as indicated by the gl2plink function.
I then tried to run the gl2plink function, which should convert a genlight object into PLINK format: bed, bim, fam, ped and map.

So...
1) First of all, when I try to run the function with the individuals' sex information, I get the following warning message:

<Warning messages:
1: In if (sex_code != "unknown") { :
the condition has length > 1 and only the first element will be used
2: In `[<-.factor`(`*tmp*`, sex_code == "female", value = "2") :
livello factore non valido, generato NA
3: In `[<-.factor`(`*tmp*`, sex_code == "male", value = "1") :
livello factore non valido, generato NA
4: In `[<-.factor`(`*tmp*`, sex_code == "unknown", value = "0") :
livello factore non valido, generato NA
>

As the sex in my dataset is coded as "M" or "F" (and as a factor), I thought the function needed the coding "female"/"male", as stated in the function description. So I changed the information in my genlight object to "female" or "male", but I still obtained the same warning messages. (Note: I don't know why this message is in Italian, "livello factore non valido, generato NA", but it means "invalid factor level, NA generated".)

Then I thought it might be due to "sex" being coded as a factor, so I converted it to character, and now I get only the following warning message:

<Warning messages:
1: In if (sex_code != "unknown") { :
the condition has length > 1 and only the first element will be used
 
>

So, I don't think it's fundamental for me to include the sex information, but I thought I would ask what is happening and what I am doing wrong here. What does it mean that only the first element will be used? (the first sex index - whatever it is, either M/F - used for all individuals?)
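Both warnings can be reproduced in base R without dartR. The sketch below uses a made-up sex vector, so it only illustrates the general R behaviour and not what gl2plink does internally. A comparison on a vector returns one value per individual, so if() would only look at the first one, and writing a new value into a factor gives NA unless the vector is converted to character first. Note that from R 4.2.0 onwards a condition of length greater than one in if() is an error rather than a warning.

```r
sex <- factor(c("M", "F", "F"))   # hypothetical metadata column

# The comparison is vectorised: one TRUE/FALSE per individual.
cond <- sex != "unknown"
print(cond)

# if() needs a single value, so collapse the vector explicitly.
if (any(cond)) message("at least one individual has a known sex")

# "2" is not an existing level of the factor, so the assignment yields NA
# (with the "invalid factor level" warning, silenced here).
recoded_factor <- sex
suppressWarnings(recoded_factor[sex == "F"] <- "2")
print(recoded_factor)

# Converting to character first makes the same recoding work.
recoded_char <- as.character(sex)
recoded_char[sex == "F"] <- "2"
recoded_char[sex == "M"] <- "1"
print(recoded_char)
```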



2) Second and most concerning, when I run the gl2plink function (with sex=character: "male"/"female"), the function proceeds until completion, but I get the following messages:

<Starting gl2plink
Processing genlight object with SNP data
--Output of function start:
PLINK v1.90b6.24 64-bit (6 Jun 2021) www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang GNU General Public License v3

Logging to C:/Users/gabry/MAF0.01_t.het060_plink.log.
Options in effect:
--allow-extra-chr
--allow-no-sex
--file C:/Users/gabry/MAF0.01_t.het060_plink
--keep-allele-order
--out C:/Users/gabry/MAF0.01_t.het060_plink

16271 MB RAM detected; reserving 8135 MB for main workspace.
Possibly irregular .ped line. Restarting scan, assuming multichar alleles.
Rescanning .ped file... 3%
Error: Half-missing call in .ped file at variant 1, line 7. ** (see below)

----------Output of function finished...
Completed: gl2plink
Warning messages:
1: In if (sex_code != "unknown") { : the condition has length > 1 and only the first element will be used
2: In system(..., intern = T) : running command 'C:/Users/gabry/plink --file C:/Users/gabry/MAF0.01_t.het060_plink  --allow-no-sex --keep-allele-order --out 
C:/Users/gabry/MAF0.01_t.het060_plink --aec' had status 3>


So my questions now are:
1) What would cause an irregular .ped line and what does it mean?
I checked my genlight object with as.matrix(gl) , and I could not find any missing value at the 1st locus (first column) and 7th individual (7th line): its value is 0.

Is this possibly causing a problem in the generation of my .bed and .bim, .fam and .map files?

In my original genlight dataset, I have 187 individuals from 10 different populations.

When I check my .fam file, it looks wrong: instead of listing all individuals from all populations, it lists data for only the first 7 individuals, and each has the same 4 numbers in the following columns: 0 0 2 0. In addition, the name of one population, "Harrvey Bay" (the population of the 7th individual, the only one from that population among the 7 listed), consists of 2 words and has been split across 2 columns, so that its second part ("Bay") falls in the individual-name column and everything else is shifted for this population.

i.e. 
Lriver INDI017 0         0 NA 0
Harrvey Bay         INDI026 0 0 NA

A second population has a very long one-word name, and the data for the individual for this population (2nd individual in the only 7 listed) also has been similarly shifted so that individual name is in column 3 instead of 2 (as example above).

As far as I read from this link, "https://www.cog-genomics.org/plink/1.9/formats#fam", the .fam file needs to have 6 fields: family ID, within-family ID (which in my case are the population and individual ID), father ID, mother ID, sex code and phenotype value. In the 3rd and 4th columns I have all 0 because I have no pedigree information, but I also have all 0 in the phenotype column, and I am not sure whether that is correct. Plus, in the sex column I have either all 0 or 2 if I do not modify the sex coding, or all NA if I change the sex coding to "male"/"female".
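As a point of comparison, here is a minimal base-R sketch of a well-formed six-column .fam table. The populations, individual names and sex codes are made up, and replacing spaces with underscores is one possible way to keep every line at exactly six space-separated fields; none of this is the gl2plink implementation.

```r
# Hypothetical metadata for three individuals; one population name has a space.
pop <- c("Lriver", "Harrvey Bay", "Lriver")
ind <- c("INDI017", "INDI026", "INDI031")

fam <- data.frame(
  fid    = gsub(" ", "_", pop),  # family ID: population, spaces replaced
  iid    = gsub(" ", "_", ind),  # within-family ID: individual
  father = 0,                    # 0 = father unknown
  mother = 0,                    # 0 = mother unknown
  sex    = c(2, 1, 0),           # 1 = male, 2 = female, 0 = unknown
  pheno  = -9                    # -9 = missing phenotype
)

# Write without quotes or headers, one individual per line.
fam_file <- tempfile(fileext = ".fam")
write.table(fam, fam_file, quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read back and count the space-separated fields on each line.
fam_lines <- readLines(fam_file)
print(fam_lines)
n_fields <- lengths(strsplit(fam_lines, " ", fixed = TRUE))
print(n_fields)
```

The PLINK 1.9 format page lists both 0 and -9 as missing values for a case/control phenotype, so a phenotype column of zeros is not by itself an error.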

The .MAP file seems OK: I have the same number of loci as in my original dataset and their name is listed in the 2nd column while all other 3 columns (1st, 3rd and 4th columns) are 0 because I do not have information on either which chromosome each locus belongs to or position along the chromosome.

The .ped file also seems OK: it seems to contain the correct number of columns (I cannot count all the variant columns, but I assume it is correct because there are a lot of them). The 1st column is the population name, the 2nd the individual name, and the 3rd to 6th are 0, 0, 2, 0, apparently for all individuals, as in the .fam file. (It actually seems that the .fam file includes only the first 7 lines, so 7 individuals, of this .ped file.)
Also at this link, "https://www.cog-genomics.org/plink/1.9/formats#ped", they explain that from the 7th column onwards the .ped format should have 2 columns per variant/locus giving the call for that individual at that locus. I have the base at each locus (G/T/A/C), but also some zeros (0), which I assume is correct. Are the 0s the missing values?
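The two-columns-per-locus layout can be sketched in base R as follows. The function names and the ref/alt vectors are hypothetical, and the sketch assumes the usual .ped convention that "0" marks a missing allele. Under that convention a "half-missing" call is one where exactly one of the two alleles is "0", which is also what happens when a shifted column puts a "0" next to a real base.

```r
# Turn 0/1/2/NA genotype codes into the two allele columns of a .ped line.
# ref and alt are the (hypothetical) reference and alternate bases per locus.
geno_to_ped <- function(g, ref, alt) {
  a1 <- ifelse(is.na(g), "0", ifelse(g == 2, alt, ref))
  a2 <- ifelse(is.na(g), "0", ifelse(g == 0, ref, alt))
  paste(a1, a2)
}

g   <- c(0, 1, 2, NA)
ref <- c("A", "A", "G", "C")
alt <- c("T", "T", "C", "G")
calls <- geno_to_ped(g, ref, alt)
print(calls)

# A call is "half-missing" when exactly one of its two alleles is "0".
is_half_missing <- function(call) {
  alleles <- strsplit(call, " ", fixed = TRUE)
  vapply(alleles, function(a) sum(a == "0") == 1, logical(1))
}
half <- is_half_missing(c(calls, "A 0"))
print(half)
```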

I thought that the error at variant 1 for individual 7 (see above**) is due to the population name having a space within it and so shifting the values of 1 column each for all individuals of this population (so first column = "Harrvey", 2nd column = "Bay" instead of the individual name, 3rd column = "individual_7", 4th column = 0, and so on...everything shifted). So I tried to rename this population and link the name together such as "Harrvey_bay". But I still got the same error :

Possibly irregular .ped line. Restarting scan, assuming multichar alleles.
Rescanning .ped file... 3%
Error: Half-missing call in .ped file at variant 1, line 7. 


And the .FAM file is still stopping at the 7th individual.
I tried also to shorten the name of the population of the 2nd individual (the other one that had been shifted in the .FAM file), but I still obtained the exact same error.

What is this error due to and how to fix it?

In addition, I also seem to miss the .BIM file...I only have the plink.LOG file which only lists the parameters and the process the function went through.

In the Admixture manual they say you can either have a .ped or .bed file. If you have a .ped file you only need an accompanying .map file as far as I understood (while for .bed you need associated with it a .bim and a .fam).
So I guess if I obtain a correct .ped and .map it doesn't matter if my .fam is wrong? But my concern is that also the .ped is wrong or has some errors because it is explicitly mentioned in the function progress report.

I would really appreciate some assistance once again on this issue!
Thanks a lot!
Gabriella

PS. Bernd mentioned that it is possible to run STRUCTURE directly in dartR...is that already available and fully functional in the non-developer dartR version? 


Jose Luis Mijangos

Oct 26, 2021, 10:20:38 PM
to dartR
Hi Gabriella,

Thank you for reporting those bugs. I have fixed both of them.
1.- Now the sex information needs just to start with an "F" or "f" for females, with an "M" or "m" for males and with a "U", "u" or being empty if the sex is unknown.
2.- The second problem occurred if populations have a name with a space. Now, if names of populations or individuals contain spaces, they are replaced by an underscore "_".
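The sex rule described in point 1 can be sketched in a few lines of base R. The function name sex_to_plink is hypothetical, and this is only an illustration of the rule, not the code used inside gl2plink; the numeric codes follow the PLINK convention (1 = male, 2 = female, 0 = unknown).

```r
# Map free-text sex labels to PLINK sex codes using only the first letter.
sex_to_plink <- function(sex) {
  sex <- as.character(sex)               # also handles factors
  first <- toupper(substr(sex, 1, 1))    # "M", "F", "U", "" or NA
  code <- rep(0L, length(sex))           # default: unknown
  code[first %in% "M"] <- 1L
  code[first %in% "F"] <- 2L
  code
}

# Mixed labels, including an empty string and a missing value.
labels <- factor(c("M", "female", "", "u", "F", NA))
plink_sex <- sex_to_plink(labels)
print(plink_sex)
```

Using %in% rather than == keeps missing values from propagating, so an NA label simply stays coded as unknown.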

To use this updated version, install the development version of dartR as follows:

dartR::gl.install.vanilla.dartR(flavour="dev")

# then you can use:

gl2plink(
  x = test_Gabry,
  plink_path = getwd(),
  bed_file = TRUE,
  outfile = "test_Gabry_plink",
  outpath = getwd(),
  sex_code = test_Gabry$other$ind.metrics$sex
)

Cheers,
Luis

Ana Filipa Sobral

Mar 8, 2022, 11:37:54 AM
to dartR
Hi Luis,
I am trying to use the function gl2plink, but am encountering some problems.

Session info:
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] dartR_1.9.9.1  ggplot2_3.3.5  adegenet_2.1.5 ade4_1.7-18  

If I run what you suggested above, I get an error, please see below:

gl2plink(topegl6, plink_path = getwd(), bed_file = TRUE, outfile = "topegl6_plink",outpath = getwd())

Error in gl2plink(topegl6, plink_path = getwd(), bed_file = TRUE, outfile = "topegl6_plink",  :
  unused arguments (plink_path = getwd(), bed_file = TRUE)


If I run the function as suggested in the respective help file, the output is a single .csv file, please see below:

gl2plink(topegl6, outfile = "plink.csv", outpath = getwd(), verbose = 5)

Starting gl2plink [ Build = Jacob ]
  Processing a SNP dataset
  Writing data to output file
Completed: gl2plink
NULL

Would be great to get your help to understand if I might be doing something wrong or if this could be an error.

Thank you!

Ana

Jose Luis Mijangos

Mar 8, 2022, 6:18:58 PM
to dartR
Hi Ana,

I think the problem is with the installation of the development version of dartR. Can you please try the following:

library(dartR)
gl.install.vanilla.dartR(flavour = "dev")
# restart your R session: Menu > Session > Restart R
library(dartR)
# download the plink binary for your system from: https://www.cog-genomics.org/plink/
# unzip the file, access the unzipped folder and move the binary file ("plink")
# to your working directory
# in Mac you might need to open the binary first to grant access for the binary
test <- platypus.gl
# assigning SNP position
test$position <- test$other$loc.metrics$ChromPos_Platypus_Chrom_NCBIv1
# assigning a dummy name for chromosomes
test$chromosome <- as.factor("1")
# run the function
gl2plink(test, plink_path = getwd(), bed_file = TRUE, outfile = "topegl6_plink",outpath = getwd())

If you would like to use chromosome information when converting to PLINK format and your chromosomes are not human, you need to rename the chromosomes as 'contig1', 'contig2', etc., as described in the section "Nonstandard chromosome IDs" in the following link:

Cheers,
Luis

Ana Filipa Sobral

Mar 9, 2022, 9:48:36 AM
to dartR
Hi Luis,
Thank you for your prompt reply.

It works for the platypus data.

However, in my case I do not have a reference genome, so I tried the following:

# assigning SNP position
topegl9$position <- topegl9$other$loc.metrics$SnpPosition

#assigning unique ID
topegl9$chromosome <- as.factor(topegl9$other$loc.metrics$CloneID)

In my output file, however, the chromosome column does not contain the unique CloneID numbers. Instead it contains the values of
as.numeric(topegl9$chromosome), which are completely different numbers.

When trying the following function: 
topegl9$chromosome <- as.character(topegl9$other$loc.metrics$CloneID)

I got the following error, the same if I tried to add "contig_" before each CloneID:
Error in checkSlotAssignment(object, name, value) :
  assignment of an object of class “character” is not valid for slot ‘chromosome’ in an object of class “genlight”; is(value, "factorOrNULL") is not TRUE

As for assigning a dummy name for chromosomes, won't that affect further analyses with Plink?

I hope I was able to make this clear enough...

Thanks for your help!

Ana

Jose Luis Mijangos

Mar 10, 2022, 10:24:10 PM
to dartR
Hi Ana,

The class of the chromosome slot is "factor":
> class(topegl9$chromosome)
[1] "factor"

If you want to convert from factor to numeric you can use:
> as.numeric(as.character(topegl9$chromosome))
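The difference between the two conversions is easy to see on a toy vector in base R. The CloneID values below are made up. The last line also shows that a prefixed label built with paste0() is a character vector and has to be wrapped in factor() before it can go into a slot that only accepts factors.

```r
clone_id <- c(100045L, 23871L, 100045L)   # hypothetical CloneID values
chrom <- as.factor(clone_id)

# as.numeric() on a factor returns the internal level codes, not the labels.
codes <- as.numeric(chrom)
print(codes)

# Going through character first recovers the original numbers.
values <- as.numeric(as.character(chrom))
print(values)

# paste0() returns character; factor() turns the labels into a factor.
contigs <- factor(paste0("contig", clone_id))
print(contigs)
```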

Note that for this type of issue, where users want to explore approaches outside of dartR, it is the responsibility of users to learn how to use R. There are many online resources where you can find support:

- As for assigning a dummy name for chromosomes, won't that affect further analyses with Plink?

I recommend that you carefully read the documentation of the PLINK function that you want to use, to understand whether using a dummy chromosome name would affect your analysis.

Cheers,
Luis

Zoriana Lam

Sep 17, 2023, 10:01:15 PM
to dartR
Hi,

I tried converting my genlight object into PLINK format using the gl2plink function:
gl2plink(X,bed_file=TRUE, plink_path="/home/lam/.conda/envs/dartRYuma/bin/", outfile = "X_plink", outpath="/scratch/user/lam/X/", chr_format="numeric")

It comes up with an error when I try to create the .bed file: 
unused arguments (bed_file = TRUE, plink_path = "/home/lam/.conda/envs/dartRYuma/bin/", chr_format = "numeric")

I believe there is something wrong with creating the .bed file, but I already provided the PLINK executable in the plink path given in the command.

Jose Luis Mijangos

Sep 17, 2023, 10:12:21 PM
to dartR
Hi,

You might have installed our new package dartR.base on your computer; in that package the correct parameter name is "bed.file".

You can install dartR using:

install.packages("dartR")

or the development version using:

library(devtools)
install_github("green-striped-gecko/dartR@dev")
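An "unused arguments" error means the installed function does not have a parameter with that name. A quick way to check which spelling an installed function expects is to inspect its formal arguments. The helper has_arg below is hypothetical and is demonstrated on a base function so that the snippet runs without dartR.

```r
# TRUE if function `f` has a formal argument called `arg`.
has_arg <- function(f, arg) arg %in% names(formals(f))

# Demonstrated on write.table, which has `quote` but no `bed_file`.
found   <- has_arg(write.table, "quote")
missing_arg <- has_arg(write.table, "bed_file")
print(c(found, missing_arg))
```

With a dartR package loaded, has_arg(gl2plink, "bed_file") and has_arg(gl2plink, "bed.file") would show which spelling applies, and environmentName(environment(gl2plink)) would show which package the function comes from.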

Cheers,
Luis 

David Encalada-Bustamante

Jun 3, 2024, 9:03:32 AM
to dartR

Hi Luis, in this case, which tag should be used: ChromPosTag or ChromPosSnp? I have 2 reference genomes.
Thanks,
David.

Jose Luis Mijangos

Jun 12, 2024, 1:43:43 AM
to dartR
Hi David,

Based on the dataset you sent me, ChromPosTag is the position of the first nucleotide of the trimmed sequence that was mapped to the reference genome and ChromPosSnp is the position of the SNP. So, please use ChromPosSnp.

Cheers,
Luis 
