Hi Gabriela,
Not sure if it helps, but ADMIXTURE also reads the EIGENSTRAT format, and it is fairly easy to convert a genlight object into that format.
The write_snp function of the genio package gives an example.
I do not have ADMIXTURE installed. If you manage to use the EIGENSTRAT format, can you please let me know? We can then include a conversion function
gl2eigenstrat (gl2admixture), so that we have a conversion that works.
Here is the link to the write_snp function (it also has an example that shows the structure of the tibble):
https://rdrr.io/cran/genio/man/write_snp.html
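To give an idea of what such a conversion could look like, here is a rough base-R sketch (it does not use genio). It assumes a 0/1/2/NA matrix with individuals in rows and loci in columns, where the numbers count the alternate allele, and it assumes the usual EIGENSTRAT layout: one line per SNP in the .geno file, counts of the reference allele, and 9 for missing data. The function name write_eigenstrat_sketch, the chromosome and position values, and the "A"/"C" alleles are placeholders for illustration only, not part of dartR or genio.

```r
# Hypothetical helper (not a dartR/genio function): writes .geno/.snp/.ind files.
write_eigenstrat_sketch <- function(geno, pop, prefix, count_ref = TRUE) {
  n_ind <- nrow(geno)
  n_loc <- ncol(geno)
  ind_ids <- if (is.null(rownames(geno))) paste0("ind", seq_len(n_ind)) else rownames(geno)
  loc_ids <- if (is.null(colnames(geno))) paste0("snp", seq_len(n_loc)) else colnames(geno)

  g <- t(geno)                   # one row (= one output line) per SNP
  if (count_ref) g <- 2 - g      # assumption: input counts the alternate allele
  g[is.na(g)] <- 9               # 9 is the missing-data code
  writeLines(apply(g, 1, paste0, collapse = ""), paste0(prefix, ".geno"))

  # .snp: id, chromosome, genetic position, physical position, ref, alt
  # (chromosome, positions and alleles are placeholders here)
  snp <- data.frame(id = loc_ids, chr = 1, gen_pos = 0,
                    pos = seq_len(n_loc), ref = "A", alt = "C")
  write.table(snp, paste0(prefix, ".snp"), quote = FALSE, sep = "\t",
              row.names = FALSE, col.names = FALSE)

  # .ind: id, sex (U = unknown), population label
  ind <- data.frame(id = ind_ids, sex = "U", pop = as.character(pop))
  write.table(ind, paste0(prefix, ".ind"), quote = FALSE, sep = "\t",
              row.names = FALSE, col.names = FALSE)
  invisible(prefix)
}

# Toy data: 3 individuals x 2 loci, written to a temporary directory
toy <- matrix(c(0, 2, 1, 1, NA, 2), nrow = 3)
toy_prefix <- file.path(tempdir(), "toy")
write_eigenstrat_sketch(toy, pop = c("popA", "popA", "popB"), prefix = toy_prefix)
```

With a real genlight object, the matrix would come from as.matrix(yourgl) and the population labels from pop(yourgl).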
Cheers, Bernd
P.S. PLINK itself is a fairly complicated format, with all its family structure etc. We may look into creating a fully working PLINK output in dartR, but this might take some time.
--
You received this message because you are subscribed to the Google Groups "dartR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
dartr+un...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/dartr/e925514c-1324-4785-8f9d-d5ef1d081c8en%40googlegroups.com.
Hi Gabriela,
I guess Arthur already explained the coding issue. Basically, genlight objects are compressed into a byte format (to be efficient), and you need to use the as.matrix function to access the SNPs in the 0/1/2/NA format. For example, use
as.matrix(yourgl)[1:3,1:7]
for the first 3 individuals and 7 loci.
Then the problem with PLINK/EIGENSTRAT. I will have a look into it and aim to use the genio write_snp function, but I am really confused (and googling did not help either) about why ADMIXTURE needs the genomic position. Their tutorial recommends filtering for LD to make sure loci are as unlinked as possible, so this cannot be the reason, and the introduction claims it is similar to STRUCTURE, which certainly does not use the genomic position information either. Therefore my “feeling” is that the information in the snp file that codes for the chromosome position is not used.
I will try to create some example input files with different random values for the chromosome position, but if someone could shed some light here, that would be good.
Basically, the question is: does ADMIXTURE use the chromosomal position in its analysis?
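One way to set up that test, as a rough sketch only: keep the loci and their order fixed and replace only the position column with random values, then run the same analysis on each version and compare the results. The table layout below (id, chr, gen_pos, pos, ref, alt) and the function name randomise_positions are assumptions for illustration; positions are sorted because some tools expect increasing positions within a chromosome.

```r
# Hypothetical helper: same loci, same order, new random physical positions.
randomise_positions <- function(snp, max_pos = 1e6) {
  snp$pos <- sort(sample.int(max_pos, nrow(snp)))
  snp
}

set.seed(42)  # only to make the toy example reproducible
snp <- data.frame(id = paste0("snp", 1:5), chr = 1, gen_pos = 0,
                  pos = (1:5) * 100, ref = "A", alt = "C")

snp_a <- randomise_positions(snp)  # first randomised version
snp_b <- randomise_positions(snp)  # second randomised version
```

If the ancestry estimates are the same for both versions (given the same seed for the analysis itself), that would suggest the positions are not used.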
Finally, is there any reason why you want to use ADMIXTURE? Running STRUCTURE from within dartR is now fairly simple (if you are using Linux or Windows), though the tutorial is not done yet because some of the STRUCTURE functions rely on another package that is about to be upgraded.
A final suggestion, if speed is an issue, is to use fastStructure, which is faster; the outputs for both STRUCTURE and fastStructure can be produced via dartR.
Cheers, Bernd
<Warning messages:
1: In if (sex_code != "unknown") { :
the condition has length > 1 and only the first element will be used
2: In `[<-.factor`(`*tmp*`, sex_code == "female", value = "2") :
livello factore non valido, generato NA
3: In `[<-.factor`(`*tmp*`, sex_code == "male", value = "1") :
livello factore non valido, generato NA
4: In `[<-.factor`(`*tmp*`, sex_code == "unknown", value = "0") :
livello factore non valido, generato NA>
As the sex in my dataset is coded as "M" or "F" (and as a factor), I thought the function needs the coding as "female"/"male", as stated in the function description. So I modified the information in my genlight object to be "female" or "male", but I still obtained the same warning message. (Note: I don't know why this message is in Italian, "livello factore non valido, generato NA", but anyway it means "factor level not valid, NA generated".)
Then I thought it might be due to "sex" being coded as a factor, so I transformed it into a character vector, and now I get only the following warning message:
<Warning messages:
1: In if (sex_code != "unknown") { :
the condition has length > 1 and only the first element will be used >
So, I don't think it's fundamental for me to include the sex information, but I thought I would ask what is happening and what I am doing wrong here. What does it mean that only the first element will be used? Is the first sex value, whatever it is (M or F), used for all individuals?
2) Second, and most concerning: when I run the gl2plink function (with sex as character: "male"/"female"), the function runs to completion, but I get the following messages:
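For reference, both warnings can be reproduced outside gl2plink. The sketch below uses a made-up sex vector and is not the gl2plink code itself; it only illustrates the two R behaviours involved. Assigning a value that is not an existing factor level produces NA, and if() looks at a single value, so a condition built from a whole vector is not evaluated per individual (older R versions warn and use only the first element; from R 4.2 onwards this is an error). A vectorised recode avoids both, using the PLINK sex codes 1 = male, 2 = female, 0 = unknown.

```r
sex <- factor(c("M", "F", "M"))      # hypothetical sex column stored as a factor

# "2" is not one of the factor levels ("F", "M"), so this assignment
# produces NA together with the "invalid factor level" warning.
bad <- sex
bad[sex == "F"] <- "2"

# if (sex != "unknown") { ... } would test a length-3 condition; if() only
# handles a single TRUE/FALSE, hence the "length > 1" message.

# Vectorised alternative: convert to character, then recode element by element.
sex_chr <- as.character(sex)
plink_sex <- ifelse(sex_chr %in% c("M", "male"), "1",
             ifelse(sex_chr %in% c("F", "female"), "2", "0"))
```

Here plink_sex holds one code per individual, and anything that is neither male nor female (including NA) falls back to "0".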
<Starting gl2plink
Processing genlight object with SNP data
--Output of function start:
PLINK v1.90b6.24 64-bit (6 Jun 2021) www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to C:/Users/gabry/MAF0.01_t.het060_plink.log.
Options in effect:
--allow-extra-chr
--allow-no-sex
--file C:/Users/gabry/MAF0.01_t.het060_plink
--keep-allele-order
--out C:/Users/gabry/MAF0.01_t.het060_plink
16271 MB RAM detected; reserving 8135 MB for main workspace.
Possibly irregular .ped line. Restarting scan, assuming multichar alleles.
Rescanning .ped file... 3%
Error: Half-missing call in .ped file at variant 1, line 7. ** (see below)
----------Output of function finished...
Completed: gl2plink
Warning messages:
1: In if (sex_code != "unknown") { : the condition has length > 1 and only the first element will be used
2: In system(..., intern = T) : running command 'C:/Users/gabry/plink --file C:/Users/gabry/MAF0.01_t.het060_plink --allow-no-sex --keep-allele-order --out
C:/Users/gabry/MAF0.01_t.het060_plink
--aec' had status 3>
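PLINK reports a "half-missing call" when one of the two alleles of a genotype in the .ped file is the missing code and the other is not. A rough base-R sketch for locating such calls follows; it assumes a whitespace-separated .ped with six leading columns followed by allele pairs and "0" as the missing code, and the function name find_half_missing is hypothetical.

```r
# Hypothetical helper: report (line, variant) pairs where exactly one of the
# two alleles equals the missing code.
find_half_missing <- function(ped_lines, missing = "0") {
  hits <- list()
  for (i in seq_along(ped_lines)) {
    fields <- strsplit(trimws(ped_lines[i]), "[ \t]+")[[1]]
    alleles <- fields[-(1:6)]            # drop family, id, father, mother, sex, phenotype
    a1 <- alleles[c(TRUE, FALSE)]        # first allele of each genotype
    a2 <- alleles[c(FALSE, TRUE)]        # second allele of each genotype
    bad <- which(xor(a1 == missing, a2 == missing))
    if (length(bad) > 0) {
      hits[[length(hits) + 1]] <- data.frame(line = i, variant = bad)
    }
  }
  if (length(hits) == 0) {
    return(data.frame(line = integer(0), variant = integer(0)))
  }
  do.call(rbind, hits)
}

# Toy example: the second line has "A 0" at the first variant
toy_ped <- c("f1 i1 0 0 1 -9 A A C T",
             "f1 i2 0 0 2 -9 A 0 0 0")
half_missing <- find_half_missing(toy_ped)
```

On a real file, the lines could be read with readLines() from the .ped file that gl2plink wrote, and the reported line and variant compared with those in the PLINK error.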