Genetic distance matrix for individuals using genlight object


Oumar Boro

Sep 1, 2023, 1:12:00 AM
to dartR
Hi Team,
I hope you are all well.

I'm trying to generate genetic distances (as a graph or a matrix) between individuals using a genlight object, but I'm having some challenges.
The function for this is gl.dist.ind(), and when I run it, the following message appears in the console: "Processing genlight object with SNP data Matrix converted.. Prepare genind object...Completed: gl2gi"
But when I do the same for population relatedness using gl.dist.pop(), I can clearly view the matrix of subpopulations.
I should mention that I have quite a large dataset, made up of a thousand genotypes with about 3000 loci, so I'm wondering whether this is due to the size of the data.
I went through https://rdrr.io/cran/dartR/man/gl.dist.ind.html and many other pages, but the result is still the same.

Any suggestion will be enormously appreciated.

Best regards,
Oumar

Bernd.Gruber

Sep 1, 2023, 1:28:10 AM
to da...@googlegroups.com

What kind of distance are you after? For a simple Euclidean distance, you can try:

ED <- dist(as.matrix(gl) - 1)  # you want to centre around zero

# check via
gl.plot.heatmap(ED)
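
For anyone who wants to try the dist() approach without a genlight object at hand, here is a minimal base R sketch. The genotype matrix and the individual and locus names are made up for illustration; with real data the matrix would come from as.matrix(gl). It also shows that subtracting 1 shifts every genotype equally, so the Euclidean distances themselves should not change.

```r
# Toy genotype matrix: 4 individuals x 5 loci, coded 0/1/2 (made-up data)
geno <- matrix(c(0, 1, 2, 0, 1,
                 0, 1, 2, 1, 1,
                 2, 0, 0, 2, 1,
                 2, 0, 1, 2, 0),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("ind", 1:4), paste0("loc", 1:5)))

# Euclidean distance between rows (individuals); returns a "dist" object
ED <- dist(geno - 1)

# Convert to a full square matrix to inspect or export it
ED_mat <- as.matrix(ED)
print(round(ED_mat, 2))

# Compare with the distances computed on the uncentred 0/1/2 coding
same <- isTRUE(all.equal(as.vector(ED), as.vector(dist(geno))))
same
```

Note that dist() returns a compact "dist" object, so as.matrix() is needed before indexing it by row and column.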

--
You received this message because you are subscribed to the Google Groups "dartR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dartr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dartr/274000e8-74e2-4ee9-b5a3-f14a74aa0f2an%40googlegroups.com.

Jose Luis Mijangos

Sep 1, 2023, 1:59:27 AM
to dartR
Hi Oumar,

Yes, when your matrix contains thousands of individuals you will not be able to visualise it in the R console.

Try the solutions below.

library(dartR)
# filtering
t1 <- gl.filter.callrate(platypus.gl,threshold = 1)
t1 <- gl.filter.monomorphs(t1)
# dropping the same individuals
t1 <- gl.drop.ind(t1, ind.list = c("T42","T3"))
# genetic distance with matrix as output
res <- gl.dist.ind(t1,output = "matrix")
# setting the matrix diagonal to NA
diag(res) <- NA
# writing the matrix to a CSV file
write.csv(res,file="test.csv")
library(viridis)
# plotting matrix using a heatmap
gl.plot.heatmap(res,
                palette_divergent = viridis)

# printing the first five rows and five columns of the matrix
res[1:5,1:5]
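
If the full matrix is too large to read in the console, summarising it can help. The following is a rough sketch in base R only; the simulated data and the names sim and pair_table are placeholders, and res stands in for the matrix returned by gl.dist.ind(..., output = "matrix").

```r
set.seed(1)
# Simulated stand-in for a large individual-by-individual distance matrix
n <- 200
sim <- matrix(sample(0:2, n * 50, replace = TRUE), nrow = n,
              dimnames = list(paste0("ind", seq_len(n)), NULL))
res <- as.matrix(dist(sim))

dim(res)                         # confirm the size before printing anything
res[1:5, 1:5]                    # look at one corner only
pairwise <- res[lower.tri(res)]  # each pair of individuals counted once
summary(pairwise)

# Long format: one row per pair, often easier to filter or export
idx <- which(lower.tri(res), arr.ind = TRUE)
pair_table <- data.frame(ind1 = rownames(res)[idx[, "row"]],
                         ind2 = colnames(res)[idx[, "col"]],
                         distance = res[idx])
head(pair_table)
```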

Cheers,
Luis 

Oumar Boro

Sep 1, 2023, 4:16:52 AM
to dartR
Thanks so much, Luis,
I filtered the raw data on several parameters, such as call rate and monomorphic sites as you mentioned, but also on missing data, minor allele frequency, and heterozygosity at a given threshold (which dropped many loci and individuals). I then ran the script you sent, starting from "genetic distance with matrix as output", but I'm still facing challenges.
First, setting the diagonal does not work. This is the error:
Error in `diag<-`(`*tmp*`, value = NA) : only matrix diagonals can be replaced
Then, when plotting the matrix as a heatmap, I got this:
Starting gl.plot.heatmap
Warning: Found object of class genind
Error in utils.check.datatype(D, accept = c("dist", "fd", "matrix"), verbose = verbose) : Fatal Error: inappropriate object passed to function, found genind expecting dist or fd or matrix

NB: Due to the size of my data, I worked on a subset only to keep it small, but no success so far.
It may help to mention that the genlight object is built from two datasets, one in "0, 1, 2" format (metadata) and the second in "A:A" format, where missing values are scored "-:-";
I then merged both (not sure that is the right way to say it) to get the genlight object.
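
The diag<- error above means that the object being modified is not a two-dimensional matrix, so checking class(res) before editing it is a sensible first step. The base R sketch below reproduces the same message with a "dist" object (toy data, made-up names) and shows one way around it. It does not cover the genind case reported by the heatmap function, which suggests the distance call returned something other than a distance object.

```r
# Toy data standing in for the real genotypes (made-up values)
geno <- matrix(c(0, 1, 2,
                 2, 1, 0,
                 1, 1, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("a", "b", "c"), NULL))
d <- dist(geno)

class(d)        # "dist": not a matrix, so it has no diagonal to replace
is.matrix(d)    # FALSE

# Reproduce the error safely and capture its message
msg <- tryCatch({ diag(d) <- NA; "no error" },
                error = function(e) conditionMessage(e))
msg

# Coerce to a square matrix first, then edit the diagonal
m <- as.matrix(d)
diag(m) <- NA
m
```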

Oumar Boro

Sep 1, 2023, 12:01:25 PM
to dartR
Hi Gruber,
Thanks a lot.
Yes, I'm interested in Euclidean distance.

Arthur Georges

Sep 1, 2023, 8:53:22 PM
to da...@googlegroups.com
Hi guys. This is all a bit of a mystery to me because gl.dist.ind() does not call gl2gi(), nor does it reference genind objects. Luis, can you point Oumar to the latest dev version of this script, which should be on dartR.base? The older versions used third-party scripts.

Oumar, can you make sure your genlight object is compliant with dartR by using gl <- gl.compliance.check(gl)?

The distance function should still work regardless of the size of the dataset, though obviously at some point computational time will become an issue (1000 x 1000 distance matrix).  
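
As a back-of-the-envelope check on the scale involved, the arithmetic can be sketched in base R. The figures assume 1000 individuals and 3000 loci, as described earlier in the thread, and a full square matrix stored as doubles (8 bytes each); actual run time will depend on the distance method and the machine.

```r
n_ind <- 1000
n_loc <- 3000

# Unique pairs of individuals in the distance matrix
n_pairs <- n_ind * (n_ind - 1) / 2
n_pairs                                  # 499500

# Rough count of per-locus comparisons across all pairs
n_comparisons <- n_pairs * n_loc
n_comparisons

# Approximate memory for the full square matrix of doubles, in MiB
full_mib <- n_ind^2 * 8 / 1024^2
full_mib
```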

We might need to have a look at the heatmap function to see how it handles 1000 individuals.

A
