Transforming Dgeo in function gl.ibd


David Tork

May 5, 2025, 6:50:36 PM
to dartR
Hello,

I am trying to calculate individual-based IBD within three separate species using a Euclidean distance matrix. My question relates to the following note in the function documentation: "Often a problem arises, if an individual based distance is calculated (e.g. propShared) and some individuals have identical coordinates as this results in distances of zero between those pairs of individuals. If the standard transformation [log(Dgeo)] is used, this results in an infinite value, because of trying to calculate 'log(0)'. To avoid this, the easiest fix is to change the transformation from log(Dgeo) to log(Dgeo+1)..."

This caught my eye as I am aware that a number of individuals in my dataset have the same lat/long. To avoid issues related to this, I tried the recommended transformation as follows: 
IBD <- gl.ibd(genlight, distance = "euclidean", permutations = 10000, save2tmp = TRUE, Dgeo_trans = "log(Dgeo+1)")
IBD


This seems to have had a major impact, as the resulting plots are considerably different in terms of both the orientation of points and the scale of the x-axis. The Mantel statistic r and p-value also showed major changes following transformation. Here's the data from one of the three species as an example:
Pre-transformation (permutations = 999)
r = 0.2462, p = 0.001
[Attached plot: Screenshot 2025-05-05 at 5.27.42 PM.png]
Post-transformation "log(Dgeo+1)" (permutations = 10,000)
r = 0.2675, p = 0.00009
[Attached plot: Screenshot 2025-05-05 at 5.27.19 PM.png]
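
A side note on comparing the two p-values: a permutation test cannot report a p-value below 1/(permutations + 1) when the observed statistic is counted as one of the permutations. This sketch assumes gl.ibd follows that convention (as vegan::mantel does), which the thread does not confirm. Under that assumption the floor is 0.001 for 999 permutations and just under 0.0001 for 10,000, so both reported p-values may simply sit at their floor rather than reflect a change in the strength of evidence. The helper names min_p and perm_p and the toy vectors are made up for illustration.

```r
# Smallest p-value a permutation test can report, assuming the
# (hits + 1) / (permutations + 1) convention.
min_p <- function(permutations) 1 / (permutations + 1)

min_p(999)     # 0.001
min_p(10000)   # just under 0.0001

# Toy one-sided permutation test on made-up vectors (not the thread's data).
set.seed(1)
x <- 1:30
y <- x + rnorm(30, sd = 5)
obs <- cor(x, y)

perm_p <- function(permutations) {
  # Correlation of x with shuffled y, repeated `permutations` times
  perm <- replicate(permutations, cor(x, sample(y)))
  (sum(perm >= obs) + 1) / (permutations + 1)
}

perm_p(999)    # cannot be smaller than min_p(999)
```

For a like-for-like comparison of the two transformations, it may help to rerun both with the same number of permutations.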
One final question related to the presentation of these data: I have seen gl.ibd plots presented in the literature without modification to the axis titles (e.g. figure 3: https://doi.org/10.1016/j.foreco.2024.122256). Are there more descriptive axis titles that should be used in place of "Dgen" and "Dgeo"? My understanding is that these are just general variable names within the function. Would it be more descriptive to instead use Y: "Euclidean genetic distance", X: "Transformed Euclidean geographic distance" or similar?

I have not seen this topic discussed here, nor could I find examples of implementing this transformation in the literature. I just want to make sure that the recommended transformation has been applied correctly and that the plots are appropriately presented.

Thank you,
David

Jose Luis Mijangos

May 5, 2025, 7:49:49 PM
to dartR
Hi David,

You can read Rousset (1997) to understand the use of log transformation on geographic distance. You can download the article using the link below. 

Another interesting article about the use of IBD in landscape genetics is:
Van Strien, M. J., Holderegger, R., & Van Heck, H. J. (2015). Isolation-by-distance in landscapes: considerations for landscape genetics. Heredity, 114(1), 27-37.

To modify the axes titles, you can use the code below:

library(dartRverse)
library(ggplot2)
t1 <- platypus.gl
# running IBD and saving the ggplot in the working directory
res <- gl.ibd(t1, plot.file = "test", plot.dir = getwd())
# loading the ggplot
p1 <- readRDS("test.RDS")
# modifying axes labels
p2 <- p1 +
  xlab("Transformed Euclidean geographic distance") +
  ylab("Euclidean genetic distance")
p2 

Cheers,
Luis 

Bernd.Gruber

May 5, 2025, 8:41:57 PM
to da...@googlegroups.com

Hi David,

 

If I am not mistaken, the main difference comes from the log transformation, and you can see it when comparing the plots: the big blob in the untransformed version is smoothed out to the right, and the line is no longer fitted mainly between the two blobs but by more points along its length. So I think the log transformation makes sense from a statistical point of view, as the predictor, log(Dgeo+1), is spread out more evenly. This is also the reason why the R2 is much stronger. You can see the “zero” distances at Dgeo + 1 = 1, i.e. at log(1) = 0 on the x-axis (there are not too many, so I would not be too worried about them). Another trick is to add some randomness to the x, y coordinates so that the distance is no longer zero (just a few centimeters, but it will give you the same result).
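
The two workarounds mentioned above can be compared in base R. This is only a sketch under stated assumptions: the coordinates are made up, they are taken to be projected coordinates in metres (so a jitter of 0.05 is about 5 cm; for lat/long in degrees the jitter size would need rescaling), and none of the object names come from dartR.

```r
# Made-up projected coordinates in metres; individuals 1 and 2 share a site.
xy <- cbind(x = c(500, 500, 1200, 3000),
            y = c(100, 100,  900, 2500))

Dgeo <- dist(xy)   # Euclidean distances between individuals
log(Dgeo)          # the pair (1, 2) gives -Inf because log(0) is -Inf
log(Dgeo + 1)      # zero distance maps to log(1) = 0, so all values are finite

# Alternative: jitter the coordinates by up to 5 cm so no distance is exactly 0.
set.seed(42)
xy_jit <- xy + matrix(runif(length(xy), -0.05, 0.05), nrow = nrow(xy))
Dgeo_jit <- dist(xy_jit)
all(is.finite(log(Dgeo_jit)))   # checks that no -Inf remains after jittering
```

Note that after jittering, pairs from the same site get small positive distances, so their log values become negative rather than zero; how much that affects the fit depends on how many such pairs there are.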

 

The log transformation on distance is often used because Rousset found a relationship between Fst/(1-Fst) and the log of the distance in a stepping-stone model.

Admittedly, individual-based distances do not necessarily follow this, and I am not sure anyone has found a theoretical relationship that warrants a log transformation (if you know of a paper, please let me know).

 

Regarding the axis labels:

I guess the “trick” is to have the generic labels so the user is reminded what the axes are, and then there should be the possibility to change them via parameters in the function. So I will add this feature in the next iteration.

 

You can already change the labels on the ggplot object:

 

# load the packages
library(dartRverse)
library(ggplot2)

# save the plot as rds
gg <- gl.ibd(possums.gl, plot.dir = tempdir(), plot.file = "test")

# load the plot
pl <- readRDS(file.path(tempdir(), "test.rds"))

# add things as you like in ggplot
pl + xlab("whatever") + ylab("whatelse")

 

Hope that makes sense,

 

Cheers, Bernd

 

 

 


David Tork

May 6, 2025, 1:51:55 PM
to dartR
Thank you Luis and Bernd, that is very helpful!

Yayan Kusuma

May 10, 2025, 9:49:22 PM
to dartR
Dear Bernd,

I have a slight problem with the code you've shared to modify the axis titles in the gl.ibd command. I have reinstalled dartRverse (base, popgen, and spatial) and the dependencies, but the code gave me this error:

> p1 <- readRDS("test.rds")
> # modifying axes labels
> p2 <- p1 +
+   xlab("Transformed Euclidean geographic distance") +
+   ylab("Euclidean genetic distance")
Error in p1 + xlab("Transformed Euclidean geographic distance") :
  non-numeric argument to binary operator

Could you give me an idea of how to solve this?

Thanks in advance,
YK
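
One plausible reading of this error, not confirmed anywhere in the thread, is that the object returned by readRDS is not itself a ggplot (for example, a list that wraps the plot, or a different file named test.rds in the working directory), so `+` falls back to ordinary arithmetic. A minimal diagnostic sketch follows; find_ggplot is a hypothetical helper, not a dartR or ggplot2 function, and the stand-in objects exist only to show its behaviour.

```r
# Return the first ggplot found in `obj`, whether `obj` is the plot itself
# or a (possibly nested) list that wraps it; NULL if none is found.
# Hypothetical helper, not part of dartR or ggplot2.
find_ggplot <- function(obj) {
  if (inherits(obj, "ggplot")) return(obj)
  if (is.list(obj)) {
    for (el in obj) {
      found <- find_ggplot(el)
      if (!is.null(found)) return(found)
    }
  }
  NULL
}

# Stand-in objects (not real plots) to illustrate the behaviour.
fake_plot <- structure(list(), class = c("gg", "ggplot"))
wrapped   <- list(quote(gl.ibd(x)), fake_plot)
inherits(find_ggplot(wrapped), "ggplot")   # the wrapped plot is recovered
is.null(find_ggplot(list(1, "a")))         # nothing plot-like in this list

# With the real file one might then try (untested against dartR's output):
# list.files(pattern = "^test\\.rds$", ignore.case = TRUE)  # check name and case
# p1 <- readRDS("test.rds")
# class(p1)                                                  # what was saved?
# library(ggplot2)
# find_ggplot(p1) + xlab("Transformed Euclidean geographic distance")
```

The two replies in this thread also differ in the file extension case ("test.RDS" versus "test.rds"), which matters on case-sensitive file systems, hence the list.files check.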