Use of gl.ibd on a large dataset (3,612 SNP, 3.19% missing data)

443 views
Skip to first unread message

kirstenmt...@gmail.com

unread,
Jan 29, 2021, 12:23:00 PM1/29/21
to dartR
Hi there,

I am trying to use the IBD function. I can get it to work on a subset of my data (175 ind), but when I use the full dataset (1018 ind), I get this error:

> ibd_range_test <-gl.ibd(range_clean.gl)
Standard analysis performed on the genlight object. Mantel test and plot will be Fst/1-Fst versus log(distance) Coordinates transformed to Mercator (google) projection to calculate distances in meters. Mantel statistic based on Pearson's product-moment correlation

Call: mantel(xdis = Dgen, ydis = Dgeo, permutations = 999, na.rm = TRUE)
Mantel statistic r: NaN Significance: NA
Upper quantiles of permutations (null model): 90% 95% 97.5% 99% NA NA NA NA Permutation: free Number of permutations: 999

Error in kde2d(Dgeo, Dgen, n = 300) : missing or infinite values in the data are not allowed

Is there an extra step of filtering that I can use to make this function work for me? My data is spread over 1000km^2 distance, so I am not sure if that is causing some issues.

Jose Luis Mijangos

unread,
Feb 2, 2021, 11:09:46 PM2/2/21
to dartR

Hi Kirsten,

 

Could you please test whether the following function solves the problem.

 

Please let us know whether it worked or not.

 

Regards,

Luis

gl.ibd_fix <- function(gl = NULL, Dgen = NULL, Dgeo = NULL, projected = FALSE,
    permutations = 999, plot = TRUE)
{
    if (!(requireNamespace("dismo", quietly = TRUE))) {
        stop("Package dismo needed for this function to work. Please install it.")
    }
    else {
        if (!is.null(Dgen) & !is.null(Dgeo))
            cat("Analysis performed on provided genetic and Euclidean distance matrices.")
        if (class(gl) == "genlight") {
            cat("Standard analysis performed on the genlight object. Mantel test and plot will be Fst/1-Fst versus log(distance)\n")
            if (nrow(gl@other$latlong) != nInd(gl))
                stop("Cannot find coordinates for each individual in slot @other$latlong")
            if (sum(match(names(gl@other$latlong), "long"), na.rm = T) == 1)
                gl@other$latlong$lon <- gl@other$latlong$long
            if (!projected) {
                xy <- dismo::Mercator(gl@other$latlong[, c("lon", "lat")])
                cat("Coordinates transformed to Mercator (google) projection to calculate distances in meters.\n")
            }
            else {
                xy = gl@other$latlong[, c("lon", "lat")]
                cat("Coordinates not transformed. Distances calculated on the provided coordinates.")
            }
            pop.xy <- apply(xy, 2, function(a) tapply(a, pop(gl),
                mean, na.rm = T))
            Dgeo <- dist(pop.xy)
            Dgeo <- log(Dgeo)
            Dgen <- as.dist(StAMPP::stamppFst(gl, nboots = 1))
            Dgen <- Dgen/(1 - Dgen)
            ordering <- levels(pop(gl))
            Dgen <- as.dist(as.matrix(Dgen)[ordering, ordering])
            Dgeo <- as.dist(as.matrix(Dgeo)[ordering, ordering])
        }
        miss = FALSE
        if (sum(is.na(Dgen)) > 0 | sum(is.infinite(Dgen)) > 0 ) {
            miss = TRUE
            cat("There are missing values in the genetic distance matrix. No kernel distance plot is possible.\n")
        }
        if (sum(is.na(Dgeo)) > 0 | sum(is.infinite(Dgeo)) > 0 ) {
            miss = TRUE
            cat("There are missing values in the geographic distance matrix. No kernel distance plot is possible.\n")
        }
        manteltest <- vegan::mantel(Dgen, Dgeo, na.rm = TRUE, permutations = 999)
        print(manteltest)
        if (plot) {
            if (!miss) {
                dens <- MASS::kde2d(Dgeo, Dgen, n = 300)
                myPal <- colorRampPalette(c("white", "blue",
                  "gold", "orange", "red"))
                plot(Dgeo, Dgen, pch = 20, cex = 0.8)
                image(dens, col = transp(myPal(300), 0.7), add = TRUE)
                points(Dgeo, Dgen, pch = 20, cex = 0.8)
                abline(lm(Dgen ~ Dgeo))
                title("Isolation by distance")
            }
            else {
                plot(Dgeo, Dgen)
                abline(lm(Dgen ~ Dgeo))
                title("Isolation by distance")
            }
        }
        out <- list(Dgen = Dgen, Dgeo = Dgeo, mantel = manteltest)
        return(out)
    }
}

Kirsten M. Thompson

unread,
Feb 26, 2021, 5:24:23 PM2/26/21
to da...@googlegroups.com
Hi Jose,

Sorry for the late reply. Yes, this generates a similar graphic, but it doesn't not have the line or the heat map. I've attached two images for comparison.

image.png
image.png

--
You received this message because you are subscribed to a topic in the Google Groups "dartR" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/dartr/lULh7pMe3vM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to dartr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dartr/56f295a9-9815-4174-82bd-78551bfde904n%40googlegroups.com.


--
Kirsten M. Thompson, M.Sc.
PhD Candidate (NRES-Murray Lab) 
University of Northern BC
Lab 4-403 (1-250-960-5730)
3333 University Way
Prince George, BC V2N 4Z9

Jose Luis Mijangos

unread,
Mar 1, 2021, 2:05:21 AM3/1/21
to dartR
Hi Kirsten,

It seems that you are using the gl.ibd function with individuals. Could you please confirm this. 

At the moment the function only works properly with populations. We are working in a function to perform IBD with individuals which hopefully will be available very soon. In the meantime and if it is convenient for your study, you could group your individuals by population and then run the gl.ibd again. You can assign populations to individuals as follows:

> gl.make.recode.pop(gl, outfile = "new_pop_assignments.csv",outpath = getwd()) 

This will generate a csv file with two columns, the first containing the existing population assignments, and the second also containing those assignments ready for editing to achieve the reassignments. This editing is best done in Excel. The population reassignments are then applied using:

> glnew <- gl.recode.pop(gl, pop.recode = "new_pop_assignments.csv") 

You can check that the new assignments have been applied with:  

> levels(pop(gl)) 

If you decide to perform the isolation by distance analysis with individuals, you can try:

> test <- your_data
> Dgen <- gl.dist.ind(test)
> Dgeo <- dist(test$other$latlong)
> manteltest <- mantel(Dgen, Dgeo, na.rm = TRUE, permutations = 999)

# to plot
> plot(Dgeo, Dgen)
> abline(lm(Dgen ~ Dgeo))
> title("Isolation by distance”)

Regards,
Luis

Kirsten M. Thompson

unread,
Mar 1, 2021, 3:01:32 PM3/1/21
to da...@googlegroups.com
Hi Jose,

My genind has populations assigned to it. Am I not using the correct code? When I do something like a DAPC things do group by population.

Jose Luis Mijangos

unread,
Mar 1, 2021, 9:35:22 PM3/1/21
to dartR
Hi Kirsten,

I think the problem is that you have missing values in your data. The following commands will remove those populations with missing values and then try again with the filtered dataset.

library(StAMPP)
test <- your_data
Dgen <- as.matrix(stamppFst(test,nboots = 1))
# identifying pops with missing values
pop_missing_value <- names(which(apply(is.nan(Dgen),2,sum)>0))
# removing pops with missing values
test_without_miss <- gl.keep.pop(test,pop.list=popNames(test)[popNames(test)!=pop_missing_value])
# try ibd function with filtered dataset
test_results <- gl.ibd(test_without_miss)

Regards,
Luis
Reply all
Reply to author
Forward
0 new messages