Merge datasets

Lyndal Hulse

Jun 26, 2025, 1:26:42 AM
to dartR
Hi dartR Team,

Firstly, it was wonderful meeting most of the dartR team at the ICCB 2025 workshop.

Secondly, I was hoping you could help me with my query.  I have two vcf files (with associated metadata files) containing loci generated from the same SNP panel which I have uploaded into dartRverse.  The two vcf files contain genotype data from different populations but each file has a different number of loci, although there will be some overlapping loci between the two files.

Is it possible to merge the two vcf files (with attached individual metrics) into one file and then identify sex-linked markers based on the sex identification metadata of some individuals?  I then want to use gl.infer.sex to determine the sex of individuals whose sex is unknown.

If it helps, I can email you the files.  I really hope this is feasible...

Kind regards,
Lyndal

nanisrobledo

Jun 26, 2025, 10:52:36 PM
to dartR
Hi Lyndal,

It was lovely meeting you at ICCB!

I think there are a couple of options for what you want to do: (1) merge the vcf files first, for example with bcftools merge, then convert the merged vcf into a dartR genlight object with gl.read.vcf, or (2) convert each vcf into a dartR genlight object and merge them with dartR.base::gl.join. Remember to apply gl.keep.sexlinked before gl.infer.sex. For example:

LBP_sexLinked <- dartR.sexlinked::gl.keep.sexlinked(x = LBP, system = "xy", plot.display = TRUE, ncores = 1)
inferred.sexes <- dartR.sexlinked::gl.infer.sex(gl_sexlinked = LBP_sexLinked, system = "xy", seed = 100)
inferred.sexes # The new sexes will be in this data frame

Best,
Diana

Lyndal Hulse

Jul 1, 2025, 7:49:15 PM
to dartR
Hi Diana,

Thank you for the advice.  However, I'm having issues merging two dartR genlight objects.
This is the error I'm receiving:

> Watergum_new <- gl.join(Watergum, Val)
Starting gl.join
Error in gl.join(Watergum, Val) :
Fatal Error: the two genlight objects do not have data for the same individuals in the same order
 
My two genlight objects have different individuals, but the same loci.  I want to merge the individuals from both genlight objects into one genlight object.  Is this possible?

Cheers,
Lyndal
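
Before joining, it can help to confirm which situation two datasets are in: same loci with different individuals, or same individuals with different loci. The following is a minimal sketch in base R, not part of dartR; `describe_overlap` and all sample names are hypothetical. With real genlight objects, the input vectors would come from adegenet's `indNames()` and `locNames()`.

```r
# Plain character vectors stand in for indNames(gl) and locNames(gl).
describe_overlap <- function(ind1, loc1, ind2, loc2) {
  list(
    shared_individuals = length(intersect(ind1, ind2)),
    shared_loci        = length(intersect(loc1, loc2)),
    same_ind_order     = identical(ind1, ind2),
    same_loc_order     = identical(loc1, loc2)
  )
}

# Hypothetical names, for illustration only
ind_a <- c("A1", "A2", "A3")
ind_b <- c("B1", "B2")
loc_a <- c("snp1", "snp2", "snp3")
loc_b <- c("snp1", "snp2", "snp3")

overlap <- describe_overlap(ind_a, loc_a, ind_b, loc_b)
str(overlap)
```

Zero shared individuals together with identical locus names matches the case described above, where the goal is to stack individuals rather than add loci.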

Jose Luis Mijangos

Jul 1, 2025, 8:08:09 PM
to da...@googlegroups.com

Hi Lyndal,

 

Could you please send your datasets to my personal e-mail (luis.m...@gmail.com) so I can have a closer look?

 

Cheers,

Luis

 

--
You received this message because you are subscribed to the Google Groups "dartR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dartr+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/dartr/ed8ffdf8-e6cd-4edf-a612-6425ddf5d31cn%40googlegroups.com.

Jose Luis Mijangos

Jul 3, 2025, 12:21:04 AM
to dartR
Hi Lyndal,

The issue has now been resolved. To use the updated version of the function, please install the development version of dartR.base (see the first line of the code snippet below).

I've also included an example of how to join two datasets with the same loci but different individuals.

Let me know if you run into any issues.

Cheers,
Luis

devtools::install_github("green-striped-gecko/dartR.base@dev")
library(dartRverse)
# loading datasets
t1 <- readRDS("dataset1.rds")
t2 <- readRDS("dataset2.rds")
# getting common loci for t1
loc_common_t1 <- locNames(t1) %in% locNames(t2)
t1a <- gl.keep.loc(t1, loc.list = locNames(t1)[loc_common_t1])
# getting common loci for t2
loc_common_t2 <- locNames(t2) %in% locNames(t1)
t2a <- gl.keep.loc(t2, loc.list = locNames(t2)[loc_common_t2])
# ordering by loci
t1a <- t1a[, order(t1a$loc.names)]
t2a <- t2a[, order(t2a$loc.names)]
# joining datasets by loci in common
t3 <- gl.join(x1 = t1a,
              x2 = t2a,
              method = "join.by.loc")
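
The logic of the snippet above (keep the loci in common, put them in the same order, then stack the individuals) can be illustrated without any genlight objects. This is a minimal sketch in base R using toy matrices with hypothetical names and genotypes; `rbind()` only stands in for the join and does not handle locus or individual metrics the way gl.join does.

```r
# Toy genotype matrices: rows = individuals, columns = loci (hypothetical data)
m1 <- matrix(c(0, 1, 2,
               2, 1, 0),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("A1", "A2"), c("snp3", "snp1", "snp2")))
m2 <- matrix(c(1, 1, NA,
               0, 2, 2),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("B1", "B2"), c("snp2", "snp4", "snp1")))

# Loci present in both datasets, in one fixed (sorted) order
common <- sort(intersect(colnames(m1), colnames(m2)))

# Subsetting by name both filters and reorders the columns
m1a <- m1[, common, drop = FALSE]
m2a <- m2[, common, drop = FALSE]

# Only stack the individuals once the locus order is known to match
stopifnot(identical(colnames(m1a), colnames(m2a)))
joined <- rbind(m1a, m2a)
joined
```

Sorting the shared locus names once and indexing both datasets by that vector guarantees the same column order on both sides, which is what the two `order()` calls achieve in the genlight version.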

Rizki Awaludin

Dec 16, 2025, 10:54:42 PM
to dartR

Hi, 

I’m trying to merge multiple genlight datasets at once using gl.join(x1, x2, method = "join.by.loc"), but I think it can only handle merging a maximum of two datasets. When I need to merge more than two, I have to do it step-by-step.

Is there an efficient way to merge several genlight objects in one go, especially when they share the same marker set? In some cases, I also need to use the code Luis shared to check whether the marker order is consistent.

I really appreciate everyone’s help. Thanks.

Warm regards,

Rizki

Jose Luis Mijangos

Dec 17, 2025, 5:31:31 AM
to dartR

Hi Rizki,

You are right that gl.join() is designed to merge two genlight objects at a time. However, you can efficiently extend this to multiple objects by applying it iteratively over a list using Reduce(). This avoids manual, step-by-step joins and works well when all datasets share the same marker set. For example:

devtools::install_github("green-striped-gecko/dartR.base@dev")
library(dartRverse)

g1 <- platypus.gl
g2 <- platypus.gl
g3 <- platypus.gl

genlight_list <- list(g1, g2, g3)

f1 <- Reduce(
  f = \(x, y) gl.join(x, y, method = "join.by.loc"),
  x = genlight_list
)

Cheers,
Luis
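
When the datasets might not share exactly the same markers in the same order, the same `Reduce()` idea can also be used to find the loci common to all of them before folding the join. The following is a minimal sketch in base R with toy matrices; `make_set` and all names are hypothetical, and `rbind()` stands in for gl.join, so metrics handling is not covered.

```r
# Three toy genotype matrices (rows = individuals, columns = loci); hypothetical data
make_set <- function(ids, loci) {
  matrix(0, nrow = length(ids), ncol = length(loci),
         dimnames = list(ids, loci))
}
sets <- list(
  make_set(c("A1", "A2"), c("snp1", "snp2", "snp3")),
  make_set(c("B1"),       c("snp3", "snp1", "snp2")),
  make_set(c("C1", "C2"), c("snp2", "snp3", "snp1", "snp9"))
)

# Loci shared by every dataset: Reduce() folds intersect() over the list
common <- sort(Reduce(intersect, lapply(sets, colnames)))

# Bring every dataset to the same loci in the same order
aligned <- lapply(sets, function(m) m[, common, drop = FALSE])

# Fold the pairwise join over the list; rbind() stands in for gl.join() here
merged <- Reduce(function(x, y) rbind(x, y), aligned)
dim(merged)
```

Aligning every dataset once up front means each pairwise step inside `Reduce()` sees matching loci, rather than rechecking the order at every join.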

Krishna Pavan Kumar Komanduri

May 5, 2026, 2:17:28 AM
to dartR
Hi Luis,

My dataset consists of three SNP files co-analysed by DArT and contains the same individuals but different loci (split due to file size, I guess).

Following up on your recommendation to use gl.join(..., method = "join.by.loc") for split genlight files, I get a warning that the method parameter is deprecated, followed by an error:

genlight_list <- list(gl_11646_1, gl_11646_2, gl_11646_3)

> f1 <- Reduce(
+   f = \(x, y) gl.join(x, y, method = "join.by.loc"),
+   x = genlight_list
+ )
Starting gl.join
  Processing SNP data
  Warning: The parameter method is deprecated, no longer required
  Concatenating two genlight objects, x and y with the same loci, different individuals
Error in rbind(deparse.level, ...) :
  Objects have different numbers of loci (31654 vs 28696).
Called from: rbind(deparse.level, ...)

When I remove the method argument and try running again, the first pairwise join succeeds, but the second iteration crashes with a row.names error during the locus metric concatenation:

> f1 <- Reduce(
+   f = \(x, y) gl.join(x, y),
+   x = genlight_list
+ )
Starting gl.join
  Processing SNP data
  Concatenating two genlight objects, x and y with the same individuals, different loci
  Concatenating the locus metrics
  Adding the individual metrics
  Setting the locus metrics flags
  Adding the locus metrics
  Adding the history
Completed: gl.join
Starting gl.join
  Processing SNP data
  Concatenating two genlight objects, x and y with the same individuals, different loci
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
Called from: `.rowNamesDF<-`(x, value = value)


I tried using cbind instead, which joins the genlight objects, but I am not sure whether that is the correct approach. Could you please let me know if using cbind is acceptable, or if there is a better way?

> genlight_list <- list(gl_11646_1, gl_11646_2, gl_11646_3)
> f1 <- Reduce(cbind, genlight_list)
> f1 <- gl.compliance.check(f1)
Starting gl.compliance.check
  Processing genlight object with SNP data
  Warning: data include loci that are scored NA across all individuals.
  Consider filtering using gl <- gl.filter.allna(gl)
  Checking coding of SNPs
    SNP data scored NA, 0, 1 or 2 confirmed
  Checking for population assignments
    Population assignments confirmed
  Checking locus metrics and flags
  Recalculating locus metrics
  Checking for monomorphic loci
    Dataset contains monomorphic loci
  Checking for loci with all missing data
    Dataset contains loci with all missing dat
  Checking whether individual names are unique.
  Checking for individual metrics
    Individual metrics confirmed
  Spelling of coordinates checked and changed if necessary to
            lat/lon
Completed: gl.compliance.check

Thanks heaps!

Krishna
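
For the same-individuals, different-loci case, two things are worth checking before any column-wise join: that the individuals line up row by row, and that no locus appears in more than one file. The following is a minimal sketch in base R with toy matrices and hypothetical names. It says nothing about how `cbind()` on genlight objects treats locus metrics, which this thread does not settle.

```r
# Toy genotype matrices: same individuals, different loci (hypothetical data)
ids <- c("ind1", "ind2", "ind3")
p1 <- matrix(c(0, 1, 2, 1, 1, 0), nrow = 3,
             dimnames = list(ids, c("snp1", "snp2")))
p2 <- matrix(c(2, 2, 0), nrow = 3,
             dimnames = list(rev(ids), "snp3"))   # same individuals, different order

# Reorder the second block so its rows line up with the first
stopifnot(setequal(rownames(p1), rownames(p2)))
p2 <- p2[rownames(p1), , drop = FALSE]

# No locus should appear in both blocks
stopifnot(!any(colnames(p2) %in% colnames(p1)))

combined <- cbind(p1, p2)
combined
```

Without the reordering step, a column-wise bind would silently pair genotypes from different individuals whenever the files list them in a different order.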

Jose Luis Mijangos

May 5, 2026, 3:30:10 AM
to da...@googlegroups.com
Hi  Krishna,

Can you please try the following:

  1. Clear your environment.
  2. Restart your R session.
  3. Install dartR.base as follows:
    1. devtools::install_github("green-striped-gecko/dartR.base@main")

If that doesn’t work please get back to me. 

Cheers,
Luis 

Krishna Pavan Kumar Komanduri

May 5, 2026, 8:03:05 AM
to dartR
Hi Luis,

Thanks for the quick response. I gave it a try, but the error persists. I have attached a picture of the suggested debug highlight in case that helps.

> genlight_list <- list(gl_11646_1, gl_11646_2, gl_11646_3)
> f1 <- Reduce(
+   f = \(x, y) gl.join(x, y),
+   x = genlight_list
+ )
Starting gl.join
  Processing SNP data
  Concatenating two genlight objects, x and y with the same individuals, different loci
  Concatenating the locus metrics
  Adding the individual metrics
  Setting the locus metrics flags
  Adding the locus metrics
  Adding the history
Completed: gl.join
Starting gl.join
  Processing SNP data
  Concatenating two genlight objects, x and y with the same individuals, different loci
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
Called from: `.rowNamesDF<-`(x, value = value)


Browse[1]>

Screenshot 2026-05-05 215740.png


Cheers,
Krishna

Jose Luis Mijangos

May 11, 2026, 9:17:47 PM
to dartR

Hi Krishna,

Thank you for reporting that bug and sending a subset of your dataset; that makes debugging much easier. I have fixed the bug. Please install the development version of dartR.base as shown below and try again.

# Clean your environment
# RStudio > Menu > Session > Clear workspace
# Restart R Session
# RStudio > Menu > Session > Restart R
# installing the development version of dartR.base
devtools::install_github("green-striped-gecko/dartR.base@dev")

Cheers,
Luis 

Krishna Pavan Kumar Komanduri

1:07 AM
to da...@googlegroups.com
Hi Luis,

Thanks a lot for helping out and fixing the issue. It's working now. 

Cheers,
Krishna Komanduri

