Merge datasets

Lyndal Hulse

Jun 26, 2025, 1:26:42 AM
to dartR
Hi dartR Team,

Firstly, it was wonderful meeting most of the dartR team at the ICCB 2025 workshop.

Secondly, I was hoping you could help me with my query.  I have two vcf files (with associated metadata files) containing loci generated from the same SNP panel which I have uploaded into dartRverse.  The two vcf files contain genotype data from different populations but each file has a different number of loci, although there will be some overlapping loci between the two files.

Is it possible to merge the two vcf files (with attached individual metrics) into one file and then identify sex-linked markers based on the sex identification metadata available for some individuals?  I then want to use gl.infer.sex to determine the sex of individuals whose sex is unknown.

If it helps, I can email you the files.  I really hope this is feasible...

Kind regards,
Lyndal

nanisrobledo

Jun 26, 2025, 10:52:36 PM
to dartR
Hi Lyndal,

It was lovely meeting you at ICCB!

I think there are a couple of options for what you want to do: (1) merge the vcf files first, with bcftools merge for example, then convert the merged vcf into a dartR genlight object with gl.read.vcf, or (2) convert each vcf into a dartR genlight object and merge them with dartR.base::gl.join. Remember to apply gl.keep.sexlinked before gl.infer.sex. For example:

LBP_sexLinked <- dartR.sexlinked::gl.keep.sexlinked(
  x = LBP,
  system = "xy",
  plot.display = TRUE,
  ncores = 1
)
inferred.sexes <- dartR.sexlinked::gl.infer.sex(
  gl_sexlinked = LBP_sexLinked,
  system = "xy",
  seed = 100
)
inferred.sexes  # The new sexes will be in this data frame

Best,
Diana

Lyndal Hulse

Jul 1, 2025, 7:49:15 PM
to dartR
Hi Diana,

Thank you for the advice.  However, I'm having issues merging two dartR genlight objects.
This is the error I'm receiving:

> Watergum_new <- gl.join(Watergum, Val)
Starting gl.join
Error in gl.join(Watergum, Val) :
  Fatal Error: the two genlight objects do not have data for the same individuals in the same order

My two genlight objects have different individuals, but the same loci.  I want to merge the individuals from both genlight objects into one genlight object.  Is this possible?

Cheers,
Lyndal

Jose Luis Mijangos

Jul 1, 2025, 8:08:09 PM
to da...@googlegroups.com

Hi Lyndal,

Could you please send your datasets to my personal e-mail (luis.m...@gmail.com) so I can take a closer look?

Cheers,
Luis

Jose Luis Mijangos

Jul 3, 2025, 12:21:04 AM
to dartR
Hi Lyndal,

The issue has now been resolved. To use the updated version of the function, please install the development version of dartR.base (see the first line of the code snippet below).

I've also included an example of how to join two datasets with the same loci but different individuals.

Let me know if you run into any issues.

Cheers,
Luis

devtools::install_github("green-striped-gecko/dartR.base@dev")
library(dartRverse)
# loading datasets
t1 <- readRDS("dataset1.rds")
t2 <- readRDS("dataset2.rds")
# getting common loci for t1
loc_common_t1 <- which(locNames(t1) %in% locNames(t2))
t1a <- gl.keep.loc(t1, loc.list = locNames(t1)[loc_common_t1])
# getting common loci for t2
loc_common_t2 <- which(locNames(t2) %in% locNames(t1))
t2a <- gl.keep.loc(t2, loc.list = locNames(t2)[loc_common_t2])
# ordering by locus name so both objects list their loci in the same order
t1a <- t1a[, order(t1a$loc.names)]
t2a <- t2a[, order(t2a$loc.names)]
# joining datasets by loci in common
t3 <- gl.join(x1 = t1a,
              x2 = t2a,
              method = "join.by.loc")
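
Before calling gl.join, it may be worth confirming that the two trimmed objects really list the same loci in the same order and that no individual appears in both. The following is only a minimal sketch: check_join_inputs is a hypothetical helper (not a dartR function) that works on plain character vectors, and the toy names are made up for illustration.

```r
# Hypothetical helper (not part of dartR): compares two vectors of locus
# names and two vectors of individual names ahead of a join by locus.
check_join_inputs <- function(loc1, loc2, ind1, ind2) {
  list(
    # TRUE only if both vectors hold the same names in the same order
    same_loci_same_order = identical(loc1, loc2),
    # individuals present in both datasets (ideally none)
    shared_individuals = intersect(ind1, ind2)
  )
}

# Toy example with made-up names
chk <- check_join_inputs(
  loc1 = c("locA", "locB", "locC"),
  loc2 = c("locA", "locB", "locC"),
  ind1 = c("ind1", "ind2"),
  ind2 = c("ind3", "ind4")
)
chk$same_loci_same_order        # TRUE
length(chk$shared_individuals)  # 0
```

With the genlight objects from the snippet above, the call would presumably look like check_join_inputs(locNames(t1a), locNames(t2a), indNames(t1a), indNames(t2a)), using the standard adegenet accessors.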

Rizki Awaludin

Dec 16, 2025, 10:54:42 PM
to dartR

Hi, 

I’m trying to merge multiple genlight datasets at once using gl.join(x1, x2, method = "join.by.loc"), but I think it can only handle merging a maximum of two datasets. When I need to merge more than two, I have to do it step-by-step.

Is there an efficient way to merge several genlight objects in one go, especially when they share the same marker set? In some cases, I also need to use the code Luis shared to check whether the marker order is consistent.

I really appreciate everyone’s help. Thanks.

Warm regards,

Rizki

Jose Luis Mijangos

5:31 AM
to dartR

Hi Rizki,

You are right that gl.join() is designed to merge two genlight objects at a time. However, you can efficiently extend this to multiple objects by applying it iteratively over a list using Reduce(). This avoids manual, step-by-step joins and works well when all datasets share the same marker set. For example:

devtools::install_github("green-striped-gecko/dartR.base@dev")
library(dartRverse)

g1 <- platypus.gl
g2 <- platypus.gl
g3 <- platypus.gl

genlight_list <- list(g1, g2, g3)

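# join the list pairwise, left to right
# (the \(x, y) shorthand for function(x, y) needs R >= 4.1)
f1 <- Reduce(
  f = \(x, y) gl.join(x, y, method = "join.by.loc"),
  x = genlight_list
)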
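
If the datasets do not share exactly the same marker set, the trimming and ordering steps from the earlier two-dataset snippet could be applied to the whole list before the Reduce() call. This is a sketch under assumptions rather than a tested recipe: common_loci_all and join_many are hypothetical names (not dartR functions), and the wrapper assumes the development version of dartR.base and adegenet are installed.

```r
# Hypothetical helper (not part of dartR): loci present in every name vector
common_loci_all <- function(name_list) Reduce(intersect, name_list)

# Hypothetical wrapper: trim every genlight object to the shared loci,
# sort the loci by name, then join the objects pairwise with Reduce().
join_many <- function(gl_list) {
  shared <- common_loci_all(lapply(gl_list, adegenet::locNames))
  aligned <- lapply(gl_list, function(g) {
    g <- dartR.base::gl.keep.loc(g, loc.list = shared)
    g[, order(adegenet::locNames(g))]
  })
  Reduce(
    function(x, y) dartR.base::gl.join(x, y, method = "join.by.loc"),
    aligned
  )
}

# Toy check of the intersection step with made-up locus names
common_loci_all(list(
  c("locA", "locB", "locC"),
  c("locB", "locC", "locD"),
  c("locC", "locB")
))
# returns c("locB", "locC")
```

With the list from the example above, the call would be f1 <- join_many(genlight_list). Only loci found in every dataset survive, so it is worth checking how many are dropped.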

Cheers,
Luis
