Hi Nigus,
In most situations where a reference genome is required, such as LD decay, sliding-window summaries, selection scans, or any other analysis that relies on physical distance, it only makes sense to use a single reference genome. These analyses are valid only when all SNPs share one coherent coordinate system, so mixing scaffold-level and chromosome-level coordinates usually introduces inconsistencies. That said, if you can tell me more about your specific downstream application, I can give more tailored suggestions.
In general, I recommend using the reference genome to which the most SNPs were successfully mapped, and then filtering out the SNPs that were not placed on chromosomes. Below is example code showing how to:
- Assign chromosome and position information,
- Plot SNP density per chromosome, and
- Remove SNPs that were not placed on a chromosome (i.e., unmapped or chr_blank).
For the SNP density plot, you’ll need the development version of dartR.base. Before installing it, remember to clean your R environment: Session → Clear Workspace and then Session → Restart R.
Hope this helps, and I’m happy to assist further if needed.
Cheers,
Luis
# Install the development version of dartR.base (requires the devtools package)
devtools::install_github("green-striped-gecko/dartR.base@dev")
library(dartRverse)
# Example dataset
t1 <- platypus.gl
# ---- Assign chromosome information ----
# In this dataset, chromosome info is stored here:
t1@chromosome <- as.factor(t1$other$loc.metrics$Chrom_Platypus_Chrom_NCBIv1)
# ---- Assign chromosome positions ----
# Position information is stored here:
t1@position <- as.integer(t1$other$loc.metrics$ChromPos_Platypus_Chrom_NCBIv1)
# ---- Plot SNP density per chromosome ----
gl.plot.snp.density(
  t1,
  bin.size = 1e6,   # 1 Mb bins
  min.snps = 50,
  min.length = 1e6
)
# ---- Remove SNPs not mapped (pos = 0 or NA) ----
t2 <- gl.filter.locmetric(
  t1,
  metric = "ChromPos_Platypus_Chrom_NCBIv1",
  lower = 1,
  upper = max(t1@position, na.rm = TRUE),
  keep = "within"
)
# Number of loci after filtering
nLoc(t2)
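
If you have chromosome assignments from more than one reference genome, a rough way to see which one places the most SNPs is to count the non-empty chromosome labels in each column. The sketch below uses base R only and a made-up data frame: the column names Chrom_RefA and Chrom_RefB and all values are hypothetical, so replace them with the matching columns in your own loc.metrics. It also assumes that unmapped SNPs have an empty or NA chromosome label, which may not hold for your data.

```r
# Hypothetical locus metrics: one chromosome column per candidate reference.
# Column names and values are invented for illustration only.
loc_metrics <- data.frame(
  Chrom_RefA = c("chr1", "chr2", "", "chr1", NA, "scaffold_12"),
  Chrom_RefB = c("chr1", "", "", "chr3", NA, ""),
  stringsAsFactors = FALSE
)

# Assumption: a SNP counts as mapped when its chromosome label is
# neither NA nor an empty string. Add other "unmapped" labels if needed.
count_mapped <- function(chrom) {
  sum(!is.na(chrom) & nzchar(chrom))
}

# Number of mapped SNPs for each reference
mapped_per_ref <- vapply(loc_metrics, count_mapped, integer(1))
mapped_per_ref

# Name of the column (reference) with the most mapped SNPs
best_ref <- names(which.max(mapped_per_ref))
best_ref
```

Note that this counts scaffold placements as mapped too, so for chromosome-level analyses you may prefer to count only the labels that correspond to assembled chromosomes.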