SNP FILTERING

Nigus Belay
Dec 8, 2025, 2:43:00 AM
to dartR
Hello, I am trying to filter a DArTseq SNP dataset. The SNPs were mapped to two reference genomes: one assembled at scaffold level and one at chromosome level (with SNPs assigned to the A and B subgenomes). In the chromosome-level mapping, some SNPs are not assigned to either subgenome and instead appear as chr_blank; some of these have position 0, while others have a non-zero position. Is it possible to filter using the scaffold-level and chromosome-level mappings simultaneously? And how should I handle the chr_blank SNPs in downstream analyses?

Thanks

Nigus

Jose Luis Mijangos
Dec 8, 2025, 4:55:18 PM
to dartR

Hi Nigus,

In most situations where a reference genome is required, such as LD decay, sliding-window summaries, selection scans, or any analysis that relies on physical distance, it only makes sense to use one reference genome. The validity of these analyses depends on having a coherent genomic context, so mixing scaffold-level and chromosome-level coordinates usually introduces inconsistencies. However, if you can tell me more about your specific downstream application, we can give more tailored suggestions.

In general, I recommend using the reference genome with the highest number of SNPs successfully mapped and then filtering out SNPs that were not placed onto chromosomes. Below is example code showing how to:

- Assign chromosome and position information,

- Plot SNP density per chromosome, and

- Remove SNPs that were not mapped (i.e., unmapped or chr_blank).
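
As a rough illustration of the first recommendation (picking the reference with the most SNPs placed), the sketch below counts placed loci per reference in base R. The data frame and its column names are hypothetical stand-ins for the Chrom_* and ChromPos_* columns of a real DArT report, and the rule for "placed" (a non-blank sequence name plus a position above 0) is an assumption that may need adjusting for your data.

```r
# Toy locus metrics. Column names are HYPOTHETICAL; substitute the
# Chrom_* / ChromPos_* columns found in your own loc.metrics table.
loc_metrics <- data.frame(
  Chrom_scaffold    = c("scaf_12", "", "", "scaf_3", "scaf_3"),
  ChromPos_scaffold = c(1500L, 0L, 0L, 99L, 4100L),
  Chrom_chrom       = c("chr_1A", "chr_2B", "chr_blank", "chr_2B", "chr_1A"),
  ChromPos_chrom    = c(25000L, 1200L, 3100L, 780000L, 41000L),
  stringsAsFactors  = FALSE
)

# ASSUMPTION: a locus is "placed" when it has a real sequence name
# (not empty, not chr_blank) and a position greater than 0.
is_placed <- function(chrom, pos, unplaced = c("", "chr_blank")) {
  !is.na(chrom) & !(chrom %in% unplaced) & !is.na(pos) & pos > 0
}

# Count placed loci for each reference genome
n_mapped <- c(
  scaffold   = sum(is_placed(loc_metrics$Chrom_scaffold,
                             loc_metrics$ChromPos_scaffold)),
  chromosome = sum(is_placed(loc_metrics$Chrom_chrom,
                             loc_metrics$ChromPos_chrom))
)
print(n_mapped)

# Name of the reference with the most placed loci
best_reference <- names(which.max(n_mapped))
print(best_reference)
```

The count is only one criterion; a chromosome-level assembly may still be preferable for distance-based analyses even when it places slightly fewer SNPs.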

For the SNP density plot, you’ll need the development version of dartR.base. Before installing it, remember to clean your R environment: Session → Clear Workspace and then Session → Restart R.

Hope this helps, and I’m happy to assist further if needed.

Cheers,
Luis

# Install the development version of dartR.base
devtools::install_github("green-striped-gecko/dartR.base@dev")
library(dartRverse)
# Example dataset
t1 <- platypus.gl
# ---- Assign chromosome information ----
# In this dataset, chromosome info is stored here:
t1@chromosome <- as.factor(t1$other$loc.metrics$Chrom_Platypus_Chrom_NCBIv1)
# ---- Assign chromosome positions ----
# Position information is stored here:
t1@position <- as.integer(t1$other$loc.metrics$ChromPos_Platypus_Chrom_NCBIv1)
# ---- Plot SNP density per chromosome ----
gl.plot.snp.density(
  t1,
  bin.size  = 1e6,   # 1 Mb bins
  min.snps  = 50,
  min.length = 1e6
)
# ---- Remove SNPs not mapped (pos = 0 or NA) ----
t2 <- gl.filter.locmetric(
  t1,
  metric = "ChromPos_Platypus_Chrom_NCBIv1",
  lower = 1,
  upper = max(t1@position, na.rm = TRUE),
  keep = "within"
)
# Number of loci after filtering
nLoc(t2)
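
The position filter above removes loci with position 0, but loci labelled chr_blank with a non-zero position would pass it. A minimal base R sketch of flagging both cases follows. The toy table, its column names (locus, chrom, pos), and the chr_blank label are assumptions for illustration; in a real genlight object the corresponding values would come from the loc.metrics table.

```r
# Toy locus table. Column names and values are HYPOTHETICAL.
loc_metrics <- data.frame(
  locus = paste0("snp_", 1:6),
  chrom = c("chr_1A", "chr_blank", "chr_blank", "chr_2B", NA, "chr_1A"),
  pos   = c(25000L, 0L, 3100L, 780000L, NA, 0L),
  stringsAsFactors = FALSE
)

# Flag a locus as unplaced if the chromosome is missing or chr_blank,
# or if the position is missing or not positive.
unplaced <- is.na(loc_metrics$chrom) | loc_metrics$chrom == "chr_blank" |
  is.na(loc_metrics$pos) | loc_metrics$pos <= 0

loci_to_drop <- loc_metrics$locus[unplaced]
loci_to_keep <- loc_metrics$locus[!unplaced]

print(loci_to_drop)
print(loci_to_keep)
```

On a real dataset, a vector like loci_to_drop (built from locNames(t1) rather than the toy locus column) could probably be passed to gl.drop.loc(t1, loc.list = loci_to_drop); this assumes the rows of the loc.metrics table are in the same order as the loci in the genlight object, which is worth checking first.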