Hi Nigus,
In most situations where a reference genome is required, such as LD decay, sliding-window summaries, selection scans, or any analysis that relies on physical distance, it only makes sense to use one reference genome. The validity of these analyses depends on having a coherent genomic context, so mixing scaffold-level and chromosome-level coordinates usually introduces inconsistencies. However, if you can tell me more about your specific downstream application, we can give more tailored suggestions.
In general, I recommend using the reference genome with the highest number of SNPs successfully mapped and then filtering out SNPs that were not placed onto chromosomes. Below is example code showing how to:
- Assign chromosome and position information,
- Plot SNP density per chromosome, and
- Remove SNPs that were not mapped (i.e., unmapped or chr_blank).
For the SNP density plot, you’ll need the development version of dartR.base. Before installing it, remember to clean your R environment: Session → Clear Workspace and then Session → Restart R.
Hope this helps, and I’m happy to assist further if needed.
Cheers,
Luis
# Install developing version of dartR.base
devtools::install_github("green-striped-gecko/dartR.base@dev")
library(dartRverse)
# Example dataset
t1 <- platypus.gl
# ---- Assign chromosome information ----
# In this dataset, chromosome info is stored here:
t1@chromosome <- as.factor(t1$other$loc.metrics$Chrom_Platypus_Chrom_NCBIv1)
# ---- Assign chromosome positions ----
# Position information is stored here:
t1@position <- as.integer(t1$other$loc.metrics$ChromPos_Platypus_Chrom_NCBIv1)
# ---- Plot SNP density per chromosome ----
gl.plot.snp.density(
t1,
bin.size = 1e6, # 1 Mb bins
min.snps = 50,
min.length = 1e6
)
# ---- Remove SNPs not mapped (pos = 0 or NA) ----
t2 <- gl.filter.locmetric(
t1,
metric = "ChromPos_Platypus_Chrom_NCBIv1",
lower = 1,
upper = max(t1@position, na.rm = TRUE),
keep = "within"
)
# Number of loci after filtering
nLoc(t2)
> t2 <- gl.filter.locmetric(
+ nig_gl,
+ metric = "Pos",
+ lower = 1,
+ upper = max(nig_gl@position, na.rm = TRUE),
+ keep = "within"
+ )
> # Number of loci after filtering
> nLoc(t2)
[1] 10557
> nInd(t2)
[1] 274
> gl1 <- gl.filter.callrate(t2,
+ method = "loc",
+ threshold = 0.80,
+ recalc = TRUE)
Completed: gl.filter.callrate
> nLoc(gl1)
[1] 9510
> gl2 <- gl.filter.maf(gl1, threshold = 0.05)
Completed: gl.filter.maf
> nLoc(gl2)
[1] 5577
> nInd(gl2)
[1] 274
> gl3 <- gl.filter.callrate(gl2,
+ method = "ind",
+ threshold = 0.70,
+ recalc = TRUE)
Completed: gl.filter.callrate
> nLoc(gl3)
[1] 5577
> nInd(gl3)
[1] 260
> gl4 <- gl.filter.reproducibility(gl3, threshold = 0.95)
Completed: gl.filter.reproducibility
> nLoc(gl4)
[1] 4452
> gl5 <- gl.filter.monomorphs(gl4, v=0)
> nLoc(gl5)
[1] 4452
> gl6 <- gl.filter.heterozygosity(x = gl5,
+ t.upper = 0.15,
+ t.lower = 0,
+ verbose = 3)
Starting gl.filter.heterozygosity
Processing genlight object with SNP data
Retaining individuals with heterozygosity in the range 0 to 0.15
Minimum individual heterozygosity 0.0363
Maximum individual heterozygosity 0.4819
Completed: gl.filter.heterozygosity
> nLoc(gl6)
[1] 4452
> nInd(gl6)
[1] 189
> excess_report <- gl.filter.excess.het(x = gl6,
+ Yates = FALSE, verbose = 3)
Starting gl.filter.excess.het
Processing genlight object with SNP data
Calculating loci statistics from observed data Testing loci for high heterozygosity...
Error in `$<-.data.frame`(`*tmp*`, "En0", value = NA) :
replacement has 1 row, data has 0
Best Regards,
Nigus
--
You received this message because you are subscribed to the Google Groups "dartR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dartr+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/dartr/1161df97-8c06-49d4-aa42-9491d0986704n%40googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/dartr/2ce87ff2-afd3-413d-a3a1-92b6fb61dbc4n%40googlegroups.com.