Based on the dataset you shared with me, the contrasting patterns between your SilicoDArT and SNP datasets can be attributed to low DNA quality/quantity in some samples and to the presence of individuals from two different species/subspecies. These two factors affect the data as follows:
1. Low DNA quality/quantity, which produces missing data in the affected samples.
2. Mutations at restriction enzyme recognition sites. The restriction enzymes and DArT's calling pipelines are chosen and optimised for one target species. Individuals from a different species/subspecies are likely to carry mutations at some of these recognition sites, so the restriction enzymes fail to recognise and cut there, the corresponding fragments are not produced, and those loci appear as missing data in the SNP dataset.
In contrast, in SilicoDArT data, these mutations at the recognition sites are shown as absences rather than missing data.
Because the patterns in your dataset result from a combination of missing data due to low DNA quality and the presence of individuals from different species/subspecies, simply removing individuals above a missing-data threshold is not ideal: missing data alone cannot tell the two causes apart.
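To illustrate why a missing-data threshold alone cannot separate the two causes, here is a small base-R toy example. The data are entirely hypothetical (three made-up individuals and ten loci), and the example assumes, for illustration only, that a poor-quality sample shows up as scattered missing calls in both datasets, while a divergent individual loses fragments at mutated recognition sites (missing in the SNP data, absent in the SilicoDArT data). The 20% missing-data cut-off is also an arbitrary illustrative value.

```r
# Hypothetical toy data: 3 individuals x 10 loci (not from your dataset)
ids <- c("target", "low_quality", "divergent")
n_loc <- 10
mutated <- 1:6          # loci whose recognition site is mutated in "divergent"
failed <- c(2, 5, 8, 9) # loci that fail in "low_quality" (illustrative dropout)

# SNP data: genotypes (all 0 here for simplicity); no fragment means NA
snp <- matrix(0, nrow = 3, ncol = n_loc, dimnames = list(ids, NULL))
snp["low_quality", failed] <- NA
snp["divergent", mutated] <- NA

# SilicoDArT data: 1 = present, 0 = absent
silico <- matrix(1, nrow = 3, ncol = n_loc, dimnames = list(ids, NULL))
silico["low_quality", failed] <- NA # assumption: poor DNA gives missing calls
silico["divergent", mutated] <- 0   # mutated site: fragment scored as absent

# Proportion of missing calls per individual in the SNP data
snp_missing <- rowMeans(is.na(snp))
# Mean presence per individual in the SilicoDArT data, ignoring missing calls
silico_presence <- rowMeans(silico, na.rm = TRUE)
print(round(rbind(snp_missing, silico_presence), 2))

# A missing-data filter (e.g. > 20% missing) flags both problem samples
print(names(snp_missing)[snp_missing > 0.2])         # "low_quality" "divergent"
# The presence threshold flags only the divergent individual
print(names(silico_presence)[silico_presence < 0.9]) # "divergent"
```

In this toy case the SNP missing rate is high for both problem samples, whereas the SilicoDArT mean presence drops only for the divergent individual.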
Instead, we can use the presence/absence data from the SilicoDArT dataset to set a threshold that identifies individuals from a different species/subspecies, as demonstrated in the code below and in the attached smearplot, where those individuals are shown at the bottom of the plot in blue.
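The smearplot ordering relies on individuals being sorted alphabetically by label, and the code below prefixes each label with the individual's mean presence. A small base-R sketch (hypothetical names and values) of why a fixed number of decimals helps: plain paste0() writes 0.9 as "0.9" and 1 as "1", so sorting the labels as text may not follow numeric order, whereas formatC() with two decimals gives equal-width prefixes.

```r
# Hypothetical mean presence values for four individuals
ind_abs_pres <- c(ind_A = 0.9, ind_B = 0.95, ind_C = 0.62, ind_D = 1)

# Plain paste0() drops trailing zeros ("0.9", "1"), so text order
# does not necessarily follow numeric order
plain <- paste0(ind_abs_pres, "_", names(ind_abs_pres))

# formatC() with two fixed decimals gives "0.90", "1.00", etc., so text
# order matches numeric order for values between 0 and 1
padded <- paste0(formatC(ind_abs_pres, format = "f", digits = 2), "_",
                 names(ind_abs_pres))

print(sort(plain))
print(sort(padded))
```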
After removing individuals from different species/subspecies, you can see that the PCAs from SilicoDArT and SNP data are now quite similar (attached).
Another option to improve your data quality would be to ask DArT to re-analyse your data separately for each of the two species/subspecies.
Cheers,
Luis
library(dartRverse)
# reading SilicoDArT data (replace your_silicoDArT with the path to your SilicoDArT report)
t2 <- gl.read.silicodart(your_silicoDArT)
# calculating mean presence (proportion of loci scored as present) by individual, ignoring missing calls
ind_abs_pres <- round(rowMeans(as.matrix(t2), na.rm = TRUE), 2)
t2a <- t2
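# prefixing individual names with their mean presence to visualise in the smearplot;
# formatC() keeps two decimals (e.g. 0.90, 1.00) so that alphabetical order matches numeric order
indNames(t2a) <- paste0(formatC(ind_abs_pres, format = "f", digits = 2), "_", indNames(t2a))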
# subsampling 5,000 loci for the smearplot
t2a <- gl.subsample.loci(t2a, n = 5000)
# using ind.labels = TRUE to plot individuals in alphabetical order in the smearplot
gl.smearplot(t2a,ind.labels = TRUE)
# histogram and table to decide on a threshold to drop individuals
hist(ind_abs_pres,breaks = 50)
table(ind_abs_pres)
# getting the names of individuals with mean presence below 0.9
ind_drop <- indNames(t2)[which(ind_abs_pres < 0.9)]
# dropping individuals
t2wo <- gl.drop.ind(t2, ind.list = ind_drop)
# filtering loci on call rate (threshold = 1 keeps only loci with no missing data)
t2wo <- gl.filter.callrate(t2wo, threshold = 1)
# pca
pca_t2wo <- gl.pcoa(t2wo)
# pca plot
gl.pcoa.plot(pca_t2wo, t2wo)
# reading SNP data (replace your_DArT_report with the path to your SNP report)
t1 <- gl.read.dart(your_DArT_report)
# dropping the same individuals (assumes sample names match between the two reports)
t1wo <- gl.drop.ind(t1, ind.list = ind_drop)
# filtering loci on call rate (threshold = 1 keeps only loci with no missing data)
t1wo <- gl.filter.callrate(t1wo, threshold = 1)
# pca
pcoa_t1wo <- gl.pcoa(t1wo)
# pca plot
gl.pcoa.plot(pcoa_t1wo, t1wo)