I’m subsetting the original set of 600k bi-allelic loci down to ca. 55k loci by applying quality filters in PLINK (geno, mind, maf, and LD pruning). However, with 50k burn-in + 50k MCMC cycles, I estimate that a single run with 55k loci and 4,000 individuals would take approximately 40 days, which is not feasible given the time constraints of my project.
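For context, the filtering was roughly along the lines sketched below (shown here as a small Python/subprocess wrapper; the thresholds are placeholders rather than my exact cut-offs, the file names are made up, and PLINK 1.9 is assumed to be on the PATH):

```python
# Rough sketch of the quality filtering, with placeholder thresholds and
# made-up file names; assumes PLINK 1.9 is callable as "plink".
import subprocess

def plink(*args):
    """Run a PLINK command and stop if it fails."""
    cmd = ["plink", *args]
    print(" ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Per-SNP missingness (geno), per-individual missingness (mind),
#    and minor allele frequency (maf) filters.
plink("--bfile", "raw_600k",
      "--geno", "0.05", "--mind", "0.1", "--maf", "0.01",
      "--make-bed", "--out", "filtered")

# 2. LD pruning: window of 50 SNPs, step of 5, r^2 threshold of 0.2.
plink("--bfile", "filtered",
      "--indep-pairwise", "50", "5", "0.2",
      "--out", "pruned")

# 3. Keep only the pruned-in SNPs (ca. 55k loci in my case).
plink("--bfile", "filtered",
      "--extract", "pruned.prune.in",
      "--make-bed", "--out", "pruned_55k")
```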
I have seen other threads in this forum that suggest reducing the number of loci and claim that, as long as the loci are not under selection and not in strong linkage disequilibrium, the results should not differ much (are there also quantitative studies that support this statement?).
My question is also how best to subset this set of ca. 55k loci further, and how far down I should go. I was thinking of using smartPCA, which reports the most informative SNPs for each principal component (the so-called eigbestsnp output), to reduce the set to ca. 10k loci. Have you encountered such an approach, and would it be something you would recommend?
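For comparison, the simplest alternative I can think of is random thinning of the pruned set, for example along these lines (a sketch only; file names and the 10k target are placeholders). That would at least give me a neutral baseline to compare an eigbestsnp-based subset against.

```python
# Sketch: randomly thin the pruned set to ca. 10k loci by sampling variant IDs
# from the .bim file and writing a list that plink --extract can use.
# File names and the 10,000 target are placeholders.
import random

random.seed(42)  # fixed seed so the subset is reproducible

with open("pruned_55k.bim") as bim:
    snp_ids = [line.split()[1] for line in bim]  # second .bim column = variant ID

subset = random.sample(snp_ids, k=10_000)

with open("random_10k_snps.txt", "w") as out:
    out.write("\n".join(subset) + "\n")

# Afterwards: plink --bfile pruned_55k --extract random_10k_snps.txt --make-bed --out subset_10k
```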
On a side note, I was wondering whether you are aware of any concordance studies between STRUCTURE and ADMIXTURE. In theory, I could run the reduced set of 55k bi-allelic loci with ADMIXTURE and then compare the results with STRUCTURE (run on the smaller set of 150 markers). However, I would then be comparing the results of two different programs that in theory fit the same HWE-based model but use different optimization algorithms. What would you think of such an approach?
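If that comparison makes sense in principle, my plan for quantifying the concordance would be something like the sketch below (assumptions: both Q matrices cover the same individuals in the same order and the same K; since cluster labels are arbitrary in both programs, clusters are matched by a best one-to-one assignment first; parsing of the output files is omitted):

```python
# Sketch: quantify concordance between two ancestry-proportion (Q) matrices,
# e.g. one from ADMIXTURE and one from STRUCTURE, for the same individuals
# and the same K. Cluster labels are arbitrary in both programs, so the
# clusters are matched by a best one-to-one assignment before comparing.
# q_a and q_b are assumed to be (n_individuals x K) NumPy arrays; reading
# the program output files into such arrays is not shown here.
import numpy as np
from scipy.optimize import linear_sum_assignment

def mean_q_difference(q_a: np.ndarray, q_b: np.ndarray) -> float:
    """Mean absolute difference in ancestry proportions after matching clusters."""
    k = q_a.shape[1]
    # cost[i, j] = mean absolute difference between cluster i of q_a and cluster j of q_b
    cost = np.array([[np.mean(np.abs(q_a[:, i] - q_b[:, j])) for j in range(k)]
                     for i in range(k)])
    rows, cols = linear_sum_assignment(cost)  # optimal cluster matching
    return float(cost[rows, cols].mean())

# Toy sanity check: a column-permuted copy of the same Q matrix should give 0.
q1 = np.random.dirichlet(np.ones(3), size=100)
q2 = q1[:, [2, 0, 1]]
print(mean_q_difference(q1, q2))  # -> 0.0
```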
I’m happy to provide more information if anything is unclear.
Thank you and best regards,
Peter