Hi,
Quick question. Should flashPCA2 give near-identical results to PLINK's --pca approx? In the past I have observed identical results when using EIGENSTRAT.
I have run flashPCA2 on a data set pruned/cleaned like so:
plink --bfile ../bed/${i} --keep-allele-order --maf 0.0001 --geno 0.05 --hwe 0.000001 --remove IDsToExclude.txt --exclude $SHARE/DATA/GWAS/QC/high-LD-regions-hg19.txt --range --indep-pairwise 50 5 0.8 --out $i
Note the high LD cutoff (r^2 of 0.8). This was intentional: my aim is to use this SNP set as input to BOLT-LMM for computing the GRM, and I wanted to include as many SNPs as possible to capture relatedness within UKB.
I ran flashPCA2 like so:
flashpca --bfile eur --suffix .flashpca
I ran PLINK --pca like so:
plink2 --bfile eur --pca 20 approx --threads 20
I only realized afterwards that flashPCA2 returns the first 10 PCs, whereas PLINK returned 20 PCs.
The first and second PCs from the two tools are highly correlated, but not perfectly so. I assume this may be due to the different number of PCs returned, but it may also reflect differences between the algorithms.
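For reference, here is a minimal sketch of one way to make this kind of comparison. It assumes the PC columns from both tools have already been loaded with samples in the same order. The function names and the toy numbers are hypothetical, not my actual UKB output. It takes the absolute value of the correlation because the sign of an eigenvector is arbitrary, so the same PC can come out flipped between tools.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def compare_pcs(pcs_a, pcs_b):
    """Absolute correlation between matching PCs from two tools.

    pcs_a, pcs_b: lists of columns (one list per PC), with samples in
    the same order. Only the PCs present in both inputs are compared.
    """
    k = min(len(pcs_a), len(pcs_b))
    return [abs(pearson(pcs_a[i], pcs_b[i])) for i in range(k)]

# Toy data (hypothetical): tool A returned 2 PCs, tool B returned 3.
# PC1 from tool B is sign-flipped relative to tool A.
tool_a_pcs = [
    [0.1, -0.2, 0.3, -0.4],
    [0.5, 0.1, -0.3, -0.3],
]
tool_b_pcs = [
    [-0.1, 0.2, -0.3, 0.4],
    [0.5, 0.1, -0.3, -0.3],
    [0.2, -0.1, -0.2, 0.1],
]

correlations = compare_pcs(tool_a_pcs, tool_b_pcs)
for i, r in enumerate(correlations, start=1):
    print(f"PC{i}: |r| = {r:.4f}")
```

This only compares PCs pairwise by index. If two PCs have similar eigenvalues they can swap order or mix between tools, so a full cross-correlation matrix would be more informative in that case.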
Your thoughts on this are most appreciated.
Thanks,
Vince