flashPCA2 v PLINK2 pca approx


Vince Forgetta

Aug 10, 2017, 11:39:10 AM
to flashpca-users

Quick question: should flashPCA2 give near-identical results to PLINK --pca approx? In the past I have observed identical results when using EIGENSTRAT.

I have run flashPCA2 on a data set pruned/cleaned like so:

     plink --bfile ../bed/${i} --keep-allele-order --maf 0.0001 --geno 0.05 --hwe 0.000001 --remove IDsToExclude.txt --exclude $SHARE/DATA/GWAS/QC/high-LD-regions-hg19.txt --range --indep-pairwise 50 5 0.8 --out $i

Note the high LD cutoff (an r^2 threshold of 0.8 in --indep-pairwise). This was intentional: my aim is to use this SNP set as input to BOLT-LMM for computing the GRM, and I wanted to include as many SNPs as possible to capture relatedness within UKB.

I ran flashPCA2 like so:

    flashpca --bfile eur --suffix .flashpca

I ran PLINK --pca like so:

    plink2 --bfile eur --pca 20 approx --threads 20

I only realized afterwards that flashPCA2 returns the first 10 PCs, whereas PLINK returned 20 PCs.

The first and second PCs are highly correlated between the two tools, but not perfectly so. I assume this may be due to the different number of PCs returned, but it may also be due to differences in the algorithms.
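One way to quantify the agreement is the per-component Pearson correlation between the two sets of scores. The following is a minimal sketch, assuming both outputs have already been loaded into NumPy arrays with samples in the same row order and the ID columns dropped. The function name `pc_correlations` is hypothetical and not part of either tool. The absolute value is taken because the sign of a principal component is arbitrary, so a correlation of -1 is as good as +1.

```python
import numpy as np

def pc_correlations(a, b):
    """Absolute Pearson correlation between matching columns of a and b.

    a, b: arrays of shape (n_samples, n_pcs); rows must be in the same
    sample order. Only the PCs present in both inputs are compared.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    k = min(a.shape[1], b.shape[1])      # e.g. 10 vs 20 PCs -> compare 10
    a = a[:, :k] - a[:, :k].mean(axis=0)  # centre each column
    b = b[:, :k] - b[:, :k].mean(axis=0)
    num = (a * b).sum(axis=0)
    den = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    return np.abs(num / den)             # PC sign is arbitrary
```

Calling `pc_correlations(flashpca_scores, plink_scores)` (both variable names are placeholders) returns one value per shared PC, which makes it easy to see whether the disagreement is confined to the later components.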

Your thoughts on this are most appreciated.



Vince Forgetta

Aug 10, 2017, 1:50:11 PM
to flashpca-users
I also generated 10 PC using plink:

    plink2 --bfile eur --pca 10 approx --threads 20

Same issue: PC1 is still not perfectly correlated between PLINK and flashPCA2 (correlation ~0.993).
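A complementary check is whether the two tools recover the same subspace, even if individual PCs differ slightly. When neighbouring eigenvalues are close, individual components can rotate within that subspace, which lowers per-PC correlations without changing the structure captured. The sketch below, again assuming the scores are already in NumPy arrays with matching row order, computes the cosines of the principal angles between the two column spaces. The function name `principal_angle_cosines` is hypothetical. This is a generic linear-algebra diagnostic, not something either tool provides.

```python
import numpy as np

def principal_angle_cosines(a, b):
    """Cosines of the principal angles between the column spaces of a and b.

    Values close to 1 for every angle mean the two sets of PCs span
    (nearly) the same subspace, regardless of sign or rotation.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Orthonormal bases for the centred column spaces (reduced QR).
    qa, _ = np.linalg.qr(a - a.mean(axis=0))
    qb, _ = np.linalg.qr(b - b.mean(axis=0))
    # Singular values of qa^T qb are the cosines of the principal angles.
    s = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against rounding slightly above 1
```

If all cosines for the top 10 PCs are essentially 1 while some per-PC correlations are lower, the difference is more likely a rotation within the subspace than a substantive disagreement between the two methods.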



Gad Abraham

Sep 25, 2017, 7:23:36 AM
to flashpca-users
Hi Vince,

Did you solve this issue?
