flashPCA2 v PLINK2 pca approx

Vince Forgetta

Aug 10, 2017, 11:39:10 AM

to flashpca-users

Hi,

Quick question. Should flashPCA2 give near-identical results to PLINK --pca approx? In the past I have observed identical results when using EIGENSTRAT.

I have run flashPCA2 on a data set pruned/cleaned like so:

     plink --bfile ../bed/${i} --keep-allele-order --maf 0.0001 --geno 0.05 --hwe 0.000001 --remove IDsToExclude.txt --exclude $SHARE/DATA/GWAS/QC/high-LD-regions-hg19.txt --range --indep-pairwise 50 5 0.8 --out $i

Note the high LD cutoff. This was intentional: my aim is to use this SNP set as input to BOLT-LMM for computing the GRM, and I wanted to include as many SNPs as possible to capture relatedness within UKB.

I ran flashPCA2 like so:

    flashpca --bfile eur --suffix .flashpca

I ran PLINK --pca like so:

    plink2 --bfile eur --pca 20 approx --threads 20

I only realized afterwards that flashPCA2 returned the first 10 PCs, whereas PLINK returned 20 PCs.

The first and second PCs are highly correlated between the two tools, but not perfectly so. I assume this may be due to the different number of PCs returned, but it may also be due to differences in the algorithms.

Your thoughts on this are most appreciated.

Thanks,

Vince

Vince Forgetta

Aug 10, 2017, 1:50:11 PM

to flashpca-users

I also generated 10 PCs using PLINK:

    plink2 --bfile eur --pca 10 approx --threads 20

Same issue: PC1 is still not perfectly correlated between PLINK and flashPCA2. The correlation is ~0.993.
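For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of a per-PC correlation check in Python. It is not the exact script used above. The function name `pc_correlations` is hypothetical, and it assumes both PC matrices have already been loaded as arrays with samples in the same row order and one PC per column. The absolute value is taken because the sign of a principal component is arbitrary, so the same PC can come out flipped between tools. The data in the demo is synthetic, not from the `eur` data set.

```python
import numpy as np

def pc_correlations(pcs_a, pcs_b):
    """Absolute Pearson correlation between matching PC columns.

    Assumes rows (samples) are in the same order in both matrices.
    Only the first min(k_a, k_b) columns are compared.
    """
    pcs_a = np.asarray(pcs_a, dtype=float)
    pcs_b = np.asarray(pcs_b, dtype=float)
    if pcs_a.shape[0] != pcs_b.shape[0]:
        raise ValueError("both inputs must have the same number of samples")
    k = min(pcs_a.shape[1], pcs_b.shape[1])
    # abs() because PC signs are arbitrary and may differ between tools
    return np.array(
        [abs(np.corrcoef(pcs_a[:, j], pcs_b[:, j])[0, 1]) for j in range(k)]
    )

# Synthetic demo: the second matrix is a copy with one sign-flipped
# column and one slightly noisy column.
rng = np.random.default_rng(0)
a = rng.normal(size=(500, 3))
b = a.copy()
b[:, 1] *= -1                                # same PC, opposite sign
b[:, 2] += rng.normal(scale=0.1, size=500)   # small numerical difference
r = pc_correlations(a, b)
print(r)
```

Matching rows by sample ID before calling the function matters more than anything else here; a mismatch in sample order will lower the correlations regardless of the algorithms.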

Thanks,

Vince

Gad Abraham

Sep 25, 2017, 7:23:36 AM

to flashpca-users

Hi Vince,

Did you solve this issue?

Gad
