Comparing EIGENSTRAT smartpca and PLINK pca

Vince Forgetta

Jul 2, 2016, 8:03:13 PM
to plink2-users
Hi,

Happy PLINK user here. Particularly PLINK 1.9, great work!

I am quite impressed by the performance of --pca in PLINK 1.9, and so far I do not need the advanced features of EIGENSTRAT, so I would much prefer to use PLINK --pca.

However, when I compare the 10 PCs from PLINK --pca (affy.png) to those from EIGENSTRAT's smartpca.perl (affy.eigen.png), I get very different results.

Here is the code I used to generate the PCs:

# PLINK
plink --bfile affy --pca --out affy --neighbor 1 5
awk '$5<=-6||$5>=6{print}' affy.nearest > affy.outliers

# EIGENSTRAT
# Sample names too long so put dummy ones for now.
awk '{ print NR,NR,$3,$4,$5,$6 }' affy.fam > affy.eigen.fam
smartpca.perl -i affy.bed -a affy.bim -b affy.eigen.fam -p affy.eigen.plot -o affy.eigen.pca -e affy.eigen.eval -l affy.eigen.log
 
So, should I expect comparable pair plots from these two programs, or are the PCs only equivalent in aggregate somehow?

Thanks,

Vince
affy.png
affy.eigen.png

Christopher Chang

Aug 15, 2016, 12:56:31 PM
to plink2-users
By default, smartpca alternates between computing PCs and removing outliers from the dataset, where an outlier is defined as a sample "more than 6 sigma from the mean along one of the top 10 just-computed PCs". It repeats this until either no outliers remain or outliers have already been removed 5 times. This is frequently a good idea, and a reason to prefer smartpca to plink's basic --pca. However, if you want directly comparable results, you can disable outlier removal by adding "-m 0" to your smartpca.perl invocation.
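
Going the other way, a single pass of the "6 sigma along one of the top 10 PCs" rule can be approximated on PLINK's own output. The following is only a rough sketch, not smartpca's actual implementation. It assumes a PLINK 1.9 .eigenvec file with no header line and columns FID IID PC1 PC2 ..., it uses the population standard deviation (smartpca's exact computation may differ), and flag_outliers is a hypothetical helper name.

```shell
# One detection pass over a PLINK 1.9 .eigenvec file (assumed: no header line,
# columns FID IID PC1 PC2 ...). flag_outliers is a hypothetical helper name.
# Usage: flag_outliers FILE [SIGMA] [TOPK]
flag_outliers() {
  awk -v sigma="${2:-6}" -v topk="${3:-10}" '
    NR == 1 { npc = (NF - 2 < topk) ? NF - 2 : topk }   # number of PCs checked
    {
      id[NR] = $1 " " $2
      for (k = 1; k <= npc; k++) {
        v[NR, k] = $(k + 2)
        sum[k]   += $(k + 2)
        sumsq[k] += $(k + 2) * $(k + 2)
      }
    }
    END {
      for (k = 1; k <= npc; k++) {
        mean[k] = sum[k] / NR
        var = sumsq[k] / NR - mean[k] * mean[k]   # population variance (an assumption)
        sd[k] = (var > 0) ? sqrt(var) : 0
      }
      # Print FID IID once for any sample beyond sigma SDs on at least one PC
      for (i = 1; i <= NR; i++)
        for (k = 1; k <= npc; k++) {
          d = v[i, k] - mean[k]
          if (d < 0) d = -d
          if (sd[k] > 0 && d > sigma * sd[k]) { print id[i]; break }
        }
    }' "$1"
}

# Possible follow-up (not run here): drop flagged samples and recompute PCs.
#   flag_outliers affy.eigenvec > affy.pca_outliers
#   plink --bfile affy --remove affy.pca_outliers --pca --out affy.round2
```

Repeating the flag, remove, and recompute cycle until the outlier list is empty, or up to 5 times, would mimic the loop described above, though the resulting PCs should not be expected to match smartpca exactly.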