PCA vs MDS


wimgo...@hotmail.com

Dec 3, 2018, 10:13:52 AM
to plink2-users
I am struggling to understand how PLINK calculates PCA and MDS.
Until now, I assumed PLINK measured common variance/similarity at the genotype level.
However, I ran a small test with three hypothetical IDs, each with 4 SNPs (see the attached files). All three IDs have the same genotypes (AA, GT, TT, GT), but they differ in which allele is listed as A1 and which as A2. For example, at SNP2, ID1 has alleles G T while ID2 has alleles T G.
When I run a PCA in PLINK, these three IDs appear completely different (see the attached plot), even though their genotypes are identical.
Does this imply that the --pca function in PLINK performs the PCA column by column (allele by allele) rather than by genotype?

When I perform MDS, these three IDs appear exactly the same (see the attached plot).
Can I therefore assume that MDS does take genotypes into account in its calculation?

My script (in R) was as follows:

# `plink` is assumed to hold the path to the PLINK executable (defined earlier in the session)
system(paste(plink," --chr-set 18 --allow-extra-chr --file ./Merge/Test --read-genome ./PCA/Test.genome --cluster --pca --out ./PCA/Test_Clean_noXY",sep=""))

bestand<-"Test"

#load eigenvalues and eigenvectors
library(readr) # provides read_csv() and read_delim()
# file suffix must match the --out prefix used above (case matters on case-sensitive file systems)
eigenval<-read_csv(paste(getwd(),"/PCA/",bestand,"_Clean_noXY.eigenval", sep=""), 
                   col_names = FALSE, col_types = cols(X1 = col_number()))
eigenvec<- read_delim(paste(getwd(),"/PCA/",bestand,"_Clean_noXY.eigenvec", sep=""), 
                      " ", escape_double = FALSE, col_names = FALSE, 
                      trim_ws = TRUE)

eigenvec

plot(eigenvec$X3,eigenvec$X4,xlab="PC1",ylab="PC2",main="PCA_example")


#first make genome file
system(paste(plink," --chr-set 18 --allow-extra-chr --file ./Merge/Test --genome --out ./PCA/Test",sep=""))
#second do cluster
system(paste(plink," --chr-set 18 --allow-extra-chr --file ./Merge/Test --read-genome ./PCA/Test.genome --cluster --mds-plot 3 --out ./PCA/Test",sep=""))

test<-read.table("./PCA/Test.mds",header=TRUE)

plot(test$C1,test$C2,xlab="C1",ylab="C2",main="MDS_example")

Test.ped
Test.map
PCA_example.PNG
MDS_example.PNG

Christopher Chang

Dec 3, 2018, 10:57:25 AM
to plink2-users
PCA with *exactly* identical genotypes will yield essentially random results due to floating-point error (you basically have a bunch of 0/0 terms). This never comes up in practice if you use appropriate MAF and HWE filters.
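
To make the 0/0 point concrete, here is a minimal R sketch. It assumes the PCA is taken from a variance-standardized relationship matrix, and it is an illustration of that general construction, not PLINK's actual code. The genotype calls mirror the test set described above; the `counted` alleles and all variable names are hypothetical choices made for the example.

```r
# Hypothetical calls mirroring the test set: 3 samples x 4 SNPs.
# Allele order within a call differs between samples ("G T" vs "T G").
calls <- rbind(
  ID1 = c("A A", "G T", "T T", "G T"),
  ID2 = c("A A", "T G", "T T", "T G"),
  ID3 = c("A A", "G T", "T T", "T G")
)
counted <- c("A", "G", "T", "G")  # allele counted at each SNP (arbitrary choice)

# Dosage = number of copies of the counted allele in a call.
# Counting copies ignores the order in which the two alleles are written.
count_allele <- function(call, allele) sum(strsplit(call, " ")[[1]] == allele)
dosage <- matrix(0, nrow(calls), ncol(calls),
                 dimnames = list(rownames(calls), NULL))
for (i in seq_len(nrow(calls))) {
  for (j in seq_len(ncol(calls))) {
    dosage[i, j] <- count_allele(calls[i, j], counted[j])
  }
}

# Variance-standardize each SNP: (dosage - 2p) / sqrt(2p(1 - p)).
p <- colMeans(dosage) / 2                 # allele frequency per SNP
centered <- sweep(dosage, 2, 2 * p)       # subtract the column mean (all zeros here)
denom <- sqrt(2 * p * (1 - p))            # zero for monomorphic SNPs
standardized <- sweep(centered, 2, denom, "/")  # 0/0 gives NaN for monomorphic SNPs

# Relationship matrix built from the SNPs that still have a nonzero denominator.
keep <- denom > 0
grm <- tcrossprod(standardized[, keep]) / sum(keep)

print(dosage)
print(standardized)
print(eigen(grm, symmetric = TRUE)$values)
```

All three rows of `dosage` are identical, so the genotypes are treated the same regardless of allele order. Every centered value is zero, the monomorphic SNPs give 0/0, and the remaining matrix is all zeros. Its eigenvalues are all zero, so any orthonormal basis is an equally valid set of eigenvectors, and whatever comes out is decided by rounding noise rather than by the data.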