PCA vs MDS


wimgo...@hotmail.com

Dec 3, 2018, 10:13:52 AM

to plink2-users

I am struggling with the way PLINK calculates PCA/MDS.

Until now, I assumed that PLINK measured shared variance/similarity at the genotype level.

However, I ran a small test with 3 hypothetical IDs, each with 4 SNPs (see the attached files). All three IDs have the same genotypes (AA, GT, TT, GT), but they differ in which allele is listed first and which second (A1 and A2). For example, ID1 has alleles G T for SNP2, while ID2 has alleles T G.

When I run a PCA in PLINK, these three IDs appear completely different (see the attached file), even though they have the same genotypes.

Does this imply that the --pca function in PLINK performs the PCA column by column (that is, allele by allele) and not by genotype?

When I run MDS, these three IDs appear identical (see the attached file).

Can I conclude from this that MDS does take genotypes into account in its calculation?
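To make the genotype-level view concrete, here is a minimal sketch in base R. It assumes the usual additive coding, where each genotype is stored as the number of copies of one chosen allele, so the order in which the two alleles are written cannot matter. The helper allele_count() is hypothetical and written only for this illustration; it is not PLINK's own code.

```r
# Hypothetical helper: count copies of a reference allele in a two-allele
# genotype written as "X Y". The order of the two alleles is irrelevant.
allele_count <- function(genotype, ref) {
  alleles <- strsplit(genotype, " ")[[1]]
  sum(alleles == ref)
}

# SNP2 for ID1 ("G T") and ID2 ("T G"), counting copies of G
id1 <- allele_count("G T", ref = "G")
id2 <- allele_count("T G", ref = "G")

# Alleles shared identical-by-state at a biallelic SNP (0, 1 or 2)
ibs <- 2 - abs(id1 - id2)
print(c(id1 = id1, id2 = id2, ibs = ibs))
```

Under this coding both individuals get the same value at SNP2 and share both alleles by state, which is consistent with the three IDs collapsing onto one point in the MDS plot.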

My script (in R) was as follows:

library(readr)  # provides read_csv() and read_delim()

# `plink` holds the path to the PLINK executable.
# Test.genome must already exist (see the --genome step below).
system(paste0(plink, " --chr-set 18 --allow-extra-chr --file ./Merge/Test --read-genome ./PCA/Test.genome --cluster --pca --out ./PCA/Test_Clean_noXY"))

bestand <- "Test"

# load eigenvalues and eigenvectors (file names match the --out prefix above)
eigenval <- read_csv(file.path(getwd(), "PCA", paste0(bestand, "_Clean_noXY.eigenval")),
                     col_names = FALSE, col_types = cols(X1 = col_number()))
eigenvec <- read_delim(file.path(getwd(), "PCA", paste0(bestand, "_Clean_noXY.eigenvec")),
                       " ", escape_double = FALSE, col_names = FALSE,
                       trim_ws = TRUE)
eigenvec

# X1 and X2 are the family and individual IDs; X3 and X4 are PC1 and PC2
plot(eigenvec$X3, eigenvec$X4, xlab = "PC1", ylab = "PC2", main = "PCA_example")

# first, make the genome file
system(paste0(plink, " --chr-set 18 --allow-extra-chr --file ./Merge/Test --genome --out ./PCA/Test"))

# second, cluster and write the MDS coordinates
system(paste0(plink, " --chr-set 18 --allow-extra-chr --file ./Merge/Test --read-genome ./PCA/Test.genome --cluster --mds-plot 3 --out ./PCA/Test"))

# file name matches the --out prefix above (case matters on some file systems)
test <- read.table("./PCA/Test.mds", header = TRUE)
plot(test$C1, test$C2, xlab = "C1", ylab = "C2", main = "MDS_example")

Test.ped

Test.map

PCA_example.PNG

MDS_example.PNG

Christopher Chang

Dec 3, 2018, 10:57:25 AM

to plink2-users

PCA with *exactly* identical genotypes will yield essentially random results due to floating point error (you basically have a bunch of 0/0 terms). This never comes up in practice if you use appropriate MAF and HWE filters.
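The 0/0 terms can be illustrated with a small base R sketch. It assumes a common variance-standardised formulation of genotype PCA (centre each SNP on 2p, divide by the square root of 2p(1-p)), which may differ in detail from PLINK's actual computation. The allele-count coding (2, 1, 0, 1) is an assumed stand-in for the AA, GT, TT, GT genotypes in the question, not the contents of the attached files.

```r
# Three samples with identical genotypes at four SNPs, coded as allele counts.
# The coding (2, 1, 0, 1) is an assumed example.
geno <- matrix(rep(c(2, 1, 0, 1), times = 3), nrow = 3, byrow = TRUE,
               dimnames = list(c("ID1", "ID2", "ID3"), paste0("SNP", 1:4)))

# Centre each SNP on its mean: every entry becomes exactly 0.
centred <- sweep(geno, 2, colMeans(geno))

# Per-SNP allele frequency p and the binomial variance term 2p(1-p).
p <- colMeans(geno) / 2
snp_var <- 2 * p * (1 - p)

# Variance-standardised genotypes: monomorphic SNPs (p = 0 or 1) give 0/0 = NaN.
standardised <- sweep(centred, 2, sqrt(snp_var), "/")
print(standardised)

# The sample-by-sample cross-product of the centred data is the zero matrix,
# so no direction explains more variance than any other.
print(tcrossprod(centred))
```

With nothing but zeros (or undefined terms) to decompose, any set of eigenvectors is as valid as any other, so whatever comes out is determined by numerical noise rather than by the data.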
