I am struggling with the way PLINK calculates PCA/MDS.
Before, I always thought PLINK looked at 'common variance/similarity' at a genotype level.
However, I ran a small test with 3 hypothetical IDs, each with 4 SNPs (see additional files). All three IDs have the same genotypes (AA, GT, TT, GT), but they differ in which allele is listed as A1 and which as A2. For example, for SNP2, ID1 has alleles G T and ID2 has alleles T G.
When I run a PCA in PLINK, these three IDs appear completely different (see additional file), even though they have the same genotypes.
Does this imply that the --pca function in PLINK performs a PCA column by column (so allele by allele) and not by genotype?
When I perform MDS, these three IDs appear identical (additional file).
Can I conclude from this that MDS works at the genotype level?
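To make clear what I mean by "same genotype", here is a minimal R sketch. It is not PLINK's actual computation, only an illustration of genotype-level versus allele-column coding. Only the SNP2 allele order for ID1 and ID2 is taken from my test; the other allele orders, the reference alleles in `ref`, and all variable names are made up for the example.

```r
# Hypothetical toy data: 3 IDs x 4 SNPs, stored as separate A1 and A2 matrices.
# Allele orders other than SNP2 for ID1/ID2 are invented for illustration.
ids  <- c("ID1", "ID2", "ID3")
snps <- paste0("SNP", 1:4)
a1 <- matrix(c("A", "G", "T", "G",
               "A", "T", "T", "T",
               "A", "G", "T", "T"),
             nrow = 3, byrow = TRUE, dimnames = list(ids, snps))
a2 <- matrix(c("A", "T", "T", "T",
               "A", "G", "T", "G",
               "A", "T", "T", "G"),
             nrow = 3, byrow = TRUE, dimnames = list(ids, snps))

# Arbitrary reference allele per SNP (an assumption of this sketch)
ref     <- c("A", "G", "T", "G")
ref_mat <- matrix(ref, nrow = 3, ncol = 4, byrow = TRUE)

# Genotype-level coding: count copies of the reference allele (0/1/2),
# which ignores whether an allele sits in the A1 or the A2 column
dosage <- (a1 == ref_mat) + (a2 == ref_mat)

nrow(unique(dosage))   # 1: all three IDs are identical at the genotype level
nrow(unique(a1))       # 3: the A1 column alone differs between the IDs
apply(dosage, 2, var)  # every SNP has zero variance across the three IDs
```

With this coding, G T and T G give the same value, which is what I expected PLINK to do. Note that at the genotype level each SNP then has zero variance across the three IDs.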
My script (in R) was as follows:
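# `plink` is assumed to hold the path to the PLINK executable.
# Note: --read-genome needs ./PCA/Test.genome, which is only created by the --genome step further down.
system(paste0(plink, " --chr-set 18 --allow-extra-chr --file ./Merge/Test --read-genome ./PCA/Test.genome --cluster --pca --out ./PCA/Test_Clean_noXY"))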
library(readr)  # provides read_csv(), read_delim(), cols(), col_number()
bestand <- "Test"
#load eigenvalues and eigenvectors (file names must match the --out prefix, including case)
eigenval <- read_csv(file.path("PCA", paste0(bestand, "_Clean_noXY.eigenval")),
                     col_names = FALSE, col_types = cols(X1 = col_number()))
eigenvec <- read_delim(file.path("PCA", paste0(bestand, "_Clean_noXY.eigenvec")),
                       delim = " ", escape_double = FALSE, col_names = FALSE,
                       trim_ws = TRUE)
eigenvec
plot(eigenvec$X3,eigenvec$X4,xlab="PC1",ylab="PC2",main="PCA_example")
#first make genome file
system(paste(plink," --chr-set 18 --allow-extra-chr --file ./Merge/Test --genome --out ./PCA/Test",sep=""))
#second do cluster
system(paste(plink," --chr-set 18 --allow-extra-chr --file ./Merge/Test --read-genome ./PCA/Test.genome --cluster --mds-plot 3 --out ./PCA/Test",sep=""))
test <- read.table("./PCA/Test.mds", header = TRUE)
plot(test$C1,test$C2,xlab="C1",ylab="C2",main="MDS_example")