Dear Max, Marcel, all,
Just to share my experience with parallel analysis, in case anyone else also wants to use a systematic method for finding the number of shape modes to retain:
- I extracted the shape matrix from my registered dataset to a CSV file using this small Scala program (I have 33 shapes, including the reference, each with 5166 points in its point set):
package com.example

import java.io.{BufferedWriter, File, FileWriter, FilenameFilter}

import scalismo.io.MeshIO

object extractShapeMatrix {

  def main(args: Array[String]): Unit = {
    scalismo.initialize()
    implicit val rng = scalismo.utils.Random(42)

    val shapesDir = "E:\\data\\"

    // read all registered meshes (in correspondence) from the data directory
    val shapes = new File(shapesDir).listFiles(new FilenameFilter {
      override def accept(dir: File, name: String): Boolean = name.endsWith(".stl")
    }).map(shapeFile => MeshIO.readMesh(shapeFile).get)

    // one row per shape; the columns are the flattened (x, y, z) coordinates of all points
    val result = new StringBuilder
    shapes.zipWithIndex.foreach { case (shape, shapeNum) =>
      println("processing shape #" + shapeNum)
      val row = shape.pointSet.points
        .map(p => s"${p.x},${p.y},${p.z}")
        .mkString(",")
      result.append(row).append("\n")
    }

    val outputFile = new File(shapesDir + "shapeMatrix.csv")
    val bw = new BufferedWriter(new FileWriter(outputFile))
    bw.write(result.toString)
    bw.close()
  }
}
- Then, in this piece of Python code, I read the shape matrix into a pandas DataFrame, mean/std-normalize it, generate random matrices of the same size with standard-normal entries, and compute the PCA for both the real and the random data:
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# the CSV written above has no header row, so read it with header=None
shapeMatrix = pd.read_csv("E:\\data\\shapeMatrix.csv", header=None)
# drop any all-NaN columns (e.g. from a trailing separator)
shapeMatrix.dropna(axis=1, inplace=True)

# mean/std normalization of each column
normalized_shapeMatrix = (shapeMatrix - shapeMatrix.mean()) / shapeMatrix.std()

# PCA of the real data (at most n_shapes - 1 informative components)
pca = PCA(shapeMatrix.shape[0] - 1)
pca.fit(normalized_shapeMatrix)
transformedShapeMatrix = pca.transform(normalized_shapeMatrix)
# np.savetxt("pca_data.csv", pca.explained_variance_, delimiter=",")

# PCA of random matrices of the same size, averaged over 10 repetitions
random_eigenvalues = np.zeros(shapeMatrix.shape[0] - 1)
for i in range(10):
    random_shapeMatrix = pd.DataFrame(
        np.random.normal(0, 1, [shapeMatrix.shape[0], shapeMatrix.shape[1]]))
    pca_random = PCA(shapeMatrix.shape[0] - 1)
    pca_random.fit(random_shapeMatrix)
    transformedRandomShapeMatrix = pca_random.transform(random_shapeMatrix)
    random_eigenvalues = random_eigenvalues + pca_random.explained_variance_ratio_
random_eigenvalues = random_eigenvalues / 10
# np.savetxt("pca_random.csv", random_eigenvalues, delimiter=",")

plt.plot(pca.explained_variance_ratio_, '--bo', label='pca-data')
plt.plot(random_eigenvalues, '--rx', label='pca-random')
plt.legend()
plt.title('parallel analysis plot')
plt.show()
Comparing the explained-variance ratios of the leading components shows up to which component the eigenvalue of the real data is greater than the eigenvalue of the random samples, which indicates the number of shape modes to retain (8 modes in this example). This is the chart that the code generates (see the figure in the attachment).
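The cut-off can also be read off programmatically rather than from the plot. Here is a minimal sketch (it assumes the pca.explained_variance_ratio_ and random_eigenvalues arrays from the script above are still in scope) that simply counts the leading components whose explained-variance ratio exceeds the averaged random one:

# count the leading components of the real data that stay above the random curve
data_ratios = pca.explained_variance_ratio_
above_random = data_ratios > random_eigenvalues
# np.argmin returns the index of the first component that drops below the random
# curve; if all components stay above it, keep them all
n_modes = int(np.argmin(above_random)) if not above_random.all() else len(above_random)
print("number of shape modes to retain:", n_modes)

This only automates the visual reading of the plot, so it should give the same 8 modes here.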
I'd be grateful if you could share your opinion on this with me.
Best wishes,
Saeed