About model validation

Bereket Workie Anteneh

Jul 12, 2021, 7:57:06 AM
to scalismo
Dear Scalismo community,

I have built an SSM of the proximal femur from 24 training sets. I would now like to validate the PCA model in Scalismo in terms of generalization ability, specificity and compactness, but I am struggling to write the code for these metrics.
Would anyone be able to share code that computes generalization ability, specificity and compactness? Any help would be greatly appreciated.


With regards, 

Marcel Luethi

Jul 13, 2021, 3:51:33 AM
to Bereket Workie Anteneh, scalismo
Hi Bereket

The following code snippet computes the generalisation, specificity and the variance (which you need for computing compactness).

I hope this helps to get you started. You might also want to read the API doc of the individual methods to learn what the parameters mean.

Best regards,

Marcel
import scalismo.geometry.{_3D, EuclideanVector}
import scalismo.mesh.TriangleMesh
import scalismo.statisticalmodel.PointDistributionModel
import scalismo.statisticalmodel.dataset.{DataCollection, ModelMetrics}

val reference : TriangleMesh[_3D] = ???
val trainingData : Seq[TriangleMesh[_3D]] = ???
val dataCollection : DataCollection[_3D, TriangleMesh, EuclideanVector[_3D]] = DataCollection.fromTriangleMesh3DSequence(reference, trainingData)
val model = PointDistributionModel.createUsingPCA(dataCollection)

// Generalisation: how well the model can represent the meshes in the data collection
val generalisation : Double = ModelMetrics.generalization(model, dataCollection).get
// Specificity: how close 10 random samples from the model stay to the training meshes
val specificity : Double = ModelMetrics.specificity(model, trainingData, 10)
// Variance captured in the first 5 components
val variance : Double = breeze.linalg.sum(model.gp.variance(0 until 5))

Bereket Workie Anteneh

Jul 15, 2021, 8:25:40 AM
to scalismo
Thank you Marcel, that is exactly what I needed.
However, I still have a question about the line "val trainingData : Seq[TriangleMesh[_3D]] = ???".
My training data is stored under "C:\\Users\\user\\Documents\\minimal-scalismo-seed-master\\data\\2Projecetd Meshes\\".
When I try to declare the path as shown below
val trainingData : Seq[TriangleMesh[_3D]] = MeshIO.readMesh(new File("C:\\Users\\user\\Documents\\minimal-scalismo-seed-master\\data\\2Projecetd Meshes\\")).get
I get an error: "required: Seq[TriangleMesh[_3D]], found: TriangleMesh[_3D]".
I have tried to fix it but without success. What would be the correct way to load the training data from the directory mentioned above?

With regards,

Marcel Luethi

Jul 16, 2021, 10:39:18 PM
to Bereket Workie Anteneh, scalismo
Hi Bereket

As the type signature says, you need a sequence of triangle meshes. To obtain such a sequence, you will need to read several meshes individually, typically within a for loop. The following code illustrates how this can be done. In the example, all files in a directory that end with .ply are read and returned as a sequence.

import java.io.File
import scalismo.io.MeshIO

def readMeshesFromDirectory(directory : File) : Seq[TriangleMesh[_3D]] = {
  // read every file ending in .ply and collect the resulting meshes into a sequence
  val plyFiles = directory.listFiles().filter(file => file.getName.endsWith(".ply"))
  val meshes = for (plyFile <- plyFiles) yield {
    MeshIO.readMesh(plyFile).get
  }
  meshes.toSeq
}

val trainingData = readMeshesFromDirectory(new File("directoryContainingMeshes"))
Best regards,

Marcel

Bereket Workie Anteneh

Jul 18, 2021, 10:50:45 AM
to Marcel Luethi, scalismo
Thank you Marcel, it works well now.
After running the code you provided, the following numbers appear in the result panel:
generalization 1.4810818231927857E-6
specificity    1.4436597052211486
variance       437794.1382516096
What do these numbers indicate?

With regards,

Marcel Luethi

Jul 19, 2021, 2:41:05 AM
to Bereket Workie Anteneh, scalismo
Hi Bereket

The basic unit in Scalismo is mm, which means you have a generalisation error of 1.48e-6 mm (so almost 0), a specificity of 1.44 mm and a variance of 437794 mm^2.

The metrics are described in depth in the paper by Styner et al., "Evaluation of 3D correspondence methods for model building" (IPMI 2003), which is also cited in the Scalismo API documentation.

Note that in the paper they describe compactness as the percentage of variance captured by the first n principal components. But once you know how to compute the variance, this is straightforward to derive.
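For illustration, a small helper along these lines could be used; the name compactness and its signature are just an example, not part of the Scalismo API:

// compactness as the fraction of the total model variance captured by the first n principal components
def compactness(model : PointDistributionModel[_3D, TriangleMesh], n : Int) : Double = {
  val variancePerComponent = model.gp.variance
  breeze.linalg.sum(variancePerComponent(0 until n)) / breeze.linalg.sum(variancePerComponent)
}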

Best regards,

Marcel

Bereket Workie Anteneh

unread,
Jul 23, 2021, 8:30:55 AMJul 23
to Marcel Luethi, scalismo
Thank you again Marcel.
From the code you provided, "val specificity : Double = ModelMetrics.specificity(model, trainingData, 10)",
what does the number 10 stand for? Should it change for each principal component/mode, or is it taken as a constant?

Andreas Morel-Forster

Jul 27, 2021, 4:52:25 AM
to scalismo
Hi Bereket

The number 10 indicates that the metric is calculated from 10 samples. To get more precise and reliable results, you might have to use a higher number. See the documentation of the function, attached below, for more information.

Best, Andreas



/**
 * Returns the specificity metric of the Statistical Mesh Model, that is how close the model remains to the category of shapes it is supposed to represent
 * @param pcaModel model to be evaluated
 * @param data test data to verify specificity against
 * @param nbSamples number of samples drawn to compute the average
 *
 * The implementation of this metric is inspired from:
 * Styner, Martin A., et al. "Evaluation of 3D correspondence methods for model building." Information Processing in Medical Imaging. Springer Berlin Heidelberg, 2003.
 *
 * The general idea is as follows:
 *
 * 1 - sample a shape from the mesh model
 * 2 - compute the average mesh distance (see [[scalismo.mesh.MeshMetrics]]) of the sample to all elements of the given sequence of meshes and select the minimum distance
 *
 * These steps are then repeated nbSamples times and the average value is returned.
 */
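For example, with the model and training data from Marcel's snippet above, a more stable (but slower to compute) estimate could be obtained simply by passing a larger sample count; 1000 here is an arbitrary choice:

// nbSamples = 1000: more random model samples give a more stable specificity estimate
val specificityMoreSamples : Double = ModelMetrics.specificity(model, trainingData, 1000)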

Elon

Jul 29, 2021, 10:37:05 PM
to scalismo
Thank you Marcel for providing the code snippet. However, when evaluating the following line, I receive an IllegalArgumentException:
val dataCollection : DataCollection[_3D, TriangleMesh, EuclideanVector[_3D]] = DataCollection.fromTriangleMesh3DSequence(reference, trainingSet)
Does anyone know why this error is occurring?
Full stack trace:
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed
at scala.Predef$.require(Predef.scala:327)
at scalismo.statisticalmodel.dataset.DataCollection$.differenceFieldToReference(DataCollection.scala:97)
at scalismo.statisticalmodel.dataset.DataCollection$.$anonfun$fromTriangleMesh3DSequence$1(DataCollection.scala:109)
at scala.collection.immutable.ArraySeq.$anonfun$map$1(ArraySeq.scala:71)
at scala.collection.immutable.ArraySeq.$anonfun$map$1$adapted(ArraySeq.scala:71)
at scala.collection.immutable.ArraySeq$.tabulate(ArraySeq.scala:288)
at scala.collection.immutable.ArraySeq$.tabulate(ArraySeq.scala:267)
at scala.collection.ClassTagIterableFactory$AnyIterableDelegate.tabulate(Factory.scala:664)
at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:71)
at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:35)
at scalismo.statisticalmodel.dataset.DataCollection$.fromTriangleMesh3DSequence(DataCollection.scala:108)
at ssm.robustness$.main(robustness.scala:37)
at ssm.robustness.main(robustness.scala)

Marcel Luethi

Aug 2, 2021, 4:47:16 AM
to Elon, scalismo
Dear Elon

The typical cause for this message is that your meshes are not in correspondence. You can easily check this by printing out the number of points for each mesh:
val trainingSet : Seq[TriangleMesh[_3D]] = ???
for (mesh <- trainingSet) {
  println(mesh.pointSet.numberOfPoints)
}

Best regards,

Marcel

Elon

Aug 2, 2021, 2:30:39 PM
to scalismo
Hi Marcel,

Thank you, that makes sense. I managed to get everything working.

Cheers,
Elon

Michael Lennon

Aug 13, 2021, 4:02:19 AM
to scalismo
Hi Marcel,

I was looking at this previous response of yours and have a question about generalisation. In the code snippet you use dataCollection as an input, which I believe contains the meshes that were used to build the SSM; in effect, this is the training set. I understand that the training set is used for the specificity metric, but my interpretation is that the generalisation metric should be computed from a test set of meshes evaluated against the SSM (as per the API doc). So if I want to calculate the generalisation of a model, should I use a set of meshes (or a single mesh?) that is independent of the meshes used to train the SSM?

Best regards,

Michael

Marcel Luethi

Aug 16, 2021, 4:18:37 AM
to Michael Lennon, scalismo

Hi Michael

Thanks for pointing this out. The correct way to do it is to either have a separate test set or to use leave-one-out cross-validation. Methods for cross-validation are implemented in Scalismo in the object
scalismo.statisticalmodel.dataset.Crossvalidation
Best regards,

Marcel


Michael Lennon

Aug 18, 2021, 2:54:29 AM
to scalismo
Thank you Marcel,

I am interested in leave-one-out cross-validation to compare models. I cannot work out how to apply it to a simple triangle-mesh PDM created from a DataCollection in Scalismo. I believe I can run it at a basic level by supplying a data collection and an evaluation function, but I am unclear on what the evaluation function should look like, so any examples you have would be appreciated. I am also aware that I could build different models from one data collection, for example a truncated model using only the first 5 principal modes. Would your method allow me to compare a metric for the full and the truncated models?

val referenceMesh: TriangleMesh[_3D] = ???
val trainingMeshes: Seq[TriangleMesh[_3D]] = ???
val dataCollection = DataCollection.fromTriangleMesh3DSequence(referenceMesh, trainingMeshes)
val pdmTriangleMesh = PointDistributionModel.createUsingPCA(dataCollection)
val truncatedMesh = pdmTriangleMesh.truncate(5)
val generality_full = Crossvalidation.leaveOneOutCrossvalidation(dataCollection, evalFun = ???)
val generality_truncated = Crossvalidation.leaveOneOutCrossvalidation(dataCollection, evalFun = ???)

Best regards,

Michael

Marcel Luethi

Aug 19, 2021, 2:59:42 AM
to Michael Lennon, scalismo

Hi Michael,

The following is a basic example of how to use the crossvalidation. You can run it using either a truncated model or a full one. You should see that the truncated model has a much higher generalisation error.

val referenceMesh: TriangleMesh[_3D] = ???
val trainingMeshes: Seq[TriangleMesh[_3D]] = ???
val dataCollection = DataCollection.fromTriangleMesh3DSequence(referenceMesh, trainingMeshes)

// Computes whatever metric is appropriate to compare the left-out mesh with the model.
// For generalisation it is a projection into the model, followed by the distance.
def evalFun(pdm : PointDistributionModel[_3D, TriangleMesh], mesh : TriangleMesh[_3D]) : Double = {
  MeshMetrics.avgDistance(pdm.project(mesh), mesh)
}
val generality_full = Crossvalidation.leaveOneOutCrossvalidation(dataCollection, evalFun)

Best regards,
Marcel


Elon

Aug 19, 2021, 4:54:40 PM
to scalismo
Hi Marcel,

Thanks for the example code. The function returns a Vector of ArraySeqs, for example:
Vector(ArraySeq(0.8292422412339956), ArraySeq(0.8595790418832963), ArraySeq(0.02897680945534543))
I'm assuming we need to take the median of these doubles in order to calculate the generality as described in Styner et al. Is this correct?

Much appreciated,
Elon

Marcel Luethi

Aug 20, 2021, 2:59:55 AM
to Elon, scalismo
Hi Elon

Yes, you need to aggregate the values somehow. I am not sure, though, whether Styner proposed using the mean or the median.
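For example, assuming the cross-validation output is stored in a value results (of type Seq[Seq[Double]], matching the Vector of ArraySeqs you showed), the mean could be computed like this; replace the mean by the median if that turns out to be what Styner et al. propose:

val results : Seq[Seq[Double]] = ??? // output of Crossvalidation.leaveOneOutCrossvalidation
// flatten the per-fold results and average them to obtain a single generalisation value
val distances = results.flatten
val meanGeneralisation = distances.sum / distances.size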

Best regards,
Marcel

Michael Lennon

Sep 3, 2021, 3:18:10 AM
to scalismo
Hi Scalismo community,

Is anyone able to help me with the code to test a truncated model for generality? The previous code from Marcel was helpful for testing the full model, since the inputs are the dataCollection used to build that model plus the evaluation function. However, if I truncate the model down to rank 5, I cannot work out how to include this in the cross-validation. I believe this is related to the optional "biasModelAndRank" parameter. Any help would be greatly appreciated, as I cannot determine how to do this.

val referenceMesh: TriangleMesh[_3D] = ???
val trainingMeshes: Seq[TriangleMesh[_3D]] = ???
val dataCollection = DataCollection.fromTriangleMesh3DSequence(referenceMesh, trainingMeshes)
val pdmTriangleMesh = PointDistributionModel.createUsingPCA(dataCollection)
val truncatedMesh = pdmTriangleMesh.truncate(5)

// Computes whatever metric is appropriate to compare the left-out mesh with the model.
// For generalisation it is a projection into the model, followed by the distance.
def evalFun(pdm : PointDistributionModel[_3D, TriangleMesh], mesh : TriangleMesh[_3D]) : Double = {
  MeshMetrics.avgDistance(pdm.project(mesh), mesh)
}
val generality_full = Crossvalidation.leaveOneOutCrossvalidation(dataCollection, evalFun)
val generality_truncated = Crossvalidation.leaveOneOutCrossvalidation(dataCollection, evalFun, biasModelAndRank = ???)

Marcel Luethi

Sep 5, 2021, 4:12:57 AM
to Michael Lennon, scalismo
Dear Michael

Leave-one-out cross-validation works by virtue of the DataCollection: it selects all but one of the datasets, builds a model from them, and uses the left-out dataset for testing. It is not possible to specify the rank of the model directly, but you can do the truncation in the evaluation function.

The parameter biasModelAndRank is used to provide an optional model with which the model built from the data is augmented.
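A sketch of what this could look like, reusing the evalFun pattern from the earlier example (evalFunTruncated is just an illustrative name, and rank 5 is only an example value):

def evalFunTruncated(pdm : PointDistributionModel[_3D, TriangleMesh], mesh : TriangleMesh[_3D]) : Double = {
  // truncate the model built from the left-in data before projecting the left-out mesh
  val truncatedModel = pdm.truncate(5)
  MeshMetrics.avgDistance(truncatedModel.project(mesh), mesh)
}
val generality_truncated = Crossvalidation.leaveOneOutCrossvalidation(dataCollection, evalFunTruncated)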

Best regards,
Marcel
