Procrustes Variance calculations from scratch

11 views
Skip to first unread message

Ricardo Ely

unread,
Oct 23, 2025, 8:30:34 AMOct 23
to geomorph R package
Dear all, 

To better my understanding of how morphological disparity is estimated for geometric morphometrics, I wanted to attempt coding calculation of Procrustes variances from scratch. Calculation of Procrustes variance via morphol.disparity() in geomorph compared to directly taking the trace of the estimated covariance matrix divided by the number of observations is yielding huge discrepancies in both approaches, however. In the code below, the estimation of Procrustes variance is inspired by the definition provided in the help file for morphol.disparity: "Procrustes variance is the same sum of the diagonal elements of the group covariance matrix divided by the number of observations in the group (e.g., Zelditch et al. 2012)." When I divide the covariance matrix trace by the number of observations (assuming that's the number of specimens/species/etc), this results in a very low estimate compared to the value provided by morphol.disparity. When just using the trace, the values are much closer to each other. So then, which is the correct approach? 

Below I'm using the simplest case where we're estimating disparity for a whole landmark set, with no groups specified.  


# simulate 20 3D landmarks for 50 specimens
coords <- array(rnorm(50 * 20 * 3), dim = c(20, 3, 50))

gdf <- list(coords = coords)

# Morphological disparity for entire data set
proc.var1 <- morphol.disparity(coords ~ 1, groups = NULL, data = gdf,
                  print.progress = FALSE)

# manual procrustes variance

proc.lm <- procD.lm(coords ~ 1, iter = 10000, data = gdf)
residual.lms <- arrayspecs(proc.lm$residuals, p = 20, k = 3)
cov.mat <- paleomorph::covar(residual.lms)
cov.mat <- as.matrix(Matrix::nearPD(cov.mat)$mat)

proc.var2 <- sum(diag(cov.mat))/dim(coords)[3] # divided by number of observations

# procrustes variances are nearly equal when only taking the sum of the covariance matrix trace
proc.var3 <- sum(diag(cov.mat))

Best, 

Ricardo 

Mike Collyer

unread,
Oct 23, 2025, 9:24:18 AMOct 23
to geomorph-...@googlegroups.com
Ricardo,

I do not understand the concern but it looks like you could be using different covariance matrices?

This is what i get - Mike

> proc.var1 <- morphol.disparity(coords ~ 1, groups = NULL, data = gdf,
+                                print.progress = FALSE)
No factor in formula or model terms from which to define groups.
Procrustes variance:
57.39704 
> proc.lm <- procD.lm(coords ~ 1, iter = 10000, data = gdf)
> n <- proc.lm$LM$n
> Cov <- getResCov(proc.lm)
> sum(diag(Cov))
[1] 58.56841
> Cov <- Cov * (n - 1) / n # not n - 1 df
> sum(diag(Cov))
[1] 57.39704
> residual.lms <- arrayspecs(proc.lm$residuals, p = 20, k = 3)
> cov.mat <- paleomorph::covar(residual.lms)
Warning message:
In paleomorph::covar(residual.lms) :
  CVM has negative eigenvalue -4.72260357166595e-16
> cov.mat <- as.matrix(Matrix::nearPD(cov.mat)$mat)
> sum(diag(cov.mat))
[1] 58.56841
> Cov[1:4, 1:4]
             [,1]         [,2]        [,3]       [,4]
[1,]  0.839641260 -0.004745748 -0.06877096 0.02690825
[2,] -0.004745748  1.054376854  0.03973566 0.04577489
[3,] -0.068770957  0.039735657  0.62869160 0.17361860
[4,]  0.026908251  0.045774886  0.17361860 1.27237192
> cov.mat[1:4, 1:4]
             [,1]         [,2]        [,3]       [,4]
[1,]  0.856776796 -0.004842599 -0.07017444 0.02745740
[2,] -0.004842599  1.075894748  0.04054659 0.04670907
[3,] -0.070174445  0.040546586  0.64152204 0.17716184
[4,]  0.027457400  0.046709068  0.17716184 1.29833869
> cov.mat2 <-nearPD(cov.mat)$mat
> cov.mat2[1:4, 1:4]
4 x 4 Matrix of class "dsyMatrix"
             [,1]         [,2]        [,3]       [,4]
[1,]  0.856776789 -0.004842599 -0.07017444 0.02745740
[2,] -0.004842599  1.075894744  0.04054659 0.04670907
[3,] -0.070174444  0.040546586  0.64152203 0.17716184
[4,]  0.027457399  0.046709068  0.17716184 1.29833869
> sum(diag(cov.mat2))
[1] 58.56841


--
You received this message because you are subscribed to the Google Groups "geomorph R package" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-pack...@googlegroups.com.
To view this discussion, visit https://groups.google.com/d/msgid/geomorph-r-package/50388e5a-436c-49e6-9563-99f352aaa91an%40googlegroups.com.

Ricardo Ely

unread,
Oct 23, 2025, 9:44:05 AMOct 23
to geomorph R package
Mike, 

I guess I needed clarification concerning whether we have to divide the trace of the covariance matrix by the number of specimens (procD.lm$LM$n) to get Procrustes variance, based on the definition provided in the help file for morphol.disparity() (see Details, first paragraph). But it looks like you actually multiply the trace by (n-1)/n, rather than simply dividing by n, although that's not what the documentation page for morphol.disparity apparently suggests, although perhaps I'm reading it incorrectly.

Best, 

Ricardo    

Miriam Zelditch

unread,
Oct 23, 2025, 9:50:08 AMOct 23
to geomorph-...@googlegroups.com
There are two formulas for morphological disparity that differ in whether the sum of squares are divided by N or N-1. The metric that we used was from Foote, who used N -1 (because he viewed it as an estimate of the parameter).  If you sum the diagonals of the covariance matrix, that uses the other formula. It sounds to me like you are getting very low values because you are using the formula: (sum of squares)/N/N. 

--
You received this message because you are subscribed to the Google Groups "geomorph R package" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-pack...@googlegroups.com.
To view this discussion, visit https://groups.google.com/d/msgid/geomorph-r-package/50388e5a-436c-49e6-9563-99f352aaa91an%40googlegroups.com.


--
Miriam Zelditch
Associate Research Scientist
Museum of Paleontology
University of Michigan
Ann Arbor, MI  48109-1079

Mike Collyer

unread,
Oct 23, 2025, 9:56:01 AMOct 23
to geomorph-...@googlegroups.com
There are multiple ways this could be done.  One can get the trace of the SSCP matrix and divide by n (or n - 1).  One can divide the SSCP matrix by n (or n -1) and get the trace.  One can get PC scores of the Covariance matrix, square them, and then sum them.  One can get the PC scores of the SSCP matrix, square them, sum them, and divide by n (or n -1).  All of these techniques should lead to the same result.

Mike

Mike Collyer

unread,
Oct 23, 2025, 4:11:02 PMOct 23
to geomorph-...@googlegroups.com
Sorry, I forgot to clarify this.  Multiplying the trace by (n - 1) / n change the covariance matrix from one found using n - 1 degrees of freedom to one with n degrees of freedom.  If R is a matrix of residuals, then either 

S <- crossprod(R) / n

or

S <- crossprod(R) / (n - k)

and k = 1 if the linear model only has an intercept (mean).  The getResCov function does the latter and I corrected it to the former.

Mike
Reply all
Reply to author
Forward
0 new messages