Procrustes Variance calculations from scratch

Ricardo Ely

unread,

Oct 23, 2025, 8:30:34 AM10/23/25

to geomorph R package

Dear all,

To better my understanding of how morphological disparity is estimated for geometric morphometrics, I wanted to attempt coding calculation of Procrustes variances from scratch. Calculation of Procrustes variance via morphol.disparity() in geomorph compared to directly taking the trace of the estimated covariance matrix divided by the number of observations is yielding huge discrepancies in both approaches, however. In the code below, the estimation of Procrustes variance is inspired by the definition provided in the help file for morphol.disparity: "Procrustes variance is the same sum of the diagonal elements of the group covariance matrix divided by the number of observations in the group (e.g., Zelditch et al. 2012)." When I divide the covariance matrix trace by the number of observations (assuming that's the number of specimens/species/etc), this results in a very low estimate compared to the value provided by morphol.disparity. When just using the trace, the values are much closer to each other. So then, which is the correct approach?

Below I'm using the simplest case where we're estimating disparity for a whole landmark set, with no groups specified.

# simulate 20 3D landmarks for 50 specimens
coords <- array(rnorm(50 * 20 * 3), dim = c(20, 3, 50))

gdf <- list(coords = coords)

# Morphological disparity for entire data set
proc.var1 <- morphol.disparity(coords ~ 1, groups = NULL, data = gdf,
print.progress = FALSE)

# manual procrustes variance

proc.lm <- procD.lm(coords ~ 1, iter = 10000, data = gdf)
residual.lms <- arrayspecs(proc.lm$residuals, p = 20, k = 3)
cov.mat <- paleomorph::covar(residual.lms)
cov.mat <- as.matrix(Matrix::nearPD(cov.mat)$mat)

proc.var2 <- sum(diag(cov.mat))/dim(coords)[3] # divided by number of observations

# procrustes variances are nearly equal when only taking the sum of the covariance matrix trace
proc.var3 <- sum(diag(cov.mat))

Best,

Ricardo

Mike Collyer

unread,

Oct 23, 2025, 9:24:18 AM10/23/25

to geomorph-...@googlegroups.com

Ricardo,

I do not understand the concern but it looks like you could be using different covariance matrices?

This is what i get - Mike

> proc.var1 <- morphol.disparity(coords ~ 1, groups = NULL, data = gdf,

+ print.progress = FALSE)

No factor in formula or model terms from which to define groups.

Procrustes variance:

57.39704

> proc.lm <- procD.lm(coords ~ 1, iter = 10000, data = gdf)

> n <- proc.lm$LM$n

> Cov <- getResCov(proc.lm)

> sum(diag(Cov))

[1] 58.56841

> Cov <- Cov * (n - 1) / n # not n - 1 df

> sum(diag(Cov))

[1] 57.39704

>

> residual.lms <- arrayspecs(proc.lm$residuals, p = 20, k = 3)

> cov.mat <- paleomorph::covar(residual.lms)

Warning message:

In paleomorph::covar(residual.lms) :

CVM has negative eigenvalue -4.72260357166595e-16

> cov.mat <- as.matrix(Matrix::nearPD(cov.mat)$mat)

> sum(diag(cov.mat))

[1] 58.56841

>

> Cov[1:4, 1:4]

[,1] [,2] [,3] [,4]

[1,] 0.839641260 -0.004745748 -0.06877096 0.02690825

[2,] -0.004745748 1.054376854 0.03973566 0.04577489

[3,] -0.068770957 0.039735657 0.62869160 0.17361860

[4,] 0.026908251 0.045774886 0.17361860 1.27237192

> cov.mat[1:4, 1:4]

[,1] [,2] [,3] [,4]

[1,] 0.856776796 -0.004842599 -0.07017444 0.02745740

[2,] -0.004842599 1.075894748 0.04054659 0.04670907

[3,] -0.070174445 0.040546586 0.64152204 0.17716184

[4,] 0.027457400 0.046709068 0.17716184 1.29833869

>

> cov.mat2 <-nearPD(cov.mat)$mat

> cov.mat2[1:4, 1:4]

4 x 4 Matrix of class "dsyMatrix"

[,1] [,2] [,3] [,4]

[1,] 0.856776789 -0.004842599 -0.07017444 0.02745740

[2,] -0.004842599 1.075894744 0.04054659 0.04670907

[3,] -0.070174444 0.040546586 0.64152203 0.17716184

[4,] 0.027457399 0.046709068 0.17716184 1.29833869

> sum(diag(cov.mat2))

[1] 58.56841

--
You received this message because you are subscribed to the Google Groups "geomorph R package" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-pack...@googlegroups.com.
To view this discussion, visit https://groups.google.com/d/msgid/geomorph-r-package/50388e5a-436c-49e6-9563-99f352aaa91an%40googlegroups.com.

Ricardo Ely

unread,

Oct 23, 2025, 9:44:05 AM10/23/25

to geomorph R package

Mike,

I guess I needed clarification concerning whether we have to divide the trace of the covariance matrix by the number of specimens (procD.lm$LM$n) to get Procrustes variance, based on the definition provided in the help file for morphol.disparity() (see Details, first paragraph). But it looks like you actually multiply the trace by (n-1)/n, rather than simply dividing by n, although that's not what the documentation page for morphol.disparity apparently suggests, although perhaps I'm reading it incorrectly.

Best,

Ricardo

Miriam Zelditch

unread,

Oct 23, 2025, 9:50:08 AM10/23/25

to geomorph-...@googlegroups.com

There are two formulas for morphological disparity that differ in whether the sum of squares are divided by N or N-1. The metric that we used was from Foote, who used N -1 (because he viewed it as an estimate of the parameter). If you sum the diagonals of the covariance matrix, that uses the other formula. It sounds to me like you are getting very low values because you are using the formula: (sum of squares)/N/N.

--

You received this message because you are subscribed to the Google Groups "geomorph R package" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-pack...@googlegroups.com.
To view this discussion, visit https://groups.google.com/d/msgid/geomorph-r-package/50388e5a-436c-49e6-9563-99f352aaa91an%40googlegroups.com.

--

Miriam Zelditch
Associate Research Scientist
Museum of Paleontology
University of Michigan
Ann Arbor, MI 48109-1079

cell: 734-353-1671
phone: 734-764-0489
fax: 734-936-1380

Mike Collyer

unread,

Oct 23, 2025, 9:56:01 AM10/23/25

to geomorph-...@googlegroups.com

There are multiple ways this could be done. One can get the trace of the SSCP matrix and divide by n (or n - 1). One can divide the SSCP matrix by n (or n -1) and get the trace. One can get PC scores of the Covariance matrix, square them, and then sum them. One can get the PC scores of the SSCP matrix, square them, sum them, and divide by n (or n -1). All of these techniques should lead to the same result.

Mike

To view this discussion, visit https://groups.google.com/d/msgid/geomorph-r-package/b4fa72a2-b1e2-4805-a37b-5a3b664d8f76n%40googlegroups.com.

Mike Collyer

unread,

Oct 23, 2025, 4:11:02 PM10/23/25

to geomorph-...@googlegroups.com

Sorry, I forgot to clarify this. Multiplying the trace by (n - 1) / n change the covariance matrix from one found using n - 1 degrees of freedom to one with n degrees of freedom. If R is a matrix of residuals, then either

S <- crossprod(R) / n

or

S <- crossprod(R) / (n - k)

and k = 1 if the linear model only has an intercept (mean). The getResCov function does the latter and I corrected it to the former.

Mike

Reply all

Reply to author

Forward