Interpreting group discrimination when p > n: CVA vs bgPCA


Dominika Bujnakova

Sep 22, 2025, 4:29:52 AM
to geomorph R package

Hi all,

I have a question regarding group discrimination tests in morphometrics.

I’m working with a dataset where p > n (many landmark variables; the number of specimens is roughly half the number of variables). In a test for differences between two groups, only ~3% of shape variation is explained by the groups, which is statistically significant with a large effect size (Z ≈ 4.5–5.5).

  • When I plot PC1 vs PC2, there is strong overlap between groups, and together these two axes explain only about 30% of variation. Overlap remains apparent with PC3 and PC4 combinations.
  • I performed CVA to assess discriminatory power. In the CVA histogram, the groups appear well separated, with >80–90% of specimens correctly classified.
  • In contrast, using bgPCA, the histogram shows overlapping groups, with ~73% correctly classified, which seems more consistent with what is observed in the PCA.

Questions:

  1. Given that bgPCA appears to reflect the observed overlap in PCA, is it more trustworthy than the CVA results, or should the classification accuracy of CVA be prioritized? I understand that CVA will always show better discriminatory power than bgPCA, but which one is generally recommended?
  2. How does p > n influence the results of CVA, bgPCA, or even procD.lm?

Thanks in advance for any guidance!

Domi

Mike Collyer

Sep 22, 2025, 6:19:44 AM
to geomorph-...@googlegroups.com
Hi Domi,

Questions:

  1. Given that bgPCA appears to reflect the observed overlap in PCA, is it more trustworthy than the CVA results, or should the classification accuracy of CVA be prioritized? I understand that CVA will always show better discriminatory power than bgPCA, but which one is generally recommended?
There are a few things to unpack here.  First, the separation of groups in CVA or bgPCA plots is something of an illusion.  The vectors are linear combinations of variables that best separate groups, and because there are so many more variables than observations, there is some combination of variables that appears to separate the groups.  It is circular reasoning to ask the analysis to show the axes that best separate groups and then assess group differences by their separation on these “biased” axes.  PCA reveals the axes with the most shape variation, and factors other than group differences could explain some of that variation.  Groups can differ in shape, but this can be difficult to see without an explicit rotation toward the group differences.  That rotation is what bgPCA attempts: it rotates the data space to the axes that best characterize group differences.  But bgPCA can do this too well when p > n, unless the dimensional disparity is accounted for with something like cross-validation.
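
As a toy illustration (base R, not geomorph code, and not your data; Y and grp are made-up placeholders), even pure noise with arbitrary group labels appears "separated" along a between-group axis when p > n:

## Toy sketch: pure noise, p > n, arbitrary groups -- yet bgPC1 "separates" them.
set.seed(1)
n <- 40; p <- 120                              # more variables than specimens
Y <- matrix(rnorm(n * p), n, p)                # no real group signal at all
grp <- factor(rep(c("A", "B"), each = n / 2))  # arbitrary group labels

fit <- lm(Y ~ grp)                             # linear model with a group effect
bg1 <- prcomp(fitted(fit))$rotation[, 1]       # bgPC1: first eigenvector of the fitted values
scores <- scale(Y, scale = FALSE) %*% bg1      # project mean-centered data onto bgPC1

boxplot(as.vector(scores) ~ grp,
        ylab = "bgPC1 score")                  # groups appear well separated anyway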

One of the unfortunate things about our discipline is the merging of different analyses into one by name.  There is no classification performed with bgPCA.  bgPCA is an eigen decomposition of the fitted values of a linear model (one that has groups as an effect), followed by projection of the mean-centered data onto the eigenvectors.  It is similar to redundancy analysis, which performs eigen decomposition on both fitted values and residuals.  CVA is an eigen decomposition of a matrix product: the inverse of the residual covariance matrix (from the same linear model used in bgPCA) times the fitted-values covariance (from that same model), followed by projection of the mean-centered data onto the eigenvectors.  CVA DOES NOT PERFORM CLASSIFICATION.  However, most software that performs CVA also performs a form of classification, although there are other ways it could be performed.
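
To make the two definitions concrete, here is a minimal sketch in base R (Y is an n x p matrix of shape variables and grp a factor of group labels; these are placeholder names, not geomorph objects):

## bgPCA vs. CVA, as defined above (sketch only).
fit <- lm(Y ~ grp)
Yf  <- fitted(fit)                           # fitted values (group means)
Yr  <- resid(fit)                            # residuals

## bgPCA: eigenvectors of the fitted-value covariance, then projection of
## mean-centered data onto them.
Sf        <- crossprod(scale(Yf, scale = FALSE)) / (nrow(Y) - 1)
bg.vec    <- eigen(Sf)$vectors
bg.scores <- scale(Y, scale = FALSE) %*% bg.vec

## CVA: eigenvectors of solve(Sr) %*% Sf, where Sr is the residual covariance.
## When p > n, Sr is singular and this step fails without some contrivance.
Sr     <- crossprod(Yr) / (nrow(Y) - nlevels(grp))
cv.vec <- eigen(solve(Sr) %*% Sf)$vectors    # errors if p > n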

Classification should find the posterior probabilities of group association based on the prior probabilities of group association.  The form of classification that is typically presented with CVA assumes equal prior probabilities (which might be silly if group sizes vary greatly).  Under equal priors, posterior probabilities are determined directly by Mahalanobis distances (a smaller distance to a group mean means a higher posterior probability of association with that group).  A post-CVA classification analysis might not provide the actual probabilities, but rather whether individuals were correctly classified.  Keep in mind that with 10 groups, for example, a posterior probability of 11% (just above 1/10) can still mean assignment to the correct group, even though the support is underwhelming.  But Mahalanobis distance requires inverting a covariance matrix, which is impossible if p > n, because the matrix is singular.  Something would have to be done to contrive the distance, whether that means using a generalized inverse or using fewer PCs than n - 1.
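
For illustration only (same placeholder Y and grp), this is roughly what the "equal priors, smallest Mahalanobis distance" rule looks like when the data are first reduced to k PC scores so the within-group covariance can be inverted; k is exactly the kind of arbitrary choice mentioned above:

## Equal-prior classification by smallest Mahalanobis distance on k PCs (sketch).
k   <- 10                                        # must be well below n - g; the choice is arbitrary
pcs <- prcomp(Y)$x[, 1:k]                        # reduce dimensions first
Sw  <- crossprod(resid(lm(pcs ~ grp))) /
       (nrow(pcs) - nlevels(grp))                # pooled within-group covariance
mns <- apply(pcs, 2, tapply, grp, mean)          # group means of PC scores

d2 <- sapply(levels(grp), function(g)
  mahalanobis(pcs, center = mns[g, ], cov = Sw)) # squared distance of each specimen to each mean
pred <- levels(grp)[apply(d2, 1, which.min)]     # assign to the nearest group
mean(pred == grp)                                # resubstitution rate (optimistic without CV)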

To be clear, CVA and bgPCA find eigenvectors (canonical vectors and principal components, respectively) and project data onto these vectors.  These analyses stop at that point.  I have no idea what classification with bgPCA means, but if it means using projected bgPCA scores for classification, that is simply the wrong thing to do and should be avoided.


  2. How does p > n influence the results of CVA, bgPCA, or even procD.lm?
Technically, it does not influence CVA or bgPCA at all.  It influences classification.  If p > n, classification requires finding an alternative for the covariance matrix so that it can be inverted, and any choice for doing this is arbitrary.  If you know of software or a function that performs classification along with bgPCA, make sure you know what it is doing.  Something is amiss here.
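
For example (Y again a placeholder n x p data matrix with p > n):

## The p x p covariance has rank at most n - 1, so it cannot be inverted;
## a generalized inverse "works", but it is a choice, not the inverse.
S  <- cov(Y)                     # p x p, singular when p > n
## solve(S)                      # would fail: system is computationally singular
Si <- MASS::ginv(S)              # Moore-Penrose generalized inverse (one contrivance)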

Because there is no matrix inversion to worry about, procD.lm is unaffected by p > n.
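
For example (coords and group are placeholders for your Procrustes coordinates and grouping factor):

## procD.lm works on the full high-dimensional data; no covariance matrix is
## inverted, so p > n is not a problem.
library(geomorph)
gdf <- geomorph.data.frame(coords = coords, group = group)
fit <- procD.lm(coords ~ group, data = gdf, iter = 999)
summary(fit)     # ANOVA table with R-squared, effect size (Z), and permutation P-value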

We have a function, prep.lda in RRPP, which allows one to take control of the decision of how many PCs to use for classification, and the choice of prior probabilities, before performing LDA (linear discriminant analysis, the same as CVA) with the MASS::lda function in R.  By doing this, one can take ownership of the classification rather than rely on the contrivances some other functions might use.  The take-home point is to know exactly how classification is performed in any function that provides it.  Along the way, some assumptions are made, and they might not align with your analytical goals.
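
A sketch of that workflow follows; please check ?RRPP::prep.lda and ?MASS::lda for the exact arguments, as PC.no, CV, and prior here reflect my reading of those help pages, and coords/group are placeholders for your objects:

## Take ownership of the classification: fit the model, choose the number of
## PCs and the priors yourself, then hand the prepared arguments to MASS::lda.
library(RRPP)
library(MASS)

rdf <- rrpp.data.frame(coords = coords, group = group)
fit <- lm.rrpp(coords ~ group, data = rdf, iter = 999)

lda.args <- prep.lda(fit, PC.no = 10)                # your choice of PCs, well below n - 1
res <- do.call(lda, c(lda.args, list(CV = TRUE)))    # leave-one-out cross-validation in lda
table(rdf$group, res$class)                          # cross-validated classification table
## a prior = c(...) argument for lda could be supplied the same way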

Hope that helps!
Mike



