There are a few things to unpack here. First, the separation of groups in CVA or bgPCA plots is something of an illusion. The vectors are linear combinations of variables that best separate groups, and because there are so many more variables than observations, there will always be some combination of variables that appears to separate the groups. It is circular reasoning to ask the analysis to show the axes that best separate groups and then assess group differences by their separation on these "biased" axes. PCA reveals the axes with the most shape variation, and factors other than group differences could explain some of that variation. Groups can differ in shape, yet the difference can be hard to see without an explicit rotation that highlights group differences. This is what bgPCA attempts to do: rotate the data space to the axes that best characterize group differences. But bgPCA can do this too well if p > n, unless the dimensional disparity is accounted for with something like cross-validation. The sketch below illustrates the problem with data that have no real group structure.
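As a hedged illustration (my own construction, not from any particular study or package), the following Python sketch applies the bgPCA recipe described here to pure noise with many more variables than specimens. The group labels, sample sizes, and variable counts are arbitrary assumptions, yet the projected scores cluster by group.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 30, 200, 3                      # 30 specimens, 200 variables, 3 arbitrary groups
X = rng.normal(size=(n, p))               # pure noise: no true group differences
groups = np.repeat(np.arange(k), n // k)  # 10 specimens per group

Xc = X - X.mean(axis=0)                   # mean-center the data
# Fitted values of the linear model (shape ~ group) are simply the group means
F = np.zeros_like(Xc)
for g in range(k):
    F[groups == g] = Xc[groups == g].mean(axis=0)

# bgPCA: eigenvectors of the fitted-value covariance (right singular vectors of F),
# then projection of the mean-centered data onto those vectors
_, _, Vt = np.linalg.svd(F, full_matrices=False)
scores = Xc @ Vt[:k - 1].T                # at most k - 1 informative axes

# Although the data are noise, the groups appear well separated on these axes
for g in range(k):
    m = scores[groups == g].mean(axis=0)
    s = scores[groups == g].std(axis=0)
    print(f"group {g}: bgPC1 = {m[0]: .2f} +/- {s[0]:.2f}, "
          f"bgPC2 = {m[1]: .2f} +/- {s[1]:.2f}")
```

With p this much larger than n, the group means of noise are far enough apart, relative to the within-group spread along these chosen axes, that the plot would look like convincing separation. That is the artifact cross-validation is meant to expose.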
One of the unfortunate things about our discipline is the merging of different analyses into one by name. There is no classification performed with bgPCA. It is an eigendecomposition of the fitted values of a linear model (with groups as an effect), followed by projection of the mean-centered data onto the eigenvectors. It is similar to redundancy analysis, which performs eigendecomposition on both the fitted values and the residuals. CVA is eigendecomposition of a matrix product: the inverse of the residual covariance matrix (from the same linear model used in bgPCA) times the covariance matrix of the fitted values (from that same model), followed by projection of the mean-centered data onto the eigenvectors. CVA DOES NOT PERFORM CLASSIFICATION. However, most software that performs CVA also performs a form of classification, although there are other ways it could be done. A sketch of both eigen-analyses follows.
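Here is a minimal Python sketch of the two eigen-analyses as described above (my own code, not any package's implementation; function names and conventions are assumptions).

```python
import numpy as np

def group_fits_and_resids(X, groups):
    """Fitted values and residuals of the linear model X ~ groups (mean-centered)."""
    Xc = X - X.mean(axis=0)
    F = np.zeros_like(Xc)
    for g in np.unique(groups):
        F[groups == g] = Xc[groups == g].mean(axis=0)  # fitted values = group means
    return Xc, F, Xc - F

def bgPCA(X, groups):
    """bgPCA: eigenvectors of the fitted-value covariance; project centered data."""
    Xc, F, _ = group_fits_and_resids(X, groups)
    _, _, Vt = np.linalg.svd(F, full_matrices=False)   # right singular vectors
    k = len(np.unique(groups))
    return Xc @ Vt[:k - 1].T                           # at most k - 1 meaningful axes

def CVA(X, groups):
    """CVA: eigenvectors of inv(residual cov) @ (fitted-value cov); project centered data.
    Requires p < n - k so the residual covariance is invertible."""
    Xc, F, R = group_fits_and_resids(X, groups)
    n, k = X.shape[0], len(np.unique(groups))
    Sw = R.T @ R / (n - k)                             # residual (within-group) covariance
    Sb = F.T @ F / (k - 1)                             # fitted-value (between-group) covariance
    evals, evecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(evals.real)[::-1][: k - 1]      # keep the leading canonical axes
    return Xc @ evecs[:, order].real
```

A real implementation would also scale the canonical vectors (typically so the within-group variance of the scores is 1) and would have to deal with the p > n case, which this sketch deliberately does not.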
Classification should find the posterior probabilities of group membership based on the prior probabilities of group membership. The form of classification typically presented with CVA assumes equal prior probabilities (which might be silly if group sizes vary greatly). Under that assumption, posterior probabilities are determined directly by Mahalanobis distances: a smaller distance to a group mean means a higher posterior probability of belonging to that group. A post-CVA classification analysis might not report the actual probabilities, only whether individuals were correctly classified. Keep in mind that with 10 groups, for example, a posterior probability of 11% (just larger than 1/10) can mean assignment to the correct group, even with underwhelming support. But Mahalanobis distance requires inverting a covariance matrix, which is impossible if p > n, as the matrix is singular. Something would have to be done to contrive the distance, whether that means using a generalized inverse or using fewer than n - 1 PCs. A sketch of this classification step is below.
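For concreteness, here is a small Python sketch of that classification step (my own construction, not any software's routine), using equal priors and a pooled within-group covariance. The pseudoinverse line marks where a contrivance would be needed when p > n; using fewer than n - 1 PCs would be another option.

```python
import numpy as np

def classify_equal_priors(X, groups):
    """Mahalanobis-distance classification with equal priors.
    Returns predicted labels and posterior probabilities of group membership."""
    Xc = X - X.mean(axis=0)
    labels = np.unique(groups)
    n, k = X.shape[0], len(labels)
    means = {g: Xc[groups == g].mean(axis=0) for g in labels}
    R = np.vstack([Xc[groups == g] - means[g] for g in labels])
    Sw = R.T @ R / (n - k)                   # pooled within-group covariance
    # If p > n - k this matrix is singular; a generalized inverse is one contrivance
    Sw_inv = np.linalg.pinv(Sw)
    # Squared Mahalanobis distance of each specimen to each group mean
    D2 = np.array([[(x - means[g]) @ Sw_inv @ (x - means[g]) for g in labels]
                   for x in Xc])
    # Under equal priors and a common covariance, posteriors are proportional to exp(-D2/2)
    post = np.exp(-0.5 * (D2 - D2.min(axis=1, keepdims=True)))
    post /= post.sum(axis=1, keepdims=True)
    return labels[np.argmin(D2, axis=1)], post
```

With 10 groups, the winning posterior in `post` could be as low as slightly above 0.1 and the specimen would still be assigned to that group, which is why reporting only "correct/incorrect" can overstate the support.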
To be clear, CVA and bgPCA find eigenvectors (canonical vectors and principal components, respectively) and project data onto these vectors. The analyses stop at that point. I have no idea what "classification with bgPCA" means, but if it means using projected scores from bgPCA for classification, that is simply the wrong thing to do and should be avoided.