Interpretation of phylogenetic signal: phyPCA versus PACA

mariekev...@gmail.com

Jan 10, 2023, 5:44:37 AM
to geomorph R package
Hi everyone,

I'm trying to interpret the phylogenetic signal in my ulnar shape data. Overall, Kmult = 0.90 (p = 0.0001), which I take to mean that closely related species show a similar distal ulnar shape, though slightly less similar than expected under a Brownian motion (BM) model of evolution, despite the large effect size (4.08). However, K on the first component was quite large (1.79, p = 0.001), so Kmult = 0.90 does not indicate a weak phylogenetic signal; rather, the signal is concentrated in some, but not all, of the original variables.

When I investigate phylogenetic signal along the components of a phyPCA and PACA, I find the following results:
- phyPCA comp 1: K = 0.58 (p = 0.04)
- phyPCA comp 2: K = 1.60 (p = 0.001)
- phyPCA comp 3: K = 0.58 (p = 0.06)

- PACA comp 1: K = 0.66 (p = 0.03)
- PACA comp 2: K = 2.14 (p = 0.001)
- PACA comp 3: K = 1.03 (p = 0.009)

I would have expected the phylogenetic signal in the first PACA component to be larger, because PACA normally aligns its first component with phylogenetic signal. The phylogenetic signal on the first phyPCA component is low, which is expected because that component is mostly independent of phylogenetic signal. Additionally, the shape variation described along the second component is the same for both phyPCA and PACA (a wider ulnar head and a larger distal projection), and the phylogenetic signal is highest in this second component. Does the larger phylogenetic signal along the second component mean that this shape variation is more aligned with phylogeny than the variation on the first component?

The summaries of both the phyPCA and PACA are shown in the attached image. I'm also not sure how to interpret these results, even after reading various publications (e.g. what the RV means, or what this information can tell me about the ancestors).

Thanks for your help! I really appreciate it.

Marie.

Summary PACA and phyPCA.jpg

Mike Collyer

Jan 10, 2023, 7:19:29 AM
to geomorph-...@googlegroups.com
Dear Marie,

Your results sure are challenging!  I’m not sure I can help, but I’ll start with the things that are obvious.

PACA (and, to a certain extent, PCA) is like 2B-PLS: it maximizes covariance between two matrices. The first matrix contains the data, and the second is the covariance matrix based on phylogenetic covariances. (For PCA, the second matrix is an identity matrix.) So let’s start with the stats that are more naïve regarding phylogenetic signal. The singular values are like the square roots of eigenvalues in PCA, and they describe the strength of association for each component. One can divide each squared singular value by the sum of all squared singular values to get the percentage of the overall covariation between matrices expressed by each component. You can see that the first component explains 96.8% of the overall covariation, which is quite impressive. However, what if there is little overall covariation between matrices? Would 96.8% of practically nothing be all that impressive? The RV expresses that covariation as a portion of the joint variation between matrices (more like a portion of variance). It is like a squared correlation coefficient, or a coefficient of determination. The first component explains 63.7% of the joint variation, which is also impressive; the second explains 1.2%, the third 0.2%, etc. So the analysis is doing what it is supposed to do: maximizing covariation between the two matrices in the first dimension.
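
To make the arithmetic concrete, here is a minimal Python sketch of how each component's share of the overall covariation follows from the singular values: each squared singular value is divided by the sum of all squared singular values. The singular values below are hypothetical (the real ones are only in the attached image), and the function names are illustrative, not part of geomorph or RRPP.

```python
# Hypothetical singular values from a PACA-style decomposition.
# These are NOT the values from this thread; they are made up for illustration.
singular_values = [5.0, 0.8, 0.4, 0.2]


def covariation_shares(sv):
    """Return each component's share of the total squared covariation."""
    squared = [s ** 2 for s in sv]
    total = sum(squared)
    return [sq / total for sq in squared]


def cumulative(shares):
    """Return the running total of the shares, component by component."""
    out, running = [], 0.0
    for share in shares:
        running += share
        out.append(running)
    return out


shares = covariation_shares(singular_values)
cum_shares = cumulative(shares)

for i, (s, c) in enumerate(zip(shares, cum_shares), start=1):
    print(f"component {i}: {100 * s:.1f}% of covariation, {100 * c:.1f}% cumulative")
```

The shares always sum to 1, so a large first share says only that the first component dominates whatever covariation exists; it does not say how much covariation there is in total, which is the question the RV addresses.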

However, when looking at the dispersion among points after projecting data on components, there is more dispersion among points on the second component, whether looking at tips or ancestors, which is probably why K is larger on the second component. It is difficult to picture a PAC plot in my mind that shows the phylogeny most aligned with the first PAC but more dispersion on the second PAC. Maybe there is a tree that has two subclades, split from a really deep node, with not much morphological difference between these subclades compared to within them, even though the branch lengths separating species within subclades are small. This is a guess (it’s hard to visualize), but I believe you might have an interesting tree?

Maybe if you could provide an illustration of your tree, along with PC and PAC plots, it would be easier to offer an explanation.  It is truly bizarre that your first PAC explains so much of the covariation with your data set but more dispersion is found along the second component.  I just cannot reconcile this with hypothetical mental pictures.

Best,
Mike

--
You received this message because you are subscribed to the Google Groups "geomorph R package" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-pack...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/geomorph-r-package/71ebda25-9b44-4cb2-9d4d-4c9ee1dc8bben%40googlegroups.com.
<Summary PACA and phyPCA.jpg>

mariekev...@gmail.com

Jan 10, 2023, 7:49:52 AM
to geomorph R package
Hi Mike,

Yes, it's quite challenging to interpret these results. Attached you can find the phyPCA and PACA plots, and the phylogenetic tree used in the analyses. What we do see, for example, is a shape shift in the hominids (green on the plots) relative to the other primates, but I still find it hard to reconcile this with the results on phylogenetic signal, especially because this signal is largest on component 2 for both the phyPCA and PACA.

Maybe the 'larger' difference between Ateles and Cebus (both belong to the platyrrhine clade, purple colour) might cause these results? The sample size of both species is quite small (2 for Ateles and 3 for Cebus, I did not have access to more ...), can this also influence the overall result?

Thanks for taking a look! :-) 

Marie.

pPCA plot.jpg
PACA plot.jpg
Primate tree.jpg

mariekev...@gmail.com

Jan 12, 2023, 8:18:42 AM
to geomorph R package
Hi Mike,

Sorry to bother you again, but I was wondering if you've had time to take a look at my phyPCA and PACA plots (to help interpret the results)?
The resubmission deadline is on Sunday this week, and it would really be helpful to have some input from someone who has experience with phylogenetic analyses :-)

Thanks!

Marie.

On Tuesday, January 10, 2023 at 13:49:52 UTC+1, mariekev...@gmail.com wrote:

mariekev...@gmail.com

Aug 9, 2023, 9:46:33 AM
to geomorph R package
Hi Mike,

Would you be able to help me with the additional interpretation of my data?

Thank you so much!

Marie.

On Thursday, January 12, 2023 at 14:18:42 UTC+1, mariekev...@gmail.com wrote:

Mike Collyer

Aug 9, 2023, 11:45:24 AM
to geomorph R package
Dear Marie,

If you are referring to your previous email in this thread, where you attached some figures, I apologize for not having seen them before. For some reason, Gmail has a way of sequestering emails from this Google Group and placing them in an “important” folder, but not in my inbox. I have tried several times to change settings but to no avail. I often miss emails as a result.

My main suggestion is to use a plotting technique that preserves the aspect ratio. The one you are using stretches the second axis far more than is appropriate, which gives the illusion that there is more variation along the second axis. For PCA, this is an impossibility, because the first axis always has the most variation. For phy-PCA it is possible, because the first axis is aligned to minimize association with phylogeny, and if phylogeny is associated with the main axis of variation (in PCA, not phy-PCA), then it is possible to get more variation along the second axis. With PACA, it is trickier. The first few axes will align most with phylogenetic signal but, especially using Kmult as a measure, it could be possible to get more signal along the second axis if there was an early divergence between two clades, followed by within-clade divergence at rates of evolution that differ from the rate between clades. The first PAC will align most with variation among ancestral character states, but it is possible that, once this axis is found, evolutionary divergence along the second axis is more strongly associated with a BM model.

Also, are you looking at phylogenetic signal by axis or the cumulative phylogenetic signal by axis (where for axis 2 it would be the phylogenetic signal in axes 1 and 2, combined)?  That might help explain an odd result.

Hope this was helpful in some way.

Mike

mariekev...@gmail.com

Aug 10, 2023, 8:44:32 AM
to geomorph R package
Hi Mike,

No problem! Thank you for this extra clarification. I must say that it can be quite hard sometimes to fully understand what is going on during these analyses.

Regarding your last question: the cumulative proportion on axis 2 is indeed that of axes 1 and 2 combined. What information does that give us, then?

Thanks!

Marie.


On Wednesday, August 9, 2023 at 17:45:24 UTC+2, mlco...@gmail.com wrote:

Mike Collyer

Aug 10, 2023, 9:24:37 AM
to geomorph R package
Hi Marie,

One could get PACA scores and use geomorph::physignal on them axis by axis, which seems to be what you have done. One could also interpret the physignal output for an analysis done on all variables. The $PACA output would be the same as performing PACA with RRPP::ordinate or geomorph::gm.prcomp, but the K.by.p output finds the phylogenetic signal in 1 PAC, then 1:2 PACs, then 1:3 PACs, etc. The reason for this is that K can bounce around by dimension. Knowing that the correlation between phylogeny and data is concentrated in the first few dimensions, this cumulative approach should show where K starts to attenuate because further dimensions add only noise to its calculation.
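
The difference between axis-by-axis K and cumulative K (1, then 1:2, then 1:3 dimensions) can be sketched in plain Python. This is not geomorph's implementation: it assumes the usual trace-based generalization of Blomberg's K (observed ratio of mean squared errors divided by the ratio expected under BM), and the 4-species covariance matrix, the scores, and all function names are hypothetical.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + list(map(float, B[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]


def k_mult(Y, C):
    """K for data Y (n species x p variables) given phylogenetic covariance C."""
    n, p = len(Y), len(Y[0])
    cinv_1 = solve(C, [[1.0] for _ in range(n)])   # C^-1 1
    cinv_y = solve(C, Y)                           # C^-1 Y
    denom = sum(row[0] for row in cinv_1)          # 1' C^-1 1
    # Phylogenetic mean (estimated root state) of each variable.
    root = [sum(cinv_y[i][j] for i in range(n)) / denom for j in range(p)]
    R = [[Y[i][j] - root[j] for j in range(p)] for i in range(n)]
    cinv_r = solve(C, R)
    ss_obs = sum(R[i][j] ** 2 for i in range(n) for j in range(p))            # tr(R'R)
    ss_phy = sum(R[i][j] * cinv_r[i][j] for i in range(n) for j in range(p))  # tr(R'C^-1 R)
    expected = (sum(C[i][i] for i in range(n)) - n / denom) / (n - 1)
    return (ss_obs / ss_phy) / expected


def column_subset(Y, cols):
    """Keep only the listed columns (dimensions) of Y."""
    return [[row[j] for j in cols] for row in Y]


# Hypothetical BM covariance for a balanced tree ((A,B),(C,D)) of unit depth.
C = [[1.0, 0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.5],
     [0.0, 0.0, 0.5, 1.0]]

# Hypothetical component scores: 4 species x 3 dimensions.
scores = [[1.0, 0.9, 0.1],
          [0.8, 1.1, -0.2],
          [-1.1, -0.2, 0.3],
          [-0.7, 0.3, -0.1]]

n_dims = len(scores[0])
k_by_axis = [k_mult(column_subset(scores, [j]), C) for j in range(n_dims)]
k_cumulative = [k_mult(column_subset(scores, range(j + 1)), C) for j in range(n_dims)]

for j in range(n_dims):
    print(f"dim {j + 1}: K on this axis = {k_by_axis[j]:.3f}, "
          f"K on dims 1:{j + 1} = {k_cumulative[j]:.3f}")
```

In this sketch the last cumulative value equals K computed on all dimensions at once, which is why a cumulative profile ends at the overall K, whereas axis-by-axis values can sit well above or below it.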

Your original email indicated that the RV coefficient was 0.637 in one dimension, 0.651 (cumulative) in the third dimension, and pretty much flat from there through all dimensions, suggesting that phylogenetic signal should be most concentrated in the first three dimensions. Thus, I would expect K to reach its zenith somewhere in those first three dimensions and then taper toward 0.90 (the result you got in all dimensions), when viewing the distribution of K.by.p. This could explain why K appears to be higher in the second dimension, if these were the results you reported.

If this is not what you reported, it might be what you wish to look at, because with an RV of 0.637 in the first dimension and only 0.012 in the second dimension, the strong K in the second dimension might be misleading in a dimension with comparatively weaker association between data and phylogeny.

Nevertheless, your figure demonstrates what I hypothesized before seeing it. If you have a phylogeny that has an early split into two distinct subclades and then large divergence within one of those, as you seem to have with hominids, it might be possible to have more phylogenetic signal in the second dimension. This would be consistent, perhaps, with an adaptive radiation. The shape differences between humans and the two Pan species, and between the two species of Gorilla, are almost as large as some lemur-baboon distances (at least in two PACA dimensions) despite the small branch lengths, which is consistent with such an interpretation.

I had not previously thought that PACA might reveal an adaptive radiation like this, but I guess it is possible. I am sure a disparity-through-time analysis would also confirm that.

Cheers!
Mike

