Is it required to transform a compositional matrix to a distance matrix in RRPP?


iEdison

Jan 26, 2026, 6:20:19 PM
to geomorph R package
I am working with a data set that includes a matrix of diet categories, with each entry giving that category's percentage of the diet. I am hoping to use lm.RRPP with the diet matrix as a predictor, for example: `tongue_length ~ skull_volume + diet_matrix`.

The diet matrix is compositional and has abundant zeros. In a GLS context, the unit-sum constraint, double zeros, and abundant zeros require you to transform the matrix to a distance/dissimilarity matrix. Bray-Curtis is often used because of how it handles these issues in ecological data where the centered log-ratio (clr) transform is not an option.

Does RRPP require this sort of transformation on the matrix, or can I input the raw, compositional diet matrix into the formula?

Thank you! 

Mike Collyer

Jan 26, 2026, 8:23:46 PM
to geomorph R package
One could use either the original variables from the diet matrix or ordination scores (say, from PCoA on the Bray-Curtis dissimilarity matrix), and the function will work. The function will not work if one tries to use a dissimilarity matrix as an independent variable (it does work if a dissimilarity matrix is the response variable, though). Although the function would work with either original variables or PCoA scores, I would not expect commensurate results. The reason for using Bray-Curtis followed by PCoA would be to develop a set of continuous variables from bounded, skewed, frequency data.
If that is worth doing, it is probably better to use PCoA scores as independent variables, although one would perhaps have to use PCoA loadings to make sense of, e.g., ANOVA results. (If PCoA1 and PCoA2 are significant, what does it mean?)
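
For readers who want to see what the Bray-Curtis-then-PCoA step produces, here is a minimal sketch in plain Python (chosen only so that it runs without any packages; in R the same steps are usually done with vegan::vegdist(method = "bray") followed by stats::cmdscale). The `diet` values are made up, the helper names are hypothetical, and only the first axis is extracted, by power iteration, on the assumption that the leading eigenvalue is positive and dominant.

```python
import math

# Hypothetical diet proportions: 5 species (rows) x 4 diet categories (columns).
# Each row sums to 1 and zeros are common, as in the question.
diet = [
    [0.7, 0.3, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.1, 0.5, 0.4],
    [0.0, 0.0, 0.6, 0.4],
    [0.3, 0.3, 0.2, 0.2],
]

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x - y| / sum(x + y)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den > 0 else 0.0

def gower_center(d):
    """Return the double-centered matrix -0.5 * d^2 that PCoA decomposes."""
    n = len(d)
    a = [[-0.5 * d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in a]
    grand = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)]
            for i in range(n)]

def leading_axis(b, iters=500):
    """First PCoA axis via power iteration (assumes a positive leading eigenvalue)."""
    n = len(b)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigval = sum(v[i] * bv[i] for i in range(n))
    # Scores are the unit eigenvector scaled by sqrt(eigenvalue).
    return eigval, [x * math.sqrt(eigval) for x in v]

n = len(diet)
dist = [[bray_curtis(diet[i], diet[j]) for j in range(n)] for i in range(n)]
eigval, axis1 = leading_axis(gower_center(dist))

for i, score in enumerate(axis1):
    print(f"species {i + 1}: PCoA1 score = {score:+.3f}")
```

A column of scores like `axis1` (plus further axes from a full eigendecomposition) is what would enter the model in place of the raw diet matrix, e.g. `tongue_length ~ skull_volume + pcoa_scores`, where `pcoa_scores` is a hypothetical name for the matrix of retained axes.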

Perhaps some of the uncertainty arises from the vegan::adonis/adonis2 functions, which allow one to define the transformation (the dissimilarity measure) within the function. That is for response data, though, not independent variables.

Best,
Mike

--
You received this message because you are subscribed to the Google Groups "geomorph R package" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-pack...@googlegroups.com.
To view this discussion, visit https://groups.google.com/d/msgid/geomorph-r-package/f7312883-f5af-4140-89e9-55999613bbd9n%40googlegroups.com.

iEdison

Jan 27, 2026, 1:32:18 PM
to geomorph R package
Excellent, thank you. Yes, I should have been clearer that I did mean Bray-Curtis followed by PCoA. When you say, "one would have to use PCoA loadings perhaps to make sense of, e.g., ANOVA results," you mean that if you use the PCoA scores in the RRPP model, you would need to use the loadings to interpret the RRPP ANOVA if there's a positive signal, correct?

As a follow-up, is there any difference if the matrix is binary instead of compositional (using Jaccard or Bray-Curtis/Sørensen-Dice)?

Ian


Mike Collyer

Jan 27, 2026, 1:55:25 PM
to geomorph R package
Dear Ian,

 When you say, "one would have to use PCoA loadings perhaps to make sense of, e.g., ANOVA results," you mean that if you use the PCoA scores in the RRPP model, you would need to use the loadings to interpret the RRPP ANOVA if there's a positive signal, correct?

Yes.  For example, if the coefficient for PCoA1 is -0.682 and is highly significant, you would have to use the loadings to figure out what negative scores along this axis indicate, in terms of diet variables.
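
One caveat worth hedging here: PCoA on a dissimilarity matrix does not return variable loadings the way PCA does, so "loadings" in practice are often approximated after the fact, for example by correlating each original diet variable with the axis scores. The sketch below shows that idea in plain Python. The diet values, category names, and `axis1` scores are all made up for illustration, and this is only one of several ways to relate variables to axes.

```python
import math

# Hypothetical category names and diet proportions (rows = species).
categories = ["insects", "fruit", "nectar", "seeds"]
diet = [
    [0.7, 0.3, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.1, 0.5, 0.4],
    [0.0, 0.0, 0.6, 0.4],
    [0.3, 0.3, 0.2, 0.2],
]
# Hypothetical PCoA1 scores for the same five species (not computed here).
axis1 = [-0.45, -0.40, 0.42, 0.48, -0.05]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Correlate each diet category (column) with the axis scores.
axis_cor = {}
for k, name in enumerate(categories):
    column = [row[k] for row in diet]
    axis_cor[name] = pearson(column, axis1)
    print(f"{name:8s} r = {axis_cor[name]:+.2f}")
```

Categories with strongly negative correlations are the ones that characterize negative scores on the axis, which is the information needed to read a coefficient such as the -0.682 in the example.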


As a follow-up, is there any difference if the matrix is binary instead of compositional (using Jaccard or Bray-Curtis/Sørensen-Dice)?

Binary variables might be similar to the dummy variables used for factors, which are all 0 or 1. However, for dummy variables, there is usually some level of balance (approximately equal frequencies of 0s and 1s among columns of the matrix). If 0s and 1s indicate presence or absence of diet types (columns), one might risk issues with multicollinearity that would not arise with ordination scores. The important question is whether transforming the variables to scores is fundamental to the question asked. If diet types can be based on the relative proportions of various taxa consumed in a diet, and tongue length might covary with diet type, then using scores in the linear model might be better. If one wishes to know whether tongue length varies based on whether species X is consumed, then the original variable is probably better.
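
On the choice of index for binary data, a small check may help: for 0/1 vectors Bray-Curtis reduces to the Sørensen-Dice dissimilarity, and Jaccard is a monotonic function of it (J = 2B / (1 + B)), so the two indices rank pairs of species identically even though the resulting PCoA scores can differ. The sketch below verifies the relationship in plain Python on two made-up presence/absence vectors; the function and variable names are hypothetical.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x - y| / sum(x + y)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den > 0 else 0.0

def jaccard(x, y):
    """Jaccard dissimilarity for 0/1 vectors: 1 - shared / present in either."""
    shared = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return 1 - shared / either if either else 0.0

# Hypothetical presence/absence of five diet items in two species.
sp1 = [1, 1, 0, 0, 1]
sp2 = [1, 0, 1, 0, 1]

b = bray_curtis(sp1, sp2)  # same value as Sorensen-Dice on binary data
j = jaccard(sp1, sp2)
print(f"Bray-Curtis (Sorensen-Dice) = {b:.3f}, Jaccard = {j:.3f}")
print(f"2B / (1 + B) = {2 * b / (1 + b):.3f}")
```

Note that double zeros (the fourth item, absent in both species) do not contribute to either index.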

I hope that makes sense.

Mike



iEdison

Jan 27, 2026, 2:40:57 PM
to geomorph R package
Yes, that makes sense to me. Thanks for your help!