procD.lm & pairwise

1,035 views
Skip to first unread message

moccett...@gmail.com

unread,
Dec 1, 2020, 6:21:46 AM12/1/20
to geomorph R package
Hello everyone,

I'm a morphometrics/geomorph beginner and I'm comparing the morphology of fish from different rivers. 
I'm a bit uncertain about the method/test I should use to detect any differences in shape. At the moment I've simply used  procD.lm first  (shape~River) and I got a significant P-value. Then, I used pairwise (test.type = "dist) to see where the differences were (i.e. which rivers are different). Is this all correct? Can I say there is a difference in fish body shape between the rivers highlighted by pairwise

Many thanks

Paolo

Adams, Dean [EEOB]

unread,
Dec 1, 2020, 11:20:21 AM12/1/20
to geomorph-...@googlegroups.com

Your analytical pipeline will perform the anova and subsequent tests of group differences in pairwise fashion.


Dean

 

Dr. Dean C. Adams

Director of Graduate Education, EEB Program

Professor

Department of Ecology, Evolution, and Organismal Biology

Iowa State University

https://www.eeob.iastate.edu/faculty/adams/

phone: 515-294-3834

--
You received this message because you are subscribed to the Google Groups "geomorph R package" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-pack...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/geomorph-r-package/71a4c88b-a8f9-4a07-9220-f0b93453fd23n%40googlegroups.com.

Paolo Moccetti

unread,
Dec 2, 2020, 9:33:25 AM12/2/20
to geomorph-...@googlegroups.com
Hi Dean,

Many thanks for answering. Glad to hear what I was doing was right then.
Best wishes

Paolo

Faysal Bibi

unread,
Dec 9, 2020, 9:47:32 AM12/9/20
to geomorph R package
Following up on this question, say we want to know whether two populations of pupfish have different size-allometric trajectories (different allometric slopes):

data("Pupfish")
Pupfish$logSize <- log(Pupfish$CS) 
Pupfish$Y <- prcomp(Pupfish$coords)$x[, 1:3]
fit1 <- procD.lm(Y ~ logSize*Pop, data = Pupfish)
summary(fit1)
PW1 <- pairwise(fit1, groups = Pupfish$Pop, covariate = Pupfish$logSize)
summary(PW1)

Summary(fit1) gives a p value of 0.012 for the interaction logSize:Pop, which I interpret to mean that these difference populations have statistically different responses of shape to size (ie different slopes)
Summary(PW1) gives a p of 0.009, which confirms the differences in slope (and would give more pairwise differences had there been more Pop categories in this data).

Is my usage of the covariate option in pairwise correct in order to test differences in the slopes of the groups? I understand that leaving it out leads pairwise to test differences in the group means instead (equivalent to testing the y-intercept?).
And is summary(PW1) without a test.type option correct for this type of comparison (ie using the difference between vector lengths), or should one use the slope correlations instead: 

summary(PW1, test.type = "VC", angle.type = "deg")

The latter results in a non-significant p-value, which would then contradict the results of summary(fit1)

Thanks,

Faysal Bibi

Mike Collyer

unread,
Dec 9, 2020, 11:29:44 AM12/9/20
to geomorph R package
Faysal,

The analyses you performed are correct for comparing slopes, up to your first summary.

The one you did by removing the covariate in the summary and choosing “VC” for test.type finds the vector correlation between pairs of vectors where the vectors point from the origin to the LS means of the groups.  That these are not significant does not mean much considering how you constructed your model to focus on pop:size allometry vector variation.  

There are probably considerably fewer cases where “VC” would be interesting to explore rather than the distance (“dist”) between LS means or the correlation between allometry vectors, as you did first.  

Hope that helps!
Mike

Faysal Bibi

unread,
Dec 10, 2020, 8:29:16 AM12/10/20
to geomorph R package
Thank you very much Mike, that's very helpful.
However for a few analyses I'm doing, I'm getting a highly significant p value on the interaction effect (p=0.007), but then the pairwise comparison reveals no significant differences among any of the categories being considered (with very high p values,  > 0.5). What could be the explanation for this?

Mike Collyer

unread,
Dec 10, 2020, 9:09:19 AM12/10/20
to geomorph R package
Faysal,

It is not possible to answer your question without more detail, much like you posted in your last question.  It is probably a result of the pairwise analyses not doing what you think they are doing, but this is only speculative without knowing what the interaction is about, what the groups were that are chosen for the pairwise test, and the type of pairwise test used.  Choice of a null model could also have implications for the results.

What pairwise does is find (iter + 1) permutations of fitted values from the null model, and then based on the groups and/or covariate used, finds differences between the LS estimates or slopes for the groups you choose (which can be different than the ones you used in your model fit).  If for example your interaction is group:covariate and you intend to compare slopes because this interaction is significant, but in your pairwise set up you do not include a covariate, then the function is forced to merely estimate LS means.  The test statistics compare means, not slopes, irrespective of whether you use “dist” or “VC” as the test.type, and regardless of whether you had a group:covariate interaction in your full model.

You might consider updating your question with the onscreen printed code and results if you would like a more precise answer.

Mike

Faysal Bibi

unread,
Dec 10, 2020, 9:19:57 AM12/10/20
to geomorph R package
Here's the script and results - probably some error in my implementation. 
This is a test of whether species in different dietary categories have a different size-allometric relationship along PC2. procD suggests they do, while pairwise suggests not.

> # PC2 by Size*Diet (interaction)
> fit0 <- procD.pgls(Comp2 ~ LogCSize * Diet, phy=Tree, dat= data1, print.progress = F)
> summary(fit0) #highly sig

Analysis of Variance, using Residual Randomization
Permutation procedure: Randomization of null model residuals 
Number of permutations: 1000 
Estimation method: Generalized Least-Squares (via OLS projection) 
Sums of Squares and Cross-products: Type I 
Effect sizes (Z) based on F distributions

              Df       SS        MS     Rsq      F      Z Pr(>F)   
LogCSize       1 0.002477 0.0024769 0.06767 14.064 1.7589  0.001 **
Diet           3 0.017350 0.0057833 0.47402 32.839 3.9929  0.001 **
LogCSize:Diet  3 0.002334 0.0007779 0.06376  4.417 1.8637  0.007 **
Residuals     82 0.014441 0.0001761 0.39455                        
Total         89 0.036602                                          
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Call: procD.lm(f1 = f1, iter = iter, seed = seed, RRPP = TRUE, SS.type = SS.type,  
    effect.type = effect.type, int.first = int.first, Cov = Cov,  
    data = data, print.progress = print.progress)

> # PC2 by Size*Diet (pairwise)
> pair0 <- pairwise(fit0, groups=data1$Diet, covariate = data1$LogCSize)
> summary(pair0) #none sig!

Pairwise comparisons

Groups: Browser Frugivore Grazer Mixed feeder 

RRPP: 1000 permutations

Slopes (vectors of variate change per one unit of covariate change, by group):
Vectors hidden (use show.vectors = TRUE to view)

Slope vector lengths
     Browser    Frugivore       Grazer Mixed feeder 
  0.03502621   0.03430746   0.03591133   0.04615646 

Pairwise absolute difference (d) between vector lengths, plus statistics
                                  d  UCL (95%)          Z Pr > d
Browser:Frugivore      0.0007187491 0.05510166 -1.1468575  0.978
Browser:Grazer         0.0008851209 0.03455993 -1.1899405  0.957
Browser:Mixed feeder   0.0111302483 0.03725275 -0.3384589  0.539
Frugivore:Grazer       0.0016038700 0.05668969 -1.0593515  0.946
Frugivore:Mixed feeder 0.0118489974 0.05582035 -0.5029020  0.607
Grazer:Mixed feeder    0.0102451274 0.04171484 -0.4707288  0.608

Mike Collyer

unread,
Dec 10, 2020, 9:44:49 AM12/10/20
to geomorph R package
OK, here is what I think the issue is.  Vectors have two properties: length and direction.  You correctly have a covariate in your analysis, which means slope vectors are used rather than vectors of LS means, but by using the default for the summary (“dist”), you test difference in the vector lengths.

Below is a summary of possible analyses one might consider with a model like yours

No covariate, test.type = “dist”: Would compare LS means (estimated response at mean LogCSize) among Diet levels
No covariate, test.type = “VC”: Would compare the directions of the vectors that point to the LS mans.  (This is likely seldom an interest but one might consider if two groups are similarly oriented in the data space, from the origin, but one extends further from the origin.)
Covarate supplied, test.type = “dist”: Would compare the length of slope vectors.  This would be of interest to know if groups differ in the amount of response change per unit change in LogCSize, among Diet types.
Covariate supplied, test.type = “VC”: Would compare whether the response is inherently different per unit change in LogCSize (orientated differently), among Diet types.

Notice I use “response” rather than “shape” because I am not sure why you use PC2.  That is a different matter, altogether, if PC2 is meant to characterize shape (it should not be).

Chances are your interaction was significant but the groups did not differ in the amount of response change per unit LogCSize, but they differ in direction, which would be revealed by vector correlations, “VC”.  A summary would convert vector correlations to angles, if that help perceive the differences.

Best,
Mike

Faysal Bibi

unread,
Dec 10, 2020, 10:08:35 AM12/10/20
to geomorph R package

Thanks,  your explanation of test.type / covariate options is very useful  (consider including in a future version of summary.pairwise help?). And PC2 is here highly correlated with both size and diet, so I'm testing whether diet could also have an influence on size-allometry along PC2 (I'm also doing these tests on total shape as represented by the coordinates or by multiple PCs, but those results largely make sense).
So it seems I should additionally be using test.type="VC" (covariate supplied) in order to distinguish cases where slopes may differ, but amount of response may not. In this case, this does bring the p values closer to significance, but they are still far from the p=0.007 found for the interaction effect, which I still find difficult to understand. Any further thoughts?

> summary(pair0, test.type = "VC")

Pairwise comparisons

Groups: Browser Frugivore Grazer Mixed feeder 

RRPP: 1000 permutations

Slopes (vectors of variate change per one unit of covariate change, by group):
Vectors hidden (use show.vectors = TRUE to view)

Pairwise statistics based on slopes vector correlations (r) and angles, acos(r)
The null hypothesis is that r = 1 (parallel vectors).
This null hypothesis is better treated as the angle between vectors = 0
                        r    angle UCL (95%)          Z Pr > angle
Browser:Frugivore       1 0.000000  3.141593 -0.7330656     0.7625
Browser:Grazer          1 0.000000  3.141593 -0.6276234     0.7420
Browser:Mixed feeder   -1 3.141593  3.141593  1.4869450     0.1170
Frugivore:Grazer        1 0.000000  3.141593 -0.7971126     0.7685
Frugivore:Mixed feeder -1 3.141593  3.141593  1.2952239     0.1470
Grazer:Mixed feeder    -1 3.141593  3.141593  1.4562802     0.1250




Mike Collyer

unread,
Dec 10, 2020, 11:22:49 AM12/10/20
to geomorph-...@googlegroups.com
I probably should have been less general and more specific in my previous email.  Because you are using a single variable, the test is seeking to find the correlation between vectors that exist on a line, so naturally all correlations are either 1 or -1.  Unfortunately, these tests are not going to make any sense for univariate responses.

I’ll check to see if there is something not working correctly with the “dist” test, since that might be the only one that would make sense for univariate data.  I simulated some data that also have a large and significant group:covariate effect but did not get any significant pairwise results.  I’m not sure if this means it is not possible or there is a glitch because the function is mired in multivariate-only application.

I’ll update if I find something.

Mike

Faysal Bibi

unread,
Dec 11, 2020, 6:23:39 AM12/11/20
to geomorph R package
Ok thank you. I'll be careful with my use of pairwise with univariate data then (and stick to the dist option for those comparisons). 
In case the function is updated for univariate use in future it would be great; I have not found another easily implementable pairwise test that can take phylogeny into account.

Mike Collyer

unread,
Dec 11, 2020, 7:40:10 AM12/11/20
to geomorph-...@googlegroups.com
Faysal,

Your dilemma has forced me to reflect a bit on conceptual limits.  Yours is a case where I think a multivariate perspective is simpler than a univariate one (it’s usually the other way around, meaning multivariate generalizations are challenging for univariate analyses).  The reason for this is that for univariate data, a vector of coefficients reduces to one coefficient, a scalar.  Because vectors have a length and direction, for any two vectors, characterizing their direction does not mean much but characterizing their difference in direction is simple.  (This is not too different than shape analysis.  What is a shape?  Hard to define.  But a shape difference between two objects is easier, or at least more precise.)  For two or more variables, obtaining length of vectors, independent of direction is simple; obtaining directional difference, independent of vector length is also simple.  However, for univariate data, there is no way to decompose easily for a test the length (absolute value) and direction (merely negative or positive) of the slope.  It can be done mathematically — this is why the pairwise test works — but the results cannot make sense.  For univariate data, length and direction are strictly confounded.

Here is an example.  Let’s say you have two slopes, one +0.5 and one - 0.5.  The length of each (distance from origin) is 0.5.  The direction of the unitized vectors is +1 and -1, respectively.  As test statistics, the absolute difference in lengths is 0.  The angle — because this is only a line — is 180 degrees.  (The problem here is that in any random permutation, the angle is 0 or 180 degrees, only.)  In other words, one simply cannot find a case where these statistics would be significant, even though the slopes are so different.

For the univariate case, the test statistics you would want is | b1 - b2 |, which is |+0.5 -(-0.5)| = 1.  This statistic conflates the magnitude of the slope and its direction, the two attributes we wish to decompose in a multivariate test.

It would be possible to add another test statistic that finds the “locational” difference between slope vectors.  As a multivariate generalization, this would be the distance between vector endpoints.  This is not trivial to pull off quickly, but I will think about it for the future.

In the meantime, here is a script you can use, assuming you have set up your pairwise object to have a covariate, like before, calling this object PW:

slope.diff <- sapply(PW$slopes, function(x) as.vector(dist(x)))
rownames(slope.diff) <- rownames(summary(PW)$summary.table)
apply(slope.diff, 1, RRPP:::pval) # P-values
apply(slope.diff, 1, RRPP:::effect.size) # Z-scores

Thanks for the thought-provoking issue.  I still do not agree in trying to represent shape by single components, but there will be cases where people use RRPP::pairwise for truly univariate data analysis.

Cheers!
Mike

Faysal Bibi

unread,
Dec 14, 2020, 5:49:06 AM12/14/20
to geomorph R package
Thank you Mike for the detailed explanation and the workaround script!
I'm glad this topic was worth some further consideration.

Best
Faysal

Faysal Bibi

unread,
Dec 14, 2020, 6:17:18 AM12/14/20
to geomorph R package
FYI, here's the p values of the two methods compared (default summary(PW) in first column, your script in second column). The results are  very similar for half the comparisons and wildly different for the other half. The new results however are now congruent with a finding of a significant interaction effect in the procD.pgls(Comp2 ~ LogCSize * Diet...) test.

                                 Pr > d                   apply(slope.diff, 1, RRPP:::pval)
Browser:Frugivore  0.824                             0.896
Browser:Grazer         0.826                             0.886
Browser:Mixed feeder   0.578                             0.003
Frugivore:Grazer       0.936                             0.962
Frugivore:Mixed feeder 0.536                             0.050
Grazer:Mixed feeder    0.472                             0.003

Thanks again!
Reply all
Reply to author
Forward
0 new messages