procD.lm with Multiple Variables, SS Types and Effect Types


Fatih Aydık

Mar 12, 2025, 7:07:19 PM
to geomorph R package
Dear All,

My questions are mostly about statistics, but I'd be very happy if you could help me understand them clearly.

I am running procD.lm with shape coordinates as the DV and five IVs, as below.

procD.lm(Coords ~ Score1 + Score2 + Score3 + Score4 + Score5 + CS, SS.type = "III")

My main goal is to assess whether these scores explain any shape variation when controlling for size (CS).

My independent variables are scores from a psychometric questionnaire. These sub-scores aim to capture different aspects of the same psychological domain, so they are moderately intercorrelated. When I check, the maximum VIF is about 2.5 (with CS removed from the model).
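For reference, a VIF of this kind can be computed by hand in base R. This is a hedged sketch, not code from the analysis above: `scores` is a hypothetical data frame holding the five sub-scores.

```r
# Hypothetical data frame "scores" with columns Score1..Score5.
# The VIF for Score1 is 1 / (1 - R^2) from regressing Score1 on the
# other sub-scores; values near 1 indicate little collinearity.
aux  <- lm(Score1 ~ Score2 + Score3 + Score4 + Score5, data = scores)
vif1 <- 1 / (1 - summary(aux)$r.squared)
```

Repeating this for each sub-score (or using a helper such as car::vif on an auxiliary lm fit) gives the full set of VIFs.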

When I run separate analyses of these sub-scores, I get significant results for Score1 and Score3. When I check the shape changes on TPS grids, I see the specific shape-change patterns expected under the main hypothesis.

Also, naturally, when I run the model with multiple IVs using Type I SS, I get significant results for Score1 or Score3 as long as they are ordered first. There is no hierarchy among the scores, so I need to use Type II or Type III SS. Yet then I get non-significant results.

Given the situation, I have some questions.
1- Is it appropriate to report these results as: "When the other sub-scores are controlled for, none of the variables explained shape variation significantly. However, when tested separately, Score1 and Score3 did explain shape variation significantly"?

2- When I use lm or lmPerm, I also get an overall significance value, but not with procD.lm. That could be useful as well. What am I missing here?

3- To make things even more confusing for me, when I add the composite score to the IVs, all scores become significant. Since the composite score is essentially the mean of all items in the questionnaire, shouldn't this increase multicollinearity and remove even more shared variance?

4- There are different effect types for the argument "effect.type". I use "Rsq" since I aim to assess whether there is significantly explained variance. Is it suitable for this purpose, or should I stick with "F" since I am testing a hypothesis?

I know these might be simple questions for some of you, but I'd really appreciate any thoughts or advice. Also, I can switch to a different approach if you think one is more suitable for checking this relationship. All recommendations are appreciated.

Thanks in advance for your time!
Fatih


Mike Collyer

Mar 13, 2025, 8:31:17 AM
to geomorph R package
Dear Fatih,

A partial response to some of your questions (I’m not sure I understand Question 3):

On Mar 12, 2025, at 7:07 PM, Fatih Aydık <fatih...@gmail.com> wrote:


procD.lm(Coords ~ Score1 + Score2 + Score3 + Score4 + Score5 + CS, SS.type = "III") … When I run separate analyses of these sub-scores, I get significant results for Score1 and Score3. When I check the shape changes on TPS grids, I see the specific shape-change patterns expected under the main hypothesis.
Also, naturally, when I run the model with multiple IVs using Type I SS, I get significant results for Score1 or Score3 as long as they are ordered first. There is no hierarchy among the scores, so I need to use Type II or Type III SS. Yet then I get non-significant results.

Given the situation, I have some questions.
1- Is it appropriate to report these results as: "When the other sub-scores are controlled for, none of the variables explained shape variation significantly. However, when tested separately, Score1 and Score3 did explain shape variation significantly"?


My answer to this would be, no, it is not appropriate, as stated.  What you do with type III SSCPs is test every effect in your model, conditioned on all other terms in the model.  What you do with the separate analyses is test every effect with no other term in the model (conditioned only on the overall mean).  If you explicitly and clearly establish that the null models are different, then maybe you can report both outcomes.  Testing effects “separately” really means changing the null models used, so that should be clear.
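To make the null-model point concrete, here is a hedged sketch of what the Type III test of Score1 amounts to, assuming a hypothetical geomorph.data.frame `gdf` holding `Coords`, `CS`, and the five scores (none of these objects come from the original post): the reduced (null) model drops only Score1, and the test compares it to the full model.

```r
# Reduced model: every term except Score1 (the Type III null for Score1)
fit.red  <- procD.lm(Coords ~ Score2 + Score3 + Score4 + Score5 + CS,
                     data = gdf, iter = 999)
# Full model: all terms
fit.full <- procD.lm(Coords ~ Score1 + Score2 + Score3 + Score4 + Score5 + CS,
                     data = gdf, iter = 999)
anova(fit.red, fit.full)  # tests Score1, conditioned on all other terms
```

In the separate analyses, by contrast, the null model for Score1 is just `Coords ~ 1`, which is why the two sets of results can disagree.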


2- When I use lm or lmPerm, I also get an overall significance value, but not with procD.lm. That could be useful as well. What am I missing here?

There are two ways to do this.  The first is to create a null model fit along with your full model fit:

fit.null <- procD.lm(Coords ~ 1, …)
fit.full <- procD.lm(Coords ~ Score1 + Score2 + Score3 + Score4 + Score5 + CS, SS.type = "III", …)

and then

anova(fit.null, fit.full)

This is demonstrated in the help file for procD.lm, although in a slightly different way.

The other is not as straightforward, but because a procD.lm object is also an lm.rrpp object, this will work:

summary.lm.rrpp(fit.full)

This way mimics the summary.lm approach to a certain extent.  Unfortunately, summary.procD.lm defaults to anova(), because that was historically how it worked.


4- There are different effect types for the argument "effect.type". I use "Rsq" since I aim to assess whether there is significantly explained variance. Is it suitable for this purpose, or should I stick with "F" since I am testing a hypothesis?

If your goal is to have P-values, they should be either approximately or exactly the same. 

Best,
Mike





Juan Esteban Vrdoljak

Mar 13, 2025, 10:06:33 AM
to geomorph R package

In my opinion, there is another possible analysis you could report, but it's important to explain your approach clearly.

There is often a statistical trade-off between the parameters being estimated (due to the number of factors in your model) and the amount of available information (which depends on factors such as data variability, distribution types, and, most importantly, sample size). If there isn't enough information to reliably assess the effect of all five scores together, you could run a procD.lm separately for each score (including centroid size) and report all these models (i.e., ~ CS + score1, ~ CS + score2, etc.), while also presenting the full model that includes all scores.

By reporting both sets of results, you balance two key considerations: (1) using a single model accounts for the lack of independence between scores but results in overparameterization, and (2) using multiple models ensures sufficient information but ignores the potential dependence among scores. If you choose this approach, be sure to apply a multiple testing correction to the p-values.
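That workflow could be sketched as below. This is a hedged sketch under assumptions: `gdf` is a hypothetical geomorph.data.frame containing `Coords`, `CS`, and the five scores, and the p-value column name in the ANOVA table ("Pr(>F)") may differ depending on the RRPP/geomorph version and the chosen effect.type.

```r
# Fit one model per sub-score, each controlling for centroid size,
# then correct the per-score p-values for multiple testing.
score.names <- paste0("Score", 1:5)
p.vals <- sapply(score.names, function(s) {
  f   <- as.formula(paste("Coords ~ CS +", s))
  fit <- procD.lm(f, data = gdf, iter = 999)
  # p-value for the score term, entered after CS (Type I SS)
  anova(fit)$table[s, "Pr(>F)"]
})
p.adjust(p.vals, method = "holm")  # e.g., Holm correction for 5 tests
```

Holm is one reasonable choice here; any standard method accepted by p.adjust() would serve the same purpose.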

In short, it is crucial to provide a detailed description of your methods and to interpret your conclusions carefully, considering the conditional nature of the results. For example, if score1 shows a significant effect when analyzed individually, the conclusion should be framed as: "The effect of score1 on morphology when considered independently of the other scores."

If you determine that your model is not overparameterized—meaning the parameters are properly estimated—and that the scores cannot be considered independent in any way, then this approach would not be appropriate.

I suppose there are many other statistical approaches you could apply to your data, since statistics is flexible. However, it is essential to report each approach clearly and accurately to ensure that the results are properly understood within their context.


best,

Juan
