Question about procD.lm: "linear variable redundant" warning and summary table missing factors

Chi Zhang

Jul 17, 2024, 5:00:09 PM
to geomorph R package
Hi all,

I am trying to use procD.lm to look at group differences between two ontogenetic stages of mice. The gdf that contains the dataset is here: https://drive.google.com/file/d/1_X-j5ORSmfQcOPsuQk5cJI276yZi9nIM/view?usp=sharing

If you load the data, you get a data frame 'gdf' that contains Procrustes-aligned coordinates of 14 specimens, 8 at stage P28 and 6 at stage P60. 'gdf$groups' indicates the group by stage, 'gdf$ID' identifies each specimen (each has two landmark trials for checking individual variation), and 'gdf$replica' indicates which of the two landmark trials a row belongs to.

If I run:
fit_test1 <- procD.lm(coords ~ ID + groups+replica,
                data = gdf, iter = 1000, turbo = FALSE,
                RRPP = TRUE, print.progress = FALSE)
summary(fit_test1)

it reports a warning:
Because variables in the linear model are redundant,
the linear model design has been truncated (via QR decomposition).
Original X columns: 9
Final X columns (rank): 8
Check coefficients or degrees of freedom in ANOVA to see changes.

The printed ANOVA table is:
          Df        SS        MS     Rsq      F       Z   Pr(>F)    
ID         6 0.0236239 0.0039373 0.82309 5.2274  4.5967 0.000999 ***
replica    1 0.0005585 0.0005585 0.01946 0.7415 -0.3831 0.635365    
Residuals  6 0.0045192 0.0007532 0.15745                            
Total     13 0.0287016  

You can see that the "groups" factor is missing. Furthermore, why is the Z-score of "replica" negative? Isn't the Z-score supposed to be SS/variance of the model effect?

Also, whenever I include an "ID*groups" interaction to check whether individual variation differs across groups, such as "coords ~ ID*groups", the same warning is reported and the ANOVA table omits the "groups" factor and the interaction entirely. If I reverse the order and use "coords ~ groups*ID", only the interaction is missing.

The only model I can get to work is "coords ~ replica*groups", but the Z-scores for "replica" and the interaction are still negative.

I'm not sure whether there is a problem with my dataset or grouping. I thought the grouping was pretty straightforward. Thank you!

Adams, Dean [EEOB]

Jul 17, 2024, 5:12:43 PM
to geomorph-...@googlegroups.com

Chi Zhang,

 

The first issue is a data coding issue. If you look at gdf$ID and gdf$groups, there is near perfect redundancy there. That is, the IDs are a combination of ‘groups’ and some other ID components (D2, N3, N5, N6, N7, Z1). However, with the exception of N3, all of the latter are found in only 1 group. This redundancy is why you are not obtaining multiple factors in the model, as there are not really 2 factors as you have coded them.
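
One way to see this kind of redundancy before fitting is to cross-tabulate the two factors and compare the number of design-matrix columns with the matrix rank. The sketch below uses base R only and a hypothetical stand-in for the dataset (7 individuals with 2 landmarking trials each; the ID labels are invented, not the real ones in gdf). Under those assumptions it gives 9 columns and rank 8, the same counts as in the warning.

```r
# Hypothetical stand-in for gdf: 7 individuals x 2 landmarking trials = 14 rows.
# The ID labels are invented for illustration; they are not the real IDs.
ID <- factor(rep(c("P28_a", "P28_b", "P28_c", "P28_d",
                   "P60_a", "P60_b", "P60_c"), each = 2))
groups  <- factor(sub("_.*", "", as.character(ID)))  # stage is encoded in the ID
replica <- factor(rep(c("t1", "t2"), times = 7))

# Every ID occurs in exactly one group, so ID fully determines groups.
print(table(ID, groups))

# Compare the number of design-matrix columns with the matrix rank.
X <- model.matrix(~ ID + groups + replica)
n_cols <- ncol(X)      # intercept + 6 ID + 1 groups + 1 replica
x_rank <- qr(X)$rank   # one column is a linear combination of the others
cat("X columns:", n_cols, " rank:", x_rank, "\n")
```

With the real data, the same check would be table(gdf$ID, gdf$groups): a table in which each row has a single non-zero cell means the second factor adds nothing once the first is in the model.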

 

As for the negative Z-score, your interpretation of these is incorrect. They are not SS, and thus do not have to be positive. Instead, they represent the location of the observed test statistic on its normalized RRPP permutation distribution (see the help files of procD.lm and lm.rrpp, where this is described). Having a negative Z-score simply means that the observed value is less than the mean of that permutation distribution. That implies that the effect for that term in the model is not overly explanatory, and would correspond with a nonsignificant p-value, as you have for the replica term.
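
The idea can be illustrated with a small base R sketch. It is a simplification under stated assumptions: the data are simulated and univariate, the response is fully randomized rather than permuting residuals from a reduced model, and the normalizing transformation mentioned above is skipped. The variable names are hypothetical.

```r
set.seed(1)

# Hypothetical data: a response with no real effect of a two-level factor.
y <- rnorm(14)
g <- factor(rep(c("t1", "t2"), times = 7))

# F statistic for the factor from an ordinary linear model.
f_stat <- function(y, g) anova(lm(y ~ g))[1, "F value"]

f_obs  <- f_stat(y, g)
f_perm <- replicate(999, f_stat(sample(y), g))  # shuffle the response
f_all  <- c(f_obs, f_perm)                      # observed value counts as one permutation

# Z: where the observed statistic sits relative to the permutation distribution.
z <- (f_obs - mean(f_all)) / sd(f_all)
# P-value: proportion of permuted statistics at least as large as the observed one.
p <- mean(f_all >= f_obs)

cat("F =", f_obs, " Z =", z, " P =", p, "\n")
```

Whenever the observed F falls below the mean of the permuted F values, z is negative, and such a term will also have a large p-value.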


Dean

 

Dr. Dean C. Adams

Distinguished Professor of Evolutionary Biology

Department of Ecology, Evolution, and Organismal Biology

Iowa State University

https://faculty.sites.iastate.edu/dcadams/

phone: 515-294-3834


Chi Zhang

Jul 17, 2024, 5:56:37 PM
to geomorph-...@googlegroups.com
Hi Dean,

Thank you very much for the speedy reply and thanks for the explanation.

I got the redundancy now and the meaning of Z-score.

If I want to compare the amount of individual variation within the two "groups", should I code the same IDs, like "ind1, ind1, ind2, ind2, ..." for specimens of group 1, then "ind1, ind1, ind2, ind2, ..." for group 2? When I did this, it still reported the warning, but the group*ID interaction showed up.

The problem is that I have an unbalanced sample: group 1 has 4 specimens and group 2 has only 3 (I'm waiting for more data). Can you give some suggestions for coding the individuals?

Thank you!

Chi



Adams, Dean [EEOB]

Jul 17, 2024, 7:47:12 PM
to geomorph-...@googlegroups.com

Chi,

 

Your objective is not entirely clear to me. Do you mean you want to know whether the variation among replicates for each individual is greater in one group versus the other? Do you mean the variation among individuals within replicates? Or some other notion of variation? A precise definition of the problem is usually quite helpful for sorting out what one wishes to do.


For instance, differences among groups are often referred to as ‘variation’ among groups. That is an ANOVA-type question, which in the multivariate world is a question evaluating the ‘location’ of group means in the multivariate space (are the group means close to one another or far away, relative to within-group variability).

 

By contrast, one could ask a question of differing variation within groups, which is whether one group displays greater variation as compared with another (regardless of the group mean location). That is a question about the size of the cloud of variation, typically handled by a disparity analysis (disparity is a measure of within-group variation).
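
As a rough illustration of what such a within-group measure looks like, the sketch below computes a variance per group as the mean squared distance of observations from their own group mean, then compares the two groups with a simple permutation of those squared distances. It uses simulated numbers in base R, and the permutation scheme is an assumption chosen for brevity; it is not a description of how geomorph's morphol.disparity function is implemented.

```r
set.seed(2)

# Hypothetical flattened shape data: 14 observations x 6 shape variables.
Y <- matrix(rnorm(14 * 6, sd = 0.01), nrow = 14)
groups <- factor(rep(c("P28", "P60"), times = c(8, 6)))

# Squared distance of each observation from its own group mean.
grp_mean <- apply(Y, 2, function(col) ave(col, groups))
d2 <- rowSums((Y - grp_mean)^2)

# Within-group variance: mean squared distance to the group mean.
pv <- tapply(d2, groups, mean)
print(pv)

# Absolute difference in within-group variance between the two stages.
disp_diff <- function(d2, g) {
  v <- tapply(d2, g, mean)
  as.numeric(abs(v["P28"] - v["P60"]))
}

obs_diff  <- disp_diff(d2, groups)
perm_diff <- replicate(999, disp_diff(sample(d2), groups))
p_value   <- mean(c(obs_diff, perm_diff) >= obs_diff)
cat("difference:", obs_diff, " P =", p_value, "\n")
```

Because each observation is measured against its own group mean, the result does not depend on where the two group means are located, which is the distinction drawn above between location and size of the cloud.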

 

Your question “I want to compare the amount of individual variations within two "groups"” could also imply something else: that you wish to measure individual disparity, and then determine whether the disparity of these disparities differs in group 1 and 2. 


As you can see, how the word variation is used matters a great deal for how one might suggest an analysis procedure.  I am a bit unclear as to where you wish your analysis to go. Perhaps others in the group can chime in for enlightenment.

Mike Collyer

Jul 17, 2024, 8:37:41 PM
to geomorph R package
Dear Chi,

Unless Ind1, Ind2, etc., are the same individuals put into different groups (like clones), they should not be labeled the same. For example, group 1 might comprise individuals 1-5, group 2 individuals 6-10, etc. Your attempted linear model will behave as if every level of group could be crossed with every individual.

With respect to term order, redundancies, and ANOVA output, there is a systematic way that redundant parameters are removed from a linear model design matrix. Once a second parameter is found redundant, it is removed but the first one is retained. Here is an example with two perfectly redundant factors that illustrates this:
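
A small base R sketch of the difference between reused and unique labels, using hypothetical labels and the unbalanced sizes mentioned earlier in the thread (4 and 3 individuals, 2 trials each). Building the unique labels with interaction() is just one convenient option.

```r
groups <- factor(rep(c("P28", "P60"), times = c(8, 6)))

# Reused labels: "ind1" in P28 and "ind1" in P60 look like the same animal.
ind_reused <- factor(c(rep(paste0("ind", 1:4), each = 2),
                       rep(paste0("ind", 1:3), each = 2)))

# Unique labels: one level per animal, e.g. "P28.ind1" and "P60.ind1".
ind_unique <- interaction(groups, ind_reused, drop = TRUE)

n_reused <- nlevels(ind_reused)  # levels shared across groups
n_unique <- nlevels(ind_unique)  # one level per animal
cat("reused levels:", n_reused, " unique levels:", n_unique, "\n")

# With reused labels most individuals appear in both groups (crossed);
# with unique labels each individual appears in exactly one group (nested).
print(table(ind_reused, groups))
print(table(ind_unique, groups))
```

Unequal numbers of individuals per group pose no problem for this labeling; each animal simply gets its own level.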

[Attached images: PastedGraphic-1.png, PastedGraphic-2.png, PastedGraphic-3.png]

You can see that the interaction disappears because there is no interaction of factors to be considered, since they are the same. The term that shows up in ANOVA just happens to be the one asked for first. As Dean pointed out, because you have redundancy in your terms, you get seemingly different ANOVAs when you switch them around. However, the biggest issue is that R accommodates your wish to attempt an ANOVA, even though the linear model design you use is illogical. Because your ID levels are unique to each groups level, there are no interactions to consider.
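
The attached images are not available in this text copy. The following base R sketch is an independent illustration of the same point, not the original example: two factors that define the same partition of the observations are fitted in both orders with lm. The data are simulated and all names are hypothetical.

```r
set.seed(3)

# Two factors with different labels but the same partition of the 10 rows.
A <- factor(rep(c("a1", "a2"), each = 5))
B <- factor(rep(c("b1", "b2"), each = 5))
y <- rnorm(10)

# Four design columns are requested, but only two are linearly independent.
X <- model.matrix(~ A * B)
cat("X columns:", ncol(X), " rank:", qr(X)$rank, "\n")

# Fit the same model with the terms in both orders.
fit_AB <- lm(y ~ A * B)
fit_BA <- lm(y ~ B * A)

# Redundant coefficients are reported as NA.
print(coef(fit_AB))

# Only the term listed first survives in each ANOVA table.
print(anova(fit_AB))
print(anova(fit_BA))
terms_AB <- rownames(anova(fit_AB))
terms_BA <- rownames(anova(fit_BA))
```

Both fits have identical fitted values and residuals; only the name attached to the single estimable effect changes with term order.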

Mike

Chi Zhang

Jul 17, 2024, 9:55:36 PM
to geomorph R package
Hi Dean,

Thank you very much for the explanation of the concepts and suggestions of tests!

I think I confused the meaning of the ID*group interaction term. I did want to test whether patterns of variation in ID, representing individuals, differ between the two groups (eventually I'll get a knockout and a control group). I thought that the ID*group interaction would quantify whether the responses of IDs (individuals) differ across groups. I also wanted to figure out the methodological issue of unbalanced group sizes. I realized that this won't work in my case because the IDs are all distinct individuals in the two groups, and I totally ignored that.

I hope my understanding of interaction is appropriate. It comes from looking at univariate two-way ANOVA cases, such as gender responses differing across treatment groups. Something similar to this: https://pages.uoregon.edu/stevensj/interaction.pdf.

I do want to quantify the amount of variation in the different groups, and a disparity analysis should definitely be done. I also need better-designed tests. Since the test and control groups each have two ontogenetic stages, an ontogenetic stage by group interaction should also be examined, plus analyses such as testing homogeneity of allometric slopes and trajectory analysis for comparing allometric trends in the two groups.

Chi

Chi Zhang

Jul 17, 2024, 10:11:19 PM
to geomorph R package
Hi Michael,

Thank you very much for the explanation and examples! I have a better understanding of the redundancy issue and interaction design. 

Best,
Chi

Adams, Dean [EEOB]

Jul 17, 2024, 10:48:21 PM
to geomorph-...@googlegroups.com

Got it. See Mike’s email, which further explains the test design challenges with what you had originally attempted.


Good luck!

Dean
