What nested designs will work with procD.lm and nested.update?

473 views
Skip to first unread message

Daniel Svozil

unread,
Jan 23, 2017, 9:56:34 PM1/23/17
to geomorph R package
Hi all,

After reading Milena's post about nested ANOVA designs, I went and dug into the help files for nested.update and both procD.lm and advanced.procD.lm functions.

Something stood out for me in the nested.update examples provided in its help file.

One example states the following:

tailANOVA <- proc.D(coord ~ Csize * Treatment/Family, iter=99, RRPP=TRUE, data=gdf)

# This will not work: tailANOVA<- nested.update ( tailANOVA, ~ Treatment/Family)

Does this mean that a nested design cannot include an interaction effect? 

If my data originate from a two factor experimental design (e.g. one random factor which is not of interest, (site) nested within a fixed factor (species)) could I specify a model that takes this hierchical design into account, and tests an interaction of species with another fixed factor such as sex? 

In other words, would the following model work?

coords ~ (species/site) * sex

If not, how could one specify such a design that would work with procD,lm and nested.update? And can this be done using an ANOVA based analysis at all?

Looking forward to your response.


Regards,

Mike Collyer

unread,
Jan 23, 2017, 10:31:51 PM1/23/17
to Daniel Svozil, geomorph R package
Daniel,

It’s not that other designs cannot be done, in principle, but currently, nested.update cannot do something more complicated than a single factor nested within another.  The programming required to account for all manner of interactions is a Herculean task to accomplish.  

One thing to point out, the model you suggested looks simple, but is really: coords ~ species + sex + species:site + species:sex + species:site:sex.

So my question is, what should be the random effect for each F value in ANOVA?  Is it really straightforward?

If someone wants to tangle with that, the beautiful thing is that procD.lm produces the SS for each term, for every random permutation.  So, there is the opportunity to calculate the F values, as desired, provided one can manage doing it “by hand”.  It is perhaps only a few lines of code.  For the few cases where somebody has this need, it is easier to offer suggestions for such programming than it is to anticipate all permutations of interactions with the nested.update function.  (Consider this an optimal efficiency model.)

So, specifically for your example, if you want the F value for species:sex to be MS species:sex / MS species:site:sex, something like this can be done

fit <- procD.lm(coords ~ (species/site) * sex)

fit$random.SS # all random outcomes
fit$df # df for each effect

random.MS.ranef <- fit$random.SS[5,]/fit$df[5]
random.MS.fixed <- fit$random.SS[4,]/fit$df[4]
random.Fs <- random.MS.fixed/random.MS.ranef

random.Fs[1] # observed F
geomorph:::pval(random.Fs) # P-value
geomorph:::effect.size(random.Fs) # Z score

You could then replace the F, Z, and P values in the procD.lm ANOVA output with these values.  You could follow the same procedure for any row in the ANOVA table divided by any other row.

With any single factor nested by another factor, it is easy to search the terms of the linear model and find which row gets divided by which other row.  Once there is an interaction, the programming is sufficiently difficult and is not something I have yet worked up the courage to attempt.  This email response is frankly much easier.

Hope that makes sense,

Mike


--
You received this message because you are subscribed to the Google Groups "geomorph R package" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-pack...@googlegroups.com.
To post to this group, send email to geomorph-...@googlegroups.com.
Visit this group at https://groups.google.com/group/geomorph-r-package.
To view this discussion on the web, visit https://groups.google.com/d/msgid/geomorph-r-package/56bd1feb-77e5-45e8-ad92-6ee17c720add%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Daniel Svozil

unread,
Jan 23, 2017, 11:58:59 PM1/23/17
to geomorph R package, dpsv...@gmail.com
Thanks for that Michael,

So if I am only interested in the main effects and their interaction (species, sex, species:sex) in the context of this nested model, does that require that I perform that same computation you described for each main effect also?

And does that mean I do not report the F-values for the interactions involving the random factor (i.e. species:site, species:site:sex) since they are only used to compute the F-values for the terms of interest? 

Regards,

Daniel.

To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-package+unsub...@googlegroups.com.

Mike Collyer

unread,
Jan 24, 2017, 7:13:39 AM1/24/17
to Daniel Svozil, geomorph R package
Daniel,

If it were me, I would take a look at the model. e.g., 

terms(~(species/site) * sex)

And I would try to determine what the expected mean squares (EMS) is for each term.  I would then make sure my F-values were calculated with respect to the EMS (choose the right denominator).  This is pretty simple for single interactions in mixed models (i.e., one nested factor).  If A is fixed and B is random, then F values are calculated as F for A = MSA/MSAB, F for B = MSB/MSE, and F for AB  = MSAB/MSE.  Once there is a three way interaction, things get interesting.

I found this via a quick search.  It is from Jorge Dubcovsky, a plant scientist at UC-Davis.  I believe these are notes from an experimental design course.


If you follow the logic to the point where he discusses three-way interactions, you will see that “what to do?” becomes less straightforward.  The simplest answer appears to be if you have a single fixed effect, calculate F by dividing by MS for a two-way interaction including the fixed effect with the random effect.  If you have a two-way interaction between fixed effects, divide by the MS for the three-way interaction (like I did in the example).  But there is another example with adding the MS for two-way interactions and subtracting the MS for the three-way interaction.  I’ll cite exactly what Dubcovsky said in this document: "In factorial experiments with three or more factors, at least one of which is a random factor, and certain other, more complex designs, there are frequently no exact test statistics for certain effects in the models.”  In other words, you have strayed into territory where there is no single correct approach.

If I were in your shoes, I would be inclined (mostly by ease in approach and interpretation) to calculate F values as

species: MS species / MS species:site:sex (probably easiest) or MS species / MS species:site 
site: MS site/ MSE
sex: MS sex / MS species:site:sex
species:sex: MS species:sex / MS species:site:sex
species:site:sex: MS species:site:sex / MSE

Note that all random effects are divided by MSE, all fixed effects by the MS for the highest interaction.

The important thing is that you have to do this across all random permutations and recalculate effects sizes and p-values.  This is not easy, but it can be done.

Again, I hope this demonstrates the increased complexity with more than a single two-way interaction and why nested.update has only a “primitive” approach (say compared to proc MIXED in SAS or lmer in R, which both are designed for univariate data analysis).

Mike

PS I have visualized a future new function or modification of nested.update where users can use the function and indicate the EMS with something like,

error = c(“species:site:sex”, “Residual”, “species:site:sex”, “species:site:sex”, “Residual”)

as an argument, for example, with the terms I used above.  This would only appeal to advanced users who can envision what the EMS should be, and it is not a trivial programming hurdle.  Maybe when I get a sabbatical…


To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-pack...@googlegroups.com.

To post to this group, send email to geomorph-...@googlegroups.com.
Visit this group at https://groups.google.com/group/geomorph-r-package.
Message has been deleted

Mike Collyer

unread,
Jan 25, 2017, 6:13:17 AM1/25/17
to Milena Stefanovic, geomorph R package, dpsv...@gmail.com
Milena,

Yes, that is correct.

The random.SS in procD.lm (or random.Fs in procD.pgls) is a matrix with dimensions of (Number of effects x Number of random iterations + 1).  The first column of values are the observed SS; the rest of the columns are the SS values for each random permutation.  The observed values are considered the first iteration, as any permutation procedure has to consider the observed case to be one possible random outcome.  We just slot that in position 1.

Mike

On Jan 25, 2017, at 5:20 AM, Milena Stefanovic <milen...@gmail.com> wrote:

Mike,

Thank you for this detailed explanation.

Does following line of code repesent F-value of 1st iteration?

random.Fs[1] # observed F

Best,
Milena
To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-package+unsub...@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "geomorph R package" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geomorph-r-pack...@googlegroups.com.
To post to this group, send email to geomorph-...@googlegroups.com.
Visit this group at https://groups.google.com/group/geomorph-r-package.

Daniel Svozil

unread,
May 12, 2023, 8:24:04 PM5/12/23
to geomorph R package
Hi all, 

I'm attempting to manually calculate F-values for random and fixed effects in procD.lm for a 3 factor (or larger) nested mixed effects model, as I did in the thread to which I am responding, from 2017.

At the time of this thread (using geomorph v3.0.5 and R v3.5.1), I went ahead and calculated the F-values by extracting the SS values and df for each effect, then dividing them to obtain their respective MS, F and Z values, etc. (see example I was provided below):

fit <- procD.lm(coords ~ (species/site) * sex)

fit$random.SS # all random outcomes
fit$df # df for each effect

random.MS.ranef <- fit$random.SS[5,]/fit$df[5]
random.MS.fixed <- fit$random.SS[4,]/fit$df[4]
random.Fs <- random.MS.fixed/random.MS.ranef

random.Fs[1] # observed F
geomorph:::pval(random.Fs) # P-value
geomorph:::effect.size(random.Fs) # Z score

Fast forward to yesterday, and I attempted the same thing after consulting the user manual for geomorph (I may be accessing it in the wrong place?) which indicates on pages 121-122 that all the those attributes of procD.lm listed above are still available in the current version of procD.lm in geomorph (4.0.5). However, when running for example:

fit$random.SS I got NULL. 

I updated R, R studio and geomorph to try and fix this (after doing so, no change, still NULL out of fit$random.SS). 

So I figured then that random.SS (and other such attributes of procD.lm at least in that form) must be deprecated and there is a new approach to extracting SS for all random perms and manually choosing how to calculate one's MS, F and Z values, etc for a given model.

I did notice in the example on page 123 of the manual, that there is now an argument called 'error' in the anova function.
Taken from the page:  

anova(fit2, error = c("species:site", "species:site", "Residuals"))  

This is much like what Mike suggested might be available one day if a sabbatical were an option, in the thread above.

Quoted here:
"PS I have visualized a future new function or modification of nested.update where users can use the function and indicate the EMS with something like,

error = c(“species:site:sex”, “Residual”, “species:site:sex”, “species:site:sex”, “Residual”)"

I did see that this argument, if all model terms and interactions are specified, yields a table of SS, MS and R sq but with F of 1, Z = 0 and P = 0.5 (as an example, see below with my own data):
Df SS MS Rsq F Z Pr(>F) habitat 1 0.02382 0.0238201 0.04131 1 0 0.5005 sex 1 0.00745 0.0074457 0.01291 1 0 0.5005 logcs 1 0.01178 0.0117770 0.02042 1 0 0.5005 habitat:location 9 0.14259 0.0158431 0.24727 1 0 0.5005 habitat:sex 1 0.00127 0.0012736 0.00221 1 0 0.5005 habitat:sex:location 9 0.00826 0.0009174 0.01432 1 0 0.5005 Residuals 646 0.38148 0.0005905 0.66156 Total 668 0.57665

What is the difference with this and one where the tests of significance/effect size are generated when error= is left blank?

Anyway, the aim is to calculate manually or through a more automated fashion if possible in the current version of geomorph, the MS, and test statistics for the model above for all the random permutations as suggested previously in this thread. It just looks to me like it is now done differently to how it was done 6 years ago. 

Looking forward to your suggestions.

Regards,
 
Daniel Svozil

Mike Collyer

unread,
May 13, 2023, 3:27:13 AM5/13/23
to geomorph R package
Dear Daniel,

Although it did not require a sabbatical, “one day” came a year later in 2018, when we released the RRPP package.  Indeed, nested.update was deprecated and the anova.lm.rrpp function made it possible to use the error argument in anova(…).  There is a section in the help file (manual you linked) called, "Notes for geomorph 3.1.0 and subsequent versions”, which links to the functions in RRPP.  The lm.rrpp function help shows how to account for nested/random effects.  Any lm.rrpp example can be run for procD.lm objects, as the procD.lm function is now a wrapper for the lm.rrpp function, allowing one to use array-formatted data rather than a matrix.

As for the missing $random.SS output, that is a (typo) bug that we will have to fix.  However, that feature of output, along with some others, was retained (or intentioned to be retained), just in case we had some users who maybe did not use geomorph for perhaps six years and upon return might like their code to still work.  The procD.lm output is large and often redundant, because in addition to the newer format of lm.rrpp (to shunt results into $LM, $ANOVA, and $PermInfo subunits), we attempted to make sure the output one would expect to see before the change was still there, after.  You can find the random.SS results by also extracting $ANOVA$SS results.

You happen to be the first person who reported this bug, so to be honest, we had not noticed it.  Internally, random SS is captured by the $ANOVA$SS output, as geomorph functions wrap RRPP functions that know no other output format.  You could update your code to use $ANOVA$SS rather than $random.SS, wait until we fix the code and push updates to github (maybe later today), or just use anova(fit, …) with its post-2017 capabilities made possible via anova.lm.rrpp.

Cheers!
Mike


Daniel Svozil

unread,
May 20, 2023, 8:54:33 AM5/20/23
to geomorph R package
Hi Mike,

I figured things would have changed in the 6 years since I last used geomorph. Thanks for the comprehensive explanation. Much appreciated.

Regards,

Daniel

Reply all
Reply to author
Forward
0 new messages