different sample sizes for groups in perMANOVA


Peter Nelson

Jan 16, 2014, 8:54:53 PM
to pc-...@googlegroups.com
Greetings multivariate masters,

While reviewing a manuscript, I came across an application of
perMANOVA, using the software PASA by Hammer (2001), where the number
of sample units in the two groups being compared was not equal. My
recollection was that perMANOVA required the same number of sample
units in each group. Before I made a critical comment on this
perceived error in the manuscript, I consulted "Analysis of Ecological
Communities" by McCune and Grace only to find there was no mention of
such a group-size equivalence requirement. Indeed, the equation for
the within-group sum of squares for perMANOVA reported therein only
indicates that the sum of squared distances for a group is divided by
the number of sample units in that group; it says nothing about the
number of sample units needing to be the same across groups. A quick
check in Anderson (2001), where perMANOVA was originally described,
confirmed that there was no assumption of equal sample sizes among
groups. However, a test in PC-ORD using a very small matrix with
two groups, one with 3 sample units and the other with 2, confirmed
that PC-ORD will not let you do perMANOVA with different sample sizes
between groups.
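
For reference, the one-way partitioning described above can be written out as follows. This is a sketch of the standard distance-based formulation, and the symbol names are chosen here for illustration: N is the total number of sample units, a the number of groups, n_g the number of sample units in group g, and d_ij the distance between sample units i and j.

```latex
\begin{aligned}
SS_T &= \frac{1}{N} \sum_{i<j} d_{ij}^{2} \\
SS_W &= \sum_{g=1}^{a} \frac{1}{n_g} \sum_{\substack{i<j \\ i,j \in g}} d_{ij}^{2} \\
SS_A &= SS_T - SS_W \\
F &= \frac{SS_A / (a-1)}{SS_W / (N-a)}
\end{aligned}
```

Each group's sum of squared distances is divided by its own n_g, so nothing in the one-way statistic itself requires the n_g to be equal.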

I guess it makes intuitive sense that the variability of two groups
could differ considerably simply because one group has very few
sample units and the other has many. However, I don't see why this
would stop one from conducting the test. Does anyone have insight into
why PC-ORD requires equal group sample sizes for perMANOVA?

Peter R. Nelson, PhD
Post-doctoral Researcher
Department of Botany and Plant Pathology
Cordley Hall 2082, Oregon State University
Corvallis, Oregon 97331-2902
Phone: 541-231-5584
http://peterrnelson.weebly.com/

Bruce McCune

Jan 18, 2014, 12:25:09 PM
to pc-...@googlegroups.com
Peter,

If you have a simple one-way design, you do not need a balanced
design for perMANOVA, but PC-ORD has insisted on this for perMANOVA
in general. The reason for this is that all the other designs in
perMANOVA do require a balanced design, if one takes the purely
nonparametric approach, where equal-size slices of the design are
permuted. As I recall, after Anderson's original paper on NPMANOVA,
in which she mentions the balance requirement, she later started
using a semi-parametric approach, which allowed imbalanced designs
and more complex experimental designs. I'm not familiar with the
software PASA, so can't comment on that.
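
To illustrate the one-way case with unequal groups, here is a minimal Python sketch (not PC-ORD's or PAST's implementation). It assumes squared Euclidean distances and made-up data with groups of 3 and 2 sample units, and it enumerates every way of relabeling the units rather than sampling permutations. The function name `pseudo_f` and all data values are hypothetical.

```python
from itertools import combinations

import numpy as np


def pseudo_f(d2, labels):
    """One-way pseudo-F from a square matrix of squared distances."""
    labels = np.asarray(labels)
    n_total = len(labels)
    groups = np.unique(labels)
    # Total SS: sum of squared distances over all pairs, divided by N
    ss_total = d2[np.triu_indices(n_total, k=1)].sum() / n_total
    # Within SS: each group's pairwise sum is divided by that group's own size
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        sub = d2[np.ix_(idx, idx)]
        ss_within += np.triu(sub, k=1).sum() / len(idx)
    ss_among = ss_total - ss_within
    a = len(groups)
    return (ss_among / (a - 1)) / (ss_within / (n_total - a))


# Made-up data: 5 sample units x 2 variables, groups of size 3 and 2
x = np.array([[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [6.0, 7.0], [7.0, 6.0]])
labels = np.array([0, 0, 0, 1, 1])
diff = x[:, None, :] - x[None, :, :]
d2 = (diff ** 2).sum(axis=2)  # squared Euclidean distances (an assumption)

f_obs = pseudo_f(d2, labels)

# Unrestricted relabeling: all C(5, 2) = 10 ways to choose the group of 2
f_all = []
for pair in combinations(range(5), 2):
    relabeled = np.zeros(5, dtype=int)
    relabeled[list(pair)] = 1
    f_all.append(pseudo_f(d2, relabeled))
f_all = np.array(f_all)

# Proportion of arrangements with F at least as large as the observed F
p_value = np.mean(f_all >= f_obs - 1e-12)
print(f"pseudo-F = {f_obs:.2f}, p = {p_value:.2f}")
```

With these toy values the observed pseudo-F is 90 and only the observed arrangement reaches it, so p = 1/10. The calculation runs with unequal groups, but with 3 and 2 sample units the smallest attainable p-value is 0.1.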

- Bruce
>--
>You received this message because you are subscribed to the Google
>Groups "PC-ORD" group.
>To unsubscribe from this group and stop receiving emails from it,
>send an email to pc-ord+un...@googlegroups.com.
>For more options, visit https://groups.google.com/groups/opt_out.
>

Arthur APA

Jan 22, 2014, 12:00:38 PM
to pc-...@googlegroups.com
Peter and Bruce,
 
It looks pretty reasonable to assume that "PASA" should be "PAST" instead. The latter software package does list Hammer 2001 as the reference for citation. It also offers perMANOVA, in a way that does not require equal sample counts in the groups.
 
Per the answer and explanation from Bruce, for an analysis that has only two groups being compared this should be fine (i.e., it must be a one-way design and so not limited to equal group sizes to be fully nonparametric).
 
Thanks for the posts.
 
Art

Peter Nelson

Mar 2, 2017, 9:57:00 AM
to PC-ORD
All,

I am again confronted with the unequal sample size issue in perMANOVA, this time with more than one group. While searching the internet for discussion of it, I came across my own question from a few years ago in this group. The earlier conversation on this thread indicated that perMANOVA with only two groups would be OK, but the multiple-group issue remains unclear to me.

My data are monthly average climate stats (e.g., temperature and humidity) from a series of data loggers. I want to test for differences between clusters of sensors (site) as well as the position of each sensor within each site (position), switching the blocking factor between these two to test for the influence of site or position, respectively. Some sites have fewer data loggers, and some data loggers within a site have missing months (although every site has data for every month), hence the unbalanced design when blocking by either site or position.

Any suggestions? 

Bruce McCune

Mar 4, 2017, 4:28:16 PM
to pc-...@googlegroups.com
I'm not sure I understand the setup, but a couple of thoughts come to mind. First, with these climate variables, do you need a nonparametric approach? And do you need to do this in a multivariate way (considering the climate variables simultaneously)? Maybe doing them one at a time would help alleviate some of the missing data problem.
-Bruce


Peter A. Nelson

Mar 23, 2017, 12:30:05 PM
to PC-ORD
Hi folks...

I've got two questions regarding the application of PERMANOVAs to an unbalanced design. 

The first is an extension of the other Peter Nelson's (!) previous question: I'm comparing fish, algal and invertebrate communities from 4 different estuaries (two sites in each estuary). While the sampling effort was identical across all sites, for some taxonomic groups under some conditions our samples would have 0 individuals for all species. Dropping these all-zero samples results in unequal sample sizes. For example, the numbers of successful samples (rows) for each estuary and site within estuary are as follows: Estuary A (site 1: 11, site 2: 15), Estuary B (site 1: 5, site 2: 30), Estuary C (site 1: 6, site 2: 0), and Estuary D (site 1: 25, site 2: 24).

My second question assumes that it's reasonable to proceed despite such unequal sample sizes: Sites within some estuaries are different (greater freshwater influence, consistently higher temperatures, etc.) and I'd like to be able to test the hypothesis that the communities differ across sites (i.e., Estuary A site 1 differs from Estuary A site 2). Going into this, I assumed that I would start by comparing sites nested within estuaries. Reasonable? If so, how do you construct a model with nested factors here? If not, can I use PERMANOVA to do multiple pairwise comparisons within estuaries and then apply a post-hoc correction (e.g., Bonferroni)?
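
The pairwise option can be sketched in Python as follows. This is only an illustration under stated assumptions, not a recommendation over a nested model: the abundances are random placeholders (only the row counts come from the post), the distance is squared Euclidean rather than whatever measure the real analysis would use, and `pseudo_f`, `permanova_p`, `n_species` and the Poisson means are hypothetical.

```python
import numpy as np


def pseudo_f(d2, labels):
    """One-way pseudo-F from a square matrix of squared distances."""
    labels = np.asarray(labels)
    n_total = len(labels)
    groups = np.unique(labels)
    ss_total = d2[np.triu_indices(n_total, k=1)].sum() / n_total
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        ss_within += np.triu(d2[np.ix_(idx, idx)], k=1).sum() / len(idx)
    a = len(groups)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n_total - a))


def permanova_p(d2, labels, n_perm, rng):
    """Permutation p-value from unrestricted shuffling of group labels."""
    f_obs = pseudo_f(d2, labels)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        if pseudo_f(d2, rng.permutation(labels)) >= f_obs:
            hits += 1
    return f_obs, hits / (n_perm + 1)


rng = np.random.default_rng(0)
# Rows per site as listed in the post: estuary -> (site 1, site 2)
n_rows = {"A": (11, 15), "B": (5, 30), "C": (6, 0), "D": (25, 24)}
n_species = 4  # hypothetical number of species columns

results = {}
for estuary, (n1, n2) in n_rows.items():
    if min(n1, n2) < 2:
        continue  # Estuary C: site 2 has no rows, so there is nothing to compare
    # Placeholder abundances; a real analysis would load the community matrix
    site1 = rng.poisson(5.0, size=(n1, n_species)).astype(float)
    site2 = rng.poisson(8.0, size=(n2, n_species)).astype(float)
    x = np.vstack([site1, site2])
    labels = np.array([0] * n1 + [1] * n2)
    diff = x[:, None, :] - x[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    results[estuary] = permanova_p(d2, labels, n_perm=999, rng=rng)

# Bonferroni: multiply each p-value by the number of comparisons, cap at 1
m = len(results)
adjusted = {e: min(1.0, p * m) for e, (_, p) in results.items()}
for estuary, (f_obs, p) in results.items():
    print(f"Estuary {estuary}: F = {f_obs:.2f}, p = {p:.3f}, "
          f"Bonferroni p = {adjusted[estuary]:.3f}")
```

Each comparison is a one-way test between two sites, so unequal row counts do not prevent it from running. Estuary C drops out because one of its sites has no usable rows, which leaves three comparisons for the correction.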

Thanks for your help!

Pete
