ANOVA or a battery of t-tests: how to identify subgroups that differ from the average across subgroups?

ge...@seznam.cz

Sep 28, 2008, 10:04:02 PM

Hello,

I have toxicity measurements for objects that belong to 9
categories (various species of plants; n = 6, 6, 7, 8, 5, 9, 7, 2, 8). I would
like to know whether the objects of one category differ from
the rest of the data. I am not really interested in comparing
category A vs. category B, but rather category A vs.
pooled (categories B, C, D, ...). I would like to reach a conclusion like
"species of categories X and Y are resistant, species of category Z
are sensitive".

What I have experimented with:
1. Simple t-tests for all combinations of "objects of category X" vs.
"pooled objects of all other categories". After a Bonferroni correction
for 9 t-tests, no results are significant.
2. ANOVA - here I get a significant difference, but I don't know how to
show which categories are the ones that differ. I inferred that you
can use contrasts for various comparisons, but the examples I found in
books and on the internet do not use a comparison vs. the rest of the
dataset (moreover, covering all category vs. rest comparisons would
require 9 contrasts, one more than the 8 available for 9 categories).
I found the Tukey test in R, but it contrasts individual categories
with each other, not a category vs. the average...

What is the right way of doing such things? Is the category with just
2 members a problem, and should it be removed?

Please help. Thank you.
(I used the free R software for my attempts; I would welcome a
recommendation of some free/cheap statistics software that is not
command-based.)

My data (log-transformed):
values <- c(656.5, 668.5, 598, 689.5, 769, 717.5, 615, 713.5, 625, 641,
            718.5, 686, 740.5, 641, 701, 697, 604.5, 668.5, 650.5, 489, 721.5,
            660.5, 658.5, 643, 752, 695, 686, 645, 655, 686.5, 603, 707.5,
            677.5, 627, 697.5, 637.5, 716.5, 675, 650, 736, 659, 678, 684,
            618, 581, 584.5, 674.5, 735, 632, 676, 718.5, 680, 726.5, 723.5,
            628, 572, 646, 691)

categories <- c(9, 9, 4, 9, 1, 9, 3, 8, 4, 3, 3, 3, 5, 4, 4, 1, 3, 7, 5, 3,
                6, 6, 6, 1, 1, 6, 1, 5, 4, 4, 4, 4, 7, 7, 7, 7, 7, 8, 5, 9, 2,
                2, 2, 6, 6, 6, 7, 9, 2, 2, 5, 3, 1, 9, 2, 6, 6, 9)
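
Approach 1 above (each category vs. the pooled rest, with a Bonferroni correction) could be sketched in R roughly as follows. This is only an illustration: it assumes Welch (unequal-variance) t-tests, which is the default of `t.test` and not necessarily what was run originally, and the names `cats`, `p_raw` and `p_bonf` are made up for the example.

```r
values <- c(656.5, 668.5, 598, 689.5, 769, 717.5, 615, 713.5, 625, 641,
            718.5, 686, 740.5, 641, 701, 697, 604.5, 668.5, 650.5, 489, 721.5,
            660.5, 658.5, 643, 752, 695, 686, 645, 655, 686.5, 603, 707.5,
            677.5, 627, 697.5, 637.5, 716.5, 675, 650, 736, 659, 678, 684,
            618, 581, 584.5, 674.5, 735, 632, 676, 718.5, 680, 726.5, 723.5,
            628, 572, 646, 691)
categories <- c(9, 9, 4, 9, 1, 9, 3, 8, 4, 3, 3, 3, 5, 4, 4, 1, 3, 7, 5, 3,
                6, 6, 6, 1, 1, 6, 1, 5, 4, 4, 4, 4, 7, 7, 7, 7, 7, 8, 5, 9, 2,
                2, 2, 6, 6, 6, 7, 9, 2, 2, 5, 3, 1, 9, 2, 6, 6, 9)

cats <- sort(unique(categories))

# One two-sample t-test per category: members of category k vs. everything else.
# t.test() uses the Welch (unequal-variance) version unless var.equal = TRUE.
p_raw <- sapply(cats, function(k) {
  t.test(values[categories == k], values[categories != k])$p.value
})
names(p_raw) <- cats

# Bonferroni: multiply each p-value by the number of tests (9), capped at 1.
p_bonf <- p.adjust(p_raw, method = "bonferroni")

print(round(cbind(p_raw, p_bonf), 4))
```

Note that the 9 comparisons are not independent (each pooled "rest" shares most of its data with the others), so Bonferroni is conservative here.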

Ray Koopman

Sep 29, 2008, 4:26:23 PM

Those values are surprisingly large for logs. Have they been scaled?
Here are the means and standard deviations, along with the data.

cat  n     mean     s.d.  sorted data
  1  6  712.250  46.3052  643, 686, 697, 726.5, 752, 769
  2  6  659.500  24.3454  628, 632, 659, 676, 678, 684
  3  7  633.429  75.5681  489, 604.5, 615, 641, 680, 686, 718.5
  4  8  652.125  42.8409  598, 603, 625, 641, 655, 686.5, 701, 707.5
  5  5  680.900  45.0935  645, 650, 650.5, 718.5, 740.5
  6  9  637.444  52.5312  572, 581, 584.5, 618, 646, 658.5, 660.5, 695, 721.5
  7  7  671.286  31.3340  627, 637.5, 668.5, 674.5, 677.5, 697.5, 716.5
  8  2  694.250  27.2236  675, 713.5
  9  8  702.188  30.2772  656.5, 668.5, 689.5, 691, 717.5, 723.5, 735, 736

The first value in category 3 looks suspiciously low.
If that value is dropped then the mean rises to 657.500
and the standard deviation drops to 44.5578.

When you compare one mean to the pool of the others, do you want to
use the pool of the means (i.e., the unweighted mean of the other
means) or the pool of the data (which would weight each mean by the
corresponding n)?
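
To make the distinction concrete, here is a small R sketch that computes both kinds of "pool of the others" for one category (category 1 is picked arbitrarily). The variable names are illustrative only.

```r
values <- c(656.5, 668.5, 598, 689.5, 769, 717.5, 615, 713.5, 625, 641,
            718.5, 686, 740.5, 641, 701, 697, 604.5, 668.5, 650.5, 489, 721.5,
            660.5, 658.5, 643, 752, 695, 686, 645, 655, 686.5, 603, 707.5,
            677.5, 627, 697.5, 637.5, 716.5, 675, 650, 736, 659, 678, 684,
            618, 581, 584.5, 674.5, 735, 632, 676, 718.5, 680, 726.5, 723.5,
            628, 572, 646, 691)
categories <- c(9, 9, 4, 9, 1, 9, 3, 8, 4, 3, 3, 3, 5, 4, 4, 1, 3, 7, 5, 3,
                6, 6, 6, 1, 1, 6, 1, 5, 4, 4, 4, 4, 7, 7, 7, 7, 7, 8, 5, 9, 2,
                2, 2, 6, 6, 6, 7, 9, 2, 2, 5, 3, 1, 9, 2, 6, 6, 9)

# Per-category means and sizes as named vectors ("1", ..., "9").
group_means <- sapply(split(values, categories), mean)
group_n     <- sapply(split(values, categories), length)

k <- "1"                              # category being compared (arbitrary choice)
others <- names(group_means) != k

# Pool of the means: every other category counts equally.
pool_of_means <- mean(group_means[others])

# Pool of the data: every other observation counts equally, so each
# category mean is weighted by its n. Same as mean(values[categories != 1]).
pool_of_data <- weighted.mean(group_means[others], group_n[others])

print(c(category_mean = group_means[[k]],
        pool_of_means = pool_of_means,
        pool_of_data  = pool_of_data))
```

With unequal n's the two pools differ, and the category with only 2 members has much more influence on the pool of the means than on the pool of the data.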

ge...@seznam.cz

Sep 29, 2008, 7:02:29 PM

1. Yes, the data are logged (and negated) and scaled.
2. In fact, the first value in category 3 might be even lower; the
object was simply resistant to the highest level of the treatment. I
have other datasets that are "poisoned" by such resistance values to
a much greater extent...
3. I want to compare the samples of one category (e.g. A) with the pooled
samples of the other categories (all except A). This seems most logical
to me; otherwise variability information would be lost. The mean seems
best to me in this case, because the data are, in my opinion, quite well
behaved, especially compared with my other datasets (poisoned by
resistance). Moreover, I am not even aware of methods for comparing
medians, which might be more appropriate... A reference to a solution
that a biologist can grasp would be of real help.

Thank you for your interest in my problem!

Ray Koopman

Oct 1, 2008, 2:34:06 AM

On Sep 29, 4:02 pm, ge...@seznam.cz wrote:
> 1. Yes, the data are logged (and negated) and scaled.
> 2. In fact, the first value in category 3 might be even lower; the
> object was simply resistant to the highest level of the treatment. I
> have other datasets that are "poisoned" by such resistance values to
> a much greater extent...
> 3. I want to compare the samples of one category (e.g. A) with the pooled
> samples of the other categories (all except A). This seems most logical
> to me; otherwise variability information would be lost. The mean seems
> best to me in this case, because the data are, in my opinion, quite well
> behaved, especially compared with my other datasets (poisoned by
> resistance). Moreover, I am not even aware of methods for comparing
> medians, which might be more appropriate... A reference to a solution
> that a biologist can grasp would be of real help.
>
> Thank you for your interest in my problem!

The usual alternatives to the traditional 2-group t-test and k-group
F-test of mean differences are the Wilcoxon-Mann-Whitney and
Kruskal-Wallis tests of shift, which use only ordinal information and do not
assume that the data have been sampled from any particular family of
distributions. However, they do assume that a monotonic transformation
exists such that the distributions of the transformed variable differ
by only a shift (i.e., an additive constant).

It turns out that the WMW and KW tests behave much like t and F if the
distributions are heteroscedastic (in a particular ordinally-defined
way that I would rather not go into now), and the existence of the
"poisoned" cases suggests that you may be sampling from such
distributions. I am not aware of a heteroscedastic version of KW that
would be analogous to the Brown-Forsythe or Welch versions of the
ANOVA F. Regarding heteroscedasticity and WMW, I wrote on Jan 25,
2005, in the sci.stat.consult thread "Wilcoxon-comparison of median":
"The current best ordinal comparison of two independent groups is
given by eq 5.12, p 140, in Norm Cliff's 1996 book Ordinal Methods for
Behavioral Data Analysis. For a corroborating opinion, see the
comments re FPC3 on p 500 of H.Delaney & A.Vargha, Comparing several
robust tests of stochastic equality with ordinally scaled variables
and small to moderate sized samples, Psychological Methods, 7 (2002),
485-503." However, I haven't paid much attention to the problem since
then, and there may be something better now. The small n's are a
problem.

The fallback is to dichotomize the data at the overall median and do a
Pearson chi-square on the resulting 9 x 2 contingency table, testing
the hypothesis that all the groups have the same median.
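
For reference, the Kruskal-Wallis test and the median-split fallback described above might look like this in R. It is a sketch only; `kw`, `tab` and `mt` are names chosen for the example, and values equal to the overall median (if any) are counted as "not above".

```r
values <- c(656.5, 668.5, 598, 689.5, 769, 717.5, 615, 713.5, 625, 641,
            718.5, 686, 740.5, 641, 701, 697, 604.5, 668.5, 650.5, 489, 721.5,
            660.5, 658.5, 643, 752, 695, 686, 645, 655, 686.5, 603, 707.5,
            677.5, 627, 697.5, 637.5, 716.5, 675, 650, 736, 659, 678, 684,
            618, 581, 584.5, 674.5, 735, 632, 676, 718.5, 680, 726.5, 723.5,
            628, 572, 646, 691)
categories <- c(9, 9, 4, 9, 1, 9, 3, 8, 4, 3, 3, 3, 5, 4, 4, 1, 3, 7, 5, 3,
                6, 6, 6, 1, 1, 6, 1, 5, 4, 4, 4, 4, 7, 7, 7, 7, 7, 8, 5, 9, 2,
                2, 2, 6, 6, 6, 7, 9, 2, 2, 5, 3, 1, 9, 2, 6, 6, 9)

# Kruskal-Wallis rank test across the 9 categories.
kw <- kruskal.test(values, factor(categories))
print(kw)

# Median test: dichotomize at the overall median, then a Pearson
# chi-square on the 9 x 2 table of category by above/not above.
above <- values > median(values)
tab <- table(category = categories, above_median = above)
print(tab)

# With n's this small, R is likely to warn that the chi-square
# approximation may be inaccurate; chisq.test(tab, simulate.p.value = TRUE)
# is one way around that.
mt <- chisq.test(tab)
print(mt)
```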

Whatever test you use, I see no alternative to stepwise-Bonferroni-
adjusted tests of each category against the pooled others.
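
A sketch of such stepwise-adjusted tests in R, under two assumptions that are not stated above: "stepwise Bonferroni" is taken to mean Holm's step-down procedure, and the Wilcoxon-Mann-Whitney test is used for each comparison (any other two-group test could be substituted).

```r
values <- c(656.5, 668.5, 598, 689.5, 769, 717.5, 615, 713.5, 625, 641,
            718.5, 686, 740.5, 641, 701, 697, 604.5, 668.5, 650.5, 489, 721.5,
            660.5, 658.5, 643, 752, 695, 686, 645, 655, 686.5, 603, 707.5,
            677.5, 627, 697.5, 637.5, 716.5, 675, 650, 736, 659, 678, 684,
            618, 581, 584.5, 674.5, 735, 632, 676, 718.5, 680, 726.5, 723.5,
            628, 572, 646, 691)
categories <- c(9, 9, 4, 9, 1, 9, 3, 8, 4, 3, 3, 3, 5, 4, 4, 1, 3, 7, 5, 3,
                6, 6, 6, 1, 1, 6, 1, 5, 4, 4, 4, 4, 7, 7, 7, 7, 7, 8, 5, 9, 2,
                2, 2, 6, 6, 6, 7, 9, 2, 2, 5, 3, 1, 9, 2, 6, 6, 9)

cats <- sort(unique(categories))

# Each category against the pooled others, Wilcoxon-Mann-Whitney.
# exact = FALSE requests the normal approximation, since the data
# contain ties and exact p-values are not available with ties.
p_raw <- sapply(cats, function(k) {
  wilcox.test(values[categories == k], values[categories != k],
              exact = FALSE)$p.value
})
names(p_raw) <- cats

# Holm's step-down adjustment: never less powerful than plain Bonferroni.
p_holm <- p.adjust(p_raw, method = "holm")
p_bonf <- p.adjust(p_raw, method = "bonferroni")

print(round(cbind(p_raw, p_holm, p_bonf), 4))
```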

Jeff Miller

Oct 7, 2008, 7:29:45 PM

On Sep 29, 3:04 pm, ge...@seznam.cz wrote:

> 2. ANOVA - here I get a significant difference, but I don't know how to
> show which categories are the ones that differ. I inferred that you
> can use contrasts for various comparisons, but the examples I found in
> books and on the internet do not use a comparison vs. the rest of the
> dataset

You could use Scheffe contrasts to compare each group against the
average of the other groups. The contrast coefficients would be
+1 (for the single group) and -1/8 (for each of the other groups).
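
A hand-rolled sketch of these contrasts in R, using the one-way ANOVA error term. It assumes equal variances across categories (as the ANOVA itself does), and note that with these coefficients each group is compared to the unweighted mean of the other eight group means, not to the pooled data. The object names are illustrative.

```r
values <- c(656.5, 668.5, 598, 689.5, 769, 717.5, 615, 713.5, 625, 641,
            718.5, 686, 740.5, 641, 701, 697, 604.5, 668.5, 650.5, 489, 721.5,
            660.5, 658.5, 643, 752, 695, 686, 645, 655, 686.5, 603, 707.5,
            677.5, 627, 697.5, 637.5, 716.5, 675, 650, 736, 659, 678, 684,
            618, 581, 584.5, 674.5, 735, 632, 676, 718.5, 680, 726.5, 723.5,
            628, 572, 646, 691)
categories <- c(9, 9, 4, 9, 1, 9, 3, 8, 4, 3, 3, 3, 5, 4, 4, 1, 3, 7, 5, 3,
                6, 6, 6, 1, 1, 6, 1, 5, 4, 4, 4, 4, 7, 7, 7, 7, 7, 8, 5, 9, 2,
                2, 2, 6, 6, 6, 7, 9, 2, 2, 5, 3, 1, 9, 2, 6, 6, 9)

g   <- factor(categories)
fit <- aov(values ~ g)

k      <- nlevels(g)                          # 9 groups
n      <- as.vector(tapply(values, g, length))
m      <- as.vector(tapply(values, g, mean))
df_err <- df.residual(fit)                    # error degrees of freedom
mse    <- deviance(fit) / df_err              # residual SS / error df

res <- t(sapply(seq_len(k), function(i) {
  w <- rep(-1 / (k - 1), k)                   # -1/8 for each other group
  w[i] <- 1                                   # +1 for the single group
  est <- sum(w * m)                           # contrast estimate
  se  <- sqrt(mse * sum(w^2 / n))             # its standard error
  # Scheffe: refer t^2 / (k - 1) to F with (k - 1, df_err) df.
  F_scheffe <- (est / se)^2 / (k - 1)
  c(estimate = est, se = se, F_scheffe = F_scheffe,
    p_scheffe = pf(F_scheffe, k - 1, df_err, lower.tail = FALSE))
}))
rownames(res) <- levels(g)

print(round(res, 4))
```

The Scheffe criterion protects all possible contrasts at once, so it is conservative when only these 9 are of interest.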