Alternative to two-way ANOVA for data with unequal variances?

Thomas Brunner

Oct 21, 2014, 7:29:56 AM

to statforli...@googlegroups.com

Dear all,

I've been struggling with a quite basic problem for a while, and I was wondering if anyone can help.

My problem is fairly simple:

• I'm analysing the lengths of postmodifiers in a corpus.

• I'd like to tease out the effects (and possibly interactions?) of two independent variables: VARIETY (i.e. British, Singaporean, and Kenyan English) and TEXT_CLASS (spoken or written)

• The variances of the length data are unequal according to a Bartlett test (p = 3.755e-08). If I understand that correctly, this means that I am not supposed to compute a two-way ANOVA.
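For readers who want to reproduce this kind of check, here is a minimal sketch on simulated data. The variable names mirror the ones used in this post, but the values are invented and are not the corpus data. It tests homogeneity of variance across all six VARIETY x TEXT_CLASS cells at once, and adds the Fligner-Killeen test from base R because Bartlett's test is known to be sensitive to non-normal data.

```r
# Simulated stand-in data: names follow the post, values are hypothetical.
set.seed(1)
n <- 600
VARIETY    <- factor(sample(c("GB", "SG", "KE"), n, replace = TRUE))
TEXT_CLASS <- factor(sample(c("spoken", "written"), n, replace = TRUE))
# Written NPs are given a larger spread so that the cells differ in variance.
LENGTHS <- rnorm(n, mean = 5, sd = ifelse(TEXT_CLASS == "written", 4, 2))

# One grouping factor with six levels, one per VARIETY x TEXT_CLASS cell.
CELL <- interaction(VARIETY, TEXT_CLASS)

# Per-cell variances show where the heterogeneity comes from.
cell_vars <- tapply(LENGTHS, CELL, var)
print(round(cell_vars, 2))

# Bartlett's test assumes normally distributed data within each cell.
bt <- bartlett.test(LENGTHS ~ CELL)
# The Fligner-Killeen test is rank-based and less sensitive to non-normality.
ft <- fligner.test(LENGTHS ~ CELL)
print(c(bartlett = bt$p.value, fligner = ft$p.value))
```

If both tests agree, the heterogeneity is unlikely to be an artefact of skewed length data alone.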

To add insult to injury, I'd be in need of a rather quick and easy solution...

Can anyone think of a workaround? Would you, for instance...

• compute the ANOVA anyway and accept its results (which are quite plausible, by the way) "with a grain of salt"

• use a Kruskal-Wallis test kruskal.test(list(LENGTHS,VARIETY,TEXT_CLASS)), even though it does not really enable you to tease out the effects of the individual variables

• compute individual pairwise.wilcox.test() for LENGTHS by TEXT_CLASS and LENGTHS by VARIETY

• do not do a multifactorial test at all but rely on the interpretation of an interaction.plot(TEXT_CLASS,VARIETY, LENGTHS, type="b" ) (see attachment) as the effects are rather clear anyway (obviously, on the whole, TEXT_CLASS is a better predictor than VARIETY, even though the effects are not purely additive)

• Or is there any other solution you would favour?

Thank you so much!

Best

Thomas

Interact_plot_LENGTHS_VAR.pdf

Thomas Brunner

Oct 21, 2014, 8:46:26 AM

to statforli...@googlegroups.com

A quick follow-up question: I've come across kruskalmc() {pgirmess} and kruskal {agricolae} allowing for TukeyHSD-style comparison of the data by factor levels and multiple Kruskal-Wallis tests with p corrected for multiple testing. Would you recommend those?

Stefan Th. Gries

Oct 21, 2014, 10:47:12 AM

to StatForLing with R

How about robust ANOVA for at least p-values of Variety, TextClass,
and their interaction?

##########################
rm(list=ls(all=TRUE)); set.seed(2113)
TC <- factor(sample(c("S","W"), 600, replace=TRUE))
VA <- factor(sample(c("G","E", "S"), 600, replace=TRUE))
LE <- floor(rnorm(600, 1,7)) # I know these are not good lengths ;-)

interaction.plot(TC, VA, LE, ylim=range(LE)); grid()
bartlett.test(LE ~ TC) # ***
bartlett.test(LE ~ VA)

# Wilcox, Rand. 2012. Modern statistics for the social and behavioral
# sciences: a practical introduction. Boca Raton, FL: Chapman &
# Hall/CRC, Chapter 11.
library(WRS)
qwe <- tapply(LE, list(TC, VA), c)
x <- list(qwe[[1]], qwe[[2]], qwe[[3]], qwe[[4]], qwe[[5]], qwe[[6]])

t2way(3, 2, x, tr=0.2) # approach 1
pbad2way(3, 2, x) # approach 2 (with bootstrapping)
t2waybt(3, 2, x, tr=0.2) # approach 3

# all return the same results: everything's ns
##########################
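A small base-R sketch of what the snippet above feeds into t2way(), using the same simulated data. As far as I understand the WRS documentation, t2way() compares trimmed cell means (tr=0.2 meaning 20% trimming) and expects the list x ordered so that the levels of the second factor vary fastest within each level of the first; that reading is an assumption worth checking against the package help. The object names trimmed and cell_order are made up for illustration.

```r
set.seed(2113)
TC <- factor(sample(c("S", "W"), 600, replace = TRUE))
VA <- factor(sample(c("G", "E", "S"), 600, replace = TRUE))
LE <- floor(rnorm(600, 1, 7))

# 20% trimmed mean per cell: the lowest and highest 20% of the values
# in each cell are dropped before averaging.
trimmed <- tapply(LE, list(TC, VA), mean, trim = 0.2)
print(round(trimmed, 2))

# qwe[[1]] ... qwe[[6]] walks the 2 x 3 tapply() result column by column,
# so TC varies fastest within each level of VA.
cell_order <- as.vector(outer(rownames(trimmed), colnames(trimmed),
                              paste, sep = "."))
print(cell_order)
```

Under that reading, the call t2way(3, 2, x) treats VA as the three-level first factor and TC as the two-level second factor.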

Best,
STG
--
Stefan Th. Gries
----------------------------------
Univ. of California, Santa Barbara
http://tinyurl.com/stgries
----------------------------------

Thomas Brunner

Oct 21, 2014, 11:05:05 AM

to statforli...@googlegroups.com

Dear Stefan, thanks so much for your quick reply, this looks really promising!

Martin Schweinberger

Oct 21, 2014, 11:07:02 AM

to statforli...@googlegroups.com

Hey there,

Stefan is of course right, but in case you would like to read up on robust alternatives to factorial ANOVA, have a look at chapter 12 in Field, Andy, Jeremy Miles, and Zoe Field. 2012. Discovering Statistics Using R. SAGE (particularly pages 534-541; attached). R functions for various robust alternatives can be downloaded from Rand Wilcox's page (http://dornsife.usc.edu/labs/rwilcox/software/).

Best,
Martin

=====================================

Martin Schweinberger

Gählerstraße 11

22767 Hamburg

Fon.: ++49 (0)176 387 48 283

Home: http://www.martinschweinberger.de/blog/

--
You received this message because you are subscribed to the Google Groups "StatForLing with R" group.
To unsubscribe from this group and stop receiving emails from it, send an email to statforling-wit...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Field Miles Field 2012 Discovering statistics with R.pdf

Stefan Th. Gries

Oct 21, 2014, 11:19:51 AM

to StatForLing with R

Thanks Martin, that is exactly the package I was using (and there's a new one, WRS2, that is now under development, too).

Best

STG

Thomas Brunner

Oct 21, 2014, 1:18:27 PM

to statforli...@googlegroups.com

Thanks Martin, I'll have a look at this chapter right away! Thanks for the PDF file, this is so helpful!


Thomas Brunner

Oct 22, 2014, 5:48:18 AM

to statforli...@googlegroups.com

Dear all,

I've had a go at WRS now, and a problem has cropped up – just in case anyone has come across something similar.

For the "classic" WRS package the data need to be melted/cast into wide format, as described by Field/Miles/Field, pp. 535–7. The difference between my dataset and theirs, however, is that I don't have the same number of length measurements for all variable configurations: I have only 961 NPs from spoken British English, but 1615 from written Kenyan English, for instance. My wide data frame thus inevitably contains NAs.

I have a hunch that this may be the reason for the error message I keep getting when invoking

t2way(3,2, DATA_WIDE, tr = .2)

I keep being told:

Error in `[[<-.data.frame`(`*tmp*`, grp[i], value = c(4L, 3L, 6L, 12L, : replacement has 961 rows, data has 1615

Curiously, the problem doesn't arise when I use the formula notation

t2way(NP_SMALL$LENGTH_NP_TOTAL ~ NP_SMALL$VAR * NP_SMALL$TEXT_CLASS, tr = .2)

provided by the new {WRS2} wrapper by Mair/Schoenbrodt/Wilcox (http://cran.r-project.org/web/packages/WRS2/index.html). I am not too happy with {WRS2}, though, since its t2way output is less detailed than the {WRS} one, and it doesn't provide any post-hoc method (such as mcp2atm() in {WRS}).

Best

Thomas

Stefan Th. Gries

Oct 22, 2014, 1:03:28 PM

to StatForLing with R

Just a quick comment: it seems to be true that WRS needs the
vectors/data frames to be of equal length as they would be in a data
frame, but that also means that you can just fill up the shorter
vectors/factors with NAs to make them have the same length and then it
should work. See how in the following snippet, VA has one NA and thus
only 599 data points, not 600 like the other two variables involved,
but it still works:

##########################
rm(list=ls(all=TRUE)); set.seed(2113)
TC <- factor(sample(c("S","W"), 600, replace=TRUE))

VA <- factor(c(NA, sample(c("G","E", "S"), 599, replace=TRUE)))
LE <- floor(rnorm(600, 1,7))

Thomas Brunner

Oct 25, 2014, 5:40:58 AM

to statforli...@googlegroups.com

Thanks, Stefan, for getting back to me so quickly.

Just a quick update: I've succeeded in getting t2way() to work on my data. What did the trick, though, was not adding NAs to the shorter vectors, but, in fact, getting rid of them. I just deleted all NAs from the (wide) list using the following line of code:

DATA_WIDE_CLEAN <- lapply(DATA_WIDE, na.exclude)

Ever since then the robust ANOVAs have run like clockwork – what a useful test, thanks so much for putting me onto it! Evidently, t2way() doesn't mind input vectors of differing lengths as long as there are no NAs (Stefan's wide data frame created in the previous post, incidentally, doesn't contain NAs either – as far as I can see, they get lost during the transformation from long to wide format).
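For anyone running into the same error, here is a minimal sketch of the two routes to a NA-free list, using made-up unbalanced data (the data frame NP and its values are hypothetical; only the column roles follow this thread). It shows that stripping the NAs from a padded wide data frame gives the same list as splitting the long-format data directly with split(), which avoids the detour through wide format altogether.

```r
# Hypothetical long-format data with unbalanced cells.
set.seed(42)
NP <- data.frame(
  VAR        = factor(rep(c("GB", "KE", "SG"), times = c(50, 80, 65))),
  TEXT_CLASS = factor(sample(c("spoken", "written"), 195, replace = TRUE)),
  LENGTH     = rpois(195, lambda = 4) + 1
)

# Route 1: split the long data directly into one vector per cell.
# TEXT_CLASS varies fastest within VAR: spoken.GB, written.GB, spoken.KE, ...
cells <- split(NP$LENGTH, list(NP$TEXT_CLASS, NP$VAR))
print(lengths(cells))

# Route 2: a wide data frame forces equal column lengths, hence NA padding ...
max_n  <- max(lengths(cells))
padded <- lapply(cells, function(v) c(v, rep(NA, max_n - length(v))))
wide   <- as.data.frame(padded)
# ... which then has to be stripped again, column by column.
cleaned <- lapply(wide, function(v) v[!is.na(v)])

# Both routes give the same six vectors of differing lengths.
print(identical(unname(cleaned), unname(cells)))

# With WRS installed, the list could then be passed on as in this thread:
# t2way(3, 2, cells, tr = 0.2)
```

The order of the list elements matters here: with VAR as the three-level factor and TEXT_CLASS as the two-level one, the split() call above puts TEXT_CLASS first so that it varies fastest, which is the layout I assume t2way(3, 2, ...) expects.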