Need help doing two-way anova with interaction


mh...@ucdavis.edu

Apr 10, 2018, 2:56:49 AM
to Davis R Users' Group
Hey guys, 
sorry for the simple question but I'm very new to this.
For this scenario, 23 judges tasted 6 different samples (133, 671, 692, 973, 382, 415) and gave each one a rating (attached in the Ratings.xls file).
I've been trying to do a two-way ANOVA to see whether the rating values change with judge and sample, including the interaction between judge and sample.
I've put the data in long form, as in the screenshot I attached, where "ID" is the judge number, "variable" is the sample, and "value" is the rating.

1) Should I make "ID" and "variable" factors? When I run >str(myDataLong), "ID" shows up as an integer.

2) Even if I skip 1) and go straight to
> model1 <- aov(value ~ ID * variable, data = myDataLong)
> summary(model1)
I only get Df, Sum Sq, and Mean Sq, with no F value or Pr(>F).
I don't understand why it won't give me F or Pr(>F).

3) Even if I try
> library(car)  # Anova() comes from the car package
> model2 <- lm(value ~ ID * variable, data = myDataLong)
> Anova(model2, type = "III")
I get an error saying it's impossible because residual df = 0. What have I done wrong to get residual df = 0?

I'm very lost using R. I tried getting information from Google and YouTube, but they all seem to use different strategies, which is why I'm all over the map in trying to figure this out. I would greatly appreciate it if someone could help me out with this. There's no need to answer all 3 questions: I'm sure that if I understood the solution to one of them, I could eventually figure out the rest.


Ratings.xls
Screen Shot 2018-04-09 at 11.44.56 PM.png

Evan Eskew

Apr 10, 2018, 11:14:09 AM
to davi...@googlegroups.com
Hi,

I'm looking at this quickly, so sorry for the short reply. Regarding your question 1), you definitely want both "ID" and "variable" to be factors in your analysis. Your "ID" variable simply identifies the judge, and the "variable" variable identifies the sample tasted. Neither of these has any meaning in your analysis if you treat it as an integer or numeric variable, so both should be factors in the ANOVA. Hope that helps, and perhaps it will clear up some of your other issues.
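
A minimal sketch of that conversion, using made-up data in place of the attached file. The column names follow the original post; the 1 to 9 rating scale and the random ratings are assumptions for illustration only.

```r
# Hypothetical stand-in for myDataLong: 23 judges x 6 samples, one rating each
set.seed(1)
myDataLong <- expand.grid(
  ID = 1:23,
  variable = c(133, 671, 692, 973, 382, 415)
)
myDataLong$value <- sample(1:9, nrow(myDataLong), replace = TRUE)  # assumed scale

# Judge and sample are labels, not quantities, so store them as factors
myDataLong$ID <- factor(myDataLong$ID)
myDataLong$variable <- factor(myDataLong$variable)

str(myDataLong)  # ID and variable should now be listed as Factor
```

After this, str() should report "ID" as a factor with 23 levels and "variable" as a factor with 6 levels.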

Best,
Evan

--
Check out our R resources at http://d-rug.github.io/
---
You received this message because you are subscribed to the Google Groups "Davis R Users' Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to davis-rug+unsubscribe@googlegroups.com.
Visit this group at https://groups.google.com/group/davis-rug.
For more options, visit https://groups.google.com/d/optout.

Evan Batzer

Apr 10, 2018, 11:38:42 AM
to davi...@googlegroups.com
To add to Evan's response:

I agree that you need to set both "ID" and "variable" to factors in this case.

For questions 2 and 3, I think you are getting the df = 0 error because of the way you are setting up the model. Calling aov(value ~ ID * variable) or anova(lm(value ~ ID * variable)) estimates parameters for all the "ID" levels, all of the "variable" levels, and all of the interaction terms between them. If you don't have any replication (a judge tasting the same sample multiple times), the total number of parameters in this model equals the number of observations, so no degrees of freedom are left to estimate the error variance and you can't calculate an F-statistic.
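
To make the parameter count concrete, here is a small sketch on simulated data with the same layout (23 judges, 6 samples, one rating per pair). The data frame d and the 1 to 9 rating scale are assumptions, not the real ratings.

```r
# Hypothetical data: one observation per judge-sample combination
set.seed(1)
d <- expand.grid(
  ID = factor(1:23),
  variable = factor(c(133, 671, 692, 973, 382, 415))
)
d$value <- sample(1:9, nrow(d), replace = TRUE)  # assumed rating scale

fit_int <- lm(value ~ ID * variable, data = d)

nrow(d)                # 138 observations
length(coef(fit_int))  # 138 coefficients: 1 + 22 + 5 + 110
df.residual(fit_int)   # 0, so there is nothing left to estimate error with
```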

Instead, I think you'll likely want to construct an additive model that doesn't contain the interaction terms. In R, the command would be either:

aov(value ~ ID + variable, data = myDataLong)

or

anova(lm(value ~ ID + variable, data = myDataLong))
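
As a rough check that the additive model does leave residual degrees of freedom, the sketch below fits it to simulated data with the same 23 x 6 layout. The data frame d and its random ratings are assumptions standing in for the real file.

```r
# Hypothetical data: one observation per judge-sample combination
set.seed(1)
d <- expand.grid(
  ID = factor(1:23),
  variable = factor(c(133, 671, 692, 973, 382, 415))
)
d$value <- sample(1:9, nrow(d), replace = TRUE)  # assumed rating scale

# Additive model: main effects of judge and sample, no interaction
fit_add <- aov(value ~ ID + variable, data = d)

summary(fit_add)      # now includes F value and Pr(>F) columns
df.residual(fit_add)  # 110 = 138 - 1 - 22 - 5
```

In this model the judge-by-sample variation that would have been the interaction is what serves as the error term.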

If you still want to build a model with interaction terms, there are more complicated ways to do so in cases like this, where you only have one observation for each unique combination of factors.

Best,
Evan

PhD Student, Eviner Lab
University of California, Davis

Hanna Kahl

Apr 11, 2018, 8:01:03 PM
to davi...@googlegroups.com
I think you should have a separate column for each term in your model, including ID, variable, and value.

Best, 
Hanna 


Jaime Ashander

Apr 11, 2018, 10:36:15 PM
to davi...@googlegroups.com
Evan B points out the key issue here:

> If you don't have any replication, where a judge tasted the same sample multiple times, the total number of parameters you fit in this model equals the number of observations

To see the issue, read in your data (for convenience I read the ID column as text and the data columns as numeric, and below I call the sample and rating columns "sample" and "rating"):
library(readxl)
library(tidyr)
d <- read_excel('Ratings.xls', col_types=c("text", rep("numeric", 6)))

Then reshape the data and use xtabs to see that there is only one observation per combination of factor levels. (You could see this from your original data, but this method also works with larger data sets and more complex interactions of factors.)

dl <- d %>%
    gather(sample, rating, -ID)
xtabs(~ sample + ID, dl)
#>       ID
#> sample 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 23 3 4 5 6 7 8 9
#>    133 1  1  1  1  1  1  1  1  1  1  1 1  1  1  1  1 1 1 1 1 1 1 1
#>    382 1  1  1  1  1  1  1  1  1  1  1 1  1  1  1  1 1 1 1 1 1 1 1
#>    415 1  1  1  1  1  1  1  1  1  1  1 1  1  1  1  1 1 1 1 1 1 1 1
#>    671 1  1  1  1  1  1  1  1  1  1  1 1  1  1  1  1 1 1 1 1 1 1 1
#>    692 1  1  1  1  1  1  1  1  1  1  1 1  1  1  1  1 1 1 1 1 1 1 1
#>    973 1  1  1  1  1  1  1  1  1  1  1 1  1  1  1  1 1 1 1 1 1 1 1
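
With more factor levels the table gets hard to scan by eye, so the same check can be done programmatically. This is a sketch on simulated data with the same layout; dl here is a hypothetical stand-in built in base R, not the reshaped spreadsheet.

```r
# Hypothetical long-format data: 23 judges x 6 samples, one rating each
set.seed(1)
dl <- expand.grid(
  ID = as.character(1:23),
  sample = c("133", "671", "692", "973", "382", "415"),
  stringsAsFactors = FALSE
)
dl$rating <- sample(1:9, nrow(dl), replace = TRUE)  # assumed rating scale

counts <- xtabs(~ sample + ID, dl)

all(counts == 1)  # TRUE means no cell is replicated
max(counts)       # K, the number of replicates per cell, here 1
```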

With only one observation per judge-sample combination you can't test an interaction in a 2-way ANOVA, because the interaction uses up all the remaining degrees of freedom and none are left to estimate the residual variance (error). To see why this is, consult the example of how to do an ANOVA by hand and note that without replication within each combination of factor levels, K=1.
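
The degrees-of-freedom bookkeeping for a balanced two-way layout can be written out directly. The names a, b, and the df_* variables are just labels chosen for this sketch; K is the number of replicates per cell, as above.

```r
a <- 23  # number of judges
b <- 6   # number of samples
K <- 1   # replicates per judge-sample combination

df_total       <- a * b * K - 1     # 137
df_judge       <- a - 1             # 22
df_sample      <- b - 1             # 5
df_interaction <- (a - 1) * (b - 1) # 110
df_error       <- a * b * (K - 1)   # 0 when K = 1

# The pieces add up to the total, leaving nothing for error
df_judge + df_sample + df_interaction + df_error == df_total
```

With K = 2 the same formula would give 138 error degrees of freedom, which is why replication makes the interaction testable.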

What can you do instead? I'm not sure if Evan B meant to suggest that there are ways to model (which implies, to me, a way to estimate a parameter for) an interaction term. I don't know if that's possible with this type of data. However, you can test various null hypotheses about an interaction via one of many tests for non-additivity, some of which are provided in the R package additivityTests. The exact test you should use depends on what you want to assume about the structure of the interaction (read the linked review).
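
As one illustration of such a test, here is a sketch of Tukey's one-degree-of-freedom test for non-additivity written in base R rather than with additivityTests (whose interface is not shown here). It assumes the interaction, if any, is proportional to the product of the judge and sample effects, and it runs on simulated data, so the result says nothing about the real ratings.

```r
# Hypothetical data: one observation per judge-sample combination
set.seed(1)
d <- expand.grid(
  ID = factor(1:23),
  variable = factor(c(133, 671, 692, 973, 382, 415))
)
d$value <- sample(1:9, nrow(d), replace = TRUE)  # assumed rating scale

# Step 1: fit the additive model
add_fit <- lm(value ~ ID + variable, data = d)

# Step 2: add the squared fitted values as a single extra regressor
d$fit_sq <- fitted(add_fit)^2
tukey_fit <- lm(value ~ ID + variable + fit_sq, data = d)

# Step 3: the 1-df F test for fit_sq is Tukey's test for non-additivity
anova(add_fit, tukey_fit)
df.residual(tukey_fit)  # 109: one df spent on the non-additivity term
```

A small p-value in the comparison would suggest a multiplicative judge-by-sample interaction; other tests make different assumptions about its structure.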

Note that even if there is no indication of an interaction, there is still another potential issue with your data and the interpretation of the two-way ANOVA: the outcomes are ratings and thus non-normal. This may or may not matter depending on your exact question. For example, it would matter if you are making a quantitative interpretation of the coefficients. It doesn't matter for simply testing whether there is an effect of sample. If you want a method designed for this kind of outcome variable, see the R package ordinal.
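
For a sense of what that looks like, the sketch below fits a cumulative link mixed model with a random intercept per judge. It uses the wine data set that ships with the ordinal package (judges rating wine bitterness), not the ratings in this thread, and it assumes the package is installed.

```r
library(ordinal)

# Example data from the ordinal package: rating is an ordered factor,
# temp and contact are treatments, judge identifies the rater
data(wine)

# Treatments as fixed effects, judge as a random intercept
fm <- clmm(rating ~ temp + contact + (1 | judge), data = wine)
summary(fm)
```

For the tasting data here, the analogous model would presumably use the rating as an ordered factor, sample as the fixed effect, and the judge ID as the random effect.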
