> fixef(object = outputREMLf90)
$factor1
value s.e.
A 254.1394 41.35154
B 259.0089 41.35154
C 0.0000 0.00000
D 0.0000 0.00000
$factor2
value s.e.
S 0 0
T 0 0
U 0 0
V 0 0
W 0 0
X 0 0
Y 0 0
Z 0 0
$factor1_factor2
value s.e.
A_S 0.0000 0.00000
B_T 0.0000 0.00000
C_U 210.8600 85.53294
C_V 387.1200 85.53294
C_W 152.2100 85.53294
C_X 321.3567 85.53294
C_Y 238.0267 85.53294
C_Z 244.4800 85.53294
D_U 349.6333 85.53294
D_V 314.5500 85.53294
D_W 286.3400 85.53294
D_X 206.3833 85.53294
D_Y 267.8600 85.53294
D_Z 100.0700 85.53294
This looks half-right to me, as the means by factor level I see using aggregate function in R look like this:
> aggregate(response ~ factor1,data=fake_data,FUN=mean)
factor1 response
1 A 254.1394
2 B 259.0089
3 C 259.0089
4 D 254.1394
> aggregate(response ~ factor2,data=fake_data,FUN=mean)
factor2 response
1 S 254.1394
2 T 259.0089
3 U 280.2467
4 V 350.8350
5 W 219.2750
6 X 263.8700
7 Y 252.9433
8 Z 172.2750
> aggregate(response ~ factor1 + factor2,data=fake_data,FUN=mean)
factor1 factor2 response
1 A S 254.1394
2 B T 259.0089
3 C U 210.8600
4 D U 349.6333
5 C V 387.1200
6 D V 314.5500
7 C W 152.2100
8 D W 286.3400
9 C X 321.3567
10 D X 206.3833
11 C Y 238.0267
12 D Y 267.8600
13 C Z 244.4800
14 D Z 100.0700
I was wondering how that could be happening? Is there a better way for me to specify the factors and formula?
Dear Jared,
The issue is that your design matrix is singular. I.e., the arrangement of the observations does not allow to identify all the effects that you are specifying in the model.
You can see this in the contingency table of the number of observations by categorical factor:
> xtabs( ~ factor1+factor2, fake_data)
factor2
factor1 S T U V W X Y Z
A 18 0 0 0 0 0 0 0
B 0 18 0 0 0 0 0 0
C 0 0 3 3 3 3 3 3
D 0 0 3 3 3 3 3 3
from where you can see that you cannot possibly estimate separate effects for A, for S and for AxS, and the same occurs with the effects of B, T and BxT. They are "confounded" and are not identifiable.
For the block with 3 observations per group, the situation is
trickier to see, but the underlying problem is the same. You can,
for instance, estimate the effect of C by aggregating observations
from the third row, and the effects of U..Z by aggregating the
respective columns. But then you ran out of observations and
cannot assess the effect of D nor any of the interactions.
What breedR does is to arbitrarily put some of the estimates at 0, and give results for those that can be estimated. It is similar to what function lm() does:
summary(
lm(response ~ 1 + factor1 + factor2 + factor1_factor2, data = fake_data)
)
#>
#> Call:
#> lm(formula = response ~ 1 + factor1 + factor2 + factor1_factor2,
#> data = fake_data)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -267.54 -110.19 20.55 97.99 238.12
#>
#> Coefficients: (10 not defined because of singularities)
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 254.139 34.676 7.329 8.2e-10 ***
#> factor1B 4.869 49.039 0.099 0.9212
#> factor1C -9.659 91.743 -0.105 0.9165
#> factor1D -154.069 91.743 -1.679 0.0985 .
#> factor2T NA NA NA NA
#> factor2U 249.563 120.120 2.078 0.0422 *
#> factor2V 214.480 120.120 1.786 0.0794 .
#> factor2W 186.270 120.120 1.551 0.1264
#> factor2X 106.313 120.120 0.885 0.3798
#> factor2Y 167.790 120.120 1.397 0.1678
#> factor2Z NA NA NA NA
#> factor1_factor2B_T NA NA NA NA
#> factor1_factor2C_U -283.183 169.875 -1.667 0.1009
#> factor1_factor2C_V -71.840 169.875 -0.423 0.6739
#> factor1_factor2C_W -278.540 169.875 -1.640 0.1065
#> factor1_factor2C_X -29.437 169.875 -0.173 0.8630
#> factor1_factor2C_Y -174.243 169.875 -1.026 0.3093
#> factor1_factor2C_Z NA NA NA NA
#> factor1_factor2D_U NA NA NA NA
#> factor1_factor2D_V NA NA NA NA
#> factor1_factor2D_W NA NA NA NA
#> factor1_factor2D_X NA NA NA NA
#> factor1_factor2D_Y NA NA NA NA
#> factor1_factor2D_Z NA NA NA NA
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 147.1 on 58 degrees of freedom
#> Multiple R-squared: 0.1517, Adjusted R-squared: -0.03843
#> F-statistic: 0.7979 on 13 and 58 DF, p-value: 0.6595
Unfortunately, there is not much you can do about it as the data simply does not contain the information necessary for identifying those effects (it does not mean that the effects do not exist). You can always use the model for prediction. But estimating the individual effects seems unfeasible with these data.
Hope it helps
ƒacu.-
--
Report issues at https://github.com/famuvie/breedR/issues
---
You received this message because you are subscribed to the Google Groups "breedR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to breedr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/breedr/6cffae9f-3e6d-4db6-b535-67fbc9f4b8fan%40googlegroups.com.
Jared,
There is a toy example [1] in breedR's "Overview" which briefly explains how to introduce interactions. You already discovered that. Although simple, the example works correctly.
I have used interactions to illustrate Genotype x Environment effects in the slides [2]. These are from the workshop materials [3], which I also include in case of interest.
Hope it helps.
ƒacu.-
[1] https://github.com/famuvie/breedR/wiki/Overview#interactions
To view this discussion on the web visit https://groups.google.com/d/msgid/breedr/fb18097a-1a2b-4cc0-be87-c130febcfd0an%40googlegroups.com.