ANOVA with heteroscedasticity

228 views
Skip to first unread message

Timothée Flutre

unread,
Dec 6, 2016, 7:05:58 AM12/6/16
to R-inla discussion group
Hello,
I would like to fit an ANOVA model with heteroscedasticity. How can I do it with INLA? I tried with a model="iid" for each treatment level, but the results look very strange.
See a small, reproducible example below.
(In the end, I would like to add other "random effects", but first I would like to understand this "simpler" case.)
Thanks in advance,
Tim

## model:
## goal: apply a treatment with B levels, and compare their means, while allowing for different error variances
## B: number of treatment levels
## N.b: number of subjects (one per treatment level)
## y_sb = mu + beta_b + e_sb with e_sb ~ N(0, sigma_b^2)

## simul
set.seed(1859)
B <- 3
N.b <- 50
N <- B * N.b
ids <- sprintf(fmt="id%03i", 1:N.b) # identifiers of the subjects
dat <- data.frame(block=rep(LETTERS[1:B], each=N.b),
                  id=rep(ids, B))
X <- model.matrix(~ 1 + block, data=dat)
mu <- 50
(beta <- c(mu, rnorm(n=B-1, mean=0, sd=3)))
(sigma.b <- setNames(sample(1:8, size=B, replace=ifelse(8 < B, TRUE, FALSE)),
                     levels(dat$block)))
epsilon <- do.call(c, lapply(sigma.b, function(sd.b){
  rnorm(n=N.b, mean=0, sd=sd.b)
}))
dat$y <- X %*% beta + epsilon
str(dat)
do.call(rbind, tapply(dat$y, dat$block, function(x){
  c("mean"=mean(x), "sd"=sd(x))
}))
boxplot(y ~ block, data=dat, varwidth=T, notch=T) # we can clearly see the different error variances

## fit via OLS
fit.ols <- lm(formula=y ~ 1 + block, data=dat)
summary(fit.ols)
dat2 <- cbind(dat,
              e.hat.ols=residuals(fit.ols))
dat2$e.hat.std.ols <- dat2$e.hat.ols / summary(fit.ols)$sigma
do.call(rbind, tapply(dat2$e.hat.std.ols, dat2$block, function(x){
  c("mean"=mean(x), "sd"=sd(x))
}))
boxplot(e.hat.std.ols ~ block, data=dat2, varwidth=T, notch=T) # assumption of homoscedasticity clearly violated

## fit via GLS
library(nlme)
fit.gls <- gls(model=y ~ 1 + block, data=dat,
               weights=varIdent(form=~1|block))
summary(fit.gls)
dat2 <- cbind(dat,
              e.hat.gls=residuals(fit.gls))
sigma.b.hat <- c(1, exp(coef(fit.gls$modelStruct$varStruct))) * fit.gls$sigma
dat2$e.hat.std.gls <- dat2$e.hat.gls / rep(sigma.b.hat, each=N.b)
do.call(rbind, tapply(dat2$e.hat.std.gls, dat2$block, function(x){
  c("mean"=mean(x), "sd"=sd(x))
}))
boxplot(e.hat.std.gls ~ block, data=dat2, varwidth=T, notch=T) # looks fine

## fit via INLA ?
library(INLA)
dat2 <- cbind(dat, model.matrix(~-1+dat$block))
colnames(dat2) <- c(colnames(dat), c("blockA", "blockB", "blockC"))
fit.inla <- inla(formula=y ~ 1 + f(blockA, model="iid") +
                   f(blockB, model="iid") +
                   f(blockC, model="iid"),
                 data=dat2)
summary(fit.inla) # results look very strange...

INLA help

unread,
Dec 6, 2016, 7:11:48 AM12/6/16
to Timothée Flutre, R-inla discussion group
On Tue, 2016-12-06 at 04:05 -0800, Timothée Flutre wrote:
> Hello,
> I would like to fit an ANOVA model with heteroscedasticity. How can I
> do it with INLA? I tried with a model="iid" for each treatment level,
> but the results look very strange.
> See a small, reproducible example below.
> (In the end, I would like to add other "random effects", but first I
> would like to understand this "simpler" case.)
> Thanks in advance,
> Tim
>
> ## model:
> ## goal: apply a treatment with B levels, and compare their means,
> while allowing for different error variances 
## B: number of treatment levels
>



if you know the scale between the variances, it easier. like knowing
that var_1 = 2*var_2

H
--
Håvard Rue
he...@r-inla.org

Timothée Flutre

unread,
Dec 6, 2016, 7:24:48 AM12/6/16
to R-inla discussion group, timf...@gmail.com, he...@r-inla.org
I can fit one model per treatment level, say lm(y ~ 1, data=dat[,dat$block=="A"]) for block "A", and so on for each block. And then I can compare the estimated error variances, estim.var.errors.

But how can I give this info to inla?
Do you mean to use weights, something like fit.inla <- inla(y = 1 + f(block, model="iid"), data=dat, weights=1/rep(estim.var.errors,each=N.b))?
I am not sure how to directly use "the scale between the varainces", as you say.

INLA help

unread,
Dec 7, 2016, 4:26:32 AM12/7/16
to Timothée Flutre, R-inla discussion group
On Tue, 2016-12-06 at 04:24 -0800, Timothée Flutre wrote:



I have looked more into the code. you have three likelihoods, one for
each block, so its coded with a matrix as the response, one column for
each family.
http://www.r-inla.org/models/tools#TOC-Models-with-more-than-one-type-o
f-likelihood





n = nrow(dat2)
Y = matrix(NA, n, 3)
k = 1
for(b in c("A",  "B",  "C")) {
    j = which(dat2$block == b)
    Y[j, k] = dat2$y[j]
    k = k+1
}
dat3 = list(Y=Y, block = dat2$block)
fit.inla <- inla(formula=Y ~ 1 + block, data=dat3,
                 family = rep("gaussian", 3))
summary(fit.inla)

--
Håvard Rue
he...@r-inla.org

Timothée Flutre

unread,
Dec 9, 2016, 10:07:10 AM12/9/16
to he...@r-inla.org, R-inla discussion group
Thanks a lot, that's perfect!

Timothée Flutre

Facundo Muñoz

unread,
Dec 9, 2016, 3:54:37 PM12/9/16
to r-inla-disc...@googlegroups.com

Hi Tim,

there are three issues in your call to inla:

1. The model you are fitting in inla has 4 variance components: one for each block and the residual variance which is always there by default, and is provoking some confounding. You can fix it to a high precision using control.family().

2. You are not estimating group-level means. The iid models are centered at 0 by default. You could either remove this constraint or add the group means as fixed effects

3. When you use blockA/B/C variables for (f, iid) it is assuming that they are random effects with only two levels each (0 and 1). And trying to estimate a variance from there! What I think you wanted to do is to use id as a random effect weighted by blockA/B/C.

In summary, I believe this works as you expected:

dat2 <- transform(dat2, id2 = id, id3 = id)
fit.inla <- inla(formula=y ~ 0 + block + f(id, blockA, model="iid") +
                   f(id2, blockB, model="iid") +
                   f(id3, blockC, model="iid"),
                 control.family = list(
                   hyper = list(
                     prec = list(
                       initial = 10,
                       fixed = TRUE
                     )
                   )
                 ),
                 data=dat2)

summary(fit.inla)
1/sqrt(fit.inla$summary.hyperpar[,"mean"])


Best regards,

    ƒacu.-

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To post to this group, send email to r-inla-disc...@googlegroups.com.
Visit this group at https://groups.google.com/group/r-inla-discussion-group.
For more options, visit https://groups.google.com/d/optout.

Reply all
Reply to author
Forward
0 new messages