That's an interesting question. A multiple group analysis can contain the same information as a single group analysis, and can give exactly the same results (or very nearly so), even when data are missing.
Here's some code that generates a data frame with some missing data (missing at random, not missing completely at random) and then analyzes it: y1 and y2 are regressed on x.
First, generate some data where the true effects of x are 0.5 (on y1) and 0.7 (on y2).
library(lavaan)
set.seed(12345)
n <- 10000
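# generate some data
# (x has length 2 * n but F and randvar have length n, so data.frame() recycles them:
# df has 2 * n rows and, since the rnorm(n) terms below are recycled too, every row
# appears twice. Point estimates are unaffected, but standard errors come out about
# 1/sqrt(2) of what n distinct rows would give; rep(c(0, 1), length.out = n) avoids this.)
df <- data.frame(x = rep(c(0, 1), n),
                 F = rnorm(n),
                 randvar = runif(n))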
df$y1 <- df$F + rnorm(n)
df$y2 <- df$F + rnorm(n)
df$y1 <- df$y1 + df$x * 0.5
df$y2 <- df$y2 + df$x * 0.7
# now delete some y1 scores so the data are MAR: whether y1 is missing depends on x, y2 and a random factor
df$y1 <- ifelse(df$x == 0 & df$randvar > 0.6 & df$y2 > 0, NA, df$y1)
We end up with 20% of y1 missing in group x = 0, and none missing in group x = 1.
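The 20% follows from the deletion rule: in group x = 0, y2 = F + e2 has mean 0, so P(y2 > 0) = 0.5, and P(randvar > 0.6) = 0.4, giving 0.5 * 0.4 = 0.2; group x = 1 loses nothing. This base-R sketch checks that, and also shows why the data are MAR rather than MCAR: within x = 0, the rows that lost y1 have higher y2. It regenerates the data with the post's seed and recipe, with one assumption: x is built with length.out = n so every column has n rows.

```r
# Regenerate the simulated data with the same seed and recipe as the post.
# Assumption: x is built with length.out = n, so every column has n rows.
set.seed(12345)
n <- 10000
df <- data.frame(x = rep(c(0, 1), length.out = n),
                 F = rnorm(n),
                 randvar = runif(n))
df$y1 <- df$F + rnorm(n) + df$x * 0.5
df$y2 <- df$F + rnorm(n) + df$x * 0.7
# MAR deletion: y1 is removed when x == 0, randvar > 0.6 and y2 > 0
df$y1 <- ifelse(df$x == 0 & df$randvar > 0.6 & df$y2 > 0, NA, df$y1)

# Proportion of y1 missing in each group
rate0 <- mean(is.na(df$y1[df$x == 0]))
rate1 <- mean(is.na(df$y1[df$x == 1]))
cat("missing y1, x = 0:", round(rate0, 3), " x = 1:", round(rate1, 3), "\n")

# MAR rather than MCAR: within x == 0, rows that lost y1 have higher y2
g0 <- df[df$x == 0, ]
y2_when_missing  <- mean(g0$y2[is.na(g0$y1)])
y2_when_observed <- mean(g0$y2[!is.na(g0$y1)])
cat("mean y2 when y1 missing:", round(y2_when_missing, 3),
    " when observed:", round(y2_when_observed, 3), "\n")
```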
Let's do a regression, and see what estimates we get:
summary(lm(cbind(y1, y2) ~ x, data = df))
Response y1 :
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.15223    0.01583  -9.616   <2e-16 ***
x            0.66517    0.02124  31.318   <2e-16 ***

Response y2 :
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.28044    0.01553  -18.05   <2e-16 ***
x            0.98025    0.02084   47.04   <2e-16 ***
(I cut out the less interesting bits).
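The estimates for both y1 and y2 are biased upwards: they should be 0.5 and 0.7, but they are 0.67 and 0.98. y2 is biased even though it has no missing data, because lm() with a matrix response drops every row that has a missing value (listwise deletion). The rows that lost y1, which are exactly rows with high y2 in group x = 0, are therefore dropped for y2 as well.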
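The size of the bias can be worked out from the deletion rule. Assuming, as in the simulation, that F, the two error terms and randvar are independent, with F and the errors standard normal, y2 in group x = 0 is N(0, 2) and y1 and y2 have covariance 1. A row in group x = 0 is dropped when y2 > 0 and randvar > 0.6, and group x = 1 loses no rows, so each slope is biased by minus the shift in the retained x = 0 mean:

```latex
\begin{aligned}
Y \sim N(0,\sigma^2) &\;\Rightarrow\; E\big[Y\,\mathbf{1}\{Y>0\}\big] = \frac{\sigma}{\sqrt{2\pi}} \\
P(\text{kept} \mid x=0) &= 1 - 0.4 \cdot 0.5 = 0.8 \\
E[y_2 \mid x=0,\ \text{kept}] &= \frac{E[y_2] - 0.4\,E\big[y_2\,\mathbf{1}\{y_2>0\}\big]}{0.8}
  = \frac{0 - 0.4\cdot\sqrt{2}/\sqrt{2\pi}}{0.8} = -\frac{1}{2\sqrt{\pi}} \approx -0.282 \\
\text{slope}_{y_2} &\approx 0.7 - (-0.282) = 0.982 \\
E[y_1 \mid y_2,\ x=0] &= \frac{\operatorname{Cov}(y_1,y_2)}{\operatorname{Var}(y_2)}\,y_2 = \tfrac{1}{2}\,y_2
  \;\Rightarrow\; E[y_1 \mid x=0,\ \text{kept}] \approx -0.141 \\
\text{slope}_{y_1} &\approx 0.5 - (-0.141) = 0.641
\end{aligned}
```

The y2 figures match the lm() output closely (intercept -0.280, slope 0.980). For y1 the derivation gives about 0.64, in the same direction and of similar size to the observed 0.665.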
Let's do a single group analysis in lavaan:
singlemodel <- "y1 ~ x
y2 ~ x
y1 ~~ y1
y2 ~~ y2
y2 ~~ y1
"
m1 <- sem(singlemodel, data = df, missing = "ml")
summary(m1)
Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  y1 ~
    x                 0.521    0.021   24.765    0.000
  y2 ~
    x                 0.701    0.020   35.387    0.000
(Cutting out the boring bits).
We got 0.521 and 0.701, pretty close to the true values! Cool!
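The gain comes from full-information maximum likelihood (missing = "ml"), not from moving to lavaan as such. A sketch to check this: it fits the same single-group model with missing = "listwise", which keeps only complete rows as lm() does, and with missing = "ml", and compares both with the lm() slopes. It regenerates the data with the post's seed and recipe, assuming x is built with length.out = n so every column has n rows. On complete cases, the ML slopes of this saturated model should agree with the OLS ones.

```r
library(lavaan)

# Regenerate the simulated data (assumption: length.out = n, so n distinct rows)
set.seed(12345)
n <- 10000
df <- data.frame(x = rep(c(0, 1), length.out = n),
                 F = rnorm(n),
                 randvar = runif(n))
df$y1 <- df$F + rnorm(n) + df$x * 0.5
df$y2 <- df$F + rnorm(n) + df$x * 0.7
df$y1 <- ifelse(df$x == 0 & df$randvar > 0.6 & df$y2 > 0, NA, df$y1)

singlemodel <- "y1 ~ x
y2 ~ x
y1 ~~ y1
y2 ~~ y2
y2 ~~ y1
"

# Same model, two ways of handling the missing y1 values
fit_lw <- sem(singlemodel, data = df, missing = "listwise")  # complete cases only
fit_ml <- sem(singlemodel, data = df, missing = "ml")        # FIML: uses every row

# Pull the slopes of y1 and y2 on x out of a fitted model, named by outcome
slopes <- function(fit) {
  pe <- parameterEstimates(fit)
  pe <- pe[pe$op == "~" & pe$rhs == "x", ]
  setNames(pe$est, pe$lhs)
}
lw <- slopes(fit_lw)
ml <- slopes(fit_ml)

# OLS slopes from lm(), which also uses complete cases only
ols <- coef(lm(cbind(y1, y2) ~ x, data = df))["x", ]

print(round(rbind(lm = ols[names(lw)], listwise = lw, ml = ml), 3))
```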
Now let's do a multiple group analysis, with x as the grouping variable:
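twogroupmodel <-
"y1 ~ c(a, b) * 1       # intercept (mean) of y1: a in group x = 0, b in group x = 1
y2 ~ c(c, d) * 1       # intercept (mean) of y2: c in group x = 0, d in group x = 1
y1 ~~ c(v1a, v1) * y1  # variances and covariance of y1 and y2, estimated separately in each group
y2 ~~ c(v2a, v2) * y2
y2 ~~ c(v12a, v12) * y1
diff1 := b - a         # x = 1 minus x = 0 difference in y1, i.e. the effect of x
diff2 := d - c         # same for y2"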
m2 <- sem(twogroupmodel, data = df, group = "x", missing = "ml")
summary(m2)
Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    diff1             0.524    0.021   24.745    0.000
    diff2             0.701    0.020   35.387    0.000
For y1 (diff1), which had missing data, the estimate differs from the single group estimate by only 0.003 (0.524 vs 0.521), and the SE and z-value are (pretty much) identical to those of the single group model.
For y2 (diff2), which has no missing data, the estimate, SE and z-value are exactly the same.
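The same comparison can be done programmatically, reading the defined parameters diff1 and diff2 from parameterEstimates() and setting them next to the single-group slopes. A sketch that refits both models from the post on regenerated data (same seed and recipe, assuming x is built with length.out = n so every column has n rows). It also prints lavInspect(m2, "group.label") to confirm that the first group is x = 0, so that b - a and d - c are x = 1 minus x = 0 differences, the same direction as the slopes.

```r
library(lavaan)

# Regenerate the simulated data (assumption: length.out = n, so n distinct rows)
set.seed(12345)
n <- 10000
df <- data.frame(x = rep(c(0, 1), length.out = n),
                 F = rnorm(n),
                 randvar = runif(n))
df$y1 <- df$F + rnorm(n) + df$x * 0.5
df$y2 <- df$F + rnorm(n) + df$x * 0.7
df$y1 <- ifelse(df$x == 0 & df$randvar > 0.6 & df$y2 > 0, NA, df$y1)

# The two models from the post
singlemodel <- "y1 ~ x
y2 ~ x
y1 ~~ y1
y2 ~~ y2
y2 ~~ y1
"
twogroupmodel <- "y1 ~ c(a, b) * 1
y2 ~ c(c, d) * 1
y1 ~~ c(v1a, v1) * y1
y2 ~~ c(v2a, v2) * y2
y2 ~~ c(v12a, v12) * y1
diff1 := b - a
diff2 := d - c"

m1 <- sem(singlemodel, data = df, missing = "ml")
m2 <- sem(twogroupmodel, data = df, group = "x", missing = "ml")

# Order of the groups: the first label goes with a and c, the second with b and d
groups <- lavInspect(m2, "group.label")
print(groups)

pe1 <- parameterEstimates(m1)
pe2 <- parameterEstimates(m2)
single <- pe1[pe1$op == "~" & pe1$rhs == "x", ]   # slopes y1 ~ x, y2 ~ x
multi  <- pe2[pe2$op == ":=", ]                   # diff1, diff2

comparison <- data.frame(outcome    = single$lhs,
                         est_single = single$est, est_multi = multi$est,
                         se_single  = single$se,  se_multi  = multi$se)
print(comparison, digits = 3)
```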
Jeremy