Hi lavaan users,
I recall that users have asked questions about R-squares and differences in R-squares. I know lavaan can print R-squares directly in the output. However,
(a) they are not parameters and we cannot use them for user-defined parameters, and
(b) no p-values are available for them.
A trick came to my mind. I am not sure whether it has been mentioned before; because R-square is a frequently asked topic here, it may well have been proposed already. However, I searched and could not find this solution in previous posts (sorry if I overlooked related threads). So, I share it here (maybe again?).
This trick is really a "trick", a workaround that does things in an unconventional way. Therefore, I share it here *not* to claim that it must work. Instead, I share it to seek comments on two questions:
(a) Does it really work? Did I overlook something?
(b) If it does *sometimes* work, when does it *not* work?
This is an example:
library(lavaan)
#> This is lavaan 0.6-13
#> lavaan is FREE software! Please report any bugs.
set.seed(870432)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y1 <- .3 * x1 + .3 * x2 + sqrt(1 - .3^2 - .3^2) * rnorm(n)
y2 <- .0 * x1 + .1 * x2 + sqrt(1 - .0^2 - .1^2) * rnorm(n)
dat <- data.frame(x1, x2, y1, y2)
mod <-
"
y1 ~ x1 + x2
y2 ~ x1 + x2
y1 ~~ ev1 * y1
y2 ~~ ev2 * y2
y1rsq := 1 - ev1
y2rsq := 1 - ev2
rsqdiff := y1rsq - y2rsq
"
fit <- lavaan::sem(mod, dat)
# y1rsq, y2rsq, and rsqdiff in the unstandardized solution
# are not interpretable as R-squares. They can even be negative.
parameterEstimates(fit, rsquare = TRUE)[11:15, ]
#> lhs op rhs label est se z pvalue ci.lower ci.upper
#> 11 y1rsq := 1-ev1 y1rsq 0.214 0.079 2.725 0.006 0.060 0.368
#> 12 y2rsq := 1-ev2 y2rsq -0.008 0.101 -0.075 0.940 -0.205 0.190
#> 13 rsqdiff := y1rsq-y2rsq rsqdiff 0.222 0.127 1.743 0.081 -0.028 0.471
#> 14 y1 r2 y1 0.113 NA NA NA NA NA
#> 15 y2 r2 y2 0.012 NA NA NA NA NA
# y1rsq, y2rsq, and rsqdiff in the standardized solution
# are indeed the R-squares.
standardizedSolution(fit)[11:13, ]
#> lhs op rhs label est.std se z pvalue ci.lower ci.upper
#> 11 y1rsq := 1-ev1 y1rsq 0.113 0.041 2.762 0.006 0.033 0.194
#> 12 y2rsq := 1-ev2 y2rsq 0.012 0.015 0.798 0.425 -0.018 0.043
#> 13 rsqdiff := y1rsq-y2rsq rsqdiff 0.101 0.043 2.364 0.018 0.017 0.184
# Verify the results
est <- parameterEstimates(fit, rsquare = TRUE)
std <- standardizedSolution(fit)
# Compare the R-squares
est[14:15, "est"]
#> [1] 0.11322392 0.01236012
std[11:12, "est.std"]
#> [1] 0.11322392 0.01236012
# Compare the R-square difference
est[14, "est"] - est[15, "est"]
#> [1] 0.1008638
std[13, "est.std"]
#> [1] 0.1008638
To include the R-square of a variable as a user-defined parameter, I labelled the error variance and then defined 1 - (error variance). In the unstandardized solution, this user-defined parameter is not meaningful and can even be negative. However, in the standardized solution (with std.all, the default), the error variance is standardized. Therefore, 1 - (standardized error variance) becomes the R-square.
With this trick, the difference between two R-squares can also become a user-defined parameter, defined as the difference between the two quantities above. Again, we need to look at the standardized solution.
WARNING: There are two issues with this approach:
1) When an R-square is close to zero or the sample size is small, the delta-method SE, CI, and p-value in the standardized solution may not be appropriate, given the sampling distribution of R-square.
2) Even with se = "boot", the CIs in the standardized solution are not bootstrap CIs (they are delta-method CIs based on the bootstrap VCOV). Actually, even *if* the CIs in the standardized solution *were* bootstrap CIs, they might still not be appropriate for R-square in some cases.
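For issue 2, one possible workaround (a sketch only; I have not studied its properties) is to bypass the delta method entirely and form a percentile bootstrap CI, using bootstrapLavaan() to re-fit and re-standardize the model in each bootstrap sample. The column number 11 below is an assumption based on the output above (the row of rsqdiff in standardizedSolution()); it should be checked in a real analysis.

```r
# Sketch: percentile bootstrap CI for the standardized rsqdiff.
# FUN is applied to each refitted bootstrap model; here it returns
# the standardized estimates of the user-defined parameters.
boot_out <- bootstrapLavaan(fit, R = 1000,
                            FUN = function(x) {
                              standardizedSolution(x)$est.std
                            })
# Column 11 is assumed to be rsqdiff (check against the row order
# of standardizedSolution(fit) in your own run).
quantile(boot_out[, 11], c(.025, .975), na.rm = TRUE)
```

Whether a percentile CI behaves well for an R-square difference near zero is itself an open question, so this only addresses how the CI is formed, not issue 1.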
Despite these two issues, it may still be useful to be able to use R-squares as user-defined parameters.
May I have your comments on this trick? Any pitfalls?
-- Shu Fai