Re: Power Simsem - Continuous Exogenous - Standardized Betas

63 views
Skip to first unread message

Sunthud Pornprasertmanit

unread,
Jun 9, 2026, 9:14:05 PMJun 9
to lav...@googlegroups.com

Hi Ioannis,

If I understand correctly, I think your implementation makes the total variances of Comp, Warm, and Outcome equal to 1, given the exogenous covariance matrix and the regression coefficients. However, I would not say that all betas are therefore conventional standardized effect sizes.

For the dummy variable (Y), if it is coded 0/1, the coefficient is a mean difference between the two groups in the metric of the outcome. If the total variance of the outcome is 1, this coefficient is close to a standardized mean difference, but it is not exactly Cohen’s d, because Cohen’s d uses the within-group SD as the denominator. In a simple model, d = b / SD(e). If you instead force SD(e)=1, then the coefficient for Y is closer to Cohen’s d, but then the total variance of the outcome is not 1 anymore. This also makes the coefficients for continuous predictors less like conventional standardized regression coefficients.

I would not standardize X, Y, X^2, and XY again. For the continuous predictor (X), I believe you already create X as N(0,1) before creating the derived terms, so X is already standardized. That is, create X first, then compute X^2 and XY from that standardized X, while keeping Y as a dummy variable. Then, keep these values and put them into simsem directly. Do not standardize them again.

I am not completely sure whether a multiple-group setup would be better for your purpose. It may be cleaner if the main effect of Y is the focus, because then you can define the outcome variance within each group. In that case, the raw mean difference between groups would correspond more closely to a standardized mean difference. But for the regression setup you described, I would not standardize the dummy variable itself.

Best,
Sunthud

John Pap

unread,
Jun 11, 2026, 6:40:28 AMJun 11
to lavaan

Hi Sunthud!

Thank you so much for taking time and answering my question.

In my case, I generate from a normal distribution, and takes two fixed values (not 0 and 1, but two values defined in relation to the distribution of ).

I was reading about the idea of standardizing yesterday and I found support for your opinion in a recent paper on how to standardize (https://psycnet.apa.org/doiLanding?doi=10.1037%2Fhea0001188). So, I guess in my case, Y must be also standardized.

I also understood that in this case its better to set total variance to 1. Since, I also know the covariance of my variables, I used the PS argument..

I hope I understood your reply correctly.

Thanks,
Ioannis


My code is below



 b <- 0.25 #Generic beta
n  <- 200

set.seed(123)
Xvar <- rnorm(n, mean = 10, sd = 4)
Yvar <- sample(c(6, 16), size = n, replace = TRUE, prob = c(0.5, 0.5))

X_c  <- (Xvar - mean(Xvar)) / sd(Xvar)    # standardize X
Y_c  <- (Yvar - mean(Yvar)) / sd(Yvar)    # standardize Y
X_c2 <- X_c^2                             # do NOT re-standardize
XY_c <- X_c * Y_c                         # do NOT re-standardize

exodata1 <- data.frame(X = X_c, Y = Y_c, X2 = X_c2, XY = XY_c)
exoCov   <- cov.wt(exodata1, method = "ML")$cov

#----------------------------------PATH A------------------------------------#
#Only two betas are non-zero in path a (for simplicity)
b1c <- 0;  b2c <- b;   b3c <- 0;  b4c <- 0
b1w <- 0;  b2w <- -b;  b3w <- 0;  b4w <- 0
#----------------------------------PATH B------------------------------------#
bcO <- b
bwO <- b
#---------------------------------------------------------------------------#
# Exogenous Paths
kappa.free <- kappa.pop <- matrix(0,
                                  nrow = 3,
                                  ncol = 4,
                                  dimnames = list(c("Comp", "Warm", "Outcome"),
                                                  names(exodata1)))

kappa.free["Comp","X"]  <- "b1c";  kappa.pop["Comp","X"]  <- b1c
kappa.free["Comp","Y"]  <- "b2c";  kappa.pop["Comp","Y"]  <- b2c
kappa.free["Comp","X2"] <- "b3c";  kappa.pop["Comp","X2"] <- b3c
kappa.free["Comp","XY"] <- "b4c";  kappa.pop["Comp","XY"] <- b4c
kappa.free["Warm","X"]  <- "b1w";  kappa.pop["Warm","X"]  <- b1w
kappa.free["Warm","Y"]  <- "b2w";  kappa.pop["Warm","Y"]  <- b2w
kappa.free["Warm","X2"] <- "b3w";  kappa.pop["Warm","X2"] <- b3w
kappa.free["Warm","XY"] <- "b4w";  kappa.pop["Warm","XY"] <- b4w

exoPaths <- simsem::bind(free = kappa.free, popParam = kappa.pop)

#---------------------------------------------------------------------------#
# Endogenous Paths
beta.free <- beta.pop <- matrix(0,
                                nrow = 3,
                                ncol = 3,
                                dimnames = list(c("Comp", "Warm", "Outcome"),
                                                c("Comp", "Warm", "Outcome")))

beta.free["Outcome","Comp"] <- "bcO";  beta.pop["Outcome","Comp"] <- bcO
beta.free["Outcome","Warm"] <- "bwO";  beta.pop["Outcome","Warm"] <- bwO

endoPaths <- simsem::bind(free = beta.free, popParam = beta.pop)

#---------------------------------------------------------------------------#
# User-defined indirect effects
userParams <- '
        ind.effect.Comp := b2c*bcO
        ind.effect.Warm := b2w*bwO
        '
#---------------------------------------------------------------------------#
# Residual variances and covariance


# Use findFactorResidualVar() to ensure total variance = 1 (standardized betas)
# Required because correlated predictors (X, X2, XY) prevent automatic computation
resid.free <- resid.pop <- matrix(0, 3, 3,
                                  dimnames = list(c("Comp", "Warm", "Outcome"),
                                                  c("Comp", "Warm", "Outcome")))

corPsi_mat <- diag(3)
rownames(corPsi_mat) <- colnames(corPsi_mat) <- c("Comp", "Warm", "Outcome")

resVars <- findFactorResidualVar(
  beta   = beta.pop,
  corPsi = corPsi_mat,
  gamma  = kappa.pop,
  covcov = exoCov
)

diag(resid.free) <- NA
diag(resid.pop)  <- resVars
resid.free["Comp","Warm"] <- resid.free["Warm","Comp"] <- "covCW"
resid.pop["Comp","Warm"]  <- resid.pop["Warm","Comp"]  <- 0

residCov <- simsem::binds(free = resid.free, popParam = resid.pop)

#---------------------------------------------------------------------------#
# Assemble


simMod1 <- model.path(BE = endoPaths, PS = residCov,
                      KA = exoPaths,
                      indLab = rownames(kappa.free),
                      covLab = colnames(kappa.free),
                      con = userParams)

#---------------------------------------------------------------------------# 

# Run
set.seed(123)
datmod <- generate(simMod1, covData = exodata1, n = nrow(exodata1), type = "bootstrap")
fit1   <- analyze(simMod1, data = datmod)

rejectMCCI <- function(object) {
  CIs <- semTools::monteCarloCI(object)
  apply(CIs, MARGIN = 1, function(CI) 0 < CI["ci.lower"] | 0 > CI["ci.upper"])
}

sim1 <- sim(nRep = 100, model = fit1, covData = exodata1, n = nrow(exodata1),
            meanstructure = FALSE,
            multicore = TRUE, numProc = parallel::detectCores() - 1L,
            silent = TRUE, seed = 123, outfun = rejectMCCI)

testMCCI    <- do.call(rbind, getExtraOutput(sim1))
testMCCI_df <- as.data.frame(testMCCI)


#---------------------------------------------------------------------------#
# Average power across the two indirect effects
avg_power <- mean(colMeans(testMCCI_df)[1:2])

print(avg_power)

Sunthud Pornprasertmanit

unread,
Jun 12, 2026, 12:24:09 AMJun 12
to lav...@googlegroups.com
Hi Ioannis,

I do not see any obvious problem in your code. I think the remaining question is mainly how to define the effect size for the interaction between the grouping variable and the continuous variable.

The article you mentioned may be more directly based on the interaction of two continuous variables. For your case, because Y has only two values, the interpretation depends a lot on how Y is coded.

One possible coding is to use -0.5 and 0.5 for Y. If X is already standardized as N(0,1), then b_Z.X can be interpreted as the average effect of X across the two groups, where Z can be Comp, Warm, or Outcome. The interaction coefficient b_Z.XY would then be the difference in the X slope between the two groups. For example, if b_Z.X = 0.25 and b_Z.XY = 0.10, then the X slopes are 0.20 and 0.30 in the two groups.

So I think your code is fine computationally. The main issue is just the interpretation of the effect size. Standardizing Y is also reasonable if you want the coefficient to be in terms of a one-SD change in Y. I am just wondering whether the -0.5/0.5 coding may be easier to communicate when Y is an experimental grouping variable.

Maybe Shu Fai could also help us with the effect-size specification here.

Best,
Sunthud


--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/lavaan/79ba0db2-9081-4530-a92c-d563fbf9d637n%40googlegroups.com.

John Pap

unread,
Jun 15, 2026, 5:27:11 AMJun 15
to lavaan
Thanks so much Sunthud!

I actually read a bit on the topic. I think that the standardization is for me the best alternative. The reason is that having several studies other experimental and other no, I can more easily compare and update power analyses across studies. 
Also, in regards to the code, I made a small change to the model I estimate for sim1 from fit1 to simMod1 (as fit1 was just one model).

Thanks again for all the help explaining to me so I can understand :)

Ioannis
Reply all
Reply to author
Forward
0 new messages