mediation model: why are lavaan mediator estimates different to lm.beta estimates?

Ant Warland

Jun 10, 2020, 7:39:06 AM

to lavaan

Hi,

I'm trying to determine whether the model I'm using for a multiple mediation analysis is correct. To do this, I'm comparing the mediator estimates from the lavaan model below with the standardised estimates from equivalent regressions passed to lm.beta.

Here X is a binary variable; M1-M4 are continuous variables that I transformed to z-scores (because lavaan warned that "some observed variances are (at least) a factor 1000 times larger than others" and failed to fit the model); and C1-C4 are a mix of binary and continuous variables.

Data <- data.frame(X = X, Y = Y, M1 = M1, M2 = M2, M3 = M3, M4 = M4, C1 = C1, C2 = C2, C3 = C3, C4 = C4)

model <- '
  # direct effect
  Y ~ c*X + C1 + C2 + C3 + C4

  # mediator
  M1 ~ a1*X + C1 + C2 + C3 + C4
  M2 ~ a2*X + C1 + C2 + C3 + C4
  M3 ~ a3*X + C1 + C2 + C3 + C4
  M4 ~ a4*X + C1 + C2 + C3 + C4
  Y ~ b1*M1 + b2*M2 + b3*M3 + b4*M4

  # indirect effect (a*b)
  indirect1 := a1*b1
  indirect2 := a2*b2
  indirect3 := a3*b3
  indirect4 := a4*b4

  # total effect
  total_indirect := (a1*b1) + (a2*b2) + (a3*b3) + (a4*b4)
  direct := c
  proportion_mediated := ((a1*b1) + (a2*b2) + (a3*b3) + (a4*b4)) / (c + (a1*b1) + (a2*b2) + (a3*b3) + (a4*b4))
  total := c + (a1*b1) + (a2*b2) + (a3*b3) + (a4*b4)

  # M1 ~~ M2
  # M1 ~~ M3
  # M1 ~~ M4
  # M2 ~~ M3
  # M2 ~~ M4
  # M3 ~~ M4
'

fit <- sem(model, data = Data)
summary(fit, fit.measures = TRUE, standardized = TRUE)

Std.lv estimates for regressions on X:

a1 = -0.309
a2 = -0.210
a3 = -0.205
a4 = -0.192

Std.all estimates for regressions on X:

a1 = -0.019
a2 = -0.013
a3 = -0.013
a4 = -0.012

Using the same variables, the lm regressions of M1-M4 take the form:

lm1 <- lm(formula = M1 ~ X + C1 + C2 + C3 + C4, data = df)

lm.beta standardised coefficients of X:

lm1 (M1): -0.018179344
lm2 (M2): -0.01388415
lm3 (M3): -0.015303170
lm4 (M4): -0.011337755

I assume std.all are the fully standardised coefficients (betas) but why are they different to std.lv when I'm using z-scores?

But my main question is why are lavaan's standardised estimates different to the lm.beta estimates?

I want to use lavaan for this mediation analysis because, unlike the other methods I know of, it produces p-values for the mediation effects.

Thanks in advance!

Ant

Terrence Jorgensen

Jun 12, 2020, 5:32:49 AM

to lavaan

> Data <- data.frame(X = X, Y = Y, M1 = M1, M2 = M2, M3 = M3, M4 = M4, C1 = C1, C2 = C2, C3 = C3, C4 = C4)

Without a script generating values for X, Y, etc., this is not reproducible.

> I assume std.all are the fully standardised coefficients (betas)

Yes

> but why are they different to std.lv when I'm using z-scores?

You don't have latent variables in your model, so you can ignore that column. lavaan might be treating a subset of the observed variables as latent because you have regressions among observed variables (internally, lavaan has to "put them into latent space" by treating them as single-indicator constructs).

> But my main question is why are lavaan's standardised estimates different to the lm.beta estimates?

Because SEM is based on asymptotic theory, whereas OLS is unbiased even in small samples. I imagine the (already similar) estimates would match even more closely if you set sample.cov.rescale = FALSE (which makes less of a difference in larger samples; see ?lavOptions for details) and if you estimated a saturated model (i.e., estimated the residual covariances among your mediators).
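A rough way to check both suggestions is the sketch below. It runs on simulated data, so every value in it is made up for illustration, and it uses only two mediators and one covariate to stay short. The lm side is standardised by hand as b * sd(x) / sd(y), which is assumed here to be the same rescaling that lm.beta applies.

```r
# Sketch on simulated data; none of these values come from the original post.
library(lavaan)

set.seed(1)
n  <- 500
X  <- rbinom(n, 1, 0.5)   # binary predictor
C1 <- rnorm(n)            # a single covariate, for brevity
e  <- rnorm(n)            # shared noise, so the mediator residuals correlate
M1 <- as.numeric(scale(-0.3 * X + 0.2 * C1 + e + rnorm(n)))  # z-scored mediator
M2 <- as.numeric(scale(-0.2 * X + 0.1 * C1 + e + rnorm(n)))  # z-scored mediator
Y  <- 0.4 * M1 + 0.3 * M2 + 0.2 * X + 0.1 * C1 + rnorm(n)
Data <- data.frame(X, Y, M1, M2, C1)

model <- '
  Y  ~ c*X + b1*M1 + b2*M2 + C1
  M1 ~ a1*X + C1
  M2 ~ a2*X + C1
  M1 ~~ M2   # residual covariance between mediators (saturates the model)
'

# sample.cov.rescale = FALSE keeps the N - 1 divisor for the sample covariances
fit <- sem(model, data = Data, sample.cov.rescale = FALSE)

# standardizedSolution() reports std.all estimates by default
std <- standardizedSolution(fit)
lavaan_a1 <- std$est.std[std$lhs == "M1" & std$op == "~" & std$rhs == "X"]

# OLS slope for the same equation, standardised by hand
ols <- lm(M1 ~ X + C1, data = Data)
ols_a1 <- unname(coef(ols)["X"]) * sd(Data$X) / sd(Data$M1)

print(c(lavaan = lavaan_a1, ols = ols_a1))
```

If the two numbers still disagree on real data, it may be worth checking that both analyses use the same rows, since lavaan's default listwise deletion drops any case with a missing value on any model variable, whereas each lm call drops only cases missing on the variables in that one equation.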

Terrence D. Jorgensen

Assistant Professor, Methods and Statistics

Research Institute for Child Development and Education, the University of Amsterdam

http://www.uva.nl/profile/t.d.jorgensen

Ant Warland

Jun 12, 2020, 7:22:14 AM

to lavaan

Hi Terrence,

Thanks for the explanation. Based on what you've said, those std.all estimates will be fine for my purposes.