54 views

Skip to first unread message

Apr 5, 2019, 4:41:11 AM4/5/19

to lavaan

Dear users,

Im using *lavaan* package to create a dataset for the simplest mediation possible (1 exogenous variable, 1 mediator, 1 dependent varible, DV; NO latent variable). As a reference I take the values and structure of Figure 3.5 of Grace's 2006 book. Then, the data generated have been analyzed under the same *lavaan* and under MPlus.

I simulated 3 datasets where: 1) all variables were continuous and normally distributed; 2) Mediator is a binary variable and; 3) reponse vble is a binary variable.

Results:

1) __Simplest mediation with only continuous variables__: There is no discrepancy between the values set when generating data and values obtained when analysed in MPlus and lavaan

2) __Simplest mediation with binary mediator__ It doesn’t discrepancy between set and abotained values both in lavaan and MPlus

3) __Simplest mediation with binary DV__ Data generated in lavaan works in MPlus, not in lavaan. In this case, it can be that the differences between lavaan and MPlus can be on the estimator. Lavaan cannot work with MLR or estimators for binary variables

Copy below the commands for the 3 cases. Sorry for the length!

1. __Simplest mediation with all continuousvariablez__ It works

#Generated data (I use the example from Grace 2006, p. 46, Fig. 3.5)

N <- 10000

x <- rnorm(N)

y1 <- 0.55*x + rnorm(N)

y2 <- 0.24*x + 0.47*y1 + rnorm(N)

LinearMedData <- data.frame(x, y1,y2)

write.xlsx(LinearMedData, "C:/Users/gaiarrido/Desktop/LinearMedData.xlsx") # write sheet

#Data analysis

LinearMedMod <- 'y1~a*x

y2~b*x+ c*y1

'

fitLinearMedMod <- sem(LinearMedMod, data = LinearMedData)

summary(fitLinearMedMod)

Regressions:

Estimate Std.Err z-value P(>|z|)

y1 ~

x (a) 0.548 0.010 55.612 0.000

y2 ~

x (b) 0.244 0.011 21.539 0.000

y1 (c) 0.456 0.010 45.379 0.000

Variances: Estimate Std.Err z-value P(>|z|)

.y1 0.981 0.014 70.711 0.000

.y2 0.990 0.014 70.711 0.000

same values in **MPlus** (LinearMedData.csv is the data generated in *lavaan*)

DATA: FILE IS LinearMedData.csv ;

VARIABLE: NAMES sort x y1 y2;

USEVARIABLES x y1 y2;

ANALYSIS: ESTIMATOR = ML;

MODEL: y1 on x;

y2 on y1;

y2 on x;

OUTPUT: SAMPSTAT

STDYX

CINTERVAL

TECH3;

Results

Estimate S.E. Est./S.E. P-Value

Y1 ON

X 0.548 0.010 55.611 0.000

Y2 ON

Y1 0.456 0.010 45.378 0.000

X 0.244 0.011 21.542 0.000

Intercepts

Y1 0.002 0.010 0.230 0.818

Y2 -0.011 0.010 -1.080 0.280

Residual Variances

Y1 0.981 0.014 70.709 0.000

Y2 0.990 0.014 70.710 0.000

2. __Simplest mediation with binary mediator__

#Generated data

N <- 10000

x <- rnorm(N)

y1dummy <- inv.logit(0.55* x) > runif(N)

#I have to do a trick to create a dummy variable depending on other when generating dataset cause lavaan cannot work still with dummy, binary or categorical variables. In words of Terrence: *It will be probit by default (logit unavailable) if you declare the outcome as categorical using the ordered= argument*

y2 <- 0.24*x + 0.47*y1 + rnorm(N)

LogitMedData <- data.frame(x, y1dummy,y2)

write.xlsx(LogitMedData, "C:/Users/gaiarrido/Desktop/LogitMedData.xlsx") # write sheet

#Data analysis

LogitMedMod <- 'y1dummy~a*x

y2~b*x+ c*y1dummy

'

fitLogitMedMod <- sem(LogitMedMod, data = LogitMedData, estimator="WLSMV" , ordered="y1dummy")

summary(fitLogitMedMod)

Regressions:

Estimate Std.Err z-value P(>|z|)

y1dummy ~

x (a) 0.345 0.013 25.990 0.000

y2 ~

x (b) 0.498 0.012 41.974 0.000

y1dummy (c) -0.013 0.014 -0.940 0.347

Same two problems here

1) I have to say that is ordered, not dummy cause *lavaan* cannot work still with dummy

2) I cannot use estimator MLR

In **MPlus**

DATA: FILE IS LogitMedData.csv ;

VARIABLE: NAMES sort x y1dummy y2;

USEVARIABLES x y1dummy y2;

CATEGORICAL ARE y1dummy;

ANALYSIS: TYPE = random;

ALGORITHM = INTEGRATION;

ESTIMATOR = MLR;

MODEL: y1dummy on x;

y2 on y1dummy;

y2 on x;

OUTPUT: SAMPSTAT

STDYX

CINTERVAL

TECH3;

Results

Estimate S.E. Est./S.E. P-Value

Y1DUMMY ON

X 0.558 0.022 25.433 0.000

Y2 ON

Y1DUMMY -0.020 0.023 -0.898 0.369

X 0.496 0.011 43.987 0.000

3. Simplest mediation with binary DV

#Generated data

N <- 10000

x <- rnorm(N)

y1 <- 0.55*x + rnorm(N)

y2dummy <- inv.logit(0.24*x+0.47*y1)> runif(N) #In the forum told me that this is the good option

#I have to do a trick to create a dummy variable depending on other when generating dataset cause lavaan cannot work still with dummy, binary or categorical variables. In words of Terrence: *It will be probit by default (logit unavailable) if you declare the outcome as categorical using the ordered= argument*

LogitDvData <- data.frame(x, y1,y2dummy)

write.xlsx(LogitDvData, "C:/Users/gaiarrido/Desktop/LogitDvData.xlsx") # write sheet

#Data analysis

LogitDvMod <- 'y1~a*x

y2dummy~b*x+ c*y1

'

fitLogitDvMod <- sem(LogitDvMod, data = LogitDvData, estimator="WLSMV" , ordered="y2dummy")

summary(fitLogitDvMod)

Regressions:

Estimate Std.Err z-value P(>|z|)

y1 ~

x (a) 0.548 0.010 55.807 0.000

y2dummy ~

x (b) 0.148 0.014 10.340 0.000

y1 (c) 0.267 0.012 22.401 0.000

Same two problems here

3) I have to say that is ordered, not dummy cause lavaan cannot work still with dummy

4) I cannot use estimator MLR

In **MPlus**

DATA: FILE IS LogitDvData.csv ;

VARIABLE: NAMES sort x y1 y2dummy;

USEVARIABLES x y1 y2dummy;

CATEGORICAL ARE y2dummy;

ANALYSIS: TYPE = random;

ALGORITHM = INTEGRATION;

ESTIMATOR = MLR;

MODEL: y1 on x;

y2dummy on y1;

y2dummy on x;

OUTPUT: SAMPSTAT

STDYX

CINTERVAL

TECH3;

Results are good

Estimate S.E. Est./S.E. P-Value

Y1 ON

X 0.548 0.010 55.402 0.000

Y2DUMMY ON

Y1 0.452 0.022 20.323 0.000

X 0.251 0.024 10.338 0.000

Apr 5, 2019, 5:59:40 AM4/5/19

to lavaan

FYI, I didn't see a single question in you entire post.

I have to do a trick to create a dummy variable depending on other when generating dataset cause lavaan cannot work still with dummy, binary or categorical variables.

Yes it can. If the variables are binary or ordered, it will essentially replace linear regression with probit regression. That is the same thing Mplus does by default, although Mplus also has the option to use a logit link if you want to use a much slower numerical algorithm for obtaining ML estimates.

In words of Terrence:

It will be probit by default (logit unavailable) if you declare the outcome as categorical using the ordered= argument

Exactly.

1) I have to say that is ordered, not dummy cause

lavaancannot work still with dummy

Yes it can, the same way that Mplus treats it. I think you did not understand when I told you that **binary data are equivalently described as nominal or ordinal**. Take sex for example (although not truly binary). If you code it such that 1 = male, then it is ordered in the sense that 1 indicates more "male-sexed-ness" than 0. If you code it such that 1 = female, then it is ordered in the sense that 1 indicates more "female-sexed-ness" than 0. Because you do not use additional (nominal) categories, there is no practical distinction between treating these 2 categories as ordered or unordered.

2) I cannot use estimator MLR

That is only a big advantage of using a logit link if you want to use FIML to handle missing data. In lavaan, you can only set missing="pairwise" with the probit link, which assumes MCAR data rather than MAR. Or you can use multiple imputation, which (like FIML) assumes MAR.

But whether you have missing or complete data, using DWLS estimation with a probit link also comes with robust SEs (but unlike the logit link with MLR, the probit link also provides a robust test statistic and fit indices). Logit and probit links provide notoriously similar results, so there is no substantive reason to prefer a logit except that it might be more familiar to you.

Terrence D. Jorgensen

Assistant Professor, Methods and Statistics

Research Institute for Child Development and Education, the University of Amsterdam

Apr 5, 2019, 12:35:18 PM4/5/19

to lav...@googlegroups.com

Dear Dr. Jorgensen,

**My main question** is why, when I include one binary variable, I dont get in the analyses stage the values I defined in data generation . As in the case 2 & 3. While in case 1, with only continuous variables, I see no discrepancy. I try to explain myself:

thanks a lot for your reply and sorry. It seems that I copy the draft and not the last version I wanted to post. That's why no questions and the references to your words. My apologies.

Regarding binary variables and the discreepancy between the generated data and the subsequent analyses for lavaan, while concordance in MPlus:

1. I created a dataset with one dummy variable, is it well defined?

N <- 10000

x <- rnorm(N)

y1dummy <- inv.logit(0.55* x) > runif(N)

y2 <- 0.24*x
+ 0.47*y1dummy + rnorm(N)

2. I create a model where the dummy variable is the mediator

LogitMedMod <- 'y1dummy~a*x

y2~b*x+ c*y1dummy

'

fitLogitMedMod <- sem(LogitMedMod, data = LogitMedData, ordered="y1dummy")

3. I run the model in lavaan and the path coefficients are not the ones defined in step 1.

summary(fitLogitMedMod)

Regressions: Estimate Std.Err z-value P(>|z|)

y1dummy ~

x (a) 0.355 0.013 26.719 0.000

y2 ~

x (b) 0.215 0.011 19.284 0.000

y1dummy (c) 0.275 0.013 21.205 0.0004. However, when I run the same model with the same data in MPlus, the path coefficients are the ones defined in step 1

In **MPlus**

DATA: FILE IS LogitMedData.csv ;

VARIABLE: NAMES sort x y1dummy y2;

USEVARIABLES x y1dummy y2;

CATEGORICAL ARE y1dummy;

ANALYSIS: TYPE = random;

ALGORITHM = INTEGRATION;

ESTIMATOR = MLR;

MODEL: y1dummy on x;

y2 on y1dummy;

y2 on x;

Results

Estimate S.E. Est./S.E. P-Value

Y1DUMMY ON

X 0.576 0.022 25.973 0.000

Y2 ON

X 0.253 0.010 24.416 0.000

Y1DUMMY 0.446
0.021
21.234 0.000

--

You received this message because you are subscribed to the Google Groups "lavaan" group.

To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.

To post to this group, send email to lav...@googlegroups.com.

Visit this group at https://groups.google.com/group/lavaan.

For more options, visit https://groups.google.com/d/optout.

Apr 5, 2019, 4:47:13 PM4/5/19

to lavaan

Mario

The difference is the link function.

lavaan uses the probit link function, while Mplus is using the logit link function.

If you want to transform it you can multiple the probit slope for 1.6 to get a close estimate of the equivalent logit version https://statmodeling.stat.columbia.edu/2006/06/06/take_logit_coef/?fbclid=IwAR1B4npbUC4Ng7w4IU6CRBG02QH0YLh7kj_wZD9DjtIQggGrOoaHxxqY-d8

You can save the slopes in logit metric

a_logit := a*1.6

c_logit := c*1.6

Also, a clarification of what Mplus is doing. Mplus is NOT using MLR, it is using Marginal Maximum Likelihood (MML), this is part of the confusion caused by their shortcuts. When ordered variables are included, and MLR Mplus will use MML which is the ML estimator when using categorical variables, for example is the most common estimator with IRT

To unsubscribe from this group and stop receiving emails from it, send an email to lav...@googlegroups.com.

Apr 7, 2019, 7:51:33 AM4/7/19

to lav...@googlegroups.com

Thanks so much Mauricio. This is exactly what I needed to know. Now things are clear for me

To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.

Apr 7, 2019, 5:39:01 PM4/7/19

to lavaan

Mario

Actually, as Gelman stated in his post, the factor for the relation between logit and probit models is not fixed, but it varies. I did a further test, and the "ideal" ration between them vary in function of the absolute value of the slope. Where the ratio ranges from 1.6 to 1.87 (as you can see in the plot). Given my results I would suggest to use 1.7 as the factor to relate logit and probit regression slopes. As using this the average difference between the logistic slope and the transformed probit would be 0.02, while using the 1.6 the average difference would be 0.05.

I got these results by looking at the ratio beteween the slopes over 50000 replications treating the logistic slope as random from a Normal(mean=0.55, sd=2)

My suggestion:

a_logit := a*1.7

c_logit := c*1.7

Apr 8, 2019, 7:26:54 AM4/8/19

to lavaan

My suggestion:a_logit := a*1.7c_logit := c*1.7

Or you could simply simulate binary data from the probit model:

`y1dummy <- pnorm(0.55* x) > runif(N)`

Reply all

Reply to author

Forward

0 new messages