WLSMV vs. MLR in a model with 4 exogenous LVs, 4 endogenous LVs and a binary observed outcome


Tobias Ludwig

May 10, 2016, 11:15:22 AM5/10/16
to lavaan

Hi lavaan-group,

 

I'm still working on a SEM consisting of:

• four latent exogenous variables: one measured by a single-item indicator (see Yves' response here, or Kline, 2016), two measured by at least 3 indicators each (5-point Likert-type items), and one measured by four metric indicators (derived from item parcelling; I know it's "bad style", but this is what is commonly accepted for this construct),
• four endogenous latent variables (each LV measured by five 5-point Likert items),
• as well as two binary observed outcomes (coded 0/1).

 

In a single-group analysis the model is already quite large (df = 600). I have a sample of 835 participants, consisting of two different groups. Looking at histograms, I see that the data are not always normal (as far as one can tell with ordinal indicators).

 

The goal is


a) to investigate the relationship between the exogenous variables and the endogenous variables,

b) to compare the means of the endogenous LVs across both groups while controlling for the exogenous LVs, and

c) to analyze the impact of the endogenous LVs on the dichotomous outcome.

 

Most of the analysis was carried out using the MLR estimator, as I have 5-point Likert items (Beauducel & Herzberg, 2006). In doing so, I was not able to analyze c) within the SEM framework provided by lavaan, as lavaan does not support binary outcomes with the MLR estimator (unlike Mplus; I know that Yves is planning this for a future release of lavaan, afaik). So far I have calculated factor scores to analyze c) by means of logistic regression within the glm() framework. But this seems a messy solution. For example, I don't have many missing values per item (usually below 1.3% per indicator, MAR), but predict() cannot calculate a factor score for a case with a missing value among that case's manifest indicators, so I end up with a lot of NAs.

 

Using WLSMV at first produced a lot of errors: models did not converge, and I got errors saying I did not have enough cases for computing the gamma matrix (it might be that I had forgotten to code all my ordinal indicators as factors or, respectively, to use the ordered argument). Alternative estimators like MML, which can handle dichotomous outcomes (as suggested by Yves here), took forever.

 

So, because of the messy solution for c), I tried the WLSMV estimator again, and somehow it seems to work now. BUT a couple of questions arise regarding the use of the WLSMV estimator, and I would be really happy if someone with more experience could help me with them:

 

 

Question 1: Is there any other option to analyze c) within the SEM framework I might have overlooked?

 

Question 2: Using the WLSMV estimator I get a list of 50 warnings, which might be related to the zero.cell.warn argument:

50: In pc_cor_TS(fit.y1 = FIT[[i]], fit.y2 = FIT[[j]], method = optim.method,  ... :

  lavaan WARNING: empty cell(s) in bivariate table of wdn.handlung.3 x i8.09

 

Can I ignore these warnings?

 

Question 3: I am using WLSMV with missing=pairwise. I have a really low amount of missingness in my data set (0.2%-1.39%). Is it still worth multiply imputing my data? Besides the fact that it takes a pretty long time to run, when I do run cfa.mi() I get a strange chi-square p-value, p = 0.66 with chi^2 = 2271 and df = 717. That does not make any sense to me. This doesn't happen when I use missing=pairwise. I think I want to keep it simple and therefore just use missing=pairwise. What would you suggest?

 

Question 4: The model fit produced by the MLR estimator and the WLSMV estimator differs. For example, I get a CFI of .93 with the MLR estimator and a CFI of .92 with the WLSMV estimator. Sometimes it is the other way around, and sometimes the difference is bigger. I think this is negligible, right?

 

Question 5: The standardized regression coefficients (and correlations) vary. For example, using MLR as the estimation method, a standardized regression coefficient in my model is .26; using WLSMV it is .49. This is a big difference. Is there any particular reason for it? Kline (2016, p. 258) refers to a study in which robust WLS estimation failed to perform well when distributions on categorical indicators are markedly skewed (but is robust WLS the same as WLSMV?). I know that the indicators for this LV are somewhat skewed. On the other hand, he says: "In general, ML estimates and their standard errors may both be too low when the data analyzed are from categorical indicators, and the degree of this negative bias is higher as distributions of items responses become increasingly non-normal" (p. 257). In addition, Brown (2006) states that the consequences of treating categorical variables [as continuous] in CFA "are multifold, including that it can (1) produce attenuated estimates of the relationships (correlations) among indicators, especially when there are floor or ceiling effects, [...] produce incorrect standard errors." Is there any new research on the efficacy of WLSMV vs. MLR with non-normal, ordinal indicators? What would you suggest?

 

Question 6: What is best practice for the parameterization argument? Until now I have left it at the default, although ?cfa doesn't state what the default is (theta or delta). As far as I know, delta scaling fixes the total variance of the LV to 1.0, which is not what I want, as I want to compare different groups and cannot assume that the variance of the LV is equal across both groups.

 

Question 7: Using MLR I found two significant differences in the endogenous latent variables across the groups. Now, using WLSMV, all four differences in latent means are significant, with meaningful effect sizes. Does WLSMV have so much more power? How can I justify these "new" effects?

 

 

Question 8: Are dichotomous outcomes modeled by probit link functions when using WLSMV? I know there is a way to rescale results from probit regression to odds ratios, as in logit regression, for much easier interpretation. Is that meaningful here?

 

Question 9: Is my sample size big enough?

 

I know that this is a pretty long post. Thank you all in advance.

 

Here is my model:

 

int =~ i1.03 + i1.09 + i1.16 + i1.18 + i1.21
exp =~ i6.05 + i6.08 + i6.11 + i6.15 + i6.16
mu  =~ i7b.22 + i7b.08 + i7b.16 + i7b.17 + i7b.04
evi =~ i8.05 + i8.09 + i8.11 + i8.14 + i8.18

i1.18 ~~ i1.21
i1.03 ~~ i1.16
i6.08 ~~ i6.15
i7b.22 ~~ i7b.04

# single-item indicator
fw =~ fw.rasch
fw.rasch ~~ 0.2781611 * fw.rasch

nfc =~ nfc1 + nfc2 + nfc3 + nfc5 + nfc13 + nfc14

si =~ si.emo.mean + si.epi.mean + si.auf.mean + si.wert.mean

wdn =~ wdn.handlung.1 + wdn.handlung.2 + wdn.handlung.3 + wdn.handlung.4 +
       wdn.pers.1 + wdn.pers.2 + wdn.pers.3 + wdn.pers.4
wdn.pers.2 ~~ wdn.pers.4

# Regressions
int + exp + mu + evi ~ fw + wdn + si
int + evi ~ fw
exp + mu ~ wdn

# indirect effect
fw + si + wdn ~ nfc
Hyp.Post.richtig ~ evi + mu  # Hyp.Post.richtig is the observed binary outcome

fw ~~ si + wdn
si ~~ wdn
int ~~ evi + exp + mu

 

 

Terrence Jorgensen

May 11, 2016, 5:50:38 AM5/11/16
to lavaan

Question 1: Is there any other option to analyze c) within the SEM framework I might have overlooked?


Not that I am aware of, but try posting your questions on SEMNET.  It is a much more general forum with a much wider audience of SEM experts, and for the most part, your questions are not specific to lavaan software.

Question 2: Using the WLSMV estimator I do get a list of 50 warnings which might be related to the zero.cell.warn argument:...Can I ignore these warnings?


You were using robust ML because almost all your variables have 5+ categories, except for the binary outcome.  When you switch to WLSMV, do you then start treating those 5-category Likert items as ordered as well as the binary outcome?  That probably explains why you observe the parameter differences in Questions 5 & 7.  I would be inclined to try fitting the model with DWLS and only treating the binary outcome as ordered, but I'm not sure how robust the "robust" column is when treating 5-category items as continuous.  That would be something to ask on SEMNET as well.
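That suggestion might look like the following sketch (all object names are hypothetical: `model` stands for the syntax string from the first post and `dat` for the data frame; only the binary outcome is declared ordered, so the 5-point Likert items stay continuous):

```r
library(lavaan)

# Sketch: treat only the binary outcome as ordered. As soon as any
# ordered variable is present, lavaan switches to (robust) diagonally
# weighted least squares for estimation.
fit <- sem(model,
           data      = dat,                 # hypothetical data frame
           ordered   = "Hyp.Post.richtig",  # only the binary outcome
           estimator = "WLSMV",             # DWLS + robust SEs/test statistic
           missing   = "pairwise")
summary(fit, fit.measures = TRUE, standardized = TRUE)
```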

Question 3: I am using WLSMV with missing=pairwise. I have a really low amount of missingness in my data set (0.2% - 1.39%)... What would you suggest?


It's not preferable, but given the small amount of missingness, I wouldn't expect results to differ greatly between pairwise deletion and multiple imputation (or robust FIML).

Question 4: The model fit produced by the MLR estimator and the WLSMV estimator differs. ... I think this is negligible, right?


They are different estimators, and you are also fitting the model to different data (treating indicators as continuous v. ordered yields different df and sample moments), so you can't expect fit to be the same.  Technically, the MLR model is fundamentally misspecified, so it doesn't make sense to interpret those fit measures anyway.

Question 6: What is best practice for the parameterization argument? Until now I have left it at the default, although ?cfa doesn't state what the default is (theta or delta). As far as I know, delta scaling fixes the total variance of the LV to 1.0, which is not what I want, as I want to compare different groups and cannot assume that the variance of the LV is equal across both groups.


The default is delta, and that has nothing to do with setting the LV (factor) variance to 1.0 (that is the "std.lv" argument).  The delta parameterization fixes the total variance of each latent item response to 1.0 (in the first group), and the theta parameterization fixes the residual variance of each latent item response to 1.0 (in the first group).  But in either parameterization, constraining loadings and thresholds across groups allows the latent item response (residual) variances to be estimated in all other groups.  Millsap & Yun-Tein (2004) recommended using theta for testing measurement invariance, which you need to do before running your regression model, so you should probably use theta (see examples here).
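The multiple-group invariance testing this recommends might be set up like the following sketch (`dat`, the grouping variable `gruppe`, and the character vector `ord.items` are hypothetical names):

```r
library(lavaan)

# Sketch: ordinal multiple-group CFA in the theta parameterization.
# With loadings and thresholds constrained equal across groups, the
# residual variances of the latent item responses are fixed to 1.0
# only in the first group and freely estimated in the other group.
fit.theta <- cfa(model,
                 data             = dat,        # hypothetical data frame
                 group            = "gruppe",   # hypothetical grouping variable
                 ordered          = ord.items,  # hypothetical vector of item names
                 estimator        = "WLSMV",
                 parameterization = "theta",
                 group.equal      = c("loadings", "thresholds"))
```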

Question 7: Using MLR I found two significant differences in the endogenous latent variables across the groups. Now, using WLSMV, all four differences in latent means are significant, with meaningful effect sizes. Does WLSMV have so much more power? How can I justify these "new" effects?


As you quoted in Question 5, effects are attenuated when treating categorical data as continuous.  Treating them appropriately should give you more power.  Compare a Pearson correlation between 2 binary/ordered items to a polychoric correlation between the same items, and the polychoric will always be at least as large because of the additional information in its assumption about the underlying normal continuum.
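The attenuation itself is easy to demonstrate with base R alone (a toy simulation, not the polychoric estimator): dichotomizing both variables of a bivariate normal pair shrinks the Pearson correlation well below the latent correlation, which is exactly the information a polychoric correlation recovers.

```r
set.seed(1)
n <- 100000
x <- rnorm(n)
y <- 0.6 * x + sqrt(1 - 0.6^2) * rnorm(n)    # latent correlation = 0.6

# Pearson correlation of the continuous pair is close to 0.6, but after
# dichotomizing both variables at 0 it drops to roughly 0.4.
r.cont <- cor(x, y)
r.bin  <- cor(as.numeric(x > 0), as.numeric(y > 0))
c(continuous = round(r.cont, 2), dichotomized = round(r.bin, 2))
```

For a true latent correlation of .6, the theoretical Pearson (phi) correlation of the dichotomized pair is (2/pi)*asin(.6), about .41, which is the kind of attenuation Brown (2006) describes.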

Question 8: Are dichotomous outcomes modeled by probit link functions when using WLSMV? I know there is a way to rescale results from probit regression to odds ratios, as in logit regression, for much easier interpretation. Is that meaningful here?


Yes (probit by default), and if you have formulas for transforming probit slopes to logit slopes, then those should apply here too.
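The usual back-of-the-envelope conversion multiplies the probit slope by roughly 1.6-1.8 (1.7 is a common choice, from matching the logistic and standard-normal CDFs) and then exponentiates. It is only an approximation, since the two link functions are not exact rescalings of each other. Using a probit slope of 0.214 as an example:

```r
# Approximate conversion of a probit slope to an odds ratio (logit scale).
# The factor 1.7 is the conventional probit-to-logit scaling constant;
# values around 1.6-1.8 appear in the literature.
probit.slope <- 0.214
logit.slope  <- 1.7 * probit.slope
or.approx    <- exp(logit.slope)
round(or.approx, 2)    # roughly 1.44
```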

Question 9: Is my sample size big enough?


It depends on how big your group sizes are.  Robust DWLS needs a bigger N than robust ML for estimates to stabilize.

#Regressions

int+exp+mu+evi~fw+wdn+si


Both lines below are redundant with the line above, which already regresses the outcomes "int" and "evi" on the predictor "fw", as well as the outcomes "exp" and "mu" on the predictor "wdn".  Every variable on the left-hand side of the "~" operator is regressed on every variable on the right-hand side.

int+evi~fw

exp+mu~wdn

#indirect effect

fw+si+wdn~nfc

Hyp.Post.richtig~evi+mu #Hyp.Post.richtig is the observed binary outcome


These regressions don't match your description of the model.  You said there are 4 exogenous LVs, but the only exogenous LV in your syntax (i.e., without any predictors) is "nfc".

Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam


Tobias Ludwig

May 13, 2016, 9:09:06 AM5/13/16
to lavaan
Thank you for your answer, Terrence! 


On Wednesday, May 11, 2016 at 11:50:38 UTC+2, Terrence Jorgensen wrote:

Question 2: Using the WLSMV estimator I do get a list of 50 warnings which might be related to the zero.cell.warn argument:...Can I ignore these warnings?


You were using robust ML because almost all your variables have 5+ categories, except for the binary outcome.  When you switch to WLSMV, do you then start treating those 5-category Likert items as ordered as well as the binary outcome?

Yes, exactly.
 
 That probably explains why you observe the parameter differences in Questions 5 & 7.  I would be inclined to try fitting the model with DWLS and only treating the binary outcome as ordered, but I'm not sure how robust the "robust" column is when treating 5-category items as continuous.  That would be something to ask on SEMNET as well.

Question 3: I am using WLSMV with missing=pairwise. I have a really low amount of missingness in my data set (0.2% - 1.39%)... What would you suggest?


It's not preferable, but given the small amount of missingness, I wouldn't expect results to differ greatly between pairwise deletion and multiple imputation (or robust FIML).

Yes, that's what I think too.
 

Question 4: The model fit produced by the MLR estimator and the WLSMV estimator differs. ... I think this is negligible, right?


They are different estimators, and you are also fitting the model to different data (treating indicators as continuous v. ordered yields different df and sample moments), so you can't expect fit to be the same.  Technically, the MLR model is fundamentally misspecified, so it doesn't make sense to interpret those fit measures anyway.

Question 6: What is best practice for the parameterization argument? Until now I have left it at the default, although ?cfa doesn't state what the default is (theta or delta). As far as I know, delta scaling fixes the total variance of the LV to 1.0, which is not what I want, as I want to compare different groups and cannot assume that the variance of the LV is equal across both groups.


The default is delta, and that has nothing to do with setting the LV (factor) variance to 1.0 (that is the "std.lv" argument).

Yeah, here I was wrong. Of course I did not mean the variance of the latent variable, but of the latent response variable.
 
The delta parameterization fixes the total variance of each latent item response to 1.0 (in the first group), and the theta parameterization fixes the residual variance of each latent item response to 1.0 (in the first group).

I read Kline's chapter on modeling with ordinal indicators, where he states: "In delta scaling (parameterization), the total variance of the latent response variables is fixed to 1.0." (p. 326). Anyway, I will pick theta for the analysis of measurement invariance. As far as I know, the only impact is on the unstandardized solution, right?

 


Question 8: Are dichotomous outcomes modeled by probit link functions when using WLSMV? I know there is a way to rescale results from probit regression to odds ratios, as in logit regression, for much easier interpretation. Is that meaningful here?


Yes (probit by default), and if you have formulas for transforming probit slopes to logit slopes, then those should apply here too.

I thought this would be easy, but I couldn't even find the relation in the multiple-regression bible by Cohen, Cohen, West & Aiken. There are discussions on CrossValidated where people multiply probits by 1.7 or 1.8 to get a logit, which seems odd to me, as these non-linear regressions don't have constant marginal effects.
 

Question 9: Is my sample size big enough?


Depends how big your group sizes are.  Robust DWLS needs bigger N than robust ML for estimates to stabilize.

It is about 410 in each group. So far, everything is converging fine.

#Regressions

int+exp+mu+evi~fw+wdn+si


Both lines below are redundant with the line above, which already regresses the outcomes "int" and "evi" on the predictor "fw", as well as the outcomes "exp" and "mu" on the predictor "wdn".  Every variable on the left-hand side of the "~" operator is regressed on every variable on the right-hand side.

int+evi~fw

exp+mu~wdn


Again, this was my mistake; some "#" characters are missing in the code.
 

#indirect effect

fw+si+wdn~nfc

Hyp.Post.richtig~evi+mu #Hyp.Post.richtig is the observed binary outcome


These regressions don't match your description of the model.  You said there are 4 exogenous LVs, but the only exogenous LV in your syntax (i.e., without any predictors) is "nfc".

Again, my mistake. The original theory says there are 4 exogenous LVs, but nfc is so highly correlated with the other 3 exogenous LVs that I can reasonably modify the model to use nfc to explain the other three, which makes a lot of sense.


Two more things regarding modeling with ordinal indicators: 
1. In the Handbook of Structural Equation Modeling by Hoyle, page 207 states that adding an ordinal outcome (as a regression) is equivalent to including another indicator in the measurement model of the LV on which the outcome is regressed, and that, therefore, the meaning of the LV changes. This is not the case in classical (multiple) regression analysis, right? Take a simple hypothetical model where a factor "health" is measured by indicators like "age", "smoking" and "weight", which are all proven to have an impact on health. When the binary outcome "death" is regressed on health, which makes sense in a particular way, technically "death" is treated as an indicator of health. How does that make sense conceptually?

2. Given the output of my model above for the binary outcome: 

 Hyp.Post.richtig ~      est    std.err      z        p    std.lv  std.all
    evi                 0.214    0.081     2.655    0.008    0.179    0.179
    mu                 -0.592    0.162    -3.658    0.000   -0.309   -0.309

According to this page, the predicted probability of Hyp.Post.richtig being 1 is

P = F(0.214*evi - 0.592*mu), where F is the standard normal cumulative distribution function. But on which metric are evi and mu measured? I think it is not the item metric, where I could plug in scores from 1-5, because evi and mu are latent variables.
I think this is the underlying latent response metric, which is different from the original metric and not interpretable? And where is the intercept? Is it fixed to zero? That would mean that if mu and evi are both zero, one would still have a probability of pnorm(0) = 50% that Hyp.Post.richtig = 1, right? Where is my misunderstanding?
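For what it is worth, the arithmetic can be sketched in base R. This assumes (worth verifying in the parameter table) that the intercept-like parameter is the outcome's threshold (the Hyp.Post.richtig|t1 row), and that the latent predictors are on the latent-variable metric, where 0 is the reference-group latent mean:

```r
# Sketch of the probit prediction: P(y = 1) = pnorm(b1*evi + b2*mu - tau).
# The slopes are taken from the output above; tau is a placeholder for
# the estimated threshold of the binary outcome (Hyp.Post.richtig|t1).
b.evi <- 0.214
b.mu  <- -0.592
tau   <- 0          # placeholder: substitute the estimated threshold

p.hat <- function(evi, mu) pnorm(b.evi * evi + b.mu * mu - tau)
p.hat(evi = 0, mu = 0)   # 0.5 only if the threshold really is zero
```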

Terrence, you pointed out here that one could generate predicted values of the outcome for certain levels of the predictors, but here the predictors are latent variables; how can I get representative values?

Logits were so much easier ;-)

Thank you!

Tobias Ludwig

May 13, 2016, 12:11:05 PM5/13/16
to lavaan
And one more question: is there any way to compare non-nested models using WLSMV? As far as I know, it is not possible to calculate information criteria like AIC and BIC, right?

Terrence Jorgensen

May 16, 2016, 4:50:23 PM5/16/16
to lavaan
I read Kline's chapter on modeling with ordinal indicators, where he states: "In delta scaling (parameterization), the total variance of the latent response variables is fixed to 1.0." (p. 326). Anyway, I will pick theta for the analysis of measurement invariance. As far as I know, the only impact is on the unstandardized solution, right?

You can check for yourself when you fit your model using the different parameterizations. The residual variances are under the heading "Variances" and the SDs of the latent item responses are under the heading "Scales y*".

Terrence Jorgensen

May 16, 2016, 4:59:28 PM5/16/16
to lavaan
And one more question: is there any way to compare non-nested models using WLSMV? As far as I know, it is not possible to calculate information criteria like AIC and BIC, right?

Correct, information criteria are calculated from likelihoods, so they aren't available using least-squares estimators.  The only test statistics for comparing non-nested models that I am aware of are also calculated from (casewise) likelihoods.  This isn't a lavaan-specific question, so you can post on SEMNET, but I think your only option might be incremental fit indices (e.g., CFI).  If you can specify a single theoretically justifiable null model that is nested within all competing models (regardless of whether those models are nested in each other), then you can fit that model to your data and pass it to the fitMeasures() function (look at the "baseline.model" argument).  The resulting CFIs will all lie on the same continuum between the null model and the saturated model (in which all models are nested, by definition).
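A sketch of that comparison (all object names hypothetical: `null.model` is the common nested null model, `fit.A` and `fit.B` the competing fitted models):

```r
library(lavaan)

# Sketch: fit one shared null model, then compute each competing
# model's CFI against that same baseline so the CFIs are comparable.
fit.null <- sem(null.model, data = dat, ordered = ord.items,
                estimator = "WLSMV")
fitMeasures(fit.A, "cfi", baseline.model = fit.null)
fitMeasures(fit.B, "cfi", baseline.model = fit.null)
```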
