Hi lavaan-group,
I'm still working on a SEM consisting of …
In a single-group analysis the model is already quite large (df = 600). I have a sample of 835 participants, drawn from two different groups. Looking at histograms, the data are not always normal (as far as one can tell with ordinal indicators).
The goals are:
a) to investigate the relationship between the exogenous and the endogenous variables,
b) to compare the means of the endogenous LVs across the two groups while controlling for the exogenous LVs, and
c) to analyze the impact of the endogenous LVs on the dichotomous outcome.
Most of the analysis was carried out with the MLR estimator, as I have 5-point Likert items (Beauducel & Herzberg, 2006). However, I was not able to analyze c) within the SEM framework provided by lavaan, because lavaan does not support binary outcomes with the MLR estimator (unlike Mplus; I know that Yves is planning this for a future release of lavaan, afaik). So far I have computed factor scores and analyzed c) with logistic regression via glm(). But this seems like a messy solution. For example, I don't have many missing values per item (usually below 1.3% per indicator, MAR), but predict() cannot compute a factor score for a case with a missing value on any of its manifest indicators, so I ended up with a lot of NAs.
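Roughly, the workaround looks like this (a sketch, not my exact script; `dat` is my data frame, `model` is the syntax below with the Hyp.Post.richtig regression removed, since MLR cannot handle the binary outcome):

```r
library(lavaan)

# Fit the SEM without the binary outcome, using MLR with FIML so that
# incomplete cases are at least retained in the estimation.
fit <- sem(model, data = dat, estimator = "MLR", missing = "ml")

# Factor scores; in my runs, cases with missing values on the
# indicators were the ones coming back as NA.
fs <- as.data.frame(lavPredict(fit))

# Logistic regression of the binary outcome on two factor scores:
dat2 <- cbind(dat, fs)
fit.glm <- glm(Hyp.Post.richtig ~ evi + mu, data = dat2, family = binomial)
summary(fit.glm)
```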
My first attempts with WLSMV produced a lot of errors: models did not converge, and I got errors saying I did not have enough cases to compute the gamma matrix (it may be that I had forgotten to code all my ordinal indicators as factors, or to use the ordered argument). Alternative estimators like MML, which can handle dichotomous outcomes (as suggested by Yves here), took forever.
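For reference, this is roughly how I now declare the indicators (a sketch; `ord.items` stands for the full vector of ordinal indicator names, and the `"group"` column name is a placeholder):

```r
library(lavaan)

# Passing item names via the 'ordered' argument saves me from
# converting each item to an ordered factor in the data frame;
# lavaan then estimates polychoric correlations for them.
ord.items <- c("i1.03", "i1.09", "i1.16", "i1.18", "i1.21")  # etc.

fit.wlsmv <- sem(model, data = dat, estimator = "WLSMV",
                 ordered = ord.items, missing = "pairwise",
                 group = "group")  # placeholder grouping variable
```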
Because of that messy workaround for c), I tried the WLSMV estimator again, and somehow it works now. BUT a couple of questions have arisen regarding the use of WLSMV, and I would be really happy if someone with more experience could help me with them:
Question 1: Is there any other option to analyze c) within the SEM framework I might have overlooked?
Question 2: Using the WLSMV estimator I get a list of 50 warnings, which might be related to the zero.cell.warn argument:
…
50: In pc_cor_TS(fit.y1 = FIT[[i]], fit.y2 = FIT[[j]], method = optim.method, ... :
lavaan WARNING: empty cell(s) in bivariate table of wdn.handlung.3 x i8.09
…
Can I ignore these warnings?
Question 3: I am using WLSMV with missing = "pairwise". I have a really small amount of missingness in my data set (0.2%–1.39%). Is it still worth multiply imputing my data? Besides the fact that it takes a pretty long time to run, when I use cfa.mi() I get a strange chi-square p-value of p = 0.66 with chi² = 2271 and df = 717, which does not make any sense to me. This does not happen when I use missing = "pairwise". I am inclined to keep it simple and just use missing = "pairwise". What would you suggest?
Question 4: The model fit produced by the MLR estimator and the WLSMV estimator differs. For example, I get a CFI of .93 with MLR and .92 with WLSMV. Sometimes it is the other way around, and sometimes the difference is bigger. I think this is negligible, right?
Question 5: The standardized regression coefficients (and correlations) vary. For example, with MLR as the estimation method, a standardized regression coefficient in my model is .26; with WLSMV it is .49. That is a big difference. Is there a particular reason for it? Kline (2016, p. 258) refers to a study in which robust WLS estimation performed poorly when the distributions of categorical indicators were markedly skewed (but is robust WLS the same as WLSMV?). I know that the indicators for this LV are somewhat skewed. On the other hand, he says: "In general, ML estimates and their standard errors may both be too low when the data analyzed are from categorical indicators, and the degree of this negative bias is higher as distributions of items responses become increasingly non-normal" (p. 257). In addition, Brown (2006) states that the consequences of treating categorical variables as continuous in CFA are multifold, including that it can "(1) produce attenuated estimates of the relationships (correlations) among indicators, especially when there are floor or ceiling effects. […] (4) produce incorrect standard errors." Is there any new research on the efficacy of WLSMV vs. MLR with non-normal, ordinal indicators? What would you suggest?
Question 6: What is best practice for the parameterization argument? Until now I have left it at the default, although ?cfa does not state what the default is (theta or delta). As far as I know, delta scaling fixes the total variance of the LV to 1.0, which is not what I want, since I want to compare different groups and cannot assume that the variance of the LV is equal across the two groups.
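In code, I assume requesting the non-default parameterization would look like this (a sketch; `ord.items` and `"group"` are placeholders as before):

```r
# Theta parameterization must be requested explicitly; the argument
# is only relevant when there are ordered indicators in the model.
fit.theta <- sem(model, data = dat, estimator = "WLSMV",
                 ordered = ord.items,        # placeholder item vector
                 group = "group",            # placeholder grouping variable
                 parameterization = "theta")
```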
Question 7: With MLR I found two significant differences in the endogenous latent variables across the groups. Now, with WLSMV, all four latent mean differences are significant with meaningful effect sizes. Does WLSMV have that much more power? How can I justify these "new" effects?
Question 8: Are dichotomous outcomes modeled with probit link functions under WLSMV? I know there is a way to rescale probit regression results to odds ratios, as in logit regression, for easier interpretation. Is that meaningful here?
Question 9: Is my sample size big enough?
I know that this is a pretty long question. Thank you all in advance.
Here is my model:
int =~ i1.03 + i1.09 + i1.16 + i1.18 + i1.21
exp =~ i6.05 + i6.08 + i6.11 + i6.15 + i6.16
mu  =~ i7b.22 + i7b.08 + i7b.16 + i7b.17 + i7b.04
evi =~ i8.05 + i8.09 + i8.11 + i8.14 + i8.18
i1.18 ~~ i1.21
i1.03 ~~ i1.16
i6.08 ~~ i6.15
i7b.22 ~~ i7b.04
# single-item indicator
fw =~ fw.rasch
fw.rasch ~~ 0.2781611*fw.rasch
nfc =~ nfc1 + nfc2 + nfc3 + nfc5 + nfc13 + nfc14
si =~ si.emo.mean + si.epi.mean + si.auf.mean + si.wert.mean
wdn =~ wdn.handlung.1 + wdn.handlung.2 + wdn.handlung.3 + wdn.handlung.4 + wdn.pers.1 + wdn.pers.2 + wdn.pers.3 + wdn.pers.4
wdn.pers.2 ~~ wdn.pers.4
# regressions
int + exp + mu + evi ~ fw + wdn + si
int + evi ~ fw
exp + mu ~ wdn
# indirect effect
fw + si + wdn ~ nfc
Hyp.Post.richtig ~ evi + mu  # Hyp.Post.richtig is the observed binary outcome
fw ~~ si + wdn
si ~~ wdn
int ~~ evi + exp + mu
Question 2: Using the WLSMV estimator I get a list of 50 warnings, which might be related to the zero.cell.warn argument: ... Can I ignore these warnings?
You were using robust ML because almost all your variables have 5+ categories, except for the binary outcome. When you switch to WLSMV, do you then start treating those 5-category Likert items as ordered, in addition to the binary outcome?
That probably explains the parameter differences you observe in Questions 5 & 7. I would be inclined to try fitting the model with DWLS while treating only the binary outcome as ordered, but I'm not sure how robust the "robust" column is when 5-category items are treated as continuous. That would be something to ask on SEMNET as well.
Question 3: I am using WLSMV with missing=pairwise. I have a really low amount of missings in my data set (0.2% - 1.39%)... What would you suggest?
It's not preferable, but given the small amount of missingness, I wouldn't expect results to differ greatly between pairwise deletion and multiple imputation (or robust FIML).
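If you do want to try multiple imputation, a sketch of the usual workflow (assuming mice for the imputations; the exact pooling wrappers depend on your semTools version, and `ord.items` is a placeholder for your ordinal indicator names):

```r
library(mice)      # multiple imputation
library(semTools)  # cfa.mi()/sem.mi() wrappers around lavaan

# Impute, extract the completed data sets, then fit the model to each
# imputation and pool results.
imps <- mice(dat, m = 20, seed = 123)
imp.list <- lapply(seq_len(20), function(i) complete(imps, action = i))

fit.mi <- sem.mi(model, data = imp.list,
                 estimator = "WLSMV", ordered = ord.items)
summary(fit.mi)
```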
Question 4: The model fit produced by the MLR estimator and the WLSMV estimator differs. ... I think this is negligible, right?
They are different estimators, and you are also fitting the model to different data (treating indicators as continuous vs. ordered yields different df and different sample moments), so you cannot expect the fit to be the same. Technically, the MLR model is fundamentally misspecified, so it does not make sense to interpret those fit measures anyway.
Question 6: What is best practice for the parameterization argument? Until now I have left it at the default, although ?cfa does not state what the default is (theta or delta). As far as I know, delta scaling fixes the total variance of the LV to 1.0, which is not what I want, since I want to compare groups and cannot assume equal latent variances across groups.
The default is delta, and that has nothing to do with setting the LV (factor) variance to 1.0 (that is the "std.lv" argument).
The delta parameterization fixes the total variance of each latent item response to 1.0 (in the first group), whereas the theta parameterization fixes the residual variance of each latent item response to 1.0 (in the first group).
Question 8: Are dichotomous outcomes modeled by probit link functions using WLSMV ? I know that there is a way to rescale the results from probit regression to Odds Ratios like in Logit regression for much easier interpretation. Is that meaningful here?
Yes (probit by default), and if you have formulas for transforming probit slopes to logit slopes, then those should apply here too.
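For example, using the common rule of thumb that a logit slope is roughly 1.7 times the corresponding probit slope (an approximation, not an exact conversion):

```r
# Approximate conversion of a probit slope to a logit slope and an
# odds ratio; 0.45 is a hypothetical probit coefficient.
probit.slope <- 0.45
logit.slope  <- 1.7 * probit.slope   # rule-of-thumb scaling constant
odds.ratio   <- exp(logit.slope)     # approximate odds ratio
```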
Question 9: Is my sample size big enough?
That depends on how big your group sizes are. Robust DWLS needs a bigger N than robust ML for the estimates to stabilize.
#Regressions
int+exp+mu+evi~fw+wdn+si
Both lines below are redundant with the line above, which already regresses the outcomes "int" and "evi" on the predictor "fw", as well as the outcomes "exp" and "mu" on the predictor "wdn". Every variable on the left-hand side of the "~" operator is regressed on every variable on the right-hand side.
int+evi~fw
exp+mu~wdn
#indirect effect
fw+si+wdn~nfc
Hyp.Post.richtig~evi+mu #Hyp.Post.richtig is the observed binary outcome
These regressions don't match your description of the model. You said there are 4 exogenous LVs, but the only exogenous LV in your syntax (i.e., without any predictors) is "nfc".
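Dropping the redundant lines, the structural part of the syntax would reduce to something like this (a sketch of the de-duplicated version, keeping your model as stated):

```r
# Structural model without the redundant regressions:
structural <- '
  int + exp + mu + evi ~ fw + wdn + si
  fw + si + wdn ~ nfc
  Hyp.Post.richtig ~ evi + mu
'
```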
I read Kline's chapter on modeling with ordinal indicators. He writes: "In delta scaling (parameterization), the total variance of the latent response variables is fixed to 1.0" (p. 326). Can you confirm which parameterization fixes which variance? In any case, I will pick theta for the measurement-invariance analysis. As far as I know, the only impact is on the unstandardized solution, right?
And one more question: is there any way to compare non-nested models using WLSMV? As far as I know, it is not possible to compute information criteria like AIC and BIC, right?