Dear lavaan group,
I would like to ask three rather general and technical questions, as I am at the very beginning of a project where I would like to use SEM.
1.
The situation is the following: I would like to build a structural equation model out of several latent variables (the corresponding manifest variables are all categorical) plus one single manifest variable standing on its own (measured on an interval scale).
As a first step, I would like to run a CFA to see whether my measurement model fits the data. See the picture in the attachment to get an idea of the CFA. The problem is that I don’t know how to deal with the single manifest variable, called M17 in the attachment. Since I would like this variable to be part of my SEM, I guess I have to include it in the CFA as well, but how do I do this in lavaan?
This is my guess at how to specify the CFA:
cfa <- "F1 =~ M1 + M2 + M3 + M4
F2 =~ M5 + M6 + M7 + M8
F3 =~ M9 + M10 + M11 + M12
F4 =~ M13 + M14 + M15 + M16
F5 =~ F1 + F2 + F3 + F4
M17 =~ M17"
cfa_fit <- cfa(cfa, estimator = "WLSMV", data = data)
Is this okay, or is there a recommended way to include this one manifest variable alongside the latent variables in a CFA?
2.
I would like to control for some variables in my SEM. Do I also need to include them in the CFA?
3.
I ran a first CFA with the code above, and the following warning message appeared:
lavaan WARNING: 199 bivariate tables have empty cells; to see them, use:
lavInspect(fit, "zero.cell.tables")
By collapsing response categories of several categorical manifest variables, I was able to reduce the number of bivariate tables with empty cells drastically, but not completely. Do you think this warning could be reported as a limitation of the model (for example in a paper)? What other options do I have at this stage?
Isabel
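For the single manifest variable in question 1, one common alternative to writing M17 =~ M17 is a single-indicator latent variable: fix the loading to 1 and the indicator's residual variance to 0. The following is only a sketch under those assumptions. The name f17 is hypothetical (it is not in the original model), and fixing the residual variance to 0 assumes M17 is measured without error.

```r
library(lavaan)

# Measurement model from the post, with M17 as a single-indicator latent.
# 'f17' is a hypothetical name introduced for illustration.
model <- "
  F1 =~ M1 + M2 + M3 + M4
  F2 =~ M5 + M6 + M7 + M8
  F3 =~ M9 + M10 + M11 + M12
  F4 =~ M13 + M14 + M15 + M16
  F5 =~ F1 + F2 + F3 + F4

  f17 =~ 1*M17   # loading fixed to 1
  M17 ~~ 0*M17   # residual variance fixed to 0 (assumes no measurement error)
"

# Parse the syntax without data to see which parameters are fixed (free == 0).
pt <- lavaanify(model)
pt[pt$rhs == "M17", c("lhs", "op", "rhs", "free", "ustart")]

# Fitting would then mirror the call in the post
# ('data' is the poster's data frame, not defined here):
# fit <- cfa(model, data = data, estimator = "WLSMV")
```

With cfa(), exogenous latent variables are allowed to covary by default, so f17 would be correlated with F5 without any extra syntax.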
Hi Terrence,
Thank you very much for your response!
There are still two things that bother me:
1. When I run the code above (still without the control variables), I get the following warning message (in addition to the one about empty bivariate tables). It seems to disappear when I delete the line M17 =~ M17.
lavaan WARNING: covariance matrix of latent variables
is not positive definite;
use inspect(fit,"cov.lv") to investigate.
What does this mean? Is it a serious problem, and how can I avoid it?
2. When I try to add two control variables, age (numeric) and gender (ordinal), to the code above in the same way as I included M17, I get the following additional warning messages:
lavaan WARNING: could not compute scaled test statistic
lavaan WARNING: could not compute standard errors!
lavaan NOTE: this may be a symptom that the model is not identified.
As the message suggests, the model does not seem to be identified. Is there another way to bring in the control variables, or is my only option to exclude one or both of them?
Best, Jsabel
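Regarding the "not positive definite" warning: it indicates that the covariance matrix of the latent variables has at least one eigenvalue that is zero or negative, which often shows up as a negative variance or a correlation outside [-1, 1]. Below is a minimal base-R sketch of that check. The matrix values are invented for illustration; with a fitted model, one would inspect lavInspect(fit, "cov.lv") in the same way.

```r
# Hypothetical 2 x 2 latent covariance matrix (values invented for illustration).
# The implied correlation is 1.2 / sqrt(1.0 * 1.0) = 1.2, which is impossible.
cov_lv <- matrix(c(1.0, 1.2,
                   1.2, 1.0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("F1", "F2"), c("F1", "F2")))

# A covariance matrix is positive definite only if all eigenvalues are > 0.
ev <- eigen(cov_lv, symmetric = TRUE, only.values = TRUE)$values
is_pd <- all(ev > 0)

# Correlations outside [-1, 1] point to the offending pair of variables.
cor_lv <- cov2cor(cov_lv)

print(ev)
print(is_pd)
print(cor_lv)
```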
Dear Terrence,
thanks again for the support!
I have now adapted my CFA following your recommendation, and it seems to work (the warning messages above have also disappeared).
Let me ask some more questions to check if I have adapted my code correctly:
Gender is indeed a dummy variable; I have now declared it as “numeric” instead of “ordered”.
I have not categorized age; it is simply declared as “numeric”. If I did categorize it, I would create the k-1 dummy variables as you described and also declare them as “numeric”, correct?
What if I would like to include an exogenous ordinal control variable? Would I simply treat it as “numeric” instead of “ordered”?
I have attached a picture which shows my idea of how to include the control variables. Is it correct to control only the latent variables F4, F5 and F6 for the predictors, since the first-order latent variables F1, F2 and F3 are “absorbed” by F4 (or should I also control them for the predictors)?
When I tried to introduce a variance term for the control variables, the following error and warning appeared:
Error in tmp[cbind(REP$row[idx], REP$col[idx])] <- lavpartable$free[idx] :
NAs are not allowed in subscripted assignments
In addition: Warning messages:
1: In lavaan::lavaan(model = cfa3, data = data_imputed, estimator = "WLSMV", :
lavaan WARNING: syntax contains parameters involving exogenous covariates; switching to fixed.x = FALSE
Is it generally not possible to specify (co)variance terms for the control variables, and if so, why not?
Why is there no intercept/residual term for the exogenous predictors when I check summary(CFA, standardized = TRUE, fit.measures = TRUE)?
As I understand it, including these predictors amounts to a regression, which could have an intercept.
If I would like to introduce moderator variables in my SEM, do I also have to include them in my CFA, and if so, how?
Thanks a lot again for the support!
Best wishes, Jsabel
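The k-1 dummy coding mentioned above can be sketched in base R as follows. The data frame, the age categories, and the column names are invented for illustration. The regression line at the end assumes the dummies enter the model as numeric exogenous predictors, as described in the post; it is not taken from the actual model.

```r
# Hypothetical data: a 3-category age variable (k = 3) and a 0/1 gender dummy.
df <- data.frame(
  age_group = factor(c("young", "middle", "old", "middle"),
                     levels = c("young", "middle", "old")),
  gender = c(0, 1, 1, 0)
)

# model.matrix() with the default treatment contrasts yields an intercept
# column plus k - 1 = 2 dummy columns; "young" is the reference category.
dummies <- model.matrix(~ age_group, data = df)[, -1, drop = FALSE]
colnames(dummies) <- c("age_middle", "age_old")
df <- cbind(df, dummies)

print(df)

# The numeric dummies could then appear as predictors in lavaan syntax, e.g.:
controls <- "F4 ~ age_middle + age_old + gender"
```

Because the dummies are plain numeric 0/1 columns, they would not be listed in the `ordered` argument when fitting.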
Dear Terrence, thank you again for these helpful insights, and sorry for my late response.
In the meantime I have been trying to understand your recommendations and comments (still busy) :) In addition, I did some literature research and confused myself again.
May I summarize my main questions and what I have understood, so that you can correct me? (I have also prepared some new and better pictures for illustration.)
Situation: Suppose my hypothesized structural equation model looks like the one in the attachment called SEM. To test the hypothesis, I would first like to confirm the corresponding measurement model in a CFA. My first guess at such a measurement model is also in the attachment; it is called CFA.
Main problem no. 1: Is a saturated measurement model always preferred over testing each latent construct separately? From what I know from the literature, this is the case.
Main problem no. 2: My hypothesized SEM contains a manifest moderator and a manifest control variable. Because I hypothesize that my model is confounded by a variable, I introduce this variable as a control variable and regress all the latent variables on it in the SEM (as described here: http://www.statmodel2.com/discussion/messages/11/2656.html?1370553834 ). Do I also need to include the manifest moderator and control variable in the measurement model for the CFA, or is the measurement model only about the latent constructs? Your recommendation is to include them in the measurement model as well. At this point, I am still a bit confused about what the saturated measurement model and the corresponding lavaan code would look like. I also found some comments and examples on the internet that suggest putting only the latent constructs into the measurement model, for example the answers in this forum: https://www.researchgate.net/post/If_a_model_has_observed_variables_in_SEM_do_we_need_to_delete_it_while_conducting_CFA or the example described in this publication: http://steinhardtapps.es.its.nyu.edu/create/courses/3311/reading/7-Reporting_SEM_and_CFA__Schreiber__Stage__King__Nora__Barlow_.pdf
Sorry for asking these very basic questions again. I would really appreciate it if someone could give me some feedback on my considerations.
Thanks and I wish you all a very nice day!
Jsabel