CFA with latent and manifest variables

Jsabel Hodel

Mar 9, 2018, 2:01:19 AM

to lav...@googlegroups.com

Dear lavaan group

I would like to ask three rather general and technical questions, as I am at the very beginning of a project where I would like to use SEM.

1.

The situation is the following: I would like to build a structural equation model out of several latent variables (the corresponding manifest variables are all categorical) and one single manifest variable standing on its own (measured on an interval scale).

As a first step I would like to do a CFA to see if my measurement model fits the data. See the picture in the attachment to get an idea of the CFA. The problem is that I don’t know how to deal with the single manifest variable, called M17 in the attachment. Since I would like this variable to be part of my SEM, I guess I have to include it in the CFA as well, but how do I do this in lavaan?

This is my guess how to do the CFA:

cfa <- "F1 =~ M1 + M2 + M3 + M4

F2 =~ M5 + M6 + M7 + M8

F3 =~ M9 + M10 + M11 + M12

F4 =~ M13 + M14 + M15 + M16

F5 =~ F1 + F2 + F3 + F4

M17 =~ M17"

cfa_fit <- cfa(cfa, estimator = "WLSMV", data = data)

Is this okay or is there a recommended way how to technically include this one manifest variable around the latent variables in a CFA?

2.

I would like to control my SEM for some variables. Do I need to integrate them also in the CFA?

3.

I did run a first CFA with the code above and there was a warning message appearing:

lavaan WARNING: 199 bivariate tables have empty cells; to see them, use:

lavInspect(fit, "zero.cell.tables")

By collapsing response categories of several categorical manifest variables I was able to reduce the number of bivariate tables with empty cells drastically, but not completely. Do you think this warning message could also be mentioned as a limitation of the model (for example in a paper)? What other options do I have at this stage?

Thank you for your help!

Isabel

CFA.pdf

Terrence Jorgensen

Mar 9, 2018, 6:13:18 AM

to lavaan

I don’t know how to deal with the single manifest variable

M17 =~ M17"

cfa_fit <- cfa(cfa, estimator = "WLSMV", data = data)

Is this okay or is there a recommended way how to technically include this one manifest variable around the latent variables in a CFA?

This is fine. The cfa() function automatically (and appropriately) fixes the residual variance of a single indicator to zero, and either the loading or the "factor" variance is fixed to 1, depending on whether std.lv=TRUE or FALSE. See ?lavOptions
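
As a minimal sketch of what that default looks like, the following uses lavaan's built-in HolzingerSwineford1939 data rather than the data from this thread, and a hypothetical factor name speed1 for the single-indicator "factor" (both are assumptions for illustration). It checks in the parameter table that the loading and the residual variance of the single indicator are fixed rather than estimated.

```r
library(lavaan)

# Illustration only: built-in example data, not the data from this thread.
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed1  =~ x7            # hypothetical single-indicator "factor"
'

fit <- cfa(model, data = HolzingerSwineford1939)

# In the parameter table, free == 0 marks a fixed (not estimated) parameter.
pt <- parTable(fit)
single <- pt[(pt$lhs == "speed1" & pt$op == "=~") |
             (pt$lhs == "x7" & pt$op == "~~" & pt$rhs == "x7"),
             c("lhs", "op", "rhs", "free", "est")]
print(single)
```

With std.lv = TRUE the same call should instead fix the factor variance to 1 and free the loading, as described above.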

I would like to control my SEM for some variables. Do I need to integrate them also in the CFA?

If you want to control for them, yes: add them as predictors of every outcome whose slopes you want to interpret as conditional on the covariates.

I did run a first CFA with the code above and there was a warning message appearing:

lavaan WARNING: 199 bivariate tables have empty cells; to see them, use:

lavInspect(fit, "zero.cell.tables")

This is bound to happen, and is more likely when you have more categories and fewer observations. It is not an error, just a warning so you are aware of it.
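
To make concrete what an "empty cell" in a bivariate table is, here is a small base-R sketch on hypothetical toy data (the data frame and the helper count_sparse_pairs are made up for illustration and are not part of lavaan). It counts how many pairs of categorical variables have at least one empty cell in their two-way table.

```r
# Hypothetical toy data: three binary items stored as factors, so that
# categories that were never observed together still show up as cells.
toy <- data.frame(
  item1 = factor(c(1, 1, 2, 2), levels = 1:2),
  item2 = factor(c(1, 1, 2, 2), levels = 1:2),
  item3 = factor(c(1, 2, 1, 2), levels = 1:2)
)

# Count the pairs of variables whose two-way table has an empty cell.
count_sparse_pairs <- function(data) {
  pairs <- combn(names(data), 2)
  has_empty <- apply(pairs, 2, function(p) {
    any(table(data[[p[1]]], data[[p[2]]]) == 0)
  })
  sum(has_empty)
}

n_sparse <- count_sparse_pairs(toy)
print(n_sparse)
```

Here only the item1 by item2 table has empty cells, because those two items never disagree in the toy data.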

By collapsing response categories of several categorical manifest variables

That is not recommended because you would be throwing away valuable information about individual differences. lavaan already does the advisable thing by default, so don't collapse categories.

Terrence D. Jorgensen

Postdoctoral Researcher, Methods and Statistics

Research Institute for Child Development and Education, the University of Amsterdam

UvA web page: http://www.uva.nl/profile/t.d.jorgensen

Isa

Mar 9, 2018, 8:02:16 AM

to lavaan

Hi Terrence,

Thank you very much for your response!

There are still two things that bother me:

1. When I run the above code (still without including the control variables), I get the following warning message (in addition to the one about the empty bivariate tables), which seems to disappear when I delete the line with M17 =~ M17.

lavaan WARNING: covariance matrix of latent variables

is not positive definite;

use inspect(fit,"cov.lv") to investigate.

What does this mean and is it a big problem, how can I avoid it?

2. When I try to include 2 control variables like age (numeric) and gender (ordinal) to the above code in the same way as I included the M17 variable above, I get the following additional warning message:

lavaan WARNING: could not compute scaled test statistic

lavaan WARNING: could not compute standard errors!

lavaan NOTE: this may be a symptom that the model is not identified.

As the message suggests, the model seems not to be identified. Is there another way to bring in the control variables, or is my only option now to exclude (one of) them?

Best, Jsabel

Terrence Jorgensen

Mar 9, 2018, 7:41:21 PM

to lavaan

lavaan WARNING: covariance matrix of latent variables

is not positive definite;

use inspect(fit,"cov.lv") to investigate.

What does this mean and is it a big problem, how can I avoid it?

An NPD matrix could mean there is a Heywood case (an out-of-bounds estimate, such as a correlation exceeding 1 or a non-positive (residual) variance). As the message says, inspect the LV covariance matrix using the syntax it shows you, so you can see if there is a negative (or zero) factor variance. You can also use inspect(fit, "cor.lv") to see if any factor correlations exceed +/-1. If there are no Heywood cases, then it is difficult to say what the problem is.
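
A rough sketch of those checks, run on lavaan's built-in HolzingerSwineford1939 example rather than the model from this thread (the model and the variable names bad_factor_var and so on are assumptions for illustration):

```r
library(lavaan)

# Illustration only: the standard three-factor example shipped with lavaan.
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)

cov_lv <- lavInspect(fit, "cov.lv")  # latent covariance matrix
cor_lv <- lavInspect(fit, "cor.lv")  # latent correlation matrix
theta  <- lavInspect(fit, "theta")   # residual (co)variances of indicators

# Heywood-type symptoms: non-positive variances or correlations beyond +/-1.
bad_factor_var <- any(diag(cov_lv) <= 0)
bad_factor_cor <- any(abs(cor_lv[lower.tri(cor_lv)]) >= 1)
bad_resid_var  <- any(diag(theta) < 0)

# A positive definite matrix has only positive eigenvalues.
min_eigen <- min(eigen(unclass(cov_lv), symmetric = TRUE,
                       only.values = TRUE)$values)

print(c(bad_factor_var = bad_factor_var,
        bad_factor_cor = bad_factor_cor,
        bad_resid_var  = bad_resid_var))
print(min_eigen)
```

If all three flags are FALSE but the smallest eigenvalue is still not positive, the problem is presumably a linear dependency among the latent variables rather than a single out-of-bounds estimate.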

2. When I try to include 2 control variables like age (numeric) and gender (ordinal)

I assume you measured gender as binary, so you should enter it into the model as a dummy code, not classified as "ordered". If you allowed for k > 2 categories of gender, you need k - 1 dummy codes as predictors. Use the argument "fixed.x = TRUE", and do not create latent phantom constructs for either age or gender -- they are just exogenous predictors.
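
A minimal sketch of that setup, again on the built-in HolzingerSwineford1939 data instead of the data from this thread. The dummy variable sex2 and the particular regressions are assumptions chosen for illustration; the point is only that the covariates appear on the right-hand side of "~" and never in a "=~" line.

```r
library(lavaan)

dat <- HolzingerSwineford1939
# 0/1 dummy code; in this example dataset sex is stored as 1 or 2.
dat$sex2 <- as.numeric(dat$sex == 2)

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6

  # covariates are plain observed predictors, with no "=~" lines for them
  visual  ~ ageyr + sex2
  textual ~ visual + ageyr + sex2
'

fit <- sem(model, data = dat, fixed.x = TRUE)

# Exogenous observed covariates as lavaan sees them.
x_names <- lavNames(fit, "ov.x")
print(x_names)
summary(fit, standardized = TRUE)
```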

Isa

Mar 12, 2018, 12:21:24 PM

to lavaan

Dear Terrence,

thanks again for the support!

I have now adapted my CFA with your recommendation, and it seems to work (also the warning messages from above disappeared).

Let me ask some more questions to check if I have adapted my code correctly:

Regarding the Classifications of the control variables:

Gender is indeed a dummy variable, I have now classified it as “numeric” instead of “ordered”.

Age I have not categorized now and just classified as “numeric”. If I would categorize it, I would do the k-1 dummy variables as you described and also classify them as “numeric”, correct?

What if I would like to include an exogenous ordinal control variable, would I simply treat it as “numeric” instead of “ordered”?

Control variables as exogenous predictors:

I have attached a picture which shows my idea of how to include the control variables. Is it correct to control only the latent variables F4, F5 and F6 for the predictors, since the first order latent variables F1, F2 and F3 are “absorbed” by F4 (or should I also control them for the predictors)?

Conceptual question regarding the variance of control variables:

When I tried to introduce a variance term for the control variables, the following error and warning appeared:

Error in tmp[cbind(REP$row[idx], REP$col[idx])] <- lavpartable$free[idx] :

NAs are not allowed in subscripted assignments

In addition: Warning messages:

1: In lavaan::lavaan(model = cfa3, data = data_imputed, estimator = "WLSMV", :

lavaan WARNING: syntax contains parameters involving exogenous covariates; switching to fixed.x = FALSE

Is it in general not possible to include a variance between control variables and if yes, why?

Conceptual question regarding exogenous predictors:

Why is there no intercept/residual term for the exogenous predictors when I check summary(CFA, standardized = TRUE, fit.measures = TRUE)?

In my understanding the inclusion of these predictors is a regression, which could have an intercept.

General question about Moderator variables:

If I would like to introduce moderator variables in my SEM, do I have to include them also in my CFA and if yes, how?

Thanks a lot again for the support!

Best wishes, Jsabel

CFA_control.pdf

Terrence Jorgensen

Mar 13, 2018, 8:17:21 AM

to lavaan

Age I have not categorized now and just classified as “numeric”. If I would categorize it, I would do the k-1 dummy variables as you described and also classify them as “numeric”, correct?

I would only make dummy codes for age if you have only observed a few discrete age categories. If you measured age continuously, you can treat it that way (assuming its effects are linear).

What if I would like to include an exogenous ordinal control variable, would I simply treat it as “numeric” instead of “ordered”?

Again, depends if you want to make a linearity assumption (which is what you would do if you treated it as continuous). If there are few ordinal categories, I would treat it as nominal. The order among categories would only be important if you are interested in testing hypotheses that incorporate the ordering among the groups, but you said they are just control variables.

Is it correct to control only the latent variables F4, F5 and F6 for the predictors, since the first order latent variables F1, F2 and F3 are “absorbed” by F4 (or should I also control them for the predictors)?

It depends on what effects you want to interpret in terms of controlling for age and sex. If you want to say "Among people with the same sex and age, the effect of F5 on F2 is...", then you need to regress F2 on both F5 and the control variables. Your current model does not control for age and sex at all, it merely estimates the effects of age and sex on F4, F5, and F6, which have no other predictors. Age and sex can only indirectly affect F1, F2 and F3 via F4 (not F5 or F6, which only covary with F4). An indirect effect of age and sex on F1 does not "control" for their effects on F1.

You might also want to check for effects of age and sex on indicators. Your model currently assumes measurement invariance across sex and age. You can use MIMIC (or restricted factor analysis) models to test that assumption. If the assumption of metric invariance is violated, your latent-regression coefficients will be biased.
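
One simple way to sketch such a MIMIC check is to compare a model in which the covariates affect the indicators only through the factor against a model that adds a direct covariate effect on one indicator. The example below uses the built-in HolzingerSwineford1939 data, a made-up dummy sex2, and an arbitrarily chosen indicator x2; it only probes a direct effect on an indicator, not differences in loadings, which would need an interaction term that is not shown here.

```r
library(lavaan)

dat <- HolzingerSwineford1939
dat$sex2 <- as.numeric(dat$sex == 2)  # 0/1 dummy; sex is stored as 1 or 2

# Baseline MIMIC: covariates affect the indicators only through the factor.
mimic0 <- '
  visual =~ x1 + x2 + x3
  visual ~ ageyr + sex2
'

# Alternative: sex2 additionally has a direct effect on indicator x2.
mimic1 <- '
  visual =~ x1 + x2 + x3
  visual ~ ageyr + sex2
  x2 ~ sex2
'

fit0 <- sem(mimic0, data = dat)
fit1 <- sem(mimic1, data = dat)

# Likelihood-ratio test of the single added direct effect.
lrt <- lavTestLRT(fit0, fit1)
print(lrt)
```

A significant difference would suggest that x2 behaves differently across the two groups even at the same level of the factor.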

Is it in general not possible to include a variance between control variables and if yes, why?

There is no need to do so for exogenous observed predictors (the "X" part of the model, as opposed to the endogenous "Y" part). The default setting in lavaan is "fixed.x = TRUE", which indicates that the means and (co)variances of/among exogenous observed variables are not estimated; instead, their sample statistics are simply assumed to be fixed by design, and only their covariances with other variables are used to estimate their effects on other (latent or observed) variables. This allows you to include categorical exogenous predictors without violating the multivariate normality assumption, although it also means there cannot be any missing values among the exogenous variables. As the warning message states, you can simply set "fixed.x = FALSE", which is what it does if it sees you estimating exogenous parameters in the syntax.
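
The difference between the two settings can be seen in the parameter table. The sketch below fits the same small model twice on the built-in HolzingerSwineford1939 data (the model, the dummy sex2, and the helper exo_rows are assumptions for illustration) and prints the (co)variance rows of the covariates.

```r
library(lavaan)

dat <- HolzingerSwineford1939
dat$sex2 <- as.numeric(dat$sex == 2)  # 0/1 dummy; sex is stored as 1 or 2

model <- '
  visual =~ x1 + x2 + x3
  visual ~ ageyr + sex2
'

fit_fixed <- sem(model, data = dat, fixed.x = TRUE)
fit_free  <- sem(model, data = dat, fixed.x = FALSE)

# Hypothetical helper: (co)variance rows of the exogenous covariates.
exo_rows <- function(fit) {
  pt <- parTable(fit)
  pt[pt$op == "~~" & pt$lhs %in% c("ageyr", "sex2"),
     c("lhs", "op", "rhs", "free", "exo", "est")]
}

print(exo_rows(fit_fixed))  # free == 0: fixed to sample statistics
print(exo_rows(fit_free))   # free > 0: estimated as model parameters
```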

But that has nothing to do with your error message. I suspect it is an issue with your syntax, so please post that.

Why is there no intercept/residual term for the exogenous predictors

See above. If they are fixed, they are not estimated, but their observed sample stats are simply assumed.

If I would like to introduce moderator variables in my SEM, do I have to include them also in my CFA and if yes, how?

The moderator variable is also a predictor, so you should check measurement invariance across it, too, before drawing conclusions about how it affects the slopes of other predictors. If the interaction involves latent variables, you can use the product-indicator method (see the ?indProd help page in the semTools package, and read the references therein). If you write me, I can share an in-press IMPS-proceedings chapter I coauthored about fitting product-indicator MIMIC / RFA models in lavaan. It is focused on testing measurement invariance, but the same latent-interaction (with an exogenous covariate) can be used to predict other latent factors.

Isa

Apr 5, 2018, 1:42:28 AM

to lavaan

Dear Terrence, thank you again for these helpful insights and sorry for my late response.

In the meantime I tried to understand your recommendations and comments (still busy) :) In addition, I did some literature research and confused myself again.

May I summarize my main questions/what I understood and you may correct me? (I also have prepared some new and better pictures for illustration)

Situation: Suppose my hypothesized structural equation model looks like the one in the attachment called SEM. In order to check the hypothesis, I would first like to confirm the corresponding measurement model in a CFA. My first guess at such a measurement model is also in the attachment; it is called CFA.

Main problem no 1: Is a saturated measurement model always preferred over testing each latent construct separately? From what I know from the literature, this is the case.

Main problem no 2: My hypothesized SEM contains a manifest moderator and a manifest control variable. As I hypothesize that my model is confounded by a variable, I introduce this variable as a control variable and regress all the latent variables on it in the SEM (as described here: http://www.statmodel2.com/discussion/messages/11/2656.html?1370553834 ). Do I need to introduce the manifest moderator and control variable in the measurement model for the CFA as well, or is the measurement model only about the latent constructs? Your recommendation here is to introduce them in the measurement model as well. At this point, I am still a bit confused about what the saturated measurement model and the corresponding lavaan code would look like. I also found some comments/examples on the internet that suggest putting only the latent constructs into the measurement model. For example the answers in this forum: https://www.researchgate.net/post/If_a_model_has_observed_variables_in_SEM_do_we_need_to_delete_it_while_conducting_CFA or the example described in this publication: http://steinhardtapps.es.its.nyu.edu/create/courses/3311/reading/7-Reporting_SEM_and_CFA__Schreiber__Stage__King__Nora__Barlow_.pdf

Sorry for asking these very basic questions again. I would really appreciate it if someone could give me some feedback on my considerations.

Thanks and I wish you all a very nice day!

Jsabel

SEM.pdf

CFA.pdf

Terrence Jorgensen

Apr 5, 2018, 8:20:30 AM

to lavaan

May I summarize my main questions/what I understood and you may correct me? (I also have prepared some new and better pictures for illustration)

I refer you to my previous post, where I advised you not only to include the observed control and moderator variable, but to test whether the measurement parameters were invariant with respect to them (using a MIMIC model).
