CFA with ordinal data

1,095 views
Skip to first unread message

Niels L

unread,
Jan 20, 2016, 5:51:58 AM1/20/16
to lavaan
Dear all,
 
This is my first time using R and the Lavaan package, so I apologize in advance for asking basic questions. Currently, I am trying to fit three nested models on my dataset (23 ordinal variables, 634 observations), and I have run into a few issues I hope you can help me with:
 
1. I receive >50 'empty cell(s) in bivariate table' warnings, and I am only able to view the first 50. I already read that empty cell warnings are not necessarily bad, but in order to judge this I need to know how many there are. Is there any way I can view all the warnings? Or alternatively, the bivariate frequency tables?
?
2. I would like to incorporate age as a regressor in my models, is this at all possible when doing CFA on ordinal data? Currently, I have tried to have my latent variable(s) covary with age like this:
 
'factor1 =~ item1 + item2 + item3 + ... + item23
factor1~~age'
 
However, I get the following error:
Error in tmp[cbind(REP$row[idx], REP$col[idx])] <- lavpartable$free[idx] : 
  NAs are not allowed in subscripted assignments
 
What would be the proper way to model this?
 
3. When analyzing ordinal data, Lavaan does not provide an AIC or BIC. What statistic would you recommend to choose the superior model (as mentioned, the three models are nested )?
 
Again, I apologize if these are stupid questions, but any help would be greatly appreciated!
 
Best,

Niels
 
 
 

Terrence Jorgensen

unread,
Jan 20, 2016, 8:10:39 AM1/20/16
to lavaan
1. I receive >50 'empty cell(s) in bivariate table' warnings, and I am only able to view the first 50. I already read that empty cell warnings are not necessarily bad, but in order to judge this I need to know how many there are. Is there any way I can view all the warnings? Or alternatively, the bivariate frequency tables?

Use the lavTables() function.  You can provide it your data.frame or your fitted model object (returned by lavaan() or cfa() function), and it will return a data.frame summarizing all the 2-way tables among ordered variables.  Any zero frequencies will be found in rows with 0 under the column "obs.freq"

2. I would like to incorporate age as a regressor in my models, is this at all possible when doing CFA on ordinal data?

Yes, regardless of the type of indicator, that is called a MIMIC model (easy to find information by searching Google or SEMNET).

Currently, I have tried to have my latent variable(s) covary with age like this:
 
'factor1 =~ item1 + item2 + item3 + ... + item23
factor1~~age'

A single tilde is used for regression:    factor1 ~ age

However, I get the following error:
Error in tmp[cbind(REP$row[idx], REP$col[idx])] <- lavpartable$free[idx] : 
  NAs are not allowed in subscripted assignments

Not sure what caused this without seeing the full R script, and maybe the data.  Do you get an error when actually regression the factor on age?

3. When analyzing ordinal data, Lavaan does not provide an AIC or BIC.

No, those are calculated from log-likelihoods, so by definition they are only available using MLE, not least-squares estimators.

What statistic would you recommend to choose the superior model (as mentioned, the three models are nested )?

Because the models are nested, you can use the chi-squared difference test, which can be calculated using the anova() function.

fit1 <- lavaan(...)
fit2 <- lavaan(...)
fit3 <- lavaan(...)
anova(fit1, fit2, fit3) 

For non-nested models, you could compare them using incremental fit indices (e.g., CFI, TLI), as long as they are calculated using a baseline/null model that is nested within all competing models, so that the CFI/TLI places the competing models on the same continuum between a null model and a saturated model.

Terry

Niels L

unread,
Jan 20, 2016, 10:55:45 AM1/20/16
to lavaan
Dear Terry,
 
Massive thanks to you! As an absolute R beginner, I can hardly overstate how much you helped me out.
 
Using a single tilde indeed eliminated the error I got when trying to regress the factor on age. Could you (or someone else) confirm that I am correctly controlling for age when entering the following syntax?:
 
'factor1 =~ item1 + ... + item12
factor2 =~ item13 + ... + item23
factor1~age
factor2~age'
 
Again, thanks very much for your help!
 
Niels
 

Terrence Jorgensen

unread,
Jan 22, 2016, 7:20:20 AM1/22/16
to lavaan
Could you (or someone else) confirm that I am correctly controlling for age when entering the following syntax?:
 
'factor1 =~ item1 + ... + item12
factor2 =~ item13 + ... + item23
factor1~age
factor2~age'

Well, you are using age to predict each factor, that is correct.  But the term "control" indicates that you want to estimate the effects of a predictor (say X) on an outcome (say Y) when holding a covariate (here, age) constant.  What effects are you interested in estimating while controlling for age?  So far, you are estimating the effect of age on each factor, and the effects of the factors on their indicators, so you are saying that age only affects the indicators indirectly via the factors (you can think of the factors as mediators in this circumstance). But there is no "controlling" going on, so that may not be what you mean.

In the CFA without age, the only predictors are latent factors, and the only outcomes are observed indicators.  To estimate the effect of factors on indicators (i.e., factor loadings) while controlling for age, then you should regress the indicators on age.  Because you haven't observed the latent factor, the model will not be identified if you try to simultaneously estimate the effect of age on the factors and on all indicators, so you need to choose at least one indicator per factor that is not regressed on age (or you can fit a series of models, regressing one item at a time on age).  When you regress the factors on age, this is called the MIMIC method for detecting differential item functioning (DIF; also called measurement bias or noninvariance); or you can use restricted factor analysis (RFA), in which you allow the factors to correlate with age instead of estimating those relationships regression slopes -- they are equivalent models, so they lead to the same conclusions, but you can probably find more literature on the MIMIC method.  Here are some links to get started:


Terry

Niels L

unread,
Jan 26, 2016, 6:19:54 AM1/26/16
to lavaan

Op vrijdag 22 januari 2016 13:20:20 UTC+1 schreef Terrence Jorgensen:
Dear Terry,
 
Thank you very much for the advice and the recommendations, I am going to look into it!
 
Best,
 
Niels

Niels L

unread,
Jan 26, 2016, 11:02:52 AM1/26/16
to lavaan

Op woensdag 20 januari 2016 14:10:39 UTC+1 schreef Terrence Jorgensen:

 
What statistic would you recommend to choose the superior model (as mentioned, the three models are nested )?

Because the models are nested, you can use the chi-squared difference test, which can be calculated using the anova() function.

fit1 <- lavaan(...)
fit2 <- lavaan(...)
fit3 <- lavaan(...)
anova(fit1, fit2, fit3) 

For non-nested models, you could compare them using incremental fit indices (e.g., CFI, TLI), as long as they are calculated using a baseline/null model that is nested within all competing models, so that the CFI/TLI places the competing models on the same continuum between a null model and a saturated model.


 
 
Even though my models are nested, I run into the problem described here (https://groups.google.com/forum/#!topic/lavaan/oYBI6uBx5ro), and forcing the covariances to 1 deteriorates my model fits. Therefore, I would like to  calculate calculate the chi-squared difference test by hand (using the formula described here: https://www.statmodel.com/chidiff.shtml), since I know the models are nested. When using the WLSMV estimator, are the scaled Chi-square values reported by Lavaan scaled with the Satorra-Bentler method?(i.e. am I allowed to use this test)
 
Thanks in advance!
 
Niels

Terrence Jorgensen

unread,
Jan 27, 2016, 3:13:48 AM1/27/16
to lavaan
I would like to  calculate calculate the chi-squared difference test by hand (using the formula described here: https://www.statmodel.com/chidiff.shtml), since I know the models are nested. When using the WLSMV estimator, are the scaled Chi-square values reported by Lavaan scaled with the Satorra-Bentler method?(i.e. am I allowed to use this test)

I forget what the default behavior is, but read the description of the "test" argument in the ?lavaan help page to see how to request the specific test statistic you want.

Terry

yrosseel

unread,
Jan 27, 2016, 3:22:31 AM1/27/16
to lav...@googlegroups.com
On 01/27/2016 09:13 AM, Terrence Jorgensen wrote:
> I would like to calculate calculate the chi-squared difference test
> by hand (using the formula described here:
> https://www.statmodel.com/chidiff.shtml

This procedure is only valid when test = "Satorra.Bentler" (or estimator
= "MLM", or "WLSM")

This particular version of the scaled difference test is implemented in
lavaan using lavTestLRT(fit1, fit2, test = "satorra.bentler.2001").

You could indeed compute this by hand, and you should get the same
result as the output of lavTestLRT, in this case.

But WLSMV uses the Satterthwaite adjustment (mean AND variance
adjusted). This calls for a different scaled test statistic, which
corresponds to lavTestLRT(fit1, fit2, test = "satorra.2000").

Computing this by hand would be, well, a challenge. But of course, you
can look at the source code to see how things are done.

Yves.

Niels L

unread,
Jan 27, 2016, 5:20:28 AM1/27/16
to lavaan
Dear Terry and Yves,
 
Thank you very much for your replies.
 
Yves, if you describe the process of computing the Satterthwaite scaled statistic difference test as a challenge, I would prefer to have it done via Lavaan, as I am not really comfortable with the R syntax.
However, when using the anova() function, I receive an error.
 
I am running the following syntax:
 
RPQ1factor.model <- 'general aggression =~ RPQ1 + RPQ2 + RPQ3 + RPQ4 + RPQ5 + RPQ6 + RPQ7 + RPQ8 + RPQ9 + RPQ10 + RPQ11 + RPQ12 + RPQ13 + RPQ14 + RPQ15 + RPQ16 + RPQ17 + RPQ18 + RPQ19 + RPQ20 + RPQ21 + RPQ22 + RPQ23 '
RPQ2factor.model <- 'reactive =~ RPQ1 + RPQ3 + RPQ5 + RPQ7 + RPQ8 + RPQ11 + RPQ13 + RPQ14 + RPQ16 + RPQ19 + RPQ22
                    proactive =~ RPQ2 + RPQ4 + RPQ6 + RPQ9 + RPQ10 + RPQ12 + RPQ15 + RPQ17 + RPQ18 + RPQ20 + RPQ21 + RPQ23'
 
f1 <- cfa(RPQ1factor.model, data = data, ordered = c("RPQ1", "RPQ2", "RPQ3", "RPQ4", "RPQ5", "RPQ6", "RPQ7", "RPQ8", "RPQ9", "RPQ10", "RPQ11", "RPQ12", "RPQ13", "RPQ14", "RPQ15", "RPQ16", "RPQ17", "RPQ18", "RPQ19", "RPQ20", "RPQ21", "RPQ22", "RPQ23"), std.lv = TRUE)
f2 <- cfa(RPQ2factor.model, data = data, ordered = c("RPQ1", "RPQ2", "RPQ3", "RPQ4", "RPQ5", "RPQ6", "RPQ7", "RPQ8", "RPQ9", "RPQ10", "RPQ11", "RPQ12", "RPQ13", "RPQ14", "RPQ15", "RPQ16", "RPQ17", "RPQ18", "RPQ19", "RPQ20", "RPQ21", "RPQ22", "RPQ23"), std.lv = TRUE)
 
anova(f1, f2, test = "satorra.2000", SB.classic = FALSE)
 
Error in names(mods) <- model.names : 
  'names' attribute [3] must be the same length as the vector [2]
 
Any thoughts on how to fix this?
 
On another note, the 1 factor model does not converge unless I standardize the latent variables (std.lv = TRUE), when it converges in 20 iterations.The other model converges just fine either way, with the exact same fit measures.
I am aware that standardizing the lv has implications for the interpretation of the factor loadings, but does it have any other concrete implications for the model fitting? (I am not really interested in the factor loadings, but rather in which model has the best fit). 
I apologize for asking this here, but my google searching yielded no concrete information.
 
Thanks!
 
 
 
 
 

Niels L

unread,
Feb 4, 2016, 8:58:18 AM2/4/16
to lavaan
In case another SEM newbie also runs into this problem, it was solved for me by defining the 1 factor model the same as the 2 factor model, with the addition of fixing the covariance between the reactive and proactive factor to 1, like so:

RPQ1factor.model <- 'reactive =~ RPQ1 + RPQ3 + RPQ5 + RPQ7 + RPQ8 + RPQ11 + RPQ13 + RPQ14 + RPQ16 + RPQ19 + RPQ22
                               proactive =~ RPQ2 + RPQ4 + RPQ6 + RPQ9 + RPQ10 + RPQ12 + RPQ15 + RPQ17 + RPQ18 + RPQ20 + RPQ21 + RPQ23
                               reactive ~~ 1*proactive'

This is effectively the same as the previous definition of the 1 factor model, but it makes it possible for Lavaan to recognize the nestedness of the models. 

Op woensdag 27 januari 2016 11:20:28 UTC+1 schreef Niels L:
Reply all
Reply to author
Forward
0 new messages