Confirmatory Factor Analysis for ordinal data - I cannot calculate the robust values


Ayse Kılınçaslan

Feb 22, 2017, 7:30:33 PM
to lavaan

Hi everyone,


I am new to R and trying to use lavaan for a CFA of a 19-item suicide scale. 213 adolescents were interviewed with the scale.


The scale includes items with response formats that are (1) dichotomous (yes/no), (2) ordinal (rank-ordered, e.g., frequency of thoughts from low to high; y1, y2, y3, y4, y5), (3) categorical (e.g., reasons for suicidal ideation), and (4) continuous (e.g., total number of attempts).


“x1","x2","x3","x4","x5","y1","y2","y3","y4","y5","z1","z3","z4","z6","z8","z9" were accepted as ordinal and “z2”, “z5” and “z7” as continuous.

A recent paper found that the scale has two factors, "Factor 1" and "Factor 2".


I did the analysis as explained in the lavaan Tutorial (September 29, 2016), using the lavaan (0.5-22) package.


model.madan <- 'factorone =~ x1 + x2 + x3 + x4 + x5 + z1 + z2 + z3 + z4 + z5 + z6 + z7 + z8 + z9
factortwo =~ y1 + y2 + y3 + y4 + y5'

fitmadan <- cfa(model.madan, data = CSSRSlife,
                ordered = c("x1","x2","x3","x4","x5","y1","y2","y3","y4","y5","z1","z3","z4","z6","z8","z9"))


At this step several different warnings appeared:

Warning in pc_cor_TS(fit.y1 = UNI[[i]], fit.y2 = UNI[[j]], method = optim.method,  :  lavaan WARNING: empty cell(s) in bivariate table of x2 x x1   (MANY MANY OF THEM)

Warning in lav_samplestats_step2(UNI = FIT, ov.names = ov.names, zero.add = zero.add,  : lavaan WARNING: correlation between variables y1 and x1 is (nearly) 1.0

Warning in muthen1984(Data = X[[g]], ov.names = ov.names[[g]], ov.types = ov.types,  : lavaan WARNING: trouble inverting W matrix; used generalized inverse

Warning in lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :  lavaan WARNING: could not compute standard errors!

  lavaan NOTE: this may be a symptom that the model is not identified.

Warning in lav_model_test(lavmodel = lavmodel, lavpartable = lavpartable,  :  lavaan WARNING: could not compute scaled test statistic

Warning in lav_object_post_check(lavobject) :   lavaan WARNING: some estimated ov variances are negative.


Since, as many researchers say, these are warnings rather than errors, I went on:


summary(fitmadan, fit.measures = TRUE)


It gave me the analysis below:


lavaan (0.5-22) converged normally after  64 iterations

 

  Number of observations                           213

 

  Estimator                                       DWLS      Robust

  Minimum Function Test Statistic             1330.221          NA

  Degrees of freedom                               151         151

  P-value (Chi-square)                           0.000          NA

  Scaling correction factor                                     NA

  Shift parameter                                    

    for simple second-order correction (Mplus variant)

Model test baseline model:

  Minimum Function Test Statistic           1418310995056427467384028.000  431475753396686974604662.000

  Degrees of freedom                               171         171

  P-value                                        0.000       0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    1.000          NA

  Tucker-Lewis Index (TLI)                       1.000          NA

  Robust Comparative Fit Index (CFI)                            NA

  Robust Tucker-Lewis Index (TLI)                               NA

Root Mean Square Error of Approximation:

  RMSEA                                          0.192          NA

  90 Percent Confidence Interval          0.183  0.201          NA     NA

  P-value RMSEA <= 0.05                          0.000          NA

  Robust RMSEA                                                  NA

  90 Percent Confidence Interval                                NA     NA

Standardized Root Mean Square Residual:

  SRMR                                           0.190       0.190

Weighted Root Mean Square Residual:

  WRMR                                           2.499       2.499

Parameter Estimates:

  Information                                 Expected

  Standard Errors                           Robust.sem

Latent Variables:

                   Estimate  Std.Err  z-value  P(>|z|)

  factorone =~                                       

    x1                1.000                          

    x2                0.910       NA                 

    x3                0.827       NA                 

    x4                0.880       NA                 

    x5                0.609       NA                  

    z1                0.871       NA                 

    z2                0.462       NA                 

    z3                0.691       NA                 

    z4                0.833       NA                 

    z5                0.285       NA                 

    z6                1.036       NA                 

    z7                0.283       NA                 

    z8                0.560       NA                 

    z9                0.914       NA                 

  factortwo =~                                       

    y1                1.000                          

    y2                0.999       NA                 

    y3                0.973       NA                 

    y4                0.930       NA                 

    y5                0.998       NA                 

 

Covariances:

                   Estimate  Std.Err  z-value  P(>|z|)

  factorone ~~                                       

    factortwo         1.000       NA                 

Intercepts:

                   Estimate  Std.Err  z-value  P(>|z|)

   .x1                0.000                          

   .x2                0.000                          

   .x3                0.000                          

   .x4                0.000                          

   .x5                0.000                          

   .z1                0.000                          

   .z2                0.427       NA                 

   .z3                0.000                          

   .z4                0.000                          

   .z5                0.150       NA                 

   .z6                0.000                          

   .z7                0.103       NA                 

   .z8                0.000                           

   .z9                0.000                          

   .y1                0.000                          

   .y2                0.000                          

   .y3                0.000                          

   .y4                0.000                          

   .y5                0.000                          

    factorone         0.000                          

    factortwo         0.000                          

Thresholds:

                   Estimate  Std.Err  z-value  P(>|z|)

    x1|t1             0.006       NA                 

    x2|t1             0.317       NA                 

    x3|t1             0.818       NA                 

    x4|t1             0.693       NA                 

    x5|t1             1.908       NA                 

    z1|t1             0.649       NA                 

    z3|t1             0.578       NA                 

    z4|t1             1.375       NA                 

    z6|t1             1.629       NA                  

    z8|t1             2.080       NA                 

    z9|t1             0.510       NA                 

    y1|t1             0.006       NA                 

    y1|t2             0.418       NA                 

    y1|t3             0.496       NA                 

    y1|t4             1.188       NA                 

    y1|t5             1.346       NA                 

    y2|t1             0.006       NA                 

    y2|t2             0.649       NA                 

    y2|t3             1.237       NA                 

    y2|t4             1.840       NA                 

    y2|t5             2.350       NA                 

    y3|t1             0.065       NA                 

    y3|t2             0.564       NA                 

    y3|t3             0.786       NA                 

    y3|t4             1.237       NA                 

    y3|t5             1.908       NA                 

    y4|t1             0.053       NA                 

    y4|t2             0.678       NA                 

    y4|t3             1.120       NA                 

    y4|t4             2.080       NA                 

    y4|t5             2.350       NA                 

    y5|t1             0.317       NA                 

    y5|t2             0.496       NA                 

    y5|t3             0.770       NA                 

    y5|t4             1.016       NA                 

    y5|t5             1.629       NA                  

Variances:

                   Estimate  Std.Err  z-value  P(>|z|)

   .x1               -0.220                          

   .x2               -0.011                          

   .x3                0.165                          

   .x4                0.056                          

   .x5                0.548                          

   .z1                0.075                          

   .z2                0.651       NA                 

   .z3                0.418                          

   .z4                0.154                          

   .z5                0.235       NA                 

   .z6               -0.309                          

   .z7                0.248       NA                 

   .z8                0.617                          

   .z9               -0.019                          

   .y1                0.117                          

   .y2                0.118                          

   .y3                0.163                           

   .y4                0.236                          

   .y5                0.119                          

    factorone         1.220       NA                 

    factortwo         0.883       NA                 

Scales y*:

                   Estimate  Std.Err  z-value  P(>|z|)

    x1                1.000                          

    x2                1.000                          

    x3                1.000                          

    x4                1.000                           

    x5                1.000                          

    z1                1.000                          

    z3                1.000                          

    z4                1.000                          

    z6                1.000                          

    z8                1.000                          

    z9                1.000                          

    y1                1.000                          

    y2                1.000                          

    y3                1.000                          

    y4                1.000                          

    y5                1.000                          

> inspect(fitmadan, what = "std")$lambda   (for standardized factor loadings)

   factrn fctrtw

x1  1.104  0.000

x2  1.005  0.000

x3  0.914  0.000

x4  0.971  0.000

x5  0.672  0.000

z1  0.962  0.000

z2  0.535  0.000

z3  0.763  0.000

z4  0.920  0.000

z5  0.544  0.000

z6  1.144  0.000

z7  0.531  0.000

z8  0.619  0.000

z9  1.010  0.000

y1  0.000  0.940

y2  0.000  0.939

y3  0.000  0.915

y4  0.000  0.874

y5  0.000  0.938

 

My questions are:

1. Am I doing something wrong?

2. Why am I having so many warnings? Is it just related to sample size (19 items, 213 cases)?

3. Should I worry about them? What can I do?

4. It does not calculate robust values (as far as I know, I should report these values). How can I calculate them?


In case you would like to see the data, I am attaching the SPSS file.


Thanks in advance for your help!


Ayşe

 

life19item.R.novalues.sav

Terrence Jorgensen

Feb 25, 2017, 8:57:14 AM
to lavaan

1. Am I doing something wrong?


It is hard to know without seeing your syntax as well.

2. Why am I having so many warnings? Is it just related to sample size (19 items, 213 cases)?


This is indeed a small sample for such a large model with (mostly) categorical items.

3. Should I worry about them? What can I do?


DWLS is an asymptotic estimation method, and at moderate sample sizes it performs worse than MLE does with continuous data.  If it is not feasible to gather more data, then you might benefit from using the blavaan package to fit the model in a Bayesian framework, so that you can incorporate prior information from the previous research you mentioned.  There is a dedicated forum for the blavaan package, and SEMNET would be a good place to ask for advice in general.


4. It does not calculate robust values (as far as I know, I should report these values).


That is what one of the warnings says.

How can I calculate them?


If lavaan can't, there is a reason (e.g., the warnings about trouble inverting the weight matrix).  You could try using the latest version of lavaan, but the warning messages don't seem to indicate that a software bug is the problem, so I wouldn't get my hopes up.
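
One way to see how little information some of those polychoric correlations rest on is to count empty cells in the bivariate tables directly. A minimal base-R sketch (the data frame and item names here are illustrative, not the actual CSSRS data; y2 is deliberately a copy of y1 to show how a redundant item triggers the empty-cell warning):

```r
# Flag pairs of ordered items whose bivariate cross-tab contains empty cells --
# the situation behind lavaan's "empty cell(s) in bivariate table" warnings.
dat <- data.frame(
  x1 = rep(0:1, length.out = 213),  # dichotomous item
  y1 = rep(0:5, length.out = 213)   # six-category ordinal item
)
dat$y2 <- dat$y1                    # deliberately redundant copy of y1

empty_cell_pairs <- function(data, items) {
  pairs <- combn(items, 2)          # all unordered item pairs
  flagged <- character(0)
  for (k in seq_len(ncol(pairs))) {
    tab <- table(data[[pairs[1, k]]], data[[pairs[2, k]]])
    if (any(tab == 0)) {
      flagged <- c(flagged, paste(pairs[1, k], "x", pairs[2, k]))
    }
  }
  flagged
}

empty_cell_pairs(dat, c("x1", "y1", "y2"))  # "y1 x y2" is among the flagged pairs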


Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam


Ayse Kılınçaslan

Feb 27, 2017, 7:34:02 AM
to lavaan

Dear Dr. Jorgensen,

Thank you very much for your answers. I started with the last one: trying the new version of lavaan (0.6-1.1098).

My syntax was the following:

> model.madanL <- 'factorone=~x1+x2+x3+x4+x5+z1+z2+z3+z4+z5+z6+z7+z8+z9

+ factortwo=~y1+y2+y3+y4+y5'

> fit.madanL <- cfa(model.madanL, data = CSSRSlifetime, ordered =c("x1","x2","x3","x4","x5","y1","y2","y3","y4","y5","z1","z3","z4","z6","z8","z9"))

It ended with an error:

Error in nlminb(start = start.x, objective = minimize.this.function, gradient = GRADIENT,  :

  NA/NaN gradient evaluation

I did not understand the error. I knew I should stop at this point, but when I wrote:

> summary(fit.madanL, fit.measures = TRUE)

It gave another error:

Error in summary(fit.madanL, fit.measures = TRUE) :

  'fit.madanL' object can not be found.

 

The 19-item CSSRS scale has two versions: lifetime and recent.

The data above (with which I had the problems) were the lifetime answers. I decided to use the same syntax and lavaan 0.6-1.1098 for the recent answers. Both versions were answered by the same 213 patients, and I tried CFA with the same two factors. Both datasets are predominantly ordinal in nature.

> model.madanR <- 'factorone=~a1+a2+a3+a4+a5+c1+c2+c3+c4+c5+c6+c7+c8+c9

+ factortwo=~b1+b2+b3+b4+b5'

> fit.madanR <- cfa(model.madanR, data = CSSRSrecent, ordered=c("a1","a2","a3","a4","a5","b1","b2","b3","b4","b5","c1","c3","c4","c6","c8","c9"))

> summary(fit.madanR, fit.measures = TRUE)

And it worked without any warnings:

lavaan (0.6-1.1098) converged normally after  65 iterations

 

  Number of observations                           213

  Estimator                                       DWLS      Robust

  Minimum Function Test Statistic              521.791     357.129

  Degrees of freedom                               151         151

  P-value (Chi-square)                           0.000       0.000

  Scaling correction factor                                  2.030

  Shift parameter                                          100.090

    for simple second-order correction (Mplus variant)

Model test baseline model:

  Minimum Function Test Statistic           519451.816  159221.395

  Degrees of freedom                               171         171

  P-value                                        0.000       0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.999       0.999

  Tucker-Lewis Index (TLI)                       0.999       0.999

  Robust Comparative Fit Index (CFI)                            NA

  Robust Tucker-Lewis Index (TLI)                               NA

Root Mean Square Error of Approximation:

  RMSEA                                          0.108       0.080

  90 Percent Confidence Interval          0.098  0.118       0.070  0.091

  P-value RMSEA <= 0.05                          0.000       0.000

  Robust RMSEA                                                  NA

  90 Percent Confidence Interval                                NA     NA

Standardized Root Mean Square Residual:

  SRMR                                           0.144       0.144

Weighted Root Mean Square Residual:

  WRMR                                           1.565       1.565


So my new questions are:


  1. Why might this happen? Is it just related to the content of the answers (the sample size, number of items, and number of factors are the same)?
  2. What do you advise at this stage? Shall I try the blavaan package?

Thank you very much indeed,

Ayse

Ayse Kılınçaslan

Feb 28, 2017, 3:45:20 AM
to lavaan

I really do not understand what "Error in nlminb(start = start.x, objective = minimize.this.function, gradient = GRADIENT,  :   NA/NaN gradient evaluation" means.

When I googled it, I found a lavaan forum thread ( https://groups.google.com/forum/#!topic/lavaan/26NE8tEoYJI ) in which Prof. Rosseel answered that "there must be something funny with your dataset" and asked the poster to send him the full R script and a snippet of the data (just enough to replicate the error). There was no reply.

When I check the data, I see nothing wrong, and it is similar to the "recent" dataset, with which I had no problems.

Any help will be greatly appreciated,

Ayse 

Ayse Kılınçaslan

Mar 1, 2017, 4:13:11 AM
to lavaan
I tried the blavaan package, and it ended with an error: "blavaan ERROR: ordinal variables are not yet supported"

> library (blavaan)

This is lavaan 0.6-1.1098.

This is blavaan 0.2-3

> model.madan <- 'factorone=~x1+x2+x3+x4+x5+z1+z2+z3+z4+z5+z6+z7+z8+z9

+ factortwo=~y1+y2+y3+y4+y5'

> fit.madan <- bcfa (model.madan, data = CSSRSlifetime, ordered =c("x1","x2","x3","x4","x5","y1","y2","y3","y4","y5","z1","z3","z4","z6","z8","z9"))

Error in blavaan (model.madan, data = CSSRSlifetime, ordered = c("x1",  :

  blavaan ERROR: ordinal variables are not yet supported.

Yves Rosseel

Mar 1, 2017, 5:47:02 AM
to lav...@googlegroups.com
> So my new questions are:
>
>
> 1. Why might this happen? Is it just related to the content of the answers
> (the sample size, number of items, and number of factors are the same)?
> 2. What do you advise at this stage? Shall I try the blavaan package?

I do not think other software will help you (and blavaan does not
support ordinal data yet).

The problem is with the data. Try starting with smaller models,
involving fewer items. Perhaps start with a single factor and four
items, then add one item at a time. Perhaps you have items that are
(almost) identical; in that case, leave one out.
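
A quick way to screen for such near-duplicate items is the raw agreement rate between each pair of items. A small sketch with made-up data (z1 and z3 are illustrative names, with z3 constructed to differ from z1 on only 5 of 213 cases):

```r
# Proportion of respondents giving the same answer on two items; values near
# 1.0 suggest the items are (almost) identical and one could be left out.
agreement <- function(a, b) mean(a == b)

z1 <- rep(0:1, length.out = 213)  # illustrative dichotomous item
z3 <- z1
z3[1:5] <- 1 - z3[1:5]            # differs from z1 on only 5 of 213 cases

agreement(z1, z3)                 # 208/213, about 0.977 -> near-duplicate
```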

Yves.

Ayse Kılınçaslan

Mar 1, 2017, 5:23:57 PM
to lavaan

Dear Prof. Rosseel,

Thank you very much for your answer.

The CSSRS scale includes items concerning lifetime suicidal ideation, suicidal ideation severity, and suicidal behavior. A recent study found two factors, and I tried those two factors on my data, but it did not work.

When I tried smaller models, it worked. For example, the 5-item ideation model yielded a robust RMSEA of 0.051. (However, the 5-item ideation-severity model yielded an RMSEA of 0.134, and the 9-item behavior model 0.149.)

At this stage, would you advise conducting a principal component analysis to decide which of the whole scale's 19 items to retain?

If so, how can I do PCA in R with data consisting mostly of categorical items (16 out of 19)?

All the best,

Ayse

Terrence Jorgensen

Mar 2, 2017, 5:16:30 AM
to lavaan

Error in nlminb(start = start.x, objective = minimize.this.function, gradient = GRADIENT,  :

  NA/NaN gradient evaluation

I did not understand the error.


nlminb() is the optimizer that searches for a set of parameter estimates that minimize the fit function.  The error means your model did not converge on a solution.

> summary(fit.madanL, fit.measures = TRUE)

It gave another error:

Error in summary(fit.madanL, fit.measures = TRUE) :

  'fit.madanL' object can not be found.


Because of the previous error, that object was not created.  R cannot find it because it is not there.

If so, how can I do PCA with R for a data with mostly categorical items (16 out of 19)?


PCA assumes multivariate normality (at least implicitly, as it is usually applied), so it would not be appropriate.  Yves's advice was not to change your hypothesized model by permanently reducing the number of items.  What you should do is simply choose 3 items and fit a 1-factor CFA (it will be just identified).  From there, build the model up slowly: add one item at a time for the same factor, then 3 items for another factor, and again add one more at a time.  The goal is not to interpret those models, but to find the practical limits of your data by seeing when the convergence problems start happening.
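
This build-up strategy can be sketched as a loop that refits the model with one more item each time and records where fitting first breaks down. This is only an illustration: it uses simulated three-category data and generic item names x1-x6 rather than the actual CSSRS items, and it requires the lavaan package:

```r
library(lavaan)

# Simulate 6 three-category ordinal items loading on one latent factor
# (n = 213), standing in for the real data.
set.seed(3)
eta <- rnorm(213)
dat <- as.data.frame(lapply(1:6, function(i)
  cut(0.8 * eta + rnorm(213), breaks = c(-Inf, -0.5, 0.5, Inf), labels = FALSE)))
names(dat) <- paste0("x", 1:6)

# Start from a just-identified 3-item, 1-factor CFA and add one item at a time,
# noting at which step (if any) estimation fails to converge.
items  <- names(dat)
status <- logical(0)
for (k in 3:length(items)) {
  model <- paste("f =~", paste(items[1:k], collapse = " + "))
  fit   <- tryCatch(cfa(model, data = dat, ordered = items[1:k]),
                    error = function(e) e)
  ok <- !inherits(fit, "error") && lavInspect(fit, "converged")
  status <- c(status, ok)
  cat(k, "items:", if (ok) "converged" else "FAILED", "\n")
}
```

With real data, the step at which errors or warnings first appear points to the items (or item combinations) that the sample cannot support.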

Ayse Kılınçaslan

Mar 17, 2017, 7:12:44 AM
to lavaan

Dear Prof. Rosseel and Dr. Jorgensen,


Thank you very much. I learned a lot from you.


All the best,


Ayse
