Confirmatory Factor Analysis for ordinal data - I cannot calculate the robust values


Ayse Kılınçaslan

Feb 22, 2017, 7:30:33 PM
to lavaan

Hi everyone,


I am new to R and trying to use lavaan for a CFA of a 19-item suicide scale. 213 adolescents were interviewed with the scale.


The scale includes items with response formats that are (1) dichotomous (yes/no), (2) ordinal (rank-ordered, e.g., frequency of thoughts from low to high; y1, y2, y3, y4, y5), (3) categorical (e.g., reasons for suicidal ideation), and (4) continuous (e.g., total number of attempts).


“x1","x2","x3","x4","x5","y1","y2","y3","y4","y5","z1","z3","z4","z6","z8","z9" were accepted as ordinal and “z2”, “z5” and “z7” as continuous.

A recent paper found that the scale has two factors, "Factor 1" and "Factor 2".


I did the analysis as explained in the lavaan Tutorial (September 29, 2016), using the lavaan (0.5-22) package.


model.madan <- 'factorone =~ x1 + x2 + x3 + x4 + x5 + z1 + z2 + z3 + z4 + z5 + z6 + z7 + z8 + z9
factortwo =~ y1 + y2 + y3 + y4 + y5'

fitmadan <- cfa(model.madan, data = CSSRSlife,
                ordered = c("x1","x2","x3","x4","x5","y1","y2","y3","y4","y5","z1","z3","z4","z6","z8","z9"))


At this step several different warnings appeared:

Warning in pc_cor_TS(fit.y1 = UNI[[i]], fit.y2 = UNI[[j]], method = optim.method,  :  lavaan WARNING: empty cell(s) in bivariate table of x2 x x1   (MANY MANY OF THEM)

Warning in lav_samplestats_step2(UNI = FIT, ov.names = ov.names, zero.add = zero.add,  : lavaan WARNING: correlation between variables y1 and x1 is (nearly) 1.0

Warning in muthen1984(Data = X[[g]], ov.names = ov.names[[g]], ov.types = ov.types,  : lavaan WARNING: trouble inverting W matrix; used generalized inverse

Warning in lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :  lavaan WARNING: could not compute standard errors!

  lavaan NOTE: this may be a symptom that the model is not identified.

Warning in lav_model_test(lavmodel = lavmodel, lavpartable = lavpartable,  :  lavaan WARNING: could not compute scaled test statistic

Warning in lav_object_post_check(lavobject) :   lavaan WARNING: some estimated ov variances are negative.


Since, as many researchers say, these are warnings rather than errors, I went on:


summary(fitmadan, fit.measures = TRUE)


It gave me the analysis below:


lavaan (0.5-22) converged normally after  64 iterations

 

  Number of observations                           213

 

  Estimator                                       DWLS      Robust

  Minimum Function Test Statistic             1330.221          NA

  Degrees of freedom                               151         151

  P-value (Chi-square)                           0.000          NA

  Scaling correction factor                                     NA

  Shift parameter                                    

    for simple second-order correction (Mplus variant)

Model test baseline model:

  Minimum Function Test Statistic           1418310995056427467384028.000  431475753396686974604662.000

  Degrees of freedom                               171         171

  P-value                                        0.000       0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    1.000          NA

  Tucker-Lewis Index (TLI)                       1.000          NA

  Robust Comparative Fit Index (CFI)                            NA

  Robust Tucker-Lewis Index (TLI)                               NA

Root Mean Square Error of Approximation:

  RMSEA                                          0.192          NA

  90 Percent Confidence Interval          0.183  0.201          NA     NA

  P-value RMSEA <= 0.05                          0.000          NA

  Robust RMSEA                                                  NA

  90 Percent Confidence Interval                                NA     NA

Standardized Root Mean Square Residual:

  SRMR                                           0.190       0.190

Weighted Root Mean Square Residual:

  WRMR                                           2.499       2.499

Parameter Estimates:

  Information                                 Expected

  Standard Errors                           Robust.sem

Latent Variables:

                   Estimate  Std.Err  z-value  P(>|z|)

  factorone =~                                       

    x1                1.000                          

    x2                0.910       NA                 

    x3                0.827       NA                 

    x4                0.880       NA                 

    x5                0.609       NA                  

    z1                0.871       NA                 

    z2                0.462       NA                 

    z3                0.691       NA                 

    z4                0.833       NA                 

    z5                0.285       NA                 

    z6                1.036       NA                 

    z7                0.283       NA                 

    z8                0.560       NA                 

    z9                0.914       NA                 

  factortwo =~                                       

    y1                1.000                          

    y2                0.999       NA                 

    y3                0.973       NA                 

    y4                0.930       NA                 

    y5                0.998       NA                 

 

Covariances:

                   Estimate  Std.Err  z-value  P(>|z|)

  factorone ~~                                       

    factortwo         1.000       NA                 

Intercepts:

                   Estimate  Std.Err  z-value  P(>|z|)

   .x1                0.000                          

   .x2                0.000                          

   .x3                0.000                          

   .x4                0.000                          

   .x5                0.000                          

   .z1                0.000                          

   .z2                0.427       NA                 

   .z3                0.000                          

   .z4                0.000                          

   .z5                0.150       NA                 

   .z6                0.000                          

   .z7                0.103       NA                 

   .z8                0.000                           

   .z9                0.000                          

   .y1                0.000                          

   .y2                0.000                          

   .y3                0.000                          

   .y4                0.000                          

   .y5                0.000                          

    factorone         0.000                          

    factortwo         0.000                          

Thresholds:

                   Estimate  Std.Err  z-value  P(>|z|)

    x1|t1             0.006       NA                 

    x2|t1             0.317       NA                 

    x3|t1             0.818       NA                 

    x4|t1             0.693       NA                 

    x5|t1             1.908       NA                 

    z1|t1             0.649       NA                 

    z3|t1             0.578       NA                 

    z4|t1             1.375       NA                 

    z6|t1             1.629       NA                  

    z8|t1             2.080       NA                 

    z9|t1             0.510       NA                 

    y1|t1             0.006       NA                 

    y1|t2             0.418       NA                 

    y1|t3             0.496       NA                 

    y1|t4             1.188       NA                 

    y1|t5             1.346       NA                 

    y2|t1             0.006       NA                 

    y2|t2             0.649       NA                 

    y2|t3             1.237       NA                 

    y2|t4             1.840       NA                 

    y2|t5             2.350       NA                 

    y3|t1             0.065       NA                 

    y3|t2             0.564       NA                 

    y3|t3             0.786       NA                 

    y3|t4             1.237       NA                 

    y3|t5             1.908       NA                 

    y4|t1             0.053       NA                 

    y4|t2             0.678       NA                 

    y4|t3             1.120       NA                 

    y4|t4             2.080       NA                 

    y4|t5             2.350       NA                 

    y5|t1             0.317       NA                 

    y5|t2             0.496       NA                 

    y5|t3             0.770       NA                 

    y5|t4             1.016       NA                 

    y5|t5             1.629       NA                  

Variances:

                   Estimate  Std.Err  z-value  P(>|z|)

   .x1               -0.220                          

   .x2               -0.011                          

   .x3                0.165                          

   .x4                0.056                          

   .x5                0.548                          

   .z1                0.075                          

   .z2                0.651       NA                 

   .z3                0.418                          

   .z4                0.154                          

   .z5                0.235       NA                 

   .z6               -0.309                          

   .z7                0.248       NA                 

   .z8                0.617                          

   .z9               -0.019                          

   .y1                0.117                          

   .y2                0.118                          

   .y3                0.163                           

   .y4                0.236                          

   .y5                0.119                          

    factorone         1.220       NA                 

    factortwo         0.883       NA                 

Scales y*:

                   Estimate  Std.Err  z-value  P(>|z|)

    x1                1.000                          

    x2                1.000                          

    x3                1.000                          

    x4                1.000                           

    x5                1.000                          

    z1                1.000                          

    z3                1.000                          

    z4                1.000                          

    z6                1.000                          

    z8                1.000                          

    z9                1.000                          

    y1                1.000                          

    y2                1.000                          

    y3                1.000                          

    y4                1.000                          

    y5                1.000                          

> inspect(fitmadan, what = "std")$lambda   (for standardized factor loadings)

   factrn fctrtw

x1  1.104  0.000

x2  1.005  0.000

x3  0.914  0.000

x4  0.971  0.000

x5  0.672  0.000

z1  0.962  0.000

z2  0.535  0.000

z3  0.763  0.000

z4  0.920  0.000

z5  0.544  0.000

z6  1.144  0.000

z7  0.531  0.000

z8  0.619  0.000

z9  1.010  0.000

y1  0.000  0.940

y2  0.000  0.939

y3  0.000  0.915

y4  0.000  0.874

y5  0.000  0.938

 

My questions are:

1. Am I doing something wrong?

2. Why am I having so many warnings? Is it just related to sample size (19 items, 213 cases)?

3. Should I worry about them? What can I do?

4. It does not calculate robust values (as far as I know, I should report these values). How can I calculate them?


In case you would like to see the data, I am attaching the SPSS file.


Thanks in advance for your help!


Ayşe

 

life19item.R.novalues.sav

Terrence Jorgensen

Feb 25, 2017, 8:57:14 AM
to lavaan

1. Am I doing something wrong?


It is hard to know without seeing your syntax as well.

2. Why am I having so many warnings? Is it just related to sample size (19 items, 213 cases)?


This is indeed a small sample for such a large model with (mostly) categorical items.

3. Should I worry about them? What can I do?


DWLS is an asymptotic estimation method, and at moderate sample sizes it performs worse than MLE does with continuous data.  If it is not feasible to gather more data, then you might benefit from using the blavaan package to fit the model in a Bayesian framework, so that you can incorporate prior information from the previous research you mentioned.  There is a dedicated forum for the blavaan package, and SEMNET would be a good place to ask for advice in general.


4. It does not calculate robust values (as far as I know, I should report these values).


That is what one of the warnings says.

How can I calculate them?


If lavaan can't, there is a reason (e.g., the warnings about trouble inverting the weight matrix).  You could try using the latest version of lavaan, but the warning messages don't seem to indicate that a software bug is the problem, so I wouldn't get my hopes up.
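
One way to see how little information some of those polychoric correlations rest on is to count empty cells in the bivariate tables directly. A minimal base-R sketch (the data frame and item names here are illustrative, not the actual CSSRS data; y2 is deliberately a copy of y1 to show how a redundant item triggers the empty-cell warning):

```r
# Flag pairs of ordered items whose bivariate cross-tab contains empty cells --
# the situation behind lavaan's "empty cell(s) in bivariate table" warnings.
dat <- data.frame(
  x1 = rep(0:1, length.out = 213),  # dichotomous item
  y1 = rep(0:5, length.out = 213)   # six-category ordinal item
)
dat$y2 <- dat$y1                    # deliberately redundant copy of y1

empty_cell_pairs <- function(data, items) {
  pairs <- combn(items, 2)          # all unordered item pairs
  flagged <- character(0)
  for (k in seq_len(ncol(pairs))) {
    tab <- table(data[[pairs[1, k]]], data[[pairs[2, k]]])
    if (any(tab == 0)) {
      flagged <- c(flagged, paste(pairs[1, k], "x", pairs[2, k]))
    }
  }
  flagged
}

empty_cell_pairs(dat, c("x1", "y1", "y2"))  # "y1 x y2" is among the flagged pairs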


Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam


Ayse Kılınçaslan

Feb 27, 2017, 7:34:02 AM
to lavaan

Dear Dr. Jorgensen,

Thank you very much for your answers. I started with the last one: trying the new version of lavaan (0.6-1.1098).

My syntax was the following:

> model.madanL <- 'factorone=~x1+x2+x3+x4+x5+z1+z2+z3+z4+z5+z6+z7+z8+z9

+ factortwo=~y1+y2+y3+y4+y5'

> fit.madanL <- cfa(model.madanL, data = CSSRSlifetime, ordered =c("x1","x2","x3","x4","x5","y1","y2","y3","y4","y5","z1","z3","z4","z6","z8","z9"))

It ended with an error:

Error in nlminb(start = start.x, objective = minimize.this.function, gradient = GRADIENT,  :

  NA/NaN gradient evaluation

I did not understand the error. I knew I should stop at this point, but when I wrote:

> summary(fit.madanL, fit.measures = TRUE)

It gave another error:

Error in summary(fit.madanL, fit.measures = TRUE) :

  'fit.madanL' object can not be found.

 

The 19-item CSSRS scale has two versions: lifetime and recent.

The data above (with which I had the problems) were the lifetime answers. I decided to use the same syntax and lavaan 0.6-1.1098 for the recent answers. Both versions were answered by the same 213 patients, and I tried CFA with the same two factors. Both datasets are predominantly ordinal in nature.

> model.madanR <- 'factorone=~a1+a2+a3+a4+a5+c1+c2+c3+c4+c5+c6+c7+c8+c9

+ factortwo=~b1+b2+b3+b4+b5'

> fit.madanR <- cfa(model.madanR, data = CSSRSrecent, ordered=c("a1","a2","a3","a4","a5","b1","b2","b3","b4","b5","c1","c3","c4","c6","c8","c9"))

> summary(fit.madanR, fit.measures = TRUE)

And it worked without any warnings:

lavaan (0.6-1.1098) converged normally after  65 iterations

 

  Number of observations                           213

  Estimator                                       DWLS      Robust

  Minimum Function Test Statistic              521.791     357.129

  Degrees of freedom                               151         151

  P-value (Chi-square)                           0.000       0.000

  Scaling correction factor                                  2.030

  Shift parameter                                          100.090

    for simple second-order correction (Mplus variant)

Model test baseline model:

  Minimum Function Test Statistic           519451.816  159221.395

  Degrees of freedom                               171         171

  P-value                                        0.000       0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.999       0.999

  Tucker-Lewis Index (TLI)                       0.999       0.999

  Robust Comparative Fit Index (CFI)                            NA

  Robust Tucker-Lewis Index (TLI)                               NA

Root Mean Square Error of Approximation:

  RMSEA                                          0.108       0.080

  90 Percent Confidence Interval          0.098  0.118       0.070  0.091

  P-value RMSEA <= 0.05                          0.000       0.000

  Robust RMSEA                                                  NA

  90 Percent Confidence Interval                                NA     NA

Standardized Root Mean Square Residual:

  SRMR                                           0.144       0.144

Weighted Root Mean Square Residual:

  WRMR                                           1.565       1.565


So my new questions are:


  1. Why might this happen? Is it just related to the content of the answers (the sample size, number of items, and number of factors are the same)?
  2. What do you advise at this stage? Shall I try the blavaan package?

Thank you very much indeed,

Ayse

Ayse Kılınçaslan

Feb 28, 2017, 3:45:20 AM
to lavaan

I really do not understand what "Error in nlminb(start = start.x, objective = minimize.this.function, gradient = GRADIENT,  :   NA/NaN gradient evaluation" means.

When I googled it, I found a lavaan forum thread ( https://groups.google.com/forum/#!topic/lavaan/26NE8tEoYJI ) in which Prof. Rosseel answered that "there must be something funny with your dataset" and asked the poster to send him the full R script and a snippet of the data (just enough to replicate the error). There was no reply.

When I check the data, I see nothing wrong, and it is similar to the "recent" dataset, with which I had no problems.

Any help will be greatly appreciated,

Ayse 

Ayse Kılınçaslan

Mar 1, 2017, 4:13:11 AM
to lavaan
I tried the blavaan package, and it ended with an error: "blavaan ERROR: ordinal variables are not yet supported"

> library (blavaan)

This is lavaan 0.6-1.1098.

This is blavaan 0.2-3

> model.madan <- 'factorone=~x1+x2+x3+x4+x5+z1+z2+z3+z4+z5+z6+z7+z8+z9

+ factortwo=~y1+y2+y3+y4+y5'

> fit.madan <- bcfa (model.madan, data = CSSRSlifetime, ordered =c("x1","x2","x3","x4","x5","y1","y2","y3","y4","y5","z1","z3","z4","z6","z8","z9"))

Error in blavaan (model.madan, data = CSSRSlifetime, ordered = c("x1",  :

  blavaan ERROR: ordinal variables are not yet supported.

Yves Rosseel

Mar 1, 2017, 5:47:02 AM
to lav...@googlegroups.com
> So my new questions are:
>
>
> 1. Why might this happen? Is it just related to the content of the answers
> (the sample size, number of items, and number of factors are the same)?
> 2. What do you advise at this stage? Shall I try the blavaan package?

I do not think other software will help you (and blavaan does not
support ordinal data yet).

The problem is with the data. Try starting with smaller models,
involving fewer items. Perhaps start with a single factor and four
items, then add one item at a time. Perhaps you have items that are
(almost) identical; in that case, leave one out.
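
A quick way to screen for such near-duplicate items is the raw agreement rate between each pair of items. A small sketch with made-up data (z1 and z3 are illustrative names, with z3 constructed to differ from z1 on only 5 of 213 cases):

```r
# Proportion of respondents giving the same answer on two items; values near
# 1.0 suggest the items are (almost) identical and one could be left out.
agreement <- function(a, b) mean(a == b)

z1 <- rep(0:1, length.out = 213)  # illustrative dichotomous item
z3 <- z1
z3[1:5] <- 1 - z3[1:5]            # differs from z1 on only 5 of 213 cases

agreement(z1, z3)                 # 208/213, about 0.977 -> near-duplicate
```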

Yves.

Ayse Kılınçaslan

Mar 1, 2017, 5:23:57 PM
to lavaan

Dear Prof. Rosseel,

Thank you very much for your answer.

The CSSRS scale includes items concerning lifetime suicidal ideation, suicidal ideation severity, and suicidal behavior. A recent study found two factors, and I tried those two factors on my data, but it did not work.

When I tried smaller models, it worked. For example, the 5-item ideation model yielded a robust RMSEA of 0.051. (However, the 5-item ideation-severity model yielded an RMSEA of 0.134, and the 9-item behavior model 0.149.)

At this stage, would you advise conducting a principal component analysis to decide which of the whole scale's 19 items to retain?

If so, how can I do PCA in R with data consisting mostly of categorical items (16 out of 19)?

All the best,

Ayse

Terrence Jorgensen

Mar 2, 2017, 5:16:30 AM
to lavaan

Error in nlminb(start = start.x, objective = minimize.this.function, gradient = GRADIENT,  :

  NA/NaN gradient evaluation

I did not understand the error.


nlminb() is the optimizer that searches for a set of parameter estimates that minimize the fit function.  The error means your model did not converge on a solution.

> summary(fit.madanL, fit.measures = TRUE)

It gave another error:

Error in summary(fit.madanL, fit.measures = TRUE) :

  'fit.madanL' object can not be found.


Because of the previous error, that object was not created.  R cannot find it because it is not there.

If so, how can I do PCA with R for a data with mostly categorical items (16 out of 19)?


PCA assumes multivariate normality (at least implicitly, as it is usually applied), so it would not be appropriate.  Yves's advice was not to change your hypothesized model by permanently reducing the number of items.  What you should do is simply choose 3 items and fit a 1-factor CFA (it will be just identified).  From there, build the model up slowly: add one item at a time for the same factor, then 3 items for another factor, and again add one more at a time.  The goal is not to interpret those models, but to find the practical limits of your data by seeing when the convergence problems start happening.
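
This build-up strategy can be sketched as a loop that refits the model with one more item each time and records where fitting first breaks down. This is only an illustration: it uses simulated three-category data and generic item names x1-x6 rather than the actual CSSRS items, and it requires the lavaan package:

```r
library(lavaan)

# Simulate 6 three-category ordinal items loading on one latent factor
# (n = 213), standing in for the real data.
set.seed(3)
eta <- rnorm(213)
dat <- as.data.frame(lapply(1:6, function(i)
  cut(0.8 * eta + rnorm(213), breaks = c(-Inf, -0.5, 0.5, Inf), labels = FALSE)))
names(dat) <- paste0("x", 1:6)

# Start from a just-identified 3-item, 1-factor CFA and add one item at a time,
# noting at which step (if any) estimation fails to converge.
items  <- names(dat)
status <- logical(0)
for (k in 3:length(items)) {
  model <- paste("f =~", paste(items[1:k], collapse = " + "))
  fit   <- tryCatch(cfa(model, data = dat, ordered = items[1:k]),
                    error = function(e) e)
  ok <- !inherits(fit, "error") && lavInspect(fit, "converged")
  status <- c(status, ok)
  cat(k, "items:", if (ok) "converged" else "FAILED", "\n")
}
```

With real data, the step at which errors or warnings first appear points to the items (or item combinations) that the sample cannot support.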

Ayse Kılınçaslan

Mar 17, 2017, 7:12:44 AM
to lavaan

Dear Prof. Rosseel and Dr. Jorgensen,


Thank you very much. I learned a lot from you.


All the best,


Ayse
