CFA on non-normal ordinal data


Maria Vale

Aug 28, 2023, 6:17:01 AM
to lavaan

Hi everyone,

I hope you are doing well.

I'm taking my first steps in R and the lavaan package, and I'm trying to investigate the factorial structure of a measure of dating victimization. The sample consists of 957 participants with no missing values. The response options use a 5-point Likert scale. Both univariate and multivariate normality are violated.

Here is the model, along with the outputs and warning messages.

Model: four first-order factor structure

model <- '
  f1 =~ TV1 + TV2 + TV3 + TV4 + TV5 + TV6 + TV7 + TV8 + TV9 + TV10
  f2 =~ TV11 + TV12 + TV13 + TV14 + TV15 + TV16 + TV17
  f3 =~ TV18 + TV19 + TV20 + TV21 + TV22 + TV23 + TV24 + TV25
  f4 =~ TV26 + TV27 + TV28 + TV29 + TV30
'

fit.model <- cfa(model, data = TARV, ordered = TRUE, estimator = "DWLS",
                 se = "robust", test = "scaled.shifted", std.lv = TRUE)

summary(fit.model, std = TRUE, fit.measures = TRUE)

 

Output:

lavaan 0.6.16 ended normally after 34 iterations

  Estimator                                       DWLS
  Optimization method                           NLMINB
  Number of model parameters                       185

  Number of observations                           957

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                               381.267     617.183
  Degrees of freedom                               399         399
  P-value (Chi-square)                           0.730       0.000
  Scaling correction factor                                  1.083
  Shift parameter                                          265.087
    simple second-order correction

Model Test Baseline Model:

  Test statistic                            142505.191   17287.640
  Degrees of freedom                               435         435
  P-value                                        0.000       0.000
  Scaling correction factor                                  8.430

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000       0.987
  Tucker-Lewis Index (TLI)                       1.000       0.986

  Robust Comparative Fit Index (CFI)                            NA
  Robust Tucker-Lewis Index (TLI)                               NA

Root Mean Square Error of Approximation:

  RMSEA                                          0.000       0.024
  90 Percent confidence interval - lower         0.000       0.020
  90 Percent confidence interval - upper         0.009       0.028
  P-value H_0: RMSEA <= 0.050                    1.000       1.000
  P-value H_0: RMSEA >= 0.080                    0.000       0.000

  Robust RMSEA                                                  NA
  90 Percent confidence interval - lower                        NA
  90 Percent confidence interval - upper                        NA
  P-value H_0: Robust RMSEA <= 0.050                            NA
  P-value H_0: Robust RMSEA >= 0.080                            NA

Standardized Root Mean Square Residual:

  SRMR                                           0.041       0.041



I am getting this warning message:

Warning message:
In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING:
    The variance-covariance matrix of the estimated parameters (vcov)
    does not appear to be positive definite! The smallest eigenvalue
    (= -5.592776e-16) is smaller than zero. This may be a symptom that
    the model is not identified.

 

I ran these functions and got the following output:

 

> lavInspect(fit.model, "cov.lv")
      f1    f2    f3    f4
f1 1.000
f2 0.854 1.000
f3 0.922 0.816 1.000
f4 0.903 0.873 0.937 1.000

> det(lavInspect(fit.model, "cov.lv"))
[1] 0.003425802

> eigen(lavInspect(fit.model, "cov.lv"))
eigen() decomposition
$values
[1] 3.6536666 0.1999477 0.0990346 0.0473511

$vectors
           [,1]       [,2]          [,3]       [,4]
[1,] -0.5036916  0.1988844  0.7828187299  0.3064875
[2,] -0.4841475 -0.8386756 -0.0007780208 -0.2494471
[3,] -0.5033330  0.4944816 -0.1812847326 -0.6850399
[4,] -0.5084800  0.1120542 -0.5952563117  0.6120146

What can I do to find out the origin of these messages and solve the problem? 

Are DWLS, robust standard errors, and the scaled.shifted test appropriate for this scale type and sample size? Or should I switch to MLM, with robust standard errors and the Satorra-Bentler correction?

Do the chi-square statistic and its p-value compromise the model?

I'm sorry if these seem like really basic questions.

Thank you,

Maria

Keith Markus

Aug 28, 2023, 9:14:20 AM
to lavaan
Maria,
Others may spot something more specific, but based on the information you provided, this is what I would do in your situation.

Step 1: Use the summary output, parameter table, and/or parameter matrices to take inventory of all the parameters being estimated in your model.  Where you rely on convenience functions, confirm that they have indeed freed all and only the parameters you intended, and make sure that your model contains no unintended free parameters.  It may also be helpful to fit the model to data simulated from the model (see simulateData()) to determine whether the problem is specific to the data or a general problem with the model.
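That simulation check might look something like the sketch below. This is a hedged illustration, not a recipe: copying the estimates into the ustart column so simulateData() treats them as population values is one common workaround, and generating genuinely ordinal data may need extra care with the thresholds.

```r
library(lavaan)

# Inventory of every free and fixed parameter in the fitted model
parTable(fit.model)

# Use the fitted estimates as population values (assumption: this is an
# acceptable way to seed simulateData() for this model)
pt <- parTable(fit.model)
pt$ustart <- pt$est

# Simulate a same-sized sample and refit with the original settings.
# If the warning reappears with model-consistent data, the problem is
# likely the model itself rather than these particular data.
sim.data <- simulateData(pt, sample.nobs = 957)
fit.sim <- cfa(model, data = sim.data, ordered = TRUE, estimator = "DWLS",
               se = "robust", test = "scaled.shifted", std.lv = TRUE)
```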

Step 2: Follow the strategy you have already started by investigating the observed and implied covariances.  See whether the issues are limited to the covariances of the latent variables or extend to the observed variables, either observed or implied.  Try fitting a model with all the latent variable variances fixed to 1 and all the correlations (which then equal the covariances) fixed to 1, with all the non-zero loadings freely estimated.  This tests whether you need more than one latent variable.  If you do, then try fixing the correlations between subsets of latent variables to 1 to see whether you need more than 1 but fewer than 4.  Not every subset of correlations is coherent; choose subsets of latent variables to treat as being the same.  I count 6 possible pairs and 4 possible trios.  This will tell you whether the model is over-parameterized due to too many dimensions.  (A tutorial on this procedure was published in the SEM journal some decades ago.)
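A hedged lavaan sketch of that one-dimension test, reusing the model syntax and fitting settings from the original post (paste() simply appends the constraints to the measurement model):

```r
# With std.lv = TRUE the factor variances are fixed to 1, so fixing all
# six factor covariances to 1 fixes the factor correlations to 1
constraints <- '
  f1 ~~ 1*f2
  f1 ~~ 1*f3
  f1 ~~ 1*f4
  f2 ~~ 1*f3
  f2 ~~ 1*f4
  f3 ~~ 1*f4
'
fit.1dim <- cfa(paste(model, constraints), data = TARV, ordered = TRUE,
                estimator = "DWLS", se = "robust",
                test = "scaled.shifted", std.lv = TRUE)

# Scaled chi-squared difference test against the unconstrained 4-factor model
anova(fit.1dim, fit.model)
```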

Step 3: If none of that helps, try building up the model a little at a time.  Fit a one-factor model separately to each of the four subscales, then add them one at a time to work your way up gradually to the full model.  See at what step the problems become evident and use that to narrow down the source of the problem.
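The first stage of that build-up can be sketched as below (item assignments taken from the model in the original post):

```r
# Fit a one-factor model separately to each subscale
subscales <- list(
  f1 = paste0("TV", 1:10),
  f2 = paste0("TV", 11:17),
  f3 = paste0("TV", 18:25),
  f4 = paste0("TV", 26:30)
)
fits <- lapply(names(subscales), function(f) {
  syntax <- paste(f, "=~", paste(subscales[[f]], collapse = " + "))
  cfa(syntax, data = TARV, ordered = TRUE, estimator = "DWLS",
      se = "robust", test = "scaled.shifted", std.lv = TRUE)
})
# Check each fit for warnings, then combine subscales two at a time, etc.
```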

Keith
------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/

Terrence Jorgensen

Sep 15, 2023, 5:48:15 AM
to lavaan

The sample consists of 957 participants without missing values.


That's not very large, considering how many variables you are modeling. 

The response options use a 5-point Likert scale.


So your model has to account for 30*29/2 = 435 polychoric correlations + 30*4 = 120 thresholds = 555 summary statistics.

Both univariate and multivariate normalities are violated.


How asymmetric are the thresholds?  
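One quick way to gauge that, as a sketch (TARV and the TV items are from the original post; for ordered endogenous variables lavInspect() reports sample thresholds in its "sampstat" output):

```r
# Response-category proportions for a single item; strongly skewed
# proportions imply asymmetric thresholds
prop.table(table(TARV$TV1))

# Sample thresholds for all items
lavInspect(fit.model, "sampstat")$th
```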


Treating as numeric and using robust ML might not require as large a sample size for stable results.  Standardized parameters might still be slightly attenuated relative to assuming underlying normality of latent responses, but that isn't really a good basis for choosing whether to treat the responses as ordinal or numeric.
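A hedged sketch of that alternative, reusing the model syntax and data object from the original post:

```r
# Treat the 5-point items as numeric and use robust ML: estimator = "MLM"
# gives the Satorra-Bentler scaled test statistic and robust standard errors
fit.mlm <- cfa(model, data = TARV, estimator = "MLM", std.lv = TRUE)
summary(fit.mlm, std = TRUE, fit.measures = TRUE)
```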


I am getting this warning message:

    The variance-covariance matrix of the estimated parameters (vcov)

    does not appear to be positive definite! 

I did run these functions


No, you inspected the covariance matrix of the latent variables.  The warning is about the covariance matrix of the parameter estimates.

ACOV <- vcov(fit.model)  # covariance matrix of the parameter estimates
cov2cor(ACOV)            # any correlations near or in excess of 1?
det(ACOV)                # etc.

 should I change for MLM, robust and satorra.bentler?


That might be better, given the size of the model, but hard to know without a simulation tailored to your own situation.  How much larger would your sample be if you (multiply) imputed incomplete data?
 

Do the chi-square statistic and its p-value compromise the model?


Probably, given the small N relative to number of parameters.  The ratio is only around 5 for your hypothesized model, and is < 2 for your saturated model (the basis for calculating the model's chi-squared statistic).
 
Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
