CFA on non-normal ordinal data


Maria Vale

Aug 28, 2023, 6:17:01 AM
to lavaan

Hi everyone,

I hope you are doing well.

I'm taking my first steps in R and the lavaan package, and I'm trying to investigate the factorial structure of a measure of dating victimization. The sample consists of 957 participants with no missing values. The response options use a 5-point Likert scale. Both univariate and multivariate normality are violated.

Here is the model, along with the outputs and warning messages.

Model: four first-order factor structure

model <- '
  f1 =~ TV1 + TV2 + TV3 + TV4 + TV5 + TV6 + TV7 + TV8 + TV9 + TV10
  f2 =~ TV11 + TV12 + TV13 + TV14 + TV15 + TV16 + TV17
  f3 =~ TV18 + TV19 + TV20 + TV21 + TV22 + TV23 + TV24 + TV25
  f4 =~ TV26 + TV27 + TV28 + TV29 + TV30
'

fit.model <- cfa(model, data = TARV, ordered = TRUE, estimator = "DWLS",
                 se = "robust", test = "scaled.shifted", std.lv = TRUE)

summary(fit.model, std = TRUE, fit.measures = TRUE)

 

Output:

lavaan 0.6.16 ended normally after 34 iterations

  Estimator                                       DWLS
  Optimization method                           NLMINB
  Number of model parameters                       185

  Number of observations                           957

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                               381.267     617.183
  Degrees of freedom                               399         399
  P-value (Chi-square)                           0.730       0.000
  Scaling correction factor                                  1.083
  Shift parameter                                          265.087
    simple second-order correction

Model Test Baseline Model:

  Test statistic                            142505.191   17287.640
  Degrees of freedom                               435         435
  P-value                                        0.000       0.000
  Scaling correction factor                                  8.430

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000       0.987
  Tucker-Lewis Index (TLI)                       1.000       0.986

  Robust Comparative Fit Index (CFI)                            NA
  Robust Tucker-Lewis Index (TLI)                               NA

Root Mean Square Error of Approximation:

  RMSEA                                          0.000       0.024
  90 Percent confidence interval - lower         0.000       0.020
  90 Percent confidence interval - upper         0.009       0.028
  P-value H_0: RMSEA <= 0.050                    1.000       1.000
  P-value H_0: RMSEA >= 0.080                    0.000       0.000

  Robust RMSEA                                                  NA
  90 Percent confidence interval - lower                        NA
  90 Percent confidence interval - upper                        NA
  P-value H_0: Robust RMSEA <= 0.050                            NA
  P-value H_0: Robust RMSEA >= 0.080                            NA

Standardized Root Mean Square Residual:

  SRMR                                           0.041       0.041



I am getting this warning message:

Warning message:
In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING:
    The variance-covariance matrix of the estimated parameters (vcov)
    does not appear to be positive definite! The smallest eigenvalue
    (= -5.592776e-16) is smaller than zero. This may be a symptom that
    the model is not identified.

 

I ran these functions and got the following output:

 

> lavInspect(fit.model, "cov.lv")
      f1    f2    f3    f4
f1 1.000
f2 0.854 1.000
f3 0.922 0.816 1.000
f4 0.903 0.873 0.937 1.000

> det(lavInspect(fit.model, "cov.lv"))
[1] 0.003425802

> eigen(lavInspect(fit.model, "cov.lv"))
eigen() decomposition
$values
[1] 3.6536666 0.1999477 0.0990346 0.0473511

$vectors
           [,1]       [,2]          [,3]       [,4]
[1,] -0.5036916  0.1988844  0.7828187299  0.3064875
[2,] -0.4841475 -0.8386756 -0.0007780208 -0.2494471
[3,] -0.5033330  0.4944816 -0.1812847326 -0.6850399
[4,] -0.5084800  0.1120542 -0.5952563117  0.6120146

What can I do to find out the origin of these messages and solve the problem? 

Are DWLS, robust standard errors, and the scaled.shifted test appropriate for this scale type and sample size? Or should I switch to MLM, with robust standard errors and the Satorra-Bentler correction?

Do the chi-square statistic and its p-value compromise the model?

I'm sorry if these seem like really basic questions.

Thank you,

Maria

Keith Markus

Aug 28, 2023, 9:14:20 AM
to lavaan
Maria,
Others may spot something more specific, but based on the information you provided, this is what I would do in your situation.

Step 1: Use the summary output, parameter table, and/or parameter matrices to take inventory of all the parameters being estimated in your model.  Where you rely on convenience functions, confirm that they have indeed freed all and only the parameters you intended, and make sure that your model contains no unintended free parameters.  It may also be helpful to fit the model to data simulated from the model (see simulateData()) to determine whether the problem is specific to the data or a general problem with the model.
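That simulation check might look something like the sketch below. This is a hedged illustration, not a recipe: copying the estimates into the ustart column so simulateData() treats them as population values is one common workaround, and generating genuinely ordinal data may need extra care with the thresholds.

```r
library(lavaan)

# Inventory of every free and fixed parameter in the fitted model
parTable(fit.model)

# Use the fitted estimates as population values (assumption: this is an
# acceptable way to seed simulateData() for this model)
pt <- parTable(fit.model)
pt$ustart <- pt$est

# Simulate a same-sized sample and refit with the original settings.
# If the warning reappears with model-consistent data, the problem is
# likely the model itself rather than these particular data.
sim.data <- simulateData(pt, sample.nobs = 957)
fit.sim <- cfa(model, data = sim.data, ordered = TRUE, estimator = "DWLS",
               se = "robust", test = "scaled.shifted", std.lv = TRUE)
```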

Step 2: Follow the strategy you have already started by investigating the observed and implied covariances.  See whether the issues are limited to the covariances of the latent variables or extend to the observed variables, either observed or implied.  Try fitting a model with all the latent variable variances fixed to 1 and all the correlations (which then equal the covariances) fixed to 1, with all the non-zero loadings freely estimated.  This tests whether you need more than one latent variable.  If you do, then try fixing the correlations between subsets of latent variables to 1 to see whether you need more than 1 but fewer than 4.  Not every subset of correlations is coherent; choose subsets of latent variables to treat as being the same.  I count 6 possible pairs and 4 possible trios.  This will tell you whether the model is over-parameterized due to too many dimensions.  (A tutorial on this procedure was published in the SEM journal some decades ago.)
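A hedged lavaan sketch of that one-dimension test, reusing the model syntax and fitting settings from the original post (paste() simply appends the constraints to the measurement model):

```r
# With std.lv = TRUE the factor variances are fixed to 1, so fixing all
# six factor covariances to 1 fixes the factor correlations to 1
constraints <- '
  f1 ~~ 1*f2
  f1 ~~ 1*f3
  f1 ~~ 1*f4
  f2 ~~ 1*f3
  f2 ~~ 1*f4
  f3 ~~ 1*f4
'
fit.1dim <- cfa(paste(model, constraints), data = TARV, ordered = TRUE,
                estimator = "DWLS", se = "robust",
                test = "scaled.shifted", std.lv = TRUE)

# Scaled chi-squared difference test against the unconstrained 4-factor model
anova(fit.1dim, fit.model)
```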

Step 3: If none of that helps, try building up the model a little at a time.  Fit a one-factor model separately to each of the four subscales, then add them one at a time to work your way up gradually to the full model.  See at what step the problems become evident and use that to narrow down the source of the problem.
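The first stage of that build-up can be sketched as below (item assignments taken from the model in the original post):

```r
# Fit a one-factor model separately to each subscale
subscales <- list(
  f1 = paste0("TV", 1:10),
  f2 = paste0("TV", 11:17),
  f3 = paste0("TV", 18:25),
  f4 = paste0("TV", 26:30)
)
fits <- lapply(names(subscales), function(f) {
  syntax <- paste(f, "=~", paste(subscales[[f]], collapse = " + "))
  cfa(syntax, data = TARV, ordered = TRUE, estimator = "DWLS",
      se = "robust", test = "scaled.shifted", std.lv = TRUE)
})
# Check each fit for warnings, then combine subscales two at a time, etc.
```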

Keith
------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/

Terrence Jorgensen

Sep 15, 2023, 5:48:15 AM
to lavaan

The sample consists of 957 participants without missing values.


That's not very large, considering how many variables you are modeling. 

The response options use a 5-point Likert scale.


So your model has to account for 30*29/2 = 435 polychoric correlations + 30*4 = 120 thresholds = 555 summary statistics.

Both univariate and multivariate normalities are violated.


How asymmetric are the thresholds?  
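One quick way to gauge that, as a sketch (TARV and the TV items are from the original post; for ordered endogenous variables lavInspect() reports sample thresholds in its "sampstat" output):

```r
# Response-category proportions for a single item; strongly skewed
# proportions imply asymmetric thresholds
prop.table(table(TARV$TV1))

# Sample thresholds for all items
lavInspect(fit.model, "sampstat")$th
```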


Treating as numeric and using robust ML might not require as large a sample size for stable results.  Standardized parameters might still be slightly attenuated relative to assuming underlying normality of latent responses, but that isn't really a good basis for choosing whether to treat the responses as ordinal or numeric.
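A hedged sketch of that alternative, reusing the model syntax and data object from the original post:

```r
# Treat the 5-point items as numeric and use robust ML: estimator = "MLM"
# gives the Satorra-Bentler scaled test statistic and robust standard errors
fit.mlm <- cfa(model, data = TARV, estimator = "MLM", std.lv = TRUE)
summary(fit.mlm, std = TRUE, fit.measures = TRUE)
```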


I am getting this warning message:

    The variance-covariance matrix of the estimated parameters (vcov)

    does not appear to be positive definite! 

I did run these functions


No, you inspected the covariance matrix of the latent variables.  The warning is about the covariance matrix of the parameter estimates.

ACOV <- vcov(fit.model)  # covariance matrix of the parameter estimates
cov2cor(ACOV)            # any correlations near or in excess of 1?
det(ACOV)                # etc.

 should I change for MLM, robust and satorra.bentler?


That might be better, given the size of the model, but hard to know without a simulation tailored to your own situation.  How much larger would your sample be if you (multiply) imputed incomplete data?
 

Do the chi-square statistic and its p-value compromise the model?


Probably, given the small N relative to number of parameters.  The ratio is only around 5 for your hypothesized model, and is < 2 for your saturated model (the basis for calculating the model's chi-squared statistic).
 
Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
