CFA WLSMV estimator with categorical indicators

Yang Yang

Feb 14, 2018, 4:27:56 PM
to lavaan
Hi all, 

I am conducting a large project to evaluate the fit of a four-factor model. The model contains 92 categorical items. Each item has only two response options, so the items are binary (0 and 1) rather than ordinal with several categories. I collected over 20K responses from 15 different countries. There are two questions I want to answer. First, does the four-factor model fit the combined sample well? Second, does the four-factor model fit each country/area subgroup well?

I followed the instructions and recommendations in the lavaan documentation and other resources. Specifically, I recoded the responses from 1 and 2 to 0 and 1 and performed listwise deletion. For the estimator, I used WLSMV, which is claimed to be the most appropriate estimator for categorical data. There were no warnings for the combined sample, but I got many warnings when fitting the country/area subgroups. By the way, all subgroups have N > 250 (however, given the number of indicators, 92, and factors, 4, this N is apparently insufficient).

Let's take the Australia sample (N = 776) as an example. I got two warnings:
1. Error in nlminb(start = start.x, objective = minimize.this.function, gradient = GRADIENT,
    NA/NaN gradient evaluation
2. In lav_samplestats_from_data(lavdata = lavdata, missing = lavoptions$missing,  :
    lavaan WARNING: number of observations (776) too small to compute Gamma

When I get the first warning, R returns no results at all. However, when only the second warning appears (for example, in the U.S. sample, N = 3,000), R returns what I need (e.g., fit indices). So my first question is: what does the first warning mean, and how can I fix it? (P.S. I examined the response patterns, and they look fine.)



The lavaan PDF says "lavaan will automatically switch to the WLSMV estimator: it will use diagonally weighted least squares (DWLS) to estimate model parameters, but it will use the full weight matrix to compute robust standard errors, and a mean- and variance-adjusted test statistic." So my second question is: which test statistic (chi-square) should I report, DWLS or Robust? (See the example below.)

lavaan (0.5-23.1097) converged normally after 458 iterations

  Number of observations                           634

  Estimator                                       DWLS      Robust
  Minimum Function Test Statistic            10164.685    6863.405
  Degrees of freedom                              4088        4088
  P-value (Chi-square)                           0.000       0.000
  Scaling correction factor                                  2.526
  Shift parameter                                         2840.064
    for simple second-order correction (Mplus variant)


Thank you so much!
Yang




Terrence Jorgensen

Feb 22, 2018, 10:11:30 AM
to lavaan
I converted the response from 1 and 2 to 0 and 1,

This is unnecessary if the variables are declared as ordered.

and performed a listwise deletion.

This is unnecessary and suboptimal.  You can set the argument missing = "pairwise" to use all available data.
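
As a minimal sketch of both suggestions, the call below declares the items as ordered and uses pairwise-available data instead of listwise deletion. The data are simulated and the two-factor, six-item model is a hypothetical stand-in for the 92-item model; only the `ordered`, `estimator`, and `missing` arguments correspond to the advice above.

```r
library(lavaan)

set.seed(1)
n <- 500

# Hypothetical data: two correlated factors, three binary items each,
# left in the original 1/2 coding (no recoding to 0/1 needed).
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(0.75) * rnorm(n)
make_item <- function(f) as.integer(0.7 * f + sqrt(0.51) * rnorm(n) > 0) + 1L
dat <- data.frame(x1 = make_item(f1), x2 = make_item(f1), x3 = make_item(f1),
                  x4 = make_item(f2), x5 = make_item(f2), x6 = make_item(f2))
dat$x1[sample(n, 25)] <- NA   # a few missing responses, kept in the data

model <- '
  f1 =~ x1 + x2 + x3
  f2 =~ x4 + x5 + x6
'

# ordered = ... tells lavaan to treat the items as categorical;
# missing = "pairwise" uses all available pairs of observations.
fit <- cfa(model, data = dat, ordered = names(dat),
           estimator = "WLSMV", missing = "pairwise")

fitMeasures(fit, c("chisq.scaled", "df", "cfi.scaled", "rmsea.scaled"))
```

With real data, `dat` would be the survey data frame and `names(dat)` would be replaced by the vector of item names.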

all subgroups have N > 250 (however, given the number of indicators, 92, and factors, 4, this N is apparently insufficient).

That is very true.  92 indicators implies (92*91) / 2 = 4186 estimated polychoric correlations (per group) that the model tries to reproduce.  That's why even the sample of 776 is too small for some calculations.
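
The arithmetic can be checked in a few lines of base R. The free-parameter count is an assumption on my part (simple structure, one loading per factor fixed for identification, no residual correlations), but under that assumption it reproduces the 4088 degrees of freedom shown in the output quoted earlier in the thread.

```r
p <- 92   # binary indicators
k <- 4    # factors

n_cor  <- p * (p - 1) / 2   # pairwise (tetrachoric) correlations: 4186
n_thr  <- p                 # one threshold per binary item: 92
n_stat <- n_cor + n_thr     # sample statistics per group: 4278

# Assumed free parameters: thresholds, loadings (one fixed per factor),
# factor variances, and factor covariances.
n_free <- n_thr + (p - k) + k + k * (k - 1) / 2   # 190

df <- n_stat - n_free       # 4088
c(n_cor = n_cor, n_stat = n_stat, n_free = n_free, df = df)
```

The weight matrix (Gamma) needed for the robust corrections has one row and one column per sample statistic, which is why a few hundred observations are too few to estimate it well.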

how can I fix it?

Gather more data or fit a smaller model.  You are pushing beyond the limits of what can reasonably be expected (estimating thousands of parameters from only hundreds of data points).

which function test statistic (chi-square) should I report, DWLS or Robust?

Robust
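
For reference, the two columns in the quoted output are related through the scaling correction factor and the shift parameter. A rough check with the printed values (the variable names are mine):

```r
dwls    <- 10164.685   # "DWLS" column: uncorrected test statistic
scaling <- 2.526       # scaling correction factor (as printed, rounded)
shift   <- 2840.064    # shift parameter

# Scaled-and-shifted statistic: divide by the scaling factor, then add the shift.
robust_approx <- dwls / scaling + shift
robust_approx   # about 6864.09; the reported 6863.405 differs only because
                # the printed scaling factor is rounded to three decimals
```

The uncorrected DWLS statistic does not follow a chi-square distribution with categorical data, which is why the corrected value is the one to report.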

Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Yang Yang

Feb 22, 2018, 5:28:15 PM
to lav...@googlegroups.com, tjorge...@gmail.com
Much appreciated, Terrence! Your explanations help a lot.

I have one more question regarding sample size. I have data from 20 countries (coded A to T) whose sample sizes range from 262 to 3,578. Given the rationale that 92 indicators need very large samples to produce appropriate results, I wonder why some country samples yield results but others do not.

I attached a table below to clarify my question. Countries are listed in descending order of N. For example, Country H has 380 respondents, but Country I has a smaller sample (N = 369). The issue is that I got fit indices for Country I, but not for Country H. Is anything wrong here? And how can I fix this problem?

Thanks in advance!!!

Yang

Country     N   RMSEA   SRMR    CFI    TLI    GFI
A        3578   0.043  0.044  0.932  0.931  1.000
B        2831   0.044  0.046  0.923  0.921  1.000
C         939   0.044  0.052  0.931  0.930  1.000
D         776
E         634   0.048  0.060  0.929  0.927  0.998
F         389   0.047  0.066  0.940  0.939  1.000
G         383   0.040  0.062  0.943  0.942  1.000
H         380
I         369   0.042  0.065  0.899  0.896  1.000
J         362
K         361   0.051  0.070  0.887  0.884  1.000
L         360   0.054  0.074  0.773  0.767  0.999
M         354
N         350
O         322
P         309   0.045  0.069  0.941  0.939  1.000
Q         307   0.051  0.074  0.860  0.856  1.000
R         305   0.043  0.069  0.867  0.864  0.999
S         294   0.044  0.070  0.909  0.907  0.999
T         262

