CFA WLSMV estimator with categorical indicators

Yang Yang

Feb 14, 2018, 4:27:56 PM

to lavaan

Hi all,

I am conducting a large project to evaluate the fit of a four-factor model. The model contains 92 categorical items. Each item has two response options, which are binary (0 and 1) rather than ordinal. I collected over 20K responses from 15 different countries. There are two questions I want to answer. First, does the four-factor model fit the combined sample well? Second, does it fit each country/area subgroup well?

I followed the instructions and recommendations of the lavaan package and other resources. Basically, I converted the response from 1 and 2 to 0 and 1, and performed a listwise deletion. For the estimator, I used WLSMV, which is described as the most appropriate estimator for categorical data. There was no warning for the combined sample, but I got many warnings when running the country/area subgroups. By the way, all subgroups have N > 250 (however, given the number of indicators, 92, and factors, 4, this N is apparently insufficient).

Let's take the Australia sample (N = 776) as an example. I got two messages (the first is an error, the second a warning):

1. Error in nlminb(start = start.x, objective = minimize.this.function, gradient = GRADIENT, :
   NA/NaN gradient evaluation

2. In lav_samplestats_from_data(lavdata = lavdata, missing = lavoptions$missing, :
   lavaan WARNING: number of observations (776) too small to compute Gamma

When I get the first message (the nlminb error), R cannot calculate anything for me. However, if there is only the second warning (for example, the U.S. sample, N = 3,000), R can calculate what I need (e.g., fit indices). So my first question is: what does the first message mean, and how can I fix it? (P.S. I examined the response patterns, and they look fine.)

The lavaan PDF says "lavaan will automatically switch to the WLSMV estimator: it will use diagonally weighted least squares (DWLS) to estimate model parameters, but it will use the full weight matrix to compute robust standard errors, and a mean- and variance-adjusted test statistic." So my second question is: which function test statistic (chi-square) should I report, DWLS or Robust? (See the example below.)

lavaan (0.5-23.1097) converged normally after 458 iterations

  Number of observations                           634

  Estimator                                       DWLS      Robust
  Minimum Function Test Statistic            10164.685    6863.405
  Degrees of freedom                              4088        4088
  P-value (Chi-square)                           0.000       0.000
  Scaling correction factor                                  2.526
  Shift parameter                                         2840.064
    for simple second-order correction (Mplus variant)

Thank you so much!

Yang

Terrence Jorgensen

Feb 22, 2018, 10:11:30 AM

to lavaan

> I converted the response from 1 and 2 to 0 and 1,

This is unnecessary if the variables are declared as ordered.

> and performed a listwise deletion.

This is unnecessary and suboptimal. You can set the argument missing = "pairwise" to use all available data.

> all subgroups have N > 250 (however, given the number of indicators, 92, and factors, 4, this N is apparently insufficient).

That is very true. 92 indicators imply (92*91) / 2 = 4186 estimated polychoric correlations (per group) that the model tries to reproduce. That's why even the sample of 776 is too small for some calculations.
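The counting argument can be made concrete with a few lines of arithmetic. This is a minimal sketch in plain Python (not lavaan code), and it assumes details the thread does not state: each item loads on exactly one factor, each factor is identified by fixing one loading to 1, and the four factors are freely correlated. Under those assumptions the count reproduces the 4088 degrees of freedom printed in the output from the original post.

```python
# Sample statistics and free parameters for a CFA with binary indicators.
# Assumed (not stated in the thread): simple structure, one marker loading
# fixed to 1 per factor, and freely correlated factors.

n_items = 92
n_factors = 4

# One threshold per binary item, plus one correlation per pair of items.
n_thresholds = n_items
n_correlations = n_items * (n_items - 1) // 2
n_statistics = n_thresholds + n_correlations

# Free parameters under the assumed specification.
n_loadings = n_items - n_factors  # the marker loadings are fixed, not estimated
n_factor_variances = n_factors
n_factor_covariances = n_factors * (n_factors - 1) // 2
n_free = n_thresholds + n_loadings + n_factor_variances + n_factor_covariances

df = n_statistics - n_free

print("correlations:", n_correlations)
print("sample statistics:", n_statistics)
print("free parameters:", n_free)
print("degrees of freedom:", df)
```

As far as I understand it, the "Gamma" in the warning is the asymptotic covariance matrix of those sample statistics, so it would have one row and one column per statistic, which gives a sense of why a few hundred observations are not enough to estimate it stably.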

> how can I fix it?

Gather more data or fit a smaller model. You are pushing beyond the limits of what can reasonably be expected (estimating thousands of parameters from only hundreds of data points).

> which function test statistic (chi-square) should I report, DWLS or Robust?

Robust
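As a rough check on how the two columns relate, the Robust value in the output from the original post is approximately the DWLS value divided by the scaling correction factor, plus the shift parameter. The sketch below is plain Python arithmetic on the printed numbers, not lavaan code. Because the printed scaling factor is rounded to three decimals, the result only matches the reported value to within about one unit.

```python
# Values copied from the lavaan output in the original post (N = 634).
chisq_dwls = 10164.685
chisq_robust_reported = 6863.405
scaling_factor = 2.526  # printed with three decimals, so this is rounded
shift = 2840.064

# Scaled-and-shifted statistic: divide by the scaling factor, then add the shift.
chisq_robust_approx = chisq_dwls / scaling_factor + shift

difference = abs(chisq_robust_approx - chisq_robust_reported)

print("approximate robust chi-square:", round(chisq_robust_approx, 3))
print("within one unit of the reported value:", difference < 1.0)
```

The point is only that the Robust column is the DWLS statistic after the mean and variance adjustment, so it is the one whose p-value is meant to be interpreted.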

Terrence D. Jorgensen

Postdoctoral Researcher, Methods and Statistics

Research Institute for Child Development and Education, the University of Amsterdam

UvA web page: http://www.uva.nl/profile/t.d.jorgensen

Yang Yang

Feb 22, 2018, 5:28:15 PM

to lav...@googlegroups.com, tjorge...@gmail.com

Much appreciated, Terrence! Your explanations help a lot.

I have one more question regarding the issue of sample size. I have data from 20 countries (coded as A to T) whose Ns range from 262 to 3,578. Given the rationale that 92 indicators need very large samples to give trustworthy results, I wonder why some country samples produce results but others do not.

I attached a table below to clarify my question. Countries are listed in descending order of N. For example, Country H has 380 respondents, but Country I has a smaller sample size (N = 369). The issue is that I got fit indices for Country I, but not for Country H. Is anything wrong with this? And how can I fix this problem?

Thanks in advance!!!

Yang

Country	N	RMSEA	SRMR	CFI	TLI	GFI
A	3578	0.043	0.044	0.932	0.931	1.000
B	2831	0.044	0.046	0.923	0.921	1.000
C	939	0.044	0.052	0.931	0.930	1.000
D	776
E	634	0.048	0.060	0.929	0.927	0.998
F	389	0.047	0.066	0.940	0.939	1.000
G	383	0.040	0.062	0.943	0.942	1.000
H	380
I	369	0.042	0.065	0.899	0.896	1.000
J	362
K	361	0.051	0.070	0.887	0.884	1.000
L	360	0.054	0.074	0.773	0.767	0.999
M	354
N	350
O	322
P	309	0.045	0.069	0.941	0.939	1.000
Q	307	0.051	0.074	0.860	0.856	1.000
R	305	0.043	0.069	0.867	0.864	0.999
S	294	0.044	0.070	0.909	0.907	0.999
T	262
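
To make the pattern in the table explicit, the short sketch below (plain Python, with the N values and outcomes copied from the table) separates the countries with and without fit indices. It only restates the table: the largest sample without results is bigger than most samples that did produce results, so N alone does not separate the two sets.

```python
# N per country, copied from the table; True means fit indices were obtained.
results = {
    "A": (3578, True), "B": (2831, True), "C": (939, True), "D": (776, False),
    "E": (634, True), "F": (389, True), "G": (383, True), "H": (380, False),
    "I": (369, True), "J": (362, False), "K": (361, True), "L": (360, True),
    "M": (354, False), "N": (350, False), "O": (322, False), "P": (309, True),
    "Q": (307, True), "R": (305, True), "S": (294, True), "T": (262, False),
}

# Split the countries by whether the model produced fit indices.
failed = {country: n for country, (n, ok) in results.items() if not ok}
succeeded = {country: n for country, (n, ok) in results.items() if ok}

largest_failed_n = max(failed.values())
smallest_succeeded_n = min(succeeded.values())

print("no results:", sorted(failed))
print("largest N without results:", largest_failed_n)
print("smallest N with results:", smallest_succeeded_n)
```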
