lavaan error: number of observations too small to compute Gamma in MGCFA using WLSMV


Natalija Plasonja

Sep 17, 2019, 5:23:11 AM
to lavaan
Hello,

I am new to lavaan and I'd really appreciate your guidance on the following issue.

I want to do a multigroup confirmatory factor analysis on a scale containing 3 factors and 16 items on a 5-point Likert scale.
As suggested, I used the WLSMV estimator because the data are categorical. What I want to do is test invariance of the scale across three groups (depending on the age of the patients):
- young adults (n = 83)
- adults (n = 1281)
- elderly persons (n = 107)

When it comes to my data, they are not normally distributed (Mardia coefficient: b2d = 519.104, z = 51.119, p = 0.00). There are no missing data or outliers.

The model tested is the following:
model <- '
F1 =~ E1 + E5 + E9 + E11 + E14 + E15 + E18
F2 =~ E2 + E4 + E13 + E17
F3 =~ E3 + E7 + E8 + E12 + E16
F1 ~~ F2
F1 ~~ F3
F2 ~~ F3
'

Now, I know that this topic was already mentioned before, but I couldn't find the answers needed.
Whenever I want to do a MGCFA on young adults and elderly patients, I get the following lavaan warning:
In lav_samplestats_from_data(lavdata = lavdata, missing = lavoptions$missing,  :

  lavaan WARNING: number of observations (83) too small to compute Gamma


Therefore, I have a few questions:

1) What is Gamma and do I need it to test the scale's invariance depending on the age of the participants?
2) Can I use the MLR estimator instead (or any other estimator)? 
3) I know that one solution is to increase the number of participants; the problem is I can't do that, since I'm analysing data from a project that is now over. What are my other options?

Thank you for your answers, 

- Natalija, a desperate PhD baby student






Terrence Jorgensen

Sep 17, 2019, 6:10:55 AM
to lavaan
When it comes to my data, they are not normally distributed (Mardia coefficient: b2d = 519.104, z = 51.119, p = 0.00).

With N > 1400, you have a lot of power to detect minor deviations, so the test is not very informative on its own. 

Whenever I want to do a MGCFA on young adults and elderly patients, I get the following lavaan warning:
In lav_samplestats_from_data(lavdata = lavdata, missing = lavoptions$missing,  :

  lavaan WARNING: number of observations (83) too small to compute Gamma


Therefore, I have a few questions:

1) What is Gamma and do I need it to test the scale's invariance depending on the age of the participants?


As described on the ?lavInspect help page, Gamma is the asymptotic covariance matrix of the sample statistics.  A parameter's point estimate is accompanied by an SE, which is an estimate of its sampling variability across repeated samples from the same population.  Multiple parameters not only vary but covary (i.e., correlate across samples), leading to an estimated sampling-covariance matrix.  Gamma is needed for some calculations (I forget which), but Gamma can only be calculated if you have enough information from your data.  With 16 items on 5-point scales, you have 16*15/2 = 120 polychoric correlations + 16*4 = 64 thresholds = 184 sample statistics in each group, which is far larger than the Ns of your 2 small groups.

If you still get test statistics from your models and can use lavTestLRT() to compare models, then I'm guessing the Gamma is not needed for those calculations.
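If it helps to see Gamma concretely, it can be extracted from a fitted model. The sketch below uses lavaan's built-in HolzingerSwineford1939 data (not the poster's data), with a robust estimator because Gamma is only computed when a robust or (D)WLS estimator is requested:

```r
library(lavaan)

# Illustration with lavaan's example data, not the poster's own data
HS.model <- 'visual =~ x1 + x2 + x3'
fit <- cfa(HS.model, data = HolzingerSwineford1939, estimator = "MLM")

# Gamma: asymptotic covariance matrix of the sample statistics (see ?lavInspect)
Gamma <- lavInspect(fit, "gamma")
dim(Gamma)  # rows/columns = number of sample statistics
```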


2) Can I use the MLR estimator instead (or any other estimator)? 


A barplot() of each variable would show you how asymmetric the distributions are.  If each variable is approximately symmetric and each response category has a large enough N, then treating these as continuous with a robust ML estimator would give you approximately unbiased results about your factor loadings/correlations, as well as test statistics with approximately nominal Type I error rates.


Unfortunately, I don't know of any studies about treating ordinal data as continuous in the context of testing invariance, especially with a major imbalance in sample sizes.  ML might avoid the issue with Gamma, but you would still have smaller Ns in those 2 groups than the number of sample statistics: 16*19/2 = 152 sample stats.
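As a sketch of what that could look like — treating the items as continuous with robust ML — where `dat` and `age_group` are placeholder names for the poster's data frame and grouping variable, and `model` is the 3-factor model defined earlier:

```r
library(lavaan)

# Inspect each item's distribution for symmetry, e.g. for item E1
barplot(table(dat$E1))

# Robust ML: items treated as continuous, with robust SEs and a
# robust (Yuan-Bentler) test statistic
fit_mlr <- cfa(model, data = dat,
               estimator = "MLR",
               group = "age_group")  # 'age_group' is an assumed column name
summary(fit_mlr, fit.measures = TRUE)
```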

 

3) I know that one solution is to increase the number of participants; the problem is I can't do that, since I'm analysing data from a project that is now over. What are my other options?


I think you could try both estimators (robust ML and robust DWLS).   If they both lead to the same conclusions, that adds a little extra confidence.  And if invariance constraints hold (i.e., null hypotheses are not rejected), then that stabilizes the estimates because there are fewer of them and they are using information from all groups instead of one.  But the big problem is still the small Ns: with so little information about those populations, you have very little power to detect truly meaningful violations of invariance (unless the violations are huge).  So Type II error rates will probably be large.

If the models fit well in each sample and you have simple structure (no cross-loadings or correlated residuals across factors), you could test invariance for each factor separately to decrease the number of estimated parameters in your models (and the number of sample stats, which might resolve the Gamma issue).
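A hedged sketch of testing invariance for F1 alone, again assuming a data frame `dat` and grouping variable `age_group` (both placeholder names):

```r
library(lavaan)

# Single-factor model for F1 only: 7 items instead of 16, so far fewer
# sample statistics per group (which might resolve the Gamma issue)
model_f1 <- 'F1 =~ E1 + E5 + E9 + E11 + E14 + E15 + E18'
f1_items <- c("E1", "E5", "E9", "E11", "E14", "E15", "E18")

# Configural model: no equality constraints across groups
fit_config <- cfa(model_f1, data = dat, ordered = f1_items,
                  estimator = "WLSMV", group = "age_group")

# Constrained model: loadings and thresholds equal across groups
fit_metric <- cfa(model_f1, data = dat, ordered = f1_items,
                  estimator = "WLSMV", group = "age_group",
                  group.equal = c("loadings", "thresholds"))

# Scaled difference test of the invariance constraints
lavTestLRT(fit_config, fit_metric)
```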

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Natalija Plasonja

Sep 17, 2019, 10:50:23 AM
to lavaan
Thank you so much for your clear explanations, Terrence Jorgensen! Testing each factor separately is an excellent idea!

I forgot to mention that, despite the number of observations being too small to compute Gamma, lavaan gave me fit indices that were pretty good. But, when comparing the models between them, the invariance constraints stopped applying to the models at one point. Are those fit indices reliable even with this Gamma problem?

Thanks!

Natalija Plasonja

Sep 18, 2019, 6:25:06 AM
to lavaan
In the book by Beaujean, A. A. (2014), Latent Variable Modeling Using R: A Step-by-Step Guide, I found the following explanation:
"If a categorical data is coded numerically, R assumes it is a continuous variable, unless told differently" (p. 12).

I assume, then, that if your Likert-scale points are coded numerically, you should use estimators for continuous data?

Either way, I'll try both estimators, but I'm a little confused about how the nature of the data is treated by different statistical programs.



Terrence Jorgensen

Sep 18, 2019, 4:29:22 PM
to lavaan
I assume, then, that if your Likert-scale points are coded numerically, you should use estimators for continuous data?

No, it means lavaan has no way of knowing that the numbers represent ordinal categories, unless you tell it that the variables are ordinal using the ordered= argument.
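For example, a call along these lines (with `dat` and `age_group` as placeholder names for the data frame and grouping variable, and `model` the 3-factor model defined earlier in the thread):

```r
library(lavaan)

# Declare all 16 items as ordered categorical so lavaan computes
# polychoric correlations and thresholds instead of Pearson covariances
ordinal_items <- c("E1", "E2", "E3", "E4", "E5", "E7", "E8", "E9", "E11",
                   "E12", "E13", "E14", "E15", "E16", "E17", "E18")

fit <- cfa(model, data = dat,
           ordered = ordinal_items,
           estimator = "WLSMV",
           group = "age_group")  # 'age_group' is an assumed column name
```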


when comparing the models between them, the invariance constraints stopped applying to the models at one point

I don't understand what you mean.