WLSMV estimator: are results reliable when number of observations is too small to compute Gamma?


Łukasz Deryło

Jun 14, 2019, 7:19:00 AM
to lavaan

I run a CFA (confirmatory factor analysis) with the WLSMV estimator (since my data are ordinal) in lavaan and I get the following warning message:

number of observations (190) too small to compute Gamma

Is this a problem with Gamma only, while the rest is computed correctly? Can I proceed with results obtained with this warning? E.g., can I interpret estimates, p-values and fit indices in the usual way?

Or does this somehow (and if so, how?) affect the credibility of the whole CFA?

Terrence Jorgensen

Jun 17, 2019, 5:03:27 PM
to lavaan

> Can I proceed with results obtained with this warning?


"Can" and "should" are 2 different things.  This estimator, and the corresponding robust corrections, require much larger N to stabilize and yield unbiased estimates/SEs and nominal error rates.

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Pavneet Kaur

Jun 24, 2019, 11:36:37 AM
to lavaan
I am also working on CFA with ordinal variables from a scale of 0-3. I faced the same problem as Łukasz Deryło because my sample size is 199 (the total is 227, but the remaining cases have missing data).
Can anyone please suggest how I can address the concerns raised by this warning?
In lav_samplestats_from_data(lavdata = lavdata, missing = lavoptions$missing,  :
  lavaan WARNING: number of observations (199) too small to compute Gamma
Thanks in advance.

Terrence Jorgensen

Jun 25, 2019, 1:29:14 AM
to lavaan
> I am also working on CFA with ordinal variables from a scale of 0-3. I faced the same problem as Łukasz Deryło because my sample size is 199

You can't escape the need for more data.  SEM involves complex multivariate systems, and relies heavily on asymptotic theory (what should happen as N approaches infinity, not what does happen in finite samples).  Modeling 2nd-order moments (covariance matrices) already requires N > 120 for estimation to stabilize even for smallish models in the best-case scenario (multivariate normality), and for the test statistic's sampling distribution to be approximately chi-squared.  Robust corrections for continuous data rely on even higher-order moments (4th order, i.e., multivariate kurtosis), which require even larger N to stabilize.  But the robust procedure in WLSMV is adjusting for multistage estimation (first thresholds, then polychoric correlations, then fitting your model to those), which involves even more assumptions about the latent variables that underlie each observed discrete indicator, so even more data is needed for that process to stabilize.
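
To make the first stage of that multistage process concrete, here is a minimal base-R sketch (not lavaan's actual code) of how thresholds can be estimated from observed category proportions, assuming a standard-normal latent response underlies the item. The function name `estimate_thresholds` and the toy responses are made up for illustration.

```r
# Hypothetical helper: thresholds for one ordinal item, assuming a
# standard-normal latent response underlies the observed categories.
estimate_thresholds <- function(x) {
  counts <- table(x)                        # frequency of each observed category
  cum_prop <- cumsum(counts) / sum(counts)  # cumulative proportions
  # the last cumulative proportion is 1 (threshold = +Inf), so drop it
  qnorm(cum_prop[-length(cum_prop)])
}

# Toy responses on a 0-3 scale (made up for illustration)
item <- c(0, 0, 1, 1, 1, 2, 2, 3, 3, 3)
th <- estimate_thresholds(item)
print(round(th, 3))
```

With few observations per category these proportions are noisy, and the polychoric correlations built on them inherit that noise, which is one way to see why the later stages need more data.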

I thought Gamma was only necessary for calculating the robust chi-squared statistic.  Do you still get estimates and SEs?

Pavneet Bharaj

Jun 25, 2019, 9:49:40 AM
to lav...@googlegroups.com
Thanks a lot Dr. Terrence.
Yeah I am getting all the estimates even after the warning message. 
Please see the attached:


lavaan 0.6-3 ended normally after 58 iterations

  Optimization method                           NLMINB
  Number of free parameters                         50

                                                  Used       Total
  Number of observations                           199         214

  Estimator                                       DWLS      Robust
  Model Fit Test Statistic                     365.028     376.625
  Degrees of freedom                               203         203
  P-value (Chi-square)                           0.000       0.000
  Scaling correction factor                                  1.270
  Shift parameter                                           89.145


--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
To post to this group, send email to lav...@googlegroups.com.
Visit this group at https://groups.google.com/group/lavaan.
To view this discussion on the web visit https://groups.google.com/d/msgid/lavaan/c22f2ce3-b977-4127-9d82-7b6c31c4087b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Pavneet Kaur Bharaj
Doctoral Student
Indiana University Bloomington

Terrence Jorgensen

Jun 25, 2019, 10:52:48 AM
to lavaan
Oh, you even get the robust test.  I must have misunderstood what Gamma is used for.

Pavneet Bharaj

Jun 25, 2019, 10:58:43 AM
to lav...@googlegroups.com
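My model (for which I shared the output above) is specified as follows:
library(lavaan)

model <- '
  # measurement model: NA* frees the first loading of each factor
  Exper   =~ NA*e1 + e2 + e3 + e4 + e5
  Belief1 =~ NA*B1.1 + B1.2
  Belief2 =~ NA*B2.1 + B2.2 + B2.3 + B3.1 + B3.2 + B4.1 + B4.2
  Belief3 =~ NA*B5.1 + B5.2 + B5.3 + B6.1 + B6.2 + B6.3 + B7.1 + B7.2

  # fix the factor (residual) variances to 1 to set the scale
  Exper   ~~ 1*Exper
  Belief1 ~~ 1*Belief1
  Belief2 ~~ 1*Belief2
  Belief3 ~~ 1*Belief3

  # structural regressions with labeled coefficients
  Belief1 ~ g11*Exper
  Belief2 ~ b21*Belief1
  Belief2 ~ g21*Exper
  Belief3 ~ b31*Belief1
  Belief3 ~ b32*Belief2
  Belief3 ~ g31*Exper

  # indirect effects of Exper on Belief3
  a := g11*b31
  b := g21*b32
  c := g11*b21*b32
'
# no ordered= argument here; numeric columns are then treated as continuous
fit_model <- cfa(model, data = mydata, estimator = "wlsmv")
summary(fit_model, fit.measures = TRUE, standardized = TRUE)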

Also, I would like to know what difference the "ordered" argument makes in the analysis, as my fit indices were quite different when I used it as follows:
fit_model<-cfa(model, data=mydata, estimator="wlsmv",
               ordered=c("B1.1","B1.2","B2.1","B2.2","B2.3","B3.1",
                         "B3.2","B4.1","B4.2","B5.1","B5.2","B5.3","B6.1",
                         "B6.2","B6.3","B7.1","B7.2","e1","e2","e3","e4","e5"))
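
A possible explanation for the warning, stated here as an assumption about lavaan's internals rather than something confirmed in this thread: when the indicators are not declared ordered, Gamma is the asymptotic covariance matrix of the unique variances and covariances, of which there are p(p+1)/2 for p indicators, and the warning seems to appear when N is smaller than that count. The helper `n_unique_moments` below is hypothetical; the indicator counts are read off the model above.

```r
# Hypothetical helper: number of unique variances/covariances among p variables
n_unique_moments <- function(p) p * (p + 1) / 2

p <- 5 + 2 + 7 + 8   # indicators of Exper, Belief1, Belief2, Belief3
n_used <- 199        # observations used, from the output above
pstar <- n_unique_moments(p)

cat("indicators:", p, "\n")
cat("unique variances/covariances:", pstar, "\n")
cat("N smaller than that count:", n_used < pstar, "\n")
```

If that reading is right, a Gamma matrix of that dimension estimated from 199 cases cannot have full rank, whereas DWLS only uses its diagonal as weights, which would be consistent with estimates and a robust test still being printed.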


Terrence Jorgensen

Jun 25, 2019, 11:32:10 AM
to lavaan
> what difference does "ordered" command do in the analysis

Without it, lavaan will treat the numeric values as though the numbers are meaningful when placed on a number line (i.e., interval-level data).  Declaring outcomes as "ordered" means the numbers are treated as ordinal categories, so lavaan assumes there is a normally distributed latent item-response underlying each ordered indicator, and that your model is hypothesizing relationships among those latent item-responses.
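
A minimal sketch of that difference, assuming the lavaan package is installed. It uses lavaan's built-in HolzingerSwineford1939 data, with the nine test scores cut at their quartiles into four categories (0-3) purely for illustration; the cut points and object names are arbitrary and not from this thread.

```r
library(lavaan)

# Built-in example data; x1-x9 are continuous test scores
dat <- HolzingerSwineford1939
items <- paste0("x", 1:9)

# Coarsen each score into 4 ordered categories coded 0-3 (illustration only)
dat[items] <- lapply(dat[items], function(v) {
  as.integer(cut(v, breaks = quantile(v, probs = 0:4 / 4),
                 include.lowest = TRUE)) - 1L
})

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Codes treated as interval-level numbers
fit_num <- cfa(model, data = dat, estimator = "WLSMV")

# Codes treated as ordinal categories (thresholds + polychoric correlations)
fit_ord <- cfa(model, data = dat, estimator = "WLSMV", ordered = items)

measures <- c("chisq.scaled", "df", "cfi.scaled", "rmsea.scaled")
print(round(cbind(numeric = fitMeasures(fit_num, measures),
                  ordered = fitMeasures(fit_ord, measures)), 3))
```

The two fits summarize the same rows with different sample statistics (covariances of the codes versus thresholds and polychoric correlations), so their fit measures are not expected to agree.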

Here is a great teaching article about how to interpret SEMs with categorical outcomes, although it is about growth factors rather than common factors.

 
> my fit indices were quite different when I used it as

Yes, that is expected.  You are modeling different data.

Pavneet Bharaj

Jun 25, 2019, 11:34:03 AM
to lav...@googlegroups.com
Thanks a lot Dr. Terrence. That really helps in clarifying my doubts.



Łukasz Deryło

Jun 27, 2019, 5:39:11 AM
to lavaan
I wonder what I should do now: the discussion changed its subject (thanks, Mr Kaur!) and ended up with an answer to a question that was asked somewhere in the middle (the question about the "ordered" parameter), while the main question (the one about Gamma) is still unanswered. Should I post it again?


On Tuesday, June 25, 2019 at 17:34:03 UTC+2, Pavneet Kaur wrote:
> Thanks a lot Dr. Terrence. That really helps in clarifying my doubts.
>
> On Tue, Jun 25, 2019 at 11:32 AM Terrence Jorgensen <tjorge...@gmail.com> wrote:
>> my fit indices were quite different when I used it as
>>
>> Yes, that is expected.  You are modeling different data.