anova with cfa


kathleen...@psy.kuleuven.be

May 12, 2013, 2:18:22 PM5/12/13
to lav...@googlegroups.com

Dear all,

Is it possible to compare two models that are based on the same dataset, but do not both make use of all variables?

I’m trying to compare the following two cfa models:

mod1 <- 'motion =~ t6 + t9 + t11 + t12
         color  =~ t5 + t10 + t16'
fit1 <- cfa(mod1, data = fa_subtests_data,
            ordered = c("t5","t6","t9","t10","t11","t12","t16"))

mod2 <- 'grouping     =~ t2 + t3 + t8 + t9 + t11 + t12
         segmentation =~ t5 + t6 + t7 + t10 + t16
         shape        =~ t1 + t4 + t14'
fit2 <- cfa(mod2, data = fa_subtests_data,
            ordered = c("t1","t2","t3","t4","t5","t6","t7","t8","t9","t10","t11","t12","t14","t16"))

So model 1 includes only a subset of the variables that are in model 2. anova(fit1, fit2) gives me:

Error in t(Delta1[[g]]) %*% WLS.V[[g]] : non-conformable arguments

Kind regards,

Kathleen

yrosseel

May 12, 2013, 2:22:11 PM5/12/13
to lav...@googlegroups.com
On 05/12/2013 08:18 PM, kathleen...@psy.kuleuven.be wrote:
> Dear all,
>
> Is it possible to compare two models that are based on the same dataset,
> but do not both make use of all variables?

No. The models are not nested. You could use the AIC/BIC scores.

> So model 1 does only include a subset of the variables that are in model
> 2. anova(fit1,fit2) gives me:
>
> Error in t(Delta1[[g]]) %*% WLS.V[[g]] : non-conformable arguments

Surely, a more elegant error message is needed here...

Yves.

kathleen...@psy.kuleuven.be

May 12, 2013, 2:32:15 PM5/12/13
to lav...@googlegroups.com
Thank you. Apparently, the log-likelihood is not available for the WLSMV estimator that is used with ordered variables.
 
> AIC(fit1)
Error in ll(object) : 
  lavaan ERROR: logLik only available if estimator is ML
 
Is there another way to compare the models?
 
Regards,
Kathleen

Jarrett Byrnes

May 12, 2013, 5:24:36 PM5/12/13
to lav...@googlegroups.com
With different variables, wouldn't AIC/BIC be unusable as well? You're fitting the models to fundamentally different covariance matrices, which should make their likelihoods incomparable, yes?

You can use different variables, but you need to include the variables you are not using in the covariance matrix as exogenous variables that have no paths to the endogenous variables. I typically do this by having a 0*variable in one part of my model, just to make sure it is in there. Or is there an easier way, Yves?
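A minimal sketch of that trick, using lavaan's built-in HolzingerSwineford1939 data rather than Kathleen's (the model and variable names here are illustrative): fixing a loading with `0*` keeps the variable in the analyzed covariance matrix, so both fits are based on the same set of variables and a difference test is at least defined. Whether a zero loading is the restriction you actually want is a separate question.

```r
library(lavaan)

# Full model: x9 loads freely on 'speed'.
m.full <- ' visual  =~ x1 + x2 + x3
            textual =~ x4 + x5 + x6
            speed   =~ x7 + x8 + x9 '

# "Reduced" model that still carries x9: its loading is fixed to 0,
# so x9 stays in the model's variable set (and covariance matrix).
m.red  <- ' visual  =~ x1 + x2 + x3
            textual =~ x4 + x5 + x6
            speed   =~ x7 + x8 + 0*x9 '

fit.full <- cfa(m.full, data = HolzingerSwineford1939)
fit.red  <- cfa(m.red,  data = HolzingerSwineford1939)

# Same variables on both sides, so the chi-square difference test is defined:
anova(fit.full, fit.red)
```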
> --
> You received this message because you are subscribed to the Google Groups "lavaan" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
> To post to this group, send email to lav...@googlegroups.com.
> Visit this group at http://groups.google.com/group/lavaan?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

yrosseel

May 13, 2013, 3:39:49 AM5/13/13
to lav...@googlegroups.com
On 05/12/2013 11:24 PM, Jarrett Byrnes wrote:
> With different variables, wouldn't you not be able to use AIC/BIC
> either? You're fitting the models using fundamentally different
> covariance matrices, which should make their likelihoods not
> compatible, yes?

And in addition, for WLSMV (or any *LS* based estimator for that
matter), there is no likelihood, so no AIC/BIC.

> You can use different variables, but you need to include the
> variables you are not using in the covariance matrix as exogenous
> variables that have no paths to the endogenous variables. I
> typically do this by having a 0*variable in one part of my model,
> just to make sure it is in there. Or is there an easier way, Yves?

Don't think so...

Yves.

James Patrick Cronin

May 24, 2013, 2:04:25 PM5/24/13
to lav...@googlegroups.com
Hi Kathleen, Yves, and Jarrett,

Unless I misunderstood the question, I have to disagree with Jarrett.

If the CFAs are nested (e.g., CFA1 uses all the variables and CFA2 uses a subset of those), you can compare your CFAs using χ² difference tests, AIC, and BIC.

If the models are not nested (e.g., CFA1 uses variables 1-6 and CFA2 uses variables 4-10), which I think is similar to your situation, you should not use χ² tests (they are only appropriate for nested model comparisons), but you can use AIC/BIC. AIC/BIC are appropriate in both nested and non-nested situations (e.g., see Burnham et al. 2011, Behav Ecol Sociobiol 65:23–35).

In both situations, you can also compare the models with the more descriptive fit statistics, like CFI and TLI.
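For that kind of descriptive comparison, lavaan's fitMeasures() can pull the indices side by side (fit1 and fit2 here stand for fitted lavaan objects, as earlier in the thread):

```r
library(lavaan)

# fit1 and fit2 are assumed to be already-fitted lavaan objects
sapply(list(fit1 = fit1, fit2 = fit2),
       fitMeasures, fit.measures = c("cfi", "tli", "rmsea", "srmr"))
```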

Best,
James

Terrence Jorgensen

May 29, 2013, 11:13:19 PM5/29/13
to lav...@googlegroups.com
James,

What you are saying is true for parameter nesting, but that only applies when the parameters estimate relationships among the same set of variables. I think you may have confused parameter nesting with variable nesting (although I have never heard variable nesting mentioned in practice, because models that explain different data are typically not compared). As an extreme example, consider fitting a saturated model to 3 variables, and another saturated model to 2 of those 3 variables. The log-likelihood differs between the two saturated models, as do the AIC and BIC.

> HS.model <- ' x1 ~~ x2 + x3 ; x2 ~~ x3'
> fit123 <- cfa(HS.model, data=HolzingerSwineford1939)
> HS.model <- ' x1 ~~ x2'
> fit12 <- cfa(HS.model, data=HolzingerSwineford1939)
> sapply(list(X.1.2.and.3 = fit123, X.1.and.2 = fit12), function(x) c(AIC = AIC(x), BIC = BIC(x)))
    X.1.2.and.3 X.1.and.2
AIC    2725.955  1876.066
BIC    2748.197  1887.188


The question is: what sense would it make to say that the saturated model fit to variables {x1, x2} fits better than the saturated model fit to variables {x1, x2, x3}, when both models fit perfectly? I think this might be conceptually similar to other situations in which you would not compare results that do not share a common basis for comparison. For example, in a 2-way ANOVA with gender and treatment as predictors, you would not compare the male treatment group to the female control group, because you have no way of knowing whether any differences are due to gender or to treatment. Or suppose you were comparing the proportions of people who would vote Republican or Democrat, but used a different denominator for each proportion:
  • P_Republicans = N_Republicans / (N_Republicans + N_Democrats)
  • P_Democrats = N_Democrats / (N_Republicans + N_Democrats + N_Independents)
Terry

Dr. Hans Hansen

May 12, 2014, 8:37:18 AM5/12/14
to lav...@googlegroups.com, redb...@arrr.net
Hi Jarrett,

When I follow your advice on the HolzingerSwineford data, I get better fit for the complete model. Shouldn't the model where I drop one variable produce better fit? Or did I misunderstand your suggestion? The model syntax looks to me as if the x9 loading is fixed to 0, but I really want it just somewhat free.

Thanks!

HS.model <- ' visual  =~ x1 + x2 + x3
               textual =~ x4 + x5 + x6
               speed   =~ x7 + x8 + x9 '

HS.model2 <- ' visual  =~ x1 + x2 + x3
               textual =~ x4 + x5 + x6
               speed   =~ x7 + x8 + 0*x9 '

fit1 = cfa(HS.model, data=HolzingerSwineford1939)
fit2 = cfa(HS.model2, data=HolzingerSwineford1939)

anova(fit1, fit2)

Jarrett Byrnes

May 12, 2014, 10:18:31 AM5/12/14
to lav...@googlegroups.com, redb...@arrr.net
So, here's the result:
Chi Square Difference Test

     Df    AIC    BIC   Chisq Chisq diff Df diff Pr(>Chisq)    
fit1 24 7517.5 7595.3  85.305                                  
fit2 25 7603.6 7677.8 173.434     88.128       1  < 2.2e-16 ***

So, when you fix the x9 loading to 0, you gain 1 df, but you greatly change the chi-square. The constrained model does not reproduce the same covariance matrix as the one where the loading is free. Hence we would reject it.

One way to see this is with

residuals(fit1)$cov - residuals(fit2)$cov

Then look at the line for x9.

But, yes, you are fixing the parameter to 0. I'm not sure what you mean by 'somewhat free'.

Dr. Hans Hansen

May 12, 2014, 10:22:52 AM5/12/14
to lav...@googlegroups.com, redb...@arrr.net
Thanks for looking into that. I want the parameter x9 to add nothing in the smaller model. Ultimately, I want to know whether adding x9 is favourable or not necessary.

Jarrett Byrnes

May 12, 2014, 10:27:27 AM5/12/14
to lav...@googlegroups.com
These results suggest that it is important to include it in the model.


Now, the LR test for fit1 itself still has a p-value < 0.001, so you might want to look into that, but that is another matter.

On May 12, 2014, at 10:22 AM, Dr. Hans Hansen <bea...@gmx.de> wrote:

Thanks for looking into that. I want that the parameter x9 doesn't add anything in the smaller model. i ultimately want to know whether adding x9 is favourable or not necessary.


Dr. Hans Hansen

May 12, 2014, 10:50:34 AM5/12/14
to lav...@googlegroups.com
Hm, the problem is:

I have some models in the literature, and some of the authors dropped some of the variables to achieve better fit. So they don't fix the variable to 0, they just don't use it any more. Unfortunately, they usually look only at CFI and RMSEA and not at the chi-square difference test. Now, when I want to test whether these models are more appropriate than the full model (in my data), I would need some kind of model syntax that leaves the cov/cor of x9 with all the other variables unconstrained, so that its residual is 0 (because it's not determined by the model and can be whatever it wants).

However, when I look at the first and second posts, dropping one variable doesn't result in a nested model, so I cannot test it this way, right?

yrosseel

May 16, 2014, 3:44:08 AM5/16/14
to lav...@googlegroups.com
On 05/12/2014 04:50 PM, Dr. Hans Hansen wrote:
> however, when i look at the first and second post, than dropping one
> variable doesn't result in a nested model, and i cannot test this way,
> right?

I'm not entirely sure what you are looking for, but if you want to
'remove' a variable from a model, without actually removing it from the
set of variables, then you can use something like this:

# 'remove' variable x9
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8

              x9 ~~ x9
              x9 ~~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8
'
fit <- cfa(HS.model, data=HolzingerSwineford1939)
fit

This should give the same fit (and warnings) as

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 '

fit <- cfa(HS.model, data=HolzingerSwineford1939)
fit


Yves.

Edward Rigdon

May 16, 2014, 6:13:49 AM5/16/14
to lav...@googlegroups.com
Unless, perhaps, the variable in question is substantially nonnormal and you use a normality-assuming estimation method? Then deleting the variable might yield a different fit.
--Ed Rigdon
