semTools permuteMeasEq Question


Allie Choate

Apr 15, 2019, 3:10:05 PM
to lavaan
Hi all,

This is a simple and probably dumb question, but figured I'd ask...
I'm using the 'permuteMeasEq' function in semTools with 1000 permutations and 1500 people within the MIMIC framework (one covariate, 40 variables). 

My code has been running for several hours now on a MacBook Pro, and while it says it's running... the progress bar (showProgress=TRUE) for the analysis still says 0%?
I know that the more observations, variables, and permutations you have, the longer it will take to run... I just wanted to make sure that having '0%' in the console is normal despite the code running for over four hours now. In the past, when I accidentally mis-specified something, it would still take an hour or so to run just to return an error, so I'd like to avoid wasting time running this only for it to eventually terminate with some model mis-specification error. 

Thanks in advance!

Terrence Jorgensen

Apr 16, 2019, 5:36:02 AM
to lavaan
My code has been running for several hours now on a MacBook Pro, and while it says it's running... the progress bar (showProgress=TRUE) for the analysis still says 0%?

Try running it with 2 permutations to see how long that takes.

I know that the more observations, variables, and permutations you have, the longer it will take to run... I just wanted to make sure that having '0%' in the console is normal despite the code running for over four hours now. In the past, when I accidentally mis-specified something, it would still take an hour or so to run just to return an error, so I'd like to avoid wasting time running this only for it to eventually terminate with some model mis-specification error. 

Again, try it with fewer permutations and check if the fit statistics/indices come back missing, indicating possible errors during estimation.  
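
For example, something like this rough sketch: keep exactly the arguments you already pass (here fit.con, fit.uncon, and param = "loadings" are just placeholders) and change only nPermute. The argument names follow my reading of ?permuteMeasEq, so double-check them against your installed semTools.

library(semTools)

## re-run your exact permuteMeasEq() call, changing only nPermute, and time it
## (fit.con / fit.uncon and param are placeholders for your own arguments)
t2 <- system.time(
  out2 <- permuteMeasEq(nPermute = 2, con = fit.con, uncon = fit.uncon,
                        param = "loadings", showProgress = FALSE)
)
## very rough extrapolation: estimated hours for 1000 permutations
t2[["elapsed"]] / 2 * 1000 / 3600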

FYI, I included the MIMIC functionality to facilitate investigation of its expected behavior, but neither I nor anyone I know of has yet checked that it behaves as expected (i.e., nominal Type I error rates), nor has that particular functionality been used frequently enough to have the advantage of being as thoroughly debugged as the multigroup CFA functionality.

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Allie Choate

Apr 16, 2019, 9:58:26 AM
to lavaan
Thank you, Dr. Jorgensen!

With a small number of permutations, the fit statistics come out okay. However, with a larger number of permutations it just takes quite some time: I let my computer run for over 24 hours and it was still only at 3%. 

In that case, it sounds like I should just use the permuteMeasEq function within the multi-group CFA framework. My goal is to detect uniform & non-uniform DIF... but I know tests of metric and scalar equivalence are fairly comparable anyway. 

Thanks again for your help!

Terrence Jorgensen

Apr 22, 2019, 12:43:32 PM
to lavaan
it sounds like I should just use the permuteMeasEq function within the multi-group CFA framework.

That won't take any less time.  If anything, it might take a bit longer because more parameters are being estimated.

My goal is to detect uniform & non-uniform DIF... but I know tests of metric and scalar equivalence are fairly comparable anyway. 

Yes, uniform DIF is a violation of scalar equivalence, and nonuniform DIF is a violation of metric equivalence.  Unless you are in a situation where you can expect inflated Type I error rates from chi-squared (difference) tests, such as small N or categorical data (especially with asymmetric thresholds), in which case permutation would provide better control of errors (https://doi.org/10.1080/10705511.2017.1421467), permutation of any fit measure would provide the same result as the chi-squared (difference) tests.  In other words, if you are working with approximately normal data, the only real benefit permutation provides is a test of configural invariance that is not confounded with model fit (and, if you prefer focusing on fit indices, more powerful tests with better error control compared to using fixed rules of thumb).
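
For concreteness, here is a minimal sketch of those chi-squared (difference) tests in lavaan, using the built-in HolzingerSwineford1939 data as a stand-in for your own scale and grouping variable:

library(lavaan)

HW.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

## configural model: same pattern of loadings, no equality constraints
fit.config <- cfa(HW.model, data = HolzingerSwineford1939, group = "school")
## metric (weak) invariance: equal loadings; rejecting this suggests nonuniform DIF
fit.metric <- cfa(HW.model, data = HolzingerSwineford1939, group = "school",
                  group.equal = "loadings")
## scalar (strong) invariance: equal loadings and intercepts; rejecting this suggests uniform DIF
fit.scalar <- cfa(HW.model, data = HolzingerSwineford1939, group = "school",
                  group.equal = c("loadings", "intercepts"))

## chi-squared difference tests between successively constrained models
lavTestLRT(fit.config, fit.metric, fit.scalar)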

nasseri...@yahoo.com

Apr 22, 2019, 2:53:09 PM
to lavaan
Hi, Terrence,

As a follow-up question: I have a CFA with three groups. The indicators were collected with 7-point Likert scales, which I treat as continuous, and the data are moderately non-normally distributed. Normally I would use a Satorra-Bentler chi-square difference test and additionally look at some delta AFIs. I understand that the permutation test gives results that are clearer to interpret than arbitrary cut-off values. For the chi-square difference test I now have at least three options: the Satorra-Bentler test, the chi-square (ML) difference by permutation test, or the robust chi-square (MLR) by permutation test. Is any variant clearly preferable and, if so, for what reason?

Allie Choate

Apr 22, 2019, 2:55:15 PM
to lav...@googlegroups.com
I had really good luck with the permuteMeasEq function within the MG-CFA framework actually! It ran within several hours versus the MIMIC approach, which was taking days to run and still not finishing. 

My goal was essentially to determine which items in my scale were acting differently based on gender, and because this particular scale has quite a few items, I liked that the permutation method also adjusted for Type I errors. My sample is around 2000 people, so I was evaluating measurement invariance based on change in CFI/RMSEA, rather than the results of an LRT, as my understanding was that larger samples can lead to over-rejection of measurement invariance tests when based solely on change in chi-square. 



Terrence Jorgensen

Apr 23, 2019, 3:11:38 AM
to lavaan
It ran within several hours versus the MIMIC approach, which was taking days to run and still not finishing. 

That's surprising, but I'm glad to hear it.  I'll keep that in mind to investigate whenever I have time to do a simulation study with the MIMIC approach.

I liked that the permutation method also adjusted for Type I errors

It does, but the simulation study in the Psych Methods paper showed that the same level of control is achievable without permutation, simply by adjusting the p values associated with the lavTestScore() output.  It can get a bit tedious, so I hope one day to generalize the function I wrote within permuteMeasEq() to work more generally with invariance models, and to include it in semTools to work alongside (or instead of) the partialInvariance() function.
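
For concreteness, a rough sketch of that idea (not the paper's exact code), again using lavaan's built-in HolzingerSwineford1939 data as a stand-in: take the univariate score tests for the equality constraints and adjust their p values for multiple testing, e.g. with a Benjamini-Hochberg correction.

library(lavaan)

HW.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
## scalar-invariance model whose equality constraints we want to probe
fit.scalar <- cfa(HW.model, data = HolzingerSwineford1939, group = "school",
                  group.equal = c("loadings", "intercepts"))

## univariate score test for releasing each equality constraint
st <- lavTestScore(fit.scalar)
## adjust the p values for the number of constraints tested
st$uni$p.adj <- p.adjust(st$uni$p.value, method = "BH")
st$uni    # constraints with small adjusted p values are candidates for DIF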

My sample is around 2000 people, so I was evaluating measurement invariance based on change in CFI/RMSEA, rather than the results of an LRT, as my understanding was that larger samples can lead to over-rejection of measurement invariance tests when based solely on change in chi-square. 

That is a common misperception, resulting from careless language used in the past.  The LRT is only biased in small samples, leading to inflated Type I errors with small (not large) samples because it is based on asymptotic assumptions.  Like any statistic, it has greater power as N increases; these are not invalid rejections of H0, they are valid rejections.  The issue instead is that a test of H0 is not very informative when you have sufficient power to detect trivially small effect sizes.  Changes in fit indices are not meaningful effect sizes because that is not what they were developed to do.  There have been some recent proposals to define effect sizes specifically for measurement invariance:

https://www.jstor.org/stable/24573192 (uses MIMIC approach, but applies just as well to MG-CFA)

http://dx.doi.org/10.1037/met0000075  (applies to categorical indicators)

Find additional proposed effect sizes in Chapter 23 of Hoyle's (2012) Handbook of SEM.

It has also been proposed to calculate CFI using Rigdon's equal-correlations model as the baseline model, which I consider a vast improvement over changes in CFI calculated using the default zero-correlations baseline model, because the equal-correlations baseline is specifically designed to evaluate invariance hypotheses:  http://dx.doi.org/10.1080/10705511.2014.935928

Here is a script I wrote that implements Lai & Yoon's (2015) proposal (also something I'd like to include in semTools one day, when I have the time to program it):

## load packages: lavaan for cfa(), semTools for measurementInvariance() and net()
library(lavaan)
library(semTools)

## baseline model
mod.base <- '
## loadings
  visual =~ c(L1, L1)*x1 + c(L2, L2)*x2 + c(L3, L3)*x3
  textual =~ c(L4, L4)*x4 + c(L5, L5)*x5 + c(L6, L6)*x6
  speed =~ c(L7, L7)*x7 + c(L8, L8)*x8 + c(L9, L9)*x9
## factor correlations
  visual ~~ c(Phi, Phi)*textual + c(Phi, Phi)*speed
  textual ~~ c(Phi, Phi)*speed
## intercepts
  x1 ~ c(T1, T1)*1
  x2 ~ c(T2, T2)*1
  x3 ~ c(T3, T3)*1
  x4 ~ c(T4, T4)*1
  x5 ~ c(T5, T5)*1
  x6 ~ c(T6, T6)*1
  x7 ~ c(T7, T7)*1
  x8 ~ c(T8, T8)*1
  x9 ~ c(T9, T9)*1
## residuals
  x1 ~~ c(D1, D1)*x1
  x2 ~~ c(D2, D2)*x2
  x3 ~~ c(D3, D3)*x3
  x4 ~~ c(D4, D4)*x4
  x5 ~~ c(D5, D5)*x5
  x6 ~~ c(D6, D6)*x6
  x7 ~~ c(D7, D7)*x7
  x8 ~~ c(D8, D8)*x8
  x9 ~~ c(D9, D9)*x9
## define nu
  nu1 := (L1^2) / D1
  nu2 := (L2^2) / D2
  nu3 := (L3^2) / D3
  nu4 := (L4^2) / D4
  nu5 := (L5^2) / D5
  nu6 := (L6^2) / D6
  nu7 := (L7^2) / D7
  nu8 := (L8^2) / D8
  nu9 := (L9^2) / D9
## constrain nu to a single estimate
  nu1 == nu2
  nu2 == nu3
  nu3 == nu4
  nu4 == nu5
  nu5 == nu6
  nu6 == nu7
  nu7 == nu8
  nu8 == nu9
'

fit.base <- cfa(mod.base, data = HolzingerSwineford1939, group = "school", std.lv = TRUE)
summary(fit.base)


## target model
HW.model <- '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
'

## fit model with varying degrees of measurement equivalence
fit.inv <- measurementInvariance(model = HW.model, data = HolzingerSwineford1939,
                                 group = "school", strict = TRUE)
summary(fit.inv)  # names of the models within the list

## test for nesting
net(null = fit.base, config = fit.inv$fit.configural,
    metric = fit.inv$fit.loadings, scalar = fit.inv$fit.intercepts,
    strict = fit.inv$fit.residuals, means = fit.inv$fit.means)

## calculate CFI for each level of measurement equivalence
(ncpEQ <- diff(fitMeasures(fit.base, c("df","chisq")))[[1]])
(ncpconfig <- diff(fitMeasures(fit.inv$fit.configural, c("df","chisq")))[[1]])
(ncpmetric <- diff(fitMeasures(fit.inv$fit.loadings, c("df","chisq")))[[1]])
(ncpscalar <- diff(fitMeasures(fit.inv$fit.intercepts, c("df","chisq")))[[1]])


(CFI.metric <- 1 - max(c(ncpmetric - ncpconfig, 0)) / max(c(ncpEQ - ncpconfig, ncpmetric - ncpconfig, 0)))
(CFI.scalar <- 1 - max(c(ncpscalar - ncpmetric, 0)) / max(c(ncpEQ - ncpmetric, ncpscalar - ncpmetric, 0)))
## Lai & Yoon must have made a typo on p. 242 (column 2), suggesting instead:
#(CFI.scalar <- 1 - max(c(ncpscalar - ncpmetric, 0)) / max(c(ncpEQ - ncpscalar, ncpscalar - ncpmetric, 0)))

Terrence Jorgensen

Apr 23, 2019, 3:32:36 AM
to lavaan

For the chi-square difference test I now have at least three options: the Satorra-Bentler test, the chi-square (ML) difference by permutation test, or the robust chi-square (MLR) by permutation test.


Actually, you have more choices.  SB's (MLM) chi-squared and Yuan-Bentler's (MLR) chi-squared are asymptotically equivalent, so both should yield nominal error rates in large enough samples (how large depends on how big your model is).  You can permute either of those as well as the naïve chi-squared because permutation means you do not have to assume any of them actually follow a chi-squared distribution (you use the empirical permutation distribution instead).  
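
For concreteness, a sketch of two of those choices, using lavaan's built-in HolzingerSwineford1939 data as a placeholder for your three-group model; the AFIs and param argument names follow my reading of ?permuteMeasEq, so verify them against your installed semTools:

library(lavaan)
library(semTools)

HW.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

## (a) robust (here MLR) scaled chi-squared difference test
fit.config <- cfa(HW.model, data = HolzingerSwineford1939, group = "school",
                  estimator = "MLR")
fit.metric <- cfa(HW.model, data = HolzingerSwineford1939, group = "school",
                  estimator = "MLR", group.equal = "loadings")
lavTestLRT(fit.config, fit.metric)  # scaled difference test

## (b) permutation distributions of the naive and robust chi-squared differences
out <- permuteMeasEq(nPermute = 100, uncon = fit.config, con = fit.metric,
                     param = "loadings", AFIs = c("chisq", "chisq.scaled"))
summary(out)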

Is any variant clearly preferable and, if so, for what reason?


If your sample size is not large relative to the number of variables/parameters in your model, MLM/MLR's asymptotic properties might not have kicked in yet, so permutation could provide better error control if it happens to be a condition where the robust tests are not good chi-squared approximations.  But I have never observed them being terribly inflated even when N < 100 (in small models at least), so it might not be that big of an advantage.

If you have a very large sample, I would only use permutation to test configural invariance, because for testing equivalence of loadings, etc., the chi-squared (difference) test will give you the same conclusion as permuting any fit measure, without the computational intensity.  Actually, permutation results might even become biased if the observed-variable means/variances are quite different across groups, violating the exchangeability assumption (I also haven't had time to conduct a simulation testing how robust permutation is to that).  Nothing wrong with looking at both to see if they yield the same conclusion, but we don't necessarily have enough information to know which to trust if they disagree.  
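
If it helps, a sketch of what I mean, assuming (per my reading of ?permuteMeasEq; verify against your installed semTools) that supplying only the configural model as con yields the permutation test of its fit measures:

library(lavaan)
library(semTools)

HW.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit.config <- cfa(HW.model, data = HolzingerSwineford1939, group = "school")

## permutation test of configural invariance: compare the observed fit measures
## to their distribution under permuted (exchangeable) group assignment
out.config <- permuteMeasEq(nPermute = 100, con = fit.config,
                            AFIs = c("chisq", "cfi", "rmsea"))
summary(out.config)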

Regardless, if you have a significant result, I think quantifying effect size would be even more important than knowing whether the rejection of H0 was a Type I error.  For ideas about effect size that might be important to you, see the references I posted in response to Allie's last post.

Allie Choate

Apr 25, 2019, 3:08:22 PM
to lavaan
Ah, this is most helpful and good to know regarding the chi-square issue and how to properly calculate change in CFI in this context! 
Thanks again for your help and for the paper recommendations as well!