Multi-group Confirmatory Factor Analysis, with mumtiple groupings and mixed numeric-ordinal items

lemos...@gmail.com

unread,

Mar 13, 2018, 3:04:18 PM3/13/18

to lavaan

Dear Professor Yves Rosseel and lavaan community,

I’m working on measurement invariance of religiosity factors derived from the International Social Survey Programme (ISSP) Religion Cumulation dataset (2008 round) via multi-group CFA using lavaan::cfa function. The model with three factors and three items per factor, in which some items are treated as numeric and others as ordinal. I need to test the model across different groupings (or units of analysis) defined as factors, but for simplicity I will only mention “SEX” since this has only two groups. I have been reading the literature on MGCFA with ordinal data and struggling for months to make sure I’m doing the analysis correctly, but have doubts and questions about:

1) The imposition of the right constraints at all levels of invariance, and the identification conditions for the configural and metric invariance models, with mixed numeric-ordinal items;

2) Differences between the defaults for the invariance constraints with ordinal items used in lavaan::cfa, and mentioned in the literature (e.g. Millsap and Yun-Tein, Assessing Factorial Invariance in Ordered-Categorical Measures, Multivariate Behavioral Research, 39(3), 479-515, 2004);

3) Difference of outputs of inspect(<model>, what = “list”) and parTable(<model>), essentially identical, and fitted(<model>), when checking the equality of parameters across groups for different levels of invariance;

4) Handling convergence failures.

I would be extremely grateful for any clarification of my doubts, and correction in case I’m stating or doing something wrong.

SCRIPT

The data are stored in a “religion” data frame, in which the items are coded as numeric values (regardless of how they are treated) and the grouping variables (SEX,AGE,DEGREE,RELIGGRP,COUNTRY) are stored as factors. I wrapped a set of defaults into a function as follows:

# Defaults for WLSMV + categorical

group.cfa <- function(cfa.data, cfa.model, cfa.group,

cfa.ordered = NULL,

cfa.group.equal = "",

cfa.group.partial = "",

cfa.std.lv = FALSE,

cfa.estimator = "WLSMV",

cfa.test = "default",

cfa.se = "default",

cfa.parameterization = "theta",

cfa.bootstrap = 2000,

cfa.zero.add = "default",

cfa.zero.keep.margins = "default")

{

model <- cfa(data = cfa.data, model = cfa.model, group = cfa.group,

ordered = cfa.ordered,

group.equal = cfa.group.equal,

group.partial = cfa.group.partial,

std.lv = cfa.std.lv,

estimator = cfa.estimator,

test = cfa.test,

se = cfa.se,

parameterization = cfa.parameterization,

bootstrap = cfa.bootstrap,

zero.add = cfa.zero.add,

zero.keep.margins = cfa.zero.keep.margins)

return(model)

}

Here is the code for doing the analysis:

# DECLARE ORDINAL ITEMS; ALL OTHER ITEMS ARE TREATED AS NUMERIC

cfa.ordered.values <- c("V30","V31","V32","V33","V35","V37")

# SPECIFY THE MODEL; PICK MARKER ITEMS WITH THE LARGEST LOADING IN EACH FACTOR

model <- 'afterlife =~ NA*V30 + 1*V31 + NA*V32

belief.God =~ NA*V28 + 1*V35 + NA*V37

formation =~ V46 + V47 + V48

'

fit.indices <- c("chisq", "df", "pvalue", "cfi",

"rmsea", "rmsea.ci.lower", "rmsea.ci.upper",

"srmr", "pgfi", "pnfi")

# TEST FOR "SEX"

configural <- group.cfa(cfa.data = religion[-which(is.na(religion$SEX)),],

cfa.model = model,

cfa.ordered = cfa.ordered.values,

cfa.group = "SEX",

cfa.group.equal = "",

cfa.parameterization = "theta")

metric <- group.cfa(cfa.data = religion[-which(is.na(religion$SEX)),],

cfa.model = model,

cfa.ordered = cfa.ordered.values,

cfa.group = "SEX",

cfa.group.equal = c("loadings"),

cfa.parameterization = "theta")

scalar <- group.cfa(cfa.data = religion[-which(is.na(religion$SEX)),],

cfa.model = model,

cfa.ordered = cfa.ordered.values,

cfa.group = "SEX",

cfa.group.equal = c("loadings","thresholds"),

cfa.parameterization = "theta")

strict <- group.cfa(cfa.data = religion[-which(is.na(religion$SEX)),],

cfa.model = model,

cfa.ordered = cfa.ordered.values,

cfa.group = "SEX",

cfa.group.equal = c("loadings","thresholds","residuals"),

cfa.parameterization = "theta")

fitted.configural <- fitted(configural)

fitted.metric <- fitted(metric)

fitted.scalar <- fitted(scalar)

fitted.strict <- fitted(strict)

modelDiff <- compareFit(configural,metric,scalar,strict)@fit[fit.indices]

I used the following code to check which items are equal in the two groups for each model

# INSPECT EQUALITY OF LOADINGS, INTERCEPTS, THRESHOLDS

# (PICK ANY MODEL, HERE “SCALAR”)

tab <- inspect(scalar, what = "list", list.by.group = TRUE)

ngroup <- max(tab$group)

group.estimates <- list()

for (i in 1:ngroup) {

group.estimates[[i]] <- tab[which(tab$group == i),names(tab)[c(1:4,7:8,14:15)]]

}

# COMPARE GROUPS 1 and 2

group.estimates[[1]][which(abs(group.estimates[[1]]$est - group.estimates[[2]]$est) < 0.0001),]

QUESTIONS & DOUBTS – IDENTIFICATION AND INVARIANCE CONSTRAINTS

1. In the configural model, the parameters that are equal for the two groups are the loadings of the marker items (=1), the factor means (=0), the variances of the ordinal items (=1), and the intercepts of the ordinal items (=0). If I’m interpreting correctly, these are the minimum conditions for model identification. The first two are the usual ones for fixing the location and scale of the latent variables for models with continuous items; the last two are for fixing the location and scale of the latent response variables that together with the thresholds relate the observed probability structure of the ordinal items with the latent variables. Is this correct?

2. Millsap and Yun-Tein (2004) propose setting the following parameters equal across groups for ordinal items: two thresholds for marker ordinal items, and one threshold for the other ordinal items. If I’m interpreting correctly, setting two thresholds for the marker items fixes the location and scale of the latent response variables and, with the scale fixed on each factor, the other ordinal items only need the location fixed (one equal threshold is enough). Is this correct? If so, this is equivalent to the defaults in lavaan::cfa, but using the Millsap & Yun-Tein constraints:
a. would require setting up different models if more than one group is to be tested and groupings have different number of units;

b. could possibly interfere with the lavaan::cfa defaults (also, I don’t know what would happen with the Millsap & Yun-Tein constraints for items with different number of levels loading on the same factor).

3. I saw a post where group.equal = “means” was suggested for the configural and group.equal = c(“means”,”loadings”) for the metric models. I don’t know if this really needs to be imposed, or why.

4. In the metric invariance model all loadings are equal for all items (numeric and ordinal). The variances of the ordinal items (=1) and the intercepts of the ordinal items (=0) are also equal across groups. The thresholds are different, and therefore metric invariance does not have the same practical significance as for numeric items, because the loadings are related to the observed ordinal items via the thresholds. Is this correct?

5. In the scalar model all loadings, thresholds and intercepts, and in the strict model all loadings, thresholds, intercepts, and residual variances are equal across groups. I think the theta parameterization is working correctly because the scalar and strict models usually come out with different fit measures. As far as I could check the lavaan::cfa function forces the intercepts of numeric items to be equal across groups even though I did not specify “intercepts” in the “group.equal” argument.

6. The intercept for ordinal items is fixed to zero in all models. I think this is because changing the intercept would shift the thresholds, so it remains an identification constraint.

QUESTIONS & DOUBTS – OUTPUT OF inspect() vs fitted()

When I use inspect to check equality of parameters in the “est” column, I seem to get things right. For example, using the code snippet above the parameters that are equal across groups for the scalar model are:

id lhs op rhs group free est se

1 1 afterlife =~ V30 1 1 0.443 0.014

2 2 afterlife =~ V31 1 0 1.000 0.000

3 3 afterlife =~ V32 1 2 0.554 0.017

4 4 belief.importance.God =~ V28 1 3 0.627 0.014

5 5 belief.importance.God =~ V35 1 0 1.000 0.000

6 6 belief.importance.God =~ V37 1 4 0.621 0.015

7 7 formation =~ V46 1 0 1.000 0.000

8 8 formation =~ V47 1 5 0.956 0.013

9 9 formation =~ V48 1 6 0.985 0.013

10 10 V30 | t1 1 7 -1.532 0.021

11 11 V30 | t2 1 8 -0.475 0.017

12 12 V30 | t3 1 9 0.840 0.018

13 13 V31 | t1 1 10 -2.665 0.073

14 14 V31 | t2 1 11 -0.725 0.040

15 15 V31 | t3 1 12 1.708 0.048

16 16 V32 | t1 1 13 -1.081 0.021

17 17 V32 | t2 1 14 0.267 0.020

18 18 V32 | t3 1 15 1.528 0.024

19 19 V35 | t1 1 16 -2.535 0.049

20 20 V35 | t2 1 17 -1.156 0.031

21 21 V35 | t3 1 18 -0.013 0.025

22 22 V35 | t4 1 19 1.846 0.037

23 23 V37 | t1 1 20 -1.205 0.019

24 24 V37 | t2 1 21 0.032 0.017

25 25 V37 | t3 1 22 0.967 0.019

26 26 V37 | t4 1 23 2.158 0.026

47 47 V30 ~1 1 0 0.000 0.000

48 48 V31 ~1 1 0 0.000 0.000

49 49 V32 ~1 1 0 0.000 0.000

50 50 V28 ~1 1 34 4.308 0.018

51 51 V35 ~1 1 0 0.000 0.000

52 52 V37 ~1 1 0 0.000 0.000

53 53 V46 ~1 1 35 5.184 0.023

54 54 V47 ~1 1 36 4.431 0.023

55 55 V48 ~1 1 37 5.390 0.025

We see that the thresholds are equal (checking in the full output also). If I now use fitted(scalar), I get different thresholds for the two groups:

$Female$th

V30|t1 V30|t2 V30|t3 V31|t1 V31|t2 V31|t3 V32|t1 V32|t2 V32|t3 V35|t1 V35|t2 V35|t3 V35|t4 V37|t1 V37|t2 V37|t3 V37|t4

-0.858 -0.266 0.470 -0.764 -0.208 0.490 -0.514 0.127 0.726 -0.994 -0.453 -0.005 0.724 -0.682 0.018 0.547 1.221

…

$Male$th

V30|t1 V30|t2 V30|t3 V31|t1 V31|t2 V31|t3 V32|t1 V32|t2 V32|t3 V35|t1 V35|t2 V35|t3 V35|t4 V37|t1 V37|t2 V37|t3 V37|t4

-0.533 0.023 0.714 -0.388 0.115 0.747 -0.194 0.410 0.975 -0.666 -0.155 0.269 0.959 -0.399 0.263 0.762 1.399

What is happening?

Also, I would be grateful if you could tell how to obtain the factor means and factor variance-covariance matrix for comparing group factor structures.

CONVERGENCE FAILURES

For more complicated groupings with many units, particularly countries, I’m often stuck with convergence failures. In some cases, I could improve the fit indices slightly by e.g. allowing two items within the same factor to correlate (so as to keep the model congeneric and avoid “factor variance contamination”; this requires adding “residual.covariances” to the group.equal options for the strict model), but this does not solve (and in some cases aggravates) the convergence problems. The only thing that seems to work is to delete units with the largest Chi-square (inspected using summary()), which of course restricts the validity of the results. It once happened that I could get the configural and strict models, but not the metric and scalar. I would be very grateful if you could give me some advice on how to mitigate this problem.

Thank you for all your attention to this post.

Sincerely,

Carlos M. Lemos

Terrence Jorgensen

unread,

Mar 15, 2018, 6:16:04 AM3/15/18

to lavaan

Welcome! This is quite a book you wrote. While it is much appreciated to provide sufficient information to describe your problem, it is difficult to provide help when there are so many issues you bring up, sometimes without even a clear question.

1) The imposition of the right constraints at all levels of invariance, and the identification conditions for the configural and metric invariance models, with mixed numeric-ordinal items;

2) Differences between the defaults for the invariance constraints with ordinal items used in lavaan::cfa, and mentioned in the literature (e.g. Millsap and Yun-Tein, Assessing Factorial Invariance in Ordered-Categorical Measures, Multivariate Behavioral Research, 39(3), 479-515, 2004);

Millsap & Tein's (2004) suggestions are outdated and based on a limited investigation of the topic. Wu & Estabrook (2016) have released a more comprehensive discussion of the topic, and provide somewhat different advice.

https://link.springer.com/article/10.1007/s11336-016-9506-0

Quick summary: after your configural model, they advise adding equality constraints (and releasing identification constraints) in the following order:

thresholds
loadings
intercepts
residual variances

You can find details and justifications in the article.

3) Difference of outputs of inspect(<model>, what = “list”) and parTable(<model>), essentially identical, and fitted(<model>), when checking the equality of parameters across groups for different levels of invariance;

There are half-a-dozen ways to view your estimates. fitted() is not one of them: that is for obtaining model-implied moments. See the help page:

class?lavaan

Can you state clearly and simply what you would like to view, so we can provide a helpful suggestion?

4) Handling convergence failures.

Millsap & Tein's specification often leads to convergence problems in lavaan, even when an equivalent configural model (the default when setting parameterization = "delta" or "theta") converges fine. Yves has not been able to figure out the problem, but I've seen in my own simulations that it definitely happens, and I have not been able to find out what it is about the data that "cause" the problem.

1. In the configural model, the parameters that are equal for the two groups are the loadings of the marker items (=1), the factor means (=0), the variances of the ordinal items (=1), and the intercepts of the ordinal items (=0). If I’m interpreting correctly, these are the minimum conditions for model identification. The first two are the usual ones for fixing the location and scale of the latent variables for models with continuous items; the last two are for fixing the location and scale of the latent response variables that together with the thresholds relate the observed probability structure of the ordinal items with the latent variables. Is this correct?

I would not choose a marker variable unless you are already certain it is invariant. The default configural model that lavaan uses is a fine starting point. parameterization = "theta" is only necessary if you are interested in testing strict invariance (which it appears you are. See the paper I linked to (same note for questions 2-6).

using the code snippet above the parameters that are equal across groups for the scalar model are:

You only posted part of the parameter table, so I can't verify what you are seeing. Also, it might be easier for you to verify in the summary() output because it includes (identical) labels for equality-constrained parameters.

If I now use fitted(scalar), I get different thresholds for the two groups:

Because fitted() does not return estimated parameters.

how to obtain the factor means and factor variance-covariance matrix for comparing group factor structures.

inspect(fit, "mean.lv")

inspect(fit, "cov.lv")

Or if you want their associated Wald tests, use summary() or parameterEstimates()

CONVERGENCE FAILURES

I would be very grateful if you could give me some advice on how to mitigate this problem.

If you would like help with that, please provide a working example that does not involve your own wrapper function that needlessly complicates it from our perspective. Hopefully, your convergence problems will not persist once you follow more up-do-date advice, but if you have a model that does not converge, please simply provide R syntax to fit that one model to data (and if possible, provide some data so we can reproduce the problem and perhaps locate the cause).

Good luck!

Terrence D. Jorgensen

Postdoctoral Researcher, Methods and Statistics

Research Institute for Child Development and Education, the University of Amsterdam

UvA web page: http://www.uva.nl/profile/t.d.jorgensen

lemos...@gmail.com

unread,

Mar 15, 2018, 4:01:08 PM3/15/18

to lavaan

Dear Dr. Terrence Jorgensen,

First, I must tell that I immensely appreciated your help, your care, and above all your patience. I cannot thank Professor Yves Rosseel enough for creating lavaan and making it freely accessible, and both of you for creating and maintaining this fantastic forum. Second, I apologise for the extension of my post, but in fact I had many doubts (which thanks to you I know were well-founded) and I thought it would be even more annoying to pour them one by one. Third, I also apologise for not having read the help of fitted() with enough attention, thus making you loose unnecessary time.

Your response has extremely helpful. It takes homework to digest it in full, but in the meanwhile I want to give some feedback.

Millsap & Tein's (2004) suggestions are outdated and based on a limited investigation of the topic. Wu & Estabrook (2016) have released a more comprehensive discussion of the topic, and provide somewhat different advice.

https://link.springer.com/article/10.1007/s11336-016-9506-0

Quick summary: after your configural model, they advise adding equality constraints (and releasing identification constraints) in the following order:
thresholds
loadings
intercepts
residual variances

You can find details and justifications in the article.

Thank you very much for sending this paper. It is hard reading, but I think I already figured out a couple of points (please correct if I'm wrong):
- Imposing equality of the thresholds first (as done in LISREL) has the effect of stepping from the transformation in eq.(4) to that in eq.(3) in the paper, after which the sequence (and hierarchy) of constraints follows as for the continuous case. If this is correct, it means that in the ordinal case metric invariance is subdivided in two steps (equal thresholds first, then equal loadings);
- I have to study carefully which/how identification constraints are released as the invariance constraints are imposed, without interfering with lavaan::cfa defaults created by the group.equal argument. This means careful reading of section 5 in the paper, and maybe setting up different models for each invariance level;

There are half-a-dozen ways to view your estimates. fitted() is not one of them: that is for obtaining model-implied moments. See the help page:

class?lavaan

Can you state clearly and simply what you would like to view, so we can provide a helpful suggestion?

Sorry, I apologise for this. This point is clear.

4)      Handling convergence failures.

Millsap & Tein's specification often leads to convergence problems in lavaan, even when an equivalent configural model (the default when setting parameterization = "delta" or "theta") converges fine. Yves has not been able to figure out the problem, but I've seen in my own simulations that it definitely happens, and I have not been able to find out what it is about the data that "cause" the problem.

1.       In the configural model, the parameters that are equal for the two groups are the loadings of the marker items (=1), the factor means (=0), the variances of the ordinal items (=1), and the intercepts of the ordinal items (=0). If I’m interpreting correctly, these are the minimum conditions for model identification. The first two are the usual ones for fixing the location and scale of the latent variables for models with continuous items; the last two are for fixing the location and scale of the latent response variables that together with the thresholds relate the observed probability structure of the ordinal items with the latent variables. Is this correct?

I would not choose a marker variable unless you are already certain it is invariant. The default configural model that lavaan uses is a fine starting point.   parameterization = "theta" is only necessary if you are interested in testing strict invariance (which it appears you are. See the paper I linked to (same note for questions 2-6).

Thank you very much for the information about Millsap & Yun-Tein. The convergence problems are not coming from this specification, because I was not using it. They may be due to the identification constraints not being released properly, or to problems in the model(s). Overriding the marker items default marginally improved the fit indices but had no practical consequences (e.g. preventing convergence problems), so I will drop it. The "theta" parameterization is for testing strict invariance.

how to obtain the factor means and factor variance-covariance matrix for comparing group factor structures.

inspect(fit, "mean.lv")
inspect(fit, "cov.lv")

Or if you want their associated Wald tests, use summary() or parameterEstimates()

Sorry. My fault. Thank you also for the information on the Wald tests.

CONVERGENCE PROBLEMS

If you would like help with that, please provide a working example that does not involve your own wrapper function that needlessly complicates it from our perspective. Hopefully, your convergence problems will not persist once you follow more up-do-date advice, but if you have a model that does not converge, please simply provide R syntax to fit that one model to data (and if possible, provide some data so we can reproduce the problem and perhaps locate the cause).

The wrapper function comes from earlier experiments in which I varied the estimator methods and treatment of the variables. I will remove it from the code. I intend to create separate models so as to control the imposition/release of parameter constraints, once I figure out how to do that without interfering with the defaults of lavaa::cfa. Then I will come back with a shorter post, cleaner code and (hopefully) better results.

Good luck!

Terrence D. Jorgensen
Postdoctoral Researcher, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam
UvA web page: http://www.uva.nl/profile/t.d.jorgensen

I will need it!

Thank you very, very much.
Sincerely yours,

Carlos M. Lemos

Terrence Jorgensen

unread,

Mar 18, 2018, 11:38:49 AM3/18/18

to lavaan

- Imposing equality of the thresholds first (as done in LISREL) has the effect of stepping from the transformation in eq.(4) to that in eq.(3) in the paper, after which the sequence (and hierarchy) of constraints follows as for the continuous case. If this is correct, it means that in the ordinal case metric invariance is subdivided in two steps (equal thresholds first, then equal loadings);

Correct.

- I have to study carefully which/how identification constraints are released as the invariance constraints are imposed, without interfering with lavaan::cfa defaults created by the group.equal argument. This means careful reading of section 5 in the paper, and maybe setting up different models for each invariance level;

Yes, read carefully, and I would not advise using the group.equal= argument at all. Specify your models carefully, without relying on default tricks that are only meant for simple cases.

lemos...@gmail.com

unread,

Mar 19, 2018, 5:04:31 AM3/19/18

to lavaan

Dear Dr. Terrence Jorgensen,

Thank you once again for your help and advice. I went over the code of semTools::measurementInvarianceCat (which does not treat the mixed continuous-ordinal case) and read parts of the Mplus user's manual that mention constraints and convergence problems with MGCFA of ordinal items.
In line with your advice, I found that implementing the procedure in Wu & Estabrook requires building the models "by hand". This is because the intercepts for the ordinal items must be freed (not fixed at zero, which restricts line 3 of the transformation in eq.(4) of Wu & Estabrook), and the identification conditions I'm using for fixing the location and scales of the factors are different from those in Wu & Estabrook (mine are based on marker items).
I'm thinking of alternatives to do that, via the parameter table.

lemos...@gmail.com

unread,

Mar 28, 2018, 6:51:43 AM3/28/18

to lavaan

Dear Dr. Terrence Johnson,

Following your advice, I tried to implement a sequence of invariance tests for ordinal (or mixed continuous-ordinal) items based on the paper by Wu & Estabrook, in the suggested order, trying to implement the corresponding equations in the papers' sections. I first produced a parameter table “as close as possible to the desired level” via “cfa” with “do.fit = FALSE”. Then, using the auxiliary functions for freeing and fixing parameters and imposing constraints in “measurementInvarianceCat” by Sunthud Pornprasertmanit, Yves Rosseel and you, I released/fixed the desired parameters. Finally, I computed each solution with a call with the parTable instead of the model with “do.fit = TRUE”. I examined the parameter tables and tested the method with a simplified model with a single factor measured by four ordinal items (V30, V31, V32, and V33, all related to afterlife beliefs). In the configural model, I compared the MYT parameterization with the lavaan::cfa defaults. The difference is that in the latter the scales of the latent response variables are identified independently for all groups (as if they were reference), whereas in the former the scales for non-reference groups are picked from the reference group via the constrained thresholds. Both parameterizations yielded the same fit measures, so I used the lavaan::cfa default.

Here is my code, for a single factor two-group (female, male) model:

# ===============================================================================================

> free.intercepts.fix.means <- function(pt,ngroups,intercepts.to.free,means.to.fix) {

+ for(var in intercepts.to.free) pt <- freeParTable(pt,var,"~1","",2:ngroups,NA)

+ #for(var in residuals.to.free) pt <- freeParTable(pt,var,"~~","",2:ngroups,NA) # lavaan default

+ for (fact in means.to.fix) pt <- fixParTable(pt,fact,"~1","",2:ngroups,0)

+ return(pt)

+ }

>

> fix.all.intercepts <- function(pt,ngroups,intercepts.to.fix) {

+ for (group in 1:ngroups) {

+ for(var in intercepts.to.fix) {

+ pt <- fixParTable(pt,var,"~1","",group,0)

+ }

+ return(pt)

+ }

> # ===============================================================================================

>

> # PART I - LOAD THE DATA FRAME, SELECT FACTOR ITEMS AND GROUPS

> # ============================================================

> religion <- get(load("../ISSP_CFA_Christian_And_Unaffiliated_Data_Frame.RData"))

> source("../lavaanParTable.R") # Auxiliary functions freeParTable, fixParTable

>

> # THESE ARE THE ITEMS FOR ALL FACTORS EXAMINED FOR RELIABILITY IN EFA:

> cfa.selected.items <- c("V14","V15","V28","V29","V30","V31","V32","V33","V35","V37",

+ "V46","V47","V48","V49","V51","ATTEND")

>

> # THESE ARE THE SOCIODEMOGRAPHIC VARIABLES (GROUPINGS) FOR MGCFA ANALYSIS:

> cfa.groups <- c("AGE","MARITAL","SEX","DEGREE","WRKST","RELIGGRP","COUNTRY.NAME")

> religion <- religion[, c(cfa.selected.items,cfa.groups)]

>

> # CONVERT ITEMS IN DATA FRAME TO "NUMERIC"

> for (i in 1:length(cfa.selected.items)) religion[,i] <- as.numeric(religion[,i])

>

> # PART II - SPECIFY THE MODEL(s) AND COMPUTE MEASUREMENT INVARIANCE

> # =================================================================

> # DECLARE ORDINAL ITEMS; ALL OTHER ITEMS ARE TREATED AS NUMERIC

> ordered.items <- c("V30","V31","V32","V33","V35","V37")

>

> # SPECIFY THE MODEL

> model <- 'afterlife.beliefs =~ V30 + V31 + V32 + V33

+ '

>

> fit.indices <- c("chisq", "df", "pvalue", "cfi",

+ "rmsea", "rmsea.ci.lower", "rmsea.ci.upper",

+ "srmr", "pgfi", "pnfi")

>

> # TEST FOR "SEX"

> # ==============

> # CONFIGURAL MODEL (LAVAAN DEFAULTS, EACH GROUP IDENTIFIED INDEPENDENTLY)

> configural.SEX <- cfa(data = religion[-which(is.na(religion$SEX)),], model = model,

+ ordered = ordered.items,

+ group = "SEX", group.equal = "",

+ estimator = "WLSMV",

+ parameterization = "theta", do.fit = TRUE)

>

> # Number of groups, variable and factor names for invariance tests

> ngroups <- max(partable(configural.SEX)$group)

> intercepts.to.free <- c("V30","V31","V32","V33")

> intercepts.to.fix <- intercepts.to.free

> residuals.to.free <- intercepts.to.free # not used, lavaan default

> means.to.fix <- "afterlife.beliefs"

>

> # INVARIANT THRESHOLDS (WU & ESTABROOK, SEC. 5.1, EQ.(15)

> pt <- cfa(data = religion[-which(is.na(religion$SEX)),], model = model,

+ ordered = ordered.items,

+ group = "SEX", group.equal = "thresholds",

+ parameterization = "theta", do.fit = FALSE)@ParTable

> pt <- free.intercepts.fix.means(pt,ngroups,intercepts.to.free,means.to.fix)

> thresholds.SEX <- cfa(data = religion[-which(is.na(religion$SEX)),], model = pt,

+ ordered = ordered.items,

+ group = "SEX",

+ estimator = "WLSMV",

+ parameterization = "theta", do.fit = TRUE)

>

> # METRIC: INVARIANT THRESHOLDS AND LOADINGS (WU & ESTABROOK, SEC. 5.3, EQ.(19)

> pt <- cfa(data = religion[-which(is.na(religion$SEX)),], model = model,

+ ordered = ordered.items,

+ group = "SEX", group.equal = c("thresholds","loadings"),

+ parameterization = "theta", do.fit = FALSE)@ParTable

> pt <- free.intercepts.fix.means(pt,ngroups,intercepts.to.free,means.to.fix)

> metric.SEX <- cfa(data = religion[-which(is.na(religion$SEX)),], model = pt,

+ ordered = ordered.items,

+ group = "SEX",

+ estimator = "WLSMV",

+ parameterization = "theta", do.fit = TRUE)

>

> # SCALAR: INVARIANT THRESHOLDS, LOADINGS AND INTERCEPTS, WU & ESTABROOK SEC. 5.7, EQ.(27)

> pt <- cfa(data = religion[-which(is.na(religion$SEX)),], model = model,

+ ordered = ordered.items,

+ group = "SEX", group.equal = c("thresholds","loadings","intercepts"),

+ parameterization = "theta", do.fit = FALSE)@ParTable

> pt <- fix.all.intercepts(pt,ngroups,intercepts.to.fix)

> scalar.SEX <- cfa(data = religion[-which(is.na(religion$SEX)),], model = pt,

+ ordered = ordered.items,

+ group = "SEX",

+ estimator = "WLSMV",

+ parameterization = "theta", do.fit = TRUE)

>

> # STRICT: INVARIANT THRESHOLDS, LOADINGS, INTERCEPTS AND RESIDUAL VARIANCES, SEC. 5, EQ.(9)

> pt <- cfa(data = religion[-which(is.na(religion$SEX)),], model = model,

+ ordered = ordered.items,

+ group = "SEX", group.equal = c("thresholds","loadings","intercepts","residuals"),

+ parameterization = "theta", do.fit = FALSE)@ParTable

> pt <- fix.all.intercepts(pt,ngroups,intercepts.to.fix)

> strict.SEX <- cfa(data = religion[-which(is.na(religion$SEX)),], model = pt,

+ ordered = ordered.items,

+ group = "SEX",

+ estimator = "WLSMV",

+ parameterization = "theta", do.fit = TRUE)

>

> # RESULTS FOR "SEX"

> invarianceTable.SEX <- lavInspect(configural.SEX,what = "fit")[fit.indices]

> invarianceTable.SEX <- rbind(invarianceTable.SEX,lavInspect(thresholds.SEX,what = "fit")[fit.indices])

> invarianceTable.SEX <- rbind(invarianceTable.SEX,lavInspect(metric.SEX,what = "fit")[fit.indices])

> invarianceTable.SEX <- rbind(invarianceTable.SEX,lavInspect(scalar.SEX,what = "fit")[fit.indices])

> invarianceTable.SEX <- rbind(invarianceTable.SEX,lavInspect(strict.SEX,what = "fit")[fit.indices])

> rownames(invarianceTable.SEX) <- c("configural","thresholds","loadings","intercepts","residuals")

> invarianceTable.SEX

chisq df pvalue cfi rmsea rmsea.ci.lower rmsea.ci.upper srmr pgfi pnfi

configural 190.1588 4 0 0.9997429 0.05594906 0.04932433 0.06286872 0.01226153 0.1110806 0.3332458

thresholds 199.6971 8 0 0.9997352 0.04014614 0.03543365 0.04505954 0.01226152 0.2221585 0.6664828

loadings 201.5413 11 0 0.9997368 0.03413337 0.03009711 0.03833838 0.01226922 0.3054671 0.9164115

intercepts 313.5511 14 0 0.9995862 0.03793608 0.03435204 0.04164074 0.01220892 0.3887197 1.1661614

residuals 388.9744 18 0 0.9994876 0.03723203 0.03406380 0.04049561 0.01324059 0.4997345 1.4991941

> lavInspect(scalar.SEX, what = "mean.lv")

$Female

afterlife.beliefs

0

$Male

afterlife.beliefs

-0.579

The results look coherent. In particular, they confirm the well known fact that women have stronger beliefs than men. But I have some doubts:

- - In the original paper the factor variance-covariance matrix is set to I for all groups, but in my case I guess this is replaced by the marker item loading providing the scale;

- - I’m not sure the condition for fixing group means (kappa^(g) = 0 in the Wu & Estabrook paper) via fixing “~1” for the factor is correct. I found that the group means are only free and comparable for the scalar and strict levels (which seems correct);

- - For the scalar and strict levels the thresholds are invariant, so they could be picked from the first group. But fixing them explicitly reduced the number of df and slightly changed the fit measures. I think this was done correctly.

This seems to be working for this single factor model. I managed to compute invariance tables for 26 countries without convergence problems. But when I stepped to the three-factor model in the previous post (with marker items chosen by default) and tested for AGE, I got the following error:

# METRIC: INVARIANT THRESHOLDS AND LOADINGS (WU & ESTABROOK, SEC. 5.3, EQ.(19)

> pt <- cfa(data = religion[-which(is.na(religion$AGE)),], model = model,

+ ordered = ordered.items,

+ group = "AGE", group.equal = c("thresholds","loadings"),

+ parameterization = "theta", do.fit = FALSE)@ParTable

> pt <- free.intercepts.fix.means(pt,ngroups,intercepts.to.free,means.to.fix)

> metric.AGE <- cfa(data = religion[-which(is.na(religion$AGE)),], model = pt,

+ ordered = ordered.items,

+ group = "AGE",

+ estimator = "WLSMV",

+ parameterization = "theta", do.fit = TRUE)

Error in nlminb(start = start.x, objective = minimize.this.function, gradient = GRADIENT, :

NA/NaN gradient evaluation

In addition: Warning messages:

1: In sqrt(diag(COV)) : NaNs produced

2: In sqrt(dsigma) : NaNs produced

3: In sqrt(dsigma) : NaNs produced

4: In sqrt(dsigma) : NaNs produced

5: In sqrt(dsigma) : NaNs produced

6: In sqrt(dsigma) : NaNs produced

although the solution converged for the next (scalar, strict) levels. For the countries, this message appears for all models and no solution is computed. I don’t know if this is due to an identification problem, or imposing incorrect constraints, or to problems in the data.

Many thanks for your great help so far!

Best regards,

Carlos M. Lemos

Terrence Jorgensen

unread,

Mar 28, 2018, 8:55:46 AM3/28/18

to lavaan

Dear Dr. Terrence Johnson,

Jorgensen :-) Terrence is fine.

I first produced a parameter table “as close as possible to the desired level” via “cfa” with “do.fit = FALSE”.

I appreciate starting with a simpler, 1-factor model as an example. May I also suggest starting with 2 groups before expanding to 26?

I also understand why you want to automate this in some way, given you have 26 groups, which would make the syntax quite hard to specify manually. However, the hidden semTools functions (freeParTable, fixParTable) are not keeping up to date with the changes to the parameter table that lavaan specifies. Eventually, I will phase those out. I also recommend not relying on them when asking for help here, because it limits who can/will offer help.

You can still automate using the paste() function. For example, here is how you can specify the parameter labels for factor loadings of one item (start small, then work up):

V <- 31 # variable 31
G <- 1:3 # 3 groups, for proof-of-concept
(parLabels <- paste0("L", V, ".g", G, collapse = ", ")) # "L31.g1, L31.g2, L31.g3"
paste0("F1 =~ c(", parLabels, ")*V", V)

[1] "F1 =~ c(L31.g1, L31.g2, L31.g3)*V31"

Capitalizing on the fact that you can pass a vector of character strings as the lavaan model (rather than everything in a single pair of quotation marks), you can loop through variables to create your model:

model <- character(0)
for (V in 30:33) {
  parLabels <- paste0("L", V, ".g", G, collapse = ", ")
  model <- c(model, paste0("F1 =~ c(", parLabels, ")*V", V))
}
cat(model, sep = "\n")

F1 =~ c(L30.g1, L30.g2, L30.g3)*V30

F1 =~ c(L31.g1, L31.g2, L31.g3)*V31

F1 =~ c(L32.g1, L32.g2, L32.g3)*V32

F1 =~ c(L33.g1, L33.g2, L33.g3)*V33

Then it is easy enough to scale up by setting G <- 1:26, and to loop over another set of V variables to define a different factor, and simply add that to the vector to make a larger model. It is also simple to adjust what is being pasted to define different parameters ("~~" and "~1").

To equality-constrain parameters across groups, give them the same labels across groups (i.e., get rid of the ".g", G differences in the labels) by repeating the same label with the rep() function. To make equal loadings:

model <- character(0)
for (V in 30:33) {
  parLabels <- paste0("L", rep(V, length(G)), collapse = ", ")
  model <- c(model, paste0("F1 =~ c(", parLabels, ")*V", V))
}
cat(model, sep = "\n")

F1 =~ c(L30, L30, L30)*V30

F1 =~ c(L31, L31, L31)*V31

F1 =~ c(L32, L32, L32)*V32

F1 =~ c(L33, L33, L33)*V33

Again, same principal applies for specifying other types of parameters. If you are going to need to ask for help (understandable, given how complex this is, and too new to expect tutorial applications yet), I would recommend this because it makes easily reproducible syntax by any reader, and it gives you a way to check your model specifications visually in a way that is probably easier than inspecting a parameter table with so many rows.

In the configural model, I compared the MYT parameterization with the lavaan::cfa defaults. The difference is that in the latter the scales of the latent response variables are identified independently for all groups (as if they were reference), whereas in the former the scales for non-reference groups are picked from the reference group via the constrained thresholds. Both parameterizations yielded the same fit measures, so I used the lavaan::cfa default.

That is what I recommend. As the article shows, the models are equivalent, but lavaan's (actually, R's) optimizer has trouble sometimes with the MYT parameterization. You seem to want to test equivalence of residual variances, so make sure to set parameterization = "theta" rather than the default "delta".

The results look coherent.

I would very carefully count the number of parameters you fix and free between each step (configural to thresholds, thresholds to loadings, etc.), so that you can verify that you gain as many df as you expect.

- In the original paper the factor variance-covariance matrix is set to I for all groups, but in my case I guess this is replaced by the marker item loading providing the scale;

Is MYT the "original"? They estimated the factor covariance matrix and fixed a loading. I do not recommend using a marker variable, because it biases subsequent tests of which item(s) has/have DIF, assuming a particular level of invariance is rejected.

- - I’m not sure the condition for fixing group means (kappa^(g) = 0 in the Wu & Estabrook paper) via fixing “~1” for the factor is correct. I found that the group means are only free and comparable for the scalar and strict levels (which seems correct);

That is correct. Fix latent means to zero in all groups, until you constrain intercepts to equality, in which case only 1 group's mean has to be fixed to zero.

- - For the scalar and strict levels the thresholds are invariant, so they could be picked from the first group. But fixing them explicitly reduced the number of df and slightly changed the fit measures. I think this was done correctly.

You don't fix them, you just equality-constrain the estimates. They don't belong to any one group, they are simultaneously the estimates for all groups.

This seems to be working for this single factor model. I managed to compute invariance tables for 26 countries without convergence problems. But when I stepped to the three-factor model in the previous post (with marker items chosen by default) and tested for AGE, I got the following error:

To narrow it down further, try fitting 1-factor models for each factor, to see where the problem is before trying a 3-factor model. If all three 1-factor models converge, then the 3-factor model might just be too large for the optimizer to handle. The gradient is the first derivate of the vector of model parameters, and it tells the optimizer how to update the estimates in the next iteration. So if the gradient cannot be calculated, convergence can't happen because the optimizer just doesn't know where to look for the solution.

lemos...@gmail.com

unread,

Mar 29, 2018, 7:29:21 AM3/29/18

to lavaan

Dear Terrence,

Thank you very much once again for your invaluable help. I have more homework, but I also want to send some feedback.

I appreciate starting with a simpler, 1-factor model as an example. May I also suggest starting with 2 groups before expanding to 26?

Yes, I always start with two groups (SEX) which is my first simpler test. Then I check if it works for other groupings to see if I get a coherent and convergent solution. The most difficult is countries, which involves a large number of parameter.

I also understand why you want to automate this in some way, given you have 26 groups, which would make the syntax quite hard to specify manually. However, the hidden semTools functions (freeParTable, fixParTable) are not keeping up to date with the changes to the parameter table that lavaan specifies. Eventually, I will phase those out. I also recommend not relying on them when asking for help here, because it limits who can/will offer help.

You can still automate using the paste() function. For example, here is how you can specify the parameter labels for factor loadings of one item (start small, then work up):

Thank you very much for the suggestion and the fine code snippets. That will require revising the code entirely, but I will see how fast I can do it.

Is MYT the "original"? They estimated the factor covariance matrix and fixed a loading. I do not recommend using a marker variable, because it biases subsequent tests of which item(s) has/have DIF, assuming a particular level of invariance is rejected.

Sorry for my imprecision - here "original" was Wu & Estabrook. I used the marker items to let the factor variances free. If I fix them I get a factor correlation matrix, which should be fine. I would have to fix the factor variances in the model and use std.lv = TRUE, and see if this leads to more convergence issues.

To narrow it down further, try fitting 1-factor models for each factor, to see where the problem is before trying a 3-factor model. If all three 1-factor models converge, then the 3-factor model might just be too large for the optimizer to handle. The gradient is the first derivate of the vector of model parameters, and it tells the optimizer how to update the estimates in the next iteration. So if the gradient cannot be calculated, convergence can't happen because the optimizer just doesn't know where to look for the solution.

I think I cannot test each factor separately because in the 3 factor model there are only 3 items per factor (I only use V30, V31 and V32 to simplify the full model and avoid convergence problems). In the full model I also have the problem that one of the factors has an item that is best treated as continuous (V28) and the other two as ordinal (V35 and V37). I will have to see how to deal with this.

It's strange that sometimes the full model breaks down at e.g. (thresholds + loadings) invariance but converges for the more restrictive levels. I think I'm doing the identification(s) correctly, because when the solutions converge the standard errors are computed (the routines issue warnings when an identification problem is suspected to exist).

Once again, thank you very much for your help. I will now try to implement your suggestions.

Terrence Jorgensen

unread,

Mar 30, 2018, 10:38:08 AM3/30/18

to lavaan

I would have to fix the factor variances in the model and use std.lv = TRUE, and see if this leads to more convergence issues.

When you use that argument to fix the latent variances to 1, make sure you specify in the model syntax that all but the first group's variance is free when loadings are constrained to equality. In a 2-group example, use NA to say that the value should be estimated for group 2:

F1 ~~ c(1, NA)*F1

This corresponds to the diag(Phi) column in Wu & Estabrook's tables (which are easier to interpret if you standardize the common factor and estimate all the loadings).

I think I cannot test each factor separately because in the 3 factor model there are only 3 items per factor

3 items per factor is a just-identified model, so yes you can fit it in a model by itself. The configural model would have df = 0, but adding equality constraints would still allow you to test invariance by comparing more restricted to less restricted models.

one of the factors has an item that is best treated as continuous (V28) and the other two as ordinal (V35 and V37). I will have to see how to deal with this.

In the configural model, the continuous item's loading, intercept, and residual variance will be free across groups. When constraining the thresholds of the 2 ordinal items to equality, just leave the continuous item as it is (free loading, intercept, and residual variance across groups). After that, you apply constraints to loadings of all items, then intercepts of all items. When you test residual variances for equality, you will constrain (but still estimate) the residual variance of the continuous item, whereas you must fix the residual variance of each categorical item to 1 (which is what the first group's value is fixed to for identification).

It's strange that sometimes the full model breaks down at e.g. (thresholds + loadings) invariance but converges for the more restrictive levels. I think I'm doing the identification(s) correctly, because when the solutions converge the standard errors are computed (the routines issue warnings when an identification problem is suspected to exist).

Another reason it is a good idea to calculate your expected degrees of freedom. You can use the formulas in Wu & Estabrook's tables to calculate your expected degrees of freedom for each model, and consult the figures to calculate the expected change in df (although I found those a little more difficult to follow than the tables).

lemos...@gmail.com

unread,

Apr 4, 2018, 10:08:14 AM4/4/18

to lavaan

Dear Terrence,

After testing the process with the 1-factor model I’m now working on the 9 item/3 factor model:

'afterlife.beliefs =~ V30 + V31 + V32,

belief.importance.God =~ V28 + V35 + V37,

religious.formation =~ V46 + V47 + V48 '

I still did not implement your suggestion of building the models with character strings for a number of reasons (mostly time pressure, and the need to check that I will not miss anything while building the five complete models without any “template” and that "cfa" will not do anything unwanted; on top of this I have 5 different groupings to test).

However, I implemented your other general suggestion of using factor covariances instead of marker items for identifying the factors’ scales, and wrote two very simple functions that show which parameters are fixed, free and constrained for each group. This allowed me to check that the freeParTable and fixParTable functions worked correctly, and that the fixed and free parameters at each step – configural, thresholds, metric (thresholds + loadings), scalar and strict – were set as specified by Wu & Estabrook. Here is an example of the free parameters for groups 1 and 2 of “SEX” for the above model with metric invariance, with all items in the first two factors treated as ordinal and in the third as continuous (more on this later):

> show.free(metric.SEX@ParTable,1)

id lhs op rhs free label plabel ustart

1 1 afterlife.beliefs =~ V30 1 .p1. .p1. NA

2 2 afterlife.beliefs =~ V31 2 .p2. .p2. NA

3 3 afterlife.beliefs =~ V32 3 .p3. .p3. NA

4 4 belief.importance.God =~ V28 4 .p4. .p4. NA

5 5 belief.importance.God =~ V35 5 .p5. .p5. NA

6 6 belief.importance.God =~ V37 6 .p6. .p6. NA

7 7 religious.formation =~ V46 7 .p7. .p7. NA

8 8 religious.formation =~ V47 8 .p8. .p8. NA

9 9 religious.formation =~ V48 9 .p9. .p9. NA

10 10 V30 | t1 10 .p10. .p10. NA

11 11 V30 | t2 11 .p11. .p11. NA

12 12 V30 | t3 12 .p12. .p12. NA

13 13 V31 | t1 13 .p13. .p13. NA

14 14 V31 | t2 14 .p14. .p14. NA

15 15 V31 | t3 15 .p15. .p15. NA

16 16 V32 | t1 16 .p16. .p16. NA

17 17 V32 | t2 17 .p17. .p17. NA

18 18 V32 | t3 18 .p18. .p18. NA

19 19 V28 | t1 19 .p19. .p19. NA

20 20 V28 | t2 20 .p20. .p20. NA

21 21 V28 | t3 21 .p21. .p21. NA

22 22 V28 | t4 22 .p22. .p22. NA

23 23 V28 | t5 23 .p23. .p23. NA

24 24 V35 | t1 24 .p24. .p24. NA

25 25 V35 | t2 25 .p25. .p25. NA

26 26 V35 | t3 26 .p26. .p26. NA

27 27 V35 | t4 27 .p27. .p27. NA

28 28 V37 | t1 28 .p28. .p28. NA

29 29 V37 | t2 29 .p29. .p29. NA

30 30 V37 | t3 30 .p30. .p30. NA

31 31 V37 | t4 31 .p31. .p31. NA

38 38 V46 ~~ V46 32 .p38. NA

39 39 V47 ~~ V47 33 .p39. NA

40 40 V48 ~~ V48 34 .p40. NA

44 44 afterlife.beliefs ~~ belief.importance.God 35 .p44. NA

45 45 afterlife.beliefs ~~ religious.formation 36 .p45. NA

46 46 belief.importance.God ~~ religious.formation 37 .p46. NA

59 59 V46 ~1 38 .p59. NA

60 60 V47 ~1 39 .p60. NA

61 61 V48 ~1 40 .p61. NA

> show.free(metric.SEX@ParTable,2)

id lhs op rhs free label plabel ustart

65 65 afterlife.beliefs =~ V30 41 .p1. .p65. NA

66 66 afterlife.beliefs =~ V31 42 .p2. .p66. NA

67 67 afterlife.beliefs =~ V32 43 .p3. .p67. NA

68 68 belief.importance.God =~ V28 44 .p4. .p68. NA

69 69 belief.importance.God =~ V35 45 .p5. .p69. NA

70 70 belief.importance.God =~ V37 46 .p6. .p70. NA

71 71 religious.formation =~ V46 47 .p7. .p71. NA

72 72 religious.formation =~ V47 48 .p8. .p72. NA

73 73 religious.formation =~ V48 49 .p9. .p73. NA

74 74 V30 | t1 50 .p10. .p74. NA

75 75 V30 | t2 51 .p11. .p75. NA

76 76 V30 | t3 52 .p12. .p76. NA

77 77 V31 | t1 53 .p13. .p77. NA

78 78 V31 | t2 54 .p14. .p78. NA

79 79 V31 | t3 55 .p15. .p79. NA

80 80 V32 | t1 56 .p16. .p80. NA

81 81 V32 | t2 57 .p17. .p81. NA

82 82 V32 | t3 58 .p18. .p82. NA

83 83 V28 | t1 59 .p19. .p83. NA

84 84 V28 | t2 60 .p20. .p84. NA

85 85 V28 | t3 61 .p21. .p85. NA

86 86 V28 | t4 62 .p22. .p86. NA

87 87 V28 | t5 63 .p23. .p87. NA

88 88 V35 | t1 64 .p24. .p88. NA

89 89 V35 | t2 65 .p25. .p89. NA

90 90 V35 | t3 66 .p26. .p90. NA

91 91 V35 | t4 67 .p27. .p91. NA

92 92 V37 | t1 68 .p28. .p92. NA

93 93 V37 | t2 69 .p29. .p93. NA

94 94 V37 | t3 70 .p30. .p94. NA

95 95 V37 | t4 71 .p31. .p95. NA

96 96 V30 ~~ V30 72 .p96. NA

97 97 V31 ~~ V31 73 .p97. NA

98 98 V32 ~~ V32 74 .p98. NA

99 99 V28 ~~ V28 75 .p99. NA

100 100 V35 ~~ V35 76 .p100. NA

101 101 V37 ~~ V37 77 .p101. NA

102 102 V46 ~~ V46 78 .p102. NA

103 103 V47 ~~ V47 79 .p103. NA

104 104 V48 ~~ V48 80 .p104. NA

105 105 afterlife.beliefs ~~ afterlife.beliefs 81 .p105. NA

106 106 belief.importance.God ~~ belief.importance.God 82 .p106. NA

107 107 religious.formation ~~ religious.formation 83 .p107. NA

108 108 afterlife.beliefs ~~ belief.importance.God 84 .p108. NA

109 109 afterlife.beliefs ~~ religious.formation 85 .p109. NA

110 110 belief.importance.God ~~ religious.formation 86 .p110. NA

117 117 V30 ~1 87 .p117. NA

118 118 V31 ~1 88 .p118. NA

119 119 V32 ~1 89 .p119. NA

120 120 V28 ~1 90 .p120. NA

121 121 V35 ~1 91 .p121. NA

122 122 V37 ~1 92 .p122. NA

123 123 V46 ~1 93 .p123. NA

124 124 V47 ~1 94 .p124. NA

125 125 V48 ~1 95 .p125. NA

We can confirm e.g. that thresholds and loadings are constrained; the intercepts of the ordinal items are free in the second group (main point in Wu & Estabrook); the variances of the factors are free in the second group (but not in the first) because the loadings were constrained across groups; and the intercepts of items loading on “religious.formation” (treated as continuous) are free because the group means are (still) fixed.

I checked the fixed/free parameters for reference/non-reference groups at all levels and I think that inspecting the parameter table is indeed very useful. It allowed me to detect some interesting details. For example, I found that when the thresholds of the ordinal items are fixed, “cfa” also constrains the intercepts of the continuous items to be equal across groups (as if these items were ordinal). I successfully circumvented this using “group.partial” to keep the three intercepts (V46, V47 and V48) independent across groups (I can post the code, if you want).

I also followed your very important suggestion of counting the degrees of freedom at each levels. Here is a table for the model above, with two groups (Male, Female):

> round(invarianceTable.SEX,4)

chisq df pvalue cfi rmsea rmsea.ci.lower rmsea.ci.upper srmr pgfi pnfi

configural 365.7118 48 0 0.9997 0.0226 0.0205 0.0248 0.0196 0.3749 0.6664

thresholds 383.3535 58 0 0.9997 0.0208 0.0188 0.0228 0.0196 0.4530 0.8053

loadings 395.6849 64 0 0.9997 0.0200 0.0181 0.0219 0.0199 0.4999 0.8886

intercepts 583.5853 70 0 0.9995 0.0238 0.0220 0.0256 0.0199 0.5466 0.9717

residuals 655.4654 79 0 0.9995 0.0237 0.0221 0.0254 0.0205 0.6169 1.0965

Assuming that the configural model is OK ("cfa" default), I don’t understand why the df of the thresholds-invariant model is 58 (there are 22 thresholds in group 2 = group 1, but 6 free intercepts for the categorical items in group 2 need to be estimated). For invariant loadings the count is 58 + 9 loadings in group 2 = group 1, - 3 factor variances in group 2 that need to be estimated, = 64 (OK). For invariant intercepts the count is 64 + 9 intercepts in group 2 = group 1, - 3 factor means in group 2 that need to be estimated, = 70 (OK). For full invariance, it is 70 + 9 residual variances in group 2 = group 1, making 79 (OK).

I computed invariance tables for the 3-factor model treating all items as ordinal (slightly simpler code), or treating two as ordinal and one as continuous (code that produced the tables above). The conclusions are the same, but the ordinal + continuous treatment yields slightly worse fit indices and creates more convergence problems.

Before trying to change the code of the models (via character strings) my doubts are:

What am I missing with the df count for the thresholds-invariant model?
Is the treatment of the continuous factor correct? The identification constraints for this factor were set as follows: THRESHOLDS – factor means and variances fixed at 0 and 1 for all groups (standard, as in CONFIGURAL); THRESHOLDS + LOADINGS (METRIC): factor means fixed at 0, but factor variances free for groups 2:ngroups (once loadings are constrained, factor scales of non-reference groups are freed); THRESHOLDS + LOADINGS + INTERCEPTS (SCALAR): factor means and variances free for groups 2:ngroups (as intercepts are also constrained, factor means for non-reference groups are freed, and factor means can be compared if invariance holds); FULL: same as SCALAR.
I suspect that the convergence problems have something to do with the data. For example, I know that for several countries there are very few counts on some item categories (e.g. low/high frequency of attendance to religious services). These problems also tend to occur for the scalar model, while the strict model converges. They happen mostly with the “DEGREE” and “COUNTRY” groupings. Is there another robust estimator that I can try (DWLS does not work, it’s worse than WLSMV)? Are there any options that can be passed to the numerical routines that might help minimizing these problems (e.g. under-relaxation parameters, iteration limits, etc.)?
How can I test for structural differences of factor means and factor correlation matrices across groups? Just setting additional contraints on "means", "lv.variances" and "lv.covariances" in "group.equal" or equivalent, and using the same RMSEA and Delta_CFI, etc, criteria as for measurement invariance?

Anyway, I am extremely grateful to you for all your help and suggestions! With these, I was able to make significant progress.

Carlos Miguel Lemos cmrso@iscte-iul.pt

unread,

Apr 4, 2018, 5:38:21 PM4/4/18

to lav...@googlegroups.com

I have one correction to my previous mail: treating one factor as continuous (V46, V47 and V48 have 9 levels) yields better fit measures, not worse. But it also leads to convergence problems with the scalar model (thresholds+loadings+intercepts) when testing for educational degree and country, which do not occur when I treat all items and factors as ordinal.

Best,

Carlos

--
You received this message because you are subscribed to a topic in the Google Groups "lavaan" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/lavaan/ol7rsMXzN5w/unsubscribe.
To unsubscribe from this group and all its topics, send an email to lavaan+unsubscribe@googlegroups.com.
To post to this group, send email to lav...@googlegroups.com.
Visit this group at https://groups.google.com/group/lavaan.
For more options, visit https://groups.google.com/d/optout.

Terrence Jorgensen

unread,

Apr 5, 2018, 8:13:17 AM4/5/18

to lavaan

What am I missing with the df count for the thresholds-invariant model?

In addition to freeing 6 intercepts in group 2, you also free 6 residual variances in group 2 when you constrain 22 thresholds across groups. So overall you gain 22 - 6- 6 = 10 df, which is correct.

Is the treatment of the continuous factor correct?

To be clear, the common factors are all assumed normally distributed (therefore continuous). The indicators can be continuous or ordinal.

The identification constraints for this factor were set as follows: THRESHOLDS – factor means and variances fixed at 0 and 1 for all groups (standard, as in CONFIGURAL); THRESHOLDS + LOADINGS (METRIC): factor means fixed at 0, but factor variances free for groups 2:ngroups (once loadings are constrained, factor scales of non-reference groups are freed); THRESHOLDS + LOADINGS + INTERCEPTS (SCALAR): factor means and variances free for groups 2:ngroups (as intercepts are also constrained, factor means for non-reference groups are freed, and factor means can be compared if invariance holds); FULL: same as SCALAR.

Correct.

I suspect that the convergence problems have something to do with the data. For example, I know that for several countries there are very few counts on some item categories (e.g. low/high frequency of attendance to religious services). These problems also tend to occur for the scalar model, while the strict model converges. They happen mostly with the “DEGREE” and “COUNTRY” groupings. Is there another robust estimator that I can try (DWLS does not work, it’s worse than WLSMV)?

WLSMV is the DWLS estimator, with a mean- and variance-adjusted (MV) test statistic. Categories with few counts can indeed cause problems, since there is very little information with which to estimate thresholds with any precision.

Are there any options that can be passed to the numerical routines that might help minimizing these problems (e.g. under-relaxation parameters, iteration limits, etc.)?

People sometimes collapse categories out of necessity when the same categories were not observed in all groups (e.g., if no men endorsed category 4 for V30). I suppose you could do the same if there were simply too-small counts in extreme categories. You would lose some information about individual differences, but obviously not very much information if the counts are so low that they lead to estimation problems.

How can I test for structural differences of factor means and factor correlation matrices across groups? Just setting additional contraints on "means", "lv.variances" and "lv.covariances" in "group.equal" or equivalent, and using the same RMSEA and Delta_CFI, etc, criteria as for measurement invariance?

The latent means are identified by fixing them to zero in the first group. To test whether they are equal in group 2, also set those equal to zero, then compare the restricted to unrestricted model. Factor correlations are not fixed for identification, so you can add labels to your syntax / parameter tables to constrain the estimates to equality across groups. Not that this is only possible if you first (test whether you can) constrain factor variances to equality across groups by fixing them to 1 in the group 1 and 2 (that way, all the factor covariances will be standardized, i.e., correlations).

kma...@aol.com

unread,

Apr 5, 2018, 8:41:52 AM4/5/18

to lavaan

Carlos,
Terrence has answered your questions but I just wanted to add some more general advice about strategy.

First, attempting to debug both the model specification and data-specific issues in one pass can be unnecessarily complicated. When working with a complex analysis, it can be helpful to separate these two aspects of the analysis into sequential steps. First, make use of the simulateData() function to simulate data based on your model specification. Use that as proof of concept that the proposed analysis accomplishes what you intend and to work out the specification without having to worry about data-specific issues. Once you have that sorted, then apply the resulting models to your data as a separate step.

Second, parameter tables are great. However, when you are puzzled by df, it can also be very helpful to look at the same information another way, in matrix form, by requesting the free parameters. This will assign a unique number to each parameter and show the location of the parameter in the matrix representation of the model.

lavInspect(fit, what='free')

Keith
------------------------
Keith A. Markus
John Jay College of Criminal Justice, CUNY
http://jjcweb.jjay.cuny.edu/kmarkus
Frontiers of Test Validity Theory: Measurement, Causation and Meaning.
http://www.routledge.com/books/details/9781841692203/

lemos...@gmail.com

unread,

Apr 5, 2018, 9:02:46 AM4/5/18

to lavaan

Thank you very much for your clarifications!
I will now try to overcome the convergence problems by working on the data and start building the models via character strings as you suggested.
I found that convergence problems with datasets of this type, and with mixed categorical-numerical variables, have been reported by other authors.
Once again, many thanks for all your help, care and patience!

lemos...@gmail.com

unread,

Apr 5, 2018, 9:08:33 AM4/5/18

to lavaan

Hi Keith,

Thank you also for your very nice advice, which complements Terrence's!
It has been a fantastic experience to interact with this forum - and use lavaan.
Best regards,

Carlos

Reply all

Reply to author

Forward