DIF multiple groups: how can I equalize standardized loadings (instead of unstandardized loadings)?

Michael Paul Grosz

Sep 4, 2014, 3:11:14 PM

to lav...@googlegroups.com

Dear all,

As part of a DIF analysis, I ran an "omnibus test" by constraining all loadings and intercepts to be equal across groups via the group.equal argument (group="gender", group.equal=c("loadings","intercepts")).
However, when I looked at the standardized loadings ("Std.all"), they were not equal. As I understand it, standardized loadings are better suited than unstandardized loadings for comparisons across groups, at least when one wants to compare how precisely an item measures a latent trait, that is, how strongly standing on the latent trait correlates with the answers to a particular item. Thus, I would rather equalize the standardized loadings than the unstandardized loadings.
Does anybody know how I can do that in lavaan?

I already tried to equalize everything that can be equalized through the group.equal argument (i.e., "loadings","intercepts","lv.variances","lv.covariances","residual.covariances","residuals","means"), but the standardized loadings were never equal. I think the reason is that the observed variances differ across groups, so the standardized and unstandardized loadings can't be equal across groups simultaneously. So, I assume we would need to free the unstandardized loadings before we can equalize the standardized loadings.

Furthermore, if anyone has an argument/idea why it is more sensible to equalize the unstandardized loadings than the standardized ("Std.all") loadings, I would be happy to hear it.

best,
Michael

Sunthud Pornprasertmanit

Sep 4, 2014, 4:30:28 PM

to lav...@googlegroups.com

Dear Michael,

To my knowledge, constraining unstandardized loadings and constraining standardized loadings answer different questions. With the unstandardized loadings constrained, we can be sure that if the factor score increases by 1 in both groups, the item scores are expected to change by the same amount. This is the goal of measurement: to assign the same scores when the underlying latent variable has the same value in both groups. Constraining the standardized loadings, however, would answer whether the correlations between latent variables and item scores (for a congeneric factor model) are the same across groups.

Best,

Sunthud

--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
To post to this group, send email to lav...@googlegroups.com.
Visit this group at http://groups.google.com/group/lavaan.
For more options, visit https://groups.google.com/d/optout.

Terrence Jorgensen

Sep 4, 2014, 4:42:57 PM

to lav...@googlegroups.com

> Furthermore, if anyone has an argument/idea why it is more sensible to equalize the unstandardized loadings than the standardized ("Std.all") loadings, I would be happy to hear it.

I agree with Sunthud. To elaborate, the point of equating measurement parameters is not so that you can compare measurement parameters across groups/times. Rather, latent parameters (means, SDs, factor correlations, regressions between latent variables) can only be meaningfully compared across groups/times if the (unstandardized) measurement parameters are equal. If you are merely interested in comparing measurement parameters across groups, you do not need to constrain them -- fit the configural model (allowing loadings and intercepts to differ between groups) and compare your completely standardized estimates, which will tell you how much more strongly each item is related to the common factor in one group than in another. You can still make this comparison after constraining the loadings to equality while allowing latent variances to differ across groups (which would already be the case if you use a marker variable to set the scale), because equal factor loadings do not imply equal item-factor correlations (i.e., standardized loadings; if an item measures more than one factor, its standardized loadings are instead semi-partial item-factor correlations).

If you want to constrain standardized factor loadings, that would only be possible if not only the factor loadings themselves were equivalent, but also the factor variances and residual variances. To implement this constraint in practice, as you seem to have noticed, you would need to start by making the indicators' total variances equivalent across groups (e.g., by transforming raw scores to z scores), then constrain all parameters across groups. In this case, you are completely equating all parameters across groups, in which case you may as well just analyze the data as a single group, ignoring any between-group differences. But if you are interested in comparing latent regressions or means across groups, then equating unstandardized measurement parameters is necessary (at least one per factor), whereas equating standardized measurement parameters has nothing to do with that goal.
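
To make that dependence concrete, here is a minimal sketch in plain Python (not lavaan code) of the usual formula for the completely standardized loading of an item that loads on a single factor. The function name and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def std_all_loading(loading, factor_var, resid_var):
    """Completely standardized loading of an item that loads on one factor."""
    # Model-implied item variance: loading^2 * factor variance + residual variance
    item_var = loading ** 2 * factor_var + resid_var
    return loading * math.sqrt(factor_var) / math.sqrt(item_var)


# Hypothetical groups with the SAME unstandardized loading (0.8) and the same
# residual variance, but different factor variances.
women = std_all_loading(loading=0.8, factor_var=1.0, resid_var=0.36)
men = std_all_loading(loading=0.8, factor_var=2.0, resid_var=0.36)

print(f"women: {women:.3f}, men: {men:.3f}")
```

Under these assumed values the two standardized loadings differ even though the unstandardized loading is identical; they coincide only when the loading, the factor variance, and the residual variance all match across groups.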

Terry

Michael Paul Grosz

Sep 6, 2014, 12:43:40 PM

to lav...@googlegroups.com

Dear Sunthud and Terrence,

Thank you very much for your comments.

I would like to investigate differential item functioning, in particular, whether a one point difference in the standing on the latent trait has (on average) the same consequences on answer behavior to a particular item for women and for men.

I overlooked the fact that unstandardized loadings could actually be useful for that purpose when the latent trait is scaled equally for all groups, because then the unstandardized loadings describe the relationship between a person's standing on the latent trait and the answer to a specific item. Equalizing the unstandardized loadings to investigate DIF would then be reasonable. A remaining question is: Is the latent trait equally scaled for men and women? That is, does a one-point difference in factor score correspond to exactly the same difference in standing on the latent trait for men and for women?
I assume it does, because with the group.equal=c("loadings","intercepts") argument I only equalize the loadings and intercepts, but the factor scores for men and women are still calculated on the same scale, right?

best,

Michael

Michael Paul Grosz

Sep 7, 2014, 11:34:48 AM

to lav...@googlegroups.com

But then again, is the latent trait scaled equally for all groups (in this case: women and men)?
To fix the units of measurement of each latent trait scale, the loading of the first item of every scale is fixed to 1, and that is done separately for men and women. So, the scaling (units of measurement) of the latent trait depends on the relationship between the first item and the latent trait, and that relationship might be different for men and women (in the presence of DIF for item 1). Thus, the latent trait might be scaled differently for women and men, and the same unstandardized loading might express a strong relationship for women and a weaker relationship for men. So, equalizing the unstandardized loadings might not help to investigate whether there is DIF with regard to the loadings/slopes of the items.

So, I am still not completely convinced that comparing a model with equalized unstandardized loadings to a model with freely varying unstandardized loadings is a valid way to investigate DIF.
But please do correct me if my reasoning went wrong somewhere.

It might help to first identify DIF-free indicators of the latent trait and then use one of these items as the first item of the scale to fix the scale.

best,
Michael

Terrence Jorgensen

Sep 8, 2014, 1:28:43 PM

to lav...@googlegroups.com

> In order to fix the units of measurement of each latent trait scale, the loading of the first item of every scale is fixed to 1 and that is done separately for men and women. So, the scaling (units of measurement) of the latent trait depends on the relationship between the first item and the latent trait and that relationship might be different for men and women (in the presence of DIF for item 1). Thus, the latent trait might be scaled differently for women and men and the same unstandardized loading might express a strong relationship for women and a less strong relationship for men. So, equalizing the unstandardized loadings might not help to investigate whether there is DIF with regard to the loadings/slopes of the items.

The use of the word "scale" can be ambiguous, so let's first consider an example with an observable variable: height (in centimeters). The mean height differs between sexes, and potentially the SD of height varies as well. But a one-unit change in height is the same for both sexes (i.e., a 1-cm difference). If men have more variable heights than women, that does not mean that they aren't on the same measurement scale as women's heights; it simply means that men's distribution of heights covers a greater range of the continuum of all possible heights, whereas women's distribution of heights covers a smaller range in the same continuum. Thus, different within-group variances do NOT imply different "scales" of measurement for those groups.

This fact is a bit less obvious when we use arbitrary scales such as 5-point Likert scales, but if men and women in your sample both responded on the same arbitrary "continuum" (actually, an ordinal categorical approximation of an assumed underlying continuum), then a 1-point change for men is the same as a 1-point change for women. Differing variances between sexes simply means that a 1-point (unstandardized) difference might translate to a 1/2-SD (standardized) difference for men but a 1/4-SD (standardized) difference for women. Again, those standardized (units of SD) differences are not in the "actual" units (even if those are arbitrary Likert-scale units), so they give you different information. That is, a 1-SD difference among men might be less extreme than a 1-SD difference among women, but that is ONLY informative about relative differences between scores within the same group, not about the differences between individual scores irrespective of groups that can be made on the original unstandardized scale.
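
As a small arithmetic sketch of the point above, in plain Python: the within-group SDs below are hypothetical, picked only so that the same 1-point raw difference reproduces the 1/2-SD and 1/4-SD figures used as an example.

```python
def raw_to_sd_units(raw_difference, group_sd):
    """Express a raw-score difference in units of a group's standard deviation."""
    return raw_difference / group_sd


# Hypothetical within-group SDs on the same raw response scale
sd_men, sd_women = 2.0, 4.0

# The same 1-point raw difference, expressed in each group's SD units
men = raw_to_sd_units(1.0, sd_men)
women = raw_to_sd_units(1.0, sd_women)

print(men, women)
```

The raw difference is identical in both groups; only its standardized expression changes, because each group is divided by its own SD.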

> A remaining question is/was: Is the latent trait equally scaled for men and women? So, does a one point difference in factor score correspond to the exact same difference in standing on the latent trait for men and for women?

Yes -- although the scale is arbitrary because we haven't actually observed the common factor(s), an unstandardized 1-unit difference on the latent trait is assumed to be equivalent (same scale) in both groups IF at least one factor loading can be constrained across groups.

> I assume it does because by the group.equal=c("loadings","intercepts") argument I only equalize the loadings and intercepts but the factor scores for men and women are still calculated on the same scale, right?

Correct -- equating loadings and intercepts does not constrain the factor M and SD to be equal, but it does make it possible to compare men's to women's factor M and SD because they are now using a common (but arbitrary) scale of measurement. If you do not equate any loadings/intercepts, then the latent variables are not using the same scale of measurement, so factor means and SDs cannot be compared.

> It might help to first identify DIF-free indicators of the latent trait and then use one of these items as the first item of the scale to fix the scale.

Yes, you should absolutely use a DIF-free item (called an "anchor item" in the IRT literature) as the marker variable if that is how you set the scale for the latent factor(s).
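
The role of the marker item can be illustrated with a small sketch in plain Python (not lavaan code). It assumes a one-factor model and hypothetical "true" loadings expressed on a common metric, where only item 1 has a different loading for men; the function name and all numbers are made up for illustration.

```python
def rescale_to_marker(loadings, factor_var, marker):
    """Re-express a one-factor solution so that loadings[marker] equals 1.

    Dividing every loading by the marker's loading and multiplying the factor
    variance by its square leaves the model-implied item variances unchanged.
    """
    scale = loadings[marker]
    return [l / scale for l in loadings], factor_var * scale ** 2


# Hypothetical loadings on a common metric; item 1 (index 0) has DIF.
women_true = [1.0, 0.8, 0.6]
men_true = [0.5, 0.8, 0.6]

# Marker = item 1 (has DIF): the invariant items 2 and 3 now look different.
women_bad, _ = rescale_to_marker(women_true, 1.0, marker=0)
men_bad, _ = rescale_to_marker(men_true, 1.0, marker=0)

# Marker = item 2 (DIF-free): only item 1 differs between groups.
women_ok, _ = rescale_to_marker(women_true, 1.0, marker=1)
men_ok, _ = rescale_to_marker(men_true, 1.0, marker=1)

print(women_bad, men_bad)
print(women_ok, men_ok)
```

In this made-up case, fixing the DIF item to 1 in both groups puts the two factors on different units, so items 2 and 3 appear non-invariant although they are not; anchoring on item 2 isolates the difference to item 1.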

Terry

Michael Paul Grosz

Sep 8, 2014, 7:28:14 PM

to lav...@googlegroups.com

Dear Terrence,

Thanks for your detailed reply.
I think we now mainly agree on two points:
(1) standardized loadings are not helpful when one wants to investigate DIF,
and (2) unstandardized loadings are also problematic if the first item, which fixes the scale, is not DIF-free (i.e., not a DIF-free anchor item). If there is (non-uniform) DIF in the item that sets the scale, then a one-unit difference corresponds to an unequal difference in standing on the latent trait for men compared to women.

So, in the end, we just need to find a DIF-free anchor item and then work with the unstandardized loadings. That might be tricky, because in order to find a DIF-free anchor item we seem to need a DIF-free anchor item, but there are ways to handle this issue, at least in the IRT context (e.g., Meade & Wright (2012): Solving the Measurement Invariance Anchor Item Problem in Item Response Theory). Some of them might also be applicable in the lavaan framework.

cheers,
Michael
