Negative factor loadings in measurement invariance


ahmad

Aug 13, 2025, 10:13:26 AM
to lavaan

Hi,

I am examining measurement invariance of ADHD items in neurodevelopmental populations. When I used the scale in the full sample, all factor loadings showed positive relationships with the latent factor. However, in the ADHD group, three out of five items had negative factor loadings.

I checked the correlations between items outside of the CFA in the ADHD group and found that these three items were negatively correlated with the other two items. This is not due to reversed scoring or data errors. It reflects the actual data pattern.

My question is: how should I handle these items when testing measurement invariance? In the ADHD group, three items show negative factor loadings while the other two show positive loadings, yet in the full sample, all five items load positively. There are two main issues:

  1. The contrast between the full sample and the ADHD group.

  2. The contrast within the ADHD group itself.

I would appreciate guidance on addressing this to establish measurement invariance.


Best,

A

Jeremy Miles

Aug 13, 2025, 12:53:27 PM
to lav...@googlegroups.com
This does not surprise me.  I don't think there's anything to address.

You are splitting the sample on the very thing you are measuring, so you are restricting the range. This reduces the variances of the items, and it reduces the correlations. It's possible that they will shrink to zero, and if the correlations are small and negative, you're just seeing random variation around zero.

To get an ADHD diagnosis, you need to score highly on at least one of the dimensions of ADHD. By splitting the sample, you are selecting people who scored high on one dimension, and regression to the mean means they are likely to score lower on the others.

Here's an example: think of an assessment of kids in school for various athletic activities. You assess kids on the 100 m run, 1 mile run, 5 mile run, high jump, soccer, basketball, etc. You would probably find that all of the variables are highly correlated, and you might find a single factor - which you might call sportiness. People who are good at one athletic activity tend to be good at many. (The fastest marathon runner covers every 100 meters in about 17 seconds - I don't think I could do that for a single 100 meters - and every mile in about 4.6 minutes, which I definitely can't do.) I would imagine that person would also beat me at soccer, basketball, long jump, and anything else you can think of.

But take the kids who score very highly on the athletics factor - or use some external criterion, like kids who represented their school at athletics - and calculate the correlations. Some will be better at the 1 mile run and worse at 100 meters (compared to the other sporty kids, not compared to the population in general). Some will be good at soccer and worse at basketball. If you look at the correlations, you might not be surprised to find that they are negative - you don't train for both 100 meter runs and 5 mile runs. So you will not find measurement invariance.

I wouldn't test invariance. I don't think it's an interesting thing to do, except from the perspective of demonstrating that this effect happens.

(You could play around with the simsem package to show that this happens [or does not happen, if I'm wrong]. I suspect that if everything is multivariate normal, you won't see this effect - but that's rather rare).
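A quick base-R sketch of that selection effect (no simsem needed; a hypothetical two-item example, selecting on the sum score):

```r
set.seed(1)
n  <- 100000
f  <- rnorm(n)                              # latent "severity" factor
x1 <- 0.7 * f + sqrt(1 - 0.7^2) * rnorm(n)  # two items, each loading .7
x2 <- 0.7 * f + sqrt(1 - 0.7^2) * rnorm(n)
cor(x1, x2)                                 # about .49 in the full sample

top <- (x1 + x2) > quantile(x1 + x2, 0.90)  # "diagnose" the top 10%
cor(x1[top], x2[top])                       # about -.3 in the selected group
```

Selecting on the sum score flips the item correlation from about .49 to clearly negative, with no reverse scoring and no data errors involved.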

Jeremy




--
You received this message because you are subscribed to the Google Groups "lavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lavaan+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/lavaan/ed1d3dd8-987e-49c2-8ae3-3d035f562d4dn%40googlegroups.com.

ahmad

Aug 14, 2025, 7:57:19 AM
to lavaan
Thank you for your response, Jeremy.

However, I need to know how to proceed despite this issue as I need to use multigroup SEM to compare these groups. 

Best,
A

Jeremy Miles

Aug 14, 2025, 12:07:09 PM
to lav...@googlegroups.com
On Thu, 14 Aug 2025 at 04:57, ahmad <valik...@gmail.com> wrote:
Thank you for your response, Jeremy.

However, I need to know how to proceed despite this issue as I need to use multigroup SEM to compare these groups. 


I don't see any reason you  can't use multigroup SEM. The scale will probably not be invariant. You should not be surprised by this.

Out of interest, why do you need to do this?

Jeremy

 


ahmad

Aug 15, 2025, 7:37:36 AM
to lavaan

So you’re saying that even if some factor loadings are negative while others are positive in the ADHD group, whereas all factor loadings are positive in the full sample, I can still proceed to run the multigroup SEM, right?

If I cannot establish measurement invariance, can I still examine my model separately in the ADHD population, even if some factor loadings are positive and some are negative? Is that correct?

I’m doing this because I have a mediation model, and I want to see whether the direct and indirect paths are the same across groups.


Thanks,

A

Edward Rigdon

Aug 15, 2025, 10:08:12 AM
to lav...@googlegroups.com
Ahmad--
The failure of measurement invariance means that you cannot dismiss this confound: results may differ across groups either because the behaviors of the factors are different or because the behaviors of the indicators are different. This, in turn, may mean that comparisons of the factors' behaviors across the two groups are not interesting or not persuasive as evidence for a theoretical model that proposes differences or similarities in the factors' behaviors. Your results could be valuable for understanding the behavior of your indicators in the neurodivergent population, especially if there is a history of the indicators being used in such a population.


ahmad

Aug 15, 2025, 11:03:58 AM
to lavaan
Thank you for your response. In your opinion, what would be the best practice here? Should I run the mediation model separately for each group?
Best,
A

Jeremy Miles

Aug 15, 2025, 1:59:33 PM
to lav...@googlegroups.com
Ed: Thanks for chipping in.

Ahmad: You can't interpret a mediation model where the measurement models are different, because if the parameters differ across groups, this doesn't tell you whether it's because of differences in the structural model or in the measurement model.

Where does the ADHD measure come in the mediation model: is it the predictor (X), the mediator (M), or the outcome (Y)? Sorry to apply the XY problem to this, but why are you doing it? It seems like you are splitting your sample based on a measure, and then putting that measure into the model (or maybe it's not the same measure, but it is presumably highly correlated with the model). This is what you would do if you were testing some sort of interaction effect or some sort of non-linear effect (depending on where the ADHD measure sits in the mediation model). Is that what you want to do? If so, I think there might be better ways.
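One way to sketch that interaction approach (hypothetical composite scores x, m, y and a continuous adhd severity score; data simulated here only so the example runs) is a moderated-mediation check in base R, keeping ADHD continuous rather than splitting the sample:

```r
set.seed(42)
n    <- 5000
adhd <- rnorm(n)                    # continuous ADHD severity (not split)
x    <- rnorm(n)
m    <- 0.3 * x + 0.2 * adhd + 0.25 * x * adhd + rnorm(n)  # moderated a-path
y    <- 0.4 * m + 0.1 * x + rnorm(n)

fit_a <- lm(m ~ x * adhd)           # a-path, with an x-by-ADHD interaction
fit_b <- lm(y ~ m + x)              # b-path and direct effect
a3 <- unname(coef(fit_a)["x:adhd"]) # how much the a-path shifts with ADHD
b  <- unname(coef(fit_b)["m"])
a3 * b                              # index of moderated mediation
```

If a3 * b is credibly non-zero, the indirect effect depends on ADHD severity, which is the question the group split was trying to answer, without the range-restriction problem.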

Jeremy


Alejandro Martínez Menéndez

Aug 16, 2025, 3:18:12 AM
to lavaan
Which groups are you trying to compare, and why? I see no real point in conducting invariance analyses between a reduced, specific subsample (ADHD) and the whole sample (WS) (by "whole sample" I assume you mean the whole sample minus the ADHD subsample, as otherwise you'd be partially comparing observations with themselves). I would expect the two models, ADHD and WS, to hold configural invariance (same factor structure). Non-invariance should not really be a concern (it would even be a valid and interesting result) unless you're trying to validate a new instrument/measure.

In this specific scenario, failing to establish metric invariance (for simplicity, let's set aside potential threshold invariance, as we don't know whether the observed variables are categorical) would mean that the established "ADHD construct" has significantly different meanings for actually diagnosed ADHD individuals and for normotypical subjects. Consequently, you could not reliably measure the "degree of ADHD" that an undiagnosed person may have, since non-ADHD respondents would be "understanding" a completely different construct than ADHD individuals, even when given the same set of questions/exercises/trials.

Normally, for a test to accomplish proper measurement, metric invariance should (at least partially) hold (i.e., the items tap the same construct regardless of sociodemographic group), while scalar invariance would most likely not hold, given that ADHD individuals are expected to have (statistically significant) different observed means/intercepts from non-ADHD people. Residual invariance, on the other hand, should ideally hold, as it would mean that both groups have similar unexplained variance in each item.
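For reference, the usual nested sequence for these invariance tests in lavaan looks like this (a sketch; the one-factor model, the variable names, and the simulated data are placeholders for your own):

```r
library(lavaan)
# Configural -> metric -> scalar sequence. Data simulated here only so
# the example runs; substitute your own data frame and grouping variable.
pop <- "adhd =~ .7*x1 + .7*x2 + .7*x3 + .7*x4 + .7*x5"
dat <- simulateData(pop, sample.nobs = 1000, standardized = TRUE, seed = 1)
dat$group <- rep(c("a", "b"), length.out = nrow(dat))

model <- "adhd =~ x1 + x2 + x3 + x4 + x5"
fit_conf   <- cfa(model, dat, group = "group")                 # configural
fit_metric <- cfa(model, dat, group = "group",
                  group.equal = "loadings")                    # metric
fit_scalar <- cfa(model, dat, group = "group",
                  group.equal = c("loadings", "intercepts"))   # scalar
lavTestLRT(fit_conf, fit_metric, fit_scalar)                   # nested tests
```

Each step constrains more parameters to equality across groups; a significant chi-square difference at a step indicates that that level of invariance does not hold.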

If you're not doing a validation study then, as Jeremy said, you don't really have any kind of issue here.

ahmad

Aug 16, 2025, 5:26:41 AM
to lavaan
 ADHD is modelled as a latent predictor.  I divided the sample based on the ADHD scale, and now I would like to examine whether the mediation model is also applicable to the ADHD group using the same ADHD scale.  

Shu Fai Cheung (張樹輝)

Aug 16, 2025, 6:57:21 AM
to lavaan
A quick demo of the problem Jeremy explained in the first reply:

``` r
library(lavaan)
#> This is lavaan 0.6-19
#> lavaan is FREE software! Please report any bugs.
library(psych)
#>
#> Attaching package: 'psych'
#> The following object is masked from 'package:lavaan':
#>
#>     cor2cov

mod_pop <-
"
f =~ .7 * x1 + .7 * x2 + .7 * x3 + .7 * x4
"

# "empirical = TRUE" is optional. Used here for demo
dat <- simulateData(model = mod_pop,
                    sample.nobs = 10000,
                    standardized = TRUE,
                    empirical = TRUE,
                    seed = 2345)

mod <-
"
f =~ x1 + x2 + x3 + x4
"

fit <- cfa(mod, dat, std.lv = TRUE)
parameterEstimates(fit)
#>   lhs op rhs  est   se      z pvalue ci.lower ci.upper
#> 1   f =~  x1 0.70 0.01 70.849      0    0.681    0.719
#> 2   f =~  x2 0.70 0.01 70.849      0    0.681    0.719
#> 3   f =~  x3 0.70 0.01 70.849      0    0.681    0.719
#> 4   f =~  x4 0.70 0.01 70.849      0    0.681    0.719
#> 5  x1 ~~  x1 0.51 0.01 52.223      0    0.491    0.529
#> 6  x2 ~~  x2 0.51 0.01 52.223      0    0.491    0.529
#> 7  x3 ~~  x3 0.51 0.01 52.223      0    0.491    0.529
#> 8  x4 ~~  x4 0.51 0.01 52.223      0    0.491    0.529
#> 9   f ~~   f 1.00 0.00     NA     NA    1.000    1.000

# Divide into two groups using the scale scores
dat$score <- rowMeans(dat[, 1:4])
score_mean <- mean(dat$score)
score_sd <- sd(dat$score)
# Try changing the cutoff and see what happens
cutoff <- score_mean + .5 * score_sd
dat$group <- ifelse(dat$score < cutoff,
                    "lower",
                    "upper")
head(dat)
#>           x1         x2          x3         x4      score group
#> 1  0.6132076 -0.3032094 -0.05372075 -1.3581287 -0.2754628 lower
#> 2 -0.2562410 -2.2107157 -0.32678316 -0.2849683 -0.7696771 lower
#> 3  1.4765655  1.3920863  0.18409890  0.5586193  0.9028425 upper
#> 4 -0.4380003 -0.3737520  0.82165120  1.0398746  0.2624434 lower
#> 5 -0.2400874 -1.0359636 -0.85575084 -1.3957762 -0.8818945 lower
#> 6  0.0552557  0.1245552  0.46427104 -0.5391987  0.0262208 lower
table(dat$group)
#>
#> lower upper
#>  6922  3078

# Full sample correlations and distribution
round(cor(dat[, 1:4]), 2)
#>      x1   x2   x3   x4
#> x1 1.00 0.49 0.49 0.49
#> x2 0.49 1.00 0.49 0.49
#> x3 0.49 0.49 1.00 0.49
#> x4 0.49 0.49 0.49 1.00
describe(dat, range = FALSE)
#>        vars     n mean   sd  skew kurtosis   se
#> x1        1 10000 0.00 1.00 -0.05     0.04 0.01
#> x2        2 10000 0.00 1.00 -0.01     0.03 0.01
#> x3        3 10000 0.00 1.00 -0.01    -0.02 0.01
#> x4        4 10000 0.00 1.00 -0.04    -0.04 0.01
#> score     5 10000 0.00 0.79 -0.04     0.04 0.01
#> group*    6 10000 1.31 0.46  0.83    -1.31 0.00
# Compare the correlations
round(cor(dat[dat$group == "lower", 1:4]), 2)
#>      x1   x2   x3   x4
#> x1 1.00 0.25 0.26 0.25
#> x2 0.25 1.00 0.25 0.26
#> x3 0.26 0.25 1.00 0.26
#> x4 0.25 0.26 0.26 1.00
round(cor(dat[dat$group == "upper", 1:4]), 2)
#>      x1   x2   x3   x4
#> x1 1.00 0.07 0.06 0.11
#> x2 0.07 1.00 0.05 0.06
#> x3 0.06 0.05 1.00 0.07
#> x4 0.11 0.06 0.07 1.00
# Compare the distributions
describe(dat[dat$group == "lower", ], range = FALSE)
#>        vars    n  mean   sd  skew kurtosis   se
#> x1        1 6922 -0.40 0.83 -0.29     0.12 0.01
#> x2        2 6922 -0.40 0.82 -0.23     0.22 0.01
#> x3        3 6922 -0.40 0.83 -0.21     0.10 0.01
#> x4        4 6922 -0.39 0.84 -0.20     0.05 0.01
#> score     5 6922 -0.40 0.55 -0.85     0.72 0.01
#> group*    6 6922  1.00 0.00   NaN      NaN 0.00
describe(dat[dat$group == "upper", ], range = FALSE)
#>        vars    n mean   sd skew kurtosis   se
#> x1        1 3078 0.89 0.73 0.19     0.12 0.01
#> x2        2 3078 0.91 0.72 0.20     0.21 0.01
#> x3        3 3078 0.90 0.72 0.23     0.09 0.01
#> x4        4 3078 0.88 0.72 0.22     0.01 0.01
#> score     5 3078 0.90 0.40 1.08     1.04 0.01
#> group*    6 3078 1.00 0.00  NaN      NaN 0.00

# Multigroup CFA
fit_gp <- cfa(mod, dat, group = "group")
(est <- parameterEstimates(fit_gp))
#>    lhs op rhs block group    est    se       z pvalue ci.lower ci.upper
#> 1    f =~  x1     1     1  1.000 0.000      NA     NA    1.000    1.000
#> 2    f =~  x2     1     1  0.984 0.046  21.589  0.000    0.895    1.074
#> 3    f =~  x3     1     1  0.999 0.046  21.652  0.000    0.908    1.089
#> 4    f =~  x4     1     1  1.033 0.048  21.737  0.000    0.940    1.126
#> 5   x1 ~~  x1     1     1  0.522 0.012  43.690  0.000    0.499    0.546
#> 6   x2 ~~  x2     1     1  0.512 0.012  43.879  0.000    0.489    0.535
#> 7   x3 ~~  x3     1     1  0.513 0.012  43.446  0.000    0.490    0.536
#> 8   x4 ~~  x4     1     1  0.526 0.012  42.776  0.000    0.502    0.550
#> 9    f ~~   f     1     1  0.174 0.011  15.507  0.000    0.152    0.195
#> 10  x1 ~1         1     1 -0.397 0.010 -39.553  0.000   -0.416   -0.377
#> 11  x2 ~1         1     1 -0.404 0.010 -40.764  0.000   -0.424   -0.385
#> 12  x3 ~1         1     1 -0.402 0.010 -40.421  0.000   -0.422   -0.383
#> 13  x4 ~1         1     1 -0.392 0.010 -38.679  0.000   -0.412   -0.372
#> 14   f ~1         1     1  0.000 0.000      NA     NA    0.000    0.000
#> 15   f =~  x1     2     2  1.000 0.000      NA     NA    1.000    1.000
#> 16   f =~  x2     2     2  0.626 0.163   3.845  0.000    0.307    0.944
#> 17   f =~  x3     2     2  0.619 0.162   3.832  0.000    0.303    0.936
#> 18   f =~  x4     2     2  1.011 0.263   3.844  0.000    0.496    1.527
#> 19  x1 ~~  x1     2     2  0.478 0.020  23.577  0.000    0.439    0.518
#> 20  x2 ~~  x2     2     2  0.503 0.015  33.778  0.000    0.474    0.532
#> 21  x3 ~~  x3     2     2  0.502 0.015  33.880  0.000    0.473    0.531
#> 22  x4 ~~  x4     2     2  0.468 0.020  22.953  0.000    0.428    0.508
#> 23   f ~~   f     2     2  0.056 0.017   3.249  0.001    0.022    0.090
#> 24  x1 ~1         2     2  0.892 0.013  67.680  0.000    0.866    0.918
#> 25  x2 ~1         2     2  0.909 0.013  69.577  0.000    0.883    0.934
#> 26  x3 ~1         2     2  0.905 0.013  69.365  0.000    0.879    0.930
#> 27  x4 ~1         2     2  0.882 0.013  67.496  0.000    0.856    0.907
#> 28   f ~1         2     2  0.000 0.000      NA     NA    0.000    0.000
# Compare the loadings
cbind(est[1:4, "est"],
      est[15:18, "est"])
#>           [,1]      [,2]
#> [1,] 1.0000000 1.0000000
#> [2,] 0.9842726 0.6255983
#> [3,] 0.9985078 0.6193028
#> [4,] 1.0328542 1.0112341
```

<sup>Created on 2025-08-16 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>


The demo assumes that one group has more cases than the other (mean + .5 SD as the cutoff), which I believe is what happens in your dataset.

The CFA model is the true model in this example. However, when the sample is split into two using the scale scores, the correlations in both groups are smaller than those in the full sample. Some factor loadings are also substantially different between groups.

For this multivariate normal dataset, there is a value of cutoff that can yield similar factor loadings between groups. However, the correlations will still be smaller than those in the full sample.

A side issue: Even though the full population is multivariate normal, the two subpopulations are not. The skewness and kurtosis values can demonstrate this for the two groups.

This demo assumes that the CFA model is the true model and that there is "no group", that is, no heterogeneous subpopulations, for this model. This assumption may not be sensible theoretically, but this is what we assume when we fit the CFA model to the full sample. If this is indeed the model you believe in, then the sample should not be split into two. The latent factor behind the items is assumed to be continuous in this model.

The problem demonstrated above occurs when you use the factor, or the scale mean based on the factor, to create the two groups. If you have a third variable to divide the sample into two groups, such as clinical diagnosis, then you can do multigroup analysis and examine measurement invariance. The issue above may still be present, because the latent factor in the CFA model may, or should, still be related to the third variable. However, in this case, the full-sample model is no longer relevant because the two samples are considered to be from two different populations. Actually, with this third variable, it may not make sense to analyze the full sample using a single-group model, even if measurement invariance can be established (because factor means may be different).

If you do not have other variables to form the group, but think that, theoretically, there are two heterogeneous populations underlying the data, and these two populations may have different factor means, factor loadings, etc., perhaps what you want to try is a finite mixture model.

My two cents.

-- Shu Fai