Negative variance/marker method in mediation model


Phaedra Longhurst

Jun 7, 2024, 3:03:50 PM
to lavaan
Hi folks, 

A PhD student and R newbie here. I'm trying to run a mediation model (sample n = 384) with BAS mediating the relationship between SIS and Flourishing (SIS -> BAS -> Flourishing). SIS has a higher-order factor structure: it is formed by two latent factors, self-investment (with subscales solidarity, satisfaction, and centrality) and self-definition (with subscales stereotyping and homogeneity).

Here is the script: 
model <- '
  self_invest =~ Solidarity + Satisfaction + Centrality
  self_def    =~ Stereotyping + Homogeneity
  SIS1        =~ self_invest + self_def

  Flourishing ~ b*BAS + c*SIS1
  BAS ~ a*SIS1

  # confounders
  BAS         ~ GENDER + RACIAL + BODYSIZE + HEALTH + SEXUAL
  SIS1        ~ GENDER + RACIAL + BODYSIZE + HEALTH + SEXUAL
  Flourishing ~ GENDER + RACIAL + BODYSIZE + HEALTH + SEXUAL

  direct1   := c
  indirect1 := a*b
  total1    := c + a*b   # note: defined-parameter labels cannot contain spaces
'
fit <- sem(model, data = data, estimator = "MLM")

Fit indices are largely adequate (CFI = .958, TLI = .918, RMSEA = .058, SRMR = .042). However, self-investment has a negative residual variance estimate:
self_invest      -0.500    0.205   -2.442    0.015   -0.497   -0.497

I also want to stop lavaan from automatically using the marker method, as it fixes the loading of the first subscale, Solidarity, to 1:
self_invest =~                                                        
 Solidarity        1.000                               1.003    0.720

Is anyone able to identify: 1. what might be causing the negative variance, and 2. how to get rid of the marker method? Thanks!

Serena

Jun 7, 2024, 11:21:21 PM
to lavaan
Hi there,

Is this your path analysis model? What are the correlation results of all variables including the first-order factors? 

Best,
Serena

Phaedra Longhurst

Jun 8, 2024, 1:37:42 AM
to lavaan
Hi Serena, 

Yes, this is the path/mediation analysis model. Here are the correlations (I used SPSS for this, as the output is clearer): Screenshot 2024-06-08 at 06.34.46.png

Yago Luksevicius de Moraes

Jun 8, 2024, 3:38:18 PM
to lavaan
Hi.

As far as I know, it is not possible to pinpoint the cause of a negative variance automatically (and by the way, once you have a negative variance estimate, the fit indices are not very informative: they are telling you the model is mathematically possible despite being empirically impossible). You need to check your data carefully.

For instance, negative variances are common when polychoric correlations are involved and the sample size is small; in that case, the remedy is simply to collect more data.
In your case, I suspect it is because you have very large zero-order correlations (|r| > 0.7), leading to multicollinearity issues. I would drop some items or collapse them.
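A quick way to screen for this, assuming the subscale scores are columns in your data frame (adjust the names to match yours):

```r
# Inspect zero-order correlations among the observed subscale scores;
# values above ~.7 in absolute magnitude flag potential multicollinearity
round(cor(data[, c("Solidarity", "Satisfaction", "Centrality",
                   "Stereotyping", "Homogeneity")],
          use = "pairwise.complete.obs"), 2)
```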

Regarding the marker method, you can set `std.lv = TRUE` to free the first factor loading of each latent variable and identify the model by fixing the latent variances to 1 (latent means are zero by default). I don't know whether, for second-order models, this constrains the variance of only the higher-order factor or of all the latent variables.
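In lavaan this is a one-argument change to your existing call, something like (untested sketch, using the model object from your first post):

```r
# Variance-standardization identification instead of the marker method:
# all loadings are estimated freely; each latent (residual) variance is
# fixed to 1 rather than the first loading being fixed to 1
fit <- sem(model, data = data, estimator = "MLM", std.lv = TRUE)
```

Equivalently, you can do it by hand inside the model syntax with `NA*` on the first indicator plus a `~~ 1*` variance constraint, as in the CFA you posted later, which gives you per-factor control in higher-order models.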

Best,
Yago

Serena

Jun 9, 2024, 12:38:59 AM
to lavaan
Hi,

I was also thinking that high correlations might be the reason. They seem fine among the first-order factors. However, the first-order factors correlate very highly (> .90) with the second-order factor. Such high correlations suggest redundancy: the second-order factor may not provide meaningful information beyond the first-order factors. But the decision should always follow the theory you are using.

What are the factor loadings? Have you checked the skewness and kurtosis of your data? It may help to look at the basic descriptives first to understand the data.

Best,
Serena

Phaedra Longhurst

Jun 9, 2024, 6:48:01 AM
to lavaan
Thanks, both. 

Mardia's skewness and kurtosis indicate that the data for each subscale are neither univariate nor multivariate normal; therefore, I have been using MLM with the Satorra-Bentler correction. I have run a CFA for just SIS with each subscale:

SIScfa <- '
  investment =~ NA*Solidarity + Satisfaction + Centrality
  investment ~~ 1*investment
  definition =~ NA*Stereotyping + Homogeneity
  definition ~~ 1*definition
'
 
I then got the following output (after scrapping the marker method):

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  investment =~                                                        
    Solidarity        1.138    0.071   16.143    0.000    1.138    0.817
    Satisfaction      0.858    0.075   11.379    0.000    0.858    0.642
    Centrality        0.879    0.086   10.274    0.000    0.879    0.588
  definition =~                                                        
    Stereotyping      1.349    0.071   19.125    0.000    1.349    0.964
    Homogeneity       0.928    0.080   11.578    0.000    0.928    0.671

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  investment ~~                                                        
    definition        0.795    0.043   18.275    0.000    0.795    0.795

As you can see, the first indicator looks dodgy, and the correlation between the two factors (.795) is high but potentially acceptable. Any immediate thoughts? I've only ever done CFA for unidimensional measures, so I'm now learning to navigate multidimensional CFA!

Yago Luksevicius de Moraes

Jun 10, 2024, 10:10:58 AM
to lavaan
Hi, Phaedra.

If this submodel has good fit, I think you can continue. If the problem returns, adding one variable at a time and seeing what happens may shed some light on the issue.
However, considering how strong the correlation between your first-order factors is, and that you hypothesized a second-order factor in your first post, I would also try to fit a unidimensional model. You can then run `anova()` on the two lavaan fits to see if they are statistically different, and choose whichever has the lowest AIC/BIC.
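Something along these lines (a sketch, reusing the `SIScfa` model you posted and assuming the same variable names):

```r
# Unidimensional alternative: all five subscales load on a single factor
uni <- '
  SIS =~ NA*Solidarity + Satisfaction + Centrality +
         Stereotyping + Homogeneity
  SIS ~~ 1*SIS
'
fit2 <- cfa(SIScfa, data = data, estimator = "MLM")  # two-factor model
fit1 <- cfa(uni,    data = data, estimator = "MLM")  # one-factor model

anova(fit1, fit2)      # scaled chi-square difference test
AIC(fit1, fit2)
BIC(fit1, fit2)
```

With MLM, `anova()` should apply a scaled (Satorra-Bentler type) difference test automatically since the models are nested.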

Best,
Yago
