Negative variance/marker method in mediation model


Phaedra Longhurst

Jun 7, 2024, 3:03:50 PM
to lavaan
Hi folks, 

A PhD student and R newbie here. I'm trying to run a mediation model (sample n = 384) with BAS mediating the relationship between SIS and Flourishing (SIS -> BAS -> Flourishing). SIS has a higher-order factor structure: it is formed by two latent factors, self-investment (with subscales solidarity, satisfaction, and centrality) and self-definition (with subscales stereotyping and homogeneity).

Here is the script: 
model <- '
  self_invest =~ Solidarity + Satisfaction + Centrality
  self_def    =~ Stereotyping + Homogeneity
  SIS1        =~ self_invest + self_def

  Flourishing ~ b*BAS + c*SIS1
  BAS ~ a*SIS1

  # confounders
  BAS         ~ GENDER + RACIAL + BODYSIZE + HEALTH + SEXUAL
  SIS1        ~ GENDER + RACIAL + BODYSIZE + HEALTH + SEXUAL
  Flourishing ~ GENDER + RACIAL + BODYSIZE + HEALTH + SEXUAL

  direct1   := c
  indirect1 := a*b
  total1    := c + a*b   # note: defined-parameter labels cannot contain spaces
'
fit <- sem(model, data = data, estimator = "MLM")

Fit indices are largely adequate (CFI = .958, TLI = .918, RMSEA = .058, SRMR = .042). However, self-investment has a negative residual variance estimate:
self_invest      -0.500    0.205   -2.442    0.015   -0.497   -0.497

I also want to stop lavaan from automatically using the marker method, as it fixes the loading of the first subscale, Solidarity, to 1:
self_invest =~                                                        
 Solidarity        1.000                               1.003    0.720

Is anyone able to identify: 1. what might be causing the negative variance, and 2. how to get rid of the marker method? Thanks!

Serena

Jun 7, 2024, 11:21:21 PM
to lavaan
Hi there,

Is this your path analysis model? What are the correlation results of all variables including the first-order factors? 

Best,
Serena

Phaedra Longhurst

Jun 8, 2024, 1:37:42 AM
to lavaan
Hi Serena, 

Yes, this is the path/mediation analysis model. Here are the correlations (I used SPSS for this, as the output is clearer): Screenshot 2024-06-08 at 06.34.46.png

Yago Luksevicius de Moraes

Jun 8, 2024, 3:38:18 PM
to lavaan
Hi.

As far as I know, it is not possible to pinpoint the cause of a negative variance automatically (and by the way, once you have a negative variance estimate, the fit indices are not very informative: they are telling you the model is mathematically possible despite being empirically impossible). You need to check your data carefully.

For instance, negative variances are common when polychoric correlations are involved and the sample size is small; in that case, the remedy is simply to collect more data.
In your case, I suspect it is because you have very large zero-order correlations (|r| > 0.7), leading to multicollinearity issues. I would drop some items or collapse them.
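A quick way to screen for this, assuming the subscale scores are columns in your data frame (adjust the names to match yours):

```r
# Inspect zero-order correlations among the observed subscale scores;
# values above ~.7 in absolute magnitude flag potential multicollinearity
round(cor(data[, c("Solidarity", "Satisfaction", "Centrality",
                   "Stereotyping", "Homogeneity")],
          use = "pairwise.complete.obs"), 2)
```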

Regarding the marker method, you can set `std.lv = TRUE` to free the first factor loading of each latent variable and identify the model by fixing the latent variances to 1 (latent means are zero by default). I don't know whether, for second-order models, this constrains the variance of only the higher-order factor or of all the latent variables.
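In lavaan this is a one-argument change to your existing call, something like (untested sketch, using the model object from your first post):

```r
# Variance-standardization identification instead of the marker method:
# all loadings are estimated freely; each latent (residual) variance is
# fixed to 1 rather than the first loading being fixed to 1
fit <- sem(model, data = data, estimator = "MLM", std.lv = TRUE)
```

Equivalently, you can do it by hand inside the model syntax with `NA*` on the first indicator plus a `~~ 1*` variance constraint, as in the CFA you posted later, which gives you per-factor control in higher-order models.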

Best,
Yago

Serena

Jun 9, 2024, 12:38:59 AM
to lavaan
Hi,

I was also thinking that high correlations might be the reason. They seem fine among the first-order factors. However, the first-order factors correlate very highly (> .90) with the second-order factor. Such high correlations suggest redundancy: the second-order factor may not provide meaningful information beyond the first-order factors. But the decision should always follow the theory you are using.

What are the factor loadings? Have you checked the skewness and kurtosis of your data? It may help to look at the basic descriptives first to understand the data.

Best,
Serena

Phaedra Longhurst

Jun 9, 2024, 6:48:01 AM
to lavaan
Thanks, both. 

Mardia's skewness and kurtosis indicate that the data for each subscale are neither univariate nor multivariate normal; therefore, I have been using MLM with the Satorra-Bentler correction. I have run a CFA for just SIS with each subscale:

SIScfa <- '
  investment =~ NA*Solidarity + Satisfaction + Centrality
  investment ~~ 1*investment
  definition =~ NA*Stereotyping + Homogeneity
  definition ~~ 1*definition
'
 
I then got the following output (after scrapping the marker method):

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  investment =~                                                        
    Solidarity        1.138    0.071   16.143    0.000    1.138    0.817
    Satisfaction      0.858    0.075   11.379    0.000    0.858    0.642
    Centrality        0.879    0.086   10.274    0.000    0.879    0.588
  definition =~                                                        
    Stereotyping      1.349    0.071   19.125    0.000    1.349    0.964
    Homogeneity       0.928    0.080   11.578    0.000    0.928    0.671

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  investment ~~                                                        
    definition        0.795    0.043   18.275    0.000    0.795    0.795

As you can see, the first indicator looks dodgy, and the correlation between the two factors (.795) is high but potentially acceptable. Any immediate thoughts? I've only ever done CFA for unidimensional measures, so I'm now learning to navigate multidimensional CFA!

Yago Luksevicius de Moraes

Jun 10, 2024, 10:10:58 AM
to lavaan
Hi, Phaedra.

If this submodel has good fit, I think you can continue. If the problem returns, adding one variable at a time and seeing what happens may shed some light on the issue.
However, considering how strong the correlation between your first-order factors is, and that you hypothesized a second-order factor in your first post, I would also try to fit a unidimensional model. You can then run `anova()` on the two lavaan fits to see if they are statistically different, and choose whichever has the lowest AIC/BIC.
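Something along these lines (a sketch, reusing the `SIScfa` model you posted and assuming the same variable names):

```r
# Unidimensional alternative: all five subscales load on a single factor
uni <- '
  SIS =~ NA*Solidarity + Satisfaction + Centrality +
         Stereotyping + Homogeneity
  SIS ~~ 1*SIS
'
fit2 <- cfa(SIScfa, data = data, estimator = "MLM")  # two-factor model
fit1 <- cfa(uni,    data = data, estimator = "MLM")  # one-factor model

anova(fit1, fit2)      # scaled chi-square difference test
AIC(fit1, fit2)
BIC(fit1, fit2)
```

With MLM, `anova()` should apply a scaled (Satorra-Bentler type) difference test automatically since the models are nested.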

Best,
Yago
