Blavaan testing moderation / mediation / independent effects differences


Carolien T.

Dec 4, 2025, 2:29:29 PM
to blavaan
Hi there,

I want to compare three models, but I wonder how to do it the cleanest way. They are a moderator, a mediator, and an independent-effects model, each with two covariates, which I specified as follows:

H1: moderator: Model1 <- "
 BAG_z ~ gender_beh + age_cen_within
 COG_z ~ CRxBAG_z + BAG_z + CR_z + gender_beh + age_cen_within
"

H2: mediator: Model2 <- "
  BAG_z ~ a*CR_z + gender_beh + age_cen_within
  COG_z ~ b*BAG_z + c*CR_z + gender_beh + age_cen_within

  indirect := a*b
  total    := c + (a*b)
"
H3: independent effects: Model3 <- "
 BAG_z ~  gender_beh + age_cen_within
 COG_z ~ CR_z + BAG_z + gender_beh + age_cen_within
"

I wonder about the covariate specification here. I only added the BAG_z regression to models 1 and 3 to make them more comparable to model 2 in terms of the number of parameters, but it is not actually NEEDED for those models, because there COG_z is the only dependent variable. So I feel like I am including a regression equation that is not actually necessary, but without it, I have a different nuisance structure in models 1 and 3 vs. model 2. This is especially relevant since I am using the multigroup function (with three groups) and I need to constrain the regression paths between those groups except for the covariates (so age and gender effects are allowed and expected to vary between groups while the other paths are constrained), and in a next step I also compare the constrained model vs. an all-unconstrained model.
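
For context, the multigroup fits look roughly like this (the data and grouping-variable names are placeholders; group.partial frees the age and gender paths across groups):

library(blavaan)
# "dat" and "group_var" are placeholders for my data and grouping variable
fit1 <- bsem(Model1, data = dat, group = "group_var",
             group.equal = "regressions",
             group.partial = c("BAG_z~gender_beh", "BAG_z~age_cen_within",
                               "COG_z~gender_beh", "COG_z~age_cen_within"))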

What would you advise? Remove them and accept the rather large difference in the number of parameters, or leave them in?

I hope my questions are clear. The comparisons are basically done with the z-score of the ELPD difference (from loo).

Thanks so much for your advice.

Carolien Torenvliet

Mauricio Garnier-Villarreal

Dec 8, 2025, 7:39:02 AM
to blavaan
Hi Carolien

I would recommend including all the variables in the 3 models. If you want to remove their effects, you can do so by fixing them to 0, but then those parameters fixed to 0 are part of the test, since the loo comparison is an overall comparison.

The number of parameters is not that relevant when using a loo comparison, since loo also allows you to compare non-nested models.
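
For example, with two fitted models the comparison can be done like this (fit1 and fit2 here stand for your fitted bsem objects):

# pairwise LOO-based comparison; also valid for non-nested models
# fit1, fit2 are placeholders for fitted bsem() objects
blavCompare(fit1, fit2)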

Hope this helps

Ed Merkle

Dec 8, 2025, 11:10:39 AM
to Carolien T., blavaan
Carolien,

Following on to what Mauricio said: I agree that the number of parameters does not necessarily matter for the model comparison. But what does matter is that the same observed variables are included in each model. This is because the loo comparisons are only meaningful if each model involves the same observed variables.

But this issue can get tricky in blavaan due to the fixed.x argument. When fixed.x = TRUE, the exogenous covariates are not modeled and are instead treated like predictor variables in a regression model. So you could easily encounter a situation where an observed variable is included in two models, but is only "modeled" in one of them. This could be avoided by setting fixed.x = FALSE, so that all the observed variables are explicitly modeled.
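
For example, something like this (the data name is a placeholder) would explicitly model all the observed variables:

library(blavaan)
# "mydata" is a placeholder; with fixed.x = FALSE the exogenous covariates
# are modeled too, so they contribute to each model's log-likelihood
fit1 <- bsem(Model1, data = mydata, fixed.x = FALSE)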

On the other hand, fixed.x = TRUE could be useful for doing a model comparison where the exogenous covariates change across models, and you are comparing the models' abilities to predict all the other observed variables (except for the exogenous covariates).

Ed

Carolien T.

Dec 9, 2025, 7:52:37 AM
to blavaan
Hi Mauricio and Ed,

Thank you so much for your replies. This is helpful! 

I understand that the number of parameters doesn't really matter for a LOO comparison, but it doesn't seem like a fair comparison at the moment (the looic differences are huge), so I started to think of a different approach, and the above is what I came up with (adding BAG_z ~ gender + age to all models such that the number of to-be-explained variables is the same). But based on what you are saying, I would also be adding:

BAG_z ~ 0*CR_z + gender_beh + age_cen_within 

to models 1 and 3 to make them more comparable? And 0*CRxBAG_z to models 2 and 3?

I am not sure I follow what exactly fixed.x = FALSE tackles here?

Best,
Carolien


Ed Merkle

Dec 9, 2025, 2:35:08 PM
to Carolien T., blavaan
Carolien,

Thanks, and I am not immediately sure that the "0*" parts will accomplish what you want. My first concern would be to ensure that the modeled variables are the same in each model, so that the log-likelihoods are comparable. This is related to the fixed.x thing, which I think is easiest to illustrate with a simple example. Code is below, with explanation after that.

library(lavaan)
data(HolzingerSwineford1939)
hs39 <- HolzingerSwineford1939

## model 1: x2 and x3 are both exogenous covariates (fixed.x = TRUE by default)
m1 <- ' x1 ~ x2 + x3 '

m1fit <- sem(m1, data = hs39)
fitMeasures(m1fit, c('chisq', 'logl'))

## model 2: x2 is now also an outcome, so it becomes endogenous
m2 <- ' x1 ~ x2 + x3
        x2 ~ x3 '

m2fit <- sem(m2, data = hs39)
fitMeasures(m2fit, c('chisq', 'logl'))


These are two simple models of 3 observed variables. The fitMeasures output will show that chisq = 0 for both, implying perfect fit. But the log-likelihoods are very different for the two models. This is because of fixed.x = TRUE. In the first model, both x2 and x3 are exogenous covariates, so they are removed from the model log-likelihood. In the second model, x2 has become endogenous, so it is now included in the model log-likelihood.

The loo comparisons will use these log-likelihoods. So, in the above models, m1fit would look much better, but only because the log-likelihood of x2 is included only in the second model. And if you re-run those models with fixed.x = FALSE, then the log-likelihoods become equal.
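
That is, re-running the same two models with fixed.x = FALSE:

## x2 and x3 are now modeled in both cases, so the log-likelihoods match
m1fitb <- sem(m1, data = hs39, fixed.x = FALSE)
m2fitb <- sem(m2, data = hs39, fixed.x = FALSE)
fitMeasures(m1fitb, 'logl')
fitMeasures(m2fitb, 'logl')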

Ed

Carolien T.

Dec 10, 2025, 5:14:59 AM
to blavaan

Hi Ed,

Thank you very much for this example — it really helped clarify what setting fixed.x = FALSE does.

What I initially tried to achieve was a comparison of competing theoretical models (moderation, mediation, and independent effects) in which the same observed variables are included, but parameters not implied by a given theory are fixed to zero. For my theoretical question, however, this felt rather stringent: my goal is not so much to test whether constraining specific parameters to zero improves fit, but to evaluate which theoretical structure best explains the data. In that sense, the independent-effects model does not need the moderation effect to be exactly zero to be preferred; it might simply be favored on grounds of parsimony when predictive performance is similar.

Using this approach, my models were specified as follows (with zero-fixed paths included to keep the covariate structure identical across models):

Model1 <- "
  BAG_z ~ 0*CR_z + gender_dum + age_cen_within    # these are included to ensure that the covariate structure is the same, for comparability.
  COG_z ~ BAG_z + CR_z + CRxBAG_z + gender_dum + age_cen_within
"

Model2_constrained <- "
  BAG_z ~ a*CR_z + gender_dum + age_cen_within      
  COG_z ~ b*BAG_z + c*CR_z + 0*CRxBAG_z + gender_dum + age_cen_within


  indirect := a*b
  total    := c + (a*b)
"
Model3 <- "
  BAG_z ~ 0*CR_z + gender_dum + age_cen_within      
  COG_z ~ BAG_z + CR_z + 0*CRxBAG_z + gender_dum + age_cen_within
"

With these specifications, the models showed very similar predictive performance (LOO comparisons with z ELPD differences around .5), such that no clear favorite emerged, although one could argue that the independent-effects model would be preferred based on parsimony.

I now tried an alternative specification using fixed.x = FALSE, removing the zero-fixed paths (including the covariates) and allowing the likelihood to be defined over all observed variables:

Model1 <- "
 COG_z ~ CRxBAG_z + BAG_z + CR_z + gender_dum + age_cen_within
"
Model2_constrained <- "
  BAG_z ~ a*CR_z + gender_dum + age_cen_within
  COG_z ~ b*BAG_z + c*CR_z + gender_dum + age_cen_within


  indirect := a*b
  total    := c + (a*b)
"

Model3 <- "
 COG_z ~ CR_z + BAG_z + gender_dum + age_cen_within
"

With this approach, Models 2 and 3 are preferred, with the mediation model slightly favored over the independent-effects model. However, across these models the posterior estimates consistently indicate that CR does not reliably predict BAG, nor does BAG predict cognition. Given this, it seems somewhat counterintuitive that the mediation model is favored.

Would you say that this latter comparison is therefore addressing a different question, namely, which model best explains the joint distribution of all observed variables, including BAG, whereas the former comparison focuses more directly on predictive performance for cognition?

And in that case, when using fixed.x = FALSE, is it still necessary to explicitly include the interaction term (e.g., CR × BAG) in all models to ensure comparability, or is that no longer required?

Sorry about the long message; I hope it comes across well.
Many thanks again for your help with this, it has been very insightful.

Best,
Carolien


Ed Merkle

Dec 12, 2025, 9:37:53 AM
to Carolien T., blavaan
Hi Carolien,

Here are a few thoughts in response to your message:

- About fixing regression weights to 0: if you just want to include a variable in a model but not include it in a regression equation, you could consider a line like

variable_name ~ 1

though covariances with that variable might be added automatically, and you would want to consider whether you want those covariances.
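
Applied to, say, your Model3, that could look something like this (just a sketch, replacing the 0*CRxBAG_z term with an intercept-only line):

Model3_alt <- "
  BAG_z ~ 0*CR_z + gender_dum + age_cen_within
  COG_z ~ BAG_z + CR_z + gender_dum + age_cen_within

  # include the interaction without a regression path;
  # check the fitted model for any automatically added covariances
  CRxBAG_z ~ 1
"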


- You are correct that fixed.x = FALSE involves a joint model over all observed variables. So the exogenous variables that you (maybe) don't care about play a larger role in the model evaluation. And you shouldn't necessarily need the interaction term to do a model comparison using loo: the important thing is that the same observed variables are in the model, as opposed to the same combinations of observed variables.


- It would also be possible to keep fixed.x=TRUE and then check each model to make sure the same observed variables are being included in the likelihood. If "fit" is a fitted model, you can do

lavNames(fit) ## observed variables in the model
lavNames(fit, 'ov.x') ## exogenous observed variables in the model

You would want the output of each of those commands to be the same across the models.
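
For example, with the three fitted models collected in a list (object names are placeholders):

fits <- list(moderation = fit1, mediation = fit2, independent = fit3)
lapply(fits, lavNames)           ## modeled observed variables, per model
lapply(fits, lavNames, 'ov.x')   ## exogenous observed variables, per model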

Ed

Carolien T.

Dec 16, 2025, 4:38:41 AM
to blavaan
Hi Ed,

Thanks for your helpful message.

With my current strategy (fixing all non-relevant regression paths to 0 with *0 and keeping fixed.x = TRUE), lavNames() returns exactly what we want, so I think this should be a fair comparison. The only thing is that these model comparisons are stricter towards the parsimonious models than your suggestion (var ~ 1), as we specify those paths to be exactly zero, so I will mention that in the paper. Just as a sanity check I'll also run var ~ 1 instead of *0. The CRxBAG term is then an extra observed variable in the moderation model, but I think you're saying that that doesn't matter because it consists of two variables that are included in the other models? (If I check lavNames, then CRxBAG is only in the moderation model.)

I think the fixed.x = FALSE approach doesn't work so well here, as the LOO outcomes really don't match the outcomes of the regressions. Explaining another dependent variable in any way (even though nothing contributes that much) already makes LOO much better, while when we inspect the actual estimates, we see that it isn't helpful in any way.

Thanks again, I think I am really almost there ;) 
Carolien


Ed Merkle

Dec 19, 2025, 4:21:07 PM
to blavaan
No problem, and some follow-ups:

About ~1 vs *0: in some cases, I think this will get you the same model. The "*0" basically says that a certain variable is not involved in the regression, which is similar to "~1". But there could also be differences depending on whether a variable is exogenous, and that is the tricky part.

About the interaction term: if that interaction term is exogenous (in the "ov.x" output), then I believe it should not matter.

Ed