conditional means for multi-group sem model

Niels Skovgaard-Olsen

unread,

Jun 22, 2024, 7:21:35 AM6/22/24

to blavaan

Hi!

I am fitting a multigroup sem in blavaan:

model <- '
Israel ~ c(a1.g1,a1.g2,a1.g3,a1.g4)*political +
c(pi.g1,pi.g2,pi.g3,pi.g4)*P +
c(wi.g1,wi.g2,wi.g3,wi.g4)*IWAH +
c(b1i.g1,b1i.g2,b1i.g3,b1i.g4)*Authority +
c(b2i.g1,b2i.g2,b2i.g3,b2i.g4)*Fairness +
c(b3i.g1,b3i.g2,b3i.g3,b3i.g4)*Harm +
c(b4i.g1,b4i.g2,b4i.g3,b4i.g4)*Ingroup +
c(b5i.g1,b5i.g2,b5i.g3,b5i.g4)*Purity

political ~ c(b1.g1,b1.g2,b1.g3,b1.g4)*Authority +
c(b2.g1,b2.g2,b2.g3,b2.g4)*Fairness +
c(b3.g1,b3.g2,b3.g3,b3.g4)*Harm +
c(b4.g1,b4.g2,b4.g3,b4.g4)*Ingroup +
c(b5.g1,b5.g2,b5.g3,b5.g4)*Purity

donation ~ c(d1.g1,d1.g2,d1.g3,d1.g4)*political +
c(pd.g1,pd.g2,pd.g3,pd.g4)*P +
c(wd.g1,wd.g2,wd.g3,wd.g4)*IWAH
'

m_blavaan <- bsem(model, data = df,group = "Classification",
bcontrol=list(cores = cores),dp=mydp,
n.chains = cores, burnin = 7500, sample = 15000,
mcmcfile = TRUE, save.lvs = TRUE)

Since blavaan 0.5-4 enabled use of the "newdata" argument for the blavPredict() function, I would like to sample posterior conditional means based on a newdata reference grid, where I set all the other predictors to their mean values:

col_names <- c("Israel","donation", "political", "P", "IWAH", "Authority", "Fairness", "Harm", "Ingroup", "Purity")

predictor_means <- colMeans(dw_2[, col_names])

new_data <- data.frame(matrix(nrow=4,ncol=11))
colnames(new_data) <- c(col_names, "Classification")
new_data[1, -11] <- t(predictor_means)
new_data[2, -11] <- t(predictor_means)
new_data[3, -11] <- t(predictor_means)
new_data[4, -11] <- t(predictor_means)
new_data[,11] <- as.factor(unique(dw_2$Classification))

I just re-installed blavaan from CRAN ( blavaan 0.5-5 ) and when I run this command:

posterior_samples <- blavPredict(m_blavaan, type = "ov", newdata = new_data)

I get an error saying that the index of sims[,,i] is outside its limits:

Fehler in sims[, , i] : Indizierung außerhalb der Grenzen

Thanks in advance!

Niels

Niels Skovgaard-Olsen

unread,

Jun 22, 2024, 8:32:28 AM6/22/24

to blavaan

changing type to "ypred" produces the same error:

posterior_samples <- blavPredict(Models_latent_M1[[2]], type = "ypred", newdata = new_data)

Fehler in sims[, , i] : Indizierung außerhalb der Grenzen

Best,

Niels

Ed Merkle

unread,

Jun 23, 2024, 1:38:23 PM6/23/24

to Niels Skovgaard-Olsen, blavaan

Niels,

Thanks for the report. Just to make sure, your model has no latent variables? I think that may be causing the problem, and I need to look at it some more. I also just fixed a related problem for models with latent variables.

Ed

--
You received this message because you are subscribed to the Google Groups "blavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blavaan+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/blavaan/e0ea9470-f8ff-4550-8840-073cb757a574n%40googlegroups.com.

Ed Merkle

unread,

Jun 24, 2024, 10:52:47 PM6/24/24

to Niels Skovgaard-Olsen, blavaan

It may take me a few days to dig in to this. But I wonder whether the fixed.x = TRUE argument helps at all. Right now, blavaan is modeling the predictors alongside the response variables, which may be why the predictions are different from what you expect. The fixed.x = TRUE argument treats predictors more like they would be treated in traditional regression.

Ed

On Sun, 2024-06-23 at 22:07 +0200, Niels Skovgaard-Olsen wrote:

Hi Ed,

it appears that within blav_Predict(), tmpres produces posterior predictions for the predictions set, e.g., to the following values [,4:10]:

Israel political donation P IWAH Authority Fairness Harm Ingroup
1 0.5 0.8333333 0.00 0.42857143 0.1944444 0.3666667 0.2592593 0.2142857 0.4333333
2 0.0 0.6666667 0.09 0.28571429 0.4444444 0.5000000 0.6296296 0.7142857 0.3000000
3 0.5 0.8333333 0.00 0.04761905 0.4722222 0.6666667 0.8148148 0.8214286 0.8000000
4 0.0 0.3333333 1.00 0.26190476 0.5555556 0.5666667 0.5925926 0.7857143 0.5000000
     Purity
1 0.5333333
2 0.5666667
3 0.8333333
4 0.6666667

Whereas the values I set the predictors to in newdata were [,4:10] in the following data frame:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 0 0 0.2168682 0.4997795 0.5253086 0.6957672 0.7243008 0.4376543 0.4845679
[2,] 0 0 0 0.2168682 0.4997795 0.5253086 0.6957672 0.7243008 0.4376543 0.4845679
[3,] 0 0 0 0.2168682 0.4997795 0.5253086 0.6957672 0.7243008 0.4376543 0.4845679
[4,] 0 0 0 0.2168682 0.4997795 0.5253086 0.6957672 0.7243008 0.4376543 0.4845679

What I was hoping for was for a way to sample predictions once the predictors were set to the same values across the different groups in the multi-group SEM, to be able to calculate contrast effects and HDI for the group differences in predictions based on the model.

Best,
Niels

Am So., 23. Juni 2024 um 19:42 Uhr schrieb Niels Skovgaard-Olsen <n.s....@gmail.com>:
Hi!

yes, it appears that the error occurs, because there are no latent variables in the model, which makes the third dimension of the sims[,,i] array empty.

By modifying:
1) the "blav_fill_newdata()" function to check for the presence of latent variables with "any(grepl("^eta", object@ParTable$lhs))":

if (lvs && any(grepl("^eta", object@ParTable$lhs))) { newlvs <- samp_lvs(object@external$mcmcout, object@Model, object@ParTable, smd, eeta = NULL, categorical = FALSE) lvsumm <- as.matrix(rstan::monitor(newlvs, print=FALSE)) cmatch <- match(colnames(object@external$stansumm), colnames(lvsumm)) stansumm <- object@external$stansumm lvcols <- grep("^eta", rownames(stansumm)) if (length(lvcols) > 0) stansumm <- stansumm[-lvcols, ] object@external$stansumm <- rbind(stansumm, lvsumm[,cmatch]) object@external$stanlvs <- newlvs }

and 2) the "samp_lvs()" function to return an empty array, if there are no latent variables:

if (standata$w9use + standata$w9no == 0) { nsamps <- nrow(mcmcout) nchain <- length(mcmcout) return(array(dim = c(nsamps, nchain, 0)))}

the "blavPredict(Models_latent_M1[[2]], type = "yhat", newdata = new_data)" function call no longer produces the "sim[,,i] index out of boundary" error with my model.

However, one thing puzzles me:
for the new data argument, I supplied the grand mean of the predictors across all levels of the grouping variable (in four rows for four group levels).
Yet, when I inspect the samples generated by type = "yhat" and "ypred", then the predictors differ in their values across the rows.

A printout of
str(object@Data@X)

from blav_fill_newdata() shows that internal to it, the newdata is correctly formated with the same grandmean for the predictors (and here supplied NA values for the outcome variables to be predicted by the model):

List of 4
$ : num [1, 1:10] NA NA NA 0.217 0.5 ...
$ : num [1, 1:10] NA NA NA 0.217 0.5 ...
$ : num [1, 1:10] NA NA NA 0.217 0.5 ...
$ : num [1, 1:10] NA NA NA 0.217 0.5 ...

I then inspected:

standata <- object@external$mcmcdata

and it confirmed that the first three columns of the outcome variables were set to zero and the following columns were correctly set to their grandmeans (as I had provided in the newdata argument):

new_dat_fill <- blav_fill_newdata(Models_latent_M1[[2]], new_data)

new_dat_fill@external$mcmcdata$YXbar
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 0 0 0.2168682 0.4997795 0.5253086 0.6957672 0.7243008 0.4376543 0.4845679
[2,] 0 0 0 0.2168682 0.4997795 0.5253086 0.6957672 0.7243008 0.4376543 0.4845679
[3,] 0 0 0 0.2168682 0.4997795 0.5253086 0.6957672 0.7243008 0.4376543 0.4845679
[4,] 0 0 0 0.2168682 0.4997795 0.5253086 0.6957672 0.7243008 0.4376543 0.4845679

I am right now trying to locate the exact place within blavPredict() where these grandmean predictors get overwritten by different values of the predictors of each row (which I try to avoid so that I can calculate contrast effects for the predicted values across rows).

Best,
Niels

I just re-installed blavaan from CRAN (blavaan 0.5-5) and when I run this command:

Niels Skovgaard-Olsen

unread,

Jun 25, 2024, 12:13:53 PM6/25/24

to Ed Merkle, blavaan

Hi Ed,

thanks for your reply!

I did not explicitly set "fixed.x = TRUE" when fitting the model. I got the impression that it was set to TRUE for exogenous covariates that appear in the structural part of the model, which do not have a measurement model, but perhaps this is wrong and I should set it explicitly and refit the model.

It is also possible that blavPredict() has a different intended use. I assumed that providing conditional means based on a newdata argument would be similar (for a model without latent factors) to the calculation of conditional_effects(..., newdata, categorical = TRUE) via the newdata argument in brms.

But it occurred to me that for the special case of calculating contrast effects for the outcome variable based on the predictors set to grand means that for z-transformed variables, such contrasts can be calculated for the intercepts retrieved via blavInspect(..., "mcmc").

This approach would also apply to ordinal outcome variables, but further complications arise, since the intercepts are now on a latent scale and the thresholds need to be set equal across groups for multi-group SEM models (group.equal = "thresholds"). Still contrast effects in the posterior draws across groups on the latent scale can be calculated, and the mean intercepts from the blavaan summary can be compared to the fixed thresholds across groups to interpret which ordinal outcome is predicted for each group, once the predictors are held constant at zero ( = their means for z-transformed predictors).

Best,

Niels

--

PD Dr. Dr. Niels Skovgaard-Olsen

https://www.researchgate.net/profile/Niels_Skovgaard_Olsen

Niels Skovgaard-Olsen

unread,

Jun 25, 2024, 12:17:28 PM6/25/24

to Ed Merkle, blavaan

...since ordinal outcome variables have not yet been implemented in blavPredict(.., newdata), the comparison would be to conditional_effects(..., newdata) - and not apply to ordinal variables....

Mauricio Garnier-Villarreal

unread,

Jun 26, 2024, 5:55:42 AM6/26/24

to blavaan

Niels

I havent done this type of conditional means with SEM, but if you can do this approach with lavaan you could write a function to calculate them as lavaan objects. Then pass that function to the blavaan object with the ppmc() function. So, that the function is calculated across the saved posterior draws

hope this helps

Niels Skovgaard-Olsen

unread,

Jun 26, 2024, 10:28:06 AM6/26/24

to Ed Merkle, blavaan

--
PD Dr. Dr. Niels Skovgaard-Olsen

https://www.researchgate.net/profile/Niels_Skovgaard_Olsen

Niels Skovgaard-Olsen

unread,

Jun 26, 2024, 10:28:06 AM6/26/24

to Ed Merkle, blavaan

Hi!

yes, it appears that the error occurs, because there are no latent variables in the model, which makes the third dimension of the sims[,,i] array empty.

By modifying:

1) the "blav_fill_newdata()" function to check for the presence of latent variables with " any(grepl("^eta", object@ParTable$lhs)) ":

if (lvs && any(grepl("^eta", object@ParTable$lhs))) { newlvs <- samp_lvs(object@external$mcmcout, object@Model, object@ParTable, smd, eeta = NULL, categorical = FALSE) lvsumm <- as.matrix(rstan::monitor(newlvs, print=FALSE)) cmatch <- match(colnames(object@external$stansumm), colnames(lvsumm)) stansumm <- object@external$stansumm lvcols <- grep("^eta", rownames(stansumm)) if (length(lvcols) > 0) stansumm <- stansumm[-lvcols, ] object@external$stansumm <- rbind(stansumm, lvsumm[,cmatch]) object@external$stanlvs <- newlvs }

and 2) the "samp_lvs()" function to return an empty array, if there are no latent variables:

if (standata$w9use + standata$w9no == 0) { nsamps <- nrow(mcmcout) nchain <- length(mcmcout) return(array(dim = c(nsamps, nchain, 0))) }

the "blavPredict(Models_latent_M1[[2]], type = "yhat", newdata = new_data)" function call no longer produces the "sim[,,i] index out of boundary" error with my model.

However, one thing puzzles me:

for the new data argument, I supplied the grand mean of the predictors across all levels of the grouping variable (in four rows for four group levels).

Yet, when I inspect the samples generated by type = "yhat" and "ypred", then the predictors differ in their values across the rows.

A printout of

str(object@Data@X)

from blav_fill_newdata() shows that internal to it, the newdata is correctly formated with the same grandmean for the predictors (and here supplied NA values for the outcome variables to be predicted by the model):

List of 4 $ : num [1, 1:10] NA NA NA 0.217 0.5 ... $ : num [1, 1:10] NA NA NA 0.217 0.5 ... $ : num [1, 1:10] NA NA NA 0.217 0.5 ... $ : num [1, 1:10] NA NA NA 0.217 0.5 ...

I then inspected:

standata <- object@external$mcmcdata

and it confirmed that the first three columns of the outcome variables were set to zero and the following columns were correctly set to their grandmeans (as I had provided in the newdata argument):

new_dat_fill <- blav_fill_newdata(Models_latent_M1[[2]], new_data)

new_dat_fill@external$mcmcdata$YXbar [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 0 0 0 0.2168682 0.4997795 0.5253086 0.6957672 0.7243008 0.4376543 0.4845679 [2,] 0 0 0 0.2168682 0.4997795 0.5253086 0.6957672 0.7243008 0.4376543 0.4845679 [3,] 0 0 0 0.2168682 0.4997795 0.5253086 0.6957672 0.7243008 0.4376543 0.4845679 [4,] 0 0 0 0.2168682 0.4997795 0.5253086 0.6957672 0.7243008 0.4376543 0.4845679

I am right now trying to locate the exact place within blavPredict() where these grandmean predictors get overwritten by different values of the predictors of each row (which I try to avoid so that I can calculate contrast effects for the predicted values across rows).

Best,

Niels

Am So., 23. Juni 2024 um 19:38 Uhr schrieb Ed Merkle <ecme...@gmail.com>:

Reply all

Reply to author

Forward