PPCs for predicted mean values of observed variables (like rstan)

Garret Hall

Oct 26, 2020, 8:58:41 AM
to blavaan
Hello,

I am working on a latent change model in blavaan, and I would like to produce posterior predictive checks for the observed means at each measurement wave, similar to the ppc_dens_overlay function in bayesplot/rstan (see here: 
https://mc-stan.org/bayesplot/reference/PPC-distributions.html). I saw a recent post about using lavPredict() for something similar, but I'm still not sure it produces what I'm looking for. I'm interested in how well the latent change score model fits in terms of wave-specific means, rather than in traditional SEM fit indices like chi-square or RMSEA. Would this be possible in blavaan?

Thanks,
Garret

Mauricio Garnier-Villarreal

Oct 26, 2020, 10:18:03 AM
to blavaan
Garret

If I follow what you are looking for, you should be able to do it with the ppmc() function in blavaan. Using lavPredict(type="ov"), you will get posterior distributions of predicted values for each observed variable.

Here is an example based on the bcfa example:



library(blavaan)


# The Holzinger and Swineford (1939) example
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- bcfa(HS.model, std.lv=T,
            data=HolzingerSwineford1939)
summary(fit)


?ppmc

## discrepancy function: posterior predicted values of each observed variable
discFUN <- list(ov_pred = function(fit) {
  lavPredict(fit, type="ov")
})

out <- ppmc(fit, discFUN = discFUN)
summary(out)



### compare observed scores with posterior means
m_dif <- summary(out)$EAP - HolzingerSwineford1939[,paste0("x",1:9)]
colMeans(m_dif)


### density plot of predicted values

#### variable means at each posterior predicted sample
post_means <- t(sapply(out@obsDist$ov_pred, colMeans))
head(post_means)
dim(post_means)

## plot the predicted means
## add the observed mean in red
plot(density(post_means[,"x1"]))
abline(v=mean(HolzingerSwineford1939[,"x1"]), col="red")

Ed Merkle

Oct 26, 2020, 11:21:11 AM
to Mauricio Garnier-Villarreal, blavaan
Garret, Mauricio,

The lavPredict(, type="ov") gives you the "marginal" means, which are the same for all observations and which exclude the latent variables in the model. It would also be possible to get means that incorporate the latent variables and that are unique to each observation, but it takes a little extra work.
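To make "marginal" concrete, here is a quick sketch (my code, not part of the original message; it assumes lavInspect() works on the bcfa fit above as it does on a lavaan fit). The model-implied mean vector is the same for every observation:

## model-implied (marginal) means of the observed variables;
## an observation-specific mean would instead be nu + Lambda %*% eta_i
lavInspect(fit, "mean.ov")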

@Garret, do you want the latent variables included in your mean predictions?

Ed
--
You received this message because you are subscribed to the Google Groups "blavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blavaan+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/blavaan/eece7456-5855-4b7d-965e-b08c094aa5ban%40googlegroups.com.

Mauricio Garnier-Villarreal

Oct 26, 2020, 12:45:30 PM
to blavaan
Ed

When I run lavPredict(, type="ov"), I get distinct predicted values for each subject.

Also, when I looked into the function that does the predicting, lavaan:::lav_predict_yhat, it seems that for "ov" lavaan first predicts the factor scores for each subject, and then uses the model and the factor scores to compute the predicted scores:

library(lavaan)

## The famous Holzinger and Swineford (1939) example

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- cfa(HS.model, data = HolzingerSwineford1939, std.lv=T)
summary(fit, fit.measures = TRUE)

lpred <- lavPredict(fit, type="ov")

head(lpred)
head(HolzingerSwineford1939[,paste0("x",1:9)])
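A quick check along these lines (a sketch, my code, using the lpred object above): if the predictions were identical across observations, every column SD would be zero, which they are not.

## nonzero column SDs show the predicted values vary by subject
apply(lpred, 2, sd)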

Ed Merkle

Oct 26, 2020, 12:54:48 PM
to Mauricio Garnier-Villarreal, blavaan
Sorry, I was mistaken about lavPredict. In that case, I think the issue is that lavPredict() does not use the sampled latent variables from the Bayesian model; instead, it uses the point prediction that would come from a frequentist model (which is roughly the mean of the posterior distribution). This leads to less uncertainty in the posterior means than actually exists.
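To make the distinction concrete (my notation, not from the original message): a plug-in prediction uses a single point estimate of the factor scores,

    yhat_i = nu_hat + Lambda_hat %*% eta_hat_i

while the fully Bayesian prediction keeps a separate draw of eta for each posterior sample s,

    yhat_i^(s) = nu^(s) + Lambda^(s) %*% eta_i^(s)

so the spread across samples reflects uncertainty in both the parameters and the latent variables.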

I am not sure that ppmc() can currently make use of the sampled latent variables because they cannot be accessed by a simple lavaan function (but I could be wrong). Later, I will send an example of how to do it outside of ppmc().

Ed

Garret Hall

Oct 26, 2020, 4:34:34 PM
to Ed Merkle, Mauricio Garnier-Villarreal, blavaan
Hi all,

Thanks for the assistance with this. Ideally, I would like the latent variables included in the prediction while retaining the uncertainty properties of the full Bayesian model.

Garret

Ed Merkle

Oct 26, 2020, 4:59:53 PM
to blavaan
I think you can do this using a few unexported functions. Below is a cfa example. The cmns object is a list whose length is the number of posterior samples. Each entry is itself a list whose length is the number of groups in the data, and that inner list entry contains the posterior mean (including the latent variables) for every observation in the group.

library(blavaan)


HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
     
fit <- bcfa(HS.model, data=HolzingerSwineford1939, burnin=100, sample=100, save.lvs=TRUE)

psamps <- do.call("rbind", blavInspect(fit, 'mcmc'))

lvsamps <- do.call("rbind", blavInspect(fit, 'lvs'))

cmns <- lapply(seq(nrow(psamps)), function(i){
  ## plug draw i's parameter values into the lavaan model object
  lavmodel <- blavaan:::fill_params(psamps[i,], fit@Model, fit@ParTable)
  ## arrange draw i's sampled latent variables by group
  eta <- blavaan:::fill_eta(lvsamps[i,], fit@Model, fit@ParTable,
                            fit@SampleStats, fit@Data)
  ## conditional means of the observed variables for every observation
  lavaan:::computeYHAT(lavmodel, lavmodel@GLIST, fit@SampleStats, ETA = eta)
})

## example of obtaining posterior means for y[4,2]
y42 <- sapply(cmns, function(x) x[[1]][4,2])
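To summarize that posterior, a short usage sketch (my code, same setup):

mean(y42)                      # posterior mean of the conditional mean
quantile(y42, c(.025, .975))   # 95% credible interval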



Ed Merkle

Oct 26, 2020, 5:00:50 PM
to blavaan
I should automate this sometime, though a couple other things are taking priority at the moment...

Mauricio Garnier-Villarreal

Oct 26, 2020, 6:16:21 PM
to blavaan
I would only add that, if you want the variable means to compare against the observed means, you can adjust the sapply() call. This way the y_colM object is a matrix whose rows are the posterior samples and whose columns are the observed indicators:

y_colM <- t(sapply(cmns, function(x) colMeans(x[[1]]) ))
dim(y_colM)
head(y_colM)

### plot X2 means
plot(density(y_colM[,2]))
abline(v=mean(HolzingerSwineford1939[,"x2"]), col="red")
abline(v=mean(y_colM[,2]) )
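Building on this, one could also compute a rough posterior predictive p-value for a variable's mean; a sketch (my code, not from the thread):

## proportion of posterior-draw means for x2 exceeding the observed mean
obs_m2 <- mean(HolzingerSwineford1939[,"x2"])
mean(y_colM[,2] > obs_m2)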

Garret Hall

Oct 26, 2020, 10:15:58 PM
to Mauricio Garnier-Villarreal, blavaan
Thank you, Mauricio and Ed, for all of this helpful information. 

Garret

Garret Hall

Oct 27, 2020, 12:03:28 AM
to Ed Merkle, blavaan
Would it be possible to obtain the entire predicted distribution from each sample, not just the mean? It looks like the code you provided produces the posterior samples of mean values, but I'm also interested in the full posterior predictive distribution (i.e., predicted mean + predicted variance) from each posterior sample, to produce something similar to the plot from bayesplot below. I would also like this so I can do PPC checks for specific parts of the posterior samples besides the mean (e.g., quantiles).

Thank you again,
Garret

library(rstanarm)   # for stan_glm and posterior_predict
library(bayesplot)  # for ppc_dens_overlay

fit <- stan_glm(mpg ~ ., data = mtcars)
posterior <- as.matrix(fit)
ppc_dens_overlay(y = fit$y,
                 yrep = posterior_predict(fit, draws = 50))

[Attachment: image.png (example ppc_dens_overlay plot)]

Mauricio Garnier-Villarreal

Oct 27, 2020, 5:42:01 AM
to blavaan
Garret

If you add this code, you get the density of the observed data together with the densities of 20 randomly selected posterior draws. Here I wrote it for the 9 variables of the HS data, looping over the variables and arranging them in a matrix of plots:


rand_samp <- sample(1:nrow(psamps), 20)
rand_samp

## per-variable upper y-axis limits for the density plots
ymax <- c(.6, 1.05, .8, .5, .5, .6, 1, .8, .8)

par(mfrow=c(3,3))
for(j in 1:9){
  ## observed density in red
  plot(density(HolzingerSwineford1939[,paste0("x",j)]),
       col="red", lwd=4, ylim=c(0, ymax[j]), main=paste0("x",j))
  ## overlay the randomly selected posterior draws
  ## (index with rand_samp, not 1:20, so the draws really are random)
  for(k in rand_samp){
    lines(density(cmns[[k]][[1]][,j]))
  }
}
par(mfrow=c(1,1))
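If you prefer the bayesplot look from the earlier message, the same draws can be reshaped for ppc_dens_overlay (a sketch, my code; yrep must be a draws-by-observations matrix for a single variable):

library(bayesplot)

## rows = posterior draws, columns = observations, for variable x1
yrep_x1 <- t(sapply(rand_samp, function(k) cmns[[k]][[1]][,1]))
ppc_dens_overlay(y = HolzingerSwineford1939[,"x1"], yrep = yrep_x1)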




Garret Hall

Oct 27, 2020, 9:15:27 AM
to Mauricio Garnier-Villarreal, blavaan
Great -- this is exactly what I needed! Thanks so much.

Garret

hal...@gmail.com

Apr 28, 2022, 9:55:19 AM
to blavaan
Hi Ed,

I'm seeking some clarification on what the code you provided above does in comparison with blavPredict(fit, type = "ypred"). blavPredict "ypred" conditions on the sampled latent variables, and the blavaan documentation says to use postdata() if one is interested in posterior predictives that do not condition on the latent variables (I can't find postdata(), though, so I'm also wondering if that's available yet). The code above "includes" latent variables in the posterior distributions it creates, so I'm wondering whether that code is more like postdata() or like blavPredict("ypred").

I used the code you provided above to plot posterior distributions from the cfa above (for variable "x1"), and I did the same thing with blavPredict("ypred"); the resulting posterior distributions look slightly different (image attached; R code below). The means of each distribution are nearly identical, but blavPredict has a larger SD for each distribution, so I am assuming these two methods of getting posteriors are doing different things; I'm just not clear on what differs between them.

Thank you for your input!
Garret



library(blavaan)
library(tidyverse)


HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- bcfa(HS.model, data=HolzingerSwineford1939, burnin=100, sample=100, save.lvs=TRUE)

# recommendation from Ed Merkle to get posterior distribution

psamps <- do.call("rbind", blavInspect(fit, 'mcmc'))

lvsamps <- do.call("rbind", blavInspect(fit, 'lvs'))

cmns <- lapply(seq(nrow(psamps)), function(i){
  lavmodel <- blavaan:::fill_params(psamps[i,], fit@Model, fit@ParTable)
  eta <- blavaan:::fill_eta(lvsamps[i,], fit@Model, fit@ParTable,
                            fit@SampleStats, fit@Data)
  lavaan:::computeYHAT(lavmodel, lavmodel@GLIST, fit@SampleStats, ETA = eta)
})

#blavPredict
pred <- blavPredict(fit, type = "ypred")

#plot the data
pred2 <- bind_rows(lapply(pred, as.data.frame), .id = "df")
cmns2 <- bind_rows(lapply(cmns, as.data.frame), .id = "df")
cmns2$type <- "Ed's code"
pred2$type <- "blavPredict 'ypred'"
cmns2 <- cmns2 %>% rename(x1 = X1,
                          x2 = X2,
                          x3 = X3,
                          x4 = X4,
                          x5 = X5,
                          x6 = X6,
                          x7 = X7,
                          x8 = X8,
                          x9 = X9)
allpred <- rbind(cmns2, pred2)

allpred %>% ggplot(aes(x = x1, group = df)) + geom_density() + facet_wrap(~type)


[Attachment: Posterior Samples CFA.png]

hal...@gmail.com

Apr 28, 2022, 10:05:43 AM
to blavaan
Apologies, Ed -- I answered my own question. Using blavPredict("yhat") produces posterior samples identical to those from the code you provided (which should have been obvious, since it does not appear that residual variances are included in that code). 
[Attachment: blavpredict yhat.png]
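As a closing note, a rough sketch (my code, not from the thread) of getting full posterior predictive draws, i.e. conditional mean plus residual variance, which is what "ypred" adds on top of "yhat". It assumes a single group and lavaan's default matrix representation, where the residual covariance matrix sits in GLIST$theta:

library(MASS)  # for mvrnorm

ypred_sketch <- lapply(seq(nrow(psamps)), function(i){
  lavmodel <- blavaan:::fill_params(psamps[i,], fit@Model, fit@ParTable)
  Theta <- lavmodel@GLIST$theta          # residual covariances, draw i
  yhat <- cmns[[i]][[1]]                 # conditional means, draw i
  yhat + mvrnorm(nrow(yhat), mu = rep(0, ncol(yhat)), Sigma = Theta)
})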