sampleData(): specify sample size

Yadi Li

Nov 18, 2025, 8:44:43 AM
to blavaan
Hi all,

We are trying to use the sampleData() function to generate data from a fitted blavaan model. Is there a way to specify the sample size, so that we can generate data with a different sample size than the original data?

Thank you!
Yadi Li

Ed Merkle

Nov 18, 2025, 2:39:24 PM
to blavaan
Yadi,

The sampleData() command does not currently have a sample size argument. If you use sampleData() with conditional = FALSE (the default), then a workaround is below. I don't think it makes sense to use conditional = TRUE here, because that conditions on the estimated latent variables in the model. A different sample size would usually imply that you have new people (rows) in your data, so we should not condition on the latent variables that were already sampled.

Assuming your blavaan model is called "fit" and you want conditional = FALSE, you could manipulate the nrep argument and then create individual datasets of the size that you want:

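desired_n <- 500     # rows per generated dataset
desired_reps <- 50   # number of datasets

# Generate enough replications to cover desired_n * desired_reps rows in total
dat <- sampleData(fit, nrep = ceiling((desired_n * desired_reps) / nobs(fit)), simplify = TRUE)

# Stack all replications, trim the excess rows, then split into datasets of desired_n rows
dat <- do.call("rbind.data.frame", dat)
dat <- dat[1:(desired_n * desired_reps),]
dat <- split(dat, rep(1:desired_reps, each = desired_n))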

It would be a little trickier for multiple group models, because sampleData returns nested lists in that case (each group is a separate list entry).
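
A rough sketch of how the same pool-and-split idea might look for two groups is below. It does not call sampleData(); fake_sample_data_mg() is a made-up stand-in, and it ASSUMES the multiple-group result is indexed as out[[replication]][[group]] with one data frame per group. The group sizes and the per-group desired_n are also hypothetical, so check the structure of your own sampleData() output before adapting it.

```r
# Hypothetical stand-in for multiple-group sampleData() output.
# ASSUMPTION: the result is indexed as out[[replication]][[group]].
fake_sample_data_mg <- function(nrep, n_by_group) {
  lapply(seq_len(nrep), function(r) {
    lapply(n_by_group, function(n) data.frame(y = rnorm(n)))
  })
}

set.seed(1)
n_by_group   <- c(g1 = 120, g2 = 80)  # original group sizes (made up)
desired_n    <- 150                   # desired rows per group, per dataset
desired_reps <- 4

# Enough replications for the smallest group to reach desired_n * desired_reps rows
nrep <- ceiling(desired_n * desired_reps / min(n_by_group))
dat  <- fake_sample_data_mg(nrep, n_by_group)

# For each group: stack that group's rows over replications, trim, and split
dat_by_group <- lapply(seq_along(n_by_group), function(g) {
  pooled <- do.call(rbind, lapply(dat, function(rep_r) rep_r[[g]]))
  pooled <- pooled[seq_len(desired_n * desired_reps), , drop = FALSE]
  split(pooled, rep(seq_len(desired_reps), each = desired_n))
})
names(dat_by_group) <- names(n_by_group)

# dat_by_group[["g1"]][[2]] is then the group-1 part of the second dataset
sapply(dat_by_group, function(g) sapply(g, nrow))
```

As in the single-group version, a split dataset here can combine rows from more than one list entry.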

Ed

Yadi Li

Nov 18, 2025, 10:39:05 PM
to blavaan
Thank you Ed! This is very helpful.

Just wanted to make sure I understand this: the sampleData() function generates each dataset based on one draw of the parameters from the posterior distribution. If we use the workaround you suggested, which seems to combine the generated datasets and then split them by the desired sample size, does that mean that in any one of the resulting datasets, the observations may be generated from different posterior draws of the parameters?

If we would like a dataset (with a different sample size than the original data) where all observations are generated from the same posterior draw, would that be possible?

Thanks again!
Yadi

Ed Merkle

Nov 19, 2025, 5:32:03 PM
to blavaan
That is a good point. I was focused on the "conditional" argument and forgot about the fact that each list entry is generated from one posterior sample. So I think you could instead call sampleData() multiple times and glue together the results, something like this:

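desired_n <- 500     # rows per generated dataset
desired_reps <- 50   # number of datasets
ngen <- ceiling(desired_n / nobs(fit))   # calls needed to reach desired_n rows

# Each call returns a list of desired_reps datasets
dat <- lapply(1:ngen, function(i) sampleData(fit, nrep = desired_reps, simplify = TRUE))

dat2 <- vector("list", length = desired_reps)

for (j in 1:desired_reps) {
  # Stack the j-th dataset from every call, then trim to desired_n rows
  dat2[[j]] <- do.call("rbind", lapply(dat, function(x) x[[j]]))
  dat2[[j]] <- dat2[[j]][1:desired_n,]
}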
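
The gluing logic can be checked without fitting a model by running it on stand-in data. In the sketch below, fake_sample_data() is a made-up replacement for sampleData(fit, nrep, simplify = TRUE), n_orig plays the role of nobs(fit), and the draw column exists only to record which list position each row came from (real sampleData() output has no such column).

```r
# Hypothetical stand-in for sampleData(fit, nrep, simplify = TRUE):
# a list of nrep data frames with n_orig rows each. The `draw` column
# only tags the list position, so the bookkeeping can be checked.
fake_sample_data <- function(nrep, n_orig) {
  lapply(seq_len(nrep), function(j) {
    data.frame(draw = j, y = rnorm(n_orig))
  })
}

set.seed(1)
n_orig       <- 200   # plays the role of nobs(fit)
desired_n    <- 500
desired_reps <- 5
ngen <- ceiling(desired_n / n_orig)

# Repeated calls, as in the approach above
dat <- lapply(seq_len(ngen), function(i) fake_sample_data(desired_reps, n_orig))

# Stack the j-th dataset from every call and trim to desired_n rows
dat2 <- lapply(seq_len(desired_reps), function(j) {
  stacked <- do.call(rbind, lapply(dat, function(x) x[[j]]))
  stacked[seq_len(desired_n), , drop = FALSE]
})

sapply(dat2, nrow)                                 # every entry equals desired_n
sapply(dat2, function(d) length(unique(d$draw)))   # every entry equals 1
```

This only confirms that each glued dataset is built from a single list position and has the requested number of rows. Whether position j corresponds to the same posterior draw across separate sampleData() calls depends on how sampleData() selects draws, which this sketch does not test.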

Yadi Li

Nov 19, 2025, 10:25:01 PM
to blavaan
That sounds perfect, thank you so much!