Batched Sampling Analysis Question


Monish Shah

Aug 1, 2025, 3:42:10 PM
to blavaan
Hi all,

I have a dataset with 42k points that I split into 4 subsets of ~10k each. I am running blavaan (version 0.5-8) on each subset and saving the fitted models to a directory. I am new to Bayesian analysis, so I wanted to know: is there a way to compare the model estimates/posterior samples across the subsets that would allow me to make meaningful interpretations?

Basically, what I am asking is: is there a batched approach that is mathematically equivalent to running the blavaan model on the full dataset? My system times out before I can run the full dataset in one go, so I have no choice but to split it.

Thank you in advance!

Monish

Ed Merkle

Aug 2, 2025, 12:20:32 PM
to blavaan
Monish,

For some models, it is possible to fit the model to the observed covariance matrix instead of the raw data. This is advantageous because it can scale to large datasets, but it slows down for models with missing data, ordinal data, or many observed variables. blavaan tries to do some of these speedups automatically, though it would help to see your model syntax to check whether something else could be done.
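To illustrate why the covariance route scales, and why it can even be assembled batch by batch: for a linear regression on continuous, complete data, the slope estimates depend on the data only through the means and the covariance matrix, and those can be rebuilt exactly from per-batch sums and cross-products. The sketch below uses simulated data, base R only, and a hypothetical helper name (`accumulate_stats`); it does not apply to an ordinal outcome.

```r
# Hypothetical helper: accumulate sufficient statistics (n, column sums,
# cross-products) over batches, then recover the full-data mean and covariance.
accumulate_stats <- function(batches) {
  p <- ncol(batches[[1]])
  n <- 0
  s <- numeric(p)
  ss <- matrix(0, p, p)
  for (b in batches) {
    b <- as.matrix(b)
    n <- n + nrow(b)
    s <- s + colSums(b)
    ss <- ss + crossprod(b)  # t(b) %*% b
  }
  m <- s / n
  # Unbiased covariance from the pooled sums: (sum x x' - n m m') / (n - 1)
  S <- (ss - n * tcrossprod(m)) / (n - 1)
  dimnames(S) <- list(colnames(batches[[1]]), colnames(batches[[1]]))
  list(n = n, mean = m, cov = S)
}

# Simulated continuous data, split into 4 batches
set.seed(1)
n <- 2000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 0.5 * dat$x1 - 0.3 * dat$x2 + rnorm(n)
batches <- split(dat, rep(1:4, length.out = n))
stats <- accumulate_stats(batches)

# Slopes from the covariance matrix alone: beta = Sxx^{-1} Sxy
S <- stats$cov
beta_cov <- solve(S[c("x1", "x2"), c("x1", "x2")], S[c("x1", "x2"), "y"])
beta_lm <- coef(lm(y ~ x1 + x2, data = dat))[c("x1", "x2")]
```

`beta_cov` and `beta_lm` agree up to floating-point error. In lavaan the analogous inputs are the `sample.cov`, `sample.mean`, and `sample.nobs` arguments; whether blavaan can use them for a given model is worth checking in its documentation.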

A more general solution, at least in theory, involves using the posterior distribution from the first batch as the prior distribution for the second batch. Then the posterior distribution for the second batch becomes the prior for the third batch, and so on. This becomes difficult because the posterior distribution is multivariate over all model parameters, so univariate priors on individual model parameters (which ignore the correlations between parameters) often do not adequately represent the full posterior distribution. There is some good discussion of this idea and possible solutions on the Stan discourse site: https://discourse.mc-stan.org/t/composing-stan-models-posterior-as-next-prior/8093
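A small sketch of the "posterior becomes the next prior" idea, in the one setting where it is exact: a linear regression with a known noise SD and a multivariate normal prior (a conjugate model, swapped in here for illustration; it is not the ordinal model from this thread). Passing the full posterior mean and covariance from batch to batch reproduces the full-data posterior, while keeping only the marginal variances, which mimics independent univariate priors, does not. The function name `update_normal` and the simulated data are hypothetical.

```r
# Conjugate update for a linear regression with known noise sd `sigma`
# and a multivariate normal prior N(m0, V0) on the coefficients.
update_normal <- function(m0, V0, X, y, sigma) {
  P0 <- solve(V0)                        # prior precision
  Pn <- P0 + crossprod(X) / sigma^2      # posterior precision
  Vn <- solve(Pn)                        # posterior covariance
  mn <- Vn %*% (P0 %*% m0 + crossprod(X, y) / sigma^2)
  list(mean = drop(mn), cov = Vn)
}

set.seed(2)
n <- 4000
sigma <- 1
x1 <- rnorm(n, mean = 2)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)      # correlated predictors
X <- cbind(1, x1, x2)
y <- drop(X %*% c(0.2, 0.5, -0.3)) + rnorm(n, sd = sigma)
batch <- rep(1:4, length.out = n)
prior <- list(mean = rep(0, 3), cov = diag(100, 3))

# 1) All data at once
full <- update_normal(prior$mean, prior$cov, X, y, sigma)

# 2) Batch by batch: the full multivariate posterior of batch k is the prior for batch k + 1
seq_post <- prior
for (k in 1:4) {
  idx <- batch == k
  seq_post <- update_normal(seq_post$mean, seq_post$cov,
                            X[idx, , drop = FALSE], y[idx], sigma)
}

# 3) Same chain, but each new prior keeps only the marginal variances,
#    i.e. independent univariate normal priors on each coefficient
uni_post <- prior
for (k in 1:4) {
  idx <- batch == k
  uni_post <- update_normal(uni_post$mean, diag(diag(uni_post$cov)),
                            X[idx, , drop = FALSE], y[idx], sigma)
}
```

`seq_post` matches `full` up to rounding; `uni_post` does not, because dropping the off-diagonal terms throws away information carried by the posterior correlations. For non-conjugate models such as an ordinal regression, even the multivariate normal version is only an approximation.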

Ed

Monish Shah

Aug 2, 2025, 11:13:42 PM
to blavaan
Hi Ed,

Thanks for the link, I will see if it's something I can implement with my current software. The issue I am running into is with Palantir Foundry's Code Workspaces: the license times out after a certain period of time. Would I be able to start a given batch using the previous batch's posterior distribution as its prior (by saving the last completed model or some other method)? I have included my model with the data types below. Since fitting to the observed covariance matrix slows down for ordinal data, it seems the first idea you suggested might not be possible.

model <- '
  # i1_i4 etc. are product columns added to the data before fitting (e.g. i1 * i4);
  # in lavaan syntax, a*x labels or fixes the coefficient of x rather than forming a product
  outcomeVar ~ # ordinal
    age_group_1 + age_group_2 + age_group_3 + age_group_4 + # all binary
    i1 + i2 + i3 + # all binary
    i4 + i5 + # all binary
    i1_i4 + i2_i4 + i3_i4 + # binary interaction terms
    i1_i5 + i2_i5 + i3_i5 + # binary interaction terms
    sex + race_ethnicity_group_1 + race_ethnicity_group_2 + race_ethnicity_group_3 # all binary
'
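In lavaan model syntax, which blavaan uses, `a*x` attaches a label or a fixed value to the coefficient of `x`; it does not multiply two variables, so a term like `i2*i4` labels the `i4` coefficient instead of adding an interaction. One way to include the binary interactions is to add product columns to the data before fitting and list them like any other predictor. A minimal sketch, assuming the data sit in a data frame (called `dat` here, a hypothetical name, filled with toy values) and using a hypothetical `a_b` naming scheme for the new columns:

```r
# Toy stand-in for the real data: binary indicators i1..i5
set.seed(3)
dat <- data.frame(i1 = rbinom(10, 1, 0.5), i2 = rbinom(10, 1, 0.5),
                  i3 = rbinom(10, 1, 0.5), i4 = rbinom(10, 1, 0.5),
                  i5 = rbinom(10, 1, 0.5))

# Add one product column per interaction, named e.g. "i1_i4"
for (a in c("i1", "i2", "i3")) {
  for (b in c("i4", "i5")) {
    dat[[paste(a, b, sep = "_")]] <- dat[[a]] * dat[[b]]
  }
}
```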

Thank you for your time and help,
Monish

Ed Merkle

Aug 3, 2025, 11:12:13 PM
to blavaan
It may be possible to obtain some approximation of the previous posterior distribution, then send it to the next model you are estimating as the prior. It will probably take some work to automate it, and blavaan won't help you much here (other than allowing you to specify priors).
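A rough sketch of the simplest such approximation: summarise each parameter's posterior draws from the previous batch by a mean and SD, then write them into the next batch's model syntax with blavaan's `prior()` modifier. This keeps only the marginals, so it discards the posterior correlations mentioned earlier in the thread. The draws below are simulated placeholders; in practice they might come from something like `blavInspect(fit, "mcmc")`, but the exact output and parameter naming should be checked in the blavaan documentation and mapped to predictors by hand. The `normal(mean, sd)` form assumes blavaan's Stan-based parameterization.

```r
# Hypothetical posterior draws from the previous batch: one column per
# regression coefficient, named after the predictor it belongs to.
set.seed(4)
draws <- cbind(i1 = rnorm(1000, 0.40, 0.05),
               i4 = rnorm(1000, -0.20, 0.04))

# Summarise each marginal posterior by its mean and sd
post_mean <- colMeans(draws)
post_sd <- apply(draws, 2, sd)

# Build model-syntax terms of the form prior("normal(m,s)")*x
prior_terms <- sprintf('prior("normal(%.4f,%.4f)")*%s',
                       post_mean, post_sd, colnames(draws))
model_next <- paste("outcomeVar ~", paste(prior_terms, collapse = " + "))
cat(model_next, "\n")
```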

Is it correct that your model does not involve any latent variables, just ordinal and binary observed variables? If so, I believe that the brms and/or rstanarm packages might be better optimized to estimate this ordinal regression model. For example:


Ed

Mauricio Garnier-Villarreal

Aug 6, 2025, 6:09:44 PM
to blavaan
A couple of ideas:

You could run the model on each data set separately and then
- use the multiple imputation rules to combine the parameter estimates and SDs
- or use meta-analysis rules to combine the estimates from the separate data sets
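Both rules can be applied one parameter at a time to the subset posterior means and SDs. A base-R sketch with made-up numbers for a single coefficient (not results from this thread); the function names are hypothetical. One design note: Rubin's total variance is built from the average within-subset variance, so with disjoint subsets it reflects roughly one subset's sample size and tends to be conservative, whereas the fixed-effect (inverse-variance) pooled SE shrinks as subsets are added, assuming the parameter is the same across subsets.

```r
# Per-subset posterior means and SDs for one parameter (made-up numbers)
est <- c(0.42, 0.38, 0.45, 0.40)
se  <- c(0.05, 0.05, 0.06, 0.05)

# Rubin's rules (multiple imputation pooling)
rubin_pool <- function(est, se) {
  m <- length(est)
  qbar <- mean(est)                # pooled estimate
  w <- mean(se^2)                  # average within-subset variance
  b <- var(est)                    # between-subset variance
  total <- w + (1 + 1 / m) * b
  c(estimate = qbar, se = sqrt(total))
}

# Fixed-effect (inverse-variance weighted) meta-analysis
fixed_effect_pool <- function(est, se) {
  w <- 1 / se^2
  c(estimate = sum(w * est) / sum(w), se = sqrt(1 / sum(w)))
}

rubin_pool(est, se)         # estimate 0.4125
fixed_effect_pool(est, se)
```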

take care