Blavaan incredibly slow... speed-up solutions?


Gabe Avakian Orona

Feb 23, 2022, 7:10:19 PM
to blavaan
Hi All,

I'm working with a dataset of 260 rows (no missing data). I'm trying to run a cross-lagged model with several predictors and informative priors on some of the parameters. The estimation is incredibly slow: ~7 hours to complete. Any help in speeding things up? I paste some of my code below in case trouble-shooting can help. 

Thank you very much!

bl1<-'
 # Latent Variables
#_ _ _ _ _ _ _ _ _
CCR1 =~  
  z_ana1 + prior("normal(.5,.1)")*z_ana1  +  
  z_syn1 + prior("normal(.5,.1)")*z_syn1  +  
  z_win1 + prior("normal(.5,.1)")*z_win1  +
  z_ptk1 + prior("normal(.5,.1)")*z_ptk1    
CCR2 =~  
  z_ana2 + prior("normal(.5,.1)")*z_ana2  +  
  z_syn2 + prior("normal(.5,.1)")*z_syn2  +  
  z_win2 + prior("normal(.5,.1)")*z_win2  +
  z_ptk2 + prior("normal(.5,.1)")*z_ptk2  
CUR =~ NFC + EC + openess
EFF =~ Optim + ASE + consci
#_ _ _ _ _ _ _ _ _
#Regressions
CCR2 ~ zOuterBreadth + prior("normal(.13,.5)")*zOuterBreadth +
      CCR1 + prior("normal(.7,.05)")*CCR1 +
      CUR + EFF+
      msf19jtsk2+ msf19nfge1+ polisoc+  
      male+  Urm  +  sesIndex+ junior +age+full_time +
      d_pubhealth + d_bio + d_bus + d_soe + d_engi+ d_hum+ d_info+
      d_nurs +d_phys +d_seco+ d_ss+ d_art
#_ _ _ _ _ _ _ _ _
# Variance/CUs
CCR2 ~~ CCR2
 z_syn1 ~~  z_syn2
z_win1 ~~  z_win2
z_ptk1 ~~  z_ptk2
z_ana1 ~~ z_ana2
 NFC ~~ openess
EC ~~ openess
EC ~~     ASE
'

#Fit
bl1 <- bsem(bl1, data=dat, std.lv=T,
           n.chains = 3, burnin=5000,
           sample=1000, target = "stan")


Best,
Gabe

Ed Merkle

Feb 23, 2022, 7:50:27 PM
to Gabe Avakian Orona, blavaan
That is incredibly slow! Some misc thoughts:

- bsem() automatically adds some covariance parameters between latent variables here, and it might not be what you want (not sure). You might try specifying all the desired parameters (maybe you already have), then use blavaan() instead of bsem().

- I wonder whether you have tried sem() from lavaan for this model. That can provide a hint about whether it is a problem with the Bayesian estimation, or a more general problem with the model.

- Try a simpler model without the regressions, to see whether the problem remains.

- In my experience, if you need more than 500 burnin iterations, these models will never converge.

- For lines like 

z_ana1 + prior("normal(.5,.1)")*z_ana1

you only need 

prior("normal(.5,.1)")*z_ana1

I don't think that will change anything, but it will make the code easier to read.


Ed
--
You received this message because you are subscribed to the Google Groups "blavaan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blavaan+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/blavaan/b59b3983-854c-4351-ac10-dd0882cab7a5n%40googlegroups.com.

Mauricio Garnier-Villarreal

Feb 24, 2022, 8:39:58 AM
to blavaan
Gabe

can you show us the sessionInfo()?

To see whether there is anything we should pay attention to there.

Mauricio Garnier-Villarreal

Feb 25, 2022, 11:16:40 AM
to blavaan
Gabe

Based on the sessionInfo(), I would also recommend updating your packages, as lavaan, blavaan, and rstan all have newer versions, and the latest version of blavaan includes some speed improvements.

update.packages(ask = FALSE, checkBuilt = TRUE)




> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:
[1] blavaan_0.3-15     RcppParallel_5.1.4 Rcpp_1.0.6         lavaan_0.6-8      

loaded via a namespace (and not attached):
  [1] utf8_1.2.1           questionr_0.7.4      tidyselect_1.1.1     lme4_1.1-27          htmlwidgets_1.5.3    grid_4.1.0           munsell_0.5.0        codetools_0.2-18    
  [9] effectsize_0.4.5     DT_0.18              future_1.21.0        miniUI_0.1.1.1       withr_2.4.2          colorspace_2.0-1     OpenMx_2.19.5        highr_0.9          
 [17] knitr_1.33           rstudioapi_0.13      stats4_4.1.0         listenv_0.8.0        bayesplot_1.8.1      mi_1.0               emmeans_1.7.0        rstan_2.21.2        
 [25] mnormt_2.0.2         MCMCpack_1.5-0       parallelly_1.26.0    coda_0.19-4          vctrs_0.3.8          generics_0.1.0       TH.data_1.1-0        xfun_0.24          
 [33] R6_2.5.0             markdown_1.1         arm_1.11-2           rstanarm_2.21.1      assertthat_0.2.1     promises_1.2.0.1     scales_1.1.1         multcomp_1.4-17    
 [41] nnet_7.3-16          gtable_0.3.0         globals_0.14.0       rethinking_2.13      conquer_1.0.2        processx_3.5.2       mcmc_0.9-7           sandwich_3.0-1      
 [49] MatrixModels_0.5-0   timeDate_3043.102    rlang_0.4.11         splines_4.1.0        checkmate_2.0.0      inline_0.3.19        yaml_2.2.1           reshape2_1.4.4      
 [57] abind_1.4-5          threejs_0.3.3        crosstalk_1.1.1      backports_1.2.1      httpuv_1.6.1         rsconnect_0.8.18     Hmisc_4.5-0          tools_4.1.0        
 [65] psych_2.1.3          ggplot2_3.3.4        ellipsis_0.3.2       RColorBrewer_1.1-2   Rsolnp_1.16          stargazer_5.2.2      ggridges_0.5.3       plyr_1.8.6          
 [73] base64enc_0.1-3      purrr_0.3.4          rockchalk_1.8.144    ps_1.6.0             prettyunits_1.1.1    rpart_4.1-15         pbapply_1.4-3        zoo_1.8-9          
 [81] qgraph_1.6.9         haven_2.4.1          cluster_2.1.2        magrittr_2.0.1       data.table_1.14.0    nonnest2_0.5-5       openxlsx_4.2.4       SparseM_1.81        
 [89] colourpicker_1.1.0   truncnorm_1.0-8      tmvnsim_1.0-2        mvtnorm_1.1-2        matrixcalc_1.0-4     matrixStats_0.59.0   hms_1.1.0            shinyjs_2.0.0      
 [97] mime_0.10            evaluate_0.14        xtable_1.8-4         shinystan_2.5.0      XML_3.99-0.6         jpeg_0.1-8.1         gridExtra_2.3        shape_1.4.6        
[105] rstantools_2.1.1     compiler_4.1.0       tibble_3.1.2         V8_3.4.2             crayon_1.4.1         minqa_1.2.4          StanHeaders_2.21.0-7 htmltools_0.5.1.1  
[113] corpcor_1.6.9        later_1.2.0          semTools_0.5-4       Formula_1.2-4        tidyr_1.1.3          DBI_1.1.1            kutils_1.70          MASS_7.3-54        
[121] see_0.6.4            boot_1.3-28          Matrix_1.3-3         cli_3.0.0            parallel_4.1.0       insight_0.14.5       igraph_1.2.6         forcats_0.5.1      
[129] pkgconfig_2.0.3      sem_3.1-11           foreign_0.8-81       dygraphs_1.1.1.6     pbivnorm_0.6.0       CompQuadForm_1.4.3   estimability_1.3     stringr_1.4.0      
[137] callr_3.7.0          digest_0.6.27        parameters_0.14.0    semPlot_1.1.2        rmarkdown_2.9        htmlTable_2.2.1      lisrelToR_0.1.4      curl_4.3.1          
[145] quantreg_5.86        shiny_1.6.0          gtools_3.9.2         nloptr_1.2.2.2       lifecycle_1.0.0      nlme_3.1-152         glasso_1.11          jsonlite_1.7.2      
[153] carData_3.0-4        fansi_0.5.0          labelled_2.8.0       pillar_1.6.1         lattice_0.20-44      loo_2.4.1            fastmap_1.1.0        pkgbuild_1.2.0      
[161] survival_3.2-11      glue_1.4.2           xts_0.12.1           bayestestR_0.10.0    zip_2.2.0            fdrtool_1.2.16       png_0.1-7            shinythemes_1.2.0  
[169] stringi_1.6.1        regsem_1.8.0         latticeExtra_0.6-29  dplyr_1.0.6          future.apply_1.7.0 

Ed Merkle

Feb 25, 2022, 2:29:58 PM
to Mauricio Garnier-Villarreal, blavaan
I agree with Mauricio that a package update might lead to some small improvements. I have also corresponded with Gabe off-list, and we found that some of the default priors on SD parameters were problematic. For example, some of the observed variables ranged from 0-100, so that a gamma(1,.5) on the corresponding residual SD is problematic. Additionally, some priors on regression weights were highly informative, and potentially conflicted with the data.

Maybe it is possible to have blavaan scale all observed variables to have an SD of 1 here, so that the default priors work better. But I worry that it would be problematic for models with certain equality constraints. For example, if you constrain a residual variance to be equal across groups, but then you scale the observed variables, perhaps the constraint leads to a different model than you started with.
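
Until something like that exists inside blavaan, the rescaling can be done by hand before fitting. What follows is a minimal base-R sketch, not blavaan functionality: the data frame and its column names (`score_a`, `score_b`, `grp`) are made up to stand in for observed variables on a 0-100 scale, and `rescale_columns()` is a hypothetical helper. The caveat about equality constraints still applies to a manually rescaled dataset.

```r
# Hypothetical data: two observed variables on a 0-100 scale plus a grouping column.
set.seed(123)
dat_raw <- data.frame(
  score_a = runif(260, min = 0, max = 100),
  score_b = runif(260, min = 0, max = 100),
  grp     = rep(c("a", "b"), each = 130)
)

# Standardize the named columns to mean 0 and SD 1; other columns are left as-is.
rescale_columns <- function(data, cols) {
  data[cols] <- lapply(data[cols], function(x) as.numeric(scale(x)))
  data
}

dat_std <- rescale_columns(dat_raw, c("score_a", "score_b"))

# Check the result: the SDs of the rescaled columns should be 1.
sapply(dat_std[c("score_a", "score_b")], sd)
```

With SDs near 1, a default prior such as gamma(1,.5) on a residual SD is at least on the same scale as the data, which is the issue described above.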

Ed

Mauricio Garnier-Villarreal

Feb 25, 2022, 2:36:33 PM
to blavaan
Gabe

Please keep the answers in the Google group instead of individual emails; otherwise it is harder for us to keep track of the conversation.

Based on the latest version conflict that you sent me:

> library(blavaan)
Error: package or namespace load failed for ‘blavaan’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):
 namespace ‘rlang’ 0.4.11 is already loaded, but >= 1.0.1 is required

You need to install the latest version of rlang

install.packages("rlang", dependencies = TRUE)

Ed Merkle

Feb 25, 2022, 8:27:18 PM
to Mauricio Garnier-Villarreal, blavaan
Beyond Mauricio's recommendation, sometimes packages can get installed in different places when you use Rstudio, as opposed to just R. Then R might look in the wrong place for a package and find an old version, even though you have the new version installed somewhere else. The link below describes a bit more about this issue. You might also update R and Rstudio if you are not at the newest versions, and then fiddle with updating packages once you are on the newest versions.
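
One way to check for this is to list every library R searches and see which of them hold a copy of the package in question. The sketch below uses only base R and utils; `where_installed()` is a hypothetical helper name, not an existing function.

```r
# List every library on the search path that contains a copy of `pkg`,
# together with the version found there. Base R / utils only.
where_installed <- function(pkg) {
  libs  <- .libPaths()
  found <- libs[file.exists(file.path(libs, pkg, "DESCRIPTION"))]
  versions <- vapply(
    found,
    function(lib) utils::packageDescription(pkg, lib.loc = lib, fields = "Version"),
    character(1)
  )
  data.frame(library = found, version = unname(versions), stringsAsFactors = FALSE)
}

.libPaths()                # libraries R searches, in order
where_installed("stats")   # a base package, so at least one row is expected
# where_installed("rlang") # more than one row would suggest duplicate installs
```

If a package shows up in more than one library with different versions, the copy in the first library listed by `.libPaths()` is the one R loads by default.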


Ed

Gabe Avakian Orona

Feb 26, 2022, 1:24:58 PM
to Mauricio Garnier-Villarreal, blavaan
Sure thing. Thanks Mauricio.

I'm still getting this issue:

This is blavaan 0.4-1
On multicore systems, we suggest use of future::plan("multicore") or
  future::plan("multisession") for faster post-MCMC computations.

Even after updating blavaan.

Gabe



--
Gabe Avakian Orona
PhD Student
School of Education
University of California, Irvine
3200 Education Irvine, CA 92697


Ed Merkle

Feb 26, 2022, 3:22:03 PM
to Gabe Avakian Orona, Mauricio Garnier-Villarreal, blavaan
That is just a startup suggestion to help speed things up and appears for everyone.

Ed



Gabe Avakian Orona

Feb 27, 2022, 12:53:37 PM
to Ed Merkle, Mauricio Garnier-Villarreal, blavaan
Hi Ed and Mauricio,

There appears to be a unique problem occurring after implementing all the suggestions: the summary function (and other functions) don't seem to recognize my blavaan fit object as a blavaan object. For instance, when I execute summary() on my fit object, it shows "estimate" instead of posterior mean--and no information on SD, Rhat, etc. Here is an example of what I mean:

#1. Model Specification 

m0<-'

 
# Latent Variables
#_ _ _ _ _ _ _ _ _
CCR1 =~  
z_ana1 +
z_syn1 +  
z_win1 +
z_ptk1  
CCR2 =~  
z_ana2   +
z_syn2  +  
z_win2 +
z_ptk2  
#_ _ _ _ _ _ _ _ _
#Regressions
CCR2 ~ CCR1

'
#_________________________________________

#2.  Model Fit
fit0<-bcfa(m0,data =  dat, std.lv = TRUE, save.lvs = T)

# 3. Summary

> summary(fit0)
lavaan 0.6-10 ended normally after 1000 iterations

  Estimator                                      BAYES
  Optimization method                           NLMINB
  Number of model parameters                        25
                                                     
  Number of observations                           260
  Number of missing patterns                         1
                                                     
Model Test User Model:
                                                       
  Test statistic                              -2893.373
  Degrees of freedom                                 NA
                                                     
  Test statistic                                 0.003
  Degrees of freedom                                NA

Parameter Estimates:


Latent Variables:
                   Estimate
  CCR1 =~                  
    z_ana1            0.691
    z_syn1            0.807
    z_win1            0.436
    z_ptk1            0.281
  CCR2 =~                  
    z_ana2            0.491
    z_syn2            0.509
    z_win2            0.274
    z_ptk2            0.242

Regressions:
                   Estimate
  CCR2 ~                  
    CCR1              1.009

Intercepts:
                   Estimate
   .z_ana1            0.003
   .z_syn1            0.002
   .z_win1           -0.000
   .z_ptk1            0.001
   .z_ana2            0.002
   .z_syn2            0.003
   .z_win2            0.000
   .z_ptk2            0.002
    CCR1              0.000
   .CCR2              0.000

Variances:
                   Estimate
   .z_ana1            0.542
   .z_syn1            0.369
   .z_win1            0.826
   .z_ptk1            0.934
   .z_ana2            0.535
   .z_syn2            0.499
   .z_win2            0.863
   .z_ptk2            0.896
    CCR1              1.000
   .CCR2              1.000

Mauricio Garnier-Villarreal

Feb 28, 2022, 5:57:02 AM
to blavaan
Gabe

This is a semi-recurrent problem. It happens when your R session gets confused and reads the blavaan object as a lavaan object, so it presents the lavaan summary instead.

I still haven't found a "good" solution for when this happens; I just close RStudio and open it again. When you set up your code, make sure to run only library(blavaan), and do NOT call library(lavaan). blavaan automatically loads lavaan, and loading lavaan separately sometimes adds to the confusion of the R session.

Ed Merkle

Feb 28, 2022, 9:28:36 AM
to blavaan
I keep adding fixes that I think will solve this problem, but apparently it still exists. I agree that library(lavaan) after library(blavaan) might cause problems. One other possible solution, without exiting Rstudio, is:

blsumm <- getMethod(summary, "blavaan")

and then you use blsumm() as if it were summary().
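
The mechanism can be tried out without blavaan on toy S4 classes. In the sketch below, `toy_lavaan` and `toy_blavaan` are hypothetical stand-ins for the real classes; the point is only that getMethod() returns the method registered for the named class, which can then be called directly.

```r
library(methods)

# Toy stand-ins for the lavaan/blavaan classes (hypothetical names).
setClass("toy_lavaan", representation(est = "numeric"))
setClass("toy_blavaan", contains = "toy_lavaan")

# One summary method per class, mimicking the two competing summaries.
setMethod("summary", "toy_lavaan",  function(object, ...) "lavaan-style summary")
setMethod("summary", "toy_blavaan", function(object, ...) "blavaan-style summary")

fit <- new("toy_blavaan", est = 0.5)

# Confirm what the object actually is before blaming dispatch.
is(fit, "toy_blavaan")

# Fetch the subclass method explicitly and call it like summary().
blsumm_toy <- getMethod("summary", "toy_blavaan")
blsumm_toy(fit)
```

On a real fit, checking `class(fit0)` and `is(fit0, "blavaan")` first shows whether the object itself is fine and only the dispatch of summary() has gone wrong.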

Ed

Gabe Avakian Orona

Feb 28, 2022, 12:11:32 PM
to blavaan
Hi Mauricio,

I implemented your suggested solutions. When I went back in to run and load only blavaan, I get this error:

Error in lav_syntax_get_modifier(rhs[[3]][[2L]]) :
  lavaan ERROR: evaluating modifier failed: *()*

Any chance you have come across this issue before?

Mauricio Garnier-Villarreal

Mar 1, 2022, 4:56:34 AM
to blavaan
Gabe

I haven't seen this error before. Can you share the full syntax that gives you this?

Ed Merkle

Mar 1, 2022, 10:23:33 AM
to Gabe Avakian Orona, blavaan
I think that lavaan is providing a hint about a problem with your model specification. It is saying that your model syntax is bad, like maybe you have an extra asterisk in it (maybe around a prior() statement).

Ed

Gabe Avakian Orona

Mar 1, 2022, 10:34:44 AM
to Mauricio Garnier-Villarreal, blavaan
Hi Mauricio,

Sure, here it is:


library(blavaan)

bl2<-'

 # Latent Variables
#_ _ _ _ _ _ _ _ _
CCR1 =~  
  prior("normal(.5,.4)")*z_ana1  +  
  prior("normal(.5,.4)")*z_syn1  +  
  prior("normal(.5,.4)")*z_win1  +
  prior("normal(.5,.4)")*z_ptk1    
CCR2 =~  
  prior("normal(.5,.4)")*z_ana2  +  
  prior("normal(.5,.4)")*z_syn2  +  
  prior("normal(.5,.4)")*z_win2  +
  prior("normal(.5,.4)")*z_ptk2  

CUR =~ NFC + EC + openess
EFF =~ Optim + ASE + consci
#_ _ _ _ _ _ _ _ _
#Regressions
CCR2 ~ 
      prior("normal(.13,.8)")*zOuterBreadth +
      prior("normal(.7,.5)")*CCR1 +
      prior("normal(.2,.5)")*Hs_GPA +
      prior("normal(.2,.5)")*sat+
       prior("normal(.1,.8)")*CUR
       prior("normal(.1,.8)")*EFF+
       polisoc+  msf19nfge1 +
      male+  Urm  +  sesIndex+ junior +age+full_time +
      d_pubhealth + d_bio + d_bus + d_soe + d_engi+ d_hum+ d_info+
      d_nurs +d_phys +d_seco+ d_ss+ d_art
#_ _ _ _ _ _ _ _ _
# Variance/CUs
CCR2 ~~ CCR2
 z_syn1 ~~  z_syn2
z_win1 ~~  z_win2
z_ptk1 ~~  z_ptk2
z_ana1 ~~ z_ana2
 NFC ~~ openess
EC ~~ openess
EC ~~     ASE
'
#Fit & Summary of Freq Model
bl2 <- bsem(bl2, data=dat, std.lv=T, save.lvs = T,
            n.chains = 3, burnin=500,
            sample=1000, target = "stan")

Error in lav_syntax_get_modifier(rhs[[3]][[2L]]) :
  lavaan ERROR: evaluating modifier failed: *()*
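
A possible cause, judging only from the syntax pasted above and not confirmed anywhere in this thread: the `prior("normal(.1,.8)")*CUR` line in the regression block has no trailing `+`, so the following `prior(...)*EFF` term may get fused onto it when the model string is parsed, which would be consistent with a "modifier failed" error. The helper below, `find_missing_plus()`, is a hypothetical base-R heuristic (not a lavaan or blavaan function). It flags lines that do not end in `+` or an operator while the next line is a continuation with no operator of its own.

```r
# Hypothetical heuristic: report lines that look like they are missing a trailing '+'.
find_missing_plus <- function(model) {
  lines <- trimws(strsplit(model, "\n", fixed = TRUE)[[1]])
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]  # drop blanks and comment lines
  if (length(lines) < 2) return(character(0))

  # Remove quoted strings so that text inside prior("...") cannot confuse the checks.
  bare <- trimws(gsub('"[^"]*"', "", lines))

  starts_formula <- grepl("~|:=|==", bare)     # line contains an operator of its own
  expects_more   <- grepl("(\\+|~)$", bare)    # line ends in '+' or an operator

  n <- length(lines)
  # Suspicious: the next line is a continuation, but this line does not announce one.
  suspicious <- !starts_formula[-1] & !expects_more[-n]
  lines[-n][suspicious]
}

# A shortened fragment of the regression block from the model above.
fragment <- '
CCR2 ~
  prior("normal(.1,.8)")*CUR
  prior("normal(.1,.8)")*EFF +
  polisoc + male
'
find_missing_plus(fragment)
```

The check ignores trailing inline comments and other edge cases, so treat a hit as a place to look rather than a diagnosis. Adding `+` after `CUR` is the change to try first.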



Gabe Avakian Orona

Mar 1, 2022, 10:36:26 AM
to Ed Merkle, blavaan
Hi Ed,

The thing is this model was working prior to rebooting R and Rstudio and updating the packages. I have since tried deleting and re-installing both lavaan and blavaan. Neither of those options seemed to work.

Mauricio Garnier-Villarreal

Mar 1, 2022, 10:48:24 AM
to blavaan
Gabe

I had a problem with the updates (a different error) that was fixed by using the development version of lavaan.

You can try installing it with this:

library(devtools)
install_github("yrosseel/lavaan")

Gabe Avakian Orona

Mar 1, 2022, 11:03:03 AM
to blavaan
Oddly enough, when I just run lavaan models (frequentist models), there are no issues. I can get summaries, etc. The issue is mostly with blavaan. Do you think this would help blavaan, too?

Thank you for your help.
-Gabe

Mauricio Garnier-Villarreal

Mar 2, 2022, 3:55:46 AM
to blavaan
Yes, updating lavaan can help blavaan, because blavaan relies on a lot of lavaan's structure.

christop...@gmail.com

Apr 7, 2022, 12:10:02 PM
to blavaan
For what it is worth: I too find blavaan extremely slow. A simple model that takes a few minutes in Mplus (with estimator = bayes) takes many hours in blavaan (I gave up once again and forced R to quit). 

Sorry for being negative. I've moved back to Mplus, even though I have great respect for free and open-source software. Please note that I use large samples from 1000 to 30,000.

christop...@gmail.com

Apr 7, 2022, 12:14:40 PM
to blavaan
PS. No problem with lavaan. Fast and good. 

Ed Merkle

Apr 7, 2022, 12:40:14 PM
to christop...@gmail.com, blavaan
Some models are indeed slow. I'd be interested to see the code for your slow model, in case it involves something I haven't seen before.

Ed

Gabe Avakian Orona

Apr 7, 2022, 12:59:20 PM
to blavaan
christop...@gmail.com Blavaan is still incredibly slow for me after implementing the changes suggested. Anything you learn to speed it up I will be happy to hear. No worries about being negative; honesty is best in my view.

Thank you,
Gabe


Mauricio Garnier-Villarreal

Apr 8, 2022, 7:44:01 AM
to blavaan
Can you run this on your setup and report the system.time()?
I think the problem is not blavaan; rather, something in the Stan installation is making the Stan models run slowly.


library(rstan)

scode <- "
parameters {
  real y[2];
}
model {
  y[1] ~ normal(0, 1);
  y[2] ~ double_exponential(0, 2);
}
"

system.time(
  fit2 <- stan(model_code = scode, iter = 10000, verbose = FALSE)
)

Ed Merkle

Apr 8, 2022, 9:32:03 AM
to Mauricio Garnier-Villarreal, blavaan
Mauricio and all,

While the Stan install could definitely be a problem, Christopher's comment about large samples makes me think that blavaan is probably to blame. That reminded me of a sufficient statistic trick that I hadn't gotten around to implementing, so I tried it out yesterday and it really seems to help certain models when there is no missing data. These additions are on github, if you want to try it yourself.

Beyond that, common causes of slowness are prior-data conflict (when priors are informative and conflict with data) and sampling for tens of thousands of iterations (fewer iterations are needed in Stan as compared to Gibbs samplers).

Ed

Mauricio Garnier-Villarreal

Apr 12, 2022, 9:13:30 AM
to blavaan
Ed

Is this new trick on GitHub part of target="stan", or does something else need to be called?

thanks

Ed Merkle

Apr 12, 2022, 9:43:22 AM
to Mauricio Garnier-Villarreal, blavaan
It should work for target="stan", without requiring anything else from you. But right now, it only works for models with continuous variables, complete data, and fixed.x=FALSE. I think you would see the most speedup for large sample sizes (thousands or more).

Ed

Terrence Jorgensen

Apr 25, 2022, 9:44:57 AM
to blavaan
right now, it only works for models with continuous variables, complete data, and fixed.x=FALSE. 

Would the same trick work if implemented per missing-data pattern? It is perhaps problematic for unplanned missingness, but with planned-missing designs it can be possible to compute the summary stats for the subset of variables relevant to each missing pattern. Vika and Ke-Hai capitalized on that for one of their proposed model-based transformation methods (extending Bollen-Stine for incomplete data).

Terrence D. Jorgensen
Assistant Professor, Methods and Statistics
Research Institute for Child Development and Education, the University of Amsterdam

Ed Merkle

Apr 25, 2022, 1:46:06 PM
to Terrence Jorgensen, blavaan
Yes, it should work for missing data, but I think it will require a custom lpdf in Stan. The problem is that you get to a missing data pattern with 1 observation, and the sample covariance matrix for that pattern is not positive definite, and then Stan throws an error when you try to model it. I think you can get around it by avoiding the constant terms in the density function.

Ed

christop...@gmail.com

Jan 2, 2023, 1:54:53 AM
to blavaan
Sorry for bringing up the speed issue again. 

I will teach Bayesian statistics and initially wanted to let participants in the workshop use R. With R, my preference would be blavaan. But speed seems to be an issue?

I have estimated a simple regression model (y ~ x1 + x2, N = approx 2000, item-level missing data less than 5%). Time needed for running this model with Bayesian estimations with diffuse priors is:

blavaan 0.4-3: more than 9 minutes
Stata 17: 10.6 seconds
Mplus 8.8: 1 second

That seems strange: 1 sec with Mplus but nearly 10 minutes with blavaan?  Do these differences make any sense? 

CODE:
fit <- bsem("stflife ~ agea + age2", data = data) 

Only R and Stata are options for this particular workshop, and I would prefer using R. 
(I just tried brms and couldn't even make brms run. But if the source of slow estimation with blavaan lies in Stan, I guess there's no point in trying brms.) 

Thanks, 
Christopher

Ed Merkle

Jan 2, 2023, 12:12:02 PM
to christop...@gmail.com, blavaan
Christopher,

Thanks for the report. As it currently stands, I think the speed is ok for continuous variables and somewhat slow for ordinal variables. Missingness should slow it down a bit, but 9min does seem long if you are using continuous variables. If you can send data (off list if needed), I could explore it more.

Thanks,
Ed

Gabe Avakian Orona

Jan 2, 2023, 12:21:22 PM
to Ed Merkle, christop...@gmail.com, blavaan
I'm glad this was brought up; I'm also still experiencing the slow run time, even with no missing data. 

Gabe



--
Gabe Avakian Orona, Ph.D.

Postdoctoral Research Fellow
Hector Research Institute of Education Sciences and Psychology
University of Tübingen
gabrie...@uni-tuebingen.de

Ed Merkle

Jan 3, 2023, 12:21:51 PM
to Gabe Avakian Orona, christop...@gmail.com, blavaan
In the past, I have noticed speed problems when the user does not specify priors, and the blavaan default priors do not work well for the user's data. This might especially happen if your variables have lots of scores far from 0 (say, in the 100s). I am planning to change the defaults so they are more similar to rstanarm or brms, which scale with your data.

Ed


christop...@gmail.com

Jan 4, 2023, 6:01:03 AM
to blavaan
Thanks, Ed

I've sent data and R code in a separate email. Hopefully, it will be possible to replicate my result.

I want to start teaching with non-informative priors, so no priors were added.

One question, though. I would like to avoid more complex code in RStan, especially when giving an introduction to Bayes. But would using RStan directly be any faster? I understand that the Mplus team has developed some proprietary algorithms that make computations run very fast, also compared to Stata. So I think it's fairer to compare with Stata. Why is RStan/R this much slower than Stata?  (I don't think the implementation of SEM or Bayes is particularly good in Stata, so I would prefer teaching with R and blavaan for this reason too, in addition to the fact that R makes science more accessible than proprietary software does.) 

Best,
Chris

Ed Merkle

Jan 5, 2023, 2:18:33 PM
to christop...@gmail.com, blavaan
Thanks for this example. I can reproduce that it is slow (many minutes, when one would hope seconds).

Here is what I think is happening: I have spent time optimizing estimation for bigger models that require SEM software. This has neglected many tricks that can be done for regression (and related models) to speed up model estimation. I know that lavaan flags regression models and does an estimation specific to those models, and this example has made it clear that I need to do the same thing in blavaan.

Stan can be fast for your model. For example, the brms code below uses Stan and finishes in seconds (after model compilation).

library(brms)
mb <- brm(stflife ~ agea + age2, data = data)

With some additions, blavaan should be able to get close to that.

And if you have other slow models that do not qualify as regression, I'd be interested to see those.

Thanks,
Ed

Ed Merkle

Feb 6, 2023, 2:18:00 PM
to christop...@gmail.com, blavaan
Chris and others on this thread,

If you are able to install from github, please give the new version of blavaan a try and look at the speed improvements. It should especially be faster for large sample sizes of complete, continuous data.

Instructions to install are at the bottom of this page:


I am working on getting this on CRAN, but it is taking longer than I hoped.

Ed





christop...@gmail.com

Feb 7, 2023, 12:34:14 PM
to blavaan
> remotes::install_github("ecmerkle/blavaan", INSTALL_opts = "--no-multiarch")

---
... Dialogue box pops up: "Building R package from source required installation of additional building tools"
I answer yes
 ----

Downloading GitHub repo ecmerkle/blavaan@HEAD
Error: Failed to install 'blavaan' from GitHub:
Could not find tools necessary to compile a package
Call `pkgbuild::check_build_tools(debug = TRUE)` to diagnose the problem.

> pkgbuild::check_build_tools(debug = TRUE)
Trying to compile a simple C file
Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
clang -mmacosx-version-min=10.13 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I/usr/local/include   -fPIC  -Wall -g -O2  -c foo.c -o foo.o
clang -mmacosx-version-min=10.13 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/usr/local/lib -o foo.so foo.o -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation
ld: framework not found CoreFoundation
: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [foo.so] Error 1
Error: Could not find tools necessary to compile a package
Call `pkgbuild::check_build_tools(debug = TRUE)` to diagnose the problem.

---

I'm in a loop, it seems? as https://ecmerkle.github.io/blavaan/ states:
Compilation is required; this may be a problem for users who currently rely on a binary version of blavaan from CRAN.

Ed Merkle

Feb 7, 2023, 12:51:26 PM
to christop...@gmail.com, blavaan
This generally means that your system requires extra tools outside of R to compile a Stan model. An advantage of installing from CRAN is that they handle this compilation step for you. So I think the question is whether you have the desire/bandwidth to install some extra stuff on your system:

If no, I hope to get the new version of blavaan on CRAN within a week.

If yes, then you would want to look at the RStan "getting started" materials, especially the part about configuring a C++ toolchain:


It can seem overwhelming if you've never done it before, but the process gets easier the more you do it.

Ed

christop...@gmail.com

Feb 7, 2023, 2:08:49 PM
to blavaan
Thanks, Ed. In that case, I'll leave it for now. I don't think that procedure for installing Stan works for me, see here:

I own a copy of Mplus 8.8, but I generally like lavaan/blavaan (and R). I will stick to Mplus for a while, and hopefully return later. Sorry for not being able to test!

Best, 
Chris

christop...@gmail.com

Feb 17, 2023, 7:24:04 AM
to blavaan
Ed, I've now been able to rerun the regression model mentioned above with blavaan (using the latest version). 

These were the original estimates:
blavaan 0.4-3: more than 9 minutes
Stata 17: 10.6 seconds
Mplus 8.8: 1 second

New result:
blavaan 0.4-6: 1 minute

Ed Merkle

Feb 17, 2023, 3:27:47 PM
to christop...@gmail.com, blavaan
Thanks, it is better than before but could also use more improvement. Eventually, I will try to circle back and add some faster code that is dedicated to regression.

In the meantime, there is a chance that parallelization helps a bit more. In your bsem command, you could add

bcontrol = list(cores = 3)

Also, to parallelize post-estimation computations, you could try

library("future")
plan("multicore") ## mac or linux
plan("multisession") ## windows

Although I have heard that Rstudio might complain about the future package...
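
The two plan() lines above are alternatives, so only one of them should be run. A small sketch of how the choice and the core count could be made automatically is below. It assumes 3 chains, as in the earlier bsem() calls in this thread; `enable_parallel_post()` and `extra_args` are hypothetical names introduced here, and the function is only defined, not called.

```r
# Pick a future strategy by platform: "multicore" relies on forking,
# which is not available on Windows, so "multisession" is used there.
strategy <- if (.Platform$OS.type == "windows") "multisession" else "multicore"

# Use at most one core per chain (3 chains assumed), and never more cores
# than the machine reports. detectCores() can return NA, hence the fallback.
n_chains  <- 3
available <- parallel::detectCores()
if (is.na(available)) available <- 1L
n_cores   <- min(n_chains, available)

# Call this once per session (requires the future package) to parallelize
# post-estimation computations.
enable_parallel_post <- function() future::plan(strategy)

# Arguments to add to an existing bsem() call.
extra_args <- list(n.chains = n_chains, bcontrol = list(cores = n_cores))
str(extra_args)
```

In practice this amounts to adding `n.chains = 3, bcontrol = list(cores = n_cores)` to the bsem() call and running `enable_parallel_post()` before it.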

Ed

christop...@gmail.com

Feb 17, 2023, 6:11:52 PM
to blavaan
Well, Ed, I think a reduction by 90% is a huge improvement! Congratulations.
Of course, it's still much slower than Stata (which is much slower than Mplus, and my ultimate goal is to stop using Mplus). 

But I was able to improve speed further: Going from estimations on an iMac Pro to estimations with an M1 processor on Macbook Air, I am now down to 29 seconds.
And then, using parallelization, I end up with 14 seconds on the M1 Macbook Air.

It's still much longer than on Stata with no parallelization, but I think your recent coding was a success. I'd be interested in knowing where the bottleneck is - blavaan or Stan?

Personally, I need SEM models much more than regression, also when using Bayesian estimation. I have yet to test such models with blavaan after the update. I could report back once I know more. 

Christopher

Ed Merkle

unread,
Feb 17, 2023, 6:56:03 PM2/17/23
to christop...@gmail.com, blavaan
Thanks, the main bottleneck is figuring out how to code the Stan model for maximal speed/efficiency. In the traditional case, you cycle through each row of the data and evaluate the model likelihood for each row. But if you have complete data, you can use the sample covariance matrix and sample mean in place of each row of the data (at least, you can do it for SEM). If you have complete data and don't care about the means, you can use the sample covariance matrix and Wishart distribution. This gives better speed for large datasets because you no longer have to do computations for each row of the data.
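To make the complete-data shortcut concrete, here is a minimal base-R sketch (not blavaan's actual Stan code) showing that the multivariate normal log-likelihood summed over rows equals the same quantity computed from only the sample mean and the sample covariance matrix. The data, `mu`, and `Sigma` are made-up stand-ins for a model-implied mean vector and covariance matrix.

```r
set.seed(1)
n <- 200
p <- 3
X <- matrix(rnorm(n * p), n, p)          # toy complete data, n rows

# Hypothetical model-implied mean vector and covariance matrix
mu <- c(0.1, -0.2, 0.3)
Sigma <- matrix(c(1.0, 0.3, 0.2,
                  0.3, 1.5, 0.4,
                  0.2, 0.4, 2.0), p, p)
Sigma_inv <- solve(Sigma)
logdet <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)

# Traditional approach: evaluate the log-density once per row
ll_rows <- sum(apply(X, 1, function(x) {
  d <- x - mu
  -0.5 * (p * log(2 * pi) + logdet + as.numeric(t(d) %*% Sigma_inv %*% d))
}))

# Sufficient-statistics approach: only xbar and S are needed
xbar <- colMeans(X)
S <- cov(X) * (n - 1) / n                # ML covariance (divisor n)
dm <- xbar - mu
ll_suff <- -0.5 * n * (p * log(2 * pi) + logdet +
                       sum(diag(Sigma_inv %*% S)) +
                       as.numeric(t(dm) %*% Sigma_inv %*% dm))

all.equal(ll_rows, ll_suff)
```

The second computation costs the same regardless of n, which is why the approach pays off for large datasets. It does not apply directly once rows have missing values, because the rows then no longer share one set of sufficient statistics.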

And I might have already said it in a previous email, but if you don't care about the ppp (the model fit metric), then setting

test = "none"

should speed up some more.
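As a rough sketch of how the two options from this thread (`bcontrol = list(cores = ...)` and `test = "none"`) might sit together in one call: the model and dataset below are stand-ins rather than anyone's actual analysis, and `run_fit` is a hypothetical switch added so the snippet can be read or sourced without starting a long Stan run.

```r
# Stand-in model; replace with your own syntax.
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'

n_chains <- 3
# Do not request more cores than chains, or more than the machine reports.
n_cores <- max(1L, min(n_chains, parallel::detectCores(), na.rm = TRUE))

run_fit <- FALSE  # set to TRUE to sample (needs blavaan, lavaan, and a Stan toolchain)
if (run_fit) {
  library("blavaan")
  fit <- bsem(model,
              data     = lavaan::HolzingerSwineford1939,  # example data shipped with lavaan
              n.chains = n_chains,
              bcontrol = list(cores = n_cores),  # run the chains in parallel
              test     = "none")                 # skip the ppp computation
  summary(fit)
}
```

With `test = "none"` the ppp is not computed, so this variant only makes sense for runs where model fit is not the quantity of interest.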

Ed

Christopher Bratt

unread,
Feb 18, 2023, 1:33:14 AM2/18/23
to Ed Merkle, blavaan
 I very much care about the PPP ;-)

Thanks for the great work you’re doing. And thanks for explaining the challenge with missing data. One solution for me/us might be to try listwise deletion in an initial phase of more complex projects, and then include the full sample in the final estimations. I will try that, given moderate missingness at around 5%. (I guess this approach will be more problematic with a higher proportion of missingness.)
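A minimal base-R sketch of the listwise-deletion step, using toy data in which 5% of rows have a missing value (the variable names and the data are made up for illustration):

```r
set.seed(123)
dat <- as.data.frame(matrix(rnorm(300 * 4), ncol = 4))
names(dat) <- paste0("y", 1:4)
dat[sample(300, 15), "y2"] <- NA         # 15 of 300 rows get a missing value

# Initial phase: keep only rows without any missing values
dat_complete <- dat[complete.cases(dat), ]

nrow(dat)                                # rows in the full sample
nrow(dat_complete)                       # rows left after listwise deletion
1 - nrow(dat_complete) / nrow(dat)       # share of rows dropped
```

The exploratory models would then be fit to `dat_complete`, and the final model to `dat`.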

Vennlig hilsen,
Christopher Bratt


Gabe Avakian Orona

unread,
Jun 20, 2023, 7:30:06 AM6/20/23
to blavaan
Hello, 

Blavaan is incredibly slow for me. I have implemented the suggestions. Are there any developments on this front?

Gabe

Ed Merkle

unread,
Jun 20, 2023, 10:37:21 AM6/20/23
to Gabe Avakian Orona, blavaan
Hi Gabe, could you be more specific about your model? This thread was previously about regression models with thousands of observations, so I wonder whether you are doing that or something different.

Ed


Gabe Avakian Orona

unread,
Jun 21, 2023, 4:55:31 AM6/21/23
to blavaan
Hi Ed, thank you for your reply. I believe this thread was originally about 260 observations; but the issue still applies to thousands of observations. I am experiencing issues with about 300 observations. 

Mauricio Garnier-Villarreal

unread,
Jun 21, 2023, 9:35:26 AM6/21/23
to blavaan
Hi Gabe

Could you give us more details? The model, data characteristics, and sessionInfo, for example.

Also, how fast does Stan run in general? That would help rule out an issue with the general Stan installation.

take care

Gabe Avakian Orona

unread,
Feb 7, 2024, 3:31:12 AM2/7/24
to blavaan
Hi All, Blavaan runs incredibly slow. I'm running a rather simple cfa model; meanwhile, Stan in general (stan_glm function) runs at light speed. 

Ed Merkle

unread,
Feb 7, 2024, 9:30:15 AM2/7/24
to blavaan
Gabe, we continue to make improvements, and it would be helpful to see model syntax and details about the data to see where further improvement is needed. Ordinal data, missing data, and prior-data conflict are some common slowdowns.

Best,
Ed

Gabe Avakian Orona

unread,
Feb 7, 2024, 9:50:00 AM2/7/24
to Ed Merkle, blavaan
Hi Ed, 

Thank you for your message. Yes, I have some ordered variables, but the issue persists with non-ordered variables. The data have a lot of missing values, about 53 percent in a data file of 364 participants. Here is some syntax.

[image.png: screenshot of model syntax]

Also, is there a way to get a posterior distribution from a generated parameter (such as an indirect effect in mediation analysis)? 
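One general way to think about this, sketched in base R: a derived quantity such as an indirect effect has a posterior that can be obtained by applying the formula to each posterior draw. The draws below are simulated stand-ins, not output from any fitted model. With a real fit, the draws would have to be extracted first, for instance with something like `blavInspect(fit, "mcmc")`; that call and the resulting column names are assumptions here and should be checked against the blavaan documentation.

```r
set.seed(42)
n_draws <- 4000

# Stand-in posterior draws for the two paths of a mediation model
a_draws <- rnorm(n_draws, mean = 0.40, sd = 0.08)  # path X -> M
b_draws <- rnorm(n_draws, mean = 0.30, sd = 0.10)  # path M -> Y

# Posterior of the indirect effect: multiply draw by draw
ab_draws <- a_draws * b_draws

post_mean <- mean(ab_draws)
cred_int  <- quantile(ab_draws, probs = c(0.025, 0.975))  # 95% credible interval
post_mean
cred_int
```

The draw-by-draw product keeps any posterior dependence between the two paths, provided both columns come from the same MCMC iterations.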

Thank you again for replying back to me.

Best,
Gabe
