Thanks for starting this thread. I have been exploring ways to improve the speed of bootstrapping in lavaan. It would be great to learn something from this discussion.
There are several separate issues I would like to discuss. Therefore, I would like to discuss each of them in a separate post.
First, on future.apply.
I am new to this package. Therefore, please correct me if I am wrong about this and related packages.
I took a quick look at the documentation. This is roughly what happens when we call plan(multisession), as far as I understand. It will start a certain number of workers (separate R processes). According to the help page, it calls parallel::makeClusterPSOCK(). That is, it does what parallel::makeCluster() does, though maybe with some additional settings (like exporting the environment, discussed below):
Therefore, I would like to compare future.apply::future_sapply() with the parallelized version of sapply, parallel::parLapply().
I revised simpleBoot() to simpleBoot2(), adding parallel::parLapply() as one of the method:
simpleBoot2 <- function(object,
R,
method = c("future.sapply",
"sapply",
"parSapply"),
cl = NULL) {
lavdata <- object@Data
lavmodel <- object@Model
lavsamplestats <- object@SampleStats
lavpartable <- object@ParTable
lavoptions <- object@Options
lavoptions$check.start <- FALSE
lavoptions$check.post <- FALSE
lavoptions$optim.attempts <- 1L
lavoptions$baseline <- FALSE
lavoptions$h1 <- FALSE
lavoptions$loglik <- FALSE
lavoptions$implied <- FALSE
lavoptions$store.vcov <- FALSE
lavoptions$test <- "none"
lavoptions$se <- "none"
if(method == "sapply") {
out <- sapply(1 : R, function(i) {
coef(doBoot(lavdata, lavmodel, lavsamplestats, lavpartable, lavoptions))
})
}
if (method == "parSapply") {
out <- parallel::parSapply(cl = cl,
1 : R,
function(i) {
coef(doBoot(lavdata, lavmodel, lavsamplestats, lavpartable, lavoptions))
})
}
if (method == "future.sapply") {
out <- future_sapply(1 : R, function(i) {
coef(doBoot(lavdata, lavmodel, lavsamplestats, lavpartable, lavoptions))
}, future.seed = TRUE)
}
t(out)
}
I omitted sapply() in the following tests because it obviously must be slow. It uses only one CPU core.
To ensure all comparisons use the same number of workers, I set workers to 8 when calling plan()
plan(multisession,
workers = 8)
I also used R = 4000L.
This is the result on my computer for simpleBoot() with future.sapply = TRUE:
> system.time({simpleBoot(fit, R = R, future.sapply = TRUE)})
user system elapsed
0.08 0.00 7.22
To compare with parSapply(), I need to create the cluster of workers first and set up the environment. As far as I understand, plan(multisession) will start the cluster right away. You can see the sessions named Rscript when you call plan(multisession), eight such sessions in my case. Therefore, it is fair to exclude the creation of the cluster from the timing.
I am unsure when the objects in the environment (the global environment, by default) will be passed to the workers when using future_sapply(). Therefore, I also included the calls to export these objects in system.time(). These are the results:
> my_cl <- makeCluster(8)
> system.time({
+ clusterExport(cl = my_cl,
+ c("doBoot", "fit"))
+ clusterEvalQ(cl = my_cl,
+ library(lavaan))
+ simpleBoot2(fit, R = R, method = "parSapply", cl = my_cl)
+ })
user system elapsed
0.03 0.02 6.89
> stopCluster(my_cl)
The difference is minor. I ran the above two tests several times. No noticeable consistent differences between them. (I could have used microbenchmark or similar packages but these casual tests should be sufficient for our purpose.)
The calls clusterExport() and clusterEvalQ() are necessary for this test unless I further revise simpleBoot(). However, I want to make the comparison clean, so I did not do so. Actually, future_sapply() and/or multisession() also need to do these things (exporting things or loading packages in the workers), although they do these automatically. I am not sure when this happens, when calling future_sapply(), or when the workers are created by multisession().
I think the purpose of futureverse packages is to make parallel processing accessible, which is an excellent idea:
But being fast may not be its goal (at least not the main one). If speed is our goal, then we can just implement parallel processing in our functions, as I did above using parallel::parLapply(), unless we want to make use of the flexibility of the interface of futurevers packages.
-- Shu Fai