Hi all,
I'm studying how increasing the number of chains affects cross-chain warmup algorithms such as ChEES-HMC. For instance, I might look at the Monte Carlo estimator produced by ChEES-HMC with 32 chains after N iterations / gradient evaluations.
To estimate metrics of interest, I run MCMC many times and average across runs. To get good estimates, I want the number of runs to be large (say hundreds or thousands). If possible, I'd like to do all the runs in parallel on a GPU.
I can pass nChains * nRuns chains to TFP and then split them into groups, but ChEES-HMC then uses all chains for adaptation. Is there a way to enable adaptation only within subgroups of chains? Of course, the chains would then no longer be synchronous (different numbers of leapfrog steps, etc.).
Maybe it makes more sense to parallelize a for loop that calls TFP multiple times (e.g. each time with 32 chains).
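To make the "adaptation only within subgroups" idea concrete, here is a minimal toy sketch (not TFP, and not ChEES-HMC — just plain random-walk Metropolis with a simple step-size adaptation standing in for the real warmup). It keeps one adapted parameter per *run*, updated only from that run's own chains, with all runs vectorized along a leading batch axis; all function and variable names are hypothetical:

```python
import numpy as np

def run_groups(n_runs=8, n_chains=32, n_iters=500, seed=0):
    """Toy stand-in for grouped cross-chain adaptation:
    random-walk Metropolis on a standard normal target, with one
    step size per run (group of chains), adapted only from that
    group's acceptance rate. All runs share the leading axis, so a
    single vectorized loop advances every group in parallel."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_runs, n_chains))   # current states, one row per run
    step = np.full(n_runs, 1.0)               # one adapted step size per run
    target_accept = 0.44
    for _ in range(n_iters):
        prop = x + step[:, None] * rng.normal(size=x.shape)
        log_ratio = 0.5 * (x**2 - prop**2)    # N(0, 1) log-density difference
        accept = np.log(rng.uniform(size=x.shape)) < log_ratio
        x = np.where(accept, prop, x)
        # adapt each run's step size from its OWN chains only
        acc_rate = accept.mean(axis=1)
        step *= np.exp(0.05 * (acc_rate - target_accept))
    return x, step

states, steps = run_groups()
per_run_estimates = states.mean(axis=1)       # one Monte Carlo estimate per run
```

The analogous thing in JAX-backed TFP would presumably be a `jax.vmap` over a single-group sampling function, which sidesteps the cross-group adaptation issue entirely — though I don't know how well ChEES-HMC's data-dependent trajectory lengths play with vmap.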
Any tips welcome!
Charles