Hello,
Having used Coupled MCMC (MC3) quite a bit in combination with Starbeast3, I've noticed something curious about it. While it definitely improves mixing, it can take a very long time to converge initially. For example, in a dating analysis with all nodes constrained, the chain sometimes runs for 20+ million generations before it begins to sample from the target distribution. Before that, it mostly sits in nonsensical areas of tree space, with repeated swaps into better regions that don't stick (see screenshot). This is most noticeable in fully constrained dating analyses, but it seems to happen to some degree with any analysis. It becomes a problem when I want to limit burn-in to, say, 10%: if the first 30 million states are nowhere near the target distribution, I would need to run for 300 million states. I could stop earlier, since the ESS does go up pretty fast, but that would mean a fairly large burn-in percentage that may be frowned upon.
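To make the burn-in arithmetic concrete, here is a small sketch of how the required run length follows from the number of pre-convergence states and the burn-in cap. The function name and parameters are just illustrative and not part of any BEAST tool.

```python
def min_chain_length(burnin_states: int, max_burnin_percent: int) -> int:
    """Smallest total chain length such that discarding `burnin_states`
    does not exceed `max_burnin_percent` percent of the run.

    Hypothetical helper for illustration only.
    """
    if not 0 < max_burnin_percent < 100:
        raise ValueError("max_burnin_percent must be between 1 and 99")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-burnin_states * 100 // max_burnin_percent)


# 30 million unconverged states with a 10% burn-in cap
print(min_chain_length(30_000_000, 10))
```

With the numbers from my run (30 million states, 10% cap) this gives the 300 million states mentioned above. Relaxing the cap to 25% would cut that to 120 million.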
It seems to be partly explained by having multiple heated chains that all swap with each other, which keeps the overall swap rate near the target of 0.234, while none of them swap with the cold chain. Indeed, I've seen a run go for 10+ million generations without a single cold-chain swap. When I use only one heated chain, so that every swap must involve the cold chain, convergence is faster, although it can still take a long time.
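Here is a toy calculation of what I think is going on. It is only a sketch under assumptions I have not verified against the package: that swap pairs are proposed uniformly at random among all chains, and that the tuned quantity is the overall acceptance rate across all pairs. The chain count of 4 and the function names are made up for illustration.

```python
from itertools import combinations


def cold_pair_fraction(n_chains: int) -> float:
    """Fraction of chain pairs that include the cold chain (index 0),
    assuming every pair is equally likely to be proposed."""
    pairs = list(combinations(range(n_chains), 2))
    return sum(0 in pair for pair in pairs) / len(pairs)


def overall_swap_rate(n_chains: int, heated_rate: float, cold_rate: float) -> float:
    """Average acceptance rate over all pairs, where pairs involving the
    cold chain are accepted at `cold_rate` and all others at `heated_rate`."""
    pairs = list(combinations(range(n_chains), 2))
    rates = [cold_rate if 0 in pair else heated_rate for pair in pairs]
    return sum(rates) / len(rates)


# Assumed setup: 1 cold + 3 heated chains, cold chain never swaps.
print(cold_pair_fraction(4))
print(overall_swap_rate(4, heated_rate=0.468, cold_rate=0.0))

# With a single heated chain the only pair involves the cold chain,
# so the overall rate and the cold-chain rate are the same number.
print(cold_pair_fraction(2))
```

Under these assumptions, with 4 chains only half of the proposed pairs involve the cold chain, so heated chains accepting about 47% of swaps among themselves would be enough to hit an overall rate of about 0.234 even if the cold chain never swaps. With 2 chains that cannot happen, which would fit what I see with one heated chain.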
Anyway, this contrasts with normal MCMC, which never wanders into really bizarre areas of tree space but can have trouble switching between alternative topologies that both have some support and should be in the final tree set, which is why I started using MC3 in the first place.
Is this expected behavior for MC3, or might tuning some parameters help? I feel a little stuck between two options: regular MCMC, which may not give me a full representation of the uncertainty in the data, and MC3, which not only has a slower runtime but may need to run for 2-3x as many generations to keep the burn-in to a reasonable percentage.
Thanks!