Hi all,
I've recently run into two problems running ExaBayes. I'm accessing ExaBayes 1.5.1 through the CIPRES gateway.
The first problem: the MCMC sampled the first generation, then produced no output at all over the next 24 hours. The log contained this warning:
WARNING! The number of processes and number chains run in parallel
are not a multiple of the overall number of processes. Parts of the
code may be executed less efficiently.
I therefore adjusted the number of coupled chains accordingly.
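For anyone hitting the same warning, here is a minimal sketch of the divisibility check it appears to describe. The rule used here (the total number of processes should be a multiple of runs in parallel times chains in parallel) is an assumption based on the wording of the warning, not something taken from the ExaBayes source. The function and parameter names are hypothetical.

```python
def check_layout(total_processes: int, runs_in_parallel: int, chains_in_parallel: int) -> bool:
    """Return True if the processes split evenly across all parallel chains.

    Assumed rule (inferred from the warning text, not from ExaBayes itself):
    total_processes must be a multiple of runs_in_parallel * chains_in_parallel.
    """
    if min(total_processes, runs_in_parallel, chains_in_parallel) < 1:
        raise ValueError("all counts must be positive integers")
    slots = runs_in_parallel * chains_in_parallel
    return total_processes % slots == 0


def valid_chain_counts(total_processes: int, runs_in_parallel: int = 1) -> list[int]:
    """List every chains-in-parallel value that satisfies the assumed rule."""
    return [
        chains
        for chains in range(1, total_processes + 1)
        if check_layout(total_processes, runs_in_parallel, chains)
    ]


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; not the values from this job.
    print(check_layout(16, 1, 3))
    print(valid_chain_counts(16, runs_in_parallel=2))
```

Running the check against the process count requested on the gateway before submitting shows which chain counts would avoid the warning under this assumption.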
The second problem: when I ran the analysis again, the MCMC proceeded to generation 500 and then failed. The stderr contains the following messages:
exabayes: ./src/mcmc/CoupledChains.cpp:383: void CoupledChains::executePart(uint64_t, uint64_t, ParallelSetup&): Assertion `_chains[remoteId].getGeneration() == swap.getGen()' failed.
exabayes: ./src/mcmc/CoupledChains.cpp:383: void CoupledChains::executePart(uint64_t, uint64_t, ParallelSetup&): Assertion `_chains[remoteId].getGeneration() == swap.getGen()' failed.
Please see the full stderr.txt attached. I would appreciate any insight into how to resolve this.
I have previously run ExaBayes successfully with more parallel processes and more heated chains, on a larger data set with more partitions, so I doubt that the size or complexity of the data is the main cause here. It may be relevant that my earlier successful analyses used version 1.5.0.
I can provide alignment and partition statistics, or the complete files, if that would help. For now I have attached the stderr, the stdout, and the config file I used.
Thank you!
-Ziv