Could you please attach your input and output files, the `ckpt` file, and your control file so that we can try to reproduce the error you are getting? It seems that something may have gone wrong when reading the `mcmc.txt` file, but I may be wrong. The key points to bear in mind when enabling checkpointing are the following:
- Option to specify in the control file: `checkpoint = 1 * 0: nothing; 1 : save; 2: resume`. By default, `checkpoint = 0`, and so checkpointing is not enabled.
- When `checkpoint = 1`, checkpointing is enabled. Note that a memory image is not saved. Instead, the current state of the Markov chain (e.g., divergence times, rates for loci, step lengths) is saved in a file called `mcmctree.ckpt`. The conditional probability vectors are not saved; they are recalculated when the run is resumed. The current implementation saves the state after every 10% of the MCMC iterations and, if the `mcmctree.ckpt` file already exists, it will be overwritten.
- When `checkpoint = 2`, the program first allocates memory after parsing the sequence alignment and then tries to locate the `mcmctree.ckpt` file. If the file is found, MCMCtree sets the state of the Markov chain to the last saved state stored in `mcmctree.ckpt` and restarts the MCMC from that point with `burnin = 0`. In essence, MCMCtree uses the last saved parameter values as the initial values. It then runs until it has completed `nsample * sampfreq` iterations (i.e., collected `nsample` samples) or until the job is killed (e.g., exceeded allocated wall time, manually killed, etc.).
- IMPORTANT: Before you run MCMCtree with `checkpoint = 2`, please make sure that you have saved the results of your first run (the one with `checkpoint = 1`) because all output files will be overwritten. I suggest you create a directory called e.g. `ckpt1` (or whatever name you prefer) and copy the output files there (you can also copy everything there if that best suits your PC requirements). Then, modify the control file to have `checkpoint = 2` and run MCMCtree in a separate directory. Please note that you will need to manually combine the samples collected in the first run and the resumed run (i.e., the samples saved in the `mcmc.txt` files you will have at the end of both runs).
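
In case it helps with the manual combining step, here is a minimal Python sketch of how the two `mcmc.txt` files could be merged. It is not part of PAML, and it rests on a few assumptions of mine: each `mcmc.txt` starts with one header line, both runs have the same columns, and a row with the wrong number of columns (e.g., a line that was cut short when the job was killed) should be dropped. The function name `combine_mcmc_samples` and the paths in the usage comment are hypothetical.

```python
from pathlib import Path


def combine_mcmc_samples(first_run, resumed_run, combined):
    """Concatenate two mcmc.txt files into one, writing the header once.

    Assumes (not guaranteed by PAML) that each file starts with a single
    header line and that both runs share the same columns.
    Returns the number of sample rows written.
    """
    first_lines = Path(first_run).read_text().splitlines()
    resumed_lines = Path(resumed_run).read_text().splitlines()
    if not first_lines or not resumed_lines:
        raise ValueError("both sample files must contain a header line")

    header = first_lines[0]
    ncol = len(header.split())
    if ncol == 0:
        raise ValueError("the first line of the first file is empty")
    if resumed_lines[0].split() != header.split():
        raise ValueError("header columns differ between the two runs")

    # Keep only rows with as many fields as the header. This drops blank
    # lines and any row truncated when the job was killed.
    rows = [
        line
        for line in first_lines[1:] + resumed_lines[1:]
        if len(line.split()) == ncol
    ]

    Path(combined).write_text("\n".join([header] + rows) + "\n")
    return len(rows)


# Hypothetical usage, with the first run backed up in ckpt1/ and the
# resumed run in ckpt2/:
#   n = combine_mcmc_samples("ckpt1/mcmc.txt", "ckpt2/mcmc.txt", "mcmc_combined.txt")
```

The rows are copied unchanged, so the first column will restart its count at the point where the resumed samples begin; please check that whatever tool you use to summarise the samples does not rely on that column.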
At the moment, as pointed out by Gustavo in the PAML GitHub repository, checkpointing seems to only be able to resume once. If you have a very long alignment and need to resume more than once, it may be worth exploring the suggestion Gustavo made in the repository.
I hope this helps you better understand the usage of checkpointing in MCMCtree. If you would like us to troubleshoot your analysis further, please send us the files requested above so that we can try to find out what went wrong!
All the best,
Sandy