mcmctree error and checkpoint


keigo ww

Oct 31, 2024, 10:21:43 AM
to PAML discussion group
Dear all, 

When running mcmctree, I first set checkpoint = 1, but after the run finished I got the following error: "error in scanfile() file format error: 155 fields in line 7900 while 41478 fields in the first line". I am not sure what causes this error.
So I ran it a second time with checkpoint = 2. How can I confirm that this run resumed from the results of the previous run?

best,
yiying 

mcmctree.png

Sandra AC

Nov 1, 2024, 6:54:46 AM
to PAML discussion group
Hi Yiying,

Could you please attach your input and output files, the `ckpt` file, and your control file so that we can try to reproduce the error you are getting? It seems that something may have gone wrong when reading the `mcmc.txt` file, but I may be wrong. The key points to bear in mind when enabling checkpointing are the following:
  • Option to specify in the control file: `checkpoint = 1    * 0: nothing;  1 : save;  2: resume`. By default, `checkpoint = 0`, and so checkpointing is not enabled.
  • When `checkpoint = 1`, checkpointing is enabled. Note that a memory image is not saved. Instead, the current state of the Markov chain (e.g., divergence times, rates for loci, step lengths) is saved in a file called `mcmctree.ckpt`. The conditional probability vectors are not saved; they are recalculated when the run is resumed. The current implementation saves the state at every 10th percentile of the MCMC run and, if the `mcmctree.ckpt` file already exists, it will be overwritten.
  • When `checkpoint = 2`, the program first allocates memory after parsing the sequence alignment and then tries to locate the `mcmctree.ckpt` file. If the file is found, MCMCtree sets the state of the Markov chain by reading `mcmctree.ckpt`, which holds the last saved state of the chain, and restarts the MCMC from that point with `burnin = 0`. In essence, MCMCtree uses the last saved parameter values as the initial values. It then runs for `nsample * sampfreq` iterations (i.e., until it has collected `nsample` samples) or until the job is killed (e.g., exceeded allocated wall time, manually killed, etc.).
  • IMPORTANT: Before you run MCMCtree with `checkpoint = 2`, please make sure that you have saved the results of your first run (the one with `checkpoint = 1`), because all output files will be overwritten. I suggest you create a directory called e.g. `ckpt1` (or whatever name you prefer), copy the output files there (you can also copy everything there, whichever best suits your setup), and then run MCMCtree in a separate directory after setting `checkpoint = 2` in the control file. Please note that you will need to manually combine the samples collected in the first run and the resumed run (i.e., the samples saved in the `mcmc.txt` files you will have at the end of both runs).
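Combining the two `mcmc.txt` files by hand, as mentioned in the last point, could be sketched in Python roughly as follows. This is only an illustration under some assumptions: `combine_mcmc_samples` is a hypothetical helper (not part of PAML), both files are assumed to be whitespace-separated with a single header line, and rows whose number of fields differs from the header (for instance, a truncated last line from a killed job) are skipped rather than repaired.

```python
from pathlib import Path


def combine_mcmc_samples(first_path, resumed_path, out_path):
    """Concatenate two mcmc.txt files, keeping a single header line.

    Hypothetical helper: assumes whitespace-separated columns and one
    header line per file. Returns (rows_kept, rows_skipped).
    """
    first = Path(first_path).read_text().splitlines()
    resumed = Path(resumed_path).read_text().splitlines()
    if not first or not resumed:
        raise ValueError("both sample files must contain a header line")

    header = first[0].split()
    if resumed[0].split() != header:
        raise ValueError("header columns differ between the two runs")

    kept, skipped = [], 0
    for line in first[1:] + resumed[1:]:
        if not line.strip():
            continue  # ignore blank lines
        if len(line.split()) == len(header):
            kept.append(line)
        else:
            skipped += 1  # e.g. an incomplete line from an interrupted run

    Path(out_path).write_text("\n".join([first[0]] + kept) + "\n")
    return len(kept), skipped
```

For example, `combine_mcmc_samples("ckpt1/mcmc.txt", "mcmc.txt", "mcmc_combined.txt")` would write the merged samples to `mcmc_combined.txt` (all three paths are placeholders). The first column of each row is left untouched, so please check whether the iteration numbering of the resumed run needs adjusting for your downstream tools.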
At the moment, as pointed out by Gustavo in the PAML GitHub repository, checkpointing seems to only be able to resume once. If you have a very long alignment and need to resume more than once, it may be worth exploring the suggestion Gustavo made in the repository.

I hope this helps clarify how checkpointing works in MCMCtree. If you would like us to troubleshoot your analysis further, please send us the files requested above so that we can try to find out what may have gone wrong!
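In the meantime, a quick way to see which lines of `mcmc.txt` disagree with the first line is a small script like the one below. It is only a sketch: `field_count_report` is a hypothetical helper, not part of PAML, and it assumes whitespace-separated fields, which may not match exactly how `scanfile()` splits lines.

```python
def field_count_report(path):
    """Compare the field count of every line with that of the first line.

    Hypothetical helper: splits on whitespace and ignores blank lines.
    Returns (fields_in_first_line, {line_number: field_count}) where the
    dictionary only lists lines that differ from the first line.
    """
    expected = None
    mismatches = {}
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            count = len(line.split())
            if expected is None:
                expected = count  # the first line sets the reference count
            elif line.strip() and count != expected:
                mismatches[number] = count
    return expected, mismatches
```

Calling `field_count_report("mcmc.txt")` returns the number of fields in the first line and the line numbers that do not match it, which should show whether the problem lies in the header or in a later, possibly truncated, line.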

All the best,
Sandy