Hi Louis,
Thanks for reaching out! I assume that you are trying to use the approximate likelihood calculation in MCMCtree and that there may be issues with the so-called "in.BV" file. Everything I am writing below summarises what you can also find in Mario dos Reis' and Ziheng Yang's tutorial "Bayesian Molecular Clock Dating Using Genome-Scale Datasets".
First, you need to run MCMCtree with `usedata = 3`. MCMCtree will then call BASEML or CODEML, depending on whether your sequence file contains nucleotide or protein data, to estimate the branch lengths, the gradient, and the Hessian (i.e., the vectors and matrix required to later approximate the likelihood calculation). Looking at your control file, I think you have nucleotide data, so BASEML will be called in your case. BASEML/CODEML then save the aforementioned vectors and matrix in a file called `rst2`, whose content is also copied into the output file `out.BV` (you should see both output files at the end of the run).
Now you have everything you need to start sampling from the posterior distribution with MCMCtree. As Mario and Ziheng recommend in their tutorial, you can copy this `out.BV` file to a different directory and rename it to `in.BV`. Then, copy your control file and your input files to that directory (or, if you prefer, keep your input files in their original directory and add relative paths to your sequence and tree files in the corresponding options of the control file), and set the control file option `usedata` to `usedata = 2`.
When you now run MCMCtree in this second directory, the program will read the `in.BV` file, extract the MLEs of the branch lengths, gradient, and Hessian estimated by BASEML or CODEML, and use them to approximate the likelihood calculation during the MCMC, thus saving computational time.
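If you want to script the set-up of that second directory, here is a minimal Python sketch of the steps above. It is only an illustration under a few assumptions: the helper names (`switch_usedata`, `prepare_usedata2_dir`), the control file name `mcmctree.ctl`, and the directory names are hypothetical, and the demo writes placeholder files instead of real MCMCtree output.

```python
import re
import shutil
import tempfile
from pathlib import Path


def switch_usedata(ctl_text: str, new_value: str) -> str:
    """Set the value of the `usedata` option in MCMCtree control-file text.

    Only the number right after `usedata =` is replaced; anything else on
    that line (e.g. a trailing `*` comment) is kept as-is.
    """
    pattern = r"^([ \t]*usedata[ \t]*=[ \t]*)\d+"
    new_text, n = re.subn(pattern, lambda m: m.group(1) + new_value,
                          ctl_text, flags=re.MULTILINE)
    if n != 1:
        raise ValueError(f"expected exactly one 'usedata' line, found {n}")
    return new_text


def prepare_usedata2_dir(step1_dir, step2_dir, ctl_name="mcmctree.ctl",
                         input_files=()):
    """Set up a second directory for the `usedata = 2` run (hypothetical helper).

    step1_dir: directory where MCMCtree was run with `usedata = 3`
    step2_dir: new directory for the run with the approximate likelihood
    input_files: sequence/tree files to copy too (skip this if the control
                 file points to them via relative paths instead)
    """
    step1, step2 = Path(step1_dir), Path(step2_dir)
    step2.mkdir(parents=True, exist_ok=True)
    # out.BV from the `usedata = 3` run is copied and renamed to in.BV
    shutil.copy(step1 / "out.BV", step2 / "in.BV")
    for name in input_files:
        shutil.copy(step1 / name, step2 / name)
    # Same control file, but with `usedata = 2`
    ctl_text = (step1 / ctl_name).read_text()
    new_ctl = step2 / ctl_name
    new_ctl.write_text(switch_usedata(ctl_text, "2"))
    return new_ctl


if __name__ == "__main__":
    # Demo with placeholder files; a real `usedata = 3` run creates out.BV
    with tempfile.TemporaryDirectory() as tmp:
        step1 = Path(tmp) / "01_usedata3"
        step1.mkdir()
        (step1 / "mcmctree.ctl").write_text("usedata = 3    * 3: out.BV\n")
        (step1 / "out.BV").write_text("placeholder\n")
        ctl = prepare_usedata2_dir(step1, Path(tmp) / "02_usedata2")
        print(ctl.read_text(), end="")
```

The sketch edits only the number after `usedata =`, so any comment you keep on that line in your control file survives the switch from 3 to 2.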
Please let us know whether you manage to run MCMCtree with the approximate likelihood calculation enabled when following the procedure detailed in the tutorial (you can also clone Mario's `divtime` GitHub repository to analyse the example data while going through the tutorial)!
All the best,
Sandy
P.S.: Even though it is not mentioned in the tutorial, the PAML documentation (p. 48 at the time of writing) notes that option `usedata` accepts a second argument: the path to the `in.BV` file (i.e., `usedata = 2 inBVfilename`). I sometimes keep my `in.BV` files elsewhere in my file structure, so my pipelines replace this argument with the absolute path to wherever my `in.BV` file is: something you may want to consider if you are planning on writing your own pipelines! If this argument is not given, MCMCtree assumes that the file with the MLEs of the branch lengths, gradient, and Hessian is called `in.BV` and is saved in the same directory as the control file.
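In case it helps, this is roughly how a pipeline could rewrite the `usedata` line so it points to an `in.BV` file stored elsewhere. It is a hedged Python sketch, not my actual pipeline code: `point_to_inbv` is a hypothetical helper, the example paths are made up, and rejecting paths with whitespace is my own assumption (that MCMCtree reads the path as a single token).

```python
import os
import re


def point_to_inbv(ctl_text: str, inbv_path: str) -> str:
    """Rewrite the usedata line as `usedata = 2 <absolute path to in.BV>`.

    Hypothetical helper. Anything after a `*` on that line (a comment) is kept.
    """
    abs_path = os.path.abspath(inbv_path)
    # Assumption: the path is read as a single token, so avoid whitespace
    if any(ch.isspace() for ch in abs_path):
        raise ValueError("in.BV path should not contain whitespace")
    # group 1: "usedata =" prefix; group 2: optional trailing `*` comment
    pattern = r"^([ \t]*usedata[ \t]*=)[^*\n]*(\*.*)?$"

    def repl(m):
        comment = "    " + m.group(2) if m.group(2) else ""
        return f"{m.group(1)} 2 {abs_path}{comment}"

    new_text, n = re.subn(pattern, repl, ctl_text, flags=re.MULTILINE)
    if n != 1:
        raise ValueError(f"expected exactly one 'usedata' line, found {n}")
    return new_text


if __name__ == "__main__":
    ctl = "seqfile = seqs.phy\nusedata = 2    * 2: use in.BV\n"
    print(point_to_inbv(ctl, "/data/run1/in.BV"), end="")
```

Relative paths passed to the helper are turned into absolute ones based on the current working directory, so the resulting control file keeps working if MCMCtree is later launched from a different directory.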
P.P.S.: If you want to follow other pipelines and/or reuse scripts that I have used when working on reproducible timetree inference projects, you may want to take a look at my latest `LUCA-divtimes` GitHub repository. You can adapt my in-house scripts to work with your own dataset(s) and/or use them to launch PAML programs on HPCs. If you have nucleotide data, make sure that you also adapt my scripts to work with BASEML since, in this specific project, I used CODEML!