file rst2 not found!

Louis Nastasi

Nov 1, 2024, 9:38:30 AM

to PAML discussion group

Hi folks,

I'm trying to run mcmctree for divergence time estimation for a UCE dataset with ~370 taxa, and I have been encountering the following error:

*** Locus 1 ***
running baseml tmp0001.ctl
file rst2 not found!

I can't seem to find a file called rst2 in the paml installation - is this something that resulted from an error during installation? Or is this something that I need to create myself? I haven't been able to find anything about rst2 in the documentation, wiki, or discussion group.

I can provide my control file, tree, etc., but I suspect that they are not relevant to this specific error.

Thanks in advance!

Louis Nastasi

Nov 1, 2024, 9:44:14 AM

to PAML discussion group

Control and tree attached here just in case

mcmctree (1).ctl

finaltreefordivergence (3).treefile

Sandra AC

Nov 1, 2024, 10:11:54 AM

to PAML discussion group

Hi Louis,

Thanks for reaching out! I assume that you are trying to use the approximate likelihood calculation in MCMCtree and that there may be issues with the so-called `in.BV` file. Everything that I am writing below is a summary of what you can also find in Mario dos Reis' and Ziheng Yang's tutorial on "Bayesian Molecular Clock Dating Using Genome-Scale Datasets".

Firstly, you need to run MCMCtree with `usedata = 3`, which will call BASEML or CODEML (depending on whether the sequence file has nucleotide or protein data, respectively) to estimate the branch lengths, the gradient, and the Hessian (the vectors and matrix required to later approximate the likelihood calculation). By looking at your control file, I think you have nucleotide data, and so it will be BASEML in your case. Subsequently, BASEML/CODEML save the aforementioned vectors and matrix in a file called `rst2` (its content is also copied to the output file `out.BV`; you should see both output files at the end of the run).

Now, you have everything you need to start sampling from the posterior distribution with MCMCtree. As Mario and Ziheng recommend in their tutorial, you can copy this `out.BV` file into a different directory and rename it to `in.BV`. Then, you can copy your control file and your input files there (or, if you prefer, you can add relative paths to your sequence and tree files in their corresponding options in the control file and keep your input files in their original directory) and set the control file option `usedata` to `usedata = 2`. When you now run MCMCtree in this second directory, the program will read the `in.BV` file; extract the MLEs of branch lengths, gradient, and Hessian estimated by BASEML or CODEML; and approximate the likelihood calculation during MCMC, thus saving computational time.
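The hand-over between the two runs can be sketched in a few lines of Python. This is only an illustration and not part of PAML or the tutorial: the function names `set_usedata` and `prepare_second_run`, the directory layout, and the default control file name `mcmctree.ctl` are all assumptions. It does not launch MCMCtree itself; it only copies `out.BV` to `in.BV` and writes a copy of the control file with `usedata = 2`.

```python
import re
import shutil
from pathlib import Path


def set_usedata(ctl_text: str, value: str) -> str:
    """Return control-file text with the `usedata` option set to `value`.

    A trailing `*` comment on that line is kept unchanged.
    """
    pattern = re.compile(
        r"^([ \t]*usedata[ \t]*=[ \t]*)([^*\n]*?)([ \t]*(?:\*.*)?)$",
        re.MULTILINE,
    )
    # A function is used as the replacement so that `value` is inserted literally.
    new_text, n = pattern.subn(lambda m: m.group(1) + value + m.group(3), ctl_text)
    if n != 1:
        raise ValueError(f"expected exactly one usedata line, found {n}")
    return new_text


def prepare_second_run(first_dir: Path, second_dir: Path,
                       ctl_name: str = "mcmctree.ctl") -> Path:
    """Copy out.BV to in.BV and write a control file with usedata = 2.

    `first_dir` is where MCMCtree was run with usedata = 3 (hypothetical layout).
    """
    out_bv = first_dir / "out.BV"
    # Refuse to continue if the first run left no usable out.BV behind.
    if not out_bv.is_file() or out_bv.stat().st_size == 0:
        raise FileNotFoundError(f"{out_bv} is missing or empty")
    second_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(out_bv, second_dir / "in.BV")
    ctl_text = (first_dir / ctl_name).read_text()
    new_ctl = second_dir / ctl_name
    new_ctl.write_text(set_usedata(ctl_text, "2"))
    return new_ctl
```

The sequence and tree files still have to be copied into the second directory, or their paths in the control file adjusted, before running MCMCtree there.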

Please let us know if, when following the procedure detailed in the tutorial (you can also clone Mario's `divtime` GitHub repository to analyse the example data while going through the tutorial), you manage to run MCMCtree when enabling the approximate likelihood calculation!

All the best,
Sandy

P.S.: Even though it is not mentioned in the tutorial, the PAML documentation (p.48 at the time of writing) highlights that the second argument that you can specify when enabling option `usedata` is the path to the `in.BV` file (i.e., `usedata = 2 inBVfilename`). Sometimes, I have my `in.BV` files elsewhere in my file structure, and so my pipelines replace this argument with the absolute path to wherever my `in.BV` file is -- something you may want to consider if you are planning on writing your own pipelines! Otherwise, MCMCtree will always assume that the file with the MLEs of branch lengths, gradient, and Hessian is called `in.BV` and is saved in the same directory as the one where the control file is found.

P.P.S.: If you want to follow other pipelines and/or reuse scripts that I have used when working on reproducible timetree inference projects, you may want to take a look at my latest `LUCA-divtimes` GitHub repository. You can adapt my in-house scripts to work with your own dataset(s) and/or use them to launch PAML programs on HPCs. Make sure that, if you have nucleotide data, you also adapt my scripts to work with BASEML as, in this specific project, I used CODEML!

Louis Nastasi

Nov 1, 2024, 4:34:24 PM

to PAML discussion group

Hello Sandy,

Thank you very much for your detailed reply! I greatly appreciate your time in responding. Yes, I am attempting to use the approximate likelihood option, and my data is nucleotide data. I have reviewed the recommended tutorial, and I'm still not sure where my problem lies. I am running MCMCtree with 'usedata = 3' and the program still seems to be expecting the rst2 file to be present, as mentioned in my original post. When I try running it, the out.BV file is created but is empty (size 0 bytes). The out.txt file is too large to append here, but this is the end of it (after the alignments, site pattern counts, etc.):

"Homogeneity statistic: X2 = 0.10622 G = 0.10630
Average 0.26607 0.23052 0.26870 0.23471
(Ambiguity characters are used to calculate freqs.)
# constant sites: 0 (0.00%)"

Overall, it's not clear to me why the initial run with usedata = 3 is not working; I now understand that a second run with usedata = 2 will be necessary afterward!

Before sending this message, I did notice that fig. 2 in the dos Reis & Yang tutorial has a reduced set of parameters in the control file - I'll try to run it again with only the parameters specified there.

Thanks for any further advice you can provide!

Louis

Louis Nastasi

Nov 1, 2024, 4:53:23 PM

to PAML discussion group

I tried to run it with only the set of parameters mentioned above but it produced the same error - rst2 is still causing issues.

Louis Nastasi

Nov 3, 2024, 1:24:37 PM

to PAML discussion group

I have tried a few more times, and I am still receiving the error of rst2 being missing. This is my current control file:

seqfile = final-divergence-alignment.phylip-relaxed
treefile = finaltreefordivergence.treefile
ndata = 1
seqtype = 0
usedata = 3 *uses sequence data and invokes approx likelihood
clock = 2 *independent rates model
model = 4 *4 model is HKY85
alpha = 0.5
ncatG = 5
cleandata = 0

I'm not sure why it's asking for the rst2 file - I'm following all the steps outlined in the dos Reis & Yang tutorial, and using the command ./mcmctree mcmctree.ctl

Any thoughts on what might be wrong here?

Sandra AC

Nov 12, 2024, 6:07:40 AM

to PAML discussion group

Hi Louis,

After doing some tests, I believe that your issue may have to do with the format of your input sequence file -- perhaps you have an interleaved format and that may not work well? To check whether that is indeed the issue, you can reformat your input sequence alignment so that you have one sequence per line instead. E.g.:

```
n_taxa n_sites
sp_1      aaccatcgg [...]
sp_2      aaccatcgg [...]
[...]
sp_n      aaccatcgg [...]
```
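If the alignment does turn out to be interleaved, a conversion along these lines may help. This is a rough sketch under several assumptions: the file is relaxed PHYLIP, each name is a single token separated from its sequence by whitespace, names appear only in the first block, and every block lists the taxa in the same order. The function name `interleaved_to_sequential` is made up for illustration. A file that already has one sequence per line passes through with only the spacing normalised.

```python
def interleaved_to_sequential(text: str) -> str:
    """Convert interleaved relaxed-PHYLIP text to one sequence per line."""
    # Blank lines only separate blocks, so they can be dropped up front.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    n_taxa, n_sites = int(header[0]), int(header[1])
    body = lines[1:]
    if len(body) % n_taxa != 0:
        raise ValueError("number of sequence lines is not a multiple of n_taxa")

    names = []
    chunks = []
    for i, line in enumerate(body):
        if i < n_taxa:
            # First block: name first, then the start of the sequence.
            name, *rest = line.split()
            names.append(name)
            chunks.append(["".join(rest)])
        else:
            # Later blocks: sequence only, rows in the same taxon order.
            chunks[i % n_taxa].append("".join(line.split()))

    seqs = ["".join(parts) for parts in chunks]
    for name, seq in zip(names, seqs):
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, got {len(seq)}")

    # Pad names so that at least two spaces separate name and sequence.
    width = max(len(name) for name in names) + 2
    out = [f"{n_taxa} {n_sites}"]
    out += [f"{name.ljust(width)}{seq}" for name, seq in zip(names, seqs)]
    return "\n".join(out) + "\n"
```

The length check against `n_sites` is there to catch files that do not match these assumptions, for example sequential files whose sequences are wrapped over several lines.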

Hope that the alignment was indeed the issue and now you can run MCMCtree without problems. Nevertheless, please let us know if you encounter other problems!

All the best,
S.
