Hi Victor,
Thanks for your message!
I think the issue may lie in the text file you are using to specify the node age constraints as well as in your input tree file. The R script that you are using takes an input file where users need to specify the node age constraints following a specific format. E.g.: you can check
the "calibrations.csv" file that I have in my tutorial, which format follows some rules (they are also written in the R script):
- Header
- One row per calibration
- No spaces at all, semi-colon separated
- There are a total of 4 columns:
- Name of calibration (no spaces!)
- Name of tip 1 that leads to MRCA (no spaces!)
- Name of tip 2 that leads to MRCA (no spaces!)
- Calibration in `MCMCtree` format. Please check the PAML documentation for more details on how to set up node age constraints!
Please note that the first tree you have shared has an additional parenthesis after the semicolon, which I believe may just be a typo (but, if not, make sure you remove it as this may be causing issues when running MCMCtree!):
```
(Gve,((Dli,(Sca,Nve)),(Ppe,((Ryu,Rfl),((Pve,(Mme,Dcy)'B(.66,1.65,1e-300,0.025)'),(Pcy,(Ssi,Ami)))))))'>4.70<5.38';)
```
Assuming that the last parenthesis is a typo, you are constraining the root age with a soft-bound calibration that has a minimum age of 4.70x100Myr and 5.38x100Myr. This means that, in the text file that you may have used to run the R script, you may have written a row such as the following (`Dli` could be any other tip as it would lead to the root age when specifying `Gve` as the other tip):
```
root;Gve;Dli;'>4.70<5.38'
```
Please note that you have also specified a root age constraint in your control file (i.e.,
RootAge = 5.38), but the format you have used is incorrect: you are actually fixing the root age to be 5.38x100 Myr (i.e., you are not using a calibration density). I believe you wanted to use an upper-bound calibration (?) and write '<5.38' (or 'U(5.38)') instead. As explained in the
PAML Wiki under bullet point "rootAge", if users have specified a root age constraint both in the tree file and the control file, the root age constraint in the tree file shall prevail. Therefore, when you ran the analysis with the first tree, you never used an upper-bound calibration for the root age constraint: instead, you used a soft-bound calibration (i.e., 'B(4.70,5.38)' or '>4.70<5.38'). If this soft-bound calibration is supposed to be constraining another node age and the upper-bound calibration you had in your control file was indeed supposed to be used as a root age constraint, you will have to correctly identify the tips that lead to such nodes in the text file that my R script uses as aforementioned. E.g.:
```
name;tip1;tip2;MCMCtree
int_calib1;Mme;Dcy;'B(.66,1.65,1e-300,0.025)'
int_calib2;<tip1>;<tip2>;
'>4.70<5.38'
root;Gve;Dli;'<5.38'
```
To clarify the confusion that may have stemmed from following
the tutorial you mention, please note that there are two node age constraints specified in the input tree file, but the root age constraint is specified in the control file (i.e.,
RootAge = '<1.0') -- perhaps, this may have confused as you say "
places the root calibration at the ingroup node, not on the outgroup node", but the root age constraint is placed right after the last parenthesis and before the ";" that finishes the Newick format; it is just only specified in the control file (this PAML tutorial was written many years ago for an older PAML version).
As recommended in the PAML Wiki and for the most recent PAML versions (i.e., >= v4.10), it is best to always have constraints specified in the tree file, including the root age constraint (see also my comments above regarding your data). To this end, the tree file that you would use in later PAML versions would be the following (while also you would need to remove variable
RootAge = '<1.0' from the control file):
```
7 1
((((human, (chimpanzee, bonobo)) '>.06<.08', gorilla),
(orangutan, sumatran)) '>.12<.16', gibbon) '<1.0';
```
I can see that you have an "in.BV" file, which means you may have generated this file by running `usedata = 3` before -- if you have done this, please note that variables `alpha_gamma` and `kappa_gamma` are not needed in the control file (they will be ignored if not needed, but always best to keep the control file clean); they are only used when the exact likelihood calculation is used (`usedata = 1`). I can see you have many alignment blocks as you have a partitioned AA dataset, and so it is important that the "in.BV" file has the blocks of branch lengths, gradient, and Hessian in the same order as you have your alignment blocks (i.e., the branch lengths, gradient, and Hessian estimated for alignment block 1 should go in the first block in the "in.BV" file, then those for the second alignment block should go after, etc.). You must also use the same tree topology you are using during timetree inference -- if you are using a different dataset and a different tree topology, you must re-estimate the branch lengths, the gradient, and the Hessian. You can find all these details and more in the
PAML Wiki.
You may also want to check whether the priors you specified in the control file are sensible and have been specified based on your data. You have some suggestions about how you could find sensible priors throughout the
PAML Wiki too. You may also want to check the MCMC settings to make sure that you are collecting enough samples (i.e., you have "nsample = 10000", perhaps you could increase up to 20,000 if you find that the ESS is too low?) and that the sampling frequency is high enough (so far, you are sampling every 10 iterations; perhaps you may need to increase this value if autocorrelation is too high). You can find some more details in the
PAML Wiki under bullet point "burnin, sampfreq, nsample". You can check
this tutorial on MCMC to get more familiar with these settings, while I suggest you read the paper
"A biologist's guide to Bayesian phylogenetic analysis" (Nascimento et al. 2017).
Please let us know whether the suggestions above answer your questions and whether you can now run MCMCtree with the intended node age constraints! If you have any other questions, please let us know :)
All the best,
Sandy