by far more recent nodes than expected (including those that were dated)

Alexander Fedosov

Sep 11, 2025, 5:46:26 AM
to PAML discussion group
Hi,
I ran into an issue: in my analysis, mcmctree estimates node ages that are roughly 3 times younger than they are supposed to be (including the nodes that were calibrated based on the fossil record). What could have happened? I'm happy to provide the logfile, the input treefile, the alignment, the ctrl file and the resulting FigTree output. I would greatly appreciate your comments. Thanks, /alexander

Sandra AC

Sep 11, 2025, 6:10:28 AM
to PAML discussion group
Hi Alexander,

Thanks for your message! While it is not easy to troubleshoot what may have happened without seeing the input/output files (please do share them if you have them! :) ), here are some things you could check when troubleshooting unexpected results:
  • Node age constraints (calibrations): what distributions have you used to constrain node ages and what information have you used to derive such constraints? Have you made sure that the node age constraints you have used are not in conflict? What is your root age constraint? The ideal scenario would have as many node age constraints as possible to better inform the clock, but that is not always the case (e.g., the fossil record may be quite sparse for some species).
  • Versioning: what PAML version are you using? It is always recommended to use the latest release, which you can download from the PAML GitHub; there you can also check what has changed compared to older versions.
  • Priors: how have you specified the time prior, the rate prior, and the prior for rate variance? Using default values is not recommended, so you should always set these priors based on your dataset.
  • MCMC settings: how long have you been running the chain for and how often did you log a sample in the "mcmc.txt" file? Have you checked whether your chains have reached convergence? E.g., you can use R functions and/or Tracer to run MCMC diagnostics after Bayesian timetree inference.
  • Input files: check that the node age constraints are placed where you want them to be in your input tree file (e.g., use software such as TreeViewer or FigTree to visualise that they are indeed constraining the node you want). Make sure that your sequence alignments have undergone all necessary checks and quality control filters -- MCMCtree assumes all these steps have already been taken care of!
  • Control file: you can check the MCMCtree section in the PAML Wiki with more details about each of the options that can be included in the control file to make sure that they are based on your data and not default values.
I guess that, once you share the files, I (and other members of the PAML community) can make a better guess at what may be going on :) 

All the best,
Sandy

P.S.: You can also read this other post on the PAML discussion group with complementary details to your question.
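
For the MCMC diagnostics mentioned above, a rough effective sample size (ESS) check can be sketched in Python. This is only a minimal sketch and not what Tracer or any R package computes exactly: it assumes "mcmc.txt" is a tab-separated table with a header row, and the function names ("read_column", "effective_sample_size") and the column name in the usage comment are hypothetical.

```python
import csv


def read_column(path, column):
    """Read one numeric column from a tab-separated file with a header row.

    Assumption: "mcmc.txt" has this layout; adjust the delimiter if yours differs.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [float(row[column]) for row in reader]


def effective_sample_size(samples):
    """Crude ESS estimate: n / (1 + 2 * sum of positive autocorrelations).

    The sum over lags stops at the first non-positive autocorrelation,
    which is a simple truncation rule chosen for this sketch.
    """
    n = len(samples)
    if n < 2:
        return float(n)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        return float(n)
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, n):
        acf = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n * variance)
        if acf <= 0:
            break
        tau += 2.0 * acf
    return n / tau


# Hypothetical usage (the column name is an assumption; check your header row):
# ess = effective_sample_size(read_column("mcmc.txt", "mu"))
```

A low ESS relative to the number of collected samples suggests strong autocorrelation, in which case sampling less often or running the chain for longer may help.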

Alexander Fedosov

Sep 11, 2025, 7:31:33 AM
to PAML discussion group
Hi Sandy,
Many thanks for your tips! Here are my inputs. I first ran an analysis with 10^7 generations (~10 days) and got these strange dating intervals. I then ran shorter runs, piped through tee to save a log (the files from the latest run are attached). The results are the same. Remarkably, when I run a more complex inference on unpartitioned aa data (for a completely different project), the results are alright. Thanks for your help,
Alexander
mcmctree_part2.ctl
rtree_2cal_values.nw
part_phylip-.phy
FigTree2.tre

Sandra AC

Sep 11, 2025, 7:53:21 AM
to PAML discussion group
Hi Alexander,

Thanks for sending your input, control, and output files!

I have had a quick look at everything you have shared and there are some things you may want to look into:
  • Your input tree file does not have a root age constraint. I think your time unit is 1 Myr but, as recommended in the PAML Wiki (see "Overview"), I would suggest you use a time unit of 100 Myr (i.e., divide the max. and min. ages in your soft-bound calibrations by 100). Unfortunately, your control file includes the "RootAge" variable, which is set to the default value of "<1.0". As a result, the time unit you have used to constrain the node ages in your tree does not match the time unit implied by the control file. We recommend that users avoid including the "RootAge" variable in the control file when working with molecular data and instead include the root age constraint directly in the input tree file alongside the rest of the node age constraints (please read our PAML Wiki for details; see the explanation under the section for the variable "RootAge"). I believe this is the main issue with your current analyses!
  • Your input sequence data file only has one alignment block, yet your control file has set "ndata = 11". Have you attached a control file for another analysis? The control file should have "ndata = 1" if you only have one alignment block :)
  • You are also sampling every iteration and collecting no more than 10,000 samples. We recommend collecting between 10,000 and 20,000 samples so, if you see issues with convergence, I suggest you use "nsample = 20000". In addition, I suggest you increase the value of "sampfreq" (i.e., sample less often) to reduce autocorrelation (e.g., you can use 100, but sometimes you may need to increase it further; you always need to run MCMC diagnostics and check what is going on with your data). The total number of iterations you run equals "burnin + sampfreq*nsample".
  • I am not sure where you have gotten the alpha and beta values for your priors, but make sure that they do really align with your dataset. There are some recommendations in the PAML Wiki too.
  • I can see that you are also using "model = 0", which is the simplest model of nucleotide substitution (JC69). You may want to use "model = 4" for a more complex evolutionary model (HKY85). If your analyses become computationally expensive, you may consider approximating the likelihood calculation instead of using the exact likelihood calculation (the latter is what you are enabling by specifying "usedata = 1" in the control file). You can find more details about approximating the likelihood calculation in the PAML Wiki (section "Approximating the likelihood calculation") and in our latest PAML tutorial (we used AA data for this protocol but, if you use BASEML instead of CODEML, the tutorial would still hold for nucleotide data).
Hope this helps!
Sandy
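
The arithmetic behind some of the points above (rescaling calibrations to a 100 Myr time unit, the chain length "burnin + sampfreq*nsample", and checking "ndata" against the number of alignment blocks) can be sketched in Python. This is only an illustrative sketch: all function names are hypothetical, the example ages and burn-in are made up, and the block counter assumes each alignment block starts with a PHYLIP-style header line holding exactly two integers (number of sequences and number of sites).

```python
import re


def rescale_bounds(min_age_myr, max_age_myr, unit_myr=100.0):
    """Convert calibration bounds from Myr to the chosen time unit (default: 100 Myr)."""
    return min_age_myr / unit_myr, max_age_myr / unit_myr


def soft_bound_label(min_age, max_age):
    """Format bounds in B(min,max) soft-bound notation.

    Assumption: this notation suits your PAML version; check the PAML Wiki.
    """
    return f"B({min_age:g},{max_age:g})"


def total_iterations(burnin, sampfreq, nsample):
    """Total number of MCMC iterations: burnin + sampfreq * nsample."""
    return burnin + sampfreq * nsample


def count_alignment_blocks(phylip_text):
    """Count header lines of the form '<n_sequences> <n_sites>'.

    Assumption: headers contain only two integers; headers with extra
    option flags would not be counted by this simple pattern.
    """
    header = re.compile(r"^\s*\d+\s+\d+\s*$")
    return sum(1 for line in phylip_text.splitlines() if header.match(line))


# Hypothetical calibration of 66-100 Myr expressed in units of 100 Myr.
low, high = rescale_bounds(66, 100)
print(soft_bound_label(low, high))

# Hypothetical burn-in of 20,000 with sampfreq = 100 and nsample = 20000.
print(total_iterations(20000, 100, 20000))
```

Comparing the result of "count_alignment_blocks" with the "ndata" value in the control file is one quick way to catch a mismatch like the one described above.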

Alexander Fedosov

Sep 11, 2025, 8:04:33 AM
to PAML discussion group
Sandy, many thanks! I will check the recommendations and reconfigure my analyses, and I'm sure that will do the job! I will get back to you when I have new results. Many thanks again,
Alexander
