Issue with Divergence Time Estimation in MCMCTree: Minimum Constraint Not Respected

67 views
Skip to first unread message

keigo ww

unread,
Feb 18, 2025, 2:57:08 AMFeb 18
to PAML discussion group

Dear all,

I am using MCMCTree to estimate divergence times, and I encountered an issue regarding the minimum age constraint. Specifically, I set the minimum divergence time for Shepherdia to 28 Ma, without specifying a maximum bound. However, after running the analysis, the estimated divergence time for this clade was 26.74 Ma, which is younger than the minimum constraint. The 95% highest posterior density (HPD) interval for this node is 20.22–31.92 Ma.

I expected the estimated age to be at least 28 Ma, given the constraint. Is this behavior normal in MCMCTree? 

This is my calibration setting:

  (Shepherdia_canadensis_America,(Shepherdia_argentea_Canada,(Shepherdia_argentea,wu_Shepherdia_argentea)))'>.28'  

  This is my MCMCTree run result:  

估算结果.png

估算结果(置信区间).png

best,

yiying

Sandra AC

unread,
Feb 19, 2025, 9:54:48 AMFeb 19
to PAML discussion group
Dear Yiying,

Thanks for your message! 

Firstly, I would like to know whether the results you have summarised correspond to the analysis with data (e.g., `usedata = 1` or `usedata = 2 <path_to_inBV>`) or without data (i.e., `usedata = 0`). 

Please note that you need to run the analysis first without data to make sure that the marginal density (effective prior) correspond to what you expect based on your calibration densities to constrain node ages (user-specified prior). You can load the "mcmc.txt" file output by MCMCtree onto Tracer (or use in-house scripts in R or Python) to inspect the densities resulting from an analysis without data to have a better understanding of how the joint time prior may look like -- you should not only inspect the densities for those nodes that you calibrated, but you should also check the resulting densities for non-calibrated nodes!

E.g.: the plot below shows the densities for 4 chains that I ran with MCMCtree for an example dataset of 7-taxa phylogeny with the following constraints (you can check the tutorial I wrote when using this dataset via this link):
  • t_n8: root age, constrained with upper bound U(1.0) (maximum age = 1.0; time unit = 100Mya | 100 Mya)
  • t_n9: last common ancestor of great apes (all taxa but gibbon), node constrained with soft bound B(.12,.16) (minimum age = .12, maximum age = .16; time unit = 100 Mya | 12 - 16 Mya)
  • t_n11: last common ancestor of human, chimpanzee, and bonobo; node constrained with soft bound B(.06,.08) (minimum age = .06, maximum age = .08; time unit = 100 Mya | 8 Mya).
ex2mcmctree.jpg
examplemcmctree.jpg

As you can see above, the calibrated nodes seem to be within the expected specified bounds and the marginal densities for uncalibrated nodes look sensible too. If you run MCMCtree without data and the node age constraints you have specified in your fixed tree topology, you can then inspect the effective prior by checking whether the marginal densities look reasonable based on your prior expectations (calibrations that you have used to constrain specific node ages, your user-specified prior). If not, you may need to modify the node age constraints you specify (i.e., user-specified prior) so that the resulting effective prior indeed reflects your prior knowledge on the fossil evidence you are using as explained in the PAML documentation (page 45 at the time of writing).

If you want to check how the lower-bound calibration (and other types of constraints you may have included in your fixed tree topology) may look like, you can use some of the functions of the R package mcmc3r to plot them.

exmcmc3r.jpg

You can check more details about how to plot the lower-bound calibrations and soft-bound calibrations implemented in MCMCtree if you type `?mcmc3r::dB` on the console to access the help manual.

Can you let us know whether you have run MCMCtree without data first and checked for the aforementioned? 

All the best,
Sandy

keigo ww

unread,
Feb 20, 2025, 3:48:24 AMFeb 20
to PAML discussion group
Dear  Sandy, 

Thank you for your reply!! Regarding the usedata setting, I initially set usedata=3, and after the run was completed, I modified out.BV to in.BV and set usedata=2. I have never run it with usedata=0 before. Is this step necessary?

Following your suggestion, I reran mcmctree with usedata=0. Then, I imported the generated mcmc.txt file into Tracer and located the corresponding node for this branch. The node t_n58 is the one I calibrated. How can I verify if this node is correct? Should I check whether it follows a normal distribution?

tracer.png

Best, 

Yiying


Sandra AC

unread,
Feb 20, 2025, 7:33:50 AMFeb 20
to PAML discussion group
Hi Yiying,

Yes, you must always run your analysis first without data (`usedata = 0`) to check that the effective prior is sensible -- this has been emphasised in the PAML documentation and related articles and protocol papers/tutorials :)

If you have specified a lower-bound calibration with a minimum age of 28 Myr (e.g., `L(28)`), you expect to see a distribution with a minimum around 28 Myr (which you see in the plot you show above). The shape of the distribution that you expect depends on the type of node age constraint you have used and is implemented in MCMCtree. Following Table 7 in the PAML documentation...

table7.jpg

... I believe you might have chosen the lower-bound constraint based on your first message. For instance, if you used `L(28)`, this means that you have used the default values for `p`, `c`, and `p_L` (`p=0.1`, `c=1`, and `p_L = 0.025`). Therefore, you are not using a hard bound. In other words, you are allowing for the possibility that the minimum age can be a little bit younger than 28Myr with a tail probability of 2.5% (which you also see on the plot you show in your last message). 

Remember that you can use the mcmc3r R package to check how the distribution that you want to use to constrain a specific node age looks like. Assuming that you used `L(28)`, I have then used mcmc3r to plot how this distribution would look like, which is actually very similar to what you have obtained when running MCMCtree without data:

```R
curve( mcmc3r::dL( x, tL = 28 ), from = 26, to = 90, n = 1e4 )
abline(v=28, col = "blue")
```
mcmc3r.jpg

Please note that the shape of the distribution from 28Myr to the edge of my x axis (region for older times) has changed when you ran MCMCtree as now you have to consider parental nodes in the tree topology (which may be uncalibrated and/or calibrated with additional constraints, I do not know this as I have not seen your input files); such nodes must also have a time density as they are part of the joint time prior. When considering the densities for calibrated and uncalibrated nodes to build the time prior, it is important to always bear in mind that daughter nodes can never be younger than parental nodes in the specified tree topology, and so marginal time densities for daughter nodes may have to be truncated so that this biological rule is not violated. Please note that this truncation in the time prior may happen if the densities assigned to daughter nodes ended up being older than parental nodes or when you have overlapping node age constraints in neighbouring nodes and the aforementioned restriction is to be applied. You can check this by plotting the marginal time densities that you have for "n_57", "n_56", "n_55", "n_54" when you are not using your data. If you do that, you will see the density with which you constrained the age of node "n_58" (i.e., `L(28)`) has the area for older ages "occupied" by the marginal time densities estimated for these four parental nodes, hence why the density for node "n_58" has been somewhat narrowed if compared to the plot obtained in mcmc3r.

If you wanted to use a hard bound instead, you will need to include `p_L = 1e-300` as specified in the PAML documentation, and also shown in the attached table above. You will need to include the values of `p` and `c` too. E.g.: if you keep `p` and `c` with default values, then 'L(28, 0.1, 1, 1e-300)' - I suggest that you play around and plot various distributions using the mcmc3r R package to check which shape distribution best represents your knowledge on the fossil specimen that you are using to constrain that specific node age on your tree topology!

Lastly, you will have to check whether your posterior densities obtained when using your data are in conflict with the the prior densities (no data) to assess whether your data are informative and to check whether there are serious conflicts between the specified prior and the posterior. Following Nascimento et al. 2017, if the prior and posterior densities (e.g., for a specific node age) were very similar, this means that the data are not very informative (essentially, you have not updated your prior despite incorporating some molecular data). If there is an overlap between prior and posterior densities, but the posterior is more concentrated than the prior, then the data are informative and the prior you used is reasonable. If prior and posterior densities do not overlap, then the prior and your data are in conflict (e.g., perhaps the prior is misspecified). You can use Tracer or other in-house scripts to load the "mcmc.txt" files obtained when running MCMCtree with and without data and plot both prior and posterior densities for the same nodes to check for these issues as part of the MCMC diagnostics -- you always have to carry out MCMC diagnostics to make sure that your chains have reached convergence, you have not had mixing issues, etc. You can read Nascimento et al. 2017 for more details and examples on Bayesian phylogenetic analyses.

Hope this helps!

All the best,
Sandy
Reply all
Reply to author
Forward
0 new messages