Use of rooted and unrooted tree

Luiz Filipe

Jul 25, 2024, 8:06:50 PM
to PAML discussion group
Hello, I have two questions regarding the use of rooted and unrooted trees.

I have a phylogeny for an order of species, and I am hypothesizing that my species tree splits into two distinct clades at the root.

1 - For the branch model analyses, I will select a clade as foreground, so I have to use a rooted tree, right?

2 - To fit the homogeneous model for the LRT against the branch model, given the topology and hypothesis of my species tree, can I still use the same rooted tree?

Sandra AC

Jul 26, 2024, 10:49:17 AM
to PAML discussion group
Hi there!

You may want to read the section "Rooted versus unrooted trees" in our supplementary material (Álvarez-Carretero et al., 2023) which, together with Figure S1, will (hopefully!) help clarify your doubts as to whether you should root your tree, depending on the biological hypothesis you are testing and whereabouts your clade of interest is.

Hope this helps!
Sandra

Luiz Filipe

Jul 26, 2024, 2:26:48 PM
to PAML discussion group
Dear Sandra,

I had already consulted this material, and from what I understood, I concluded that I can use the rooted tree under the homogeneous model and calculate the LRT against the branch model.

But I would like to know whether, in the context I described, it is possible to obtain an unrooted tree (if I wanted to use one) from a species tree that has two diverging clades, which I believe are undergoing distinct evolutionary processes, without an outgroup.

I ask because I'm not using any outgroups, just the species tree.

Sandra AC

Jul 26, 2024, 4:46:16 PM
to PAML discussion group
Hi there,

I am not sure I understand the hypothesis you are trying to test or its biological impact, both of which are quite important when deciding whether to root or unroot the tree. In addition, I do not know what criteria you have followed to root the tree nor what the root means with regards to the two clades you have in your tree topology, so it is hard for me to give a concise answer. Nevertheless, you should bear in mind that, if you are assuming that the two branches around your root evolve differently in the model, then the root of the tree is identifiable and a rooted tree should be used. If you unroot the tree that you describe (the root is unidentifiable), you would then have two background branches (i.e., you would be assuming that both branches for your two clades of interest around the root are under the same evolutionary process/constraints). Depending on which hypothesis you want to test to answer your biological question, you can then decide what is best: rooting or not rooting :)

I assume that you want to use a rooted tree because you believe that both branches are under distinct evolutionary processes, and so you would have to root the tree -- but my assumption may be wrong if I have misunderstood what you want to do! Other PAML users may want to share their points of view too :)

All the best,
Sandra

Luiz Filipe

Jul 26, 2024, 8:15:37 PM
to PAML discussion group
Hi Sandra,

These two questions will be the last, I promise:
1- Yes, I am using a species tree, in which I assume a hypothesis that there are two clades under distinct evolutionary processes that come from a root, and one of them will have its internal and terminal nodes marked with #1 for branch model analyses. Furthermore, I will be using the same tree for the homogeneous model analysis. This is correct, right?

2- But I also wonder whether, for this same hypothesis, it makes sense to use an unrooted tree for the homogeneous model. If so, how could I unroot it into a multifurcating topology, given that it biologically has two separate clades?

Sorry if I'm complicating your life too much Sandra, these are just questions, if what I wrote in the first question is ok, I'm already satisfied xD

Thank you for your time and support!

Sishuo Wang

Jul 28, 2024, 2:37:44 AM
to PAML discussion group
Dear Luiz,

Thank you very much for the question and your interest in PAML!

Is it related to Fig. S1D in the SI pointed out by Sandra? If I didn't misunderstand anything, my gut feeling is that your Point 2 is correct: you might want to use an unrooted tree for testing a homogeneous model, say M0. That is, a rooted tree for the foreground-background case and an unrooted tree for M0 (they are nested models, so an LRT can be used). Perhaps you already know the following, but please forgive my verbosity: for the left tree in Fig. S1D, there are 2 omega values plus 4 branch lengths to estimate, while the one on the right has 3 branch lengths plus 1 omega to estimate. That's why the d.f. difference is (4+2)-(3+1)=2. Actually, you can find similar things in Table S5 in Sandra et al. 2023 where except for test 4 all have a diff of d.f. equal to 1.
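The parameter count above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming a fully bifurcating tree with 3 tips as in the count given for Fig. S1D; the function names are hypothetical and are not part of PAML. Parameters shared by both models (e.g. kappa) cancel in the difference and are left out.

```python
def n_branches(n_tips: int, rooted: bool) -> int:
    """Branch count of a bifurcating tree: 2n-2 if rooted, 2n-3 if unrooted."""
    return 2 * n_tips - 2 if rooted else 2 * n_tips - 3


def n_params(n_tips: int, rooted: bool, n_omega: int) -> int:
    """Branch lengths plus omega ratios (shared parameters are ignored)."""
    return n_branches(n_tips, rooted) + n_omega


# Branch model on the rooted tree: 4 branch lengths + 2 omegas.
alt = n_params(3, rooted=True, n_omega=2)
# M0 on the unrooted tree: 3 branch lengths + 1 omega.
null = n_params(3, rooted=False, n_omega=1)
df = alt - null
print(alt, null, df)
```

The extra branch length in the rooted tree is what raises the difference from 1 to 2.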

I actually don't know what will happen if you use a rooted tree for M0, but I guess the log-likelihood will be (almost) the same as with an unrooted tree, although the program might report one more parameter (and there could be some numerical problems such that the lnL might not be accurate). This tentative view, if true, would suggest that for the right tree in Fig. S1D the program will report 5 free parameters, but you should keep in mind that there are actually 4 (because in that case the model is not identifiable, so the model that is specified has 4 free parameters despite the use of a rooted tree). As an example of a non-identifiable model, again please forgive my verbosity, imagine a case where one wants to minimize f(x,y)=(xy)^2 + 2xy. Clearly x and y cannot be separately estimated, so one might want to reparameterize the model as z=x*y, and the minimum of f(x,y) is obtained when z=x*y=-1. In practice, however, if you ask the code or program to estimate the MLEs of x and y separately, the computer will report something, probably including the same likelihood and the MLEs of x and y, suggesting d.f. = 2, when actually there is only one free parameter to estimate, z.

Sorry for the above lengthy and probably a bit messy answer. Does it help? Feel free to let us know should you have further questions. By the way, I guess a clearer way is to upload your tree or a schematic graph of the situation so that others might understand it better. Thanks!
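The toy example of non-identifiability can be checked numerically. The following is a minimal sketch using the function f(x,y)=(xy)^2 + 2xy from the paragraph above; the particular (x, y) pairs are arbitrary choices for illustration.

```python
def f(x: float, y: float) -> float:
    """Toy objective that depends on x and y only through the product x*y."""
    return (x * y) ** 2 + 2 * x * y


# Three different (x, y) pairs that all satisfy x*y = -1.
pairs = [(1.0, -1.0), (2.0, -0.5), (-4.0, 0.25)]
values = [f(x, y) for x, y in pairs]
print(values)

# A point off the ridge x*y = -1 gives a larger value.
off_ridge = f(1.0, 1.0)
print(off_ridge)
```

Every pair on the curve x*y = -1 reaches the same minimum, so an optimizer can return any of them: two numbers are reported, but only their product is determined.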

Dear Sandra,

Thanks very much for your help! By the way, I might be mistaken, but is the description of the rooted and unrooted trees in the 1st paragraph under the section "Rooted versus unrooted trees" reversed?

best,
sishuo

Sishuo Wang

Jul 28, 2024, 2:39:53 AM
to PAML discussion group
Dear Luiz,

There's a typo that might matter: Actually, you can find similar things in Table S5 in Sandra et al. 2023 where except for test 4 all have a diff of d.f. equal to 1.

I mean Table 5. My apologies.

best,
sishuo

Sandra AC

Jul 28, 2024, 6:31:11 AM
to PAML discussion group
Hi Luiz,

If I have understood your hypothesis correctly, I think that the assumptions you make for your first question are correct. With regards to your second question, if you want to keep the two branches for your two clades leading to the root as separate evolutionary processes, I can only think of adding a distantly related species (or several) as an outgroup to have an unrooted tree and keep it like that -- but that of course would change your tree topology. E.g.:

           |-- your_cladeA (foreground)
       |---|
root --|   |-- your_cladeB  (background)
       |
       |- outg

To make it easier, the Newick tree with three taxa would be the following:
  • rooted: ((A #1,B), O); 
  • unrooted: (A #1,B,O);
I cannot think of another way you could unroot a tree with two branches leading to the root while keeping such branches under different evolutionary pressures in a branch or branch-site model analysis, but I may be wrong. Perhaps others have additional suggestions :) 
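The difference between the two Newick strings above can be checked programmatically by counting the subtrees attached at the root: two for the rooted form, three for the unrooted one. Below is a minimal sketch, assuming a well-formed Newick string with no commas inside labels; `root_degree` is a hypothetical helper, not a PAML function.

```python
def root_degree(newick: str) -> int:
    """Count the subtrees joined at the outermost parentheses of a Newick tree."""
    depth = 0
    children = 1
    for ch in newick:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            # A comma at depth 1 separates two children of the root.
            children += 1
    return children


rooted = "((A #1,B), O);"
unrooted = "(A #1,B,O);"
print(root_degree(rooted), root_degree(unrooted))
```

A result of 2 indicates a bifurcating (rooted) root and 3 a trifurcating (unrooted) one.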

Hope this helps!
S.

Sandra AC

Jul 28, 2024, 7:28:19 AM
to PAML discussion group
P.S.: Sishuo, I had not seen your first message, only the second, so I forgot to reply to your question! Yes, indeed that must have been a typo! The sentence should read "In PAML, unrooted trees are represented using a trifurcation at the root while rooted trees are binary at the root".

Thanks for pointing this out! :)

Luiz Filipe

Jul 29, 2024, 12:52:00 PM
to PAML discussion group

Hi Sandra and Sishuo,

I sincerely appreciate your explanations and time for my questions. I was able to understand more about PAML.

Using the scheme made by Sandra, basically my topology is this, in which I assume that clades A and B are under different evolutionary processes. From this I decided to use a rooted tree without an outgroup for both the branch model analysis and the M0 model, with df = 1.

           |-- cladeA (foreground)
   root ---|
           |-- cladeB  (background)
    

Luiz Filipe

Jul 29, 2024, 1:27:10 PM
to PAML discussion group
However, what I understood from Sishuo's answer was that even when using a rooted tree for model M0, I must consider for the LRT calculation that model M0 will not have this 1 extra parameter, but rather the same number of parameters as if the tree were unrooted. Therefore, even if I used a rooted tree in both the branch model analysis and model M0, the value of df would be df=2. Is that right?

Luiz Filipe

Jul 29, 2024, 1:36:33 PM
to PAML discussion group
Sorry, it wouldn't be for the LRT calculation, but for the degrees of freedom used to obtain the critical values.

Sishuo Wang

Jul 30, 2024, 12:39:39 AM
to PAML discussion group
Dear Luiz,

Thanks for the follow-up question!

If your case is like Fig. S1D in http://abacus.gene.ucl.ac.uk/ziheng/pdf/2023Alvarez-Carretero-codeml-SI.pdf then that's basically what I meant. However, if your case is not like this, then using a rooted tree and using an unrooted tree are different, and you'd use a rooted tree under M0.

For your tree as below, I think it's the situation of Fig. S1D. My tentative view is to follow the suggestion from Fig. S1D or Table 5 test 4 (the bird clade vs. the rest) and note that the 2 models differ by 2 parameters. Again, in this case under M0 I guess using a rooted vs. an unrooted tree hopefully returns the same lnL, but the rooted tree in this case has a problem of model non-identifiability, and thus occasionally numerical problems in calculating the lnL.

           |-- cladeA (foreground)
   root ---|
           |-- cladeB  (background)

Any Qs pls feel free to discuss here.

cheers,
sishuo

Luiz Filipe

Jul 30, 2024, 10:16:00 AM
to PAML discussion group
Hi Sishuo,

Thank you again for your comments; they have been valuable and I have been able to understand the concepts better. I followed your suggestion and the example of test 4 in Table 5. I did some tests and, as you suspected, the lnL value for model M0 did not change. But now the value of df is equal to 2, due to the difference in the number of parameters. Thank you once again for clarifying my doubts; your help and Sandra's were fundamental.
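For readers following along, the LRT with df = 2 can be sketched as below. The lnL values are hypothetical placeholders, not results from this thread, and the chi-square tail probability uses the closed forms for 1 and 2 degrees of freedom so that only the Python standard library is needed.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood ratio statistic: twice the log-likelihood difference."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


# Hypothetical log-likelihoods, for illustration only.
lnl_m0 = -1000.0      # homogeneous model (M0)
lnl_branch = -995.0   # branch model with one foreground clade

stat = lrt_statistic(lnl_m0, lnl_branch)
p_df2 = chi2_sf(stat, df=2)
p_df1 = chi2_sf(stat, df=1)
print(stat, p_df2, p_df1)
```

With the same statistic, df = 2 gives a larger p-value than df = 1, which is why the choice of degrees of freedom matters for the critical value.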

All the best,
Luiz

Sishuo Wang

Aug 2, 2024, 2:46:53 AM
to PAML discussion group
Dear Luiz,

Thanks much for confirming this and for initiating this interesting discussion!

best,
sishuo
