Dear PAML community,
First of all, I want to thank everyone for taking the time to respond to messages on this forum. Thanks to this community, I have been able to resolve many questions about using PAML, as others have often had the same concerns I do.
Today, I would like to ask about the use of rooted versus unrooted trees when identifying signals of selection using codeml. I understand this topic has been discussed extensively, but I haven’t found an answer to my specific question.
I am analyzing positive selection signals separately in two sister species that diverged approximately 3 million years ago, as I hypothesize that they have been subjected to different selective pressures. I have also analyzed the "ancestral branch" of these two species to identify genes under selection inherited from their common ancestor. For the ancestral branch analysis, I used a rooted tree, assuming that the entire branch was evolving differently from the rest of the tree.
When analyzing the two sister species separately, I used an unrooted tree, as they belong to the same branch in my topology, which then diverges into the two species. Based on my understanding of the manual and documentation, an unrooted tree should be used in this scenario.
Is this approach correct? Specifically, in relation to Figure S1C from Álvarez-Carretero et al. (2023) Supplementary Material, if I wanted to identify signals of selection only in species B (branch b2), should I use an unrooted tree?
Thank you in advance for your help!
```
((A,B #1) #1,O);
((A #1,B) #1,O);
(A #1,B,O);
(A,B #1,O);
```
If that's the case, then I believe you made the correct assumption :) You can use an unrooted tree because you are not constraining the evolutionary pressure on the "root branch" (see above in blue); you are only constraining the lineage from "A" (or "B") to the MRCA of "A" and "B" to be the foreground branch.
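If it helps to double-check which form a tree file is in before running CODEML, here is a minimal Python sketch. It is not part of PAML, and the function names `root_degree` and `is_rooted` are made up for illustration. It assumes a simple Newick string like the ones in this thread (no quoted names or comments) and relies only on the usual convention that a rooted tree has two groups at the outermost level while an unrooted tree has three.

```python
def root_degree(newick: str) -> int:
    """Count the groups at the outermost level of a Newick string.

    Hypothetical helper, not a PAML function. Assumes plain labels
    (no quoted names, no bracketed comments).
    """
    depth = 0
    top_level_commas = 0
    for ch in newick:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses: extra ')'")
        elif ch == "," and depth == 1:
            # A comma directly inside the outermost parentheses
            # separates two children of the outermost node.
            top_level_commas += 1
    if depth != 0:
        raise ValueError("unbalanced parentheses: missing ')'")
    return top_level_commas + 1


def is_rooted(newick: str) -> bool:
    """True when the outermost node is a bifurcation (two children)."""
    return root_degree(newick) == 2


# Rooted tree with the ancestral branch and species B labelled.
print(is_rooted("((A,B #1) #1,O);"))  # True
# Unrooted tree (trifurcation) with only species B labelled.
print(is_rooted("(A,B #1,O);"))       # False
```

The check also raises an error when the parentheses do not balance, which is an easy mistake to make when editing branch labels by hand.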
I hope that the illustrations above and the relevant explanations make sense, but please let us know if something remains unclear!
All the best,
Sandy
P.S.: The lineages highlighted in bold in my illustrations above would be those labelled as "foreground" branches by CODEML, while the rest of the branches would be treated as "background" branches.