How does tree topology affect codeml models?

nat

Apr 3, 2013, 12:37:31 AM

to pamlso...@googlegroups.com

Hi,
I have run codeml Branch and BranchSite models using a 7-species tree that was a simplification of a bigger published phylogeny. In these models, I defined two groups of species and systematically compared these two groups. Recently, I ran some MrBayes analyses to get a tree for the 7 species using many more characters than the initial phylogeny I used had considered. The new tree differs in topology for one species:
Now (rooted version):
(((dis, (cra, tet)), afr #1), (pan #1, ter #1), sub #1);
Before (unrooted version):
(((dis, (cra, tet)), afr #1), (pan #1, sub #1), ter #1);

I am wondering how much this difference in topologies might affect my conclusions. It will take a long time to rerun codeml analyses (many thousands of genes) so I would like first to make sure this is meaningful. In my understanding, the difference might not have affected my results, because:
1) the grouping is still the same, and so is the tagging of branches
2) I have never looked at lineage-specific patterns, but only differences between the two groups.
Am I correct? Is there some other factor that I should consider?
Many thanks in advance for your advice,
Natassa
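
To make the difference between the two topologies concrete, here is a minimal Python sketch (not part of the original post) that reads both trees exactly as written above, strips the codeml `#1` labels, and compares their non-trivial splits. The helper names (`parse_newick`, `splits`, `foreground`) are made up for illustration, and the parser only handles simple trees like these, without branch lengths or quoted names.

```python
import re

# The two trees exactly as posted, including the codeml '#1' branch labels.
NEW_TREE = "(((dis, (cra, tet)), afr #1), (pan #1, ter #1), sub #1);"
OLD_TREE = "(((dis, (cra, tet)), afr #1), (pan #1, sub #1), ter #1);"


def foreground(newick):
    """Names of the tips that carry the codeml branch label '#1'."""
    return set(re.findall(r"(\w+)\s*#1", newick))


def parse_newick(newick):
    """Parse a simple Newick string into nested tuples of tip names."""
    s = re.sub(r"#\d+", "", newick)          # drop codeml branch labels
    s = re.sub(r"\s+", "", s).rstrip(";")    # drop whitespace and final ';'
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                          # consume '('
            children = [node()]
            while s[pos] == ",":
                pos += 1                      # consume ','
                children.append(node())
            pos += 1                          # consume ')'
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]                   # a tip name

    return node()


def leaves(tree):
    """All tip names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of the tree, treated as unrooted."""
    all_taxa = leaves(tree)
    found = set()

    def visit(n):
        if isinstance(n, str):
            return
        clade = leaves(n)
        other = all_taxa - clade
        if len(clade) > 1 and len(other) > 1:
            # Store one canonical side so each bipartition is counted once.
            found.add(min(clade, other, key=lambda side: (len(side), sorted(side))))
        for child in n:
            visit(child)

    visit(tree)
    return found


new_splits = splits(parse_newick(NEW_TREE))
old_splits = splits(parse_newick(OLD_TREE))
only_new = new_splits - old_splits
only_old = old_splits - new_splits
rf_distance = len(only_new) + len(only_old)   # Robinson-Foulds distance
same_foreground = foreground(NEW_TREE) == foreground(OLD_TREE)

print("splits only in new tree:", [sorted(s) for s in only_new])
print("splits only in old tree:", [sorted(s) for s in only_old])
print("RF distance:", rf_distance)
print("same set of #1 tips:", same_foreground)
```

The sketch only describes the topological difference: the two trees share every split except the one pairing pan with ter (new) or pan with sub (old), and the same four tips carry `#1` in both. It says nothing about how much the codeml estimates change.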

Feifei Zhang

Apr 9, 2013, 10:21:33 AM

to pamlso...@googlegroups.com

Hi,

Interesting question. First, before you rerun PAML, decide which tree you want to use: more species with fewer characters, or fewer species with more characters. Sometimes the former is more reasonable.

Second, PAML recommends doing several runs to avoid getting stuck at a local optimum. Thus, even if changing the topology does not affect your results too much, you may still need to do multiple runs with such a big dataset.

Good luck,
Feifei
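
One way to set up the repeated runs mentioned above is to generate several codeml control files that differ only in the starting value of omega. The Python sketch below (not from the original post) does this for a branch model (`model = 2`, `NSsites = 0`). The file names, the three starting values, and the remaining settings (`CodonFreq = 2`, `icode = 0`, `cleandata = 0`) are assumptions for illustration and should be replaced with the settings of the real analysis.

```python
# Template for a codeml control file; '*' starts a comment in PAML control files.
CTL_TEMPLATE = """\
      seqfile = {seqfile}
     treefile = {treefile}
      outfile = {outfile}
        noisy = 0
      verbose = 0
      runmode = 0   * user-supplied tree
      seqtype = 1   * codon sequences
    CodonFreq = 2   * F3X4 (assumed here)
        model = 2   * separate omega for labelled branches
      NSsites = 0   * one omega across sites
        icode = 0   * universal code (assumed here)
    fix_kappa = 0
        kappa = 2
    fix_omega = 0
        omega = {omega}   * initial value (estimated, since fix_omega = 0)
    cleandata = 0
"""


def control_files(gene, treefile, start_omegas=(0.1, 0.5, 1.5)):
    """Return {control file name: text}, one entry per starting omega.

    `gene` is a hypothetical gene ID; the alignment is assumed to be
    in '<gene>.phy' in the directory where codeml is run.
    """
    files = {}
    for run, omega in enumerate(start_omegas, start=1):
        name = f"{gene}.run{run}.ctl"
        files[name] = CTL_TEMPLATE.format(
            seqfile=f"{gene}.phy",
            treefile=treefile,
            outfile=f"{gene}.run{run}.out",
            omega=omega,
        )
    return files


# Hypothetical gene ID and tree file name, for illustration only.
files = control_files("gene00001", "new_tree.nwk")
for name in files:
    print(name)
```

Each returned text can be written to disk and passed to codeml. If the runs for one gene end at different log-likelihoods, the usual practice is to keep the one with the highest value.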

Ziheng

Apr 18, 2013, 4:51:29 PM

to pamlso...@googlegroups.com

The two trees are different, so the results should be different. Whether they are similar will depend on many things, such as how closely related the species concerned are.

I think you should run some of the analysis to get an idea about the impact of the tree. Surely you can do 10 or 100 genes.

Ziheng
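
A pilot subset like the one suggested above can be drawn reproducibly with a fixed random seed, so that the same genes are rerun under both trees. This is a minimal Python sketch (not from the original post). The gene IDs are hypothetical placeholders, and the subset size of 100 follows the suggestion in the message.

```python
import random


def pick_pilot_genes(gene_ids, n=100, seed=1):
    """Draw a reproducible random subset of gene IDs for a pilot rerun."""
    rng = random.Random(seed)        # fixed seed: same subset on every call
    ids = sorted(gene_ids)           # sort so the input order does not matter
    return rng.sample(ids, min(n, len(ids)))


# Hypothetical gene IDs standing in for the real list of several thousand genes.
all_genes = [f"gene{i:05d}" for i in range(1, 5001)]
pilot = pick_pilot_genes(all_genes, n=100)
print(len(pilot), "genes selected for the pilot rerun")
```

Running codeml on the same pilot genes with the old and the new tree gives a direct, per-gene comparison of the estimates before committing to the full rerun.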

nat

Apr 18, 2013, 6:02:08 PM

to pamlso...@googlegroups.com

Thank you for your replies,
I have actually rerun the analyses on the concatenated dataset of all genes (for speed, and because I did not observe any differences from the individual-gene dataset). The baseml analysis (on 4-fold sites) gave identical results, while in the codeml branch models the only change was in the difference in omega between the two groups of species compared, which is now smaller. I would still like to know how, in theory, a different topology might affect the result, i.e. to get some more theoretical background on that. I would think that dS values might be inflated when using the previous tree, but how would this affect dN/dS in the two-ratio model (which is the one I mostly focused on)? Note that the species whose placement changed in my new tree are not close: in the exhaustive genus phylogeny they belong to distinct clades.
Regards,
Natassa

Ziheng

May 25, 2013, 6:51:29 AM

to pamlso...@googlegroups.com

The codeml analysis assumes that the tree you are using represents the true evolutionary relationships of the sequences, so that should be your aim when you choose the tree to use. Sometimes the species tree is more appropriate, sometimes the gene tree. When you don't have that information, the question becomes one of the robustness of the codeml analysis to errors in the assumed tree topology. As far as I know, we don't have any theoretical results on this. Intuitively one would think that the effect should be minor if the tree you use is not "too wrong".

If you are very serious about this issue, you can easily run some simulations to examine the impact. You can choose parameter values so that your simulation is representative of your real data. The version of evolver for simulating under the branch model is in the folder Technical\Simulation\Codon.