A bootstrap tree with a higher likelihood for the original alignment than the ML tree for that alignment

Joran Martijn

Nov 7, 2017, 6:30:55 AM
to IQ-TREE
Hi,

I'm currently running some topology tests, and in the treelist, I'm including all bootstrap trees (with branch lengths). The treelist also includes the ML tree.

Strangely, at least 1 bootstrap tree seems to have a higher likelihood for the original alignment than the ML tree.

Have you ever observed something like that before?

ML search (unconstrained):
iqtree-omp -s alignment.aln -m LG+C60+F+G -ft guidetree.treefile -nt 20 -pre noConstraint.PMSF -b 100 -wbtl -redo
I also take the bootstrap trees from this search to include in the AU test (see below)

ML search (constraints for the topologies I wish to test)
iqtree-omp -s alignment.aln -m LG+C60+F+G -ft guidetree.treefile -g topology1.constraint -nt 20 -pre topology1.PMSF -quiet -redo
iqtree-omp -s alignment.aln -m LG+C60+F+G -ft guidetree.treefile -g topology2.constraint -nt 20 -pre topology2.PMSF -quiet -redo
etc....

Topology test:
iqtree-omp -s alignment.aln -m LG+C60+F+G -ft guidetree.treefile -nt 20 -pre TopologyTest -z forTopologyTest.treelist -n 1 -zb 10000 -zw -au -fixbr -wsl -quiet -redo
So here the forTopologyTest.treelist includes the ML tree, the constrained ML trees and the unconstrained ML bootstrap trees.
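
One way to see which trees in the list score higher than the ML tree is to sum the per-site log-likelihoods written by -wsl for each tree. The following is a minimal Python sketch, not part of IQ-TREE. It assumes the table has a header line with the number of trees and the number of sites, followed by one line per tree holding a name and the site log-likelihoods, and that the ML tree is the first tree in the list. The function names and the toy values are made up for illustration; check the layout against your own .sitelh file before relying on it.

```python
def tree_loglikelihoods(sitelh_text):
    """Sum per-site log-likelihoods for each tree in a .sitelh-style table.

    Assumed layout (verify against your own file):
        <n_trees> <n_sites>
        <tree name> <lnL site 1> <lnL site 2> ...
    """
    lines = [ln for ln in sitelh_text.splitlines() if ln.strip()]
    n_trees, n_sites = (int(x) for x in lines[0].split())
    totals = {}
    for line in lines[1:]:
        name, *values = line.split()
        if len(values) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} values, got {len(values)}")
        totals[name] = sum(float(v) for v in values)
    if len(totals) != n_trees:
        raise ValueError(f"expected {n_trees} trees, got {len(totals)}")
    return totals


def better_than_first(totals):
    """Names of trees whose total log-likelihood exceeds that of the first tree."""
    names = list(totals)
    reference = totals[names[0]]
    return [name for name in names[1:] if totals[name] > reference]


# Hypothetical toy input: 3 trees, 4 sites (all values are made up).
example = """3 4
Tree1 -2.0 -3.0 -1.5 -2.5
Tree2 -2.1 -3.2 -1.6 -2.6
Tree3 -1.9 -2.9 -1.5 -2.4
"""

totals = tree_loglikelihoods(example)
for name, lnl in totals.items():
    print(name, round(lnl, 3))
print("higher than first tree:", better_than_first(totals))
```

Any tree reported by better_than_first is a candidate for the phenomenon described above and is worth re-evaluating on its own.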

I have done this test for many alignments (PMSF + non-parametric bootstraps and LG+C60+F+G + ultrafast bootstraps), but this phenomenon only occurs in one particular alignment.

Could it be that the unconstrained ML search was not able to find the "true" ML tree that the particular bootstrap tree represents?
I'm currently running another 10 independent ML searches with -pers 0.2 and -numstops 500 to see if it finds another unconstrained ML tree compared to my original search.

Cheers,

Joran

Bui Quang Minh

Nov 8, 2017, 4:39:33 AM
to iqt...@googlegroups.com, Edward Susko, Huaichun Wang, Andrew Roger
Dear Joran (CC Huaichun, Ed, Andrew),

On Nov 7, 2017, at 12:30 PM, Joran Martijn <joran...@gmail.com> wrote:

> Hi,
>
> I'm currently running some topology tests, and in the treelist, I'm including all bootstrap trees (with branch lengths). The treelist also includes the ML tree.
>
> Strangely, at least 1 bootstrap tree seems to have a higher likelihood for the original alignment than the ML tree.
>
> Have you ever observed something like that before?

Indeed, I have never observed this. But I think it is not totally unexpected: it can happen when the original tree search gets stuck in a (bad) local optimum. That really depends on the alignment, though; it typically happens with many sequences and few sites (i.e. low phylogenetic signal).


> ML search (unconstrained):
> iqtree-omp -s alignment.aln -m LG+C60+F+G -ft guidetree.treefile -nt 20 -pre noConstraint.PMSF -b 100 -wbtl -redo
> I also take the bootstrap trees from this search to include in the AU test (see below)
>
> ML search (constraints for the topologies I wish to test)
> iqtree-omp -s alignment.aln -m LG+C60+F+G -ft guidetree.treefile -g topology1.constraint -nt 20 -pre topology1.PMSF -quiet -redo
> iqtree-omp -s alignment.aln -m LG+C60+F+G -ft guidetree.treefile -g topology2.constraint -nt 20 -pre topology2.PMSF -quiet -redo
> etc....

All commands look good


> Topology test:
> iqtree-omp -s alignment.aln -m LG+C60+F+G -ft guidetree.treefile -nt 20 -pre TopologyTest -z forTopologyTest.treelist -n 1 -zb 10000 -zw -au -fixbr -wsl -quiet -redo
> So here the forTopologyTest.treelist includes the ML tree, the constrained ML trees and the unconstrained ML bootstrap trees.

Now I see more clearly. There are three things I want to discuss here:

- I have never used tree testing with the posterior mean site frequency (PMSF) model (-ft coupled with the -z option). The main reason is that I’m not 100% sure whether the site log-likelihoods computed under the PMSF model can be used for further tree testing. I guess it’s OK as long as you use the same guide tree throughout the analysis, but my co-authors can confirm this.
- There is a discrepancy between the AU test implementations in IQ-TREE and CONSEL (technically: IQ-TREE uses least-squares p-value estimates whereas CONSEL uses ML p-value estimates). Thus, for now please run CONSEL, as you have already printed the site log-likelihoods via -wsl. Note that the results of the other tests (SH etc.) are comparable.
- I would also do the tree test using the mixture model LG+C60+F+G, just to see how the results compare with PMSF.


> I have done this test for many alignments (PMSF + non-parametric bootstraps and LG+C60+F+G + ultrafast bootstraps), but this phenomenon only occurs in one particular alignment.
>
> Could it be that the unconstrained ML search was not able to find the "true" ML tree that the particular bootstrap tree represents?

Yes, quite likely. I would repeat a few more runs (as you wrote below) and see whether the likelihood improves. Thanks for bringing this problem to our attention.

Cheers, Minh

> I'm currently running another 10 independent ML searches with -pers 0.2 and -numstops 500 to see if it finds another unconstrained ML tree compared to my original search.
>
> Cheers,
>
> Joran

--
Bui Quang Minh
Center for Integrative Bioinformatics Vienna (CIBIV)
Campus Vienna Biocenter 5, VBC5, Ebene 1
A-1030 Vienna, Austria
Phone: ++43 1 4277 74326
Email: minh.bui (AT) univie.ac.at

Tung Nguyen

Nov 8, 2017, 4:02:03 PM
to iqt...@googlegroups.com, Andrew Roger, Edward Susko, Huaichun Wang
Hi Joran, 

It has been a while since I last used IQ-TREE, so I am not sure whether the following behaviour still holds (Minh can correct me if I am wrong).

When IQ-TREE is fed a list of trees for a topology test, it uses the very first tree in the file to estimate the model parameters and then uses these estimates to compute the likelihoods of the remaining trees. Thus, these estimates are optimized only for the first topology, not necessarily for the remaining ones. For simple models this usually does not pose a problem. However, for parameter-rich models I can imagine that the reported likelihoods of the other trees might not be their maxima.

Thus, you can try to compute the maximum likelihood of the 2 trees in 2 independent runs to see if the result is still what you observed. In this setting, each tree will get its own set of optimized parameters and we would get the true maximum likelihood.

Cheers
Tung

Joran Martijn

Nov 9, 2017, 10:10:06 AM
to IQ-TREE
> Indeed, I have never observed this. But I think it is not totally unexpected: it can happen when the original tree search gets stuck in a (bad) local optimum. That really depends on the alignment, though; it typically happens with many sequences and few sites (i.e. low phylogenetic signal).

In this case the alignment is 8402 sites long, with 83 sequences. So I wouldn't say it has low phylogenetic signal.

> - I have never used tree testing with the posterior mean site frequency (PMSF) model (-ft coupled with the -z option). The main reason is that I’m not 100% sure whether the site log-likelihoods computed under the PMSF model can be used for further tree testing. I guess it’s OK as long as you use the same guide tree throughout the analysis, but my co-authors can confirm this.
Yes, the same guide tree was used everywhere, but hmm, you raise an interesting point here; I did not think about this before.

> - There is a discrepancy between the AU test implementations in IQ-TREE and CONSEL (technically: IQ-TREE uses least-squares p-value estimates whereas CONSEL uses ML p-value estimates). Thus, for now please run CONSEL, as you have already printed the site log-likelihoods via -wsl. Note that the results of the other tests (SH etc.) are comparable.
I was not aware of the technical difference between IQ-TREE and CONSEL! Could you update the online documentation?

> - I would also do the tree test using the mixture model LG+C60+F+G, just to see how the results compare with PMSF.
I have done this as well, and here the ML tree has by far the highest AU-test p-value, so it seems to work as expected.

> Yes, quite likely. I would repeat a few more runs (as you wrote below) and see whether the likelihood improves. Thanks for bringing this problem to our attention.
I can give a quick update on this. I ran 10 independent runs with -pers 0.2 and -numstops 500. All ML trees are identical to my original ML tree (obtained with the default search parameters). So this seems to suggest the original ML search was not stuck in a local optimum.

It's a peculiar problem!!

Joran Martijn

Nov 9, 2017, 10:17:17 AM
to IQ-TREE
This is an interesting perspective! I will try this if Minh confirms IQ-TREE still behaves like this.

Bui Quang Minh

Nov 9, 2017, 11:07:48 AM
to iqt...@googlegroups.com, Joran Martijn
Hi Joran,

Yes, Tung has a good point. It is mostly true: the model parameters are indeed fixed for all trees in the list, but they are not estimated from the first tree; rather, they are estimated from the "rough" tree of the iqtree search (as you specified option -n 1 to perform one iqtree iteration).

However, one thing I now notice is that you used the -fixbr option to fix the branch lengths of all the trees tested. I think this is a more important issue, which makes the likelihoods worse than expected. So please remove this option, so that iqtree estimates branch lengths properly (the default behavior). Let us know what the results look like.

Cheers, Minh

Joran Martijn

Nov 10, 2017, 10:33:05 AM
to IQ-TREE
Aha I see. I used the -fixbr mostly to speed up the analysis, I did not realize that it would affect the topology test.

My plan now is to rerun the topology tests without the -fixbr flag, and to compute p-values with CONSEL. I then want to compare the CONSEL p-values with those of IQ-TREE (with and without -fixbr). It will take some time, but I'll keep you in the loop.
Let's see if that also "solves" the high-likelihood bootstrap tree issue.

Cheers, Joran

Bui Quang Minh

Nov 10, 2017, 11:17:47 AM
to iqt...@googlegroups.com, Joran Martijn
Hi Joran, 
I have documented this technical difference from CONSEL in the command reference and the advanced tutorial. So it’s clear now.

Cheers, Minh

Joran Martijn

Nov 13, 2017, 9:27:17 AM
to IQ-TREE
Great!

I have now run a new topology test without the -fixbr flag, and also ran the CONSEL AU test with the generated .sitelh file.

With -fixbr vs without -fixbr:
Without -fixbr, the obtained p-values are generally higher. I suppose this makes sense because the likelihoods rose due to branch length optimization.
Also, the bootstrap trees that I included in the test are now generally accepted (without -fixbr), whereas they were generally rejected when using -fixbr.

Without -fixbr, IQ-TREE AU test vs CONSEL AU test:
The p-values from the CONSEL test are generally lower than those from the IQ-TREE AU test. For some topologies the difference was extreme (0.0315 vs 10e-50), but the differences are usually smaller.
Here, too, the bootstrap trees that I included are generally accepted.

So it seems to behave just as expected :)

I did this on a different dataset than the one that had the high likelihood bootstrap tree though. Hope to report more on that dataset soon.

Joran

Bui Quang Minh

Nov 15, 2017, 4:39:18 PM
to IQ-TREE, Joran Martijn
Hi Joran,

So please do not use the -fixbr option next time! That’s a huge difference. I think the explanation is straightforward: the branch lengths of the bootstrap trees were estimated on the bootstrap alignments. Thus they are not optimal branch lengths for the original alignment, and fixing them will reduce the log-likelihood substantially, leading to many (if not most) bootstrap trees being rejected.
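
A toy illustration of this effect (not IQ-TREE code): take two sequences under the Jukes-Cantor model, where the ML branch length has a closed form. The sketch below estimates the branch length on a hypothetical bootstrap replicate and then evaluates it on the original data, which can only give a log-likelihood at or below the re-optimized one. The site counts and function names are made up for illustration.

```python
import math


def jc69_pair_loglik(d, n_sites, n_diff):
    """Log-likelihood of two aligned sequences under JC69 at distance d."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.0625 + 0.1875 * e  # one site with the same state in both sequences
    p_diff = 0.0625 - 0.0625 * e  # one site with one specific pair of different states
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)


def jc69_ml_distance(n_sites, n_diff):
    """Closed-form ML distance under JC69 (requires 0 < n_diff / n_sites < 0.75)."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


n_sites = 1000
diff_original = 100   # made-up number of differing sites in the original alignment
diff_bootstrap = 130  # made-up number of differing sites in one bootstrap replicate

d_original = jc69_ml_distance(n_sites, diff_original)
d_bootstrap = jc69_ml_distance(n_sites, diff_bootstrap)

# Evaluate both branch lengths on the ORIGINAL data.
lnl_reoptimized = jc69_pair_loglik(d_original, n_sites, diff_original)
lnl_fixed = jc69_pair_loglik(d_bootstrap, n_sites, diff_original)

print(f"re-optimized branch length: {d_original:.4f}, lnL = {lnl_reoptimized:.3f}")
print(f"fixed bootstrap length:     {d_bootstrap:.4f}, lnL = {lnl_fixed:.3f}")
print(f"log-likelihood lost by fixing: {lnl_reoptimized - lnl_fixed:.3f}")
```

On a real tree the same loss accumulates over every branch, which is consistent with the bootstrap trees being rejected under -fixbr.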

Regarding the IQ-TREE AU p-value, I will change to the CONSEL approach in version 1.6.X.

Cheers, Minh

Robyn Lee

Aug 7, 2018, 12:13:49 PM
to IQ-TREE
Hi Minh,

I just came across this thread and wanted to confirm - did you have a chance to update the approach for the IQ-TREE AU p-value calculation to that used by CONSEL? 

Thank you in advance for letting me know!

Best,
Robyn

Minh Bui

Aug 8, 2018, 7:55:58 AM
to iqt...@googlegroups.com, Robyn Lee
Hi Robyn,

Unfortunately not. I contacted the AU test’s author (Shimodaira) but did not get a reply from him. If you or others want to know more details, here is the reason (part of my email to him):

“… When looking at your AU paper, it becomes clear to us that this [discrepancy] happens because the quantile function Phi^{-1} is not defined for zero values. In some extreme cases, many BP values are zero and the AU test is not applicable.

So the question is: How does CONSEL deal with such cases?”
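
The zero-BP problem can be reproduced with the Python standard library alone. This is only a sketch of the numerical issue, not of how IQ-TREE or CONSEL handle it: the standard normal quantile function has no finite value at 0, so a bootstrap proportion of exactly zero cannot be turned into a z-value. The helper name probit and the BP values are made up for illustration.

```python
from statistics import NormalDist, StatisticsError


def probit(bp):
    """Return Phi^{-1}(bp), or None where it has no finite value (bp <= 0 or bp >= 1)."""
    try:
        return NormalDist().inv_cdf(bp)
    except StatisticsError:
        return None


# Made-up bootstrap proportions (BP) of one tree.
bootstrap_proportions = [0.20, 0.05, 0.001, 0.0]
for bp in bootstrap_proportions:
    print(bp, probit(bp))
```

Any procedure that relies on Phi^{-1}(BP) has to decide what to do with the None entries, for example skipping them or reporting that the test is not applicable; which of these CONSEL does is exactly the open question above.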


So I don’t know yet what to do. I tried to look at the source code of CONSEL and it’s almost impossible to “decode”. 

Cheers
Minh

Robyn Lee

Aug 8, 2018, 2:52:05 PM
to IQ-TREE
Hi Minh,

OK, thank you very much for letting me know and for sharing the details.

Have a nice day.

Best,
Robyn