Dear All,
What should one do if the topologies of the best ML tree (bipartitionsBranchLabels) and the MRE consensus tree differ?
Are rapid-bootstrap values and those on the MRE tree comparable (e.g., in terms of significance, i.e., >75%)?
Best,
Giorgio
First of all, please accept, Alexis, Grim and Romina, my most sincere apologies for not following my own post. The reason is that I completely forgot about it; it looks as if it is about one year old (gosh!). I had forgotten it to the point that I am currently discussing the very same issue (on less naïve grounds compared to my first question) with colleagues in Lyon and Vienna.
The point is still what to choose between the best-ML tree and the consensus tree, but also how to construct the consensus tree; and I am not convinced that using the bootstrapped trees is really the best option.
I have been comparing the best-ML tree and the corresponding consensus tree with different distance metrics coupled to the usual statistics (AU, ELW, etc.). In my test data set (a difficult one, on purpose) the differences are relevant (frightening?).
If interested to pursue this issue, we could open a new forum-topic.
Thanks for your time and suggestions.
Sorry again.
Best,
Giorgio
Dear all,
I am very grateful for this pleasant discussion. Grim, thanks for such a detailed explanation and for sharing very nice papers, especially the one about the phangorn library in R. It is a very good example to support the idea (a personal one, at least!) of how important an alternative topology such as a consensus network can be.
“(although it is not very common to do it; I did for over 10 years, and it brought me about 75% misery and 25% encouraging comments from the reviewers' side)”
- I hope this scenario will change as genomic data sets become so popular, given the insights these kinds of data can recover. Personal thoughts from someone who is revising their own manuscript using these kinds of approaches.
- I managed to construct a SuperNetwork using a filtered data set (‘only’ 335 loci). As Grim mentioned, it can be impossible to try to calculate one with so many leaves (in my case n = 113, and thousands of loci).
- Giorgio, one very nice approach, which I am also using to complement the SuperNetwork overview (the one I have shows a reticulated scenario of a rapid radiation event!), is the internode certainty, IC (using RAxML; see Manual X. Computing TC and IC values, pp. 49-53). As Grim mentioned, “If you are interested in how strong the support for (potentially competing) (taxon) bipartitions is, and why some branches have high and others low bootstrap support”, you can then calculate the IC score by computing the frequency of each bipartition in a “reference tree” (let’s say the best-ML tree). The method is very well explained here: https://academic.oup.com/mbe/article-lookup/doi/10.1093/molbev/msw040. I think it is a very helpful output!
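For intuition about what the IC score measures, here is a minimal stdlib-Python sketch of the two-way internode certainty of Salichos & Rokas (a toy illustration, not RAxML's implementation; the function name, and the reduction to a single most-prevalent conflicting bipartition, are my simplifications):

```python
import math

def internode_certainty(f_ref, f_conflict):
    """Two-way internode certainty for one branch: f_ref is the frequency
    of the reference bipartition in the tree set (e.g. bootstrap trees),
    f_conflict the frequency of its most prevalent conflicting bipartition.
    The two frequencies are renormalised to sum to 1, and IC = 1 minus the
    Shannon entropy (in bits) of that two-state distribution."""
    p1 = f_ref / (f_ref + f_conflict)
    p2 = 1.0 - p1
    entropy = -sum(p * math.log2(p) for p in (p1, p2) if p > 0)
    return 1.0 - entropy

# A split seen in 90% of trees against a 10% rival: modest conflict, IC ~0.53
print(round(internode_certainty(0.9, 0.1), 3))
# Two equally frequent rival splits: maximal conflict, IC = 0
print(internode_certainty(0.5, 0.5))
```

An IC near 1 means the reference bipartition has essentially no competition in the tree set; an IC near 0 means a conflicting bipartition is about equally frequent, which is exactly the situation a plain bootstrap percentage can hide.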
Thanks a lot for your attention and help,
All the best,
Romina B.

Hi,
> The point is still what to choose between best-ML tree and consensus tree.
>> It depends on what you want:
Grimm: OK, I got the point and fully agree. Yet a related issue is how to construct the consensus tree (see below).
> Why do you want to make a consensus tree at all?
"exploratory data analysis", for example, as you rightly suggest.
> you can look at the bootstrap support network, the consensus network of the bootstrap pseudoreplicates.
Will definitely try this out.
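For readers following along, here is what "consensus of the bootstrap pseudoreplicates" boils down to at the split level; a toy stdlib-Python sketch (the function name and the pre-reduced split encoding are my own; real programs parse the Newick trees and canonicalise the splits for you):

```python
from collections import Counter

def majority_rule_splits(trees, min_freq=0.5):
    """Majority-rule consensus at the bipartition level. Each tree is a
    set of splits; each split is the frozenset of taxa on the side of a
    branch containing a fixed reference taxon, so equivalent splits from
    different trees compare equal. Returns the splits whose frequency
    across the input trees reaches min_freq, with their frequencies."""
    n = len(trees)
    counts = Counter(split for tree in trees for split in tree)
    return {s: c / n for s, c in counts.items() if c / n >= min_freq}

# Three toy "bootstrap" trees on taxa A-D, each reduced to its single
# internal split (encoded as the side containing the reference taxon A):
t1 = {frozenset({"A", "B"})}   # AB | CD
t2 = {frozenset({"A", "B"})}   # AB | CD
t3 = {frozenset({"A", "C"})}   # AC | BD, conflicts with the other two
for split, freq in majority_rule_splits([t1, t2, t3]).items():
    print(sorted(split), round(freq, 2))   # ['A', 'B'] 0.67
```

A consensus network is the natural generalisation: instead of discarding the minority splits, it keeps all splits above some lower threshold and draws the conflicting ones as parallel edges.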
And I recommend that the followers of this post, both beginners and experienced phylogeneticists, read the excellent paper by Grimm and colleagues: http://dx.doi.org/10.1111/2041-210X.12760
> This is what David Morrison called "exploratory data analysis", and it's crucial when you work with non-trivial data.
Absolutely!
> I did for over 10 years,
Well then, if you are willing to help, you are most welcome. The phylogeny I am working on is one of the most difficult ever; guess what? Hox genes!
Warning: it is likely to bring more misery.
> So you have a tree with low supported branches, and data that aspect-wise prefers competing (alternative) topologies? Then it's really not frightening, but to be expected.
I’ve used "frightening" in a somewhat different meaning. Example: the best-ML tree and the corresponding contree (my data set) are "frighteningly" different (normalized RF of about 0.4). The same can be said for equivalent trees according to AU (my data set). The topology differs; the interpretation does too.
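For context on the 0.4 figure: the normalized Robinson-Foulds distance is the symmetric difference of the two trees' non-trivial splits, scaled by its maximum of 2(n-3) for fully resolved unrooted trees on n taxa. A minimal sketch (hypothetical helper; splits encoded as frozensets of the taxa on one canonical side):

```python
def normalized_rf(splits_a, splits_b, n_taxa):
    """Robinson-Foulds distance between two fully resolved unrooted trees,
    each given as the set of its non-trivial splits (frozensets of the taxa
    on one canonical side), scaled by the maximum possible value 2*(n-3):
    0 means identical topologies, 1 means no internal split is shared."""
    return len(splits_a ^ splits_b) / (2 * (n_taxa - 3))

# Two 5-taxon trees sharing one of their two internal splits each:
tree_a = {frozenset("AB"), frozenset("ABC")}
tree_b = {frozenset("AB"), frozenset("ABD")}
print(normalized_rf(tree_a, tree_b, 5))   # 0.5: half of the splits disagree
```

So a normalized RF of 0.4 between the best-ML tree and its contree means roughly 40% of the internal branches are in conflict, which is indeed hard to wave away.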
> The less tree-like the signal in the primary data, the higher are the chances that the optimised ML-tree conflicts with best-supported alternatives in the bootstrap sample.
Sure!
> Giorgio, one very nice approach, I am also using, to complement the SuperNetwork overview (the one I have has a reticulated scenario of a rapid radiation event!) is the internode certainty, IC (using RAxML, see Manual X. Computing TC and IC values, pg. 49-53).
Romina: Yes, but “One potential drawback when applying the IC … is that their values may not be representative when small numbers of characters or gene trees are used. … our measures are likely most informative when applied to large amounts of data (e.g., hundreds of characters or dozens of genes or hundreds of bootstrap replicates).” (Salichos et al. 2014, MBE 31:1261)
And this is exactly my case (single gene, few characters); anyhow, I had already given it a try, but am unsure how to interpret the results (as expected).
*********
I know you guys have been through all this for a few years; yet, given the number of followers of this post, the issue seems to be of interest still.
Many issues are at stake here. I will mention just one of them. We all know that we have to deal with another crucial issue: the starting-tree bias. There are four options I know of for start-tree choice (implemented in different programs): BioNJ, MP, random, best-ML.
As Grimm reminds us: “The bootstrap replicates can be used either to compute consensus trees (contree) of various flavors or to draw confidence values onto a reference tree, e.g., the best-scoring ML tree.”
Suppose we still want to construct a contree. There are two ways of computing it: from the bootstrap trees, or from the best-ML trees of independent runs (independent runs: another crucial issue, by the way). What are we to expect when comparing the two? Intuitively, I prefer the second option (it looks like a better tree-space search procedure to me). What I would do is use all four start-tree types, with X independent runs each. I think I remember seeing a similar suggestion in some old RAxML documentation (but I may be mistaken).
Does this make sense to you?
All the best,
Giorgio
"Well then, if you are willing to help, you are most welcome." -- Thanks, but I would need to be paid in cash and per hour. I'm not on taxpayers' subsidies anymore.