Bipartition (-f b) using gene tree dataset


Theresa Miorin

Feb 28, 2022, 3:49:21 PM
to raxml
Hello,

I am trying to perform a sensitivity analysis on a dataset of ~900 genes. I have gene trees for each of these as well as a species tree, and I would like to draw bipartition information from the gene trees onto the species tree. A paper doing the same analysis said they used the RAxML -f b option for this. When I run this command on my data, I get the error "raxmlHPC: bipartitionList.c:931: calcBipartitions: Assertion `tr->ntips == tr->mxtips' failed."

Looking at the standard output, I saw the following line right after the list of species in the first tree of the gene tree file: "Expecting all remaining trees in collection to have the same taxon set." This tells me that the command requires every tree in the gene tree file to contain all taxa; however, gene trees often lack some taxa. Is there a way to work around this so that I can draw bipartition information from these gene trees onto my species tree?
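Before pruning anything, it helps to know which taxa all gene trees share and which trees are incomplete. The following is a minimal Python sketch, assuming one Newick tree per string and unquoted tip labels without whitespace, parentheses, commas, colons or brackets; `tip_labels` and `coverage_report` are hypothetical helper names, not part of RAxML.

```python
from __future__ import annotations

import re

# Assumption: tip labels are unquoted and contain no whitespace, (),:; or [].
# A tip label is a token directly after "(" or ","; internal node labels
# (e.g. support values) follow ")" and are therefore not matched.
TIP_RE = re.compile(r"[(,]\s*([^(),:;\s\[\]]+)")


def tip_labels(newick: str) -> set[str]:
    """Return the set of tip (leaf) names in a single Newick string."""
    return set(TIP_RE.findall(newick))


def coverage_report(newicks: list[str]) -> tuple[set[str], dict[int, list[str]]]:
    """Return the taxa present in every tree, plus the missing taxa per incomplete tree."""
    taxon_sets = [tip_labels(t) for t in newicks]
    all_taxa = set().union(*taxon_sets)
    shared = set.intersection(*taxon_sets)
    incomplete = {
        i: sorted(all_taxa - taxa)
        for i, taxa in enumerate(taxon_sets)
        if taxa != all_taxa
    }
    return shared, incomplete


if __name__ == "__main__":
    gene_trees = ["((A:0.1,B:0.2):0.05,(C,D));", "((A,B),C);", "(A,(B,D)90);"]
    shared, incomplete = coverage_report(gene_trees)
    print(sorted(shared))  # ['A', 'B']
    print(incomplete)      # {1: ['D'], 2: ['C']}
```

With a one-tree-per-line file, the input list would simply be the file's non-empty lines; the `shared` set is the fully covered taxon set that RAxML's "same taxon set" requirement implies.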

Best,
Theresa

Grimm

Mar 1, 2022, 8:05:05 AM
to raxml
Hi Theresa,

there are several options, depending on how gappy your gene sample is.
  1. The simplest solution is, of course, to prune the bootstrap samples and the species (gold) tree to a taxon set that is fully covered by all 900 genes. The pruned tip set may not be comprehensive enough for what you want to show, but comparing the gene-wise bootstrap support of the branches in such a tree will be informative regarding between-gene conflicts and resolution issues.
  2. If your gene trees have representative tip coverage, why not just run the whole analysis using only the fully covered tip set? This allows you to straightforwardly assess the single-gene support for the reduced species tree's branches. By comparing it (e.g. with a tanglegram) to the tree based on the complete tip set, you can assess how representative such a reduced tip set is. Note that conflicts between less and more broadly sampled phylogenomic trees are often due to signal issues relating either to rogues ("jumping taxa", which can include some that simply violate the principle of dichotomy), which is rather unproblematic, or to poor gene samples for some tips, which is a widely ignored problem. E.g. if a tip is only covered by conservatively (slowly) evolving genes, its position in the deep parts of the species tree may be reasonable, but its placement towards the leaves can be complete nonsense. There is an endless number of examples of misplaced tips in phylogenomic trees because of such gene-sample/signal/resolution issues. So, if a pruned, fully covered tip set gives a different topology than the all-in tip set, it's a red flag. Rogues, on the other hand, just mess up the tree and reduce branch support; dropping them is the best thing to do unless you want to move to reticulation frameworks. The evolutionary placement algorithm is a quick tool to qualify a rogue's placement in a rogue-free species tree.
  3. Last but not least, you can use functions implemented in the (updated) Phangorn library for R: http://dx.doi.org/10.1111/2041-210X.12760. They allow transferring information between trees and networks, but also between trees only (since a tree is just a network without reticulations).
  4. If you do not have too many tips, you can also try the supernetwork approach implemented in SplitsTree to summarise the total bootstrap sample or the gene-wise trees. Below is an example of what it looks like when reading in a tree sample (graphically polished; the "edge" supports are the combined-data RAxML BS support and the minimum and maximum gene-wise BS support of the topological alternatives in the gene trees). You can open RAxML's bootstrap tree sample files directly in SplitsTree.
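Option 1 can be sketched in plain Python. This is not RAxML's `-f b` implementation but a hypothetical stand-in, assuming unquoted Newick labels without whitespace: every tree is restricted (pruned) to the taxa shared by the species tree and all gene trees, and each non-trivial species-tree bipartition gets the percentage of gene trees that contain it. Restricting each clade's leaf set to the shared taxa yields exactly the bipartitions of the pruned tree, so no explicit tree surgery is needed. All function names are illustrative.

```python
from __future__ import annotations

import re
from typing import Union

# A leaf is its label (str); an internal node is a list of child subtrees.
Tree = Union[str, list]
DELIMS = {"(", ")", ",", ";"}


def parse(newick: str) -> Tree:
    """Parse one Newick string, discarding branch lengths and internal labels."""
    # Assumption: unquoted labels without whitespace, so all whitespace is dropped.
    tokens = re.findall(r"[(),;]|[^(),;]+", re.sub(r"\s+", "", newick))
    pos = 0

    def node() -> Tree:
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            children = [node()]
            while tokens[pos] == ",":
                pos += 1
                children.append(node())
            if tokens[pos] != ")":
                raise ValueError(f"expected ')' at token {pos}")
            pos += 1
            if pos < len(tokens) and tokens[pos] not in DELIMS:
                pos += 1  # skip support value / branch length after ")"
            return children
        label = tokens[pos].split(":")[0]  # drop ":branch_length"
        pos += 1
        return label

    return node()


def collect(tree: Tree, clades: list[frozenset[str]]) -> frozenset[str]:
    """Return the leaves below `tree`, appending each internal node's leaf set to `clades`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(collect(child, clades) for child in tree))
    clades.append(leaves)
    return leaves


def splits(tree: Tree, keep: frozenset[str]) -> set[frozenset[str]]:
    """Non-trivial bipartitions of `tree` after pruning it to the taxa in `keep`.

    Each split is stored as the side NOT containing a fixed reference taxon,
    so identical bipartitions compare equal across trees.
    """
    clades: list[frozenset[str]] = []
    collect(tree, clades)
    ref = min(keep)
    result = set()
    for clade in clades:
        side = clade & keep
        if ref in side:
            side = keep - side
        if 2 <= len(side) <= len(keep) - 2:
            result.add(side)
    return result


def gene_support(species_newick: str, gene_newicks: list[str]) -> dict[frozenset[str], float]:
    """Percentage of gene trees containing each species-tree bipartition (shared taxa only)."""
    species = parse(species_newick)
    genes = [parse(g) for g in gene_newicks]
    shared = frozenset.intersection(*(collect(t, []) for t in [species, *genes]))
    gene_splits = [splits(g, shared) for g in genes]
    return {
        split: 100.0 * sum(split in gs for gs in gene_splits) / len(gene_splits)
        for split in splits(species, shared)
    }


if __name__ == "__main__":
    species = "((A,B),(C,D),E);"
    genes = ["((A,B),(C,D),E);", "((A,C),(B,D),E);", "(((A,B),C),D,(E,F));"]
    support = gene_support(species, genes)
    # Each split is printed as the side without the reference taxon "A".
    for split, pct in sorted(support.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(split), round(pct, 1))
    # ['C', 'D'] 33.3
    # ['C', 'D', 'E'] 66.7
```

Because every tree is first restricted to the shared taxa, the percentages only describe that reduced tip set (taxon F in the third gene tree is ignored), which is the trade-off described in option 1.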
Cheers, Guido


[Image: JiangEtAlSuperNet.png, supernetwork example]