Hi Andy,
thanks, got the files! It looks like the tree with BS values was already edited with FigTree. This makes things more difficult, as the file now contains additional information (e.g., coloured branches). Do you need to keep that information, or can we remove it for this step so that you can add it again later? If we can remove it, could you also send me the original, unedited file?
Also, as far as I can see, you ladderized the tree (option "Order Nodes: increasing") in FigTree. Are you also planning on rerooting it for the final figure? If so, the paper that Alexis linked to could be of importance!
Lucas
--
You received this message because you are subscribed to a topic in the Google Groups "raxml" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/raxml/V7ZS5dhffgQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to raxml+unsubscribe@googlegroups.com.
--
Nice! Looking forward to hearing about your new results and feedback about the software!
Lucas
Hi Andy,
for just the purpose of creating the labelled tree, your approach using the cluster seems a bit too complicated. It is just a small program, which should easily run on a laptop. That is, unless your data set gets really big - which yours doesn't seem to be, as far as I remember.
So, I'd suggest just compiling Genesis on a machine with a Linux
system, or even inside a virtual machine. There, you should have
all the privileges needed to run programs and change files. In
general, you shouldn't need any special privileges anyway.
Here is the full workflow:
Also, if you have any particular trouble (you spoke of "software bugs"?), you can post the details here, so that I can troubleshoot.
Lucas
Hi Patrick,
that sounds like you successfully used Genesis for your work, glad to hear ;-)
Currently, there is no publication to cite. For now, you can cite
the GitHub repository:
Lucas Czech and Alexandros Stamatakis
Genesis - A toolkit for working with phylogenetic data.
https://github.com/lczech/genesis and http://genesis-lib.org
However, I have already started working on a manuscript that will
briefly introduce Genesis, and I hope to publish a preprint within
the next few weeks (or months?). Once that is done, it will be the
preferred reference to cite.
Lucas
Hi Lucas,
I just wanted to let you know I was able to use the labelled_tree app to preserve bootstrap values on a tree generated via EPA. Thank you!
Unfortunately, I also need a bootstrapped tree where the number of phylogenetic placements is denoted by branch thickness, and query sequences do not form individual leaf nodes. Essentially I want to maintain the .jplace tree format but include bootstrap values from the reference tree. As I understand it this is not possible with genesis, so as Alexis suggests, I will show two trees, one with the bootstrapped values and the other with placement information.
Ezra
Hi Ezra,
good to hear that the program is useful! We are currently working on our new tool gappa, which is a command line interface for useful functions of genesis. It will soon replace the genesis demos, and offers a simpler interface to them. The labelled tree app is already implemented in gappa, but does not yet support external trees (with bootstrap values etc.) - I will add this bit soon. In gappa, the command is called "graft"; you can find it here: https://github.com/lczech/gappa/wiki/Subcommand:-graft
As for your second point: If I understand you correctly, you want
to show a tree with branch thickness indicating the placement
distribution, and show bootstrap values at the same time. That is,
not do a "labelled tree"/"graft tree" with placements as single
branches, but summarized per branch, right? Well, there are a few
technical issues that need to be solved for this, but genesis is
not one of those issues here. In genesis, you can basically
annotate trees with whatever data you want, thickness, colour,
support values - you name it.
The problems I see are somewhere else:
Firstly, if you want to store this information (placements, as
well as BS values), you cannot use any existing format. Newick and
other tree formats do not support placements, and the jplace
standard for placements does not support extra tree data. You
could however use genesis (or some other library that supports
this) to read in the jplace file and the tree with BS values on
it, use the placements to calculate branch thickness, then make a
tree with thickness and BS values per branch, and store this in
some format like phyloxml.
Secondly, you then need a viewer which can visualize all this - I guess, Archaeopteryx is a good candidate. Alternatively, if you want branch colour instead of thickness, you can probably also use Figtree. Genesis also has a visualization component, but it produces output in Svg, which again has its issues - namely that the taxon labels at the tree tips might move depending on your viewer (a nasty problem with Svg...).
If you can wait a bit: I am planning to add jplace visualization with thickness to gappa, and also to make it possible to read an external tree (with extra BS values, for example), which will be used instead of the jplace-internal tree (as long as they have the same topology). I don't know yet when I will get to implement this, however. The problem here is that Newick is not meant for annotating trees with more than labels and branch lengths. So, when reading in a newick file with BS values, and outputting a phyloxml file with branch thickness (or colour), how does one specify what the extra newick annotations are called in the phyloxml output?! It's easy to do in genesis on a case-by-case basis, but hard to generalize. If I find a solution for this that can be specified via command line, I'll add it to gappa.
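For the first point above, turning the placements in a jplace file into one value per branch, here is a minimal Python sketch. It uses only the standard library and is not how genesis or gappa do it. The function name `placement_mass_per_edge` and the toy data are made up for illustration, and the code assumes that the jplace `fields` list contains `edge_num` and `like_weight_ratio`. Summing the likelihood weight ratios is just one possible choice; counting only the best placement per query would be another.

```python
import json
from collections import Counter

def placement_mass_per_edge(jplace):
    """Sum like_weight_ratio per edge_num over all queries of a parsed jplace dict."""
    fields = jplace["fields"]
    edge_i = fields.index("edge_num")
    lwr_i = fields.index("like_weight_ratio")
    mass = Counter()
    for pquery in jplace["placements"]:
        # "nm" holds [name, multiplicity] pairs, "n" holds plain names.
        if "nm" in pquery:
            multiplicity = sum(m for _, m in pquery["nm"])
        else:
            multiplicity = len(pquery["n"])
        for row in pquery["p"]:
            mass[row[edge_i]] += row[lwr_i] * multiplicity
    return dict(mass)

# Toy jplace content (hypothetical data, not taken from a real run).
example = json.loads("""
{
  "tree": "((A:0.1{0},B:0.2{1}):0.3{2},C:0.4{3}){4};",
  "fields": ["edge_num", "likelihood", "like_weight_ratio"],
  "placements": [
    {"p": [[1, -100.0, 0.75], [2, -101.0, 0.25]], "n": ["q1"]},
    {"p": [[1, -90.0, 1.0]], "nm": [["q2", 2.0]]}
  ],
  "version": 3,
  "metadata": {}
}
""")

print(placement_mass_per_edge(example))
```

For a real file, the dict would come from `json.load()` on the opened jplace file; the resulting per-edge values could then be scaled to a branch thickness or written to a table for a viewer.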
So, in short, your issue seems to me more one of file formats than of software. Which means, this can all be implemented, but as there are no formats to store the result in, it would be an ad-hoc solution...
Hi Lucas,
Thanks for your response. Gappa looks great, I will try it out in the next few months.
Exactly, I want my tree to have placements summarised per branch. After looking at text files of the tree formats, I figured the issue was that jplace cannot support extra tree information. I could write an app with the genesis library using the workflow you suggest, outputting a phyloxml file containing all the required information.
I am using iTOL to visualise the tree and it can read phyloxml. Additionally, iTOL supports phylogenetic placement data in the jplace format, which can be represented by branch symbol, colour or width. It also supports bootstrap values and user-custom datasets. One alternative solution might be using a text editor to extract bootstrap values from the reference tree and writing them to a .csv file which has the node/branch identifier and the corresponding BS value. This information could then be used to add branch symbols to the jplace tree as a custom dataset. https://itol.embl.de/help/dataset_symbols_template.txt
Or, I could do the opposite. Use JPlaceReader to extract the placement information and write this to a .csv file which I can use as a dataset on the bootstrapped reference tree.
More convoluted than writing the genesis app, but maybe quicker for someone not fluent in C.
I am intrigued by your solution to the newick to phyloxml problem. If you solve it, let me know how! The gappa function will be useful for any future trees.
Ezra
Hi Ezra,
your approaches all seem reasonable. I like the idea of turning either of the additional pieces of information (BS values or placements per branch) into an itol dataset. That seems like it is doable with not too much effort. You might be a bit limited by what itol can do with the datasets though - I don't know what kind of tree customizations they can be used for. Maybe ETE might be an option, too. It's in python, and can also do all kinds of annotations.
I think more users might be interested in your solution. If you
want, keep us updated here ;-)
Also of course, if you need any more input, let me know.
Cheers
Lucas
Lucas,
I had a go today with partial success. I’ve been having issues with the dataset feature in iTOL: when I try to annotate a jplace tree with a .csv file, it won’t recognise the nodeIDs provided as valid, despite them being visible on the iTOL tree and present in the jplace file. It is possible, however, to use the interactive dataset feature: create a dataset called ‘bootstraps’ and manually enter the nodeIDs and corresponding bootstrap values, depicted either as labels or as branch symbols.
The nodeIDs, BS values and, if desired, symbols can be copied and pasted into iTOL from the output files of the shell script below to make this process quicker. However, it is still somewhat laborious: iTOL will not accept the nodeIDs until they have been clicked on and matched to its drop-down menu.
I have contacted the iTOL creator and reported my difficulties. Hopefully the problems can be amended. If so, I’ll finish the shell script, so that the output can be directly uploaded on iTOL to annotate a jplace tree with bootstrap values.
Best,
Ezra
#!/usr/bin/bash
# Extracts BS values from a reference tree and the corresponding node IDs
# from a jplace tree, then writes 'Square' once for each BS value.
grep -o -E ').{3}' RAxML_bipartitions.tree.out | sed 's/)//' | sed 's/://' > BS.txt
grep -o -E '.{4}).{0}' RaxML_portableTree.tree.jplace | sed 's/^.*{//' | sed 's/})//' > nodeID.txt
# Count the extracted BS values (tr strips the padding some wc versions add).
a=$(wc -l < BS.txt | tr -d ' ')
yes "Square" | head -n "$a" > Symbol.txt
Hey Ezra,
that looks like a good approach. You should however make sure
that the tree in the RAxML bipartitions file and the one in the
jplace file are exactly identical, that is, the order of
parentheses in the Newick string, the taxon names, etc. Otherwise,
the order of the two lists will not fit. I guess however that you
already thought about that ;-)
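One quick way to check this is to strip everything except the parentheses and taxon names from both Newick strings and compare what is left. Here is a rough Python sketch. It assumes that branch lengths follow a colon, that support values sit directly after a closing parenthesis or in square brackets, and that jplace edge numbers are in curly braces. `bare_topology` is just a name chosen here, and the two strings are toy examples:

```python
import re

def bare_topology(newick):
    """Reduce a Newick string to parentheses, commas and taxon names."""
    s = re.sub(r"\{[^}]*\}", "", newick)   # jplace edge numbers such as {12}
    s = re.sub(r"\[[^\]]*\]", "", s)       # square-bracket comments / branch labels
    s = re.sub(r":[0-9.eE+-]+", "", s)     # branch lengths
    s = re.sub(r"\)[^,();]+", ")", s)      # inner labels / support values after ")"
    return re.sub(r"\s+", "", s)

# Toy strings (hypothetical); real ones would be read from the two files.
bs_tree = "((A:0.1,B:0.2)95:0.3,C:0.4);"
jplace_tree = "((A:0.1{0},B:0.2{1}):0.3{2},C:0.4{3}){4};"

if bare_topology(bs_tree) == bare_topology(jplace_tree):
    print("same order of parentheses and taxa")
else:
    print("trees differ, the two lists will not line up")
```

Note that the jplace file is JSON, so the Newick string has to be taken from its "tree" entry first. If the comparison fails, lists extracted line by line from the two files cannot be pasted next to each other safely.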
As for iTOL, yes, contacting the developers is probably a good idea. They are quite responsive usually.
Which OS do you use? If I try your script, my grep behaves a bit differently, so that I need to escape the parenthesis like this '\).{3}'
Lucas
Hi Ezra,
wow, this is quite some quick progress!
Well, I would advise against just adding 1 to node IDs - unless you are absolutely certain that this will not destroy their order. As such indices are often done via a post-order traversal of the tree (for example, this is the way the edge nums {123} in jplace are made), just adding 1 to them will not work (if I understand your idea correctly). A better way would be to do something more involved than just creating lists of values via grep. For example, read the files with proper tools that understand the format, and make sure that you match edges of the BS tree and the jplace tree correctly to each other.
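As a sketch of what such a matching could look like in Python (standard library only, not the genesis API): identify each inner branch by the set of taxa below it, and use that set as the key to join the BS values with the jplace edge numbers. This assumes that both trees are rooted at the same place and use simple unquoted taxon names. The function names and the toy trees are made up for this example.

```python
import re

# A node is written as: label, optional ":length", optional "{edge_num}".
_NODE = re.compile(r"([^:,(){};]*)(?::([^,(){};]*))?(?:\{(\d+)\})?")

def clade_annotations(newick):
    """Map frozenset(taxa below a branch) -> (label, edge_num or None)."""
    s = "".join(newick.split())           # drop all whitespace
    pos = 0
    found = {}

    def parse():
        nonlocal pos
        taxa = set()
        if s[pos] == "(":
            pos += 1                      # skip "("
            while True:
                taxa |= parse()           # one child subtree
                if s[pos] != ",":
                    break
                pos += 1                  # skip ","
            pos += 1                      # skip ")"
        m = _NODE.match(s, pos)
        pos = m.end()
        label, edge = m.group(1), m.group(3)
        if not taxa:                      # leaf: the label is the taxon name
            taxa = {label}
        found[frozenset(taxa)] = (label, None if edge is None else int(edge))
        return taxa

    parse()
    return found

def bootstrap_by_edge_num(bs_newick, jplace_newick):
    """Return {jplace edge_num: BS label} for inner branches present in both trees."""
    bs = clade_annotations(bs_newick)
    result = {}
    for taxa, (_, edge) in clade_annotations(jplace_newick).items():
        label = bs.get(taxa, ("", None))[0]
        if edge is not None and len(taxa) > 1 and label:
            result[edge] = label
    return result

# Toy trees with the first subtree written in a different order.
bs_tree = "((B:0.2,A:0.1)95:0.3,(C:0.1,D:0.2)80:0.4,E:0.5);"
jplace_tree = "((A:0.1{0},B:0.2{1}):0.3{2},(C:0.1{3},D:0.2{4}):0.4{5},E:0.5{6}){7};"
print(bootstrap_by_edge_num(bs_tree, jplace_tree))
```

Because the key is the set of taxa, the order of subtrees in the two Newick strings no longer matters. For differently rooted trees, one would have to compare bipartitions (a taxon set or its complement) instead.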
As to why this is occurring: it sounds very much like the issue that we describe in our paper about such issues with the Newick format: https://academic.oup.com/mbe/article/34/6/1535/3077051
So long
Lucas
Hi Lucas,
Good advice, I understand your concerns. In this case adding 1 does fix the issue and makes the bootstrap labels consistent with those produced by iTOL on the reference tree (at least this was the case with the three trees that I checked). I have revised my code to include this feature.
https://gist.github.com/Kzra/d9231bdd4e9d2791afa9eaaaa416f7d4
It raises the question of which is the true representation of the bootstrap values: the one that iTOL generates or the one in my initial dataset? In the above script, users can specify to use the initial nodeIDs by replacing nodeID2 with nodeID in the paste command.
Anyway, good to see a problem solved. Using this script will save me a lot of time over the next few weeks.
To anyone who uses it, I’d be interested to know whether your BS dataset labels also correspond with iTOL’s labelling of the reference tree.
Thanks for all the help,
Ezra
Hi Ezra,
hm, then maybe one of the two indices starts counting at 0, and the other at 1? If both stick to traversing the tree in post-order, and both trees are identically rooted and sorted (i.e., same order of subtrees for all inner nodes), then your method should work.
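To illustrate what I mean, here is a small Python sketch that numbers the nodes of a toy tree in post-order, once starting at 0 and once starting at 1. The nested-tuple tree and the function name are just assumptions for this example, not what iTOL or RAxML do internally:

```python
def postorder_numbers(tree, start=0):
    """Number all nodes of a nested-tuple tree in post-order, beginning at `start`."""
    numbered = []

    def visit(node):
        if isinstance(node, tuple):       # inner node: visit the children first
            for child in node:
                visit(child)
        numbered.append((node, start + len(numbered)))

    visit(tree)
    return numbered

toy_tree = (("A", "B"), "C")              # hypothetical three-taxon tree

from_zero = postorder_numbers(toy_tree, start=0)
from_one = postorder_numbers(toy_tree, start=1)

# Same traversal order, so every number differs by exactly one.
for (node, n0), (_, n1) in zip(from_zero, from_one):
    print(node, n0, n1)
```

If the two programs only differ in where they start counting, a constant offset of 1 is all that is needed. If they differ in traversal order or rooting, the offset would not be constant across the tree.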
What do you mean by "true representation of the bootstrap values"? BS values are properties of branches, not nodes (which is also explained in more detail in the paper that I linked before: https://academic.oup.com/mbe/article/34/6/1535/3077051). Thus, the "convention" of writing them next to nodes is only a workaround due to limitations of the Newick format and of tree viewers. Maybe you should also check with another tree viewer to make sure the values are where they should be. You can try Dendroscope - its recent versions handle BS values pretty well and explicitly ask whether those are values meant for branches or nodes.
Lucas
Hi Lucas,
Good idea, I think that could be the case.
I understand bootstrap values aren’t node properties. In iTOL, bootstrap values are written next to branches; when I said the BS values were shown ‘a node higher’, I just meant the labels were written next to the branch one node above. So by ‘true representation’ I meant which branch the BS value should be shown against. This sounds like an indexing issue: depending on whether nodes are indexed from 0 or 1, BS labels are shown against different branches. Checking for consensus among a few tree viewers would be a good way of telling.
Ezra