TreeAnnotator v2.6.4 doesn't recognize taxa/tips

Miguel Mendez Sandin

Jun 2, 2021, 2:00:39 PM
to beast-users
Dear all,

I'm dating a large phylogenetic tree (~30 000 tips x ~30 000 sites) with treePL (https://academic.oup.com/bioinformatics/article/28/20/2689/203074) for over 1000 replicates. I tried summarizing the dated phylogenetic trees (each of which I can visualize without problems in FigTree v1.4.4) with TreeAnnotator (v2.6.4), but I get the following error:

'
$ treeannotator2 -burnin 0 -heights mean phylo_replicates.tre phylo_mean.tre         

             TreeAnnotator v2.6.4, 2002-2021
                   MCMC Output analysis
                            by
          Andrew Rambaut and Alexei J. Drummond

            Institute of Evolutionary Biology
                 University of Edinburgh
                    a.ra...@ed.ac.uk

              Department of Computer Science
                  University of Auckland
                 ale...@cs.auckland.ac.nz


Processing 1000 trees from file.
A tree with a sampled ancestor is found. Turning on
the sampled ancestor summary analysis.


Total number of trees 1000, where 1000 are used.
Total unique clades: 30547

Finding maximum credibility tree...
Analyzing 1000 trees...
0              25             50             75            100
|--------------|--------------|--------------|--------------|
*************************************************************

Highest Log Clade Credibility: -1.3862943611198906
Collecting node information...
0              25             50             75            100
|--------------|--------------|--------------|--------------|
*************************************************************

Annotating target tree...
Writing annotated tree....
Error to write annotated tree file: String index out of range: -1

'

Just in case it helps, when using the "-lowMem" option the error is as follows:
'
[...]
Processing 1000 trees from file.
WARNING: The number of taxa (0) does not match the number of leafs in the tree (1)
null: Either taxon or alignment should be specified (id=null).
'

The input file contains 1000 trees in basic Newick format ('((((l1:105.245172,l2:105.245172):251.644530,(l3:219.502613,').

Has anybody here had the same problem before, or does anyone know how to solve it or know of an alternative solution?

Thank you in advance.

Best,
Miguel.

Remco Bouckaert

Jun 2, 2021, 9:42:20 PM
to beast...@googlegroups.com
Hi Miguel,

Since TreeAnnotator does not fail on a file containing just these two lines:

(l1:105.245172,l2:105.245172):0.0;
(l1:115.245172,l2:115.245172):0.0;

I wonder whether perhaps the taxon names have spaces in them. This can easily be fixed by replacing all spaces with underscores in a text editor. If that is not the problem, perhaps you could convert the Newick file into a NEXUS tree file and run TreeAnnotator on that file.

Hope this helps,
Remco



--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beast-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beast-users/11a55ed8-0ad0-420f-9907-ee8a3067e1d5n%40googlegroups.com.

Santiago Sánchez

Jun 3, 2021, 8:53:36 PM
to beast...@googlegroups.com
I noticed this behaviour as well. I think previous BEAST 1/2 versions (I'm not sure in which version this changed) had a somewhat "safe" way of handling tip labels with unacceptable characters (i.e., ;-) you get the idea...) by wrapping the label in single quotes. For example, BEAST would write a label like the one below in the tree file:

taxon_1_(voucher_xxxx);_from_this_location  -->    'taxon_1_(voucher_xxxx);_from_this_location'

The quote wrapping would indicate that all the characters inside are part of the label. Without them TreeAnnotator thinks that the block ends at the semicolon.

A quick fix, as Remco mentions, is just to replace the unacceptable characters in a text editor. But it would be nice to have the "old" functionality back.
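
The replacement could also be scripted rather than done by hand. Below is a minimal Python sketch, assuming the labels are cleaned up before the tree file is written. `sanitize_label` and `quote_label` are hypothetical helper names, not part of BEAST or TreeAnnotator. The quoting helper follows the general Newick convention of doubling a single quote inside a quoted label; whether a given TreeAnnotator version accepts quoted labels is not confirmed here.

```python
import re


def sanitize_label(label):
    """Replace every character outside [A-Za-z0-9_] with an underscore."""
    return re.sub(r"[^A-Za-z0-9_]", "_", label)


def quote_label(label):
    """Wrap a label in single quotes, doubling any embedded single quote."""
    return "'" + label.replace("'", "''") + "'"


# Illustrative label taken from the example above
label = "taxon_1_(voucher_xxxx);_from_this_location"
print(sanitize_label(label))
print(quote_label(label))
```

Note that sanitizing can map two different labels to the same string, so it is worth checking that the cleaned labels are still unique.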

Santiago

Miguel Mendez Sandin

Jun 4, 2021, 4:02:24 AM
to beast-users
Hi Remco,

Thank you for the quick feedback and help.
I couldn't find any spaces in the taxon names (nor anywhere else in the full tree file):

'
$ grep -c " " phylo_replicates.tre
0
$ grep -ce "l[0-9]*" phylo_replicates.tre
1000
'

I also tried exporting the Newick trees to a NEXUS file (manually, in FigTree v1.4.4) and running TreeAnnotator v1.10.4 on it, and that seems to work. Thanks for the tip!
However, I only tried this on a small subset of the trees (the first 20 trees: 94 MB), because they are quite large and were already taking a long time to load in FigTree. Do you know of a different approach (command-line friendly if possible) for converting large trees from Newick to NEXUS?
Otherwise, I guess a remote server (ssh -X) might be a good solution.

Thanks again, and thank you in advance.

Best,
Miguel.

Miguel Mendez Sandin

Jun 4, 2021, 4:02:24 AM
to beast-users
Hi Santiago,

Thanks for noticing. 
I have two files: i) one with the complete taxon names and some other identifiers, and ii) one with unique taxon codes to save memory. The unique characters in the taxon names across all trees are:
i) "_-./|0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ"
ii) "0123456789l"

In both files I have the same problem as described above. I also tried manually quoting (and unquoting) both files and the problem persists.
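
For anyone who wants to run the same character check, here is a minimal Python sketch. It assumes one Newick tree per line and unquoted tip labels (a quoted label containing structural characters would not be parsed correctly). `leaf_labels` and `label_characters` are hypothetical helper names, and the example tree is a shortened, illustrative one.

```python
import re

# A tip label follows "(" or "," and runs until the next structural character.
# Internal node labels and branch lengths follow ")" or ":" and are not matched.
LEAF_LABEL = re.compile(r"[(,]\s*([^\s(),:;]+)")


def leaf_labels(newick):
    """Return the unquoted tip labels of one Newick string, in order."""
    return LEAF_LABEL.findall(newick)


def label_characters(path):
    """Return the sorted set of characters used in tip labels across a file."""
    chars = set()
    with open(path) as handle:
        for line in handle:  # stream line by line; the files are large
            for label in leaf_labels(line):
                chars.update(label)
    return "".join(sorted(chars))


# Illustrative tree, not taken from the real data
example = "((l1:105.245172,l2:105.245172):251.644530,l3:219.502613);"
print(leaf_labels(example))
```

Calling `label_characters("phylo_replicates.tre")` would then give the character inventory for the whole file.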

Santiago Sánchez

Jun 4, 2021, 11:00:09 AM
to beast...@googlegroups.com
Hi Miguel,

Cool. Thanks for checking. I remember that you used to be able to use any character inside quotes (it's been a long time since I stopped using anything that isn't "[A-Za-z0-9_]*"). I don't recall whether this was specific to BEAST1 or whether it changed at some point.

I noticed the exact same error that you mentioned when one of my scripts failed to replace ";" with "", so my interpretation was that BEAST "thought" that the taxa block ended right there. Basically, the taxa block is truncated, so you end up with just numbers in your tree file. That's what I think is happening.

Cheers,
Santiago

Miguel M. Sandín

Jun 4, 2021, 3:31:29 PM
to beast-users
Hi again,

I still haven't understood the source of the problem. But at least exporting the trees to NEXUS and running TreeAnnotator (v1.10.4) works fine (tried on the 1000-tree file). Thanks a lot, Remco and Santiago, for the feedback.

For future reference: since I couldn't find a convenient way to convert such big files from Newick to NEXUS, I wrote a Python 3 script (see below) that adds some "NEXUS formatting" around the Newick trees as a temporary workaround. It runs pretty fast, and so far I haven't run into any problem/bug (tested on Ubuntu 20.04.2).

Best,
Miguel.


'
#!/usr/bin/env python3

import argparse
import os
import sys

parser = argparse.ArgumentParser(description="Converts Newick trees in a single file to NEXUS format by simply adding a header, a prefix per tree and a closing line.")

# Add the arguments to the parser
parser.add_argument("-t", "--tree_in", dest="tree_in", required=True,
                    help="Tree(s) file.")

parser.add_argument("-o", "--tree_out", dest="tree_out", required=True,
                    help="Tree(s) file output.")

# store_true already defaults to False, so no explicit default is needed
parser.add_argument("-r", "--overwrite", dest="remove", action="store_true",
                    help="Overwrite already existing output.")

args = parser.parse_args()

if os.path.exists(args.tree_out):
    print("\n  Warning! File", args.tree_out, "already exists.")
    if args.remove:
        print("  Overwriting...\n")
    else:
        print("  Please choose another name for the output tree file or consider using the option to overwrite (-r/--overwrite).\n")
        sys.exit(1)

# Mode "w" truncates an existing output file, so no separate os.remove() is needed
with open(args.tree_in) as infile, open(args.tree_out, "w") as outfile:
    print("#NEXUS", file=outfile)
    print("Begin trees;", file=outfile)
    c = 0
    for line in infile:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of writing empty trees
        c += 1
        # One tree per input line; [&R] marks the tree as rooted
        print(f"\ttree tree_{c} = [&R] {line}", file=outfile)
    print("end;", file=outfile)

'