new versions synthesis tree, APIs


Karen Cranston

Apr 7, 2016, 12:15:25 PM
to Open Tree of Life, opentreeoflife-...@googlegroups.com
We have just released a new version of the synthetic tree and of the APIs.

View the tree (you may need to reload the page to clear browser cache):
The tree is now built with a new make-based pipeline. Details in the release notes:
API docs:
https://github.com/OpenTreeOfLife/germinator/wiki/Open-Tree-of-Life-Web-APIs

Please let us know if you have feedback, questions, suggestions, or critiques:
https://tree.opentreeoflife.org/contact

Cheers,
Karen

Cody Hinchliff

Apr 7, 2016, 1:18:23 PM
to Open Tree of Life, opentreeoflife-...@googlegroups.com
Just a heads up, looks like the remains of a git merge are lurking in the example at the bottom of: https://github.com/OpenTreeOfLife/germinator/wiki/Synthetic-tree-API-v3#about_tree

--
You received this message because you are subscribed to the Google Groups "Open Tree of Life" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opentreeoflif...@googlegroups.com.
To post to this group, send email to opentre...@googlegroups.com.
Visit this group at https://groups.google.com/group/opentreeoflife.
For more options, visit https://groups.google.com/d/optout.

Jim Allman

Apr 7, 2016, 2:49:06 PM
to opentre...@googlegroups.com, opentreeoflife-...@googlegroups.com
On Apr 7, 2016, at 1:18 PM, Cody Hinchliff <codif...@gmail.com> wrote:

Just a heads up, looks like the remains of a git merge are lurking in the example at the bottom of: https://github.com/OpenTreeOfLife/germinator/wiki/Synthetic-tree-API-v3#about_tree

Thanks! I’ve cleaned this up and entered the current API response for `induced_subtree`.

  =jimA=

Jim Allman
Interrobang Digital Media
http://www.ibang.com/
(919) 649-5760

Yan Wong

Apr 8, 2016, 6:54:36 PM
to Open Tree of Life, opentreeoflife-...@googlegroups.com


On Thursday, 7 April 2016 17:15:25 UTC+1, Karen Cranston wrote:
We have just released a new version of the synthetic tree and of the APIs.

Is 'labelled_supertree.tre' at 


supposed to contain taxon labels? Because it doesn't.

Ben Redelings

Apr 11, 2016, 2:31:42 PM
to opentre...@googlegroups.com
Thanks for pointing this out!  We currently build two versions of the tree, one with names and one without.  However, building the one with names is optional, because we don't need it to serve the tree.  It looks like that one was either not built, or not distributed, or both.

-BenRI

Yan Wong

Apr 11, 2016, 2:33:11 PM
to opentre...@googlegroups.com

On 11 Apr 2016, at 19:31, Ben Redelings <benjamin....@gmail.com> wrote:

> Thanks for pointing this out! We currently build two versions of the tree, one with names and one without. However, building the one with names is optional, because we don't need it to serve the tree. It looks like that one was either not built, or not distributed, or both.

Thanks. It would be good to make both available. But even if not, if there is a labelled_ version it should presumably contain labels!

Yan

Ben Redelings

Apr 11, 2016, 2:39:56 PM
to opentre...@googlegroups.com
Oh, that's interesting. I see how that could be misleading :-) I guess
the labelled_ version DOES have labels, just not taxon labels though.
That is, "ott12354" in a newick is a label, even though it doesn't
contain the taxon name.

The "labelled_supertree.tre" name really corresponds to a step in the
pipeline in which unlabelled nodes with no OTT ids are given labels like
mrcaott123ott456. The nodes can then be referred to for purposes of
annotation.
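A minimal sketch of telling the two kinds of label apart, assuming every internal label has exactly one of the shapes described here: `ott<digits>` for a taxon or `mrcaott<digits>ott<digits>` for an unnamed node. `label_kind` is a hypothetical helper for illustration, not part of the Open Tree pipeline:

```sh
# label_kind LABEL (hypothetical helper)
# Classifies one node label from labelled_supertree.tre, assuming only the
# two label shapes described above. Each "t" ends processing as soon as a
# substitution has matched, so only unmatched labels reach "unknown".
label_kind() {
  printf '%s\n' "$1" | sed -E \
    -e 's/^ott([0-9]+)$/taxon \1/' -e t \
    -e 's/^mrcaott([0-9]+)ott([0-9]+)$/mrca \1 \2/' -e t \
    -e 's/.*/unknown/'
}

label_kind ott12354          # prints: taxon 12354
label_kind mrcaott123ott456  # prints: mrca 123 456
label_kind Homo_sapiens      # prints: unknown
```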

Yan Wong

Apr 11, 2016, 3:35:15 PM
to opentre...@googlegroups.com

On 11 Apr 2016, at 19:39, Ben Redelings <benjamin....@gmail.com> wrote:

> Oh, that's interesting. I see how that could be misleading :-) I guess the labelled_ version DOES have labels, just not taxon labels though. That is "ott12354" in a newick is a label, even though it doesn't contain the taxon name.

Ah, well, yes. If you use ‘label’ to mean the OTT id.

But if it didn’t have that, the tree would be basically useless, which is why simply _labelled is a bit confusing.

Either way, is there any way to get a newick in the same format as the previous draftversion4.tre (i.e. with taxon labels, but without arbitrary intermediate-named labels)?

Yan

Mark Holder

Apr 11, 2016, 3:54:32 PM
to opentre...@googlegroups.com
Hi,
Sorry for the confusion. The version of the tree with OTT names (in
addition to ott IDs) is at

http://files.opentreeoflife.org/synthesis/opentree5.0/output/labelled_supertree/labelled_supertree_ottnames.tre

With respect to the pipeline for producing the tree, the previous step
has a tree that only has tip labels. So the labeling step is
generating labels for the internal nodes. That pipeline thinks of an
OTT ID (or a mrcaott###ott###) as a label. Hence the name
"labelled_supertree.tre". We need to make the link to the version with
names more obvious. If we do that, then this quirky name probably
won't be too problematic.

all the best,
Mark



--
Mark Holder

mtho...@gmail.com
mtho...@ku.edu
http://phylo.bio.ku.edu/mark-holder

==============================================
Department of Ecology and Evolutionary Biology
University of Kansas
6031 Haworth Hall
1200 Sunnyside Avenue
Lawrence, Kansas 66045

lab phone: 785.864.5789
fax (shared): 785.864.5860
==============================================

Yan Wong

Apr 12, 2016, 7:57:03 AM
to Open Tree of Life
On Monday, 11 April 2016 20:54:32 UTC+1, Mark Holder wrote:
Hi,
Sorry for the confusion. The version of the tree with OTT names (in
addition to ott IDs) is at

   http://files.opentreeoflife.org/synthesis/opentree5.0/output/labelled_supertree/labelled_supertree_ottnames.tre

Thanks. 

I wonder if it is worth having the opentree number embedded in the file, as previously with draftversionX.tre (perhaps here something like labelled_supertree_ottnames_5.0.tre). You could then have a 'latest' newick file (e.g. labelled_supertree_ottnames_latest.tre), which is symlinked to the most recent version. That's how Wikipedia does it for their regular dump files.
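A minimal sketch of that naming scheme, run in a throwaway directory. The version-stamped name follows the suggestion above; `publish_versioned` is a hypothetical helper, and this is not a description of how files.opentreeoflife.org is actually maintained:

```sh
set -eu
# publish_versioned DIR VERSION SRC (hypothetical helper)
# Copies SRC to a version-stamped name in DIR and points a stable
# "latest" symlink at it, so the newest tree always has one fixed URL.
publish_versioned() {
  versioned="labelled_supertree_ottnames_$2.tre"
  cp "$3" "$1/$versioned"
  # Relative target: the link stays valid if DIR is moved or served as-is.
  # -f replaces the previous "latest" link.
  ln -sf "$versioned" "$1/labelled_supertree_ottnames_latest.tre"
}

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
echo '((a,b),c);' > "$demo/tree.tre"
publish_versioned "$demo" 5.0 "$demo/tree.tre"
readlink "$demo/labelled_supertree_ottnames_latest.tre"
# prints: labelled_supertree_ottnames_5.0.tre
```

Publishing a later version only rewrites the link, so older version-stamped files stay available at their own URLs.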

Jonathan A Rees

Apr 12, 2016, 9:17:16 AM
to opentre...@googlegroups.com
I was going to put a symlink in the synthesis/ directory to the opentree5.0 directory. Would that do the trick? Live just now, but I could change it:


I call it 'current' instead of 'latest' because sometimes the 'latest' is an experimental tree we put out for review. I've done the same in the ott/ directory.

Would that work for you?
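For a directory link such as the 'current' link in synthesis/, repointing it needs `ln -n` (supported by both GNU and BSD `ln`): without it, `ln -sf` follows the existing link into the old release directory and creates the new link inside it. A sketch in a throwaway directory; `opentree6.0` is an illustrative future release name and `set_current` a hypothetical helper:

```sh
set -eu
# Throwaway stand-in for the synthesis/ directory, with two releases.
synth=$(mktemp -d)
trap 'rm -rf "$synth"' EXIT
mkdir "$synth/opentree5.0" "$synth/opentree6.0"

# set_current RELEASE (hypothetical helper)
# -s: symbolic link; -f: replace an existing link;
# -n: treat an existing "current" link as a file rather than the directory
#     it points to, so it is replaced instead of a new link being created
#     inside the old release.
set_current() {
  ln -sfn "$1" "$synth/current"
}

set_current opentree5.0
set_current opentree6.0
readlink "$synth/current"   # prints: opentree6.0
```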

Jonathan A Rees

Apr 12, 2016, 9:20:41 AM
to opentre...@googlegroups.com
Oh sorry, that doesn't give a single URL you can just fetch from. More work to be done.

This is getting kind of technical for the big opentreeoflife list, maybe we should continue discussion at

Jonathan

Yan Wong

Apr 12, 2016, 11:04:40 AM
to Open Tree of Life
On Monday, 11 April 2016 20:54:32 UTC+1, Mark Holder wrote:
The version of the tree with OTT names (in addition to ott IDs) is at

   http://files.opentreeoflife.org/synthesis/opentree5.0/output/labelled_supertree/labelled_supertree_ottnames.tre

So this seems to be in a different format to previous draft trees. In particular, it has the new-style labels for the unnamed nodes, and it also has spaces not underscores in quoted label strings. Is there any way of knowing if there are other format differences? It's a little bit of a pain that the format changes when I've got a whole pipeline of parsing routines relying on a particular format. Is there any plan to standardise this?

And is there any chance of getting a tree without the mrcaott###ott### labels. Perhaps supertree_ottnames.tre?

Cheers

Yan

Mark Holder

Apr 12, 2016, 12:29:14 PM
to opentre...@googlegroups.com
Hi Yan,

The tree is being produced by an entirely different backend compared
to the previous versions, which is the cause of the differences wrt
spaces and underscores.

You can create a tree without the mrca labels using:

sed -E 's/[)]mrcaott[0-9]+ott[0-9]+/)/g' labelled_supertree_ottnames.tre > supertree_ottnames.tre


I think that the current behavior is to encode a space before
"ott####" in the label using newick rules. So, if the label does not
require quoting, this is done with an underscore. If the label does
require quoting, then the space is used. If we had an _ in a quoted
label, then a compliant newick parser would treat the suffix as
"_ott####" instead of " ott####". So, I'd rather not change it to an
underscore in a quoted string.
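To illustrate the rule described here, a minimal sketch of how a Newick reader typically decodes a single label: in an unquoted label underscores stand for spaces, while inside single quotes underscores are kept as-is and a doubled quote stands for one literal quote. `newick_label_text` is a hypothetical helper that handles one already-isolated label; it is not a full Newick parser:

```sh
# newick_label_text LABEL (hypothetical helper)
# Decodes one already-isolated Newick label the way most parsers do.
newick_label_text() {
  case $1 in
    \'*\')
      # Quoted: drop the outer quotes, turn '' into ', keep underscores.
      printf '%s\n' "$1" | sed -e "s/^'//" -e "s/'\$//" -e "s/''/'/g" ;;
    *)
      # Unquoted: underscores stand for spaces.
      printf '%s\n' "$1" | tr '_' ' ' ;;
  esac
}

newick_label_text Coccinella_septempunctata_ott343294
# prints: Coccinella septempunctata ott343294
newick_label_text "'Coccinella septempunctata_ott343294'"
# prints: Coccinella septempunctata_ott343294
```

The second call shows the problem with an underscore inside quotes: the suffix is read back as "_ott343294", not " ott343294".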

I think that you can get rid of the mrca labels and change the spaces
to _ in quoted labels with:

sed -E 's/[)]mrcaott[0-9]+ott[0-9]+/)/g' labelled_supertree_ottnames.tre | sed -E "s/ (ott[0-9]+')/_\1/g" > supertree_ottnames.tre
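A quick sanity check is to run both substitutions on a small made-up tree. This sketch wraps them in a hypothetical `simplify_ottnames` function that reads Newick on standard input. Both expressions carry the `g` flag because the whole tree normally sits on a single line; without it, only the first match on that line would change:

```sh
# simplify_ottnames (hypothetical wrapper around the two sed commands)
# 1. drop mrcaott###ott### labels that follow a closing parenthesis
# 2. in quoted labels, turn the space before the trailing ott### into _
simplify_ottnames() {
  sed -E 's/[)]mrcaott[0-9]+ott[0-9]+/)/g' |
    sed -E "s/ (ott[0-9]+')/_\1/g"
}

echo "((Coccinella_septempunctata_brucki_ott646773,'Foo bar ott1')mrcaott1ott646773,Baz_ott2)ott3;" |
  simplify_ottnames
# prints: ((Coccinella_septempunctata_brucki_ott646773,'Foo bar_ott1'),Baz_ott2)ott3;
```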


all the best,
Mark

Yan Wong

Apr 12, 2016, 12:47:14 PM
to opentre...@googlegroups.com
On 12 Apr 2016, at 17:29, Mark Holder <mtho...@gmail.com> wrote:

> You can create a tree without the mrca labels using:
>
> sed -E 's/[)]mrcaott[0-9]+ott[0-9]+/)/g' labelled_supertree_ottnames.tre > supertree_ottnames.tre

Thanks - that’s a good point. And since there is no underscore (or space) before the ott, then it won’t get confused with that well-known bacterial taxon ‘mrcaottXXX’ :)

> I think that the current behavior is to encode a space before
> "ott####" in the label using newick rules.

Yes, you are now using more conventional newick quoting rules, which interpret underscores in unquoted names as spaces. E.g. using Dendropy I previously had to set "preserve_underscores=True". The new behaviour is more rational, but I’ll need to recode my scripts to carry out different behaviour for OT4 and OT5.

It’s more the idea that it would be sensible to standardize this all somehow. For instance, I don’t know if there is a guarantee not to have underscores in quoted names in the newick file (I assume quoted names won’t have underscores). And previously there were various rules about converting and condensing non-standard characters in taxon names when used in the newick. E.g. braces in names were replaced with underscores, and any run of underscores was condensed to a single one. I guess most of these rules have stayed the same (e.g. there are no braces, commas, colons, or semicolons in taxon names), but I don’t know for sure.
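To check this for a particular release, a rough sketch is to list the quoted labels that contain any of the characters listed above (in Newick such characters can only appear inside quotes). It treats every `'...'` run as one label and so mishandles the `''` escape for embedded quotes; `quoted_labels_with_special` is a hypothetical name:

```sh
# quoted_labels_with_special (hypothetical helper)
# Reads Newick on stdin and prints each quoted label that contains a
# brace, comma, colon or semicolon. Rough: it splits on every single
# quote, so labels using the '' escape are not handled correctly.
quoted_labels_with_special() {
  grep -o "'[^']*'" | grep -- '[{}:;,]' || true
}

echo "((a_ott1,'b, c ott2'),'d e ott3');" | quoted_labels_with_special
# prints: 'b, c ott2'
```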

Cheers

Yan

Mark Holder

Apr 12, 2016, 1:12:22 PM
to opentre...@googlegroups.com
On Tue, Apr 12, 2016 at 11:47 AM, Yan Wong <y...@pixie.org.uk> wrote:

> [snip] I guess most of these rules have stayed the same (e.g. there are no braces, commas, colons, or semicolons in taxon names), but I don’t know for sure.


hmm. I'm afraid that it is unlikely to be true. As far as I know, the
current version no longer munges names at all.

Why don't we take this discussion off this general list, to avoid
clogging the inboxes of folks who are not interested in this topic?
Could you let me know what characters in names would cause you
problems by adding a comment to
https://github.com/OpenTreeOfLife/treemachine/issues/147 ?
Then I'll work on a script to remove the offending characters for you.

all the best,
Mark

Yan Wong

Apr 12, 2016, 4:26:38 PM
to Open Tree of Life
On Tuesday, 12 April 2016 18:12:22 UTC+1, Mark Holder wrote:
Could you let me know what characters in names would cause you
problems by adding a comment to
https://github.com/OpenTreeOfLife/treemachine/issues/147 ?
Then I'll work on a script to remove the offending characters for you.

Thanks. Done. 

Yan Wong

Apr 19, 2016, 6:49:28 AM
to Open Tree of Life
On Monday, 11 April 2016 20:54:32 UTC+1, Mark Holder wrote:
Hi,
Sorry for the confusion. The version of the tree with OTT names (in
addition to ott IDs) is at

   http://files.opentreeoflife.org/synthesis/opentree5.0/output/labelled_supertree/labelled_supertree_ottnames.tre

I've just been playing with this taxon-labelled version of the tree, and it seems to be missing unifurcations. E.g. it doesn't have "Coccinella septempunctata ott343294" (https://tree.opentreeoflife.org/opentree/argus/ottol@343294/Coccinella-septempunctata), only the subspecies "Coccinella septempunctata brucki ott646773". This has rather large ramifications when looking at some higher level taxa such as Cephalochordata, which are unifurcations and therefore absent. Is there a version with all the names present in the synth tree?
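To check a downloaded tree for a particular taxon, such as Coccinella septempunctata (ott343294) here, a minimal sketch is to search for its OTT id as a whole label suffix. It assumes ids appear as `ott<digits>` at the end of a label, preceded by `_`, a space or a Newick delimiter, which also keeps it from matching ids inside `mrcaott###ott###` labels; `has_ott_id` is a hypothetical helper and the tree below is made up:

```sh
# has_ott_id ID (hypothetical helper)
# Succeeds if Newick on stdin has a label ending in ott<ID>. The id must be
# preceded by start of line, "_", a space, "(", "," or ")", and followed by
# a Newick delimiter, a closing quote or end of line. That keeps ott34329
# from matching ott343294 and ignores ids inside mrcaott###ott### labels.
has_ott_id() {
  grep -Eq "(^|[_ (,)])ott$1([',);:]|\$)"
}

tree="((Coccinella_septempunctata_brucki_ott646773,b_ott2)mrcaott2ott343294,c_ott3);"
if printf '%s\n' "$tree" | has_ott_id 343294; then
  echo present
else
  echo missing   # taken here: 343294 only occurs inside an mrca label
fi
```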

Yan

Mark Holder

Apr 19, 2016, 4:21:04 PM
to opentre...@googlegroups.com
Hi,
sorry for the delay. I just put a version at
http://phylo.bio.ku.edu/ot/labelled_supertree_simplified_ottnames_with_monotypic.tre.gz
that has monotypic taxa. The log for the name munging is at
http://phylo.bio.ku.edu/ot/simplified_ottnames_with_monotypic.log

Those names will change slightly (dropping the "_with_monotypic") the
next time that we build the tree, but we should be set up to produce
these automatically now. And in the future, they should show up in
the directory like this one:
http://files.opentreeoflife.org/synthesis/opentree5.0/output/labelled_supertree/index.html
(except with a different version # instead of 5.0, of course).
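The monotypic version is gzip-compressed. A minimal sketch of fetching and unpacking such a file with `curl` and `gunzip`; `fetch_gz` is a hypothetical helper, and the example call simply reuses the URL given above:

```sh
# fetch_gz URL OUT (hypothetical helper)
# Downloads a gzip-compressed file to OUT.gz, then writes the
# decompressed text to OUT. curl -f turns HTTP errors into a failure.
fetch_gz() {
  curl -fsSL -o "$2.gz" "$1" || return 1
  gunzip -c "$2.gz" > "$2" || return 1
  rm -f "$2.gz"
}

# Example (not run here):
# fetch_gz http://phylo.bio.ku.edu/ot/labelled_supertree_simplified_ottnames_with_monotypic.tre.gz \
#   labelled_supertree_simplified_ottnames_with_monotypic.tre
```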



all the best,
Mark

Yan Wong

Apr 19, 2016, 4:22:41 PM
to opentre...@googlegroups.com

On 19 Apr 2016, at 21:21, Mark Holder <mtho...@gmail.com> wrote:

> sorry for the delay.

No - that’s super speedy. Thanks for the quick response, and your general helpfulness!

Cheers

Yan