Fwd: [bio-phylo] Newick tree simplification (#1)


Rutger Vos

May 29, 2012, 4:38:09 AM
to bio-...@googlegroups.com
Hi all,

A Bio::Phylo power user has added some functionality for dealing with
very large Newick strings (in his case ~100k tips) when only a subset
of the tips needs to be parsed into a tree structure. Here is the code
he contributed: https://github.com/rvosa/bio-phylo/pull/1 (I
changed the -simplify flag to -keep). Expect this to end up on CPAN
soon.

Best wishes,

Rutger


---------- Forwarded message ----------
From: Florent Angly
<reply+i-4775463-1e54818cec149f0...@reply.github.com>
Date: Mon, May 28, 2012 at 7:13 AM
Subject: [bio-phylo] Newick tree simplification (#1)
To: Rutger Vos <rutge...@gmail.com>


Hi Rutger,

As I hinted previously, here is some code I wrote to simplify a Newick
tree before parsing it. Given a list of terminal node IDs to keep,
this code will process all cherries and recursively remove the
terminal nodes that are not needed.
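
To make the string-level idea concrete, here is a minimal Python sketch.
It is not the code from the pull request (which is Perl and lives in
lib/Bio/Phylo/Parsers/Newick.pm); the function name prune_newick and the
placeholder scheme are hypothetical. It repeatedly collapses innermost
parenthesized groups and drops tips that are not in the keep list, so it
prunes more aggressively than a cherry-only pass. It assumes unquoted
labels without parentheses or commas, and it does not sum branch lengths
when an internal node is removed.

```python
import re

# Innermost group: "(...)" with no nested parentheses, followed by an
# optional internal label and/or branch length such as "x:3".
INNERMOST = re.compile(r"\(([^()]*)\)([^,();]*)")
# Placeholder standing in for a subtree that has already been processed.
TOKEN = re.compile("\x00\\d+\x00")


def prune_newick(newick, keep):
    """Return a Newick string that retains only the tips named in `keep`."""
    keep = set(keep)
    saved = {}  # placeholder -> text of an already-pruned subtree
    text = newick.strip().rstrip(";")

    def collapse(match):
        children = [c.strip() for c in match.group(1).split(",")]
        # A child survives if it is a saved subtree or a tip we want.
        kept = [c for c in children
                if c in saved or c.split(":", 1)[0].strip() in keep]
        if not kept:
            return ""        # nothing of interest below this node
        if len(kept) == 1:
            return kept[0]   # drop the now-redundant internal node
        token = "\x00%d\x00" % len(saved)
        saved[token] = "(" + ",".join(kept) + ")" + match.group(2)
        return token

    # Each pass collapses every group that currently has no nested group.
    while "(" in text:
        text, n = INNERMOST.subn(collapse, text)
        if n == 0:
            raise ValueError("unbalanced parentheses in Newick string")

    # Expand placeholders back into the subtrees they stand for.
    while "\x00" in text:
        text = TOKEN.sub(lambda m: saved[m.group(0)], text)
    return text + ";"
```

For example, prune_newick("((A:1,B:2)x:3,(C:1,(D:1,E:1):2):4);",
["A", "D", "E"]) returns "(A:1,(D:1,E:1):2);". The number of passes
grows with tree depth, so this is only an illustration and makes no
claim about the performance of the contributed code.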

I tried some more complex, more thorough code before, but it was quite
slow. Processing only the cherries, however, performs very
satisfactorily. For example, given 200 input node IDs, the large
Greengenes tree, which previously never finished parsing, was
simplified to ~900 nodes. The entire process took less than 1.5 minutes.

Cheers,

Florent


You can merge this Pull Request by running:

 git pull https://github.com/fangly/bio-phylo master

Or you can view, comment on it, or merge it online at:

 https://github.com/rvosa/bio-phylo/pull/1

-- Commit Summary --

* Pruning of cherries from Newick string
* Newick simplifier improvements
* POD update

-- File Changes --

M lib/Bio/Phylo/IO.pm (7)
M lib/Bio/Phylo/Parsers/Newick.pm (66)
A t/42-simplify-newick.t (108)

-- Patch Links --

 https://github.com/rvosa/bio-phylo/pull/1.patch
 https://github.com/rvosa/bio-phylo/pull/1.diff

---
Reply to this email directly or view it on GitHub:
https://github.com/rvosa/bio-phylo/pull/1


--
Dr. Rutger A. Vos
Bioinformaticist
NCB Naturalis
Visiting address: Office A109, Einsteinweg 2, 2333 CC, Leiden, the Netherlands
Mailing address: Postbus 9517, 2300 RA, Leiden, the Netherlands
http://rutgervos.blogspot.com