Comments with newlines at start of newick file


Yan Wong

Jun 26, 2014, 10:24:45 AM
to bio-...@googlegroups.com
I'm commenting my newick files with a line in square brackets at the start, e.g.

[My comments here]
(((CYCLOSTOMATA,GNATHOSTOMATA)Vertebrata,Tunicates)ChordataMinusAmphioxus,CEPHALOCHORDATA);

on the assumption that "Whitespace elsewhere is ignored" (https://en.wikipedia.org/wiki/Newick_format). But Bio::Phylo::Parsers::Newick seems to die if there are newlines at the start of the Newick file. Is there a switch I can use when parsing to read these in correctly, or should I just strip the newlines before reading?

Cheers

Yan (apologies for all the posts!)

Rutger Vos

Jun 27, 2014, 4:19:46 AM
to bio-...@googlegroups.com
Hi Yan,

To the best of my knowledge, this should work. Note that the comments are stripped out by the reader, so you won't have access to them by way of the tree objects you get out of the newick (though they will of course still be there in the file), but they shouldn't cause problems. Can you try the attached script and tell me what you get? I can't reproduce the issue on my end.

Rutger



yan.pl

Yan Wong

Jun 27, 2014, 6:14:06 AM
to bio-...@googlegroups.com
On Friday, 27 June 2014 09:19:46 UTC+1, Rutger Vos wrote:
> Can you try the attached script and tell me what you get because I can't reproduce the issue on my end?

Your script works fine for me. But after a little debugging on my end, I think the problem arises because I undefined $/ so that I could read multiple files using a single while(<>) loop. For instance, this dies for me:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::Phylo::IO 'parse_tree';

$/ = undef;    # global change to the input record separator; this is the line under suspicion
my $tree = parse_tree(
    -string => "[My comments here]
(((CYCLOSTOMATA,GNATHOSTOMATA)Vertebrata,Tunicates)ChordataMinusAmphioxus,CEPHALOCHORDATA);",
    -format => 'newick'
);

print $tree->to_newick;

Rutger Vos

Jun 27, 2014, 6:49:37 AM
to bio-...@googlegroups.com
Er, yes, don't mess with the global special variables :-)

If you must play around with $/ you want to do that with the 'local' keyword so that the rest of the world outside your specific block stays the same.
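
A minimal sketch of that advice, using only core Perl and an in-memory filehandle so it runs without any input files. The sample string and variable names are illustrative assumptions, not taken from the attached script. It shows that `local $/` inside a bare block affects only that block:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative sample data (an assumption for this sketch).
my $text = "[My comments here]\n(A,B);\n";

# Slurp the whole string in one read; `local` confines the change to this block.
my $slurped;
{
    local $/;    # $/ is undef inside this block only
    open my $in, '<', \$text or die "Cannot open in-memory handle: $!";
    $slurped = <$in>;
    close $in;
}

# Outside the block $/ is "\n" again, so line-by-line reads behave normally.
open my $in, '<', \$text or die "Cannot open in-memory handle: $!";
my @lines = <$in>;
close $in;

printf "slurped %d characters, then read %d lines\n",
    length($slurped), scalar(@lines);
```

Any code called after the block, including library code, sees the default separator again.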


Yan Wong

Jun 27, 2014, 8:19:09 AM
to bio-...@googlegroups.com
On Friday, 27 June 2014 11:49:37 UTC+1, Rutger Vos wrote:
> Er, yes, don't mess with the global special variables :-)

Fair point!

> If you must play around with $/ you want to do that with the 'local' keyword so that the rest of the world outside your specific block stays the same.

Hmm, difficult in this case because the Bio::Phylo code is inside the while(<>) block which I'm using with undef $/ to read multiple files. I guess I'll have to save the previous value and set it just before using Bio::Phylo::IO->parse_tree.
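
One way to sketch this without saving the old value by hand is a nested block with `local $/ = "\n"` around the call. In the sketch below, `count_lines` is a hypothetical stand-in for library code that reads its input line by line; whether Bio::Phylo's parser actually works that way is an assumption, not something confirmed in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for library code that reads its input line by line.
sub count_lines {
    my ($string) = @_;
    open my $fh, '<', \$string or die "Cannot open in-memory handle: $!";
    my $count = 0;
    while ( my $line = <$fh> ) {
        $count++;
    }
    close $fh;
    return $count;
}

my $newick = "[My comments here]\n(A,B);\n";

$/ = undef;    # the global change the calling script makes for while(<>)

# The callee inherits the undefined separator and sees one big "line".
my $broken = count_lines($newick);

my $fixed;
{
    local $/ = "\n";    # restore the default for this block only
    $fixed = count_lines($newick);
}

print "with undef \$/: $broken line(s); inside the local block: $fixed line(s)\n";
```

When the inner block ends, `$/` reverts to undef, so the surrounding while(<>) loop keeps slurping whole files.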

Thanks

Yan

Rutger Vos

Jun 27, 2014, 8:41:40 AM
to bio-...@googlegroups.com
I don't fully understand the use case. You know you can read multiple trees from the same file in one go, right? Or is that not what you're trying to do? You could also slurp the contents of multiple files in a separate block and pass that into parse() as a string.



Yan Wong

Jun 27, 2014, 9:22:46 AM
to bio-...@googlegroups.com
On Friday, 27 June 2014 13:41:40 UTC+1, Rutger Vos wrote:
> You could also in a separate block slurp the contents of multiple files and pass that into parse() as a string.

I've got trees in multiple files. I want to pass all the filenames to the perl script and iterate over each tree. You're right, there's probably a better way to do it than undef $/; while(<>) {parse($_)}
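
A possible alternative, sketched with core Perl only: loop over the filenames explicitly and slurp each file in a helper that localizes `$/`. The helper name `slurp_file` and the temporary demo files are assumptions for illustration; a real script would loop over @ARGV and hand each string to the parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read one file into a string without changing $/ for anyone else.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;    # undef only until this sub returns
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Demo input: two temporary files standing in for the filenames in @ARGV.
my @files;
for my $newick ( "[first]\n(A,B);\n", "[second]\n((C,D),E);\n" ) {
    my ( $fh, $name ) = tempfile( UNLINK => 1 );
    print {$fh} $newick;
    close $fh or die "Cannot close $name: $!";
    push @files, $name;
}

my @contents;
for my $file (@files) {
    my $string = slurp_file($file);
    # A real script would pass $string to the parser here, e.g.
    # parse_tree( -format => 'newick', -string => $string ).
    push @contents, $string;
}

printf "read %d files; \$/ is %s\n", scalar(@contents),
    ( defined $/ && $/ eq "\n" ) ? 'still a newline' : 'changed';
```

Because the separator is only ever changed inside `slurp_file`, the parsing call in the loop body runs with the default `$/`.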

By the way, I've just installed the github version, and I can't get unnamed nodes out of $tree->resolve(1). This just seems to name the nodes "NodeX", "NodeY" etc rather than rx, ry:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::Phylo::IO 'parse_tree';

srand(1);    # fix the random seed so runs are repeatable
my $tree = parse_tree(
    '-format' => 'newick',
    '-handle' => \*DATA,
);

$tree->resolve(1);

print $tree->to_newick( -nodelabels => 1 );

# Can't get e.g. ((D,((A,B)mynode1),C)mynode2,E);
# or even ((D,((A,B)mynode1):0.0,C)mynode2,E);
__DATA__
(((A,B)mynode1,C,D)mynode2,E);