Reconstructing a tree from DNA sequences


in...@lucasvandijk.nl

Jun 16, 2015, 3:22:42 PM
to dendrop...@googlegroups.com
Hi all,

Forgive me if this is a somewhat uninformed question. I've only read the DendroPy 4.x primer on dendropy.org, and one thing isn't clear to me: can I reconstruct a tree from a DnaCharacterMatrix using my own distance function?

For a study assignment we've been given the task of reconstructing a phylogenetic tree without using any sequence alignment algorithm or tool (using k-mer statistics and the like instead). I have a few evolutionarily related DNA sequences in PHYLIP format, which I can read with DendroPy. These sequences are, of course, not aligned. For this programming assignment I want to focus on the statistic part, not on the part of actually reading and writing the data files, worrying about the data structures to use for trees etc.

DendroPy looks like a very useful Python package for this. So my question: what would be the best way to reconstruct a phylogenetic tree from the given DNA sequences using DendroPy? I couldn't immediately find anything about tree reconstruction in the documentation. Should I implement my own neighbour-joining function using the DendroPy Tree classes, or is there something in DendroPy itself that I can hook into?

Thanks in advance,
Lucas

Jeet Sukumaran

Jun 17, 2015, 1:22:36 AM
to dendrop...@googlegroups.com
Hi Lucas,

DendroPy is designed to do precisely that, i.e., "focus on the statistic
part, not on the part of actually reading and writing the data files,
worrying about the data structures to use for trees etc.".

However, for the task you describe, you will need to manipulate the
alignments themselves, and while the tree data structures and operations
of DendroPy are pretty sophisticated, the character alignment
manipulation aspects are pretty primitive. In particular, column-based
operations are possible, but pretty clunky.

I would suggest that you treat your task as a two-part operation:

(1) Alignment
(2) Tree building.

For (1), I suggest that you read the data using native DendroPy methods,
convert the sequences to a list of lists, and align that:

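~~~
import dendropy

# Read the sequences with DendroPy's native PHYLIP reader.
d1 = dendropy.DnaCharacterMatrix.get(
        path="data.dat", schema="phylip")
# Convert each sequence to a plain list of symbols.
seqs = [s.symbols_as_list() for s in d1.values()]
...
# [align ``seqs`` and save to file]
~~~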
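
Since the assignment rules out alignment tools, the step that stands in for alignment could be a pairwise distance computed from k-mer statistics. The following is only a minimal sketch of that idea in plain Python: the function names (`kmer_profile`, `kmer_distance`, `distance_matrix`), the choice of k = 3, the Euclidean distance, and the toy sequences are all assumptions made for illustration, and none of them are part of DendroPy.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kmer_profile(seq, k=3):
    """Return the relative frequency of every k-mer in one sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than k")
    return {kmer: n / total for kmer, n in counts.items()}


def kmer_distance(p, q):
    """Euclidean distance between two k-mer frequency profiles."""
    kmers = set(p) | set(q)
    return sqrt(sum((p.get(m, 0.0) - q.get(m, 0.0)) ** 2 for m in kmers))


def distance_matrix(named_seqs, k=3):
    """Map each unordered pair of labels to its k-mer distance."""
    profiles = {name: kmer_profile(s, k) for name, s in named_seqs.items()}
    return {
        (a, b): kmer_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Hypothetical toy input, used here only to exercise the functions.
toy_seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTTC", "s3": "TTTTGGGGCC"}
dists = distance_matrix(toy_seqs)
for pair, value in dists.items():
    print(pair, round(value, 3))
```

With real data, each symbol list in ``seqs`` could be joined into a string (for example with `"".join(...)`) before being passed in. Any other k-mer statistic can be swapped in by replacing `kmer_distance`.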

For (2), you would then use DendroPy's tree infrastructure to build a
tree:

http://dendropy.org/primer/trees.html#building-a-tree-programmatically

I've been meaning to add an NJ routine in DendroPy for a while now.
Probably will not get around to it till after August some time ...
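
In the meantime, a neighbour-joining step can be written by hand. What follows is a rough sketch in plain Python, not DendroPy's implementation: the function name `neighbor_joining`, the dictionary-of-pairs input format, and the toy distances are assumptions made for illustration. It returns a Newick string, which should be loadable with `dendropy.Tree.get(data=newick, schema="newick")` if a DendroPy tree object is wanted afterwards.

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Return an unrooted neighbour-joining tree as a Newick string.

    labels: list of at least three taxon names.
    dist: dict mapping (name_a, name_b) pairs to distances, one entry per pair.
    """
    if len(labels) < 3:
        raise ValueError("need at least three taxa")
    d = {frozenset(pair): float(v) for pair, v in dist.items()}
    # Active nodes: key -> Newick fragment. Leaves are keyed by their label,
    # internal nodes by a tuple so they cannot clash with string labels.
    nodes = {name: name for name in labels}
    counter = 0
    while len(nodes) > 3:
        n = len(nodes)
        ids = list(nodes)
        # Row sums of the current distance matrix.
        r = {i: sum(d[frozenset((i, j))] for j in ids if j != i) for i in ids}
        # Join the pair that minimises the Q criterion.
        a, b = min(
            combinations(ids, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        d_ab = d[frozenset((a, b))]
        len_a = 0.5 * d_ab + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d_ab - len_a
        new = ("internal", counter)
        counter += 1
        # Distances from the new internal node to every remaining node.
        for c in ids:
            if c != a and c != b:
                d[frozenset((new, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - d_ab
                )
        nodes[new] = f"({nodes[a]}:{len_a:.4f},{nodes[b]}:{len_b:.4f})"
        del nodes[a], nodes[b]
    # Three nodes remain: attach them to one central node.
    i, j, k = nodes
    d_ij, d_ik, d_jk = (d[frozenset(p)] for p in ((i, j), (i, k), (j, k)))
    parts = [
        f"{nodes[i]}:{(d_ij + d_ik - d_jk) / 2:.4f}",
        f"{nodes[j]}:{(d_ij + d_jk - d_ik) / 2:.4f}",
        f"{nodes[k]}:{(d_ik + d_jk - d_ij) / 2:.4f}",
    ]
    return "(" + ",".join(parts) + ");"


# Hypothetical toy distances, additive on the tree ((A:1,B:2):1,C:3,D:4).
toy = {("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
       ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 7}
newick = neighbor_joining(["A", "B", "C", "D"], toy)
print(newick)
```

Note that neighbour-joining can produce negative branch lengths when the input distances are far from additive; this sketch makes no attempt to correct for that.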

