Custom truncation of GTDB tree


Andrzej Zielezinski

Sep 12, 2021, 1:09:36 PM
to DendroPy Users
Dear DendroPy Users,

I'm trying to truncate the GTDB bacterial tree so that it retains only selected phyla and drops all other phyla together with their descendants. In addition, I would like to truncate the retained phyla at the order level, i.e. keep every lineage down to order and remove everything below it.

I spent hours trying to get this to work, but failed. Here is what I have so far:

import dendropy
from biolib.newick import parse_label

PHYLA_TO_RETAIN = {'p__Patescibacteria', 'p__Planctomycetota'}

tree = dendropy.Tree.get_from_path('bac120.tree',
                                   schema='newick',
                                   rooting='force-rooted',
                                   preserve_underscores=True)

# Collect the leaf taxa under each internal node labelled with a selected phylum.
phyla_left = set(PHYLA_TO_RETAIN)  # working copy, so the constant is not mutated
taxa_in_tree = set()
for node in tree.postorder_node_iter():
    if not node.is_leaf():
        support, taxon, _auxiliary_info = parse_label(node.label)
        if taxon in phyla_left:
            for leaf in node.leaf_iter():
                taxa_in_tree.add(leaf.taxon)
            phyla_left.remove(taxon)

    # Stop as soon as every selected phylum has been found.
    if not phyla_left:
        break

# Prune every leaf that does not belong to a selected phylum.
tree.retain_taxa(taxa_in_tree)

tree.write_to_path('tree.newick',
                   schema='newick',
                   suppress_rooting=True,
                   unquoted_underscores=True)

I can also provide the list of orders to retain/remove.
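
The order-level step can be sketched roughly as follows. This is only an illustration of the logic, not DendroPy code: `Node` is a hypothetical stand-in for a tree node, the labels are toy values, and the sketch assumes that internal labels look like `support:taxon` and that one label may carry several ranks separated by `; ` (for example `100.0:c__X; o__Y`). Both assumptions should be checked against the actual tree file.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def ranks_in_label(label: Optional[str]) -> List[str]:
    """Return the rank names (e.g. 'p__X', 'o__Y') found in a node label.

    Assumed label shapes: '100.0:p__X; c__Y', 'o__Z', '95.0', or None.
    """
    if not label:
        return []
    text = label.strip().strip("'")
    text = text.split("|", 1)[0]          # drop optional auxiliary info
    if ":" in text:
        text = text.split(":", 1)[1]      # drop optional leading support value
    return [part.strip() for part in text.split(";") if "__" in part]


@dataclass
class Node:
    """Hypothetical stand-in for a tree node (not a DendroPy class)."""
    label: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def collapse_at_order(root: Node) -> None:
    """Turn every clade labelled with an order into a leaf named after it."""
    stack = [root]                        # iterative, since real trees are deep
    while stack:
        node = stack.pop()
        orders = [r for r in ranks_in_label(node.label) if r.startswith("o__")]
        if orders:
            node.label = orders[0]
            node.children.clear()         # discard everything below order
        else:
            stack.extend(node.children)


# Toy tree with made-up genome names G1..G5.
root = Node("100.0:p__Planctomycetota", [
    Node("98.0:c__Planctomycetia; o__Pirellulales", [Node("G1"), Node("G2")]),
    Node("90.0:c__Phycisphaerae", [
        Node("o__Phycisphaerales", [Node("G3"), Node("G4")]),
        Node("G5"),
    ]),
])
collapse_at_order(root)
```

In this sketch a genome with no order-labelled ancestor (`G5` above) is left as a leaf, so orders represented by a single genome would need separate handling. On a DendroPy tree the same traversal would presumably rely on `node.child_nodes()` and `node.clear_child_nodes()`, with a taxon assigned to each collapsed node; that part is untested here.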

Please help :)