I'm trying to truncate the GTDB bacterial tree so that it retains only selected phyla and drops all other phyla (with all their descendants). In addition, I would like to truncate the tree further so that the selected phyla keep their taxa only down to the order level (everything below order is removed).
I spent hours trying to get this to work, but failed. Here is what I have so far:
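    import dendropy
    from biolib.newick import parse_label

    PHYLA_TO_RETAIN = {'p__Patescibacteria', 'p__Planctomycetota'}

    # preserve_underscores keeps labels such as 'p__Patescibacteria' intact
    tree = dendropy.Tree.get_from_path('bac120.tree', schema='newick',
                                       preserve_underscores=True)

    # Collect the leaf taxa that sit below any of the selected phyla
    taxa_in_tree = set()
    for node in tree.postorder_node_iter():
        if not node.is_leaf():
            support, taxon, _auxiliary_info = parse_label(node.label)
            if taxon in PHYLA_TO_RETAIN:
                for leaf in node.leaf_iter():
                    taxa_in_tree.add(leaf.taxon)
    if not PHYLA_TO_RETAIN:
        ...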
I can also provide the list of orders to retain/remove.
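
To make the goal concrete, here is a rough sketch of the result I am after, written on a toy tree without dendropy so that it runs on its own. Everything in it is an assumption for illustration: the `Node` class and the `ranks`, `prune`, and `to_newick` helpers are hypothetical (they are not part of dendropy or biolib), and the toy labels are made up. It also assumes that one internal node may carry several ranks joined by `; `.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical stand-in for a tree node; `label` mimics a GTDB label."""
    label: str
    children: List["Node"] = field(default_factory=list)


def ranks(label: str) -> List[str]:
    """Split a label such as 'c__C; o__C1' into its individual rank names."""
    return [part.strip() for part in label.split(";") if part.strip()]


def prune(node: Node, phyla: Set[str], inside: bool = False) -> Optional[Node]:
    """Return a pruned copy of `node`, or None if nothing below it is kept."""
    names = ranks(node.label)
    inside = inside or any(name in phyla for name in names)
    # Inside a selected phylum, an order node becomes a leaf.
    if inside and any(name.startswith("o__") for name in names):
        return Node(node.label)
    kept = []
    for child in node.children:
        pruned_child = prune(child, phyla, inside)
        if pruned_child is not None:
            kept.append(pruned_child)
    # A node survives only if at least one order was kept below it.
    return Node(node.label, kept) if kept else None


def to_newick(node: Node) -> str:
    """Serialise the toy tree as a Newick-like string (labels only)."""
    if not node.children:
        return node.label
    inner = ",".join(to_newick(child) for child in node.children)
    return "(" + inner + ")" + node.label


# Made-up toy tree: two selected phyla and one phylum that should disappear.
toy = Node("d__Bacteria", [
    Node("p__Patescibacteria", [
        Node("c__A", [
            Node("o__A1", [Node("f__X", [Node("G1")]), Node("G2")]),
            Node("o__A2", [Node("G3")]),
        ]),
    ]),
    Node("p__Other", [Node("c__B", [Node("o__B1", [Node("G4")])])]),
    Node("p__Planctomycetota", [Node("c__C; o__C1", [Node("G5"), Node("G6")])]),
])

pruned = prune(toy, {"p__Patescibacteria", "p__Planctomycetota"})
print(to_newick(pruned))
```

In other words, the unselected phylum is gone entirely, and each order in the selected phyla ends up as a tip. Note that this can leave internal nodes with a single child (such as the class under `p__Patescibacteria` here), which I would not mind collapsing.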