Here is a simple script to calculate the symmetric distance between one tree (best_tree) and a set of other trees (all_trees):
#!/usr/bin/python
import dendropy
best_tree = dendropy.Tree.get_from_path("best_tree.tre", schema="nexus")
all_trees = dendropy.TreeList.get_from_path("all_trees", schema="nexus")
output_f = open('treedist_output', 'w')
counter = 1
for tree in all_trees:
    distance = best_tree.symmetric_difference(tree)
    output_f.write("1 " + str(counter) + " " + str(distance) + "\n")
    counter += 1
output_f.close()
In one example, all_trees contained 1000 trees, each with 300+ taxa. In that case the script took ~5 minutes to run and used 2.3 GB of memory. It's the memory that I'm most concerned about. Is there a more efficient way to do this with DendroPy?
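To make the memory question concrete, here is a rough, library-independent sketch of the pattern being asked about: the reference tree's clades are computed once, and the comparison trees are consumed one at a time from a generator, so only a single comparison tree needs to be in memory at any moment. Everything in it is an assumption for illustration and not DendroPy's API: trees are plain nested tuples of leaf labels, the helper names clades and symmetric_distances are hypothetical, and the distance is the rooted, clade-based form of the symmetric difference, which may differ from what DendroPy reports for unrooted trees.

```python
def clades(tree):
    """Return the non-trivial clades of a nested-tuple tree as frozensets of labels."""
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):      # a leaf label
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)                   # record the clade under this internal node
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)                # the root clade is the same for every tree
    return found


def symmetric_distances(reference, trees):
    """Yield (index, distance) pairs, consuming `trees` lazily one at a time."""
    ref_clades = clades(reference)           # computed once, reused for every comparison
    for index, tree in enumerate(trees, start=1):
        # Clades present in exactly one of the two trees.
        yield index, len(ref_clades ^ clades(tree))


def tree_stream():
    """Hypothetical stand-in for reading trees from a file one by one."""
    yield ((("A", "B"), "C"), ("D", "E"))
    yield ((("A", "C"), "B"), ("D", "E"))
    yield (("A", "D"), ("B", ("C", "E")))


best_tree = ((("A", "B"), "C"), ("D", "E"))
results = list(symmetric_distances(best_tree, tree_stream()))
for index, distance in results:
    print("1 " + str(index) + " " + str(distance))
```

With real data, the generator would parse one tree from the input file per iteration; whether DendroPy can read the file that way depends on the installed version, so that part is left open here.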
Thanks,
Adam