Yes, I agree the internal nodes are tricky.
What you suggested sounds like a good idea and would be really helpful. However, with hundreds of output gene trees, wouldn't it be confusing to process each gene tree's internal nodes, since the node names would overlap across trees?
The way I was picturing this is something like:
- What we have now is the yellow region in transfer.txt, which is based on the species tree node labels. This is very nice and easy to compare across multiple gene trees, and for something like the 1st row it's fairly simple to get the donor and recipient genes.
- For the 2nd row, where the transfer happened from 1 gene of species B to 3 genes in node_10 (listed in Reciever _geneIDs), this info can be taken from the recphylo.xml.
- For the 3rd row, however, where the transfer happened between two internal nodes, I guess listing just the recipient genes to which transfers were inferred would be cool, and the donor could simply remain node2. The recipient gene IDs are more important here, since they identify the exact horizontally transferred genes.
I don't know if it makes sense to do it this way. (I'm still learning GeneRax :))
Or, if there's a way to parse the recphylo.xml files, that would be fine too, I guess, because those files have the gene IDs and the internal node labels based on the species tree, which remain the same throughout.
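
To make the parsing idea a bit more concrete, here is a rough Python sketch of what I have in mind. It is only an illustration under assumptions: I'm assuming the files follow the generic RecPhyloXML layout (a `recGeneTree` made of nested `clade` elements, each with a `name` and an `eventsRec` block, where a transferred lineage carries a `transferBack` event with a `destinationSpecies` attribute and gene leaves carry a `leaf` event). I haven't checked this against every GeneRax output, and the function names (`transfers`, `leaf_genes`, etc.) are just made up for this example.

```python
import xml.etree.ElementTree as ET


def local(tag):
    # Drop a possible XML namespace prefix such as "{http://...}clade".
    return tag.rsplit("}", 1)[-1]


def children(elem, name):
    # Direct children of elem with the given (namespace-free) tag name.
    return [c for c in elem if local(c.tag) == name]


def events(clade):
    # Reconciliation events attached to this clade (assumed <eventsRec> block).
    recs = children(clade, "eventsRec")
    return list(recs[0]) if recs else []


def leaf_genes(clade):
    # Gene IDs of all extant leaves below this clade; loss nodes are skipped
    # because they carry no <leaf> event.
    subs = children(clade, "clade")
    if subs:
        return [g for sub in subs for g in leaf_genes(sub)]
    if any(local(ev.tag) == "leaf" for ev in events(clade)):
        names = children(clade, "name")
        return [names[0].text.strip()] if names and names[0].text else []
    return []


def transfers(root):
    # Returns (donor_species, recipient_species, recipient_gene_ids) tuples.
    # Donor is taken as the species location of the parent gene node.
    found = []

    def walk(clade, parent_species):
        species = None
        for ev in events(clade):
            if local(ev.tag) == "transferBack":
                found.append(
                    (parent_species, ev.get("destinationSpecies"), leaf_genes(clade))
                )
            if ev.get("speciesLocation"):
                species = ev.get("speciesLocation")
        for sub in children(clade, "clade"):
            walk(sub, species)

    for elem in root.iter():
        if local(elem.tag) == "recGeneTree":
            for phylo in children(elem, "phylogeny"):
                for top in children(phylo, "clade"):
                    walk(top, None)
    return found
```

For a real file this would be called as `transfers(ET.parse(path).getroot())`, where `path` points to one recphylo.xml file. Since the donor and recipient are reported with species tree labels, the rows from many gene trees could be concatenated and still compared directly, and the recipient gene IDs come along even when both ends of the transfer are internal nodes.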
Regards,
Neha