gCF from IQ-TREE

Matias Köhler

Jun 2, 2023, 12:27:58 PM
to PhyloNetworks users
Hello everyone -

Quick question:

Is PhyloNetworks able to handle the 'concord.cf.stat' output of a gCF analysis in IQ-TREE, or is there a workflow to convert that data into input for SNaQ?

Thanks,
Matias.

Cécile Ané

Jun 4, 2023, 12:45:24 PM
to PhyloNetworks users
Hi Matias, thanks for asking!
The short answer is 'no'.

There would be 2 issues: one technical, one conceptual (for the purpose of running SNaQ).

1. Technical: the 'concord.cf.stat' file refers to edges by an edge ID, so this file alone is insufficient. We also need a tree and a way to map each edge in that tree to its ID. Based on the IQ-TREE manual, I believe the file 'concord.cf.branch' contains this mapping, and the file 'concord.cf.tree' could provide a direct way to read in the tree with gCF information on each edge.

2. Conceptual: the gCFs from IQ-TREE contain less information than the quartet CFs that SNaQ uses. A gCF calculated by IQ-TREE is attached to a split of the full taxon set. A single "rogue" taxon that jumps across the 2 sides of the split from gene to gene would strongly decrease the gCF for that split.
A quartet CF focuses on a subset of 4 taxa, that is, a smaller 2-vs-2 split rather than a split of the full taxon set. A rogue taxon would affect the CFs of the quartets that contain it, but not the CFs of 4-taxon subsets that exclude it.
Site concordance factors (sCFs) are defined by Minh et al. (2020) quite differently from gCFs, and more similarly to quartet CFs. For an edge of interest, the sCF of the edge is the average sCF over all subsets of 4 taxa that are decisive for (or "span") the 4-way partition defined by the edge. So the sCF averages over (estimated) quartet CFs, but loses the information about the quartet CFs of individual 4-taxon subsets. sCFs are closer to quartet CFs, but not close enough for running SNaQ.
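To make the rogue-taxon argument concrete, here is a minimal Python sketch on four hypothetical gene trees over six taxa. It assumes every gene tree contains all taxa, so the simplified gCF below is just the fraction of gene trees containing a split (Minh et al. also handle gene trees that are not decisive for a split). Gene trees are encoded as sets of bipartitions; `gcf`, `quartet_cf` and the other names are hypothetical helpers, not IQ-TREE or PhyloNetworks functions.

```python
# Toy setup (hypothetical data): every gene tree contains all 6 taxa,
# so every gene tree is decisive for every split and every quartet.
TAXA = frozenset("ABCDEF")

def split_of(side):
    """Canonical form of a bipartition: the side that contains taxon 'A'."""
    side = frozenset(side)
    return side if "A" in side else TAXA - side

def tree(*sides):
    """A gene tree, represented by the set of its internal-edge bipartitions."""
    return {split_of(s) for s in sides}

# Two gene trees agree with ((A,B),C,(D,(E,F)));
# in the other two, the rogue taxon F jumps next to (A,B): (((A,B),F),C,(D,E)).
genes = [
    tree("AB", "ABC", "EF"),
    tree("AB", "ABC", "EF"),
    tree("AB", "ABF", "DE"),
    tree("AB", "ABF", "DE"),
]

def gcf(side, genes):
    """Simplified gCF: fraction of gene trees that contain the full-taxon split."""
    target = split_of(side)
    return sum(target in g for g in genes) / len(genes)

def displays(g, pair1, pair2):
    """True if some edge of gene tree g separates pair1 from pair2."""
    pair1, pair2 = set(pair1), set(pair2)
    for s in g:
        if (pair1 <= s and not pair2 & s) or (pair2 <= s and not pair1 & s):
            return True
    return False

def quartet_cf(pair1, pair2, genes):
    """Quartet CF: fraction of gene trees displaying the quartet pair1|pair2."""
    return sum(displays(g, pair1, pair2) for g in genes) / len(genes)

print(gcf("ABC", genes))              # 0.5: split ABC|DEF, broken by F
print(quartet_cf("AC", "DE", genes))  # 1.0: quartet without F
print(quartet_cf("AC", "DF", genes))  # 0.5: quartet with F
```

Taxon F moves in half of the gene trees, so the gCF of ABC|DEF drops to 0.5, while quartets that exclude F (such as AC|DE) keep a CF of 1.0; only quartets that include F (such as AC|DF) are affected.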
If I'm wrong, please let me know! (IQ-TREE gains new functionality very fast!)
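The sCF averaging described in point 2 can be sketched in the same spirit. This is a simplified, parsimony-style reading of the definition by Minh et al. (2020): for one quartet a,b|c,d, a site counts as concordant if x_a == x_b != x_c == x_d, and as discordant if it supports one of the two other quartet topologies; the quartet sCF is concordant / (concordant + discordant). IQ-TREE samples a given number of quartets per edge (`--scf NUM`) rather than enumerating all of them as below, and its exact site filtering may differ. The toy alignment and the function names are hypothetical.

```python
from itertools import product
from statistics import mean

def quartet_scf(seqs, a, b, c, d):
    """Site concordance for quartet ab|cd, counted from site patterns.
    Returns None if no site is informative for this quartet."""
    conc = disc = 0
    for xa, xb, xc, xd in zip(seqs[a], seqs[b], seqs[c], seqs[d]):
        if not {xa, xb, xc, xd} <= set("ACGT"):
            continue  # this sketch simply skips gaps and ambiguity codes
        if xa == xb and xc == xd and xa != xc:
            conc += 1  # supports ab|cd
        elif (xa == xc and xb == xd and xa != xb) or (xa == xd and xb == xc and xa != xb):
            disc += 1  # supports ac|bd or ad|bc
    total = conc + disc
    return conc / total if total else None

def edge_scf(seqs, groups):
    """Average quartet sCF over all quartets with one taxon from each of the
    4 groups around an edge (groups[0] and groups[1] lie on the same side)."""
    values = [quartet_scf(seqs, a, b, c, d) for a, b, c, d in product(*groups)]
    values = [v for v in values if v is not None]
    return mean(values) if values else None

# Hypothetical toy alignment and 4-way partition around one edge.
seqs = {"a1": "AAAA", "b1": "AAAG", "c1": "GGAA", "d1": "GGAG", "d2": "GTAA"}
groups = (["a1"], ["b1"], ["c1"], ["d1", "d2"])

print(quartet_scf(seqs, "a1", "b1", "c1", "d1"))  # 2 concordant, 1 discordant site
print(quartet_scf(seqs, "a1", "b1", "c1", "d2"))  # 1 concordant site only
print(edge_scf(seqs, groups))                     # average of the two values
```

The two per-quartet values (2/3 and 1.0) are the kind of information a quartet-based method needs; the edge-level sCF keeps only their average, about 0.83.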

If there is interest in solving the technical issue (#1), it should be fairly easy to write a function that parses the 'concord.cf.tree' file, which has a tree with gCFs as node labels, and I'd be happy to help with this. That might be useful for some purposes, but not as input for SNaQ.
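Such a parser mostly needs to pair each internal-node label with the edge above that node. Below is a minimal Python sketch (a PhyloNetworks version would be written in Julia) that records, for each labeled internal node, the set of leaves below it, which identifies the edge's split. It assumes each label holds '/'-separated numbers such as gCF/sCF; the actual layout depends on the IQ-TREE version and options, so check the manual and the file itself. Bracketed Newick comments and quoted labels containing special characters are not handled, and `clade_labels` and `parse_cf_label` are hypothetical names.

```python
import re

NAME = re.compile(r"[^(),:;]*")    # a leaf name or node label (may be empty)
LENGTH = re.compile(r":[^(),;]*")  # an optional ':branch_length'

def clade_labels(newick):
    """Map the leaf set below each labeled internal node to its raw label."""
    stack = [[]]  # leaves collected so far for each currently open clade
    labels = {}
    i = 0
    while i < len(newick):
        ch = newick[i]
        if ch == "(":
            stack.append([])
            i += 1
        elif ch == ")":
            leaves = stack.pop()
            stack[-1].extend(leaves)       # the parent clade contains these leaves too
            m = NAME.match(newick, i + 1)  # internal-node label right after ')'
            label = m.group().strip().strip("'")
            if label:
                labels[frozenset(leaves)] = label
            i = m.end()
        elif ch in ",;":
            i += 1
        elif ch == ":":
            i = LENGTH.match(newick, i).end()  # skip the branch length
        else:                                  # a leaf name
            m = NAME.match(newick, i)
            name = m.group().strip()
            if name:
                stack[-1].append(name)
            i = m.end()
    return labels

def parse_cf_label(label):
    """Split an assumed 'gCF/sCF'-style label into floats (None if not numeric)."""
    values = []
    for part in label.split("/"):
        try:
            values.append(float(part))
        except ValueError:
            values.append(None)
    return tuple(values)

# Hypothetical tree in the assumed format.
labels = clade_labels("((A,B)90.5/80:0.1,(C,D)75/60.2:0.2,E);")
for clade, label in labels.items():
    print(sorted(clade), parse_cf_label(label))
```

Keying each label by its leaf set, rather than by position in the string, makes it straightforward to match these values with edges of another tree that has the same taxa.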
Cecile
