How do low-support nodes in gene trees affect SNaQ?

Érico Polo

Dec 31, 2024, 10:10:23 PM

to PhyloNetworks users

Zhang et al. (2018) demonstrated that collapsing nodes (or branches) with low support into polytomies significantly increases the accuracy of ASTRAL. I wonder whether the same holds for SNaQ, so I started running some tests with real data. Initially, I did as recommended by Mirarab (2019) and collapsed only nodes with very low bootstrap support (i.e., bs < 10-25), but I did not notice much difference in the SNaQ results. However, when I increased the minimum support to 75 (i.e., collapsing nodes with bs < 75 into polytomies), I had the impression that the SNaQ results started to make more sense from a biogeographical point of view. I am thinking of running some systematic tests with simulations to verify this, but before embarking on this mission, I would like to know whether you have already done tests of this type, or what considerations you would raise about it.
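For readers unfamiliar with the operation, collapsing a low-support node means contracting the branch above it so that its children attach directly to its parent. Below is a minimal Python sketch of that idea. It is not the tool used for the tests described above, and the `Node` class, `collapse_low_support`, and `to_newick` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical minimal tree node; not part of any phylogenetics library.
    name: str = ""
    support: Optional[float] = None  # support of the branch above this node
    children: List["Node"] = field(default_factory=list)


def collapse_low_support(node: Node, min_support: float) -> Node:
    """Return a copy of the tree in which every internal branch with
    support below min_support is contracted into a polytomy."""
    new_children = []
    for child in node.children:
        collapsed = collapse_low_support(child, min_support)
        is_internal = bool(collapsed.children)
        low = collapsed.support is not None and collapsed.support < min_support
        if is_internal and low:
            # Contract the branch: grandchildren attach to the current node.
            new_children.extend(collapsed.children)
        else:
            new_children.append(collapsed)
    return Node(node.name, node.support, new_children)


def to_newick(node: Node) -> str:
    """Write the tree as a Newick string without the trailing semicolon."""
    if not node.children:
        return node.name
    inner = ",".join(to_newick(c) for c in node.children)
    label = "" if node.support is None else f"{node.support:g}"
    return f"({inner}){label}"


# Toy gene tree ((A,B)90,((C,D)40,E)80) with bootstrap values as node labels.
tree = Node(children=[
    Node(support=90, children=[Node("A"), Node("B")]),
    Node(support=80, children=[
        Node(support=40, children=[Node("C"), Node("D")]),
        Node("E"),
    ]),
])

collapsed_75 = to_newick(collapse_low_support(tree, 75))
collapsed_95 = to_newick(collapse_low_support(tree, 95))
```

With a threshold of 75 only the (C,D) clade is dissolved; with a threshold of 95 the toy tree becomes a star. The higher the threshold, the fewer quartets each gene tree resolves.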

Thank you very much in advance.

Érico.

Cécile Ané

Jan 3, 2025, 3:42:44 PM

to PhyloNetworks users

Hi Érico, I personally don't know of any prior study doing this. That's an interesting question!

Your question assumes that the input to SNaQ is a set of inferred gene trees, 1 tree per gene (or locus). SNaQ can take this as input, but does not *require* this type of input.

A better input to SNaQ is a table of quartet concordance factors estimated with a method that accounts for gene tree error. The downside of giving 1 tree per gene as input is ignoring gene tree error.
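To make the contrast concrete: for a given set of four taxa, a quartet concordance factor is the proportion of genes whose tree displays a given resolution of those four taxa. The Python sketch below shows only the raw-count version, i.e. the kind that takes each gene tree at face value. It is not BUCKy and not the PhyloNetworks implementation, and the function name and the `"12|34"` labels are assumptions made for illustration.

```python
from collections import Counter

# The three possible resolutions of a four-taxon set {1, 2, 3, 4}.
RESOLUTIONS = ("12|34", "13|24", "14|23")


def concordance_factors(gene_quartets):
    """Raw-count concordance factors for one four-taxon set.

    gene_quartets holds one entry per gene: the resolution displayed by
    that gene's tree, or None if the gene tree leaves the quartet
    unresolved (for example after collapsing low-support nodes).
    """
    counts = Counter(q for q in gene_quartets if q in RESOLUTIONS)
    total = sum(counts.values())
    if total == 0:
        return None  # no gene is informative for this four-taxon set
    return {r: counts[r] / total for r in RESOLUTIONS}


# 15 hypothetical genes: 10 resolve the quartet, 5 do not.
genes = ["12|34"] * 6 + ["13|24"] * 3 + ["14|23"] * 1 + [None] * 5
cf = concordance_factors(genes)
```

In this toy case the 5 unresolved genes are simply dropped, so the factors are computed from the 10 remaining genes. Every resolved gene counts fully, however weak its support, which is the sense in which this kind of input ignores gene tree error.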

In the SNaQ paper, quartet concordance factors are estimated with BUCKy, to account for gene tree error.

There is prior work looking at the impact of gene tree estimation error on BUCKy.

I mostly know of Chapter 3 (scan), p.35-52 in the book "Estimating species trees: Practical and theoretical aspects", edited by Knowles & Kubatko. In section 3.3, figure 3.2, there is an example in which filtering has no impact. It makes no difference using all loci, many of which have poorly resolved trees, versus only using 1/3 of all loci: those whose estimated tree has 95% posterior credibility (and that doesn't reject a clock). Also, sampling 100 loci (out of ~30,000) doesn't make a difference either in terms of the estimated concordance factors (uncertainty increases with fewer loci of course).

The take-home message is that BUCKy does a good job accounting for gene tree error when estimating concordance factors, at least in this example.

So if the input to SNaQ is a table of quartet concordance factors estimated with BUCKy, or with some other method that accounts for gene tree error, then I would guess that filtering low-information loci has a small impact. But again, a proper study on this would be interesting, I think.

Cécile.

Érico Polo

Jan 4, 2025, 11:41:22 AM

to PhyloNetworks users

Hello, thank you very much for your prompt reply!

I remember that when I first started experimenting with SNaQ, I "skipped" the part about generating the quartet concordance table with BUCKy because I had the impression that I wouldn't be able to use it with multiple alleles (tips) per taxon, and I went straight to a part of the tutorial that showed how to map the terminals to taxa directly from the gene trees. I work mainly in phylogeography, and I always have multiple samples representing cryptic populations or taxa, so I need this mapping. Is there a way I can do this with BUCKy?

By the way, for a while now I haven't been able to find the SNaQ page on github.io, with several tutorials, including the one I mentioned (about mapping multiple alleles). Has it really gone offline?

Érico.
