BUCKy and the delineating of a species.

Noppol Kobmoo

Oct 26, 2020, 3:45:35 AM

to BUCKy users

Hi,

I have another question about BUCKy. I’m trying to see whether the rationale behind BUCKy really fits my work.

Let me explain. The principal task in my work is to “delineate new species”. I work on fungi. Fungal molecular taxonomists generally do this by finding highly supported monophyletic clades, a practice based on the Phylogenetic Species Concept (PSC). As the number of markers increases, the criterion for recognizing such species becomes the concordance between different genes: we expect the branches leading to different species to be concordant, while the branches within species conflict because members of the same species recombine (see the attached figure, taken from Taylor et al. (2000)). Although they conflict within species, the branches within a species should always support the monophyly of that species, hence the expectation of a highly supported node at the origin of the species under the classic framework of molecular phylogenetics. That is where I find it difficult to fit in Bayesian Concordance Analysis (BCA).

If I am not mistaken about the concept of BCA, the concordance between genes for a “clade” is evaluated based on the “split” at the origin of that clade (as I understand it from Ané et al. (2007)). While a split characterizes the separation between groups of branches, a monophyletic clade classically represents the idea of a grouping with a common ancestor. When several recombining individuals belong to the same species, the split at the origin of the species can actually have a low concordance factor.

For example, in the attached figure, under the phylogenetic species criterion, ABCD and WXYZ belong to two different species, and the split ABCD|WXYZ should have a high concordance factor as well as high bootstrap support/posterior probability. Within species, however, the clades [ABCD] and [WXYZ] can each have a low concordance factor, because inside them there could be different splits (A|BCD, AB|CD or ABC|D for the clade [ABCD], and W|XYZ, WX|YZ or WXY|Z for the clade [WXYZ]), each supported by a different set of genes. Yet these clades should have high bootstrap support/posterior probability under the classical phylogenetic framework.

When I started using BUCKy for my work, I assumed that the clade representing a species of interest should have a high concordance factor, as the clade should be supported throughout the genome. However, a thorough examination of the results showed me the contrary in some cases, while simple ML or Bayesian analyses of single or concatenated genes tended to give these clades high support.

So my interpretation is that BUCKy is tailored for building “species trees” in which only a few samples are drawn from each species, but when there are several individuals per species, it may not be the best tool to “designate” new species.

I would just like some views from other users and from the developers of the software and the algorithm behind it.

Cited references.

Ané, C., Larget, B., Baum, D.A., Smith, S.D., Rokas, A., 2007. Bayesian estimation of concordance among gene trees. Mol. Biol. Evol. 24, 412–426. https://doi.org/10.1093/molbev/msl170

Taylor, J.W., Jacobson, D.J., Kroken, S., Kasuga, T., Geiser, D.M., Hibbett, D.S., Fisher, M.C., 2000. Phylogenetic Species Recognition and Species Concepts in Fungi. Fungal Genet. Biol. 31, 21–32. https://doi.org/10.1006/fgbi.2000.1228

Phylogenetics_Species_Criterion.png

Taylor_et_al_2000.pdf

DavidBaum

Oct 26, 2020, 8:53:37 PM

to BUCKy users

Let me chime in with some clarifications.

The idea of “monophyly” is ambiguous when different parts of the genome have different histories. There are several ways you can rethink monophyly in this case:

1) Identify clades that are supported by a majority of the genome (i.e., CF>0.5) and consider those “monophyletic” [i.e., use BUCKy]

2) Identify clades that are true of more of the genome than any conflicting clade (usually implemented as requiring that the clade's CF credibility interval does not overlap any conflicting clade) [i.e., use BUCKy]

3) Define “relatedness” as the average time since common ancestry across the genome (low time = high relatedness) and look for “exclusive” groups of organisms where the minimum pairwise relatedness between members of the ingroup is higher than between any ingroup-outgroup pair. Methods for doing this are described in this paper on bacterial species delimitation: https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-018-5099-6

4) Shift from a focus on gene trees to an estimate of the population tree (i.e., assume that the POPULATIONS have a treelike history such that all gene-to-gene discordance is due to incomplete lineage sorting). Then a clade and potential species should be a clade on the species tree. You can estimate a population tree (also called a species tree) from BUCKy, but it is not really the best program for that - there are many others to choose from.

5) As in 4, but allow that some gene-to-gene discordance is due to reticulation. This would involve estimating a phylogenetic network, using something like SNaQ (which uses the output from BUCKy), part of the PhyloNetworks package. One problem here is that the concept of monophyly can be tricky to apply in a network.
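
To make criteria 1 to 3 above concrete, here is a minimal Python sketch. It is only an illustration, not part of BUCKy or of the linked paper's software: the function names, the form of the inputs (a CF point estimate, (low, high) credibility intervals, a dictionary of pairwise relatedness values) and all numbers are hypothetical. Criterion 2 is read here as "the clade's interval lies entirely above the interval of every conflicting clade", and criterion 3 assumes a group with at least two members and at least one outgroup taxon.

```python
from itertools import combinations


def majority_supported(cf):
    """Criterion 1: the clade is true of more than half of the genome."""
    return cf > 0.5


def beats_conflicts(clade_interval, conflicting_intervals):
    """Criterion 2: the clade's CF interval sits above every conflicting clade's interval."""
    low, _high = clade_interval
    return all(low > other_high for _other_low, other_high in conflicting_intervals)


def is_exclusive(group, taxa, relatedness):
    """Criterion 3: the least related ingroup pair is still more related
    than any ingroup-outgroup pair."""
    group = set(group)
    outgroup = set(taxa) - group
    min_within = min(relatedness[frozenset(pair)]
                     for pair in combinations(sorted(group), 2))
    max_across = max(relatedness[frozenset((i, o))]
                     for i in group for o in outgroup)
    return min_within > max_across


# Hypothetical relatedness values (higher = more recent average common ancestry).
taxa = ["A", "B", "C", "W"]
relatedness = {
    frozenset(("A", "B")): 0.90,
    frozenset(("A", "C")): 0.80,
    frozenset(("B", "C")): 0.85,
    frozenset(("A", "W")): 0.30,
    frozenset(("B", "W")): 0.20,
    frozenset(("C", "W")): 0.25,
}

print(majority_supported(0.62))
print(beats_conflicts((0.55, 0.70), [(0.10, 0.25), (0.05, 0.20)]))
print(is_exclusive({"A", "B", "C"}, taxa, relatedness))
```

With these made-up values, the group {A, B, C} passes the exclusivity test because its weakest internal pair (0.80) is still more related than its strongest link to W (0.30).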

I hope this is helpful,

Sincerely,

David

Cécile Ané

Oct 26, 2020, 9:39:35 PM

to BUCKy users

Just in case the question is technical (instead of fundamental/philosophical): I wonder if the cause of confusion is edge versus node.

In the scenario pictured above, both ABCD and WXYZ would have high (not low) concordance factors. The concordance factors are associated with branches, not nodes. The crown node of ABCD can be “resolved” in many different ways via its daughter branches. But the concordance factor for the ABCD clade is attached to the stem edge of that clade, not the crown node, and not the daughter edges of the crown node.
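
A toy Python sketch of the edge-versus-node point: it counts, for a given split, the proportion of gene trees that contain it. This raw proportion is only a stand-in for a concordance factor (BUCKy estimates CFs in a Bayesian way that accounts for gene tree uncertainty), and the four gene trees, the nested-tuple tree representation and the function names are all hypothetical.

```python
def leaf_sets(tree, out):
    """Return the leaves under `tree`, recording every internal subtree's leaf set in `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.add(leaves)
    return leaves


def splits(tree):
    """Non-trivial splits (bipartitions of the taxa) defined by the edges of `tree`."""
    clades = set()
    all_taxa = leaf_sets(tree, clades)
    result = set()
    for side in clades:
        other = all_taxa - side
        if len(side) >= 2 and len(other) >= 2:
            result.add(frozenset([side, other]))
    return result


def concordance(split, gene_trees):
    """Proportion of gene trees whose edges include `split`."""
    return sum(split in splits(t) for t in gene_trees) / len(gene_trees)


# Four hypothetical gene trees: each resolves the inside of ABCD and WXYZ
# differently, but every one of them keeps the stem edge ABCD|WXYZ.
gene_trees = [
    ((("A", "B"), ("C", "D")), (("W", "X"), ("Y", "Z"))),
    ((("A", "C"), ("B", "D")), (("W", "Y"), ("X", "Z"))),
    (("A", ("B", ("C", "D"))), ("W", ("X", ("Y", "Z")))),
    (((("A", "D"), "B"), "C"), ((("W", "Z"), "X"), "Y")),
]

stem = frozenset([frozenset("ABCD"), frozenset("WXYZ")])
inner = frozenset([frozenset("AB"), frozenset("CDWXYZ")])

print(concordance(stem, gene_trees))   # stem edge of the ABCD clade
print(concordance(inner, gene_trees))  # one daughter edge inside ABCD
```

In this toy data set the stem edge is found in all four trees, while the AB edge inside the clade is found in only one, so the conflict among daughter edges does not lower the value attached to the stem.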
