Methodology for Subspecies Phylogeny Inference

Cleiton Valentim

Sep 19, 2025, 10:23:08 AM
to BioGeoBEARS
Hi. 

This might not be the right group, but I would like to know if it's possible to infer the age and topology of subspecies inserted within a dated phylogeny. I have the dated phylogeny of the species, but not for the subspecies. There is no molecular data at the subspecies level for this group of birds I'm studying, and the division between subspecies is based mainly on geographic distribution and a few morphological differences, such as feather coloration. I would like to insert the subspecies for the BioGeoBEARS analysis, but they would appear as polytomies, and even if I correct these polytomies, I still have methodological concerns about how to justify this decision. 

I would greatly appreciate it if anyone has any methodology to suggest.

Best,
Cleiton Valentim

Nick Matzke

Oct 24, 2025, 5:10:12 AM
to BioGeoBEARS
Hi -- your instincts are correct.  Without DNA or other coded character data, there is no way to include these.

I think what you've really got is a population genetics question and probably should go that route.

Even with DNA, it would not be clear what you are doing by using BioGeoBEARS in such a situation.  See below. Cheers!
Nick


OTUs of the phylogeny should be species/populations, not specimens
The OTUs (operational taxonomic units) at the tips of the input phylogeny should be species or monophyletic populations. A raw tree where the OTUs are individual specimens should NOT be used, unless each specimen can "stand in" for its respective species/population. If an input phylogeny has multiple specimens for each species/monophyletic population, this will severely bias the results: a specimen can only live in a single area, so a phylogeny of specimens will have all of its tips living in single areas. When all of the tips inhabit single areas, this typically strongly favors the DEC+J model over the DEC model. This is fine if each species/monophyletic population really does live in a single area, but it's a big problem if species/monophyletic populations actually live in multiple areas and you have input a specimen tree that forces each tip to live in a single area. Furthermore, if multiple specimens from the same species/population are in the phylogeny, a large number of fake "speciation"/"cladogenesis" events are introduced.
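
To put a rough number on the fake cladogenesis events: a fully bifurcating rooted tree with n tips has n - 1 splits, so a specimen tree with n specimens drawn from s monophyletic OTUs contains n - s splits that are not divergences between OTUs. The sketch below is only an illustration of that arithmetic; the function name and the example numbers are made up, and it assumes a fully resolved tree in which every OTU is monophyletic.

```python
def spurious_splits(n_specimens, n_otus):
    """Count splits in a specimen tree that are not splits between OTUs.

    Assumes a fully bifurcating rooted tree in which each OTU
    (species or population) forms a monophyletic group of specimens.
    """
    if n_otus < 1 or n_specimens < n_otus:
        raise ValueError("need at least one specimen per OTU")
    splits_in_specimen_tree = n_specimens - 1
    splits_in_otu_tree = n_otus - 1
    return splits_in_specimen_tree - splits_in_otu_tree


# Hypothetical example: 60 specimens sampled from 12 species.
extra = spurious_splits(60, 12)
print(extra)
```

In this made-up case, most of the splits that a cladogenetic model would treat as speciation events are really just divergences among specimens of the same species.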

This subtle issue is often ignored, because most phylogenetic models (for DNA, amino acids, Mkv model for morphology, etc.) assume that nothing special happens at cladogenesis: whatever state is in the ancestor is instantaneously passed on to both daughters after cladogenesis. They are "continuous-time" models. In these models, all that matters is branch length, not the number of cladogenesis events, so a specimen tree can be used without problems (or at least without major problems).

In contrast, in biogeographical models of geographic range evolution (Lagrange DEC, BioGeoBEARS DEC, DEC+J, etc.), the "character" (geographic range) changes according to an anagenetic model along branches, and changes according to a cladogenetic model at splitting events (see the Figure at: http://phylo.wikidot.com/biogeobears#BioGeoBEARS_supermodel_graphic for a summary of the processes used by different models). Thus, it is problematic to introduce fake cladogenesis events that just represent the divergence of specimens, rather than the divergence of monophyletic populations/species.

The solution, if you have a specimen tree, is to collapse it to a species or population tree. Then, in any case where specimens from a particular population/species occupy multiple areas, the relevant tip is coded as living in e.g. 2 areas.

Code for collapsing a specimen tree to a species tree, given a specimen phylogeny, and an Excel table specifying which specimens go into which species/population OTUs, is found in the function prune_specimens_to_species.
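
The range-coding half of that step can be illustrated with a minimal Python sketch. This is not the prune_specimens_to_species function and it does not touch the tree; it only shows how specimen-level area records collapse to one (possibly multi-area) range per OTU. The specimen names, OTU names, area labels, and the helper collapse_ranges are all hypothetical.

```python
from collections import defaultdict


def collapse_ranges(specimen_to_otu, specimen_to_area, areas):
    """Merge single-area specimen records into one range per OTU.

    Returns {otu: presence/absence string}, with one character per
    entry in `areas` ("1" = at least one specimen of the OTU occurs there).
    """
    otu_areas = defaultdict(set)
    for specimen, otu in specimen_to_otu.items():
        otu_areas[otu].add(specimen_to_area[specimen])
    return {
        otu: "".join("1" if area in occupied else "0" for area in areas)
        for otu, occupied in otu_areas.items()
    }


# Hypothetical data: two specimens of sp1 from different areas, one of sp2.
areas = ["A", "B", "C"]
specimen_to_otu = {"sp1_ind1": "sp1", "sp1_ind2": "sp1", "sp2_ind1": "sp2"}
specimen_to_area = {"sp1_ind1": "A", "sp1_ind2": "B", "sp2_ind1": "C"}

ranges = collapse_ranges(specimen_to_otu, specimen_to_area, areas)
for otu in sorted(ranges):
    print(otu, ranges[otu])
```

Here sp1 ends up coded as occupying two areas (A and B) at a single tip, instead of appearing as two single-area tips separated by a fake split.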

