Hi,
the reason why we have different approaches like bPTP and GMYC is that they may give different results. Neither one identifies species per se (keep in mind that a species is primarily a systematic concept rather than an absolute biological entity); they only give you a (quite robust) hypothesis of how the species could be delimited. It can be 2 species or 4, whatever makes more sense given further lines of evidence.
For the test example, both obviously underestimate the number of species: chimps, bonobos and humans are (no matter which of the c. two dozen species concepts we apply) clearly different species.
For your data, if you want to test the number-of-species hypothesis, you would apply both approaches as well to see whether their estimates differ. If not, all's fine, and you can take the algorithmic result as the basis for your discussion/refinement of the species concept. If they do, the results give you something like a possible min-max range within their models and an idea of the primary discrimination capacity of your molecular data.
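To make the comparison concrete, here is a minimal sketch in Python. It assumes you have already exported each method's result as a simple mapping from tip label to putative-species id; the function name, the dictionary format and the tip labels are my own placeholders, not output formats of bPTP or GMYC.

```python
def summarize_delimitations(bptp, gmyc):
    """Compare two species delimitations of the same set of tips.

    bptp, gmyc: dict mapping tip label -> putative species id
    (hypothetical input format, to be built from each tool's output).
    """
    if set(bptp) != set(gmyc):
        raise ValueError("both partitions must cover the same tips")

    def clusters(assignment):
        # Turn tip -> species id into a set of tip groups,
        # so that the arbitrary species ids do not matter.
        groups = {}
        for tip, species in assignment.items():
            groups.setdefault(species, set()).add(tip)
        return {frozenset(g) for g in groups.values()}

    a, b = clusters(bptp), clusters(gmyc)
    return {
        "min_species": min(len(a), len(b)),
        "max_species": max(len(a), len(b)),
        "identical": a == b,              # same groups of tips in both results
        "shared_clusters": len(a & b),    # groups recovered by both methods
    }


# Made-up example with five tips
bptp_result = {"t1": "A", "t2": "A", "t3": "B", "t4": "C", "t5": "C"}
gmyc_result = {"t1": 1, "t2": 1, "t3": 2, "t4": 2, "t5": 2}
summary = summarize_delimitations(bptp_result, gmyc_result)
print(summary)
```

If "identical" is true, you can go straight to discussing the result; if not, "min_species" and "max_species" give the range mentioned above, and the groups not shared by both methods are the ones to look at more closely.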
In your case (looking at the names of the leaves in the 28S and COI trees), the latter is the tricky bit.
Just looking at the trees, it's very straightforward to define taxa. Take your 28S tree for example.
It's obvious from the topology and branch-length distribution that you have 5 clades of similar quality regarding intraclade coherence and interclade distinction.
It's equally obvious that the genus concept doesn't fulfil the criterion of cladistic classification. To make the genera "reciprocally monophyletic" (in a molecular-phylogenetic sense), reniformis needs to be moved from Rotylenchulus to Hoplolaimus.
The question for your data may be: how many species do the two genera include? Some of the major clades include only one species, others several but genetically quite similar ones. A likely result for bPTP or GMYC would hence be at least 5, possibly up to 8/9 species: Clade 2 has a higher intra-clade diversity than the other clades. Also within Clade 1 you see a deep split; the pararobustus are genetically clearly distinct from the other species of Clade 1. Clade 1 is also the least-distinct one: it has the shortest root branch, and the first-diverging tip in the pararobustus clade is substantially different from the rest. This is a situation where bPTP and GMYC, which use different approaches to estimate the number of species, may decide on different numbers.
Looking at the names in each of the five main clades, we would expect that any species discrimination algorithm used on the 28S data will underestimate the number of species in Clade 1. Let's say bPTP and GMYC give you 7 species standing against the 14 annotated in the tree (I suppose it's the classic morphotaxa). What does this result tell us?
- The number of morphologically distinguished species is too high. Note how poorly some of them group, so reducing the number makes sense. If dubius is nearly identical to some seinhorsti, but the latter's intra-species diversity is higher, there's little molecular reason to keep it as a different species. Especially if you want to apply a cladistic classification (which many want and expect, I don't): only clades with high support may be named. Following the bPTP/GMYC result, one could drop indicus, dubius, columbus and seinhorsti in favour of whichever of these species epithets has priority.
- The resolution of the gene region used is simply not high enough to discern even good species. Hence, bPTP/GMYC underestimate the number of species.
Likewise you may get some 8+ species for the COI data, and if you use the combined tree (28S+COI), you may end up with a number close to the number of morphotaxa (species as labelled).
As I said, it's just concepts. But algorithms like bPTP/GMYC can help us to objectivise (within limitations) species and also help us to assign unnamed tips to a species, e.g. they may give us as a result that all Hoplolaimus sp. individuals in Clade 2 are the same species as H. stephanus and that the KY849910 individual may be mislabelled.
Or they may give us an argument to drop poorly described morphotaxa (it's never a good sign when taxonomists call an invertebrate species the dubious one).
But in cases like this, where a systematic concept already exists, you need further discussion and arguments to erect or drop species when the bPTP and GMYC results don't match the (phenotypic) tip labels or give a lower number. If the number is higher, it's much easier to explain (especially in a group like the nematodes): pseudocryptic or cryptic speciation.
Cheers, Guido.
PS Beyond the species question, I see a different issue with your data: the Bayesian trees indicate a deep incongruence. I.e. you deal with conflicting nuclear (28S = nuclear-encoded 25S rDNA?) and mitochondrial (COI = cox1?) genealogies, which points to incomplete lineage sorting during the early diversification (incongruence towards the leaves can probably be explained by the usual population dynamics). The two subclades of 28S-Clade 1 are not part of the same cox1 lineage. It's not a dramatic incongruence; it may even be a signal artefact in the cox1 data. PP = 0.54 is the Bayesian chain saying, "I just randomly placed this subtree". Check the bootstrap support: a branch with low PP but higher BS, or a BS analysis that prefers a competing alternative, indicates that the Bayesian chain got trapped in a suboptimum. If both PP and BS are low for all topological alternatives found in the Bayesian sampled topologies and the bootstrap pseudoreplicate trees, the data simply hold too little signal to make a call.
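The PP/BS check can be written down as a small decision rule. This is only a sketch: the function name and the cut-offs (PP below 0.7 and BS below 50 counted as "low") are my own assumptions for illustration, so pick thresholds that suit your data.

```python
def interpret_branch(pp, bs, best_alt_bs=0.0, pp_low=0.7, bs_low=50.0):
    """Rough reading of one branch from the Bayesian tree.

    pp          : posterior probability of the branch (0-1)
    bs          : bootstrap support (%) for the same branch
    best_alt_bs : highest bootstrap support (%) of any conflicting alternative
    pp_low, bs_low : assumed cut-offs for 'low' support (hypothetical values)
    """
    if pp >= pp_low:
        # Branch is not poorly supported by the Bayesian analysis.
        return "not_flagged"
    if bs >= bs_low:
        # Low PP but clearly higher BS for the same branch.
        return "chain_possibly_trapped"
    if best_alt_bs >= bs_low:
        # Bootstrap prefers a competing alternative.
        return "chain_possibly_trapped"
    # Low PP and low BS for the branch and all its alternatives.
    return "little_signal"


print(interpret_branch(0.54, 80.0))                    # low PP, higher BS
print(interpret_branch(0.54, 20.0, best_alt_bs=75.0))  # BS prefers an alternative
print(interpret_branch(0.54, 20.0, best_alt_bs=30.0))  # nothing is supported
```

The first two calls correspond to the "trapped in a suboptimum" situation, the third to data that cannot decide between the alternatives.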
