bPTP and GMYC


Emmanuel Olajide

Sep 1, 2022, 10:44:06 AM
to PTP and GMYC species delimitation
Dear all,

I am new to species delimitation. I was trying the exercise with the apes data from the BEAST website. bPTP gave me 4 different species, but my interpretation of the GMYC output is that I have two species. I want to understand how it works before working with my own data.

Can someone advise and guide me?
I have attached the output from GMYC and bPTP.

I have also attached the phylogenetic trees for the 28S and COI of my data. Can I go ahead and run GMYC and bPTP on them? I would appreciate any advice and suggestions.

Thank you so much. 
apes_figtree
GMYC output.png
bPTP output.PNG
28S.jpg
COI_final.jpg

Das Grimm

Sep 2, 2022, 8:41:14 AM
to PTP and GMYC species delimitation
Hi,

the reason why we have different approaches like bPTP and GMYC is that they may give different results. Neither one identifies species per se (keep in mind that a species is primarily a systematic concept rather than an absolute biological entity); they only give you a (quite robust) hypothesis of how it could be. It can be 2 species or 4, whichever makes more sense given further lines of evidence.

For the test example, both obviously underestimate the number of species: chimps, bonobos and humans are (no matter which of the c. two dozen species concepts we apply) clearly different species.

For your data, if you want to test the number-of-species hypothesis, you would apply both approaches as well to see whether their estimates differ. If not, all's fine, and you can take the algorithmic result as the basis for your discussion/refinement of the species concept. If they do, the results give you something of a possible min-max range within their models, and show the primary discrimination capacity of your molecular data.
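
To make the comparison concrete, here is a minimal Python sketch. It assumes you have transcribed each program's result by hand into a tip-to-cluster mapping; the tip names, the cluster labels and the helper names `clusters` and `compare` are all made up for illustration and are not output from bPTP or GMYC.

```python
from collections import defaultdict


def clusters(assignment):
    """Turn a {tip: label} mapping into a set of frozensets, one per putative species."""
    groups = defaultdict(set)
    for tip, label in assignment.items():
        groups[label].add(tip)
    return {frozenset(members) for members in groups.values()}


def compare(assignment_a, assignment_b):
    """Summarise how two delimitation results on the same tips relate to each other."""
    ca, cb = clusters(assignment_a), clusters(assignment_b)
    return {
        "n_a": len(ca),
        "n_b": len(cb),
        # The min-max range of species numbers spanned by the two methods.
        "range": (min(len(ca), len(cb)), max(len(ca), len(cb))),
        # Putative species recovered with identical membership by both methods.
        "shared": ca & cb,
        "identical": ca == cb,
    }


# Hypothetical hand-transcribed results (labels are arbitrary, only grouping matters).
bptp = {"t1": "A", "t2": "A", "t3": "B", "t4": "C", "t5": "D"}
gmyc = {"t1": "X", "t2": "X", "t3": "X", "t4": "Y", "t5": "Y"}

result = compare(bptp, gmyc)
print(result["range"], result["identical"])
```

Clusters listed under `shared` are the ones both methods agree on; the remaining tips are where further lines of evidence are needed.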

In your case (looking at the names of the leaves in the 28S and COI trees), the latter is the tricky bit.

Just looking at the trees, it's very straightforward to define taxa. Take your 28S tree for example.

It's obvious from the topology and branch-lengths distribution that you have 5 clades of similar quality regarding intraclade coherence and interclade distinction:

5Clades.png

And that the genus concept doesn't fulfil the criterion of cladistic classification. To make the genera "reciprocally monophyletic" (in a molecular-phylogenetic sense), reniformis needs to be moved from Rotylenchulus to Hoplolaimus.

The question for your data may be, how many species have the two genera? Some of the major clades include only one species, others several but quite similar ones genetically. A likely result for bPTP or GMYC would hence be at least 5, possibly up to 8/9 species: Clade 2 has a higher intra-clade diversity than the other clades. Also within Clade 1 you see a deep split, the pararobustus are genetically clearly distinct from the other species of Clade 1. Clade 1 is also the least-distinct one: it has the shortest root branch and the first-diverging tip in the pararobustus clade is substantially different from the rest; this is a situation where bPTP and GMYC may decide on different numbers using different approaches to estimate the number of species.

Looking at the names in each of the five main clades, we would expect that any species discrimination algorithm used on the 28S data will underestimate the number of species in Clade 1. Let's say bPTP and GMYC give you 7 species standing against the 14 annotated in the tree (I suppose it's the classic morphotaxa). What does this result tell us?
  1. The number of morphologically distinguished species is too high. Note how poorly some of them group, so reducing the number makes sense. If dubius is nearly identical to some seinhorsti, but the latter's intra-species diversity is higher, there's little molecular reason to keep it as a different species. Especially if you want to apply a cladistic classification (which many want and expect, I don't): only clades with high support may be named. Following the bPTP/GMYC result, one could drop indicus, dubius, columbus and seinhorsti for whichever of these species epithets has priority.
  2. The resolution of the gene region used is simply not high enough to discern even good species. Hence, bPTP/GMYC underestimate the number of species.
Likewise you may get some 8+ species for the COI data, and if you use the combined tree (28S+COI), you may end up with a number close to the number of morphotaxa (species as labelled).

As I said, it's just concepts. But algorithms like bPTP/GMYC can help us to objectivise (within limitations) species and also help us to assign unnamed tips to a species, e.g. they may give us as a result that all Hoplolaimus sp. individuals in Clade 2 are the same species as H. stephanus and that the KY849910 individual may be mislabelled.
Or they may give us an argument to drop poorly described morphotaxa (it's never a good sign when taxonomists call an invertebrate species the dubious one).

But in cases like this, where a systematic concept already exists, you need further discussion and arguments to erect or drop species when the bPTP and GMYC results don't match the (phenotypic) tip labels or give a lower number. If the number is higher, it's much easier to explain (especially in a group like the nematodes): pseudocryptic or cryptic speciation.

Cheers, Guido.

PS Beyond the species question, I see a different issue with your data: the Bayesian trees indicate a deep incongruence. I.e. you deal with conflicting nuclear (28S = nuclear-encoded 25S rDNA?) – mitochondrial (COI = cox1?) genealogies, suggesting incomplete sorting during the early diversification (incongruence towards the leaves can probably be explained by the usual population dynamics). The two subclades of 28S-Clade 1 are not part of the same cox1 lineage. It's not a dramatic incongruence; it may even be a signal artefact in the cox1 data. PP = 0.54 is the Bayesian chain saying, "I just randomly placed this subtree". Check the bootstrap support: a branch with low PP but higher BS, or a BS that prefers a competing alternative, indicates that the Bayesian chain got trapped in a suboptimum. If both PP and BS are low for all topological alternatives found in the Bayesian sampled topologies and the bootstrap pseudoreplicate trees, the data have too little pattern to make a call.

Tanglegram.png
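
The PP-versus-BS check from the PS can be written down as a small decision rule. The Python sketch below is only an illustration: the cut-offs (PP 0.95, BS 70) are common conventions assumed here, not values from this thread, and the function name `triage_branch` is hypothetical.

```python
def triage_branch(pp, bs, bs_alternative=0.0, pp_cut=0.95, bs_cut=70.0):
    """Classify one branch from its posterior probability and bootstrap support.

    pp             posterior probability of the branch (0..1)
    bs             bootstrap support of the same branch (0..100)
    bs_alternative bootstrap support of the best competing alternative (0..100)
    pp_cut, bs_cut assumed cut-offs; adjust to your own conventions
    """
    pp_ok = pp >= pp_cut
    bs_ok = bs >= bs_cut
    if pp_ok and bs_ok:
        return "supported by both analyses"
    if not pp_ok and (bs_ok or bs_alternative >= bs_cut):
        # Low PP while the bootstrap backs this branch or a competitor.
        return "Bayesian chain may be trapped in a suboptimum"
    if not pp_ok:
        # Low PP, low BS, and no well-supported alternative either.
        return "too little signal to make a call"
    return "supported by PP only, inspect further"


# PP = 0.54 as quoted above; the bootstrap values are invented for the example.
print(triage_branch(0.54, 35.0, bs_alternative=20.0))
print(triage_branch(0.54, 35.0, bs_alternative=82.0))
```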

Emmanuel Olajide

Sep 4, 2022, 4:20:04 PM
to PTP and GMYC species delimitation

Hi Guido,

 

Thank you so much for taking the time to check my data and for the amazing feedback. I am taking my time to read and understand your response.

Now I know that species delimitation approaches like bPTP, GMYC and the others only give hypotheses, so the results of these algorithms should be used with the utmost caution. For example, both D2-D3 28S Hoplolaimus seinhorsti sequences in bold were almost given a new name as a new species because of one morphological feature that is not very clear. But on checking the slides again, the nematode is very similar to Hoplolaimus seinhorsti.


 28S__.png


For my data, the interest is in the Hoplolaimus species; the other genera were included because there were not enough Hoplolaimus sequences on GenBank. When I feed my data into the bPTP and GMYC algorithms, I will use the Hoplolaimus sequences alone.

For the 28S, I can say from the topology and branch-lengths distribution that I have 2 clades of similar quality regarding intraclade coherence and interclade distinction. (Thank you)

To make the genera "reciprocally monophyletic" (in a molecular-phylogenetic sense), reniformis needs to be moved from Rotylenchulus to Hoplolaimus.

I think I understand what you mean. Rotylenchus and Hoplolaimus, these two genera are very different based on morphology... we don’t have much molecular data on NCBI for these genera.

 

The question for your data may be, how many species have the two genera? Some of the major... Everything you said here is clear, thank you. Now I understand better.

The number of morphologically distinguished species is too high...

I will check whether those sequences on NCBI come with morphological and morphometric data for dubius, seinhorsti, indicus and columbus, to see how the names were assigned... Reducing the number will make sense.

For the resolution of the used gene region...

I have trees for 18S and ITS as well (attached; my sequences are in bold).

ITS_18S.JPG

PS: In the ITS tree, the sequence initially labelled Hoplolaimus papuaensis n. sp. is now Hoplolaimus seinhorsti.

Thank you for the additional information about the issue with my data: no agreement between the 28S and COI genealogies.

I will read and reread your response so as not to miss anything you have said.

 

PS: My sequences are the ones in bold, Hoplolaimus pararobustus and Hoplolaimus seinhorsti. I aligned them and built the tree together with the available Hoplolaimus sequences on GenBank.

Within my COI Hoplolaimus pararobustus sequences (in bold), intraspecific variation is large, up to 24 nucleotide differences, and these are from the same population I used for 28S. The same DNA...
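
For reference, a count like "up to 24 nucleotide differences" can be reproduced from an alignment with a few lines of Python. This is only a sketch under stated assumptions: the sequences below are short invented strings, not the COI data, and the helpers `n_differences` and `max_pairwise_differences` are hypothetical names. Gaps and ambiguity codes are skipped rather than counted as differences.

```python
from itertools import combinations

VALID = set("ACGT")


def n_differences(seq_a, seq_b):
    """Count mismatching columns between two aligned sequences, ignoring gaps/ambiguities."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in VALID and y in VALID and x != y
    )


def max_pairwise_differences(alignment):
    """Largest pairwise difference among all sequences in a {name: sequence} mapping."""
    return max(
        n_differences(alignment[p], alignment[q])
        for p, q in combinations(alignment, 2)
    )


# Invented toy alignment standing in for sequences of one labelled species.
toy = {
    "ind1": "ACGTACGTAC",
    "ind2": "ACGTTCGTAC",
    "ind3": "ACG-ACGTAA",
}
print(max_pairwise_differences(toy))
```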

 

Thank you,
Fig 5_ITS.jpg
Fig 7 18S.jpg

Das Grimm

Sep 5, 2022, 6:31:55 AM
to PTP and GMYC species delimitation
Hi,

by comparing the ribosomal trees you can directly assess the capacity of the genes to recognise species. No matter which organism, it's always like this:

  • 18S is the most conserved bit, it gives you good deep signal but usually struggles at the leaves (most sites are not free to evolve, so you have mutation biases the closer you get to the leaf)
  • ITS is the most variable bit (ITS1 and ITS2 in higher organisms; if I remember correctly nematodes don't have a 5.8S rRNA gene yet), it's often a litmus test for species, especially if they already sort to some degree in the rRNA gene trees (which they do in your case)
  • 25S is in between (its core structure bits are as conserved as 18S but it has several variable intersections, which can be as divergent as the noncoding but transcribed ITS)
Pending the bPTP and GMYC estimates, one would right now have little argument to erect papuaensis as a new species. One would rather fuse it into the same species as columbus and seinhorsti, as they don't sort in the ITS tree but still represent near-identical tips (short terminal branches) of a well-distinct clade that is unambiguously supported across the entire 35S (45S) rDNA cistron (18S-ITS-25S).

If columbus and seinhorsti were morphologically easy to discern, and papuaensis too, one could just argue that the markers used are not variable enough to discern this group of closely related species. If this is the case, or if the new species is geographically far apart from the other sampled ones, and bPTP/GMYC fail to recover it as a new putative species, you may try median networks and the like to check whether there are unique alignment patterns that could nonetheless be diagnostic at the species level.
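
A much simpler stand-in for that check (not a median network) is to scan the alignment for columns where one group is fixed for a state that no other sequence shows. The Python sketch below assumes an aligned {name: sequence} mapping and a {name: group} mapping; all names, groups and sequences are invented, and `diagnostic_sites` is a hypothetical helper.

```python
def diagnostic_sites(alignment, groups, target):
    """Return 0-based alignment columns where `target` is fixed for a state absent elsewhere."""
    inside = [seq for name, seq in alignment.items() if groups[name] == target]
    outside = [seq for name, seq in alignment.items() if groups[name] != target]
    if not inside or not outside:
        raise ValueError("need at least one sequence inside and outside the target group")
    length = len(inside[0])
    sites = []
    for i in range(length):
        in_states = {seq[i] for seq in inside}
        out_states = {seq[i] for seq in outside}
        # Fixed within the group, not a gap, and never seen outside the group.
        if len(in_states) == 1 and "-" not in in_states and in_states.isdisjoint(out_states):
            sites.append(i)
    return sites


# Invented toy alignment: two sequences of a candidate species, two of its relatives.
alignment = {"a1": "ACGTA", "a2": "ACGTA", "b1": "ACCTA", "b2": "ATCTA"}
groups = {"a1": "candidate", "a2": "candidate", "b1": "other", "b2": "other"}

print(diagnostic_sites(alignment, groups, "candidate"))
```

With only a handful of sequences per group, such columns are weak evidence; they merely point to where it is worth looking more closely.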

If you have a good individual overlap, you can concatenate the 18S, ITS and 25S data. But since you are going to apply bPTP and GMYC and are looking for species boundaries, only concatenate data from the same individual/population, not across the labelled species/geographic provenances.
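
As a sketch of what "only from the same individual" means in practice, the Python snippet below joins markers strictly by individual ID and silently leaves out any individual that lacks one of the markers, instead of filling the gap with a sequence from another individual of the same labelled species. The IDs and sequences are invented and `concatenate_by_individual` is a hypothetical helper; it also assumes each marker is already aligned.

```python
def concatenate_by_individual(markers):
    """Concatenate aligned markers per individual, keeping only individuals present in all.

    markers: {marker_name: {individual_id: aligned_sequence}}
    Marker order in the result follows the insertion order of `markers`.
    """
    shared = set.intersection(*(set(seqs) for seqs in markers.values()))
    return {
        ind: "".join(markers[name][ind] for name in markers)
        for ind in sorted(shared)
    }


# Invented example: only ind1 was sequenced for all three regions.
markers = {
    "18S": {"ind1": "ACGT", "ind2": "ACGA"},
    "ITS": {"ind1": "TTGA", "ind3": "TTGG"},
    "25S": {"ind1": "GGCC", "ind2": "GGCA"},
}

combined = concatenate_by_individual(markers)
print(combined)
```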

Good species hunting, Guido


Emmanuel Olajide

Sep 7, 2022, 5:37:39 AM
to PTP and GMYC species delimitation
Hi Guido,

Thank you so much for the explanation and your input. Such an amazing community.
I have the result for bPTP, and I am running GMYC locally in R. I am having some issues, but I should have the result before the end of the week.
I will share the result with you.

Thank you.
