Hi Jeong,
This sounds like you're probably too close to the speciation horizon to rely solely on phylogenetic tree inference.
3rd codon saturation in mitochondrial genes: To test for the topological effect of potential saturation, just re-run the inference excluding the 3rd codon position and, as a cross-check, run an analysis on the 3rd codon position only. Tangle the two trees (e.g. with Dendroscope) or compare them using weighted Robinson-Foulds distances: if their topologies are very different, the signal in either the 1st and 2nd or the 3rd codon position is severely biased. In plants, for instance, mitochondrial genes are utterly useless at the intra-family level: they mostly show mutations at the 3rd codon position but, even if a test for saturation comes back negative, they are not phylogenetically sorted along the species/mothers tree at all.
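Splitting the alignment by codon position is easy to script. Below is a minimal Python sketch, assuming the cox1 alignment is in frame, has no frame-shifting indels, and is held as a plain dict of name to sequence. The function name `split_codon_positions`, the `frame_offset` parameter and the toy sequences are made up for illustration.

```python
def split_codon_positions(alignment, frame_offset=0):
    """Split aligned coding sequences into a 1st+2nd and a 3rd codon position partition.

    alignment: dict mapping sample name -> aligned sequence (all the same length).
    frame_offset: index of the first column that is a 1st codon position (assumption).
    """
    pos12, pos3 = {}, {}
    for name, seq in alignment.items():
        coding = seq[frame_offset:]
        # Columns 0 and 1 of every triplet are the 1st and 2nd codon positions.
        pos12[name] = "".join(b for i, b in enumerate(coding) if i % 3 != 2)
        # Column 2 of every triplet is the 3rd codon position.
        pos3[name] = coding[2::3]
    return pos12, pos3


# Toy data, not real cox1 sequences.
toy = {"sample_A": "ATGGCTTTA", "sample_B": "ATGGCCTTG"}
pos12, pos3 = split_codon_positions(toy)
print(pos12)
print(pos3)
```

Each of the two resulting partitions can then be written out and analysed separately with the tree-inference program you already use.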
To make a call, the ITS2 data can be helpful.
ITS2 is subject to fundamentally different evolutionary constraints than mitochondrial genes. Two antagonistic processes shape its sequences:
- Being part of the nuclear-encoded 35S (or 45S) rDNA cistron arrays, ITS sequences are inherited from both parents. Since these arrays comprise extremely conserved sequence bits (and hundreds if not thousands of copies), they are prone to crossing over, so an F1 heterozygote can pass on its paternal and maternal but also recombinant nrDNA arrays. In plants that lack selective fertilisation barriers, especially the wind-pollinated ones, this leads to puzzling intragenomic variation. If you do direct classic Sanger sequencing or NGS such as genome skimming, you end up with a lot of ambiguous base calls in what is effectively an ITS(2) consensus sequence.
- On the other hand, concerted evolution homogenises the arrays, also across genomes, and this process adds to the homogenising effect of inbreeding due to selective fertilisation (e.g. active choice of the sex partner, as in birds with their particular courting behaviour). I have not looked at a lot of animal ITS data sets, but this seems to be more or less the rule in most animals. Your ITS(2) sequences are free of ambiguous bases.
ITS2 indels: single-nt insertions are quite common and sometimes really specific. If the rest of the alignment looks like this (few SNPs, no prominent length polymorphism), you can just run a haplotype network with gaps treated as a 5th base to get an idea of the ITS2 differentiation and the main genotypes/genetic lineages.
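To see what treating gaps as a 5th base changes in practice, here is a small Python sketch that counts pairwise differences either way. It is only an illustration: the function name `count_differences`, its `gap_as_fifth_state` switch and the toy ITS2 sequences are hypothetical, and every gap column counts as one change, which is only sensible for the single-nt indels described above.

```python
from itertools import combinations


def count_differences(seq_a, seq_b, gap_as_fifth_state=True):
    """Count differing alignment columns between two aligned sequences.

    With gap_as_fifth_state=True, '-' is compared like A, C, G or T.
    With gap_as_fifth_state=False, columns containing a gap are skipped
    (i.e. gaps are treated as missing data).
    """
    diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if not gap_as_fifth_state and "-" in (a, b):
            continue
        if a != b:
            diffs += 1
    return diffs


# Toy ITS2 alignment, not real data.
its2 = {"ind1": "ACGT-A", "ind2": "ACGTTA", "ind3": "ACCTTA"}
for (n1, s1), (n2, s2) in combinations(its2.items(), 2):
    print(n1, n2,
          count_differences(s1, s2, gap_as_fifth_state=True),
          count_differences(s1, s2, gap_as_fifth_state=False))
```

In the toy data, ind1 and ind2 differ only by the indel: they are one step apart with gaps as a 5th state and identical when gaps are ignored, which is exactly the information a specific single-nt insertion would otherwise lose.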
cox1 diversity > ITS2: This seems counter-intuitive because one is a coding gene and the other isn't. But 3rd codon positions in mitochondrial genes are oddly divergent and easily saturated, and ITS2 can be oddly low-divergent (in contrast to ITS1, it's not an intergenic spacer but evolved from a variable stem-loop region in the original large subunit ribosomal RNA gene). Third-codon position saturation in mitochondrial genes like cox1 does indeed inflate branch lengths, eventually leading to long-branch attraction (LBA). On the other hand, your ITS2 tree has very short branches. While that is a nuisance for tree inference (too little signal), it's an asset for cross-checking the cox1 data:
- If the 1st+2nd codon-based cox1 tree is congruent with the ITS2 tree (or differentiation pattern), the 3rd codon position is biased by saturation effects.
- If the 3rd codon-based cox1 tree is congruent with the ITS2 tree (or differentiation pattern), the 1st+2nd codon positions do not have enough signal to resolve these deep relationships.
I wrote "tree (or differentiation pattern)" because of the short branches and very flat terminal subtrees in the ITS2 tree(s): the signal in the ITS2 may not have the amplitude needed for probabilistic tree inference; this looks like differentiation at the level of a parsimony haplotype network.
Where to go from here? I would first run a haplotype network on the ITS2 data (median network, reduced median or statistical parsimony) to identify the main ITS2 genotypes in your data sample. Then select one placeholder per main ITS2 genotype, using only individuals for which you also have cox1 data, to put together a (much) reduced but representative tip set with no missing data for either marker. Maybe try the same with the 1st and 2nd codon positions of cox1.
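The placeholder selection can be sketched in a few lines of Python. This is a rough illustration, not a replacement for the network analysis: it treats identical ITS2 sequences as one genotype, and calling a genotype "main" when at least `min_size` individuals share it is an arbitrary assumption. The function name `pick_placeholders` and the sample names are made up.

```python
from collections import defaultdict


def pick_placeholders(its2, cox1_samples, min_size=2):
    """Pick one individual per main ITS2 genotype that also has cox1 data.

    its2: dict mapping individual -> aligned ITS2 sequence.
    cox1_samples: set of individuals for which cox1 was sequenced.
    min_size: minimum number of individuals sharing a genotype for it to
              count as a 'main' genotype (assumed threshold).
    """
    groups = defaultdict(list)
    for name, seq in its2.items():
        groups[seq.upper()].append(name)

    chosen = {}
    for seq, members in groups.items():
        if len(members) < min_size:
            continue  # rare genotype, not a 'main' one under this threshold
        with_cox1 = sorted(m for m in members if m in cox1_samples)
        if with_cox1:
            chosen[seq] = with_cox1[0]  # first name alphabetically as placeholder
    return chosen


# Toy data, not real sequences.
its2 = {"ind1": "ACGT-A", "ind2": "ACGT-A", "ind3": "ACGTTA",
        "ind4": "ACGTTA", "ind5": "ACCTTA"}
cox1_samples = {"ind2", "ind3", "ind4", "ind5"}
print(pick_placeholders(its2, cox1_samples))
```

With the toy data, ind2 and ind3 end up as placeholders; ind5 carries a singleton genotype and is skipped under the assumed threshold.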
Use that set to run the comparative tree inferences: 1st+2nd codon cox1 vs 3rd codon cox1 vs ITS2. By tangling the resultant trees against each other pairwise (Dendroscope), overlaying them (strict consensus network), establishing the weighted Robinson-Foulds distances and/or doing an all vs. all AU-test (or SH-test, implemented in classic RAxML), you can assess whether there is any significant topological conflict in the data used.
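For a quick feel of what the Robinson-Foulds comparison measures, here is a self-contained Python sketch of the plain (unweighted) Robinson-Foulds distance: the number of non-trivial splits found in one tree but not in the other. It assumes trees are written as nested tuples of tip names over the same tip set; the weighted variant mentioned above additionally needs branch lengths and is better left to a dedicated tool. All names and the toy topologies are hypothetical.

```python
def leaf_set(tree):
    """Return the set of tip names below a node (a tip is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions of an (unrooted) tree.

    Each split is stored as the side that does NOT contain a fixed
    reference tip, so the same split always has the same representation.
    """
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = all_leaves - clade if reference in clade else clade
        # Skip trivial splits (single tip vs the rest) and the root itself.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    walk(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Unweighted RF distance: number of splits present in only one tree."""
    return len(splits(tree_a) ^ splits(tree_b))


# Toy topologies standing in for the 1st+2nd codon and 3rd codon cox1 trees.
tree_12 = ((("A", "B"), "C"), ("D", "E"))
tree_3 = ((("A", "C"), "B"), ("D", "E"))
rf = robinson_foulds(tree_12, tree_3)
print(rf)
```

The two toy trees agree on the split separating D+E from the rest but disagree on whether A groups with B or with C, so two splits are unique to one tree or the other.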
If there's no significant conflict, you just combine all data to infer an all-inclusive tree. If the partitions are products of the same species tree, the low-divergent partitions will sort out the deep splits, and the 3rd codon positions of cox1 will resolve the tips. The branch lengths and branching pattern will tell you how good a species the ROME populations are (the shared ITS2 genotype may just be ancestral-shared, i.e. a genetic "symplesiomorphy").
If only two fit (ITS2 + one cox1 partition), drop the one that doesn't (the other cox1 partition) for the combined tree (but keep e.g. a tanglegram showing the conflict as a supplement). If you can't decide, just discuss the two options. Regarding species identification, it hardly matters where the species is placed in either tree, as long as it is clearly distinct, forming its own exclusive subtree in both of them.
If the intra-clade divergence is too low for ML tree inference of the whole tip set in e.g. the co-informative 1st+2nd codon cox1 and ITS2 data, you just take the tip-reduced tree as your phylogenetic backbone (if their differentiation is too low, you won't find a significant conflict: the trees may look different, but the cox1 3rd codon topology would not be rejected by either the 1st+2nd codon or the ITS2 data matrix). The cox1 1st+2nd codon position and ITS2 networks can be used to identify the major genotypes of all samples, and the 3rd codon position (assuming it's the most divergent partition) to diagnose the actual species (e.g. via PTP, GMYC, NHSC etc.).
One could even just map them visually on the all-tip cox1 3rd position tree, if it reasonably agrees with the (combined) phylogenetic backbone tree. This directly gives you an idea about incomplete lineage sorting (ILS: ITS2 genotypes shared across non-sister cox1 lineages). Here's an example from our plant research: low-divergent ITS (too young speciation) mapped on the plastid (maternal) tree (insect-pollinated, speciose and currently niching genus of southern Africa).
The stars mark the tips still showing the genus-ancestral ITS genotype; labels such as "1 CU, 1 Sh" at branches give the (plastid) lineage-conserved ITS mutations (CU = plastid clade-unique: genetic "synapomorphies"; Sh = shared across plastid clades: typically genetic "homoiologies", i.e. convergently evolved but representing a parallelism rather than random homoplasy).
While an ITS tree would look superficially very different from the plastid tree, mapping the notably few ITS mutation patterns on the plastid tree shows that it can all be explained by low-divergent data incompletely sorted on essentially the same evolutionary tree (it's plants, so there will be reticulation towards the tips, too, but one would need a much more informative nuclear marker to test this).