TL;DR: one long branch in my tree bounces between a nested position (with a high clock rate) and the root (with a lower clock rate). Does this reflect poor model specification, or the reality of the parameter space? Should I trust only one island in parameter space (based on external information), or accept that the Bayesian analysis is telling me not to put all my eggs in one basket?
My case: I have 25 ingroup taxa, for which I've sampled 41 genes (with plenty of informative sites). One of these taxa (TAH_41 below) lies on an unusually long branch subtended by several short branches. This placement is inferred under ML time-reversible (with or without outgroup) and time-irreversible models, as well as in summary-coalescent (ASTRAL) trees, where it is always nested among other taxa. Notably, the same taxa (07, 08, 42) are always early to diverge in the ingroup, albeit with slightly different ingroup-root positions (below, ML tree with outgroup).

Beyond just getting a Bayesian perspective, my intent was to use BEAST2 to get a clock-based estimate of the root position. However, depending on the optimized relaxed clock (ORC) rate assigned to TAH_41's branch, the taxon moves around in the tree, between its position in the ML/coalescent topology (with a high branch rate) and the base of the ingroup (with a lower branch rate), resulting in low posterior support for the branches in between (figures below: branch labels are median rates, node labels are clade probabilities).
In contrast, and critically, when I run the same ORC model without TAH_41, posterior support for the ML topology goes up considerably.
Notably, with a strict clock, TAH_41 is always at the base, with 100% posterior support for all branches. On the other hand, when I force it to nest within the ingroup (monophyly excluding 07, 08, 42), the TAH_41 branch has the highest rate in the tree (1.57). So it appears that where TAH_41 is placed (I hesitate to say 'whether it is placed correctly') depends strongly on the rate its branch is assigned.
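To put numbers on this rate/placement coupling, I've been summarizing the posterior samples along these lines. This is only a sketch: it assumes you have already extracted, per posterior tree, the rate annotated on TAH_41's branch and a boolean for whether TAH_41 is nested (e.g., by parsing the annotated trees file with DendroPy; that preprocessing is not shown, and the inputs here are hypothetical arrays, one entry per sample).

```python
import numpy as np

def mode_summary(rates, nested, cutoff):
    """Split posterior samples at a branch-rate cutoff and report how
    placement tracks rate across the two groups."""
    rates = np.asarray(rates, dtype=float)
    nested = np.asarray(nested, dtype=bool)
    hi = rates >= cutoff  # samples in the putative high-rate mode
    return {
        "frac_high_rate": float(hi.mean()),
        "frac_nested_given_high": float(nested[hi].mean()) if hi.any() else float("nan"),
        "frac_nested_given_low": float(nested[~hi].mean()) if (~hi).any() else float("nan"),
    }
```

If the two conditional fractions come out near 1 and 0, the sampler really is visiting two coupled (rate, placement) islands rather than mixing over independent uncertainty in each.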
Interestingly, starbeast3 recovers the same ML topology (albeit with apparently poor mixing, despite four chains with MC3), but with an altogether different rooting and poor posterior support for branches near the root. I did, however, notice in the starbeast3 runs that the model posterior and tree height were inversely related, which led me to wonder whether these data contain a trap in which topology, branch rates, and tree length pull each other in different directions.
I see somewhat the same relationship of posterior and tree height with the ORC model (below), though not as strict, and it is absent when TAH_41 is omitted from the data (not shown).
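For what it's worth, here is how I've been checking that posterior/tree-height relationship from the trace. The BEAST2 .log file is tab-separated with '#' comment lines, so it can be loaded with, e.g., pandas `read_csv(sep="\t", comment="#")`; the column names "posterior" and "TreeHeight" are the usual defaults but may differ in your setup, and the burn-in fraction is an assumption.

```python
import numpy as np

def trace_correlation(posterior, tree_height, burnin_frac=0.1):
    """Pearson correlation between two trace columns after discarding burn-in."""
    p = np.asarray(posterior, dtype=float)
    h = np.asarray(tree_height, dtype=float)
    start = int(len(p) * burnin_frac)  # drop the first fraction of samples as burn-in
    return float(np.corrcoef(p[start:], h[start:])[0, 1])
```

A strongly negative value confirms the inverse relationship quantitatively; comparing the value with and without TAH_41 in the alignment makes the contrast concrete.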
For this reason, I tried placing an exponential prior on the root (MRCA of all samples); while it constrained tree height as expected, it did not change the underlying relationship between posterior probability and tree height, or the quasi-bimodal inference of TAH_41's rate and placement.
I got similarly disappointing results after broadening the prior on the sigma parameter of the ORC's lognormal rate distribution (increasing 'alpha' of the gamma prior on sigma): the estimated sigma shifts up a bit, but the bimodality of TAH_41's placement and rate remains. This suggests the issue is not that the model fails to accommodate rate variation among branches, but rather the extremity of that single branch's rate relative to the support for its placement.
So, my ultimate goal was (is) to get a clock-based estimate of root placement and to infer gene trees under the multispecies coalescent for downstream tests of positive selection, but I seem to have opened a can of worms. And, I will add, the root placement and internal topology are not trivial details: the ancestral reconstruction of photosensory environment I am studying depends on them. Without TAH_41, the ORC-based root seems pretty clear, and I could enforce some clades containing TAH_41 to infer gene trees under 'my preferred' hypothesis, but I'd much rather fit a model with all samples and no clade constraints that I could trust is not bouncing between two regions of parameter space whose relative optimality I cannot assess. Apart from enforcing alternative clades and running nested sampling (or similar) to test which model should be preferred, are there other ways to investigate the apparent bimodality and the size/optimality of these islands in parameter space? TIA!
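One diagnostic I've considered for distinguishing "two real islands" from "one chain stuck" is the average standard deviation of split frequencies (ASDSF) across fully independent runs, as used in MrBayes: if replicate chains converge on different frequencies for the splits around TAH_41, they are sampling different islands. A pure-Python sketch, assuming you have already tabulated split (bipartition) frequencies per run, e.g. via DendroPy's bipartition encoding (that extraction step is not shown):

```python
def asdsf(freqs_by_chain, min_freq=0.1):
    """Average standard deviation of split frequencies across independent runs.

    freqs_by_chain: one dict per run, mapping a split (any hashable encoding
    of a bipartition) to its posterior frequency in that run.
    """
    splits = set().union(*freqs_by_chain)
    sds = []
    for s in splits:
        fs = [chain.get(s, 0.0) for chain in freqs_by_chain]
        if max(fs) < min_freq:
            continue  # ignore splits that are rare in every run
        mean = sum(fs) / len(fs)
        var = sum((f - mean) ** 2 for f in fs) / len(fs)
        sds.append(var ** 0.5)
    return sum(sds) / len(sds) if sds else 0.0
```

Values near zero indicate the runs agree on the split frequencies; a large ASDSF driven specifically by the splits involving TAH_41 would localize the mixing problem to that part of the tree.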