Thanks Remco, that's really helpful.
I followed the guide at beast2.org/wiki/index.php/Performance_Suggestions, but there's no mention there of adding 'useThreads=true' to the id=likelihood element.
Have added this now, and am running with all 8 threads (was actually using SSE, but simplified my explanation a little). The threads are now "flatlining" at 70-80%, while before they were oscillating wildly, so it's definitely had an effect.
I'm still not getting much of a speedup, though: it's only 10% faster than with one thread, which suggests to me the bottleneck is elsewhere? The tree topology is fixed, by the way, and the clock model is the random local clock (RLC).
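As a rough sanity check on the "bottleneck is elsewhere" guess, here is a back-of-the-envelope sketch that inverts Amdahl's law. It assumes Amdahl's law is a fair model for this run, which is my assumption and not something BEAST reports. The function names are made up for illustration, and the only inputs are the two numbers above (a 1.10x speedup on 8 threads).

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Amdahl's law: speedup when a fraction p of the work scales across threads."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)


def parallel_fraction(speedup: float, threads: int) -> float:
    """Invert Amdahl's law to estimate p from an observed speedup."""
    if threads < 2:
        raise ValueError("need at least 2 threads to estimate a parallel fraction")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / threads)


# Observed figures from this run: about 10% faster on 8 threads.
p = parallel_fraction(1.10, 8)
print(f"estimated parallel fraction: {p:.1%}")

# Best case under this model, no matter how many threads are added.
print(f"speedup ceiling: {1.0 / (1.0 - p):.2f}x")
```

Under that assumption, only around a tenth of the runtime is work that actually scales with threads, so adding more threads would not help much. That would be consistent with the bottleneck being somewhere other than the threaded likelihood, but it is only an estimate.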
Cheers