Unfortunately, MCMCtree can currently only run as a single-core job and, as you have already mentioned, it cannot be parallelised. Sorry, I had missed the part where you mentioned that you had already tried the approximate likelihood calculation! Increasing the RAM will not speed up MCMCtree either, although you may need more RAM when running CODEML/BASEML to calculate the gradient and Hessian with large datasets.
As an alternative, you may want to use the Bayesian Sequential-Subtree (BSS) approach, which combines the approximate likelihood with a "backbone tree + subtrees" method under a Bayesian framework to speed up clock-dating analyses with MCMCtree. You can find a step-by-step tutorial on my GitHub repository, which complements our paper. While we applied the method to infer a mammal evolutionary timeline, it can be used with other datasets :)
That said, I am not sure what you mean by "split the data into multiple partitions as a way to 'parallelize' things". If you partition your molecular alignment (e.g., codon position schemes, slow- to fast-evolving genes, different data types, etc.) and include each partition as an "alignment block" in a single alignment file (i.e., `ndata = X` in the control file, where `X` is the number of alignment blocks in the alignment file you provide via option `seqfile`), MCMCtree will take longer to run, not less.

If you instead treat each alignment block as a separate data subset and analyse each one individually with MCMCtree, you will infer time estimates for each data subset separately. Each MCMCtree run will indeed be faster, as you will have divided your main dataset into X data subsets, but you should not average the time estimates you obtain from these independent analyses, because each set of estimates was inferred from a different data subset.

E.g., say you divide your main molecular alignment into two data subsets by randomly dividing the sequences you have into two halves. You would then infer the divergence times for each data subset separately while fixing the same tree topology (unless you want to evaluate different tree hypotheses). This will give you one set of estimated divergence times for the first data subset and another for the second, but you should not average those. I just wanted to clarify this in case other PAML users read this thread of messages :)
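In case it helps with the single-file setup, here is a minimal Python sketch of how the value of `ndata` relates to the alignment file passed via `seqfile`. It counts the alignment blocks in a PHYLIP-style file and prints the two matching control-file lines. The function names, the file name `aln_2blocks.phy`, and the toy alignment are all made up for illustration, and the sketch assumes that every block starts with a header line containing only two integers (number of sequences and number of sites), so headers that carry extra option characters would not be counted.

```python
import re

# Assumption: each alignment block starts with a header line made of exactly
# two integers (number of sequences, number of sites) and nothing else.
BLOCK_HEADER = re.compile(r"^\s*(\d+)\s+(\d+)\s*$")


def count_alignment_blocks(phylip_text: str) -> int:
    """Count block headers in a PHYLIP-style, multi-block alignment."""
    return sum(1 for line in phylip_text.splitlines() if BLOCK_HEADER.match(line))


def ndata_lines(seqfile_name: str, phylip_text: str) -> str:
    """Return the `seqfile` and `ndata` lines that match the alignment."""
    n_blocks = count_alignment_blocks(phylip_text)
    return f"seqfile = {seqfile_name}\nndata = {n_blocks}"


# Toy alignment with two blocks (hypothetical data, two taxa per block).
example = """\
 2 8
sp_A  ACGTACGT
sp_B  ACGTACGA

 2 6
sp_A  TTGACC
sp_B  TTGACA
"""

print(ndata_lines("aln_2blocks.phy", example))
```

The two printed lines are only the part of the control file discussed here; all other options in your control file stay as you already have them.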
Good luck with your inference analyses!
S.