Perhaps reducing the data size is a practical solution. You can sample, say, 1M sites from the same partition.
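For example, here is a minimal Python sketch for subsampling sites; it assumes a sequential (non-interleaved) PHYLIP-style alignment, and the filenames and sample size are placeholders.

import random

def subsample_sites(infile, outfile, n_sample, seed=1):
    # read a sequential PHYLIP-style file: "nseq nsites" header,
    # then one line per sequence ("name  sequence")
    with open(infile) as f:
        nseq, nsites = map(int, f.readline().split())
        names, seqs = [], []
        for _ in range(nseq):
            name, seq = f.readline().split(maxsplit=1)
            names.append(name)
            seqs.append(seq.replace(" ", "").strip())
    # sample columns without replacement and keep them in their original order
    cols = sorted(random.Random(seed).sample(range(nsites), n_sample))
    with open(outfile, "w") as out:
        out.write(f" {nseq} {n_sample}\n")
        for name, seq in zip(names, seqs):
            out.write(f"{name}  {''.join(seq[i] for i in cols)}\n")

subsample_sites("partition1.phy", "partition1.sample.phy", 1000000)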
The number of partitions is important, but the number of sites per partition (if you are analyzing all sites in one partition) is not important to the posterior time estimates. Once you reach about 1kb or 10kb, having more sites won't really help.
Here are some papers characterizing the uncertainties in the posterior time estimates:
dos Reis M, Yang Z. 2013. The unbearable uncertainty of Bayesian divergence time estimation. J Syst Evol 51:30-43.
Zhu T, dos Reis M, Yang Z. 2015. Characterization of the uncertainty of divergence time estimation under relaxed molecular clock models using multiple loci. Syst Biol 64:267-280.
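Roughly, the point (paraphrasing the infinite-sites theory in those papers, so treat this as a sketch rather than an exact statement) is that as the number of sites grows, the posterior CI widths do not shrink to zero; in the limit, the 95% CI width for a node age is approximately a linear function of its posterior mean,

    w_i \approx a + b\,\bar{t}_i    (number of sites -> infinity),

with the line determined by the fossil calibrations and the clock (rate) model rather than by the sequence data, so adding more sites past a few kb buys very little.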
If you want to struggle with the large number of sites, I have used RAxML to get the branch lengths and then baseml to calculate the Hessian matrix. You can use in.baseml to specify parameter values or initial values for baseml.
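As a rough sketch of scripting those two steps (all program options and filenames here are placeholders to adapt to your own data):

import subprocess

# Step 1: optimize branch lengths on a fixed topology with RAxML
# (-f e evaluates branch lengths and model parameters on the given tree);
# the resulting tree (typically RAxML_result.<run name>) can then go into
# the tree file that baseml reads.
subprocess.run(["raxmlHPC", "-f", "e", "-s", "partition1.phy",
                "-t", "topology.nwk", "-m", "GTRGAMMA", "-n", "brlens"],
               check=True)

# Step 2: run baseml with a control file set up for the Hessian calculation;
# an in.baseml file in the same directory can supply parameter values or
# initial values.
subprocess.run(["baseml", "baseml.ctl"], check=True)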
ziheng