Hi Sam,
by using the coalescent costant pop size tree prior, you are making the prior assumption that the population size of the organism that you are trying to model has been constant over time AND that the samples you have collected are from a panmictic population (i.e.
the same pop), which clearly is not true as you say that you have samples from three different species. While there are cases where it might be legitimate to use coalescent tree priors and, in my experience, the tree topology doesn't change much, I'd argue that you have to have a good reason to go ahead with such an approach. So really comes down to what questions you are trying to answer. If you are after some kind of phylogeographic analysis, this may be acceptable, but you could also consider a structured coalescent with each species as one deme. If you are interested in looking at the degree of separation and divergence time between these species, a speciation prior (Birth-Death) may be also an option. If you are interested in the demographic status of each of these species (although it doesn't sound that this is the case), then there are better approaches.
That being said, I'd also consider whether it is really necessary to link all your fragments into one partition as this may lead to inaccurate results since most likely these are actually independent loci. As I suspect you are interested in phylogenetic analysis you may want to consider using a species tree approach. This will probably take a very long time to converge, but you could consider dropping the fragments that only have 1 or 2 SNPs, since these contribute very little information.
Keep in mind that there may be interactions between the tree prior and your tree calibration (see Drummond and Suchard 2008) and it may be worthwhile to run an analysis sampling from the priors to make sure that the calibration reflects what you wanted. Also, in absence of heterochronus data, your calibration on the root and prior on the clock are the only source of information for the analysis. I'm not sure what you intend with very wide HPD, but your calibration on the node would allow the root to move up to ~6M years ago. Is this what you are seeing?
The prior you use should reflect the knowledge you have and most of the time a lognormal calibration on an internal node is appropriate because we usually think to know when something happened (higher prob mass around the mean), but we want to allow for some uncertainty, so I wouldn't change it to another prior (e.g. uniform) unless you have good reason to do so (e.g. you came across new information).
There is no fix/magic number for how long a MCMC should be, that changes for each dataset and type of analysis combination, but you certainly have a big dataset and I wouldn't be surprised if you need much longer runs to get convergence (from the plot you posted both runs are too short).
you don't say whether you are using BEAST1 or BEAST2, but both allow for tuning of operators. BEAST2 will spit out some suggestions to this end if it detects the need from some manual adjustments, but BEAST1 also reports an analysis of the operators that you can look at to decide whether some need adjustments. Most of the time the autotuning that BEAST has inbuilt does a pretty good job, so I would first worry about making sure you have a proper set up before playing around with the operators, but happy to give you some pointers in another email if you think you need to adjust them.
Good luck,
carlo