To add another data point: scaling helps in both modeling
and computation. As Andrew notes, scaling essentially
identifies the natural units of the problem, which makes it
much easier to define weakly informative priors (a weakly
informative prior essentially identifies the expected order of
magnitude of a parameter; if the parameters are all
scaled appropriately then these are all O(1)). Additionally,
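To make the "natural units" point concrete, here is a small sketch (the regression setup, variable names, and numbers are all hypothetical): a slope measured in raw units can be any order of magnitude, but after standardizing, the same slope is O(1) and a normal(0, 1) prior is weakly informative for it.

```python
import numpy as np

# Hypothetical example: regressing income (dollars) on height (cm).
rng = np.random.default_rng(0)
height_cm = rng.normal(170.0, 10.0, size=500)
income_usd = 50_000.0 + 300.0 * (height_cm - 170.0) \
    + rng.normal(0.0, 5_000.0, size=500)

# The raw slope is O(100) dollars per cm, so a normal(0, 1) prior would be
# wildly informative. After standardizing both variables the slope is O(1).
x = (height_cm - height_cm.mean()) / height_cm.std()
y = (income_usd - income_usd.mean()) / income_usd.std()

raw_slope = np.polyfit(height_cm, income_usd, 1)[0]
std_slope = np.polyfit(x, y, 1)[0]
```

The same normal(0, 1) default then works across problems, because the rescaling has absorbed the problem-specific units.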
when all of the parameters are scaled the posterior becomes
more or less isotropic, which makes any computational
algorithm much better conditioned. For example, in HMC this
means a much more stable and efficient integrator, as the cost
of the leapfrog integrator goes as largest_scale / smallest_scale,
which is minimized when largest_scale ~ smallest_scale.
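A back-of-the-envelope sketch of that cost scaling (the function and the particular scales are illustrative assumptions, not measurements): for a Gaussian-like target, the leapfrog integrator is only stable for step sizes on the order of the smallest posterior scale, while a trajectory has to cover roughly the largest scale, so the number of leapfrog steps per trajectory grows like their ratio.

```python
import numpy as np

def leapfrog_steps(scales, safety=0.5):
    """Rough estimate of leapfrog steps per HMC trajectory.

    Stability limits the step size to ~ the smallest scale, while the
    trajectory must travel ~ the largest scale to decorrelate, so the
    step count goes as largest_scale / smallest_scale.
    """
    scales = np.asarray(scales, dtype=float)
    stepsize = safety * scales.min()     # stability limit
    trajectory_length = scales.max()     # distance to cover
    return int(np.ceil(trajectory_length / stepsize))

badly_scaled = leapfrog_steps([1e-2, 1.0, 1e2])  # scales span 4 orders
well_scaled = leapfrog_steps([0.5, 1.0, 2.0])    # roughly isotropic
```

With the badly scaled target the step count is four orders of magnitude larger, which is exactly the largest_scale / smallest_scale penalty described above.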
Moreover, floating point arithmetic is at its most accurate
when the parameters are all O(1), further avoiding computational
problems.
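One way to see the floating point point directly: the gap between adjacent doubles grows with magnitude, so absolute rounding error is smallest near 1, and mixing wildly different magnitudes can silently lose small contributions entirely.

```python
import numpy as np

# Spacing between adjacent double-precision values grows with magnitude.
gap_at_one = np.spacing(1.0)    # ~2.2e-16
gap_at_1e8 = np.spacing(1e8)    # ~1.5e-8

# At 1e16 the spacing is ~2, so adding 1.0 has no effect at all:
lost_update = (1e16 + 1.0 == 1e16)
```

Keeping everything O(1) keeps sums and differences of parameters in the regime where the representable values are densest.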