Janko,
Your process looks good.
I don't think it is an issue with the SPICE kernels, as those would result in much bigger problems.
When you solve for jitter, you use as a constraint the DEM that you just made (after alignment). That DEM itself has jitter relative to MOLA, so the jitter cannot go away. I think that may be the source of the problem.
Also, you did not specify --heights-from-dem-uncertainty, which then defaults to 10.0. That may be a little too tight for CTX. A value such as 50 may work better (the CTX doc says 20, which should be fine too, I think). This is a minor thing here though.
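To give a feel for why this value matters, here is a small Python sketch. It assumes the DEM constraint behaves like a Gaussian prior, with the height difference divided by the uncertainty before being squared. That is a simplification made for illustration, not something taken from the jitter_solve code, and the function name and the heights are made up.

```python
def weighted_height_residual(tri_height, dem_height, uncertainty):
    """Height difference divided by the assumed uncertainty (1 sigma).

    Hypothetical helper: this models the constraint as a simple Gaussian
    prior and is not the actual jitter_solve cost function.
    """
    return (tri_height - dem_height) / uncertainty


# Made-up heights: a triangulated point 30 units above the DEM.
tri_height, dem_height = 1030.0, 1000.0

for uncertainty in (10.0, 20.0, 50.0):
    r = weighted_height_residual(tri_height, dem_height, uncertainty)
    # The squared residual is what would enter a least-squares cost.
    print(f"uncertainty={uncertainty:5.1f}  residual={r:.2f}  cost={r * r:.2f}")
```

Under this assumption, going from 10 to 50 reduces the cost of the same height difference by a factor of 25, so the solver is much freer to move the points away from the DEM.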
When solving for jitter, the DEM to constrain against should perhaps be another, independent DEM that is consistent with MOLA. One could perhaps in-fill MOLA for that, but MOLA is way too sparse.
My suggestion would be to first try another stereo pair for that area (there are plenty for CTX). One should ensure the stereo convergence angle is about 20-40 degrees, and the images have similar illumination.
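As a quick check of a candidate pair, the convergence angle at a ground point can be estimated as the angle between the two rays from that point to the camera centers. Here is a minimal Python sketch of that geometry. The function name and the coordinates are made up for illustration, and this is not how any particular tool reports the angle.

```python
import math


def convergence_angle_deg(ground_xyz, cam1_xyz, cam2_xyz):
    """Angle in degrees between the rays from a ground point to two cameras.

    All inputs are (x, y, z) positions in the same Cartesian frame.
    """
    ray1 = [c - g for c, g in zip(cam1_xyz, ground_xyz)]
    ray2 = [c - g for c, g in zip(cam2_xyz, ground_xyz)]
    dot = sum(a * b for a, b in zip(ray1, ray2))
    norm1 = math.sqrt(sum(a * a for a in ray1))
    norm2 = math.sqrt(sum(b * b for b in ray2))
    # Clamp to [-1, 1] to guard against round-off before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.degrees(math.acos(cos_angle))


# Made-up geometry: two cameras at the same height, each looking
# 15 degrees off nadir from opposite sides of the ground point.
height = 300.0
offset = height * math.tan(math.radians(15.0))
angle = convergence_angle_deg((0.0, 0.0, 0.0),
                              (-offset, 0.0, height),
                              (offset, 0.0, height))
print(f"convergence angle: {angle:.1f} degrees")
print("within 20-40 degrees:", 20.0 <= angle <= 40.0)
```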
Perhaps, if you do another stereo pair, and that DEM can be aligned to MOLA as well, then the two DEMs could be merged. One could then see how the merged DEM differs from MOLA, and whether solving for jitter with the merged DEM gives a better result.
In principle, one could solve for jitter jointly from 4 images. One could use two dense match files (one from each stereo pair) and sparse matches for the rest of the pairs, as produced by bundle_adjust. These would all need to have the same naming convention.
So, likely jitter cannot be made to go away altogether, but with more images and a better DEM constraint one could get improved results.
Happy to hear how this works out for you.
Oleg