Hi Peter,
Your intuition is correct: M optimizes its model of the entire multi-particle system, which includes (for tilt series):
- particle poses (optionally resolved over time)
- deformations (image/volume warp)
- electron-optical parameters (defocus + higher-order CTF terms)
- stage tilt angles
- probably more things I'm forgetting right now
Rigid-body 2D shifts of tilt images are captured by a 1x1 image-space warp, as you suggest. I recommend reading the Warp and M papers carefully, especially the methods sections; they're a goldmine and very clear.
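To make the 1x1 point concrete, here is a minimal sketch of why a warp grid with a single control point reduces to a rigid shift. This is not M's code: the function `shift_field`, its argument names, and the choice of bilinear interpolation are assumptions made purely for illustration, and M's own interpolation scheme may differ.

```python
import numpy as np

def shift_field(grid_shifts, image_shape):
    """Interpolate a coarse grid of (dy, dx) shifts to one shift per pixel.

    grid_shifts: array-like of shape (gy, gx, 2), hypothetical control points.
    image_shape: (height, width) of the image being warped.
    Bilinear interpolation is an illustrative assumption, not M's scheme.
    """
    grid_shifts = np.asarray(grid_shifts, dtype=float)
    gy, gx, _ = grid_shifts.shape
    h, w = image_shape

    # Fractional grid coordinate of every pixel row and column.
    # With one control point along an axis, every pixel maps to index 0.
    ys = np.linspace(0, gy - 1, h)
    xs = np.linspace(0, gx - 1, w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, gy - 1)
    x1 = np.minimum(x0 + 1, gx - 1)
    fy = (ys - y0)[:, None, None]
    fx = (xs - x0)[None, :, None]

    # Blend the four neighbouring control points for each pixel.
    top = grid_shifts[y0][:, x0] * (1 - fx) + grid_shifts[y0][:, x1] * fx
    bottom = grid_shifts[y1][:, x0] * (1 - fx) + grid_shifts[y1][:, x1] * fx
    return top * (1 - fy) + bottom * fy

# 1x1 grid: a single control point, so every pixel gets the same shift.
rigid = shift_field([[[1.5, -2.0]]], (4, 6))

# 2x2 grid: the shift now varies smoothly across the image.
warped = shift_field([[[0.0, 0.0], [0.0, 2.0]],
                      [[2.0, 0.0], [2.0, 2.0]]], (4, 6))
```

With the 1x1 grid the returned field is constant, which is exactly a rigid translation of the whole tilt image; a finer grid lets the shift vary across the image.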
You can find some discussion on improving tomograms, started by Euan Pyle (who works at EMBL), in this thread.
Cheers,
Alister