Hi Zhu,
My experience with Warp vs. AreTomo reconstruction readability is similar to yours. I'm not sure whether you've tried inverting the contrast of the Warp reconstruction in 3dmod, but that helps me see features better. That said, Warp and AreTomo take different approaches to generating tomograms and correcting the CTF, so I think this difference in human readability is to be expected.
I think the image plane shift is probably a result of the tilt axis not being at 90 degrees (most Krios microscopes seem to be around 85-86 degrees). In my experience most software, including previous versions of AreTomo if I remember correctly, shows that plane shift, so I'm not sure how correctable it is.
Regarding Noise2Noise, does 10 Å/pix correspond to your desired bin8? Also, while the difference might not look big by eye, the denoising may be cleaning up the data in ways that are more computer-friendly than human-readable. For template matching in Warp, make sure the template is scaled correctly to the pixel size, and test different box sizes suitable for your particle. When I first tried template matching I accidentally scaled the particle way too small and was very confused by all the false positives! If you have time, I also think it's a good idea to compare template matching results against your AreTomo3 reconstruction, along with different denoising pipelines. If you are still having problems with template matching, have you tried other denoising pipelines such as cryoCARE-IsoNet or DeepDeWedge?
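For the pixel size and template scaling checks, here is the arithmetic I would do by hand, written out as a small Python sketch. The function names and all the numbers (1.25 Å/pix raw, a 2.5 Å/pix template, a 250 Å particle, 1.5x padding) are made up for illustration and are not from your dataset or from Warp itself, so please substitute your own values.

```python
import math


def binned_pixel_size(raw_apix: float, binning: int) -> float:
    """Pixel size (Å/pix) after binning the raw data by an integer factor."""
    return raw_apix * binning


def template_scale_factor(template_apix: float, tomo_apix: float) -> float:
    """Resampling factor that brings a template to the tomogram pixel size.

    A value below 1 means the template map must be shrunk (fewer voxels).
    """
    return template_apix / tomo_apix


def box_size_in_pixels(diameter_angstrom: float, tomo_apix: float,
                       padding: float = 1.5) -> int:
    """Even box size (pixels) covering the particle plus some padding.

    The 1.5x padding is an arbitrary starting point, not a recommendation.
    """
    box = math.ceil(diameter_angstrom * padding / tomo_apix)
    return box + (box % 2)  # round up to an even number


# Hypothetical numbers: is 10 Å/pix really bin8 for this raw pixel size?
tomo_apix = binned_pixel_size(raw_apix=1.25, binning=8)
scale = template_scale_factor(template_apix=2.5, tomo_apix=tomo_apix)
box = box_size_in_pixels(diameter_angstrom=250.0, tomo_apix=tomo_apix)
print(tomo_apix, scale, box)
```

If the scale factor or box size that comes out looks far from what you actually used, that would be the first thing I'd check before blaming the denoising.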
Sorry if this wasn't a satisfying answer - information on cryo-ET processing can still be opaque and not easy to find.
Good luck,
Daniel