What you call "easy mode" is more an exercise in advanced calculus, but sure, let's give it a go!
The hardest thing with macro, dSLR or otherwise, is parallax. In your scenario you're physically moving so freaking far from the optic's perspective. Take that burlap shot: say you shot a single thread lying dead-centered over the top of a bundle of threads, then moved the printer head a few millimeters off in any direction. Assume for the moment that Z (the lens-to-subject distance) stays fixed and only X/Y change, which isn't quite reality. Now that thread looks like it's floating way off to the left/right/top/bottom of said bundle, because the nearer thread shifts more in the frame than the bundle behind it. If you reversed the rig, embedded the camera in a stationary arm, and moved a platen/table that held the object being photographed, you'd have exactly the same issue: parallax cares about relative translation, not which part moves.
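To put a hedged number on how bad that gets at macro distances, here's a back-of-the-envelope sketch in plain Python. Every value in it (focal length, pixel pitch, working distances) is a made-up example of mine, not from your rig; treat it as an order-of-magnitude estimate, not a calibration:

```python
# Rough parallax estimate for a lateral camera (or stage) translation.
# Pinhole approximation: a point at depth Z shifts on the sensor by
# roughly f * t / Z when the camera translates laterally by t.
# The *relative* shift between the near thread and the bundle behind it
# is what makes rotation-only stitching fall apart.

f_mm = 50.0             # effective focal length (assumed value)
pixel_pitch_mm = 0.004  # 4-micron pixels (assumed value)

z_thread_mm = 30.0      # distance to the foreground thread (assumed)
z_bundle_mm = 32.0      # distance to the bundle behind it (assumed)

t_mm = 3.0              # "a few millimeters" of X/Y head travel

# Image-plane shift of each feature, converted from mm to pixels.
shift_thread_px = (f_mm * t_mm / z_thread_mm) / pixel_pitch_mm
shift_bundle_px = (f_mm * t_mm / z_bundle_mm) / pixel_pitch_mm

relative_px = shift_thread_px - shift_bundle_px
print(f"thread moves {shift_thread_px:.0f} px, bundle {shift_bundle_px:.0f} px")
print(f"relative parallax: {relative_px:.0f} px of 'floating' offset")
```

With those assumed numbers, a mere 2 mm of depth difference between thread and bundle leaves a disagreement on the order of 80 pixels between frames, and no amount of rotation-only stitching math can reconcile that.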
This is why ICE can do it but Hugin cannot. ICE used the magic of SeaDragon, a.k.a. Photosynth. Microsoft Live went offline what, 4 years ago now? Some of it can still work independently of the Live/MSN site, but you'll need to find a safe, clean download of Photosynth (it'll likely be from 2014 and 32-bit). But THAT is how your ICE deep-zoom panos work: they're not real, they're synthetic, synthesized content. It's the same way ICE can fabricate missing sections of panoramas instead of cropping them; it just invents them! It analyzes the images and makes up what the missing parts would have looked like, the same way Photosynth made freaking 3D scenes out of multiple 2D photo planes shot at different angles. It's incredible technology, and it's back again in for-profit apps that apparently spy on you too (what doesn't in 2020?): Microsoft Pix and Hyperlapse Pro. (ICE is back too, still the Image Composite Editor.)
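For a feel of what "it just invents it" means, here's a toy analog. This is NOT ICE's actual algorithm (Microsoft never published that); it's just OpenCV's generic inpainting, and the filename plus the black-pixels-mark-empty-regions convention are my assumptions for the example (real stitchers often use an alpha channel instead):

```python
# Toy analog of ICE's "auto-complete" for ragged panorama edges.
# Not ICE's algorithm: just generic inpainting, to illustrate
# synthesizing plausible pixels instead of cropping them away.
import cv2

pano = cv2.imread("stitched_pano.png")           # hypothetical stitched output
# Assume fully black pixels mark the unfilled regions left by stitching.
mask = cv2.inRange(pano, (0, 0, 0), (0, 0, 0))   # 255 wherever pano is empty

# Telea's fast-marching inpainting invents content for the masked area
# from the surrounding pixels; the radius is a neighborhood size in px.
filled = cv2.inpaint(pano, mask, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
cv2.imwrite("pano_autocompleted.png", filled)
```

The fabricated edges look plausible the same way ICE's do: the pixels are statistically consistent with their neighbors, not records of anything the camera saw.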
If, however, you mounted the microscope on a 6-axis gimbal and programmed it to pivot around the no-parallax point of the lens, Hugin would definitely be your macro-gigapanorama tool! You could pan in fraction-of-a-millimeter steps, at whatever the most repeatable step size of your steppers is for the camera weight, and be 100% repeatable. And yes, THEN, in that scenario, automating the multirow capture would be as simple as knowing the overlap of the 480x720 frames: you could just increment the Tpy (Hugin's translation plane yaw parameter) by (720 − overlap) per frame, and likely get some subpixel accuracy. See the sketch below.
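Here's what that step schedule could look like as a sketch. The overlap fraction, per-frame field of view, and grid size are all assumed values of mine for illustration, and it presumes the rig genuinely pivots around the no-parallax point so a pure yaw/pitch grid is valid input for Hugin:

```python
# Sketch of the multirow step schedule described above.
# Assumptions (mine, not from the rig): 720x480 px frames, 30% overlap,
# and a 2-degree horizontal field of view per frame.

frame_w_px, frame_h_px = 720, 480
overlap_frac = 0.30                  # assumed overlap between frames
hfov_deg = 2.0                       # assumed horizontal FOV of one frame
vfov_deg = hfov_deg * frame_h_px / frame_w_px

# Advance by the non-overlapping portion of each frame:
step_x_px = frame_w_px * (1 - overlap_frac)    # the "(720 - overlap)" increment
step_y_px = frame_h_px * (1 - overlap_frac)
yaw_step_deg = hfov_deg * (1 - overlap_frac)   # same step, in gimbal angle
pitch_step_deg = vfov_deg * (1 - overlap_frac)

cols, rows = 10, 6                   # arbitrary grid size for the example
for r in range(rows):
    for c in range(cols):
        # One line per shot: where to point the gimbal, plus the matching
        # pixel offset you could pre-seed into Hugin for each image.
        print(f"shot r{r}c{c}: yaw={c * yaw_step_deg:.3f} deg, "
              f"pitch={r * pitch_step_deg:.3f} deg, "
              f"seed offset=({c * step_x_px:.0f}, {r * step_y_px:.0f}) px")
```

Because every shot shares one optical center, those seeded positions are merely starting guesses; Hugin's optimizer can then refine them to subpixel accuracy instead of fighting unfixable parallax.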
But, no. What we have here is a comparison of apples to oranges. Plastic oranges. Although very realistic-looking ones, granted!