The obvious approach would seem to be to use whatever registers the
machine provides and work a register at a time in the local assembly
language. This is non-portable, but since propagating carry bits is
easy at that level, it's probably the fastest. I can imagine there might
be some clever scheme that takes advantage of the local FPU, but I
can't think of one. Any suggestions?
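For comparison, here is a minimal portable sketch of the register-at-a-time (schoolbook) idea in C rather than assembly. It assumes 32-bit limbs stored least-significant first and a 64-bit unsigned type for the intermediate product. Portable C can't read the carry flag, so the carry is taken from the upper half of each 64-bit sum instead. The name `mp_mul` and the limb layout are hypothetical choices for illustration, not any particular library's interface.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical schoolbook multiply: r = a * b.
 * Limbs are 32 bits, least significant first.
 * r must have room for na + nb limbs and must not alias a or b. */
void mp_mul(uint32_t *r, const uint32_t *a, size_t na,
            const uint32_t *b, size_t nb)
{
    for (size_t i = 0; i < na + nb; i++)
        r[i] = 0;

    for (size_t i = 0; i < na; i++) {
        uint64_t carry = 0;
        for (size_t j = 0; j < nb; j++) {
            /* (2^32-1)^2 + 2*(2^32-1) == 2^64-1, so this cannot overflow. */
            uint64_t t = (uint64_t)a[i] * b[j] + r[i + j] + carry;
            r[i + j] = (uint32_t)t;   /* low half stays in place */
            carry = t >> 32;          /* high half is the carry  */
        }
        /* r[i + nb] has not been touched yet, so a plain store is safe. */
        r[i + nb] = (uint32_t)carry;
    }
}
```

In assembly the inner loop collapses to a multiply followed by add and add-with-carry, which is where the speed advantage mentioned above comes from.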
Is there any advantage to be taken of the fact that some of the
multiplication is really squaring?
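The usual observation for squaring is that each cross product a[i]*a[j] (i != j) shows up twice in the full product. It can be computed once and then doubled, so an n-limb square needs about n(n+1)/2 limb multiplications instead of n^2. What follows is a minimal sketch under the same assumptions as a plain multiply: 32-bit limbs, least significant first, and a 64-bit intermediate. The name `mp_sqr` is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical squaring routine: r = a * a.
 * r must have room for 2*n limbs and must not alias a. */
void mp_sqr(uint32_t *r, const uint32_t *a, size_t n)
{
    for (size_t i = 0; i < 2 * n; i++)
        r[i] = 0;

    /* 1. Cross products a[i]*a[j] for i < j, each computed only once. */
    for (size_t i = 0; i < n; i++) {
        uint64_t carry = 0;
        for (size_t j = i + 1; j < n; j++) {
            uint64_t t = (uint64_t)a[i] * a[j] + r[i + j] + carry;
            r[i + j] = (uint32_t)t;
            carry = t >> 32;
        }
        r[i + n] = (uint32_t)carry;  /* still zero before this store */
    }

    /* 2. Double the cross terms by shifting r left one bit.
     *    Their doubled sum is below a^2, so no bit falls off the top. */
    uint32_t top = 0;
    for (size_t i = 0; i < 2 * n; i++) {
        uint32_t next = r[i] >> 31;
        r[i] = (r[i] << 1) | top;
        top = next;
    }

    /* 3. Add the diagonal squares a[i]^2 at limb position 2i. */
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sq = (uint64_t)a[i] * a[i];
        uint64_t t = (uint64_t)r[2 * i] + (uint32_t)sq + carry;
        r[2 * i] = (uint32_t)t;
        t = (uint64_t)r[2 * i + 1] + (sq >> 32) + (t >> 32);
        r[2 * i + 1] = (uint32_t)t;
        carry = t >> 32;
    }
}
```

The extra shift pass is cheap next to the multiplications it saves. For small n the bookkeeping can eat most of the gain, though, so it may be worth timing against a plain multiply on the target machine.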
Sorry if this is a FAQ.
--
pilchuck!jimo@phred Jim Osborn, Physio Control Corp
11811 Willows Rd, Redmond, WA, 98073
206-867-4704 direct to my desk