Dunno about others, but I had experimentally implemented it as:
BCDADC/BCDSBB instructions, which add (or subtract) 16 packed BCD digits
held in a 64-bit register, using SR.T as the carry-in/carry-out flag.
For each digit, IIRC:
First do a 4-bit add with carry-in/carry-out;
If the result (including the carry-out) was > 9
(cout|(res[3]&(res[2]|res[1]))),
Subtract 10 from the result, and set carry.
Then chain 16 of these units together to form a full-width BCD adder.
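A rough Python model of the per-digit logic and the 16-digit chain, as a
sketch only (bcd_digit_add and bcd_adc are made-up names, not the actual
HDL). Note the "> 9" test has to include the carry-out of the 4-bit add,
otherwise sums like 9+9=18 (0x12) would slip through uncorrected:

```python
def bcd_digit_add(a, b, cin):
    """One BCD digit: 4-bit add with carry-in, returns (digit, carry_out)."""
    s = a + b + cin                      # 0..19 for valid BCD digits
    cout4 = (s >> 4) & 1                 # carry-out of the plain 4-bit add
    r3, r2, r1 = (s >> 3) & 1, (s >> 2) & 1, (s >> 1) & 1
    if cout4 or (r3 and (r2 or r1)):     # result > 9
        return (s - 10) & 0xF, 1         # subtract 10, set carry
    return s & 0xF, 0


def bcd_adc(x, y, t):
    """Hypothetical BCDADC model: 16 chained digit adders, T as carry in/out."""
    result = 0
    for i in range(16):
        shift = 4 * i
        digit, t = bcd_digit_add((x >> shift) & 0xF, (y >> shift) & 0xF, t)
        result |= digit << shift
    return result, t
```

E.g. bcd_adc(0x0999, 0x0001, 0) gives (0x1000, 0), with the carry rippling
through three digits.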
The BCDSBB logic was similar, just adding a lookup table on the input
side to 9s-complement the RHS digit (d -> 9-d).
Say:
If SBB=0, the RHS digit is passed in as-is;
If SBB=1, the RHS digit is complemented, and the carry-in is inverted.
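A sketch of the subtract side in Python, assuming T is meant as a borrow
flag, so the carry-out is inverted as well as the carry-in (that way
multi-word subtracts chain cleanly; this detail is my assumption).
bcd_sbb is a hypothetical name:

```python
def bcd_digit_add(a, b, cin):
    """Same digit adder as BCDADC: if the sum is > 9, subtract 10 and carry."""
    s = a + b + cin
    if s > 9:
        return s - 10, 1
    return s, 0


def bcd_sbb(x, y, borrow):
    """Hypothetical BCDSBB model: x - y - borrow over 16 packed BCD digits.
    RHS digits go through the 9s-complement table and the carry-in is
    inverted; the carry-out is inverted back into a borrow (assumption)."""
    result = 0
    carry = 1 - borrow
    for i in range(16):
        shift = 4 * i
        rhs = 9 - ((y >> shift) & 0xF)            # input-side complement
        digit, carry = bcd_digit_add((x >> shift) & 0xF, rhs, carry)
        result |= digit << shift
    return result, 1 - carry
```

So 0 - 1 comes out as 0x9999999999999999 with borrow set, i.e. the 10s
complement of 1.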
For multiply, IIRC (in software):
Could use shift-and-add, just using a BCD adder.
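One way the software shift-and-add could look, as a sketch (bcd_add and
bcd_mul are hypothetical helpers; products past 16 digits are simply
truncated here). Per multiplier digit, shift the accumulator left by one
digit (x10), then add the multiplicand that many times:

```python
def bcd_add(x, y):
    """Add two 16-digit packed-BCD values, dropping the final carry."""
    result, carry = 0, 0
    for i in range(0, 64, 4):
        s = ((x >> i) & 0xF) + ((y >> i) & 0xF) + carry
        carry = 1 if s > 9 else 0
        result |= (s - 10 * carry) << i
    return result


def bcd_mul(x, y):
    """Shift-and-add multiply over the BCD digits of y, most significant first."""
    acc = 0
    for i in range(60, -1, -4):
        acc = (acc << 4) & ((1 << 64) - 1)   # x10 is a plain 4-bit shift
        for _ in range((y >> i) & 0xF):      # add x once per unit of this digit
            acc = bcd_add(acc, x)
    return acc
```

This is the simplest form (up to 9 adds per digit); precomputing 2x/4x/8x
multiples with the same adder would cut the add count.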
For divide, IIRC:
Could use shift-and-subtract, just using a BCD subtract.
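A matching shift-and-subtract sketch (bcd_sub and bcd_divmod are
hypothetical names). It relies on packed BCD ordering the same as an
unsigned integer, so a plain compare decides when to stop subtracting:

```python
def bcd_sub(x, y):
    """x - y over packed BCD digits, assuming x >= y (so no final borrow)."""
    result, borrow, i = 0, 0, 0
    while (x >> i) or (y >> i):
        d = ((x >> i) & 0xF) - ((y >> i) & 0xF) - borrow
        borrow = 1 if d < 0 else 0
        result |= (d + 10 * borrow) << i
        i += 4
    return result


def bcd_divmod(x, y):
    """Shift-and-subtract long division of 16-digit packed BCD values."""
    if y == 0:
        raise ZeroDivisionError("BCD divide by zero")
    quot, rem = 0, 0
    for i in range(60, -1, -4):
        rem = (rem << 4) | ((x >> i) & 0xF)   # bring down the next digit
        digit = 0
        while rem >= y:                       # at most 9 subtracts per digit
            rem = bcd_sub(rem, y)
            digit += 1
        quot = (quot << 4) | digit
    return quot, rem
```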
Multiplying/dividing by a power of 10 can be done with a plain shift (4
bits per power of 10).
I have a vague memory of naive shift-add still happening to work with
BCD, provided the BCD operations were still used for the main ADD/SUB
part (rather than needing to resort to a more complex long-division
algorithm). I suspect the reason is that each 4-bit shift maps exactly
to a power of 10, and each group of 4 bits never goes outside of the
allowed 0..9 range.
Not sure about a full hardware multiplier; I didn't implement one for BCD.
There was the nifty trick that one could do fastish binary to decimal
conversion:
Rotate-with-carry-left the binary value by 1 bit (top bit goes into SR.T);
Do a BCDADC of the BCD accumulator with itself (doubling it, and adding
the bit from the binary value via the carry-in);
Repeat for each bit of the input value.
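The conversion loop as a Python sketch. Assumption: the accumulator is
widened to 20 digits so any 64-bit input fits; a single 64-bit register
only holds 16 digits, so real code would chain a second BCDADC for the
upper digits. bin_to_bcd is a made-up name:

```python
def bcd_adc(x, y, t, ndigits=20):
    """Packed-BCD add with carry in/out (T), ndigits wide."""
    result = 0
    for i in range(0, 4 * ndigits, 4):
        s = ((x >> i) & 0xF) + ((y >> i) & 0xF) + t
        t = 1 if s > 9 else 0
        result |= (s - 10 * t) << i
    return result, t


def bin_to_bcd(value, nbits=64):
    """Binary -> packed BCD: per bit, move the top bit into T, then
    acc = BCDADC(acc, acc, T), i.e. acc = 2*acc + bit in decimal."""
    mask = (1 << nbits) - 1
    acc = 0
    for _ in range(nbits):
        t = (value >> (nbits - 1)) & 1     # bit shifted out into T
        value = (value << 1) & mask        # the rest of the rotate is unused here
        acc, _ = bcd_adc(acc, acc, t)
    return acc
```

E.g. bin_to_bcd(1234) gives 0x1234: one BCD add per input bit, no divides.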
For signed values, had used a wonky/modified version of 9s complement.
Top digit:
0..7: Positive
8/9: Negative
But, otherwise the same.
This variant mostly added the property that normal signed integer
comparisons could be used on the values, and would still give the same
relative ordering (a top digit of 8 or 9 sets bit 63, so these values
read as negative).
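A quick Python check of the ordering property. Assumption: negatives are
stored as the plain digit-wise 9s complement (9-d) of the magnitude; the
exact "modified" variant isn't spelled out above, and the helper names
are hypothetical:

```python
def to_int64(u):
    """Reinterpret a 64-bit pattern as a signed two's-complement integer."""
    return u - (1 << 64) if u & (1 << 63) else u


def encode_signed_bcd(n):
    """16 packed BCD digits; top digit 0..7 positive, 8/9 negative
    (negatives as the 9s complement of the magnitude, an assumption)."""
    mag = abs(n)
    if (n >= 0 and mag >= 8 * 10**15) or (n < 0 and mag >= 2 * 10**15):
        raise OverflowError("out of range for this encoding")
    digits = f"{mag:016d}"
    if n < 0:
        digits = "".join(str(9 - int(c)) for c in digits)
    return int(digits, 16)        # each decimal digit becomes one nibble


vals = [-1999999999999999, -12345, -1, 0, 1, 42, 7999999999999999]
keys = [to_int64(encode_signed_bcd(v)) for v in vals]
print(keys == sorted(keys))       # signed compare keeps the decimal order
```

The range comes out lopsided (roughly -2e15 to +8e15), which is the cost
of reserving only the 8/9 top digits for negatives.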
IIRC, there were also some ops to pack/unpack DPD values (DPD, Densely
Packed Decimal, stores 3 digits in 10 bits; it is wonky, but not too
difficult for hardware to pack/unpack).
I think my reasoning was that, with BCD, this minimized the amount of
"additional" logic needed to support it (could add BCDADC/BCDSBB, and
pretty much everything else could still leverage normal integer
instructions).
But, practically, there isn't a whole lot of use-case for BCD in the
stuff I am doing...
Say, I don't imagine embedded/DSP-style use cases are going to have
much use for BCD or decimal arithmetic.
I guess one other possible scheme could be to add a version of decimal
math where, say, each group of 10 bits is understood as a value between
000 and 999, with each 64-bit register understood to hold 18 or 19
decimal digits (six 10-bit groups, with the top 4 bits being special).
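For comparison, an ADD for the 10-bits-per-3-digits layout, as a sketch:
each group sum over 999 subtracts 1000 and carries into the next group.
The top 4 bits are just ignored here, and dec1000_add/pack are
hypothetical names:

```python
GROUPS = 6   # 6 x 10-bit groups = 18 decimal digits in bits 0..59


def pack(n):
    """Pack a non-negative integer < 10**18 into 10-bit base-1000 groups."""
    result = 0
    for g in range(GROUPS):
        result |= (n % 1000) << (10 * g)
        n //= 1000
    return result


def dec1000_add(x, y, carry=0):
    """Add group by group; a group sum >= 1000 wraps and carries.
    Returns (sum, carry_out); the top 4 bits are not touched."""
    result = 0
    for g in range(GROUPS):
        shift = 10 * g
        s = ((x >> shift) & 0x3FF) + ((y >> shift) & 0x3FF) + carry
        carry = 1 if s >= 1000 else 0
        result |= (s - 1000 * carry) << shift
    return result, carry
```

The add itself stays about as cheap as the BCD one (one compare per
group instead of per digit).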
Multiplying/dividing would be a bit more complicated (I don't imagine
shift/add or shift/subtract will give the desired results). Would likely
need to use a form of (mod 1000) long multiply and long division.
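What the (mod 1000) long multiply amounts to, sketched in Python on
lists of base-1000 limbs, least significant first (to_limbs/mul_limbs
are hypothetical helpers). Each partial product is under 1000*1000, so
it splits into a low limb (mod 1000) and a carry (div 1000), which is
the part that would want dedicated hardware help:

```python
def to_limbs(n, count=6):
    """Split a non-negative integer into `count` base-1000 limbs, LSB first."""
    return [(n // 1000**i) % 1000 for i in range(count)]


def mul_limbs(a, b):
    """Schoolbook multiply in base 1000; the result has len(a)+len(b) limbs."""
    out = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            t = out[i + j] + ai * bj + carry   # at most 999999
            out[i + j] = t % 1000
            carry = t // 1000
        out[i + len(b)] += carry               # slot untouched until now
    return out
```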
To make this effective, the hardware support would need to be a bit more
involved, with special instructions needed to help with multiply and
similar, vs. just being able to get along entirely with ADD/SUB
helpers.
There is also the 30-bit 000000000..999999999 scheme, but this makes
more sense for a plain software implementation (can mostly leverage
normal integer operations without too much overhead). Would be less
viable for direct hardware support.
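Why the 30-bit scheme suits plain software, as a sketch: each 32-bit word
holds 000000000..999999999, ordinary integer compares handle the carries,
and decimal output is just zero-padded printing of each word.
add_words/words_to_str are hypothetical names:

```python
BASE = 10**9   # fits in 30 bits, so each word can live in a uint32


def add_words(a, b):
    """Add two little-endian lists of base-1e9 words."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        carry = 1 if s >= BASE else 0
        out.append(s - BASE * carry)
    if carry:
        out.append(carry)
    return out


def words_to_str(w):
    """Top word as-is, then each lower word zero-padded to 9 digits."""
    return str(w[-1]) + "".join(f"{x:09d}" for x in reversed(w[:-1]))
```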
...