Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Double precision /MOD

168 views
Skip to first unread message

none albert

unread,
Mar 21, 2023, 7:22:19 AM3/21/23
to
Suppose we have a fast single precision /MOD , say unsigned.
Does that help with building a double precision /MOD ,
or must we resort to a shift and subtract algorithm?

/mod ( u u -- remainder quotient )
um/mod ( du u - remainder quotient )

Results are single precision unsigned.

Groetjes Albert
--
Don't praise the day before the evening. One swallow doesn't make spring.
You must not say "hey" before you have crossed the bridge. Don't sell the
hide of the bear until you shot it. Better one bird in the hand than ten in
the air. First gain is a cat spinning. - the Wise from Antrim -

minforth

unread,
Mar 21, 2023, 8:04:33 AM3/21/23
to
none albert schrieb am Dienstag, 21. März 2023 um 12:22:19 UTC+1:
> Suppose we have a fast single precision /MOD , say unsigned.
> Does that help with building a double precision /MOD ,
> or must we resort to a shift and subtract algorithm?
>
> /mod ( u u -- remainder quotient )
> um/mod ( du u - remainder quotient )
>
> Results are single precision unsigned.

Not quite clear what you intend. The maximum
UD/MOD ( udnom uddenom -- udrem udquot) ??
on a 32-bit or 64-bit system ??

or like
https://groups.google.com/g/comp.lang.forth/c/RlszeFOHIB4/m/ijpQHZ0huR0J

minforth

unread,
Mar 21, 2023, 10:01:20 AM3/21/23
to
P.S. after weeding out all reduction possibilities for special cases, there are
two other options than shift-and-subract:
- on certain CPUs use floating-point division with 32-bit systems when du has < 56 bits
- Newton-Raphson division is perhaps faster

dxforth

unread,
Mar 21, 2023, 10:35:33 AM3/21/23
to
I think he's asking how to make UM/MOD from U/MOD on the assumption the
latter is faster. Problem is they are essentially the same algorithm.

minforth

unread,
Mar 21, 2023, 11:07:46 AM3/21/23
to
Only if you do shift-and-subtract on both. But who would?

The question to decompose higher order divisions into lower order CPU-supported
efficient division primitives pops up now and then. I am also among those on the
lookout for simple answers. ;-)

b...@gmx.com

unread,
Mar 21, 2023, 12:13:31 PM3/21/23
to
Maybe Donald Knuth's Long Division Algorithm is helpful. Here is an examination about some implementations:
https://skanthak.homepage.t-online.de/division.html

Anton Ertl

unread,
Mar 21, 2023, 2:19:27 PM3/21/23
to
albert@cherry.(none) (albert) writes:
>Suppose we have a fast single precision /MOD , say unsigned.
>Does that help with building a double precision /MOD ,
>or must we resort to a shift and subtract algorithm?
>
>/mod ( u u -- remainder quotient )

The standard word /MOD works with signed numbers. Gforth calls this
word U/MOD.

>um/mod ( du u - remainder quotient )
>
>Results are single precision unsigned.

If you have a fast U/MOD, you get a faster UM/MOD by using U/MOD than
by using shift-and-subtract. But it's not straightforward. Gforth
copied a large chunk of code from libgcc for this.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2022: https://euro.theforth.net

minforth

unread,
Mar 21, 2023, 2:53:27 PM3/21/23
to
Nice overview by 'Skanthak'. But I am astonished at those complications he is
demonstrating there, and at those required code lengths.

I think Forth can do better. FWIW what I am using so far without trouble:

\ Non standard:
\ Primitive: _MU/MOD \ ( ud v - r udq ) mixed unsigned division (gforth: UD/MOD)
\ : PLUCK 2 pick ;

: D/REM \ ( du dv -- drem dquot ) MF double number signed symmetric division
pluck >r pluck over xor >r \ remember signs
dabs 2swap dabs 2swap dup IF
C mfUCell ul=mffth, uh=mfthd, vl=mfsec, vh=mftos; // copy Forth stack to C unsigned variables
C mfUCell q=0, m=1;
C if ((uh==vh)&&(ul==vl)) uh=ul=0, q=1;
C else { // shift and subtract
C while(vh==uh?vl<=ul:vh<=uh)
C vh=(vh<<1)|(vl>>63), vl<<=1, m<<=1;
C while(m>1) {
C vl=(vl>>1)|(vh<<63), vh>>=1, m>>=1;
C if (uh==vh?ul>=vl:uh>=vh) {
C uh-=vh+(vl>ul), ul-=vl, q|=m; } } }
C mffth=ul, mfthd=uh, mfsec=q, mftos=0;
ELSE drop _mu/mod 0 -rot THEN
r> 0< IF dnegate THEN \ apply signs
r> 0< IF 2swap dnegate 2swap THEN ;

Don't bother about this seemingly wild mix of Forth and C code. All I want to show here
is how compact it can be.

dxforth

unread,
Mar 21, 2023, 8:42:13 PM3/21/23
to
What pops up every now and again are C users asking why Intel didn't provide
double by double and single by single division instructions.

Anton Ertl

unread,
Mar 22, 2023, 5:51:54 AM3/22/23
to
minforth <minf...@arcor.de> writes:
>b...@gmx.com schrieb am Dienstag, 21. M=C3=A4rz 2023 um 17:13:31 UTC+1:
>> https://skanthak.homepage.t-online.de/division.html
>
>Nice overview by 'Skanthak'. But I am astonished at those complications he =
>is
>demonstrating there, and at those required code lengths.
>
>I think Forth can do better. FWIW what I am using so far without trouble:
>
>\ Non standard:
>\ Primitive: _MU/MOD \ ( ud v - r udq ) mixed unsigned division (gforth: U=
>D/MOD)
>\ : PLUCK 2 pick ;
>
>: D/REM \ ( du dv -- drem dquot ) MF double number signed symmetric divisi=
>on
> pluck >r pluck over xor >r \ remember signs
> dabs 2swap dabs 2swap dup IF=20
>C mfUCell ul=3Dmffth, uh=3Dmfthd, vl=3Dmfsec, vh=3Dmftos; // copy Forth=
> stack to C unsigned variables
>C mfUCell q=3D0, m=3D1;=20
>C if ((uh=3D=3Dvh)&&(ul=3D=3Dvl)) uh=3Dul=3D0, q=3D1;
>C else { // shift and subtract
>C while(vh=3D=3Duh?vl<=3Dul:vh<=3Duh)
>C vh=3D(vh<<1)|(vl>>63), vl<<=3D1, m<<=3D1;
>C while(m>1) {
>C vl=3D(vl>>1)|(vh<<63), vh>>=3D1, m>>=3D1;
>C if (uh=3D=3Dvh?ul>=3Dvl:uh>=3Dvh) {
>C uh-=3Dvh+(vl>ul), ul-=3Dvl, q|=3Dm; } } }=20
>C mffth=3Dul, mfthd=3Duh, mfsec=3Dq, mftos=3D0;
> ELSE drop _mu/mod 0 -rot THEN
> r> 0< IF dnegate THEN \ apply signs
> r> 0< IF 2swap dnegate 2swap THEN ;
>
>Don't bother about this seemingly wild mix of Forth and C code. All I want =
>to show here
>is how compact it can be.

Except that it does not solve the problem "none albert" asked for:
Implementing double/single->single with single/single->single, faster
than shift-and-subtract.

What you provide is an implementation of double/double->double that
uses shift-and-subtract in the general case and calls a pre-existing
implementation of double/single->double in Forth, without showing how
that is implemented. In Gforth the latter is implemented as

: ud/mod ( ud1 u2 -- urem udquot ) \ gforth
\G divide unsigned double @i{ud1} by @i{u2}, resulting in a unsigned double
\G quotient @i{udquot} and a single remainder @i{urem}.
over 0= if nip u/mod 0 exit then
dup >r u/mod r> swap >r um/mod r> ;

And this implementation uses UM/MOD, which is the word that "none
albert" has asked about.

So we have an onion of division words, starting from the core:

u/mod (called by) um/mod (called by) ud/mod (called by) d/rem

The first part can be skipped if the architecture directly supports
UM/MOD, or if it does not have division instructions.

none albert

unread,
Mar 22, 2023, 6:58:06 AM3/22/23
to
In article <2023Mar2...@mips.complang.tuwien.ac.at>,
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>albert@cherry.(none) (albert) writes:
>>Suppose we have a fast single precision /MOD , say unsigned.
>>Does that help with building a double precision /MOD ,
>>or must we resort to a shift and subtract algorithm?
>>
>>/mod ( u u -- remainder quotient )
>
>The standard word /MOD works with signed numbers. Gforth calls this
>word U/MOD.
>
>>um/mod ( du u - remainder quotient )
>>
>>Results are single precision unsigned.
>
>If you have a fast U/MOD, you get a faster UM/MOD by using U/MOD than
>by using shift-and-subtract. But it's not straightforward. Gforth
>copied a large chunk of code from libgcc for this.

I suspected that much. Thanks.
Knuth based a multiprecision divide on a single division divide.
A 2-precision divide is a special case of this, but it is equally
cumbersome. I remember that in implementing this multiprecision
divide on the transputer (we did number theoretic work on it)
Marcel Hendrix asked my help for debugging, a rare event.

>
>- anton

Paul Rubin

unread,
Mar 22, 2023, 1:56:33 PM3/22/23
to
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> : ud/mod ( ud1 u2 -- urem udquot ) \ gforth
> \G divide unsigned double @i{ud1} by @i{u2}, resulting in a unsigned double
> \G quotient @i{udquot} and a single remainder @i{urem}.
> over 0= if nip u/mod 0 exit then
> dup >r u/mod r> swap >r um/mod r> ;

In gforth 0.7.3 on amd64, I get:

see ud/mod
: ud/mod
>r 0 i um/mod r> swap >r um/mod r> ; ok

I haven't tried to figure out what either of these are doing.

I had been wondering what happens if the quotient won't fit in a single.
0 1 1 ud/mod gives 0 0 1 on my system, which doesn't seem so great.

Anton Ertl

unread,
Mar 22, 2023, 2:58:58 PM3/22/23
to
Paul Rubin <no.e...@nospam.invalid> writes:
>an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>> : ud/mod ( ud1 u2 -- urem udquot ) \ gforth
>> \G divide unsigned double @i{ud1} by @i{u2}, resulting in a unsigned double
>> \G quotient @i{udquot} and a single remainder @i{urem}.
>> over 0= if nip u/mod 0 exit then
>> dup >r u/mod r> swap >r um/mod r> ;
>
>In gforth 0.7.3 on amd64, I get:
>
> see ud/mod
> : ud/mod
> >r 0 i um/mod r> swap >r um/mod r> ; ok
>
>I haven't tried to figure out what either of these are doing.

They are doing what is documented. The implementation I cited is more
optimized than the 0.7.3 one (with the intention of getting a faster
"#" (and words that call it, like ".", especially on architectures
that do not provide UM/MOD, such as ARM and RISC-V)).

>I had been wondering what happens if the quotient won't fit in a single.
>0 1 1 ud/mod gives 0 0 1 on my system, which doesn't seem so great.

What you are doing (on a 64-bit system) is:

18446744073709551616/1

And the result you get is the quotient 18446744073709551616 and
remainder 0.

What's not so great in your opinion?

minforth

unread,
Mar 22, 2023, 4:32:51 PM3/22/23
to
Of course. IMO it boils down to either you have CPU mixed division support
or not. If not, the question remains: is there a faster/simpler division algorithm
than binary shift-subtract?

F.ex. given fast integer log2 and pow2 operators, division can be solved with
multiplication and some additions. But only timing can tell if it is worth the hassle.

dxforth

unread,
Mar 22, 2023, 7:51:20 PM3/22/23
to
On 23/03/2023 4:56 am, Paul Rubin wrote:
> an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>> : ud/mod ( ud1 u2 -- urem udquot ) \ gforth
>> \G divide unsigned double @i{ud1} by @i{u2}, resulting in a unsigned double
>> \G quotient @i{udquot} and a single remainder @i{urem}.
>> over 0= if nip u/mod 0 exit then
>> dup >r u/mod r> swap >r um/mod r> ;
>
> In gforth 0.7.3 on amd64, I get:
>
> see ud/mod
> : ud/mod
> >r 0 i um/mod r> swap >r um/mod r> ; ok
>
> I haven't tried to figure out what either of these are doing.
>
> I had been wondering what happens if the quotient won't fit in a single.

As is potentially the case in UM/MOD ? That would be an overflow/exception.
Don't do that. Intelligent use of UM/MOD is better than default use of UD/MOD.

Anton Ertl

unread,
Mar 23, 2023, 4:03:45 AM3/23/23
to
minforth <minf...@arcor.de> writes:
>Of course. IMO it boils down to either you have CPU mixed division support
>or not. If not, the question remains: is there a faster/simpler division al=
>gorithm
>than binary shift-subtract?=20
>
>F.ex. given fast integer log2 and pow2 operators, division can be solved wi=
>th
>multiplication and some additions. But only timing can tell if it is worth =
>the hassle.

Yes, a lot of the bulk of the stuff we took from libgcc is due to
variations for making use of some feature present on some
architectures to get a faster result, plus the code for an efficient
implementation on architectures where the feature is not present
(which may mean that you use a different approach rather than just
replacing the code for the feature with a software implementation of
that feature). With several such features in play, the result bloats
considerably.

You can code a simple implementation that uses only minimal features,
and you can code it up in Forth, and get a much slimmer result and
then claim "Forth simplicity", but in that case the slimness is not
due to Forth, but due to preferencing slimness over efficiency.

minforth

unread,
Mar 23, 2023, 4:36:52 AM3/23/23
to
Anton Ertl schrieb am Donnerstag, 23. März 2023 um 09:03:45 UTC+1:
> You can code a simple implementation that uses only minimal features,
> and you can code it up in Forth, and get a much slimmer result and
> then claim "Forth simplicity", but in that case the slimness is not
> due to Forth, but due to preferencing slimness over efficiency.

Due to caching effects on CPUs or being able to make better usage of registers,
very often the slimmer code wins the race despite being mathematically less efficient.

And it saves a lot of debugging time. ;-)

Anton Ertl

unread,
Mar 23, 2023, 5:40:14 AM3/23/23
to
minforth <minf...@arcor.de> writes:
>Anton Ertl schrieb am Donnerstag, 23. M=C3=A4rz 2023 um 09:03:45 UTC+1:
>> You can code a simple implementation that uses only minimal features,=20
>> and you can code it up in Forth, and get a much slimmer result and=20
>> then claim "Forth simplicity", but in that case the slimness is not=20
>> due to Forth, but due to preferencing slimness over efficiency.
>
>Due to caching effects on CPUs or being able to make better usage of regist=
>ers,
>very often the slimmer code wins the race despite being mathematically less=
> efficient.

For every architecture only the variant for its architectural features
results is compiled (using conditional compilation). This results in
the following code lengths for UM/MOD (excluding the final jump in the
NEXT):

Bytes Architecture
30 AMD64
292 RISC-V
308 Aarch64

If you come up with shorter code that runs on all the architectures
Gforth supports and also results in shorter and faster object code, I
will happily replace the current Gforth implementation. Until then I
continue to believe that no significant source code size improvement
is possible that does not result in a significant disadvantage in
performance.

>And it saves a lot of debugging time. ;-)

Yes, that's a disadvantage of all this special-case code, but in this
case the debugging was performed by the gcc developers, I just dropped
it in as-is.
0 new messages