Definition of 'DIGIT'

dxforth

Jun 17, 2022, 12:24:32 AM

This topic has come up before e.g.

https://groups.google.com/g/comp.lang.forth/c/JC3uSNJdA_8/m/gkNUaKabHP8J

Nearly every Forth has it, possibly embedded within >NUMBER. The stack
effect is typically:

DIGIT ( char base -- u true | false )

as that's the minimum.

I never offered it as a factor because I've had applications where 'char'
needed to be returned on error. Conceivably one could do DUP base DIGIT ...
but it gets rather messy.

Looking at the issue again I've settled on:

>DIGIT ( char base -- u true | char false )

Returning char on error shouldn't create issues for apps which don't need
it, as a simple DROP gets rid of it.

I've documented the function as follows:

>DIGIT ( char base -- u true | char false ) A

Convert numerical character char into the corresponding digit u
according to the specified radix. If the conversion is successful
return the digit and true, or char and false otherwise.

Note: Conversion is case-sensitive.

The opposite function >CHAR ( u -- char ), which converts a digit to a
character, might also be worth offering, not least because it's already
present in '#'.
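For illustration, a >CHAR along these lines can be sketched in high-level Forth as below. This is an assumption about how such a word might look, not DX-Forth's actual source; it uses the usual '0'-'9','A'-'Z' mapping and simply continues through the ASCII characters after 'Z' for larger digits.

```forth
\ Sketch only; not taken from DX-Forth.
: >char ( u -- char )
  dup 9 > 7 and +      \ skip the 7 characters between '9' and 'A'
  [char] 0 + ;
```

With this definition, 10 >char gives 'A' and 35 >char gives 'Z'.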

Below are sample implementations of >DIGIT for the 8080 and 8086. They're
in assembler because that's cheaper, and it's a PITA in Forth.

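; >DIGIT ( char base -- u true | char false )   8080 version (Z80 mnemonics)

        hdr 1,'>DIGIT'
todig:  pop hl          ; L = base
        pop de          ; E = char
        ld a,e
        sub '0'
        jp c,todig2     ; below '0': fail
        cp 10
        jp c,todig1     ; 0..9
        sub 7           ; map 'A' onto 10
        cp 10
        jp c,todig2     ; between '9' and 'A': fail
todig1: cp l
        jp nc,todig2    ; digit >= base: fail
        ld e,a
        push de         ; u
        jp true

todig2: push de         ; char, unchanged
        jp false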

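; >DIGIT ( char base -- u true | char false )   8086 version

        hdr 1,'>DIGIT'
todig:  pop dx          ; DL = base
        pop bx          ; BL = char
        mov al,bl
        sub al,'0'
        jc todig2       ; below '0': fail
        cmp al,10
        jc todig1       ; 0..9
        sub al,7        ; map 'A' onto 10
        cmp al,10
        jc todig2       ; between '9' and 'A': fail
todig1: cmp al,dl
        jnc todig2      ; digit >= base: fail
        mov bl,al
        push bx         ; u
        jmp true

todig2: push bx         ; char, unchanged
        jmp false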
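
For readers who don't follow the assembler, the same steps can be sketched in high-level Forth. This is only an illustration that mirrors the listings above step by step; it is not DX-Forth source, and it assumes BASE is ten while it is being compiled.

```forth
decimal   \ the literals below assume BASE is ten

\ Sketch only: follows the assembler step by step.
: >digit ( char base -- u true | char false )
  >r  dup [char] 0 -                  \ ( char n ) R: base
  dup 10 u< 0= if                     \ not 0..9 (includes char below '0')
    7 -                               \ map 'A' onto 10
    dup 10 u< if                      \ was between '9' and 'A'
      drop r> drop false exit
    then
  then
  dup r> u< if nip true exit then     \ n < base: return the digit
  drop false ;                        \ otherwise return char unchanged
```

Like the assembler, the sketch is case-sensitive: 'a' is not accepted as a hex digit, while with a base of 72 the character 'z' converts to 67.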

Anton Ertl

Jun 17, 2022, 2:29:24 AM

dxforth <dxf...@gmail.com> writes:
> DIGIT ( char base -- u true | false )

[...]

> >DIGIT ( char base -- u true | char false )

The advantage of this stack effect is that the stack depth is
independent of the flag value, which means that using >DIGIT restricts
the surrounding code far less than using DIGIT. The same is true of
FIND vs. SEARCH-WORDLIST.

Of course, if every use of >DIGIT is followed right away by IF, WHILE,
or UNTIL, this freedom goes unused. I have not
found Gforth's DIGIT? (like DIGIT) restrictive yet.
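
To make the difference between the two stack effects concrete, here is a minimal sketch that builds each one from the other. The word DIGIT below is a hypothetical stand-in with the "minimum" stack effect (not Gforth's DIGIT? and not DX-Forth's code), and DIGIT' is an illustrative name.

```forth
decimal   \ literals assume BASE is ten

\ Hypothetical DIGIT with the "minimum" stack effect.
: digit ( char base -- u true | false )
  >r [char] 0 -                                      \ ( n ) R: base
  dup 10 u< 0= if 7 - dup 10 u< if drop -1 then then \ gap before 'A' -> invalid
  dup r> u< dup 0= if nip then ;

\ Recovering char on failure means keeping a copy around the call.
: >digit ( char base -- u true | char false )
  over swap digit dup if rot drop then ;

\ Going the other way only needs a conditional NIP.
: digit' ( char base -- u true | false )
  >digit dup 0= if nip then ;
```

Either direction is a one-liner, but only the >DIGIT form leaves the same number of items on the stack whether or not the conversion succeeds.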

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2022: http://www.euroforth.org/ef22/cfp.html

dxforth

Jun 17, 2022, 3:11:23 AM

On 17/06/2022 16:20, Anton Ertl wrote:
>
> I have not
> found Gforth's DIGIT? (like DIGIT) restrictive yet.

More than once I've used:

: ?BADHEX ( char flag -- char )
  0= if end
  cr ." Expected HEX character "
  dup [char] ! [char] ~ between if
    [char] " dup emit swap emit emit space
  else [char] $ emit hb. then .line ;

: NIBBLE ( -- nibble )
  nextchar
  dup [char] 0 < ?badhex
  dup [char] 9 > if
    dup [char] A < ?badhex
    dup [char] F > ?badhex 7 -
  then $0F and ;

which now becomes:

: BADHEX ( char -- )
  cr ." Expected HEX character "
  dup [char] ! [char] ~ between if
    [char] " dup emit swap emit emit space
  else [char] $ emit hb. then .line ;

: NIBBLE ( -- nibble )
  nextchar 16 >digit if end badhex ;

NN

Jun 18, 2022, 7:58:01 PM

I thought I would try and see if it's really a PITA ;

attempt 1

: adjch ( ch -- n T | ch F )
  dup '0' '9' 1+ within if '0' - TRUE exit then
  dup 'A' 'Z' 1+ within if 'A' - 10 + TRUE exit then
  dup 'a' 'z' 1+ within if 'a' - 10 + TRUE exit then
  FALSE ;

: >dgt ( ch base -- n T | ch F )
  over adjch if dup >r
    0 rot within if drop r> true else r> drop False then
  else drop drop FALSE then ;

attempt 2

create dgt 128 chars allot

:noname ( -- )
  bl 0 do
    bl dgt i + c!
  loop ; execute

:noname ( -- )
  128 bl do
    i dgt i + c!
  loop ; execute

:noname ( -- )
  '9' 1+ '0' do
    i '0' - 128 xor dgt i + c!
  loop ; execute

:noname ( -- )
  'Z' 1+ 'A' do
    i 'A' - 10 + 128 xor dgt i + c!
  loop ; execute

:noname ( -- )
  'z' 1+ 'a' do
    i 'a' - 10 + 128 xor dgt i + c!
  loop ; execute

: adjch ( ch -- n T | ch F )
  dgt + c@ dup 128 and if 128 - TRUE else FALSE then ;

: >dgt ( ch base -- n TRUE | ch FALSE )
  over adjch if
    dup >r 0 rot within if drop r> TRUE else r> drop FALSE then
  else drop drop FALSE then ;
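
One caveat with a 128-entry table is that a character code above 127 indexes past its end. A possible variation, sketched below under my own assumptions (the names DGT2, INIT-DGT2 and >DGT2 are hypothetical, and bases are assumed to stay below 255), stores 255 for "not a digit" so that the comparison against the base rejects invalid characters too, and guards the lookup for codes above 127.

```forth
decimal   \ literals assume BASE is ten

create dgt2 128 chars allot     \ hypothetical name, to avoid clashing with DGT

: init-dgt2 ( -- )
  dgt2 128 255 fill                                   \ 255 = "not a digit"
  10 0 do  i       dgt2 [char] 0 + i + c!  loop       \ '0'..'9' -> 0..9
  26 0 do  i 10 +  dup dgt2 [char] A + i + c!         \ 'A'..'Z' -> 10..35
                       dgt2 [char] a + i + c!  loop ; \ 'a'..'z' -> 10..35
init-dgt2

: >dgt2 ( ch base -- n true | ch false )
  over dup 128 u< if dgt2 + c@ else drop 255 then     \ ( ch base n )
  tuck u> if nip true else drop false then ;
```

Like the code above, this assumes one character occupies one address unit.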

dxforth

Jun 18, 2022, 10:25:58 PM

On 19/06/2022 09:57, NN wrote:
> I thought I would try and see if it's really a PITA ;

> ...

What did you conclude?

Anton Ertl

Jun 19, 2022, 3:54:02 AM

NN <novembe...@gmail.com> writes:
>I thought I would try and see if it's really a PITA ;
>
>attempt 1
>
>: adjch ( ch -- n T | ch F )
> dup '0' '9' 1+ within if '0' - TRUE exit then
> dup 'A' 'Z' 1+ within if 'A' - 10 + TRUE exit then
> dup 'a' 'z' 1+ within if 'a' - 10 + TRUE exit then
> FALSE ;

This is pretty good for BASE-36 conversion, if you do it the branchy
way. Only three branches. I thought about using binary search, but
it's not necessarily better.

>: >dgt ( ch base -- n T | ch F )
> over adjch if dup >r
> 0 rot within if drop r> true else r> drop False then
> else drop drop FALSE then ;

This is ugly, though.

One thing to note is that we don't need the upper bounds of ADJCH,
because that is handled in >DGT; let's see what we can do with that:

: adjch1 ( ch -- n1 | -n2 )
  dup '0' '9' 1+ within 'A' '0' - and
  over 'A' u>= 10 and +
  over 'a' u>= 'A' 'a' - and + + 'A' - ;

This is somewhat intricate. The first and the third 'A' need to be
the same value and at least as large as 'A' (and not too large). It
results in all non-digits below 'A' being negative. The cool thing
about this implementation is that it is branchless and does not need a
table in memory. On VFX 4.72 it takes 51 bytes, 20 instructions, on
VFX64 66 bytes and 20 instructions.

Let's see what we can do about >DGT:

: >DGT ( ch base -- n true | ch false )
  over adjch1 tuck u> dup >r 0= select r> ;

Where SELECT can be defined as

: select ( u1 u2 f -- u )
  \ ""If @i{f} is false, @i{u} is @i{u2}, otherwise @i{u1}.""
  IF swap THEN nip ;

Or it can be implemented with a conditional move instruction.
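
For systems without a conditional move, a branch-free SELECT can also be sketched with bit operations. In the block below ADJCH1 and >DGT are the definitions from above; SELECT-BITS is a hypothetical name for the stand-in, and U>= is defined only as an assumption for systems that lack it.

```forth
decimal   \ literals assume BASE is ten

: u>= ( u1 u2 -- f )  u< 0= ;      \ assumption: only needed where U>= is missing

: adjch1 ( ch -- n1 | -n2 )        \ as defined above
  dup '0' '9' 1+ within 'A' '0' - and
  over 'A' u>= 10 and +
  over 'a' u>= 'A' 'a' - and + + 'A' - ;

: select-bits ( u1 u2 f -- u )     \ u1 if f is nonzero, else u2; no branch
  0= 0= >r                         \ normalise f to an all-bits mask
  tuck xor  r> and  xor ;          \ u2 xor ((u1 xor u2) and mask)

: >dgt ( ch base -- n true | ch false )
  over adjch1 tuck u> dup >r 0= select-bits r> ;
```

Whether this beats the IF version depends on the compiler; the point is only that the whole conversion can be written without a branch or a table.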

NN

Jun 19, 2022, 10:40:35 AM

If the standard hadn't mandated base to be 2..36 , we could
have gone all the way up to base 62. ( treat 'A' and 'a' as
different )

Although attempt 2 looks longer and more complicated, a table lookup
seems likely to be quicker than a series of conditional tests.

Anton Ertl

Jun 19, 2022, 11:13:50 AM

NN <novembe...@gmail.com> writes:
>If the standard hadn't mandated base to be 2..36 , we could
>have gone all the way up to base 62. ( treat 'A' and 'a' as
>different )

The standard does not guarantee that BASE>36 works, but systems are
free to support more. In Gforth you can sensibly use bases up to 42:

#42 base ! ok
#36 . [ ok
#37 . \ ok
#38 . ] ok
#39 . ^ ok
#40 . _ ok
#41 . ` ok

I just tried

#42 base !
#41 .
0` .

on iForth, lxf, sf, and vfx4, and they all output "`" for the second
and third lines, so it seems they can all manage base 42.

The next character after "`" is "a", so to support larger bases, a
more complicated implementation that skips a..z would be necessary on
case-insensitive Forth systems. The benefit is minuscule, so I doubt
that anybody went there.

NN

Jun 19, 2022, 11:17:34 AM

I agree my code for >dgt was ugly.

I like your implementation of adjch1. It's what I tried to do and failed, which is
why I ended up using the 3 withins which I wanted to avoid.

Reminded me of the old saying about writing dumb code vs clever code.
I wrote dumb code but clever code is better in this example.

Anton Ertl

Jun 19, 2022, 12:27:29 PM

NN <novembe...@gmail.com> writes:
>> >: adjch ( ch -- n T | ch F )
>> > dup '0' '9' 1+ within if '0' - TRUE exit then
>> > dup 'A' 'Z' 1+ within if 'A' - 10 + TRUE exit then
>> > dup 'a' 'z' 1+ within if 'a' - 10 + TRUE exit then
>> > FALSE ;

...

>> : adjch1 ( ch -- n1 | -n2 )
>> dup '0' '9' 1+ within 'A' '0' - and
>> over 'A' u>= 10 and +
>> over 'a' u>= 'A' 'a' - and + + 'A' - ;

...

>I like your implementation of adjch1. It's what I tried to do and failed, which is
>why I ended up using the 3 withins which I wanted to avoid.
>
>Reminded me of the old saying about writing dumb code vs clever code.
>I wrote dumb code but clever code is better in this example.

For some value of "better". Your definition is easier to understand
and (I guess) was easier to write. It took me over an hour to get the
clever solution right. Let's try a more understandable
implementation of ADJCH1:

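: adjch1 ( ch -- n1 | -n2 )
  dup '9' 1+ < if '0' - exit then          \ '0'..'9' -> 0..9, below '0' -> negative
  'A' - dup 0< if exit then                 \ between '9' and 'A' -> negative
  dup 'a' 'A' - < 0= if 'a' 'A' - - then    \ fold 'a'.. onto 'A'..
  10 + ;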

Untested, and probably still less understandable than your ADJCH. But
at least I think that I can get it to work with much less effort than
the clever version above.

none albert

Jun 19, 2022, 2:59:52 PM

In article <2022Jun1...@mips.complang.tuwien.ac.at>,

Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>NN <novembe...@gmail.com> writes:
>>If the standard hadn't mandated base to be 2..36 , we could
>>have gone all the way up to base 62. ( treat 'A' and 'a' as
>>different )
>
>The standard does not guarantee that BASE>36 works, but systems are
>free to support more. In Gforth you can sensibly use bases up to 42:
>
>#42 base ! ok
>#36 . [ ok
>#37 . \ ok
>#38 . ] ok
>#39 . ^ ok
>#40 . _ ok
>#41 . ` ok
>
>I just tried
>
>#42 base !
>#41 .
>0` .
>
>on iForth, lxf, sf, and vfx4, and they all output "`" for the second
>and third lines, so it seems they can all manage base 42.
>
>The next character after "`" is "a", so to support larger bases, a
>more complicated implementation that skips a..z would be necessary on
>case-insensitive Forth systems. The benefit is minuscule, so I doubt
>that anybody went there.

ciforth went there. As soon as the base is not decimal, the exponent
sign becomes _.
S[ ] OK DECIMAL 1E1 FS.
9.999999999999999999E0
S[ ] OK HEX 1_1 FS.
1.000000000000000000_1
S[ ] OK
The _ is algol68 compatible, going as far as 5F.
Hex numbers can use base 0x40, this is useful, because
fp numbers can be represented exactly.

(This goes down the drain if you want CASE-INSENSITIVE.
Then 'a' is mapped onto 'A'.)

>
>- anton

Groetjes Albert
--
"in our communism country Viet Nam, people are forced to be
alive and in the western country like US, people are free to
die from Covid 19 lol" duc ha
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Marcel Hendrix

Jun 20, 2022, 1:16:12 AM

On Sunday, June 19, 2022 at 8:59:52 PM UTC+2, none albert wrote:
> In article <2022Jun1...@mips.complang.tuwien.ac.at>,
> Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> >NN <novembe...@gmail.com> writes:
> >>If the standard hadn't mandated base to be 2..36 , we could
> >>have gone all the way up to base 62. ( treat 'A' and 'a' as
> >>different )
> >
> >The standard does not guarantee that BASE>36 works, but systems are
> >free to support more. In Gforth you can sensibly use bases up to 42:
> >
> >#42 base ! ok
> >#36 . [ ok
> >#37 . \ ok

[..]

> ciforth went there. As soon as the base is not decimal, the exponent
> sign becomes _.
> S[ ] OK DECIMAL 1E1 FS.
> 9.999999999999999999E0
> S[ ] OK HEX 1_1 FS.

iForth:

FORTH> 72 base ! ok
FORTH> A . A ok
FORTH> X . X ok
FORTH> Z . Z ok
FORTH> a . a ok
FORTH> z . z ok
FORTH> z decimal . 67 ok

-marcel

dxforth

Jun 20, 2022, 2:44:56 AM

When case matters...

DX-Forth:

67 >char emit z ok
char z 72 >digit ok 67 -1 <

dxforth

Jun 20, 2022, 3:17:31 AM

On 20/06/2022 00:40, NN wrote:
> On Sunday, 19 June 2022 at 03:25:58 UTC+1, dxforth wrote:
>> On 19/06/2022 09:57, NN wrote:
>> > I thought I would try and see if it's really a PITA ;
>> > ...
>>
>> What did you conclude?
>
> If the standard hadn't mandated base to be 2..36 , we could
> have gone all the way up to base 62. ( treat 'A' and 'a' as
> different )

Is there an externally recognized convention or standard for
number to character conversion? If so, and it's limited to
'0'-'9','A'-'Z' then it's hard to argue ANS should have done more.

Anton Ertl

Jun 20, 2022, 4:06:38 AM

albert@cherry.(none) (albert) writes:
>ciforth went there. As soon as the base is not decimal, the exponent
>sign becomes _.

...

>The _ is algol68 compatible, going as far as 5F.

That's unfortunate, because _ is a common digit group separator, not
just in Ada, C#, D, Haskell, Java, Kotlin, OCaml, Perl, Python, PHP,
Ruby, Go, Rust, Julia, and Swift
<https://en.wikipedia.org/wiki/Decimal_separator#Data_versus_mask>,
but also outside computing:

<https://en.wikipedia.org/wiki/Decimal_separator#Digit_grouping>:

|For ease of reading, numbers with many digits may be divided into
|groups using a delimiter, such as [...] underbar "_" (as in maritime
|"21_450") [...]

>Hex numbers can use base 0x40

This makes no sense. "Hex" means "base $10".

>this is useful, because
>fp numbers can be represented exactly.

If you want easy conversion to binary FP numbers, binary, octal, hex
or base 32 will do, no need for base 64. The way other languages
(apparently starting with C99) seem to have standardized on is hex
mantissa digits, "p" (instead of "e") to start the exponent, and
decimal exponent digits.

Marcel Hendrix

Jun 20, 2022, 7:03:31 AM

On Monday, June 20, 2022 at 10:06:38 AM UTC+2, Anton Ertl wrote:
[..]

> If you want easy conversion to binary FP numbers, binary, octal, hex
> or base 32 will do, no need for base 64. The way other languages
> (apparently starting with C99) seem to have standardized on is hex
> mantissa digits, "p" (instead of "e") to start the exponent, and
> decimal exponent digits.

"Convert among IEEE 754 32-bit or 64-bit float, C99 mixed hex/decimal
string, and raw hex string formats" published by David N. Williams higher
up in this newsgroup.

-marcel

Anton Ertl

Jun 20, 2022, 10:13:35 AM

I found the code by googling this title:

http://www-personal.umich.edu/~williams/archive/forth/utilities/mixfloat.fs
http://www-personal.umich.edu/~williams/archive/forth/utilities/mixfloat-test.fs

none albert

Jun 20, 2022, 4:53:40 PM

In article <2022Jun2...@mips.complang.tuwien.ac.at>,

Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>albert@cherry.(none) (albert) writes:
>>ciforth went there. As soon as the base is not decimal, the exponent
>>sign becomes _.
>...
>>The _ is algol68 compatible, going as far as 5F.
>
>That's unfortunate, because _ is a common digit group separator, not
>just in Ada, C#, D, Haskell, Java, Kotlin, OCaml, Perl, Python, PHP,
>Ruby, Go, Rust, Julia, and Swift

Python?
~$ echo aap=123456E1 | python
~$ echo aap=1234_56E1 | python
File "<stdin>", line 1
aap=1234_56E1
^
SyntaxError: invalid syntax

Fortunately floating point is a loadable extension in ciforth:

: XC BASE @ $0A = IF &E ELSE &_ THEN ; \ Exponent character
(line 2 in screen 250) is easily changed to
: XC BASE @ $0A = IF &E ELSE &~ THEN ; \ Exponent character

><https://en.wikipedia.org/wiki/Decimal_separator#Data_versus_mask>,
>but also outside computing:
>
><https://en.wikipedia.org/wiki/Decimal_separator#Digit_grouping>:
>
>|For ease of reading, numbers with many digits may be divided into
>|groups using a delimiter, such as [...] underbar "_" (as in maritime
>|"21_450") [...]

I wish that grouping were applied to the 16-digit number that
I must type in to prolong my prepaid. For ease of reading, a space
largely suffices. It can be used this way in Algol68, because white
space has no meaning in Algol68, except in strings.

>> Base 0x40 is useful, because

>>fp numbers can be represented exactly.
>
>If you want easy conversion to binary FP numbers, binary, octal, hex
>or base 32 will do, no need for base 64. The way other languages
>(apparently starting with C99) seem to have standardized on is hex
>mantissa digits, "p" (instead of "e") to start the exponent, and
>decimal exponent digits.

Base 32 is 5 bits per digit, base 64 is 6 bits. Most Forthers will gladly
accept the 16% saving in storage for virtually no effort:
0x40 BASE !
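
As a rough check of that figure, the number of digits needed for a given number of bits can be computed as below. DIGITS-NEEDED is a hypothetical helper, and the 53 bits are an assumption standing for the significand of an IEEE 754 double.

```forth
decimal

\ Hypothetical helper: digits needed to hold a given number of bits.
: digits-needed ( bits bits/digit -- n )
  tuck 1- + swap / ;     \ ceiling of bits / bits-per-digit

53 5 digits-needed .     \ base 32
53 6 digits-needed .     \ base 64
```

This prints 11 and 9, so for this case two digits out of eleven are saved, close to the asymptotic 1 - 5/6.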

The Forth way is simple. All digits have the same BASE.
Accepting E as the exponent sign is misguided compatibility
with FORTRAN. Using _ (or ~) from the start would have been much simpler.

If you must be compatible with Ada, then you must jump through hoops.
It can't be helped, but ascii representation of fp Forth numbers
can be painless.

>
>- anton

Paul Rubin

Jun 20, 2022, 5:14:52 PM

albert@cherry.(none) (albert) writes:
> ~$ echo aap=1234_56E1 | python
> File "<stdin>", line 1
> aap=1234_56E1
> ^
> SyntaxError: invalid syntax

This works for me in Python 3.9.2. It may be a Python 3 thing, or maybe
only a thing in recent versions of Python 3.

Anton Ertl

Jun 21, 2022, 3:36:22 AM

albert@cherry.(none) (albert) writes:
>In article <2022Jun2...@mips.complang.tuwien.ac.at>,
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>albert@cherry.(none) (albert) writes:
>>>ciforth went there. As soon as the base is not decimal, the exponent
>>>sign becomes _.
>>...
>>>The _ is algol68 compatible, going as far as 5F.
>>
>>That's unfortunate, because _ is a common digit group separator, not
>>just in Ada, C#, D, Haskell, Java, Kotlin, OCaml, Perl, Python, PHP,
>>Ruby, Go, Rust, Julia, and Swift
>
>Python?
>~$ echo aap=123456E1 | python
>~$ echo aap=1234_56E1 | python
> File "<stdin>", line 1
> aap=1234_56E1
> ^
> SyntaxError: invalid syntax

[~:130895] echo 'print(1234_56E1)' | python3
1234560.0

>><https://en.wikipedia.org/wiki/Decimal_separator#Data_versus_mask>,

There you can find, for many programming languages, the version in which
_ was introduced as a digit group separator.

>For ease of reading, a space
>largely suffices. It can be used this way in Algol68, because white
>space has no meaning in Algol68, except in strings.

Anxiously awaiting the next version of ciforth where space has no
meaning.