Problem

It is often useful to write numbers in a base other than the current BASE.
Changing BASE to do this is often inconvenient and has led to
hard-to-find errors.
Solution
The Forth text interpreter should accept numbers like the following ones:
#-12346789. \ decimal double
$-1234cDeF. \ hex double
%-10101010. \ binary double
'a' \ equivalent to char a or [char] a
(and of course without "." for single-cell numbers and without "-" for
positive and unsigned numbers; the text interpreter only has to accept
double-cell numbers if the double-number word set is present).
Typical usage:
#99
#-99.
$-ff
$ff.
$FF
$-FF.
%-11
%11.
'a'
Existing Practice
Many Forth systems already support various number prefixes in the
interpreter:
  #10   $10   %10   &10  0x10  0X10   10h   10H    $f    $F    'a   'a'   input
   10    16     2     -     -     -     -     -     -    15     -    97   iForth 2.1.2541, CHForth 1.2.5, 1.3.9
   10    16     2    10     -     -     -     -    15    15    97 24871   bigFORTH rev. 2.1.6
    -    16     2    10    16    16     -     -    15    15     -     -   PFE 0.33.34
    -    16     2    10     -     -     -     -    15    15    97 24871   Gforth-0.6.2
   10    16     2    10    16    16     -     -    15    15    97    97   Gforth 0.6.9
    -    16     -     -    16    16     -     -    15    15     -    97   Win32Forth version 4.xxx
   10    16     2    10    16    16    16    16    15    15     -    97   Win32Forth version 6.xx, Win32Forth-STC version 0.02.xx
   10    16     2     8     -     -     -     -    15    15     -     -   SwiftForth
   10    16     2     -    16    16    16    16    15    15     -     -   VFX Forth 4.0.2 build 2428
   10    16     2     8     -     -     -     -    15    15     -     -   ntf/lxf (Peter Fälth)
  #9.    #9  -#9.   -#9  #-9.   #-9   $F.    $F  -$F.   -$F  $-F.   $-F   %1.    %1  -%1.   -%1  %-1.   %-1
    9     9     -     -    -9    -9    15    15     -     -   -15   -15     1     1     -     -    -1    -1   iForth 2.1.2541
    9     9    -9    -9     -     -    15    15   -15   -15     -     -     1     1    -1    -1     -     -   bigForth 2.1.6
    -     -     -     -     -     -    15    15   -15   -15   -15   -15     1     1    -1    -1    -1    -1   PFE 0.33.34
    9     9    -9    -9     -     -    15    15   -15   -15     -     -     1     1    -1    -1     -     -   gforth 0.6.9
    9     9    -9    -9    -9    -9    15    15   -15   -15   -15   -15     1     1    -1    -1    -1    -1   Win32Forth 6.xx
    9     9     -     -    -9    -9    15    15     -     -   -15   -15     1     1     -     -    -1    -1   SwiftForth 3.0.11
    9     9     -     -    -9    -9    15    15     -     -   -15   -15     1     1     -     -    -1    -1   ntf/lxf (Peter Fälth)
The programs used to produce these outputs are: test-num-prefixes.fs
and test-num-prefixes2.fs
None of these systems accept !10, @10 or ^10, and none accept these
prefixes in >NUMBER.
Also:
iForth and CHForth support the following prefixes:
^ for control characters, e.g., ^G for 7.
& for characters, e.g., &" for 34 and &1 for 49.
" for single characters, e.g. "a" for 97.
ntf/lxf supports the prefix U for a Unicode code point, e.g., U20AC for
the Euro sign (to convert from the Unicode number to the on-stack
format in this system for xchars).
Open Firmware, eForth, and SwiftForth support the parsing words H# D#
B# O# to determine the base, used as in "H# FF".
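A rough sketch of how such a base-setting parsing word can be written in
standard Forth (it is not any of these systems' actual code; the word is
meant for interpretation only, and error handling is minimal):

: h#  ( "hex-digits" -- u )        \ parse the next word as hexadecimal
  base @ >r hex                    \ save BASE, switch to hex
  0 0 bl word count >number        ( ud c-addr u2 )
  r> base !                        \ restore BASE before any checks
  nip abort" bad hex digit"        \ characters left over: not valid hex
  drop ;                           \ keep the low cell only

d#, o# and b# would follow the same pattern with 10, 8 and 2 in place of
hex; with DECIMAL in effect, h# FF . prints 255.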
Remarks
Character value syntax
The '<char>' syntax does not have as much support in existing systems
as the others; if there is too much resistance to it, I will take it
out.
Sign or base prefix first?
Some people find it more intuitive to place the sign in front of the
base prefix, some find it more natural to do it the other way round.
That's probably a good reason for systems to accept both variants.
However, currently slightly more systems support the base-prefix-first
approach, so that's what is proposed here. Systems can also implement
the other variant, but programs making use of that would not conform to
this proposal.
Proposal
Move "12.2.2.1 Numeric notation" to "2.2.1 Numeric Notation",
with appropriate editing.
Replace "3.4.1.3 Text interpreter input number conversion" with:
|When converting input numbers, the text interpreter shall recognize
|integer numbers in the following form:
|
|Convertible string := { <BASEnum> | <decnum> | <hexnum> | <binnum> | <cnum> }
|<BASEnum> := [-]<bdigit><bdigit>*
|<decnum> := #[-]<decdigit><decdigit>*
|<hexnum> := $[-]<hexdigit><hexdigit>*
|<binnum> := %[-]<bindigit><bindigit>*
|<cnum> := '<char>'
|<bindigit> := { 0 | 1 }
|<decdigit> := { 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 }
|<hexdigit> := { <decdigit> | a | b | c | d | e | f | A | B | C | D | E | F }
|
|<bdigit> represents a digit according to the value of BASE (see
|"3.2.1.2 Digit conversion"). For <hexdigit>, the digits a..f and A..F
|have the values 10..15. <char> represents any printable character.
|
|The radix used for number conversion is: for <BASEnum>, the value in
|BASE; for <decnum> 10; for <hexnum> 16; for <binnum> 2. For <cnum>,
|the number is the value of <char>.
Change "8.3.2 Text interpreter input number conversion" as follows:
|When the text interpreter processes a number except a <cnum> that is
(the phrase "except a <cnum>" is new)
|immediately followed by a decimal point and is not found as a
|definition name, the text interpreter shall convert it to a
|double-cell number.
|
|For example, entering DECIMAL 1234 leaves the single-cell number 1234
|on the stack, and entering DECIMAL 1234. leaves the double-cell number
|1234 0 on the stack.
Reference implementation
Implementing this proposal requires changes in places that are quite
system-specific, therefore a reference implementation is not useful.
Tests
require test/tester.fs
decimal
{ #1289 -> 1289 }
{ #12346789. -> 12346789. }
{ #-1289 -> -1289 }
{ #-12346789. -> -12346789. }
{ $12eF -> 4847 }
{ $12aBcDeF. -> 313249263. }
{ $-12eF -> -4847 }
{ $-12AbCdEf. -> -313249263. }
{ %10010110 -> 150 }
{ %10010110. -> 150. }
{ %-10010110 -> -150 }
{ %-10010110. -> -150. }
{ 'z' -> 122 }
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2007: http://www.complang.tuwien.ac.at/anton/euroforth2007/
First of all, I would suggest using & as a "prefix" for octal
numbers.
Second: I don't understand why it has to be a prefix. I could
perfectly imagine having multiple such "prefixes" to construct
numbers, e.g. to construct numbers with mixed bases. A number converter
would then (temporarily) change the base at the point where the
"prefix" occurs.
Third: To simplify things, the minus sign should simply flag that the
result is negative (as . (dot) does with doubles), as long as no
digits other than 0 (zero) have appeared so far. That means an
implementation simply has to check whether the temporary conversion
result is zero and then accept the minus.
Valid numbers, in my opinion, could be:
0- ( yes! )
12-34
%111.11110001
&007
#10$FF
Regards,
Helmar
Whooo! Nothing for Win32Forth to do to meet this standard; a first!
This is a very useful extension to the standard. (I'd be amazed if you
could get the same congruence or agreement for output words.)
Interesting to note the support for the 0Xn and nH hex formats; they're
very useful when parsing C-type constructs. Win32Forth also supports
0x with a trailing L, as in 0x1234L, which I understand is a long
indicator; it's treated as a single-cell value.
Win32Forth also supports an open number parsing interface that allows
the addition of non-standard number formats (as long as they translate
to a single/double cell or float). One included in the system as
standard supports dotted notation IPv4 addresses.
--
Regards
Alex McDonald
Ouch to 12-34. O-uch ou-ch ouc-h, that hurts, even in base 36.
And 1- is not a number.
--
Regards
Alex McDonald
> Ouch to 12-34. O-uch ou-ch ouc-h, that hurts, even in base 36.
Yes, that's why I did not want to write it. I forgot to remove this
example. That's also why I wrote that the converter should check for
zero. I knew that you would say "ouch" ;)
> And 1- is not a number.
: 6 ." six" ;
and 6 is not a number.
-Helmar
I meant that 1- is defined in the standard; it's not the same as -1.
Allowing 0- would simply serve to confuse.
--
Regards
Alex McDonald
My concerns were about simplifying the number parsing/conversion
algorithm.
Just imagine the algorithm I proposed:
While character in input Do
  Case the character from input
    '%' Of binary EndOf
    '#' Of decimal EndOf
    ...
    '-' Of Result @ Negative? @ abort" sign on wrong place"
        Negative? on EndOf
    dup Digit? If Digit>Bin Result @ base @ * + Result ! Else
    ...
    Then
  EndCase
EndDo
I could give you a real implementation if you need.
-Helmar
> '-' Of Result @ Negative? @ abort" sign on wrong place"
'-' Of Result @ Negative? @ or abort" ...
(There's a bug in my keyboard today ;) )
-Helmar
> On Aug 3, 8:07 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
> wrote:
>> Problem
>>
>> It is often useful to write numbers in other than the current BASE.
>> Changing BASE to do this is often inconvenient and has led to
>> hard-to-find errors.
>
> First of all, I would suggest to use & as a "prefix" for octal
> numbers.
Why, who's still using octal in this century? I'm perfectly comfortable with
& as an alternative to CHAR or [CHAR].
> Second: I dont understand, why it has to be a prefix. I could
> perfectly imagine to have multiple such "prefixes" to construct
> numbers - eg. to contruct numbers with mixed bases. A number converter
> would then (temporarily) change the base at the point where the
> "prefix" occurs.
Too complicated, and confusing
> Third: To simplify things, the minus-sign should simply flag that the
> result is negative (as . (dot) does with doubles), as long as there
> are no digits other than 0 (zero). That means an implementation has
> simply to check if temporary convert result is zero and then to accept
> the minus.
Ditto
> %111.11110001
Some implementations allow a decimal point only at the end of a (double)
number.
--
Coos
CHForth, 16 bit DOS applications
http://home.hccnet.nl/j.j.haak/forth.html
The core of the Win32Forth support for all the number notations it
handles in Anton's tests is around 250 words of code in total. You
would be better off using >NUMBER to parse the numeric part of the string
(having set the base first, based on the prefix).
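For illustration, a minimal sketch of that approach in standard Forth;
prefix-number is an illustrative name (not Win32Forth's), and it handles
only an unsigned, single-cell $-prefixed number with no sign and no
trailing dot:

: prefix-number ( c-addr u -- n )   \ assumes u > 0
  over c@ [char] $ <> abort" no $ prefix"
  1 /string                         \ strip the prefix
  base @ >r hex                     \ set the base from the prefix
  0 0 2swap >number                 ( ud c-addr' u' )
  r> base !
  nip abort" bad digit"             \ anything left over is an error
  drop ;                            \ single-cell result

s" $FF" prefix-number . prints 255; a real implementation would dispatch
on the prefix character, handle the sign and the trailing dot, and fall
back to the other recognisers on failure.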
--
Regards
Alex McDonald
I'm using ' (tick) for this. It also works perfectly, and it would work
with the proposed 'x' too. It's not as if the 200x proposal wouldn't
require people to change their systems anyway...
> > Second: I dont understand, why it has to be a prefix. I could
> > perfectly imagine to have multiple such "prefixes" to construct
> > numbers - eg. to contruct numbers with mixed bases. A number converter
> > would then (temporarily) change the base at the point where the
> > "prefix" occurs.
>
> Too complicated, and confusing
But it is easy to implement and offers the feature of mixed bases for
numbers. Well, you probably do not need it - but you did not have it
until now ;) It is usually simple to explain to anybody who has ever
written a number conversion algorithm.
> > Third: To simplify things, the minus-sign should simply flag that the
> > result is negative (as . (dot) does with doubles), as long as there
> > are no digits other than 0 (zero). That means an implementation has
> > simply to check if temporary convert result is zero and then to accept
> > the minus.
>
> Ditto
?! Implement the "-%xxx"/"%-xxx" thing. The only check in my idea is
whether the value so far is zero before reporting an error.
> > %111.11110001
>
> Some implementations allow a decimal point only at the end of a (double)
> number.
That's not in ANS, is it?
-CG
> The core of the Win32Forth support for all the number notations it
> supports in Anton's tests is around 250 words of code in total.
Ah, that's about 220 words in 4p (as wc says), including comment words,
supporting mixed bases and supporting the OF "prefixes" like h#, o#
and so on. Ah, and the implementation of "decimal" & co. is also counted.
> You
> would be better using >NUMBER to parse the numeric part of the string
> (having set the base first based on the prefix).
What?
Regards,
-Helmar
> > You
> > would be better using >NUMBER to parse the numeric part of the string
> > (having set the base first based on the prefix).
>
> What?
Ah, I see. I've never seen >NUMBER as a good factoring. It's not
complicated to implement, so I did not implement it in most of my systems.
In fact I did not even completely understand the stack diagram at
http://maschenwerk.de/HelFORTH/DPANS/dpans6.htm#6.1.0570
from a design point of view: "( ud1 c-addr1 u1 -- ud2 c-addr2 u2 )".
ud"x" means a double. "c-addr1 u1" means at least two values. I don't
have the "ud1" first in most cases. At least that's what I'm thinking
about...
-Helmar
Anton's [-][$&%#]n+[.n*] parse is trivial.
\ parse for [-][$&%#]n+[.n*]

 0 value double?             \ double value flag
-1 value dpl                 \ decimal point location
 0 value -ve-num?            \ negate value flag

: -ve-test ( addr len -- addr' len' )  \ skip possible - sign, set -ve-num?
\  -ve-num? throw             \ for additional - syntax
   over c@ [char] - =         \ check for sign
   if
     true to -ve-num?
     1 /string                \ bump past
     dup 0= throw             \ nothing left is error
   then ;

: dotted-number ( addr len -- d )
\  -ve-test                   \ for additional - syntax
   0 0 2swap >number          \ convert number
   dup if
     over c@ [char] . = if    \ next char is a '.' ?
       dup 1- to dpl
       true to double?
       1 /string >number      \ convert the rest
     then dup 0<> throw       \ check no string left
   then 2drop ;

: base-tonum ( addr len base -- d )
   >r 1 /string r>            \ past base char
   base @ >r base !           \ save base, set base
   ['] dotted-number catch    \ convert
   r> base !                  \ restore base
   throw ;                    \ throw if in error

: base-number ( addr len -- d )  \ [-][$&%#]n+[.n*]
   0 to -ve-num?
   0 to double?
   -1 to dpl
   -ve-test                   \ might start with -
   over c@
   case
     [char] $ of 16 base-tonum endof
     [char] # of 10 base-tonum endof
     [char] % of  2 base-tonum endof
     drop dotted-number dup
   endcase
   -ve-num? if dnegate then ;

\ usage
\   s" <number to convert>" base-number
\ returns a double cell value, and 2 values;
\   double? true for a double, false for a single cell
\   dpl is the position of the '.' in the string for a double
The extended [-][$&%#][-]n+[.n*] has a trivial test for more than one
- (minus-sign) as shown.
--
Regards
Alex McDonald
: 0- ; immediate
i.e. a noop
George Hubert
> > ?! Implement the "-%xxx"-"%-xxx"-thing. The only thing with my idea is
> > that the x may be zero before reporting an error.
>
> Anton's [-][$&%#]n+[.n*] parse is trival.
You've forced it. The implementation in 4p is much simpler,
including all my proposals. The original 4p implementation has some
more things (e.g. a "c" inside numbers that multiplies by the cell size if
the base is < 12 - just don't ask, you have not even completely understood
my actual concerns).
: >xnum convert-sign off
dup 0= convert-error !
base @ >r 0 >r while: over c@
case
'- of convert-sign @
r or if convert-error on then
convert-sign on endof
'% of bin endof
'& of octal endof
'# of decimal endof
'$ of hex endof
', of endof
dup >bin r> base @ * + >r
endcase
s++ repeat drop r> r> base !
convert-sign @ and: negate ;
I know that it is unfair to post this, since the source makes use of
some features of 4p and of parts of the default number conversion
code (~220 words).
Btw, to hook this into default interpreter do:
:: scratch$ >xnum
convert-error @ if drop [''] word? ;then
[''] (number) tuck body^ ! ; Numbers >cfind
Regards,
-Helmar
...
> Ouch to 12-34. O-uch ou-ch ouc-h, that hurts, even in base 36.
>
> And 1- is not a number.
Badly placed signs are abominable. Everyone knows that 12-34 = -22.
Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
You're right.
1- is in RPN-world something you could misunderstand.
So we should rename it:
1- to decr
1+ to incr
The 2+ and 2- things are no longer needed; they are superseded by cell+
and cell-.
Regards,
-Helmar *just likes this discussion*
Convenient? - yes, but not essential.
Hard-to-find errors? - no more so than any error is hard to find :)
> Solution
>
> The Forth text interpreter should accept numbers like the following ones:
>
> #-12346789. \ decimal double
> $-1234cDeF. \ hex double
> %-10101010. \ binary double
Forth has a long history of simple syntax, i.e. not embedding operators
into arguments, as that only makes the compiler/parser more complex.
But given the practice of prefixing numbers has become popular
(in large forths at least), then it may be worth standardizing.
However since it is a convenience rather than a necessity (we have
DECIMAL HEX) it should not be 'required'.
BTW, standard Forth is currently ambiguous about whether HEX 0abcd
is a valid number. Will this also apply to $abcd?
The same rules on case ought to apply, irrespective how numbers
are represented.
> 'a' \ equivalent to char a or [char] a
This is a gforth-ism which is not in widespread use. It's not even
a number prefix. Make it a separate proposal.
ANS programs require the decimal point to be at the end of a double
number. It was a good move and creates less confusion.
As shown in Section "Existing Practice", & has three conflicting
meanings in existing systems, so trying to standardize any meaning for
the & prefix is likely to fail. The implementors and users of the
systems that implemented/used a different meaning are unlikely to
implement/use the proposed standard meaning, and are likely to fight
against the inclusion of the proposal in the standard.
>Second: I dont understand, why it has to be a prefix. I could
>perfectly imagine to have multiple such "prefixes" to construct
>numbers - eg. to contruct numbers with mixed bases. A number converter
>would then (temporarily) change the base at the point where the
>"prefix" occurs.
This RfD is about standardization and existing common practice, not
design space exploration. But anyway, systems are allowed to accept
additional syntaxes, including the one you suggest. However, as a
programmer the need for mixed bases has eluded me until now.
Discussion of that is outside of the scope of the number prefixes
proposal, but in an earlier instance of the number prefixes discussion
this has come up, and there was the opinion that this kind of stuff is
so system-specific that it would be impossible to standardize. At
first I agreed, but later I thought that this is not necessarily true:
The specification of the way that the text interpreter works does not
leave that much freedom to system implementors, so they should be
able to integrate well-designed hooks without too much effort.
So, what might such a hook interface look like?
Essentially the program has to provide a word (let's call it a
recognizer) that takes a string (for the "word" that was not found),
and returns a flag that indicates whether the string was recognized or
not. In addition, the word may do things to the stack below the
string/flag (e.g., push literal numbers) or compile things (e.g.,
literal numbers). So, such a recognizer would have the stack effect:
( i*x c-addr u -- j*x f )
[Yes, let's not design counted strings into this interface]
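As an illustration only (not part of the proposal), here is a recognizer
with that stack effect which accepts $-prefixed, unsigned, single-cell hex
numbers; sign handling, doubles and compile state are left out, and
whether >NUMBER accepts lower-case digits is system-dependent:

: hex-recognizer ( i*x c-addr u -- i*x n true | i*x false )
  dup 0= if 2drop false exit then
  over c@ [char] $ <> if 2drop false exit then
  1 /string  dup 0= if 2drop false exit then
  base @ >r hex
  0 0 2swap >number nip            ( ud u' )
  r> base !
  if 2drop false else drop true then ;

s" $2A" hex-recognizer leaves 42 true; s" foo" hex-recognizer leaves just
false.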
In addition, the interface should support stacking of recognizers, so
that one library can provide support for recognizing time syntax,
while another library provides support for complex numbers. The
program should be able to specify in which order the recognizers
should be processed (for cases where the recognized syntax overlaps).
In that context, the ordinary dictionary search and
execution/compilation of words by the text interpreter is just another
recognizer, one that is normally run first, before the integer and FP
recognizers.
Moreover, one should be able to remove a recognizer (e.g., it may be
useful in some context, but not in general).
This stacking stuff reminds me of the search order for wordlists, so
the stacking interface should be similar to that.
Thinking further (probably beyond what has even a slight chance of
being standardized), one could have a common abstraction for
recognizers and wordlists, and hierarchical search orders. Then the
dictionary search would be implemented as a sub-search-order of the
recognizer stack, that just contains the normal wordlist search order.
One thing to consider with this kind of interface is whether we want
to have just one STATE-smart recognizer word per syntax, or have two
recognizer words, one for each state, or maybe something else (maybe
something that would also support using the recognizers inside ]]
... [[).
So I'm not in your killfile after all. Hmm.
>But given the practice of prefixing numbers has become popular
>(in large forths at least), then it may be worth standardizing.
>However since it is a convenience rather than a necessity (we have
>DECIMAL HEX) it should not be 'required'.
That's a good point. The next RfD will contain appropriate wording.
>BTW standard forth is currently ambiguous whether HEX 0abcd
>is a valid number. Will this also apply to $abcd ?
No, $abcd is a valid number. This is made clear in various parts of
the proposal: Nearly all the examples of $-prefixed numbers, the
specification of <hexdigit> and the tests.
That follows common practice. All systems except iForth and CHForth
support it.
>The same rules on case ought to apply, irrespective how numbers
>are represented.
Yes, systems should accept lower-case letters as digits equivalent to
the corresponding upper-case letters when BASE is <=36 (this would be
a compatible extension for existing systems even if they treat
lower-case letters as high-value digits for higher bases; for such a
low base, lower-case letters don't occur in numbers in these systems'
existing programs); however, that's for another proposal.
>> 'a' \ equivalent to char a or [char] a
>
>This is a gforth-ism which is not in widespread use.
Actually Gforth-0.6.2 does not support it, so it's not a Gforthism at
all. iForth, CHForth, Win32Forth (even the ancient 4.x version),
Gforth-0.6.9 and probably current bigForth support it, so it is widely
supported and my impression is that it is also widely used (IIRC I
have seen it in programs posted here, and not just by me).
>Make it a separate proposal.
Now or never, at least as far as I am concerned.
So we have one voice against '<char>'. Any more? Any voices in
favour?
[..]
> That follows common practice. All systems except iForth and CHForth
> support it.
If mixed case for HEX and other bases <= 36 is standardized, how do we then
specify how such numbers are to be *displayed*? In lower case, in upper case,
steered with a global flag, with all new words? (Or as a string, with an
extra allocating lowercasing word, or as two strings :-)
[..]
>>> 'a' \ equivalent to char a or [char] a
>>
>>This is a gforth-ism which is not in widespread use.
I see the commenters are doing some heavy research before posting.
> So we have one voice against '<char>'. Any more? Any voices in
> favour?
Strongly in favor.
-marcel
> ( i*x c-addr u -- j*x f )
> [Yes, let's not design counted strings into this interface]
And let's not design another variable stack effect word.
It might be an idea to return the address of a structure, or
do ( c-addr u -- x-addr type ), where x-addr is the address of a
transient buffer, aligned for the biggest possible item (SSE2,
Altivec, ...). The values of the possible type indicators
might need to be standardized (enumerated ;-)
-marcel
Just as it is specified now, i.e. in upper case. PFE, bigForth and
Gforth (and I guess others, but I only tried these three) all accept
lower-case digits on input and (as is required by the standard) print
upper-case digits on output.
Note that the present proposal is for lower-case digits for $-prefixed
numbers only. Whether a Forth system accepts lower-case digits in
other contexts is still up to the implementor even if this proposal is
accepted into the standard. However, I encourage accepting lower-case
digits on input, especially for an otherwise case-insensitive system
like iForth.
I dislike variable stack effects as much as anyone, but in this case I
think it's hard to avoid and the alternatives are worse (see below).
Also, this is not about designing a word, but a hook. So we would not
be providing a word that is cumbersome to use because of its variable
stack effect, we would provide a hole into which a
variable-stack-effect word can be plugged. Ok, such words would then
be pretty useless for other purposes, but that's probably also the
case with other interfaces, certainly with the one proposed below.
>It might be an idea to return the address of a structure, or
>do ( c-addr u -- x-addr type ) where addr is the address of a
>transient buffer, aligned for the biggest possible item (SS2,
>Altivec, ..). The value of the possible type indicators
>might need to be standardized (enumerated ;-)
So this would limit us to the enumerated types, it would require us to
provide a buffer, we would have to store into that buffer, only to
have the system decode the type indicator and fetch the data from the
buffer to finally arrive at the variable stack effect that you wanted
to avoid. This appears to be complicating the interface for no gain
(or rather, for a negative gain, because the complicated interface
restricts the types).
The issue is that the hook is part of the text interpreter (EVALUATE,
INCLUDE, LOAD, QUIT), which has a variable stack effect by nature, and
its variable stack effect comes from interpreting numbers, among
other things.
.. 8.3.2 Text interpreter input number conversion
.. When the text interpreter processes a number that
.. is immediately followed by a decimal point and is
.. not found as a definition name, the text interpreter
.. shall convert it to a double-cell number.
There are implementations that allow a decimal point (or more, or e.g. a
comma) before the end of the numeric string. Are they IYO not ISO?
Seconded.
--
Regards
Alex McDonald
The implementation is Forth-94 wrt numeric conversion as long as it
correctly converts numbers formed as described in the Forth-94
standard. What it does with input that is neither in the dictionary
nor a well-formed Forth-94 number is entirely up to the
implementation.
It is *programs* that rely on the decimal point in the middle that
have an environmental dependency.
I know. Put it down to the excitement of having met a proposal in its
entirety in advance of the pack.
> but in an earlier instance of the number prefixes discussion
> this has come up, and there was the opinion that this kind of stuff is
> so system-specific that it would be impossible to standardize. At
> first I agreed, but later I thought that this is not necessarily true:
> The specification of the way that the text interpreter works does not
> leave that many freedoms to system implementors, so they should be
> able to integrate well-designed hooks without too much effort.
>
> So, what might such a hook interface look like?
>
> Essentially the program has to provide a word (let's call it a
> recognizer) that takes a string (for the "word" that was not found),
> and returns a flag that indicates whether the string was recognized or
> not. In addition, the word may do things to the stack below the
> string/flag (e.g., push literal numbers) or compile things (e.g.,
> literal numbers). So, such a recognizer would have the stack effect:
>
> ( i*x c-addr u -- j*x f )
>
> [Yes, let's not design counted strings into this interface]
V4 of Win32Forth used a similar flag to indicate whether the
conversion was possible or not. The variable stack depth for various
number types (single, double, float -- with a separate fp stack etc)
and its lack of extensibility led me to redesign the recogniser for V6
and the STC version. It now always has the stack effect
( c-addr u -- d )
where there may be side effects on certain other variables, such as
indicators for double or single or float. Side effects don't include
compilation and nothing is done on the stack under the string.
Flags are not returned; if the conversion can't be done, the
recogniser executes a THROW. Recognisers are executed in series until
either one returns success or all have reported failure; if so, a
final -13 throw is percolated up to the caller. This simplified the
coding of the recogniser enormously, as the result stack can be
ignored completely on failure to convert, and a THROW can be executed
anywhere at any stack depth.
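A sketch of what one such recognizer might look like in standard Forth
(this is not Win32Forth's actual code; binary-number? is an illustrative
name and -13 is used as the "undefined word" throw code):

: binary-number? ( c-addr u -- d )   \ throws if it cannot convert
  dup 2 u< if -13 throw then         \ too short to be %n
  over c@ [char] % <> if -13 throw then
  1 /string                          \ skip the %
  base @ >r 2 base !                 \ convert in binary
  0 0 2swap >number nip              ( ud u' )
  r> base !
  if -13 throw then ;                \ leftover characters: give up
                                     \ (a real recognizer would also set
                                     \  the single/double/float indicators)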
>
> In addition, the interface should support stacking of recognizers, so
> that one library can provide support for recognizing time syntax,
> while another library provides support for complex numbers. The
> program should be able to specify in which order the recognizers
> should be processed (for cases where the recognized syntax overlaps).
Win32Forth has the concept of a list of XTs (a chain) that are
executed in series; the chain has words for adding to the front or the
end of the list.
new-chain number?-chain
number?-chain chain-add base-number? \ #$%n
number?-chain chain-add quoted-number? \ 'c'
number?-chain chain-add ... \ etc
' new-number is number \ replace normal number conversion with the chain
The number?-chain is executed left to right until one succeeds or
all have failed.
base-number? quoted-number? hex-number? 0x-number? wincon-number?
ip-number? float-number?
As a side note, the wincon-number? accepts words that are Windows
constants, so that (for instance) WM_PAINT returns its hex value
without having to be defined in the normal dictionaries (it uses a DLL
to do the lookup).
>
> In that context, the ordinary dictinary search and
> execution/compilation of words by the text interpreter is just another
> recognizer that is normally run first, before the integer and FP
> recognizers.
>
> Moreover, one should be able to remove a recognizer (e.g., it may be
> useful in some context, but not in general).
This can be done as the W32F chain is just a list, but there's no word
for it.
>
> This stacking stuff reminds me of the search order for wordlists, so
> the stacking interface should be similar to that.
>
> Thinking further (probably beyond what has even a slight chance of
> being standardized), one could have a common abstraction for
> recognizers and wordlists, and hierarchical search orders. Then the
> dictionary search would be implemented as a sub-search-order of the
> recognizer stack, that just contains the normal wordlist search order.
>
> One thing to consider with this kind of interface is whether we want
> to have just one STATE-smart recognizer word per syntax, or have two
> recognizer words, one for each state, or maybe something else (maybe
> something that would also support using the recognizers inside ]]
> ... [[).
Not sure what you mean by state; the number conversions in W32F are
state-free.
The variable stack effect of the current number conversion word in
Gforth sucks, too, but I believe that this can be alleviated with the
right factoring. I guess I'll have to code it in order to prove it.
>Flags are not returned; if the conversion can't be done, the
>recogniser executes a THROW. Recognisers are executed in series until
>either one returns success or all have reported failure; if so, a
>final -13 throw is percolated up to the caller. This simplified the
>coding of the recogniser enormously, as the result stack can be
>ignored completely on failure to convert, and a THROW can be executed
>anywhere at any stack depth.
That's an interesting idea. Of course, the cost is in generating and
destroying an exception frame for every recognizer that is tried,
which may or may not add significantly to the compilation time
(probably not a big issue on current machines).
So what would the recognizer loop look like? In the flag case
somewhat like this:
: run-recognizers ( i*x c-addr u -- j*x )
    recognizers 2@ ?do ( c-addr u -- )
        2dup 2>r i @ execute if \ recognized
            2r> 2drop unloop exit
        then
        2r> cell +loop
    -13 throw ;
With a similar throwing word:
: run-recognizers ( i*x c-addr u -- j*x )
    recognizers 2@ ?do ( c-addr u -- )
        2dup 2>r i @ catch 0= if \ recognized
            2r> 2drop unloop exit
        then
        2drop 2r> cell +loop
    -13 throw ;
Not a big difference here. Maybe in the individual recognizers, but I
think that's a question of getting the factoring right.
[stacking of recognizers]
>Win32Forth has the concept of a list of XTs (a chain) that are
>executed in series; the chain has words for adding to the front or the
>end of the list.
>
>new-chain number?-chain
>number?-chain chain-add base-number? \ #$%n
>number?-chain chain-add quoted-number? \ 'c'
>number?-chain chain-add ... \ etc
>
>' new-number is number \ replace normal number
I guess you mean "' number?-chain" instead of "' new-number" here,
right?
>> One thing to consider with this kind of interface is whether we want
>> to have just one STATE-smart recognizer word per syntax, or have two
>> recognizer words, one for each state, or maybe something else (maybe
>> something that would also support using the recognizers inside ]]
>> ... [[).
>
>Not sure what you mean by state; the number conversions in W32F are
>state-free.
At one point, it has to be decided whether to put the number on the
stack (in interpret state), or whether to compile it (in compile
state), and how to compile it (LITERAL, 2LITERAL, FLITERAL, or
something else). I guess you do this after the recognizer stage
through some of the flags you touch, but for a general interface, this
has to be done either in the recognizer, or a general way to do it has
to be provided.
I can see how to do this for number parsing: Let the recognizer also
return the xt of an appropriate compiling word. This will be dropped
in the interpreter, executed by the compiler, and executed and
compile,d in the ]] loop, e.g., like this:
\ recognizers now have the stack effect ( c-addr u -- false | j*x xt true )
\ where xt has the stack effect ( j*x -- ; run-time: -- j*x )
: interp-recognizers ( c-addr u -- j*x )
    recognizers 2@ ?do ( c-addr u -- )
        2dup 2>r i @ execute if \ recognized
            drop 2r> 2drop unloop exit
        then
        2r> cell +loop
    -13 throw ;

: comp-recognizers ( c-addr u -- ; run-time: -- j*x )
    recognizers 2@ ?do ( c-addr u -- )
        2dup 2>r i @ execute if \ recognized
            execute 2r> 2drop unloop exit
        then
        2r> cell +loop
    -13 throw ;

: ]]-recognizers ( c-addr u -- ; much later: -- j*x )
    recognizers 2@ ?do ( c-addr u -- )
        2dup 2>r i @ execute if \ recognized
            dup execute compile, 2r> 2drop unloop exit
        then
        2r> cell +loop
    -13 throw ;
If we want to support fully general recognizers, including dictionary
search, this becomes quite a bit harder because interpretation
semantics of a word could be completely different from compilation
semantics, so one would have to return an xt for interpretation
semantics, maybe a compilation token for compilation semantics, and
maybe something more if we want to support ]], too. In practice, and
in many systems, they are not completely independent, and one may be
able to make something out of that (and maybe just ask users not to
use parsing words inside ]]); but if I am going to implement it, it's
going to be on Gforth, so I'll have to think of an approach that works
or at least coexists with the generality of INTERPRET/COMPILE:. I
need to think some more about that.
> >Not sure what you mean by state; the number conversions in W32F are
> >state-free.
>
> At one point, it has to be decided whether to put the number on the
> stack (in interpret state), or whether to compile it (in compile
> state), and how to compile it (LITERAL, 2LITERAL, FLITERAL, or
> something else).
In 4p the number conversions are also state-free. The decision whether
to compile it or not is made by the class of the word the number
conversion returns. A number in 4p is simply processed as a word that
returns an "execution token" (in ANS terms) from "find". In the usual
case the number conversion will return an "execution token" for
(number) .
Regards,
Helmar
I gave up trying to fix it and re-designed it; it was easier.
>
>> Flags are not returned; if the conversion can't be done, the
>> recogniser executes a THROW. Recognisers are executed in series until
>> either one returns success or all have reported failure; if so, a
>> final -13 throw is percolated up to the caller. This simplified the
>> coding of the recogniser enormously, as the result stack can be
>> ignored completely on failure to convert, and a THROW can be executed
>> anywhere at any stack depth.
>
> That's an interesting idea. Of course, the cost is in generating and
> destroying an exception frame for every recognizer that is tried,
> which may or may not add significantly to the compilation time
> (probably not a big issue on current machines).
Certainly not an issue on most of the machines I can get my hands on;
W32F compiles several thousand lines of code in millisecond times. Most
compiled code is words, which are found in the dictionary.
A full-blown catch/throw isn't required; it could be a lightweight
version. W32F saves/restores several system variables in the exception
frame, and most don't need to be protected for this application.
Actually, my bad, I didn't post the code for new-number; however, it's
very similar to your run-recognisers (throw variety). The number?-chain
is just a singly linked list that new-number runs through. number does
the conversion, and other words decide how to interpret/compile the
resulting number later.
>>> One thing to consider with this kind of interface is whether we want
>>> to have just one STATE-smart recognizer word per syntax, or have two
>>> recognizer words, one for each state, or maybe something else (maybe
>>> something that would also support using the recognizers inside ]]
>>> ... [[).
>> Not sure what you mean by state; the number conversions in W32F are
>> state-free.
>
> At one point, it has to be decided whether to put the number on the
> stack (in interpret state), or whether to compile it (in compile
> state), and how to compile it (LITERAL, 2LITERAL, FLITERAL, or
> something else). I guess you do this after the recognizer stage
> through some of the flags you touch, but for a general interface, this
> has to be done either in the recognizer, or a general way to do it has
> to be provided.
The separation between parsing numbers and compilation makes this a
decision for much later in W32F, and yes, the flags/values play an
important part in deciding which of the 3 types is to be compiled:
cell, 2-cell or float (W32F supports both 8- and 10-byte floats, but not
simultaneously).
It may be a generalisation too far. I've accepted that words in
dictionaries, and numbers that aren't, have to be parsed and dealt with
separately, and oft times require ugly but essential supporting global
variables. Structs could help though.
--
Regards
Alex McDonald
INCR is, in code I've seen, the equivalent of ( addr ) 1 swap +! .
You'll have difficulty carrying this forward I suspect.
>
> The 2+ and 2- things are no longer needed and superseded by cell+ and
> cell-.
2+ and 2- never were in the standard. And my cell is much bigger than
yours (as the prisoner might have said to his neighbour).
--
Regards
Alex McDonald
It's never unfair to post code. But I note it doesn't handle doubles,
nor does it correctly handle the string -,,,,, as it returns 0 and no error?
--
Regards
Alex McDonald
: 0- 0 - ; immediate
: 0- over nip ;
It has lots of friends.
--
Regards
Alex McDonald
>
> Why, who's still using octal in this century? I'm perfecty comfortable with
> & as an alternative to CHAR or [CHAR]
>
Oddly, Intel. I struggled to understand the madness that is the Intel
opcodes; but if you look at them in octal, the symmetry that you can't
see in hex springs out of the page. It also makes it easy to
represent the mod-r/m byte and the SIB byte; they're octal too.
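A quick way to see the point from a Forth prompt (8B C3 is one encoding of
MOV EAX,EBX, used here purely as an illustration):

hex 8B C3        \ the two bytes of MOV EAX,EBX
octal .s         \ displays 213 303: C3 = 303 = mod 3, reg 0 (EAX), r/m 3 (EBX)
decimal 2drop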
--
Regards
Alex McDonald
erk; dup nip not over nip. oops.
--
Regards
Alex McDonald
Of course, but in the source of my i86 assembler I use binary with the
%-prefix, which is only a little bit more difficult to read than octal ;-)
I can't use the SIB byte, my Forth is still %10000 bits. I was too hasty
about mentioning 'century', I guess ;-(
Your INCR function is called TALLY in many systems.
1+ and 1- are used in many, many programs. I see no need to rename
them, and suddenly rendering them as numbers would cause a nightmare of
silently broken programs.
Cheers,
Elizabeth
--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310-491-3356
5155 W. Rosecrans Ave. #1018 Fax: +1 310-978-9454
Hawthorne, CA 90250
http://www.forth.com
"Forth-based products and Services for real-time
applications since 1973."
==================================================
Perhaps it could be added to the TOOLS or TOOLS EXT wordsets.
...
>>> 'a' \ equivalent to char a or [char] a
>> This is a gforth-ism which is not in widespread use.
>
> Actually Gforth-0.6.2 does not support it, so it's not a Gforthism at
> all. iForth, CHForth, Win32Forth (even the ancient 4.x version),
> Gforth-0.6.9 and probably current bigForth support it, so it is widely
> supported and my impression is that it is also widely used (IIRC I
> have seen it in programs posted here, and not just by me).
>
>> Make it a separate proposal.
>
> Now or never, at least as far as I am concerned.
>
> So we have one voice against '<char>'. Any more? Any voices in
> favour?
FORTH, Inc. and customers have used '<char>' as a naming convention for
words that insert a character into an output number string, e.g.,
: '$' ( -- ) [CHAR] $ HOLD ;
It's a popular bit of syntactic sugar that makes the 'pictured' number
conversion more picturesque.
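For instance, a small sketch using the definition above (.dollars is an
illustrative name, not a FORTH, Inc. word):

: '$'      ( -- ) [CHAR] $ HOLD ;
: .dollars ( n -- )                 \ print a non-negative n as $n
  0 <# #S '$' #> TYPE SPACE ;

1234 .dollars then prints $1234.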
I'd resist standardizing this usage for character literals. It isn't
needed and solves no real problems.
Open Firmware has ASCII (a state-smart CHAR/[CHAR]) and a word that I
really like, CONTROL, which is like ASCII but masks the low 5 bits of the
character, e.g. CONTROL Q.
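A two-line sketch of the masking behaviour described (interpretation use
only; Open Firmware's actual CONTROL is state-smart):

: control ( "<char>" -- n )  char 31 and ;   \ 31 = $1F: keep the low 5 bits
\ control Q .  prints 17, i.e. ^Q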
...
> Open Firmware has ASCII (state-smart CHAR/[CHAR]) and a word that I
> really like, CONTROL which is like ASCII but masks the low 5 bits of the
> character, e.g. CONTROL Q
Aww, nerts! I thought I invented that! :-)
Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
Maybe you did and they picked it up ;-)
> It's never unfair to post code. But I note it doesn't handle doubles,
> nor does it correctly handle the string -,,,,, as it returns 0 and no error?
It's IMHO not the task of the system to say that this would be an
error. I've included some checking of the '-' sign only because of 1-
and 2- (where 2- is more or less useless on non-16-bit systems).
Valid numbers in my view are also:
0,1,1,1,2
,,,4,3
or even
,,,,
Why not? Otherwise the system does a lot of useless work fixing
probable errors of the programmer. In Forth you program in
small pieces. You have to test them early.
But of course that's not only prefix-related.
Regards,
-Helmar
> 0,1,1,1,2
> ,,,4,3
> or even
> ,,,,
So in giving a 65C02 opcode, %001,000,00 would be a valid number?
I have no problem with it being a valid number. That's not
necessarily a valid opcode, though. Checking the appropriateness
of a number for a specific purpose is a separate issue.
[snip]
Is there any existing practice using "<any character except quote>"
for string literals?
Gerry
Certainly. My dynamic strings library, including the ANS Forth
implementation, defines S`, CAT`, and $`. It also includes a
stateful word PARSE>S that does the analog of S" for a character
on the stack. Thus
: s` ( "ccc<`>" -- ) [char] ` parse>s ; immediate
The choice of "`" as an alternate delimiter has worked well for
me.
-- David
> I have no problem with it being a valid number. That's not
> necessarily a valid opcode, though. Checking the appropriateness
> of a number for a specific purpose is a separate issue.
But it does ease detection of typos in defining masks, since the
bottom two bits distinguish between the orthogonal set of accumulator
arithmetic operations, the almost orthogonal set of index register
operations (but on a different scheme), and the unallocated 6502
instructions at "%,,11" that are used for bit operations in the most
recent versions of the 65C02.
Oh, and yes, $20 is a valid opcode.
> Certainly. My dynamic strings library, including the ANS Forth
> implementation, defines S`, CAT`, and $`.
I think the question was detecting
"this is a literal string"
as a literal string without a preceding parsing word. I believe there
is existing practice, though there is a standardization issue if some
have an escape for " and others do not ... in addition to questions of
some being compile-only, and of implementations returning the address
of a counted string on the stack, an address and count on the stack, or
putting the string on a string stack.
AFAIU, this is part of the situation which led to the Forth-94
standard S" word, since " as a word was a bit tangled up.
If ,, is a number and , isn't, and -, is a number and - , isn't, then
that's a degree of acceptable "numberness" that I can't see having any
advantage except the power to confuse. And all in the name of a few
lines of code.
There are large classes of "numbers" that I can see being useful -- /if
correctly parsed/ -- but allowing any old set of non-contributory
delimiters such as , seems (to me) to allow just about anything that
isn't a word to return 0 instead. Or even having -,, and ,, represent
different things if you're unfortunate enough to be working on a system
that represents -0 and 0 differently.
--
Regards
Alex McDonald
'<char>' is equally picturesque and popular, which seems to be
sufficient if syntactic sugar is your criterion. Your objection seems to
be prior alternative usage. How widespread is it?
>
> Open Firmware has ASCII (state-smart CHAR/[CHAR]) and a word that I
> really like, CONTROL which is like ASCII but masks the low 5 bits of the
> character, e.g. CONTROL Q
The prefix/postfix technique and some parse rules allow better
checking for a much larger range of numbers. Parsing at number time
seems to me to remove the need for the other kind of parsing, such
as CHAR and [CHAR] and state-smartness. I've worked on systems where
strings such as
30/JAN/2007:12:30:00.0DT
are recognisable as numbers -- without having to write
DATETIME 30/JAN/2007:12:30:00.0
and having a [DATETIME] equivalent too.
--
Regards
Alex McDonald
The problem of "everything could be a number" you've since long days
in Forth.
For example, if you read the word
AFFE
somewhere in your source, you know it is something that triggers a
specific behaviour of the system. If you do not have an AFFE in your
dictionary, the system tries to interpret it in some way. In case your
BASE variable has a suitable value, it is a number.
> There are large classes of "numbers" that I can see being useful -- /if
> correctly parsed/ -- but allowing any old set of non-contributory
> delimiters such as , seems (to me) to allow just about anything that
> isn't a word to return 0 instead. Or even having -,, and ,, represent
> different things if you're unfortunate enough to be working on a system
> that represents -0 and 0 differently.
It's easier to parse correctly if the rules are simple. The simpler
the rules, the fewer implementation-specific differences you'll
find. It also reduces the pressure on people who implement very small
systems to find a simpler (maybe non-conforming) solution.
Well, Chuck invented this notation at NRAO in the early 70's, and it's
been featured in all FORTH, Inc. products and manuals ever since.
There's no way to measure exactly how many apps have used it, but we've
sold many thousands of systems and books such as Forth Programmer's
Handbook and Forth Application Techniques in which this usage was
encouraged.
> The prefix/postfix technique and some parse rules allows for better
> checking for a much larger range of numbers. Parsing at number time
> seems to me to remove the necessity of the other kind of parsing; such
> as CHAR and [CHAR] and state-smartness. I've worked on systems where
> strings such as
>
> 30/JAN/2007:12:30:00.0DT
>
> are recognisable as numbers -- without having to write
>
> DATETIME 30/JAN/2007:12:30:00.0
>
> and having a [DATETIME] equivalent too.
Well, all FORTH, Inc. products since the early 70's have allowed extra
punctuation in numbers, too, although not month names and notation like
DT. The characters , . - (anywhere except before the most significant
digit) / : and + (before the most significant digit) will all make the
number convert as a double integer. It's very convenient. Common
things like 1,234.56, 229-48-0332, 8/03/1940, or 12:30:00.0 are all
perfectly good numbers.
I don't particularly advocate standardizing FORTH, Inc. usage (although
I wouldn't object ;-) but I'd be irresponsible (to our users) not to
object to something in a standard that would render it illegal.
which, I feel bound to point out, is a country mile from -, .
(As an aside, 0xDEADBEEF was a popular hex value when I was an IBM
sysprog in the early 70s. It was often used as a PSW (the S360
equivalent of the instruction pointer) to halt the machine on error,
since it was odd and the PSW only permitted even values.)
>
> somewhere in your source, you know it is something that triggers a
> specific behaviour of the system. If you do not have an AFFE in your
> dictionary, the system tries to interpret it in some way. In case your
> BASE variable has a suitable value, it is a number.
>
>> There are large classes of "numbers" that I can see being useful -- /if
>> correctly parsed/ -- but allowing any old set of non-contributory
>> delimiters such as , seems (to me) to allow just about anything that
>> isn't a word to return 0 instead. Or even having -,, and ,, represent
>> different things if you're unfortunate enough to be working on a system
>> that represents -0 and 0 differently.
>
> It's easier to correctly parse if the rules are simple. The more
> simple the rules the less implementation specific differences you'll
> find. Also the pressure for people that implement very small systems
> is reduced to find a simpler (maybe not conforming) solution.
Conforming is one thing; permitting arbitrary non-conforming strings is
another. Hence my objection to ,, and his infinite army of friends.
As to ease of parsing, it's trivial if you understand state machines. It
takes little to no more code to achieve a superior end; reduction of
confusion both for the author /and/ the maintainer of the code, for whom
people seem to have little regard.
--
Regards
Alex McDonald
>
> Well, all FORTH, Inc. products since the early 70's have allowed extra
> punctuation in numbers, too, although not month names and notation like
> DT. The characters , . - (anywhere except before the most significant
> digit) / : and + (before the most significant digit) will all make the
> number convert as a double integer. It's very convenient. Common
> things like 1,234.56, 229-48-0332, 8/03/1940, or 12:30:00.0 are all
> perfectly good numbers.
+5 is a double?
>
> I don't particularly advocate standardizing FORTH, Inc. usage (although
> I wouldn't object ;-) but I'd be irresponsible (to our users) not to
> object to something in a standard that would render it illegal.
--
Regards
Alex McDonald
So, what exactly would you propose? A rule requiring at least one valid
digit?
Cheers,
Elizabeth
Yep. It's really easier (to explain and to understand) if you specify
that punctuation makes a double, rather than making more complex rules
as to when it might or might not make a double.
If you're writing user interface code and need to know what the user
typed, you can always query DPL after a number conversion. If it's
non-negative you have a double.
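In code, the check described comes down to something like this (assuming
the system exposes DPL as a variable holding -1 when no punctuation was
seen):

: double-input? ( -- f )  DPL @ 0< 0= ;   \ true if the last number had punctuation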
So -5 is double too? Either that, or '-' isn't punctuation but '+' is.
...
You got there first. This has me flummoxed.
--
Regards
Alex McDonald
No, - before the most significant digit is a sign. See above.
Everything else is punctuation. Yeah, ok, a small inconsistency. But
it doesn't seem to trouble folks (at least, we don't get any complaints
on our email groups or in classes). If anyone complained, we probably
would treat + as a sign too, but they never have. People don't use +
much except for adding.
Yes; it's a good visual indicator that you're in the right part of an
otherwise large universe of symbols. Call me old fashioned...
--
Regards
Alex McDonald
Well, I could live with one required digit. But I have to note that the
number of possible symbols does not notably decrease with this.
Regards,
-Helmar
Yes. I don't think this is common, but at least Albert van der Horst
supports that in his systems.
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2007: http://www.complang.tuwien.ac.at/anton/euroforth2007/
The point isn't so much reducing the number of possible symbols as
ensuring that at least one of the set that's most relevant to number
conversion is there. As much as I value punctuation, punctuation alone
does not make a meaningful number.
The fact that a digit-less number has been valid on many systems for
many years without cries of outrage indicates that it's not a common
error or source of grief for folks. But it's also true that addition of
a rule requiring at least one digit would close the loophole and not
cause anyone much grief, either.
However, I leave it to someone else to propose wording.
Cheers,
Elizabeth
3.4.1.3 Text interpreter input number conversion
When converting input numbers, the text interpreter shall recognize both
positive and negative numbers, with a negative number represented by a
single minus sign, the character -, preceding the digits. The value in
BASE is the radix for number conversion.
8.3.2 Text interpreter input number conversion
When the text interpreter processes a number that is immediately
followed by a decimal point and is not found as a definition name, the
text interpreter shall convert it to a double-cell number.
These suggest that numbers are identified by one or more digits;
3.4.1.3 says "preceding the digits", which would not be the case
without at least one digit.
8.3.2 explicitly does not permit -. (minus dot).
It may be valid on various systems, but code written for those features
will not be ANS compliant programs.
--
Regards
Alex McDonald
OK. As I wrote, I can live with the need for one digit.
With some HelFORTH/4p-isms, I now came to this conclusion /
experiment:
--------------------------------------------------
: bin 2 base ! ;
: octal 8 base ! ;
: decimal 10 base ! ;
: hex 16 base ! ;
variables| convert-error convert-sign convert-digit |
: >bin dup '9 > if 32 or 87 else 48 then -
dup 0 < over base @ >= or
and: convert-error on drop 0 ;
: (prefix) dup 1 = and: convert-error on ;
: (suffix) convert-digit @ or: convert-error on ;
: >num convert-sign off convert-digit off
dup 0= convert-error !
base @ >r 0 >r while: over c@
case
'- of convert-digit @ 0= if convert-sign on then
(prefix) endof
'% of (prefix) bin endof
'& of (prefix) octal endof
'# of (prefix) decimal endof
'$ of (prefix) hex endof
', of (prefix) (suffix) endof
'/ of (suffix) endof
': of (suffix) endof
base @ 13 < over 'c = and if
(suffix) r> cells >r
else
convert-digit on
dup >bin r> base @ * + >r
then
endcase
s++ repeat drop r> r> base !
convert-sign @ and: negate ;
: no-num? convert-error @ convert-digit @ 0= or ;
: (n#) base @ swap base !
parse-name >num no-num? abort" wrong number"
swap base ! ?literal ;
: h#` 16 (n#) ;
: d#` 10 (n#) ;
: o#` 8 (n#) ;
: b#` 2 (n#) ;
context Numbers
:: scratch$ >num no-num? if drop [''] word? ;then
[''] (number) tuck body^ ! ; Numbers >cfind
--------------------------------------------------
You might criticize that my punctuation does not produce doubles. But
that's not my point at the moment (I don't want doubles for other
reasons - e.g. 32 or 64 bits are enough ;) ).
> The fact that a digit-less number has been valid on many systems for
> many years without cries of outrage indicates that it's not a common
> error or source of grief for folks. But it's also true that addition of
> a rule requiring at least one digit would close the loophole and not
> cause anyone much grief, either.
Well, I don't completely understand the criticism. Nobody would be forced
to use
%%%%%%%%%%%%%%%%
&&&&&&&&&&&&&&&
%%%%%%%%
############
,,,,,,,,,,,,,
,%,%,%,%,%,%
as a representation of 0 (zero). I don't think a system should
error-correct the programmer.
It will always be the case that you cannot use, for example, a simple
Perl script to recognize what is a number in a Forth source and what is
not. It would be nice if Alex did not argue with "old-fashioned" or
similar but brought some real reasons. The "AFFE" example shows that it's
impossible to easily recognize what is a number and what is not. If
punctuation is added to numbers, it simply extends the set of possible
characters usable in a number. The possibility of constructing arcane
things with it should be no measure of its usefulness.
Regards,
-Helmar
There's no denying that "numbers" without digits are meaningless and
almost certainly erroneous. But although this text does make the
reasonable assumption that there will be digits, there is no explicit
*requirement* for digits, or even a statement that a digitless "number"
is "ambiguous". "Code written for those features" will not only not be
portable (indeed, code assuming punctuation other than a decimal point
at the far right is not portable), but systems are not required to abort
or provide any particular handling for this case. The Standard does not
mandate either bug-free code or perfect user input.
To put it another way, I don't regard being able to type ,,,,, and get a
double-length zero as a valuable entitlement, I regard typing that as an
error. It's just a very rare kind of error that some systems don't
explicitly handle and for which the Standard mandates no particular
response.
To avoid further discussion about 2/ , I've changed
> '/ of (suffix) endof
to
'/ of (prefix) (suffix) endof
Regards,
-Helmar
Download experimental work in progress version of 4p 1.2:
> http://maschenwerk.de/dl/4p-1.2-wip.tgz
The original bigFORTH-ism was that 'a is equal to char a / [char] a
(without a closing '). Gforth still supports this, and bigFORTH up to
now supports only the version without the closing '. I'm not opposed to
'a', but the prefix-only solution is slightly easier to implement.
Note that all the prefixes in question, i.e. #$%&' are a consecutive set of
ASCII codes. The code uses this fact:
Create bases #10 c, $10 c, %10 c, &10 c, 0 c,
\ 10 16 2 10 char
: getbase ( addr u -- addr' u' ) over c@ '# - dup 5 u<
IF bases + c@ base ! 1 /string ELSE drop THEN ;
Char base (0) is a special case, though. The question of where to put
the sign means that you have to check for sign/base/sign to be
compatible (unless base is 0, in which case you omit the second sign
check). ^C for control chars needs some more special code as well,
since the caret is not consecutive.
Since ' is already too special a case, it's probably better to have
these handled by other logic anyway; changing BASE is not the wisest
solution.
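To illustrate the sign/base/sign check described above, here is a rough
sketch (not bigFORTH's code) built on the getbase definition just shown;
it accepts an optional - on either side of the prefix, and it ignores
the base-0 (character) special case:
: -sign? ( addr u -- addr' u' flag )  \ strip one leading -, if present
  dup IF over c@ [char] - = IF 1 /string true EXIT THEN THEN false ;
: signed-getbase ( addr u -- addr' u' negate-flag )
  -sign? >r getbase
  r@ 0= IF -sign? r> drop >r THEN  \ allow the sign after the prefix instead
  r> ;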
--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
> Jerry Avins wrote:
>> Elizabeth D Rather wrote:
>>
>> ...
>>
>>> Open Firmware has ASCII (state-smart CHAR/[CHAR]) and a word that I
>>> really like, CONTROL which is like ASCII but masks the low 5 bits of
>>> the character, e.g. CONTROL Q
>>
>> Aww, nerts! I thought I invented that! :-)
>>
>> Jerry
>
> Maybe you did and they picked it up ;-)
In bigFORTH, it's called CTRL, and it's not inspired by OF (and
probably not the other way round, either). It does not mask the lower 5
bits of the character, but converts it to upper case and xors it with
$40. CTRL ? gives 127, and should do that.
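A minimal sketch of that behaviour (not bigFORTH's source), for
interpretive use:
: CTRL ( "<char>" -- n )
  char
  dup [char] a [char] z 1+ within IF 32 - THEN  \ convert to upper case
  64 xor ;                          \ CTRL Q gives 17, CTRL ? gives 127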
What problem does the '<char>' literal solve? It looks better than
CHAR <char> or [CHAR] <char>, and it is not affected by STATE, which is
a plus.
Extending the number conversion in such a way does not affect users who
define '$' or something like that - words are still found before the number
conversion is done.
bigFORTH's and Gforth's rule is that you need at least one digit
between two punctuation characters. There must also be at least one
digit before the first punctuation character.
> Gerry <ge...@jackson9000.fsnet.co.uk> writes:
>>Is there any existing practice using "<any character except quote>"
>>for string literals?
>
> Yes. I don't think this is common, but at least Albert van der Horst
> supports that in his systems.
In Albert's ciforth quotes within strings are possible by using them twice:
"this is ""no"" test." TYPE
this is "no" test.
The current (still unavailable) version of CHForth has this too.
--
Coos
CHForth, 16 bit DOS applications
http://home.hccnet.nl/j.j.haak/forth.html
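A rough sketch of the doubled-quote convention described above, for
interpretation state only (the name S"" and the static buffer are just
for illustration; there is no overflow check):
create qbuf 256 allot
: quote-ahead? ( -- flag )  \ is the next input character another " ?
  source >in @ /string dup IF drop c@ [char] " = ELSE nip THEN ;
: S"" ( "ccc<quote>" -- c-addr u )
  0                                    \ length collected so far
  BEGIN
    [char] " parse                     \ text up to the next "
    rot 2dup + >r qbuf + swap move r>  \ append it to qbuf, update length
    quote-ahead?
  WHILE
    [char] " qbuf 2 pick + c! 1+       \ a doubled " : keep one "
    1 >in +!                           \ and skip the second one
  REPEAT
  qbuf swap ;
With that, S"" this is ""no"" test." type prints: this is "no" test.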
> Op Sun, 05 Aug 2007 19:13:10 GMT schreef Anton Ertl:
[..]
>> Yes. I don't think this is common, but at least Albert van der Horst
>> supports that in his systems.
> In Albert's ciforth quotes within strings are possible by using them twice:
> "this is ""no"" test." TYPE
> this is "no" test.
> The current (still unavailable) version of CHForth has this too.
In iForth it is supported when miscutil.frt is allowed to load.
FORTH> S" this is ""no"" test." TYPE this is "no" test. ok
I prefer to switch to the better optics of S~ this is ""no"" test.~ TYPE .
Of course there is then a problem with wanting to use both &~ and &" in a
string, but that is relatively uncommon.
-marcel
From this and other replies it doesn't look like it would be common
enough to warrant extending this RfD to include strings, let alone
resolving how to escape the " character.
Gerry
'a' is more widely adopted, that's why I selected that for the proposal.
Concerning ease of implementation, you could just take the value of
the first character after the prefix instead of trying to combine all
the following characters; then the trailing ' would just be ignored.
IIRC you wrote in some earlier discussion that this is necessary
anyway in connection with xchars.
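A rough sketch of that prefix-only treatment (the name 'literal? and
the idea of trying it ahead of normal number conversion are assumptions
of this sketch, and it ignores multi-byte xchars):
: 'literal? ( c-addr u -- n true | c-addr u false )
  dup 2 < IF false EXIT THEN
  over c@ [char] ' <> IF false EXIT THEN
  drop 1+ c@ true ;  \ value of the char after ' ; anything further, including a trailing ', is ignored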
In my current draft I put it in Section 3 (i.e. a part describing the
core), but described it as being an optional extension (see below). I
think that this is more readable than putting this part of number
parsing in yet another place. Here's my current wording:
|When converting input numbers, the text interpreter shall recognize
|integer numbers in the form <BASEnum>; if X:number-prefixes is
|present, the text interpreter shall recognize integer numbers in the
|form <anynum>.
(<anynum> is the same as "Convertible string" in the published proposal).
The way that this proposal (if accepted) will eventually be integrated
in the next standard will be decided by the committee, but if you have
any thoughts on this, I am sure the committee will consider and
appreciate them.
>FORTH, Inc. and customers have used '<char>' as a naming convention for
>words that insert a character into an output number string, e.g.,
>
>: '$' ( -- ) [CHAR] $ HOLD ;
>
>It's a popular bit of syntactic sugar that makes the 'pictured' number
>conversion more picturesque.
Fortunately this is not a conflict with the current proposal, as
programs defining and using these words work as intended on systems
that implement '<char>' as a character literal, since dictionary
lookup takes precedence over number conversion.
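To illustrate the point: on a system that does implement '<char>'
literals, a program such as the following (the word .dollars is just a
made-up example) still works as intended, because '$' is found in the
dictionary before number conversion is ever attempted:
: '$' ( -- ) [CHAR] $ HOLD ;
: .dollars ( +n -- ) 0 <# #S '$' #> TYPE ;  \ 42 .dollars prints $42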
>I'd resist standardizing this usage for character literals. It isn't
>needed and solves no real problems.
It's a popular bit of syntactic sugar that makes literal characters
more literal and more characteristic :-).
Yes. Just for fun I tried out '$' and '%' on a few systems that
support these as number prefixes, and they accepted them and pushed 0
on the stack. My guess is that this is just an artifact of the
implementation, not a conscious feature, and that it has not caused
enough grief, if any, to fix it.
>But it's also true that addition of
>a rule requiring at least one digit would close the loophole and not
>cause anyone much grief, either.
>
>However, I leave it to someone else to propose wording.
As it happens, the proposal under discussion contains wording that
should make this even clearer than it is in Forth-94 (although it is
intended to have the same meaning for <BASEnum>):
|Convertible string := { <BASEnum> | <decnum> | <hexnum> | <binnum> | <cnum> }
|<BASEnum> := [-]<bdigit><bdigit>*
|<decnum> := #[-]<decdigit><decdigit>*
|<hexnum> := $[-]<hexdigit><hexdigit>*
|<binnum> := %[-]<bindigit><bindigit>*
However, as in Forth-94 there is no requirement for systems to reject
non-standard programs, so if a system feels like accepting numbers
without digits, double numbers that don't have the '.' at the end, or
additional double-number indicators, that system can still be a
standard system.
I think it's inadvisable to put any kind of optional extension in the
main body of the Standard. It should go in the appropriate *.3.* of
TOOLS or whatever wordset, with a reference to it in the main body.
>
>> FORTH, Inc. and customers have used '<char>' as a naming convention for
>> words that insert a character into an output number string, e.g.,
>>
>> : '$' ( -- ) [CHAR] $ HOLD ;
>>
>> It's a popular bit of syntactic sugar that makes the 'pictured' number
>> conversion more picturesque.
>
> Fortunately this is not a conflict with the current proposal, as
> programs defining and using these words work as intended on systems
> that implement '<char>' as a character literal, since dictionary
> lookup takes precedence over number conversion.
>
>> I'd resist standardizing this usage for character literals. It isn't
>> needed and solves no real problems.
>
> It's a popular bit of syntactic sugar that makes literal characers
> more literal and more characteristic:-).
Sure, I have no problem with the usage, I just don't think it needs
standardizing. CHAR and [CHAR] are adequate for the purpose. I get
more excited about things that solve problems that need solving.
Standard programs currently require all standard words and numbers
to be in upper-case. There is no reason for $ prefixed numbers to
be treated differently.
-----
The real issue is case sensitivity. Most forth systems today are
case-insensitive. Perhaps the time has come to allow standard
programs to follow suit.
Simple case-sensitive systems like eForth could still be 'standard'
but will have a dependency on programs being in the old ANS
format.
Did you want to be?
> >> 'a' \ equivalent to char a or [char] a
> >
> >This is a gforth-ism which is not in widespread use.
>
> Actually Gforth-0.6.2 does not support it, so it's not a Gforthism at
> all. iForth, CHForth, Win32Forth (even the ancient 4.x version),
> Gforth-0.6.9 and probably current bigForth support it, so it is widely
> supported and my impression is that it is also widely used (IIRC I
> have seen it in programs posted here, and not just by me).
Actually Gforth-0.6.2 for Windows has it (else how would I know)
My definition of widespread is that if neither Forth Inc nor MPE
has it, then it isn't widespread. Freeware/hobbyist systems aren't
always good models for a standard for all the usual reasons.
My observation is that instances of CHAR [CHAR] are so few
and far between that nothing is saved by using the 'c' form.
1. Portability
2. Maintainability
3. Parsimony
4. Readability
2 and 4 are different sides of the same coin; maintainability includes
the ability to parse (say with Perl or grep) programs to identify
specific constants and literals.
> The "AFFE"-example shows that's
> impossible to easily recognize what's a number or not. If there is
> puctuation added to numbers, it simply extends the set of possible
> characters usable in a number. The possibility to contruct arcane
> things with it should be no messure about usefulness.
Which I wouldn't use due to 2 above. I'm currently trying to extend a
Forth 32 bit assembler that has several tens of HEX literals of the
form FE FF 66 67; it's difficult. They would have been easier to spot
as $FE $FF $66 $67 etc.
>
> Regards,
> -Helmar
--
Regards
Alex McDonald
Good idea.
--
Regards
Alex McDonald
>
> Note that all the prefixes in question, i.e. #$%&' are a consecutive set of
> ASCII codes.
Neat, but is that true for $ in all mappings? Is it true with UTF-8?
--
Regards
Alex McDonald
Standard words and numbers do *not* have to be upper case. Standard
systems must recognize standard words in upper case (i.e., may not
*require* names to be lower case), but can be case-insensitive (3.4.2).
A program that uses lower-case names has a dependency. Whether
lower-case is recognized in hex is implementation-defined.
> -----
> The real issue is case sensitivity. Most forth systems today are
> case-insensitive. Perhaps the time has come to allow standard
> programs to follow suit.
>
> Simple case-sensitive systems like eForth could still be 'standard'
> but will have a dependency on programs being in the old ANS
> format.
I don't see any need to change the way things are in this regard. I
don't know that the present rules are onerous for anyone.
Sure, UTF-8 is just a superset of ASCII.
Gforth 0.6.2 does not have the functionality described in the RfD (on
any platform). And you could have known that by reading the RfD, or by
trying it on Gforth and looking at the result.
>My definition of widespread is that if neither Forth Inc nor MPE
>has it, then it isn't widespread. Freeware/hobbyist systems aren't
>always good models for a standard for all the usual reasons.
What are your "usual reasons"?
>My observation is that instances of CHAR [CHAR] are so few
>and far between that nothing is saved by using the 'c' form.
[c8:~/gforth:9592] grep " '[^ ]" *.fs */*.fs|wc -l
443
[c8:~/gforth:9608] fgrep -i '[char]' *.fs */*.fs|wc -l
245
[c8:~/gforth:9609] fgrep -i ' char ' *.fs */*.fs|wc -l
141
I.e., more than 800 occurrences of literal characters; the '<char>
syntax already saves quite a lot here, and this or the proposed syntax
could save quite a bit more.
In another corpus of code I find 1087 occurrences of [CHAR] and 155
occurrences of CHAR.
Searching for: [char] -- Found 711 occurrence(s)
Searching for: char -- Found 3982 occurrence(s)
( using regexp )
Searching for: '?' -- Found 6969 occurrence(s)
-marcel
It's a variable stack effect hole for plugging constant stack effect
words in.
As to the code, it will be something like
BEGIN 2R@ <get-recognizer> EXECUTE UNTIL
Hmmm, and the state sensitivity...
It will be the same LITERAL vs LITERAL, [LITERAL] metastory, but with
the recognizers.
With no common approach to POSTPONE semantics.
PS
The 'x' syntax is a good thing.
They do in a 'standard program' which is what I specified.
> Whether
> lower-case is recognized in hex is implementation-defined.
I'm not talking about systems. It's illegal in a 'standard program'.
ANS 3.2.1.2 Digit Conversion specifies that characters in
numbers must lie in the ranges 0...9 and A...Z. Chars a...z are
currently excluded from numbers in a standard program,
including strings processed by >NUMBER.
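For reference, a minimal sketch of that digit-conversion rule (digit?
is an illustrative name, not a standard word):
: digit? ( char -- u true | false )
  dup [char] 0 [char] 9 1+ within IF [char] 0 -      ELSE
  dup [char] A [char] Z 1+ within IF [char] A - 10 + ELSE
  drop false EXIT THEN THEN
  dup base @ < IF true ELSE drop false THEN ;  \ value must also be less than BASE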
> > -----
> > The real issue is case sensitivity. Most forth systems today are
> > case-insensitive. Perhaps the time has come to allow standard
> > programs to follow suit.
> >
> > Simple case-sensitive systems like eForth could still be 'standard'
> > but will have a dependency on programs being in the old ANS
> > format.
>
> I don't see any need to change the way things are in this regard. I
> don't know that the present rules are onerous for anyone.
It will be onerous when $ prefixed numbers can appear lower-case
in a standard program but other numbers must be upper-case.
How many systems today couldn't recognize a 'standard program'
if it was written in lower-case? Consequently what is stopping
*future* standard programs from being written that way?
It's the obvious next step. Why should standard programs be stuck
in uppercase if no-one needs it anymore?
Whether >NUMBER should recognize lower-case is a separate
consideration because that affects applications.
Very few widely used systems are case-sensitive nowadays. I object to
changing the rules mainly out of consideration for the minority systems,
some of which are used for a limited number of rather large applications
that already exist. It would be very hard for them to change.
> It's the obvious next step. Why should standard programs be stuck
> in uppercase if no-one needs it anymore?
>
> Whether >NUMBER should recognize lower-case is a separate
> consideration because that affects applications.
The practical solution for you is to consider what current
implementations you want your program to be able to run on: if they're
all case-insensitive, which is likely, just declare a dependency and
forge ahead. Your program probably already has a host of dependencies:
cell size, 2's complement, probably OS, source files, etc. Having a
dependency on case-insensitivity is unlikely to be very limiting for
your users.
I can assure you that 'a' is recognized by Gforth 0.6.2.
> >My definition of widespread is that if neither Forth Inc nor MPE
> >has it, then it isn't widespread. Freeware/hobbyist systems aren't
> >always good models for a standard for all the usual reasons.
>
> What are your "usual reasons"?
Freeware/hobbyist systems are often idiosyncratic, have unknown
user-bases, don't have to be practical, and are not subject to the
criticisms and pressure from users that commercial systems are.
Do I need to go on?
> >My observation is that instances of CHAR [CHAR] are so few
> >and far between that nothing is saved by using the 'c' form.
>
> [c8:~/gforth:9592] grep " '[^ ]" *.fs */*.fs|wc -l
> 443
> [c8:~/gforth:9608] fgrep -i '[char]' *.fs */*.fs|wc -l
> 245
> [c8:~/gforth:9609] fgrep -i ' char ' *.fs */*.fs|wc -l
> 141
>
> I.e., more than 800 occurences of literal characters; the '<char>
> syntax saves quite a lot here already, and this or the proposed syntax
> could save quite a bit more.
>
> In another corpus of code I find 1087 occurences of [CHAR] and 155
> occurences of CHAR.
There is nothing like that number to be found in the gforth/CHF/
Win32F distributions.
Now for the big question - how many of those found instances
did you personally type? In fact it is likely that you have typed
SWAP DROP THEN far more times than you will ever type
'c' CHAR [CHAR] .
If Forth Inc, MPE and a host of other systems never felt the need
for the 'c' form, I don't see why it should be thrust upon them by
having it included in the standard. And a Forth standard, at that :(
Actually the latter form you've suggested, i.e. DATETIME and
[DATETIME], is simpler to program and it's faster.
Why? For the same reason CHAR [CHAR] is better than the
'c' form. Let's compare using Win32Forth's own source.
Here's the code for CHAR [CHAR]
: CHAR ( -- char )
BL WORD 1+ C@ ;
: [CHAR] ( -- char )
CHAR [COMPILE] LITERAL ; IMMEDIATE
Here's the code for the 'c' form:
: -ifzerothrow ( n -- n )
dup 0= throw ; \ -1 throw if zero length
: -ve-test ( addr len -- addr' len' ) \ skip possible - sign, set -ve-num?
-ifzerothrow \ stop if nothing to convert
over c@ [char] - = \ check for sign
if -ve-num? throw \ if already negative, throw
true to -ve-num?
1 /string \ bump past
-ifzerothrow \ nothing left is error
then ;
: quoted-number? ( addr len -- d1 ) \ 'x' type numbers
-ve-test \ might be negative
3 <> throw \ not 3 chars 'x'
dup dup c@ swap 2 + c@ \ fetch first and third chars
over = swap [char] ' =
and invert throw \ equal and ', otherwise error
1+ c@ 0 ; \ fetch the character
quoted-number? is linked into the number chain, which means
that considerable code must be executed before 'c' is finally
recognized.
In contrast CHAR [CHAR] exist in the dictionary and are thus
found quickly. Furthermore their argument is known to be a
character so the need for testing is entirely eliminated.
The speed and efficiency of forth lies in using the dictionary to
make decisions. Implementing the inefficient syntax of other
languages is not only unnecessary, it results in a hopelessly
complex and ever slower compiler.
In future let's start 'thinking forth' instead of trying to emulate
the syntax of C, BASIC, assembler etc. Forth syntax is simply:
[ numeric arguments ] operator [ named arguments ]
Should I mention that infixing $ # % into numbers isn't
particularly efficient? But it's too late to change that now.
Old systems become obsolete, and one has to consider whether supporting
them has reached the point where it's holding back the language. Old
apps may need updating, but it's likely to be done on a new compiler
which is case-insensitive and has new features. I can't imagine anyone
updating a program in 2007 and having to ensure that it ran on a 1970's
compiler (assuming one could even be found :)
I guess it's up to Forth Inc, MPE and others who support old systems
to signal when they're ready.
The inefficiency or otherwise of this compile-time code is not the
question at hand. The issue is: shall we allow the programmer to write
'c' in place of [CHAR] c ? Let's discuss the advantages and
disadvantages of the RfD first, and if the consensus is that it is of
use, then let's discuss how to reasonably implement it.
>
> In contrast CHAR [CHAR] exist in the dictionary and are thus
> found quickly. Furthermore their argument is known to be a
> character so the need for testing is entirely eliminated.
>
> The speed and efficiency of forth lies in using the dictionary to
> make decisions. Implementing the inefficient syntax of other
> languages is not only unnecessary, it results in a hopelessly
> complex and ever slower compiler.
>
> If future let's start 'thinking forth' instead of trying to emulate
> the syntax of C, BASIC, assembler etc. Forth syntax is simply:
>
> [ numeric arguments ] operator [ named arguments ]
>
Thanks for the advice; but I would prefer that you separate out your
reasons for disliking the RfD from your criticisms of Win32Forth or a
specific programming style. Inspecting the RfD through the lens of
Win32Forth is not helpful.
> Should I mention that infixing $ # % into numbers isn't
> particularly efficient? But it's too late to change that now.
The proposal is for prefixing, not infixing.
--
Regards
Alex McDonald
Well, that's just an example of inefficient programming in Win32Forth, not
an example of how it should be done. In bigFORTH, the additional code for
supporting all prefixes, including ', is the following:
Create bases #10 c, $10 c, %10 c, &10 c, 0 c, T
\ 10 16 2 10 char
: getbase ( addr u -- addr' u' ) over c@ '# - dup 5 u<
IF bases + c@ base ! 1 /string ELSE drop THEN ;
a "getbase" inside >number, and a check for base=0 in digit?, in which case
it just returns true, and another short special case for base=0 in
accumulate. And I think this is even overkill, because all you need is to
insert the following code in the number converstion:
getbase base @ 0= IF
over xc@+ nip s>d 2swap
1 +x/string s" '" str= dpl ! rdrop EXIT THEN
which includes the sanity check for the closing ', as well (which is checked
by looking at dpl further on - -1 means "single", 0 means "invalid").
Without sanity check (which equals the use of CHAR), it's just that code:
getbase base @ 0= IF drop xc@+ nip s>d rdrop EXIT THEN
And this includes all the prefixes discussed here.
Forth systems from Forth Inc. or MPE also have their fair share of
idiosyncrasies, and I know only a few of their users (I think I know no
Forth Inc. user in person, and only one MPE user) - but I know some of my
users, and get feedback from them as well. If you sell your Forth system,
you won't get feedback from most users, either (so "know your user base"
only means you have a list of addresses, and you don't know the people who
have an illegal copy).
And it's not that a "freeware" system only has hobbyist users.
Note that in the embedded C world, GCC has pretty much eliminated most of
the non-free offerings. There are still some compilers around for CPUs
which are too small for GCC (like Keil), or too difficult to generate code
for with GCC's algorithms (like some DSPs), but that's it. In a sense, you
can argue that GCC is a commercial system, too, because Cygnus (now owned
by Red Hat) drives GCC development and makes money from support and porting
work for GCC.
> Very few widely used systems are case-sensitive nowadays. I object to
> changing the rules mainly out of consideration for the minority systems,
> some of which are used for a limited number of rather large applications
> that already exist. It would be very hard for them to change.
The question is rather: do these minority systems really need to be updated
to Forth 200x? I think the only reason they still exist is the limited
number of rather large applications. I know people who have applications
running on LMI Forth, which is maybe Forth-83, and has been unsupported for
quite some time. They won't update anything, not even to ANS Forth; the
system runs on MS-DOS.