
type


NN

Apr 2, 2021, 8:29:26 AM
https://forth-standard.org/standard/core/TYPE

For suggested ref implementations, how about:

: type ( a u -- )
swap >r begin dup 0> while
r> count emit >r 1-
repeat drop r> drop ;

Brian Fox

Apr 2, 2021, 9:06:48 AM
On ITC systems DO/LOOP is faster because the comparison and branching
and return stack operations are not separated by NEXT.

So this is arguably better on those systems:

: TYPE ( caddr u --) ( PAUSE) OVER + SWAP DO I C@ EMIT LOOP ;

And if OVER + SWAP is a primitive in the system like BOUNDS
it's even smaller and faster.

I am acutely aware of these differences working with a 3 MHz processor.
:)
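Here is a minimal sketch of the BOUNDS idea. It assumes a byte-addressed system (1 CHARS = 1) that has ?DO. BOUNDS is written here as an ordinary colon definition, so it only shows the stack effect. The speed gain comes from having it as a code primitive. MY-TYPE is a hypothetical name, chosen so that the system's own TYPE is left alone.

```forth
\ High-level stand-in for BOUNDS; many systems provide it as a code
\ primitive.  Turns (address, length) into the (limit, start) pair
\ that DO / ?DO expect.
: bounds ( c-addr u -- c-addr+u c-addr )  over + swap ;

\ TYPE built on BOUNDS.  ?DO skips the loop body when u = 0, so an
\ empty string prints nothing.  Assumes 1 CHARS = 1.
: my-type ( c-addr u -- )  bounds ?do i c@ emit loop ;

s" hello" my-type cr
```

Because ?DO is used, `s" " my-type` prints nothing and leaves the stack balanced. That is the empty-string case raised in this thread.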



NN

Apr 2, 2021, 11:41:32 AM
Fair enough.
BTW, I think your example would fail on an empty string, which is why the
examples in the link use ?DO:

: type ( a u -- ) abs bounds ?do i c@ emit loop ;

If it's good enough, someone can add it to the list.

Paul Rubin

Apr 2, 2021, 3:59:15 PM
NN <novembe...@gmail.com> writes:
> For suggested ref implementations how about.
> : type ( a u -- )
> swap >r begin dup 0> while
> r> count emit >r 1-
> repeat drop r> drop ;

Is something wrong with DO ?

: type ( a u -- ) dup IF 0 DO dup c@ emit 1+ LOOP ELSE drop THEN ;

I first used ?DO to avoid the 0 test, but that turns out to be in
CORE-EXT rather than CORE.

P Falth

Apr 2, 2021, 6:13:29 PM
And how is EMIT defined?
I use something like

VARIABLE tmp
: EMIT tmp c! tmp 1 type ;

or if you want it to work with unicode chars

: EMIT tmp dup >r xc!+ r@ - r> swap type ;

BR
Peter

Brian Fox

Apr 2, 2021, 6:30:25 PM
On my system EMIT is a primitive since it has to talk directly to
a video chip (TMS9918). That gave me the freedom to do TYPE
the way I did.



Doug Hoffman

Apr 2, 2021, 6:52:03 PM
On 4/2/21 3:59 PM, Paul Rubin wrote:

> Is something wrong with DO ?
>
> : type ( a u -- ) dup IF 0 DO dup c@ emit 1+ LOOP ELSE drop THEN ;

I think you need a 2drop:

: type ( a u -- ) dup IF 0 DO dup c@ emit 1+ LOOP ELSE 2drop THEN ;

-Doug

Brian Fox

Apr 2, 2021, 8:50:51 PM
On 2021-04-02 11:41 AM, NN wrote:

> Fair enough.
> BTW, I think your example would fail on an empty string which is the
> examples in the link use ?do
>
> : type ( a u -- ) abs bounds ?do i c@ emit loop ;
>
> If its good enough someone can add it to the list.
>

You are correct. I implement ?DO in my kernel and use it for
TYPE. I was typing from the hip.

Since a U is specified for the string length I think ABS
is incorrect. I have never seen it placed in TYPE.

It would be wise perhaps for Forth79 and FIG Forth DO LOOPS.

dxforth

Apr 2, 2021, 9:02:56 PM
Also the memory saved.

dxforth

Apr 2, 2021, 9:19:54 PM
A distinction only a Standard could make. For a small system to not
include ?DO would be to needlessly waste memory.

dxforth

Apr 3, 2021, 1:17:53 AM
: TYPE ?DUP IF 0 DO COUNT EMIT LOOP THEN DROP ;

Paul Rubin

Apr 3, 2021, 2:12:40 AM
dxforth <dxf...@gmail.com> writes:
> A distinction only a Standard could make. For a small system to not
> include ?DO would be to needlessly waste memory.

I don't understand why DO exists instead of ?DO being the default.

Paul Rubin

Apr 3, 2021, 2:15:41 AM
Doug Hoffman <dhoff...@gmail.com> writes:
> I think you need a 2drop:

Yep. Or I suppose I could have used ?DUP but that word always makes me
squirm because of its variable stack effect.

Paul Rubin

Apr 3, 2021, 2:17:54 AM
P Falth <peter....@gmail.com> writes:
> And how is EMIT defined?

I've usually thought of EMIT as a primitive that writes directly to a
hardware port, on low level systems. TYPE then uses EMIT.

dxforth

Apr 3, 2021, 3:25:49 AM
?DO came much later and involves an extra test you may not need.

Anton Ertl

Apr 3, 2021, 4:28:41 AM
Paul Rubin <no.e...@nospam.invalid> writes:
>NN <novembe...@gmail.com> writes:
>> For suggested ref implementations how about.
>> : type ( a u -- )
>> swap >r begin dup 0> while
>> r> count emit >r 1-
>> repeat drop r> drop ;
>
>Is something wrong with DO ?

Yes.

>: type ( a u -- ) dup IF 0 DO dup c@ emit 1+ LOOP ELSE drop THEN ;
>
>I first used ?DO to avoid the 0 test, but that turns out to be in
>CORE-EXT rather than CORE.

So what. It's still the better word for this purpose. It's not your
job to find workarounds for systems without ?DO.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2020: https://euro.theforth.net/2020

Anton Ertl

Apr 3, 2021, 5:10:47 AM
NN <novembe...@gmail.com> writes:
>https://forth-standard.org/standard/core/TYPE
>
>For suggested ref implementations how about.

You can suggest a reference implementation for TYPE there.

>: type ( a u -- )
> swap >r begin dup 0> while
> r> count emit >r 1-
> repeat drop r> drop ;

Others have presented versions that use ?DO (or DO surrounded by IF).
There are the following variants:

1) Use the address as the loop index:

: type1 ( c-addr u -- )
over + swap ?do i c@ emit loop ;

2) Use the array index as loop index:

: type2 ( c-addr u -- )
0 ?do dup i + c@ emit loop drop ;

3) Use the length as loop limit, don't use the loop index:

: type3 ( c-addr u -- )
0 ?do dup c@ emit 1+ loop drop ;

or, less obviously:

: type3 ( c-addr u -- )
0 ?do count emit loop drop ;

or you could use a variant of FOR..NEXT that supports 0-trip loops.

If you want to cater for "1 chars > 1" systems (try to get one for
testing!), these definitions become more complicated:

: type1 ( c-addr u -- )
chars over + swap ?do i c@ emit 1 chars +loop ;

: type2 ( c-addr u -- )
0 ?do dup i chars + c@ emit loop drop ;

: type3 ( c-addr u -- )
0 ?do dup c@ emit char+ loop drop ;

In this case the differences between TYPE1, TYPE2 and TYPE3 are small,
but in general, when looping over arrays, I prefer to use the address
as loop index (i.e. TYPE1), because it usually means that I don't have
to keep the (base or running) address elsewhere throughout the loop
body, resulting in less stack juggling.

Anton Ertl

Apr 3, 2021, 5:31:35 AM
P Falth <peter....@gmail.com> writes:
>or if you want it to work with unicode chars
>
>: EMIT tmp dup >r xc!+ r@ - r> swap type ;

EMIT is defined to work on chars, not on xchars. We have XEMIT for
printing one xchar on the stack (or TYPE for printing one or more
xchars in memory).

And in general, EMIT cannot be extended to work like XEMIT: EMIT
prints the raw byte; and on, e.g., a system like Gforth (with UTF-8
encoding) where XEMIT takes a Unicode code point number as input and
produces UTF-8 as output, the output for, e.g., $C4 XEMIT consists of
two bytes ($c3 $84), while $C4 EMIT just outputs $c4. That's why the
xchar wordset contains XEMIT (unlike what I proposed in 2005); the use
of EMIT and KEY for dealing with raw bytes was pointed out by Stephen
Pelc.

I know that you do not use codepoint numbers for the on-stack
representation of characters, but something derived from the in-memory
(i.e., string) representation. It makes me wonder if, with your
on-stack representation, XEMIT can output any raw byte, or if this
results in the byte followed by a number of 0 bytes for certain byte
values.
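The $C4 example becomes concrete once you see the UTF-8 encoding step. XEMIT performs that step on a UTF-8 system such as Gforth, and a raw-byte EMIT does not. The sketch below shows it. UTF8!+ and UTF8-EMIT are hypothetical names, modelled on the stack effect of the standard XC!+ ( xchar xc-addr1 -- xc-addr2 ). The sketch assumes a byte-addressed system and a valid code point below $110000, and it does no error checking.

```forth
variable utf8-ptr                 \ hypothetical: next byte to fill
create utf8-buf 4 chars allot     \ one UTF-8 sequence is at most 4 bytes

: b, ( b -- )  utf8-ptr @ c!  1 utf8-ptr +! ;      \ append one byte
: cont, ( u shift -- )  rshift $3F and $80 or b, ; \ 10xxxxxx byte

\ Store code point u at c-addr as UTF-8; return the address after it.
: utf8!+ ( u c-addr -- c-addr' )
  utf8-ptr !
  dup $80 u< if b, else                               \ 0xxxxxxx
  dup $800 u< if dup 6 rshift $C0 or b, 0 cont, else  \ 110xxxxx + 1
  dup $10000 u< if                                    \ 1110xxxx + 2
    dup 12 rshift $E0 or b, dup 6 cont, 0 cont, else
    dup 18 rshift $F0 or b,                           \ 11110xxx + 3
    dup 12 cont, dup 6 cont, 0 cont,
  then then then
  utf8-ptr @ ;

\ Output the code point as raw UTF-8 bytes through TYPE.
: utf8-emit ( u -- )  utf8-buf utf8!+ utf8-buf tuck - type ;
```

With this sketch, `$C4 utf8-emit` writes the two bytes $C3 $84, while a raw-byte `$C4 EMIT` writes the single byte $C4. Likewise, `$20AC utf8-emit` writes $E2 $82 $AC.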

P Falth

Apr 3, 2021, 6:06:10 AM
On a Forth hosted on an OS, I found it convenient to implement
TYPE as the primitive. On my lxf (32-bit, under Linux), TYPE is

: (type) ( addr len -- ) swap 1 4 syscall3 drop ;

syscall3 implements a kernel call with 3 parameters
( plus the syscall nr)

Peter

Doug Hoffman

Apr 3, 2021, 7:44:34 AM
Some Forths don't default to always showing the stack depth when
execution is done. I think Gforth is one of those but maybe it can be
configured to do so. It is a pain to type .s every time after you test a
word. Anyway, I am guessing that is why you didn't notice that your TYPE
was leaving something on the stack.

I only tested the edge case where the string length was zero. My fault.
dxforth caught it and showed one possible correction.

-Doug


P Falth

Apr 3, 2021, 8:22:20 AM
On Saturday, 3 April 2021 at 11:31:35 UTC+2, Anton Ertl wrote:
> P Falth <peter....@gmail.com> writes:
> >or if you want it to work with unicode chars
> >
> >: EMIT tmp dup >r xc!+ r@ - r> swap type ;
> EMIT is defined to work on chars, not on xchars. We have XEMIT for
> printing one xchar on the stack (or TYPE for printing one or more
> xchars in memory).
>
> And in general, EMIT cannot be extended to work like XEMIT: EMIT
> prints the raw byte; and on, e.g., a system like Gforth (with UTF-8
> encoding) where XEMIT takes a Unicode code point number as input and
> produces UTF-8 as output, the output for, e.g., $C4 XEMIT consists of
> two bytes ($c3 $84), while $C4 EMIT just outputs $c4. That's why the
> xchar wordset contains XEMIT (unlike what I proposed in 2005); the use
> of EMIT and KEY for dealing with raw bytes was pointed out by Stephen
> Pelc.
>
> I know that you do not use codepoint numbers for the on-stack
> representation of characters, but something derived from the in-memory
> (i.e., string) representation. It makes me wonder if, with your
> on-stack representation, XEMIT can output any raw byte, or if this
> results in the byte followed by a number of 0 bytes for certain byte
> values.
> - anton

I found that my stack representation was a dead end, creating problems
with no other gains. I switched to stack representation being the codepoint.

My systems are also unicode only, no other encodings supported. For this
reason I have emit=xemit and key=xkey. This has so far not given me any
problem.

$C4 EMIT outputs Ä.
What else should it output?
In fact that was the reason I started with unicode about 20 years ago.
To be able to spell and type my last name properly. It is Fälth!

BR
Peter

Anton Ertl

Apr 3, 2021, 8:53:05 AM
P Falth <peter....@gmail.com> writes:
>I found that my stack representation was a dead end, creating problems
>with no other gains.

Interesting. What were the problems?

>My systems are also unicode only, no other encodings supported. For this
>reason I have emit=xemit and key=xkey. This has so far not given me any
>problem.
>
> $C4 EMIT outputs Ä.
>What else should it output?

Just the raw byte $c4. For xchars, there is XEMIT, which should
output UTF-8 on your system.

>To be able to spell and type my last name properly. It is F=C3=A4lth!

And then Google Groups mangles it into quoted-printable encoding:-)

P Falth

Apr 3, 2021, 9:20:59 AM
On Saturday, 3 April 2021 at 14:53:05 UTC+2, Anton Ertl wrote:
> P Falth <peter....@gmail.com> writes:
> >I found that my stack representation was a dead end, creating problems
> >with no other gains.
> Interesting. What were the problems?

I tried to avoid encoding and decoding by just using c!/c@, w!/w@ and a 3@/3! pair to get 3
bytes. Easy to implement, but:

I could not take a codepoint directly and use it. I had to store it, look at the dump
or @ it to understand how I should write it.

Sorting order was completely lost.

No other language used a similar encoding.

> >My systems are also unicode only, no other encodings supported. For this
> >reason I have emit=xemit and key=xkey. This has so far not given me any
> >problem.
> >
> > $C4 EMIT outputs Ä.
> >What else should it output?
> Just the raw byte $c4. For xchars, there is XEMIT, which should
> output UTF-8 on your system.

The first paragraph of EMIT in Forth-2012 says:

6.1.1320 EMIT CORE ( x -- )
If x is a graphic character in the implementation-defined character set, display x. The
effect of EMIT for all other values of x is implementation-defined.

$C4 is Ä in my implementation-defined character set, so it displays that!

>
> >To be able to spell and type my last name properly. It is F=C3=A4lth!
>
> And then Google Groups mangles it into quoted-printable encoding:-)

After almost 27 years in Italy I have become used to different spellings and
pronunciations of my name. The two dots have almost always disappeared.

Peter

Anton Ertl

Apr 3, 2021, 10:39:44 AM
P Falth <peter....@gmail.com> writes:
>On Saturday, 3 April 2021 at 14:53:05 UTC+2, Anton Ertl wrote:
>> P Falth <peter....@gmail.com> writes:
>> > $C4 EMIT outputs Ä.
>> >What else should it output?
>> Just the raw byte $c4. For xchars, there is XEMIT, which should
>> output UTF-8 on your system.
>
>the first paragraph in forth2012 of EMIT says
>
>6.1.1320 EMIT CORE ( x -- )
>If x is a graphic character in the implementation-defined character set, display x. The
>effect of EMIT for all other values of x is implementation-defined.

The second paragraph says

|When passed a character whose character-defining bits have a value
|between hex 20 and 7E inclusive, the corresponding standard character,
|specified by 3.1.2.1 Graphic characters, is displayed. Because
|different output devices can respond differently to control
|characters, programs that use control characters to perform specific
|functions have an environmental dependency. Each EMIT deals with only
|one character.

It does not really say what happens for other characters. In any
case, the intention of adding XEMIT was so that EMIT could be used for
raw bytes. And several systems (I think all, but yours) work that
way. I guess we should fix the definition of EMIT.

Paul Rubin

Apr 3, 2021, 12:07:29 PM
P Falth <peter....@gmail.com> writes:
> My systems are also unicode only, no other encodings supported. For this
> reason I have emit=xemit and key=xkey. This has so far not given me any
> problem.

That means more complicated methods for actual binary i/o, I guess.
It's nice to be able to write raw bytes when you want to.

P Falth

Apr 3, 2021, 1:43:09 PM
I checked VFX. The Windows version outputs Ä for $C4 EMIT,
but for $20ac (€) it does not work. Probably it supports Latin-1 or similar.
The Linux version does not work for EMIT; it gives the same output as Gforth.

Already when we discussed this 20? years ago I was against xemit and
xkey. My position has always been that emit and key should work with
the implemented character encoding. If you need to send byte by byte
these functions should be named differently, like pemit and pkey.
If input or output is redirected key and emit should behave according
to the specifics of the new source, for example by being deferred.

Peter

P Falth

Apr 3, 2021, 1:45:28 PM
WRITE-FILE and READ-FILE work for that!
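Here is a sketch of that route. It uses only standard File-Access words, so the byte never passes through EMIT. The file name raw.bin and the words RAW-BYTE>FILE and WRITE-C4 are hypothetical. On a hosted system, a handle for standard output (Gforth calls it STDOUT) could be passed instead of a freshly created file.

```forth
create rawbuf 1 chars allot   \ one-byte staging buffer (hypothetical)

\ Write byte b to an open file; no character encoding is involved.
: raw-byte>file ( b fileid -- )
  swap rawbuf c!  rawbuf 1 rot write-file throw ;

\ Create raw.bin and write the single byte $C4 into it.
: write-c4 ( -- )
  s" raw.bin" w/o bin create-file throw >r
  $C4 r@ raw-byte>file
  r> close-file throw ;
```

After WRITE-C4 runs, raw.bin holds exactly one byte, $C4. This holds regardless of how a terminal or the system's EMIT would interpret that value.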

Peter

NN

Apr 4, 2021, 8:02:35 AM
128 emit € <- depends on the font.

Anton Ertl

Apr 4, 2021, 10:05:43 AM
P Falth <peter....@gmail.com> writes:
>I checked VfX. The windows version outputs Ä for $C4 EMIT

That's strange. Stephen Pelc argued against the extension of EMIT to
work on xchars and for EMIT to process raw bytes. His argument
resulted in the introduction of XEMIT for dealing with xchars. My
guess is that this behaviour is not intentional.

On Linux, Gforth, SwiftForth and VFX seem to process raw bytes. iForth
behaves strangely:

iforth "1 $c3 emit .s 1 $a4 emit .s bye"

shows an empty stack twice, so EMIT apparently consumes two stack
items.

lxf behaves as you described.

>Already when we discussed this 20? years ago I was against xemit and
>xkey. My position has always been that emit and key should work with
>the implemented character encoding. If you need to send byte by byte
>these functions should be named differently, like pemit and pkey.

Well, the decision has been to add XEMIT and XKEY, and that's water
down the river. We also have EMIT and KEY, and the intention at some
time was for them to deal with raw bytes, but that intention has not
been reflected in the text of the standard document yet. I don't
expect a proposal for PEMIT and PKEY (if somebody writes it) to be
successful, but who knows.

Originally I proposed to let EMIT and KEY work on xchars (as XEMIT and
XKEY do now) and was not enthusiastic about EMIT and KEY for raw
bytes. But it has the advantage that the Forth-94 code like

: type1 ( c-addr u -- )
over + swap ?do i c@ emit loop ;

works as intended on Forth-2012, even when running on a UTF-8 system
and passing a UTF-8 string to TYPE1. However, the difference between
VFX for Windows and for Linux indicates that in practice, this
advantage is not used (at least not by VFX users on Windows).

>If input or output is redirected key and emit should behave according
>to the specifics of the new source for example by being defered

A deferred word should implement a common interface. In case of EMIT,
all implementations should process a raw byte, or all implementations
should process a code point. If you want EMIT to behave like XEMIT,
then it should do that even when redirected to a serial port;
conversely, if we want EMIT to process raw bytes, all the
implementations of EMIT should do that.
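The "common interface" argument can be sketched with DEFER and IS (CORE EXT in Forth-2012). Program code calls only the deferred word, so it behaves the same whichever implementation is plugged in. This works as long as every implementation accepts the same input, here one raw byte. OUT-BYTE, CONSOLE-BYTE, BUFFER-BYTE and OUT-STRING are hypothetical names. The buffer merely stands in for something like a serial port. A byte-addressed system is assumed.

```forth
defer out-byte ( b -- )          \ hypothetical output vector: one raw byte

\ Console implementation: hand the byte straight to EMIT.
: console-byte ( b -- )  emit ;

\ Stand-in for a serial port: collect bytes in a buffer (no overflow check).
create sbuf 16 chars allot
variable sbuf-len
: buffer-byte ( b -- )  sbuf sbuf-len @ chars + c!  1 sbuf-len +! ;

' console-byte is out-byte

\ Program code only ever calls OUT-BYTE.  Assumes 1 CHARS = 1.
: out-string ( c-addr u -- )  over + swap ?do i c@ out-byte loop ;
```

Running `' buffer-byte is out-byte` changes where the bytes go but not what they are. A code-point EMIT plugged into the same vector would break that contract.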

P Falth

Apr 4, 2021, 10:08:41 AM
No, it also depends on which code page you use. That could be Windows-1252.
It is definitely not unicode.

P Falth

Apr 4, 2021, 1:31:19 PM
It will work on Linux but not on a Windows console.
For that to work on a Windows console I need to define emit as

\ emit for raw bytes

variable tmpbytes
variable #tmp
variable #expected


: XCS ( xcaddr -- n ) \ size of xc in bytes stored at addr
c@
dup $80 u< if drop 1 exit then
dup $e0 u< if drop 2 exit then
dup $f0 u< if drop 3 exit then
dup $f8 u< if drop 4 exit then
dup $fc u< if drop 5 exit then
dup $fe u< if drop 6 exit then
drop 6 ;


: leadingchar ( pchar -- )
tmpbytes c!
1 #tmp !
tmpbytes xcs 1- #expected ! ;

: trailingchar ( pchar -- )
tmpbytes #tmp @ + c!
1 #tmp +!
-1 #expected +! ;

: emit ( pchar -- )
#tmp @ if trailingchar else leadingchar then
#expected @ 0= if tmpbytes #tmp @ type 0 #tmp ! then ;

Where type is defined like
: prtmp here 4096 + aligned ;

: type ( addr len -- )
dup if
prtmp over 2* utf>wc16-string 2/
0 temp 2swap swap conout @ writeconsolew drop exit then
2drop ;

the utf8 string is transformed to utf16 and sent to the WriteConsoleW
system call.

So first I need to split the character to print it as raw bytes and then
in EMIT put them together again. I guess this becomes cooked bytes now!

The Windows console has no similarities with a Linux VT-console.
Fortunately Microsoft has introduced the Windows Terminal.
It has almost full support for UTF8 and VT codes. Printing utf8
strings to the console is now working. It is not even limited to the
first 64K codepoints as before. Reading does not work yet.

> >If input or output is redirected key and emit should behave according
> >to the specifics of the new source for example by being defered
> A deferred word should implement a common interface. In case of EMIT,
> all implementations should process a raw byte, or all implementations
> should process a code point. If you want EMIT to behave like XEMIT,
> then it should do that even when redirected to a serial port;
> conversely, if we want EMIT to process raw bytes, all the
> implementations of EMIT should do that.

What I mean is that if emit is redirected to a serial port I would regard that
as having a character set of 0-255 and every byte will be raw bytes.

Peter

Anton Ertl

Apr 7, 2021, 5:22:19 AM
P Falth <peter....@gmail.com> writes:
>It will work on Linux but not on a Windows console.
>
>So first I need to split the character to print it as raw bytes and then
>in EMIT put them together again. I guess this becomes cooked bytes now!
>
>The Windows console has no similarities with a Linux VT-console.
>Fortunately Microsoft has introduced the Windows Terminal.
>It has almost full support for UTF8 and VT codes. Printing utf8
>strings to the console is now working.

Ruvim reports in
<https://forth-standard.org/proposals/emit-and-non-ascii-values#reply-627>:

|I tested SP-Forth/4 in Windows (by setting UTF-8 code page in the
|console via chcp 65001 command), and in Linux. The test:
|
|HEX C3 EMIT A4 EMIT
|
|outputs ä
|
|In SP-Forth the word EMIT is implemented via TYPE (that is via WRITE-FILE).
|
|In the test
|
|HEX C3 EMIT KEY DROP A4 EMIT
|
|we can see that after the first emit nothing is shown, and after the
|second emit the character ä is shown.

I don't know how SP-Forth/4 calls Windows, and whether Ruvim used the
Windows Terminal, but it's apparently possible to implement EMIT in
the "raw byte" way on Windows, too.

I wonder if in VFX on Windows the "typical use" case works as intended
if you do "chcp 65001" first. On Linux you also don't get UTF-8 (and
don't pass the test) unless you tell the system that you use UTF-8
(but these days, UTF-8 is the default setting).

>> >If input or output is redirected key and emit should behave according
>> >to the specifics of the new source for example by being defered
>> A deferred word should implement a common interface. In case of EMIT,
>> all implementations should process a raw byte, or all implementations
>> should process a code point. If you want EMIT to behave like XEMIT,
>> then it should do that even when redirected to a serial port;
>> conversely, if we want EMIT to process raw bytes, all the
>> implementations of EMIT should do that.
>
>What I mean is that if emit is redirected to a serial port I would regard that
>as having a character set of 0-255 and every byte will be raw bytes.

But if the serial port is connected to something expecting UTF-8, the
behaviour would differ from the behaviour of EMIT on the console: the
"typical use" example would work, while it does not work on the
console.

In any case, I have written a proposal on the wording of EMIT
<https://forth-standard.org/proposals/emit-and-non-ascii-values#contribution-184>,
and you may want to contribute to it.

none albert

Apr 7, 2021, 10:19:57 AM
In article <a526461b-316d-4baa...@googlegroups.com>,
P Falth <peter....@gmail.com> wrote:
>On Friday, 2 April 2021 at 21:59:15 UTC+2, Paul Rubin wrote:
>> NN <novembe...@gmail.com> writes:
>> > For suggested ref implementations how about.
>> > : type ( a u -- )
>> > swap >r begin dup 0> while
>> > r> count emit >r 1-
>> > repeat drop r> drop ;
>> Is something wrong with DO ?
>>
>> : type ( a u -- ) dup IF 0 DO dup c@ emit 1+ LOOP ELSE drop THEN ;
>>
>> I first used ?DO to avoid the 0 test, but that turns out to be in
>> CORE-EXT rather than CORE.
>
>And how is EMIT defined?
>I use something like
>
>VARIABLE tmp
>: EMIT tmp c! tmp 1 type ;
>
>or if you want it to work with unicode chars
>
>: EMIT tmp dup >r xc!+ r@ - r> swap type ;

Indeed it is much better to have
: TYPE 1 ( stdout) WRITE-FILE THROW ;

You don't need a tmp as long as you can point into
the data stack:
: EMIT DSP@ 1 TYPE DROP ;

>
>BR
>Peter

Regards, Albert
--
"in our communism country Viet Nam, people are forced to be
alive and in the western country like US, people are free to
die from Covid 19 lol" duc ha
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

none albert

Apr 7, 2021, 10:35:53 AM
In article <64189ef8-0164-4239...@googlegroups.com>,
Let's try
\----------------
HEX
CREATE buffer C4 ,
: doit buffer 1 TYPE ;
\----------------
lina -c doemit.frt
doemit | hd
00000000 c4 |.|
00000001
Anybody who expects something different?

Now xterm prints an uppercase A with an umlaut, and the Linux terminal
shows a square with a question mark. So the interpretation of the character
has little to do with Forth, I'd say, and more with the terminal.
>
>Already when we discussed this 20? years ago I was against xemit and
>xkey. My position has always been that emit and key should work with
>the implemented character encoding. If you need to send byte by byte
>these functions should be named differently, like pemit and pkey.
>If input or output is redirected key and emit should behave according
>to the specifics of the new source for example by being defered

There is a difference if one types <esc>[A or <cursor> up on this keyboard.
It is in timing. So I can differentiate between the two and
cursor up can be recognized as a single (extended) key.
: XKEY KEY BEGIN KEY? WHILE 8 LSHIFT KEY OR REPEAT ;

>
>Peter

P Falth

Apr 7, 2021, 1:14:36 PM
How do you manage to type <esc>[A ?
I cannot manage the fingers on my keyboard to do it.
It also returns immediately on <esc>

P Falth

Apr 7, 2021, 5:49:36 PM
On Wednesday, 7 April 2021 at 11:22:19 UTC+2, Anton Ertl wrote:
> P Falth <peter....@gmail.com> writes:
> >It will work on Linux but not on a Windows console.
> >
> >So first I need to split the character to print it as raw bytes and then
> >in EMIT put them together again. I guess this becomes cooked bytes now!
> >
> >The Windows console has no similarities with a Linux VT-console.
> >Fortunately Microsoft has introduced the Windows Terminal.
> >It has almost full support for UTF8 and VT codes. Printing utf8
> >strings to the console is now working.
> Ruvim reports in
> <https://forth-standard.org/proposals/emit-and-non-ascii-values#reply-627>:
>
> |I tested SP-Forth/4 in Windows (by setting UTF-8 code page in the
> |console via chcp 65001 command), and in Linux. The test:
> |
> |HEX C3 EMIT A4 EMIT
> |
> |outputs ä
> |
> |In SP-Forth the word EMIT is implemented via TYPE (that is via WRITE-FILE).
> |
> |In the test
> |
> |HEX C3 EMIT KEY DROP A4 EMIT
> |
> |we can see that after the first emit nothing is shown, and after the
> |second emit the character ä is shown.
>
> I don't know how SP-Forth/4 calls Windows, and whether Ruvim used the
> Windows Terminal, but it's apparently possible to implement EMIT in
> the "raw byte" way on Windows, too.

It has always been possible to set the codepage to 65001 for output.
It has not always worked correctly. One problem is the wrong number of
written bytes being reported. Input of UTF-8 has never worked, and unfortunately
it still does not, even with the new Windows Terminal.

> I wonder if in VFX on Windows the "typical use" case works as intended
> if you do "chcp 65001" first. On Linux you also don't get UTF-8 (and
> don't pass the test) unless you tell the system that you use UTF-8
> (but these days, UTF-8 is the default setting).

No that does not make any change. But loading xchar.fth in VFX and
defining:
VARIABLE etmp
: EMIT etmp dup >r xc!+ r@ - r> swap type ;

Makes EMIT work as I want!
$20ac emit € ok

This works on both Windows and Linux versions.
Also in Gforth that definition works.

> >> >If input or output is redirected key and emit should behave according
> >> >to the specifics of the new source for example by being defered
> >> A deferred word should implement a common interface. In case of EMIT,
> >> all implementations should process a raw byte, or all implementations
> >> should process a code point. If you want EMIT to behave like XEMIT,
> >> then it should do that even when redirected to a serial port;
> >> conversely, if we want EMIT to process raw bytes, all the
> >> implementations of EMIT should do that.
> >
> >What I mean is that if emit is redirected to a serial port I would regard that
> >as having a character set of 0-255 and every byte will be raw bytes.
> But if the serial port is connected to something expecting UTF-8, the
> behaviour would differ from the behaviour of EMIT on the console: the
> "typical use" example would work, while it does not work on the
> console.
>
> In any case, I have written a proposal on the wording of EMIT
> <https://forth-standard.org/proposals/emit-and-non-ascii-values#contribution-184>,
> and you may want to contribute to it.

I have seen that. I have registered and will contribute

BR
Peter

Coos Haak

Apr 8, 2021, 6:57:24 AM
On Wed, 7 Apr 2021 10:14:34 -0700 (PDT), P Falth wrote:

> How do you manage to type <esc>[A ?
> I cannot manage the fingers on my keyboard to do it.
> It also returns immediately on <esc>
>
HEX
: <ESC> 1B EMIT ;
: UP <ESC> S" [A" TYPE ;

regards, Coos

Anton Ertl

Apr 8, 2021, 7:32:10 AM
P Falth <peter....@gmail.com> writes:
>It has always been possible to set the codepage to 65001 for output.
>It has not always worked correctly. One problem being wrong number of
>written bytes reported.

Reported by whom? And why is that a problem?

>> I wonder if in VFX on Windows the "typical use" case works as intended
>> if you do "chcp 65001" first. On Linux you also don't get UTF-8 (and
>> don't pass the test) unless you tell the system that you use UTF-8
>> (but these days, UTF-8 is the default setting).
>
>No that does not make any change. But loading xchar.fth in VFX and
>defining:
>VARIABLE etmp
>: EMIT etmp dup >r xc!+ r@ - r> swap type ;
>
>Make emit work as I want!
>$20ac emit € ok

A simpler implementation of what you want is, after loading VFX's
xchar.fth:

: emit xemit ;

Or you could just use XEMIT directly, which is what I would recommend
if you want to deal with an xchar.

P Falth

Apr 8, 2021, 9:21:03 AM
On Thursday, 8 April 2021 at 13:32:10 UTC+2, Anton Ertl wrote:
> P Falth <peter....@gmail.com> writes:
> >It has always been possible to set the codepage to 65001 for output.
> >It has not always worked correctly. One problem being wrong number of
> >written bytes reported.
> Reported by whom? And why ist that a problem?

Google is our friend here. Here is one example from a Perl bug
https://github.com/perl/perl5/issues/13794

WriteFile returns characters written and not bytes. If your function
checks what has been written, it can see a lower value than expected
and try to write the "missing" bytes. This is obviously not a problem
in our case as we expect the display to consume the whole string and
do not check.

>
> >> I wonder if in VFX on Windows the "typical use" case works as intended
> >> if you do "chcp 65001" first. On Linux you also don't get UTF-8 (and
> >> don't pass the test) unless you tell the system that you use UTF-8
> >> (but these days, UTF-8 is the default setting).
> >
> >No that does not make any change. But loading xchar.fth in VFX and
> >defining:
> >VARIABLE etmp
> >: EMIT etmp dup >r xc!+ r@ - r> swap type ;
> >
> >Make emit work as I want!
> >$20ac emit € ok
>
> A simpler implementation of what you want is, after loading VFX's
> xchar.fth:
>
> : emit xemit ;
>
> Or you could just use XEMIT directly, which is what I would recommend
> if you want to deal with an xchar.

I could also do
: EMIT dup $80 < if emit else xemit then ;

But this is silly! You mention somewhere that we do not have an XTYPE
as TYPE knows how to deal correctly with the string. I think the same should
be true for EMIT and KEY. They should know how to deal with a codepoint if
I implement Unicode support in my Forth.

This works on my systems. (I hope Google does not mess this up too much.)
'A' emit A ok
'Ä' emit Ä ok
'$' emit $ ok
'£' emit £ ok
'€' emit € ok

On Gforth I get
'A' emit A ok
'Ä' emit � ok
'$' emit $ ok
'£' emit � ok
'€' emit � ok

And you are saying that my system is non standard!
If I can enter a character from my keyboard I also expect EMIT to display it.

Yes I could have used XEMIT and both examples would have been the same.
But I see no use of them in any programs, people continue to use emit and key.
How many systems have implemented unicode support as part of the core
system and not as a loadable file buried in a library?

BR
Peter

Ruvim

Apr 8, 2021, 2:48:29 PM
On 2021-04-08 16:21, P Falth wrote:
> On Thursday, 8 April 2021 at 13:32:10 UTC+2, Anton Ertl wrote:
[...]
>> A simpler implementation of what you want is, after loading VFX's
>> xchar.fth:
>>
>> : emit xemit ;
>>
>> Or you could just use XEMIT directly, which is what I would recommend
>> if you want to deal with an xchar.
>
> I could also do
> : EMIT dup $80 < if emit else xemit then ;
>
> But this is silly!

Having EMIT that is equivalent to XEMIT is also silly.
Do you think it's worth deprecating XEMIT?

> You mention somewhere that we do not have an XTYPE
> as TYPE knows how to deal correctly with the string. I think the same should
> be true for EMIT and KEY. They should know how to deal with a codepoint if
> I implement Unicode support in my Forth.
>
> This works on my systems. (I hope Google does not mess this up too much.)
> 'A' emit A ok
> 'Ä' emit Ä ok
> '$' emit $ ok
> '£' emit £ ok
> '€' emit € ok
>
> On Gforth I get
> 'A' emit A ok
> 'Ä' emit � ok
> '$' emit $ ok
> '£' emit � ok
> '€' emit � ok

The optional Extended-Character word set suggests that [CHAR] and
character literal return xchar (code point) and then a program should
use XEMIT to print it as:

'Ä' xemit

Perhaps emit may throw an exception if the given pchar cannot be a part
of a correct xchar in the sequence.


> And you are saying that my system is non standard!

Actually not because of that.


> If I can enter a character from my keyboard I also expect EMIT to display it.

Yes.

So the sequence "KEY EMIT" should always be correct (ditto "XKEY XEMIT").

Hence if EMIT handles only pchar, then KEY should also return only pchar.



The idea is that the expressions

s" Ä" type

s" Ä" over 1 type 1 /string type

s" Ä" drop dup c@ emit char+ c@ emit


should all produce the same result when UTF-8 encoding is used.


How would you explain it if they produced different results?
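One way a UTF-8 based system could meet this requirement while keeping EMIT restricted to pchars is to let EMIT collect the bytes of one sequence and pass only complete sequences to TYPE. This is a minimal sketch under those assumptions (UTF-8 as the internal encoding, 1 char = 1 byte); `byte-emit`, `utf8-len` and the buffer names are hypothetical, and none of the systems in this thread is claimed to work this way.

```forth
\ Hypothetical byte-level EMIT: buffers the bytes of one UTF-8 sequence
\ and hands only complete sequences to TYPE.
\ Assumes 1 char = 1 byte; does no error recovery on malformed input.

create emit-buf 4 allot              \ bytes of the sequence being assembled
variable #emit-buf  0 #emit-buf !    \ bytes collected so far
variable #needed    1 #needed !      \ bytes the current sequence needs

\ Sequence length implied by a lead byte; ASCII bytes and stray
\ continuation bytes are passed through as 1-byte units.
: utf8-len ( b -- n )
  dup $C0 < if drop 1 exit then
  dup $E0 < if drop 2 exit then
  $F0 < if 3 else 4 then ;

: byte-emit ( b -- )
  #emit-buf @ 0= if dup utf8-len #needed ! then   \ a new sequence starts
  emit-buf #emit-buf @ + c!                       \ append the byte
  1 #emit-buf +!
  #emit-buf @ #needed @ = if                      \ sequence complete?
    emit-buf #emit-buf @ type  0 #emit-buf !
  then ;
```

With this, `s" Ä" drop dup c@ byte-emit char+ c@ byte-emit` hands TYPE the same two bytes as `s" Ä" type`. The expression that splits the string between two TYPE calls would need the same buffering inside TYPE itself.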


> Yes I could have used XEMIT and both examples would have been the same.

> But I see no use of them in any programs, people continue to use emit and key.



--
Ruvim

P Falth

Apr 8, 2021, 3:28:11 PM
On Thursday, 8 April 2021 at 20:48:29 UTC+2, Ruvim wrote:
> On 2021-04-08 16:21, P Falth wrote:
> > On Thursday, 8 April 2021 at 13:32:10 UTC+2, Anton Ertl wrote:
> [...]
> >> A simpler implementation of what you want is, after loading VFX's
> >> xchar.fth:
> >>
> >> : emit xemit ;
> >>
> >> Or you could just use XEMIT directly, which is what I would recommend
> >> if you want to deal with an xchar.
> >
> > I could also do
> > : EMIT dup $80 < if emit else xemit then ;
> >
> > But this is silly!
> Having EMIT that is equivalent to XEMIT is also silly.
> Do you think it's worth deprecating XEMIT?

Yes, and also XKEY; that is why I am responding to this thread.
My Windows system uses UTF-16 for TYPE. The UTF-8 string is converted
to UTF-16 before calling the OS WriteConsoleW function.
The Linux version types the UTF-8 string directly.

One of the problems with EMIT being restricted to pchars is that you need
to know the encoding of the underlying string as in your example above.

If EMIT takes a Unicode codepoint, as I suggest, the encoding does not
need to be known to the programmer.

Peter

Paul Rubin

Apr 8, 2021, 3:47:31 PM
P Falth <peter....@gmail.com> writes:
> This works on my systems. (I hope Google does not mess this up too much.)
> 'A' emit A ok
> 'Ä' emit Ä ok ...
> And you are saying that my system is non standard!
> If I can enter a character from my keyboard I also expect EMIT to display it.

At odds here is an idea, I guess under dispute, that in the old days 1
character was 1 byte so EMIT would always send a byte; but now with
Unicode, some chars have multibyte encoding. So we have EMIT for bytes
and XEMIT for codepoints under whatever encoding.

We also had the idea that KEY would read a character (i.e. byte) from a
keyboard, but for decades before anyone cared about Unicode, keyboards
had cursor keys and function keys that send escape sequences. Should we
expect KEY to properly read those and encode them somehow? Are there
even Unicode codepoints for them (I don't know)?

What does your keyboard actually transmit when you type "Ä" (capital A
with umlaut, codepoint 00C4)? My guess is it actually sends an ISO
8859-1 character (single byte) which also happens to be 00C4 so your
EMIT possibly has to translate it to some other encoding like UTF-8 on
output. Do you want EMIT to also be able to display CJK characters if
your keyboard can transmit them? Or maybe your system simply displays
ISO 8859-1 directly and doesn't bother with Unicode. Today that is in
some ways an annoying legacy system, but it was a workable way to deal
with European alphabets for a while, and maybe still is for your
particular application.

It seems to me that 1) there's no point having EMIT and XEMIT as
separate words if they both do the same thing; 2) having a simple way to
read and write single bytes is still important.

P Falth

Apr 8, 2021, 4:55:53 PM
On Thursday, 8 April 2021 at 21:47:31 UTC+2, Paul Rubin wrote:
> P Falth <peter....@gmail.com> writes:
> > This works on my systems. (I hope Google does not mess this up too much.)
> > 'A' emit A ok
> > 'Ä' emit Ä ok ...
> > And you are saying that my system is non standard!
> > If I can enter a character from my keyboard I also expect EMIT to display it.
> At odds here is an idea, I guess under dispute, that in the old days 1
> character was 1 byte so EMIT would always send a byte; but now with
> Unicode, some chars have multibyte encoding. So we have EMIT for bytes
> and XEMIT for codepoints under whatever encoding.
>
> We also had the idea that KEY would read a character (i.e. byte) from a
> keyboard, but for decades before anyone cared about Unicode, keyboards
> had cursor keys and function keys that send escape sequences. Should we
> expect KEY to properly read those and encode them somehow? Are there
> even Unicode codepoints for them (I don't know)?

KEY in my systems does not return function or cursor keys; EKEY does
this. They are coded in the part of the 32-bit space not covered by Unicode codepoints.

>
> What does your keyboard actually transmit when you type "Ä" (capital A
> with umlaut, codepoint 00C4)? My guess is it actually sends an ISO
> 8859-1 character (single byte) which also happens to be 00C4 so your
> EMIT possibly has to translate it to some other encoding like UTF-8 on
> output. Do you want EMIT to also be able to display CJK characters if
> your keyboard can transmit them? Or maybe your system simply displays
> ISO 8859-1 directly and doesn't bother with Unicode. Today that is in
> some ways an annoying legacy system, but it was a workable way to deal
> with European alphabets for a while, and maybe still is for your
> particular application.

My systems, ntf on Windows and lxf on Linux, have supported Unicode for 20 years.
I have had no problems having EMIT and KEY work on Unicode codepoints.

The solution used to achieve this is very different on Windows and Linux.
On Windows, KEY gets a 2-byte codepoint directly from the OS. This limits
the codepoints to the first 64K, which was a limitation of the Windows console.
(Microsoft has since improved the console, and my ntf64 can use the complete
Unicode codepoint range.) In the same way, EMIT uses the OS function WriteConsoleW
to write the 16-bit codepoint directly to the screen.

On Linux, characters arrive as UTF-8 streams that KEY converts to the proper
codepoint. EMIT stores the codepoint as UTF-8 in a string that is sent to TYPE.
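The Linux EMIT described here follows the same pattern as the earlier `: EMIT etmp dup >r xc!+ r@ - r> swap type ;`, which relies on XC!+ from xchar.fth. Below is a minimal sketch of the UTF-8 encoding step in standard Forth, so it runs without the xchar word set. It assumes 1 char = 1 byte and a code point in 0..$10FFFF; `utf8!`, `put-byte`, `utf8-cont` and `codepoint-emit` are hypothetical names, not words from ntf, lxf or any other system mentioned.

```forth
\ Hypothetical UTF-8 encoder; assumes 1 char = 1 byte, xc in 0..$10FFFF.

\ Store byte b at addr and return the next address.
: put-byte ( b addr -- addr' )  tuck c! char+ ;

\ Continuation byte 10xxxxxx carrying bits n+5..n of xc.
: utf8-cont ( xc n -- b )  rshift $3F and $80 or ;

\ Encode xc at addr; return the number of bytes written (1..4).
: utf8! ( xc addr -- u )
  dup >r                                    \ remember the start address
  over $80 < if
    put-byte                                \ 0xxxxxxx
  else over $800 < if
    over 6 rshift $C0 or swap put-byte      \ 110xxxxx
    swap 0 utf8-cont swap put-byte
  else over $10000 < if
    over 12 rshift $E0 or swap put-byte     \ 1110xxxx
    over 6 utf8-cont swap put-byte
    swap 0 utf8-cont swap put-byte
  else
    over 18 rshift $F0 or swap put-byte     \ 11110xxx
    over 12 utf8-cont swap put-byte
    over 6 utf8-cont swap put-byte
    swap 0 utf8-cont swap put-byte
  then then then
  r> - ;

create emit-utf8-buf 4 allot
\ Code point EMIT: encode into a small buffer, then TYPE it.
: codepoint-emit ( xc -- )  emit-utf8-buf tuck utf8! type ;
```

For example, `$C4 codepoint-emit` stores the two bytes $C3 $84 (Ä) and passes them to TYPE, and `$20AC codepoint-emit` stores $E2 $82 $AC (€).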

Internally strings are UTF-8 encoded. On Windows they are translated to UTF-16
inside TYPE before being sent to the OS for output.
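The 64K limit mentioned above comes from using one 16-bit unit per character: in UTF-16, code points above U+FFFF need a surrogate pair. A minimal sketch of splitting a code point into UTF-16 code units, as a TYPE built on WriteConsoleW has to do; `xc>utf16` is a hypothetical name, and the sketch does not reject surrogate code points passed in directly.

```forth
\ Hypothetical code point -> UTF-16 code units.
\ Returns ( w 1 ) in the BMP, ( w-low w-high 2 ) for a surrogate pair;
\ the high (leading) surrogate is the unit to write first.
: xc>utf16 ( xc -- w 1 | w-low w-high 2 )
  dup $10000 < if 1 exit then        \ BMP: a single 16-bit unit
  $10000 -                           \ 20-bit value v
  dup $3FF and $DC00 or              \ low 10 bits  -> low surrogate
  swap 10 rshift $D800 or            \ high 10 bits -> high surrogate
  2 ;
```

So `$20AC xc>utf16` leaves `$20AC 1`, while `$1F600 xc>utf16` leaves `$DE00 $D83D 2`.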

Two very different implementations, due to different operating system capabilities,
but totally transparent when using the systems.

> It seems to me that 1) there's no point having EMIT and XEMIT as
> separate words if they both do the same thing; 2) having a simple way to
> read and write single bytes is still important.

Yes, XEMIT is not needed in my opinion.
KEY and EMIT in my systems can read and write from 0 to 0x10FFFF;
0-0xFF is included in that range.

BR
Peter

Ruvim

Apr 8, 2021, 4:57:25 PM
On 2021-04-08 22:28, P Falth wrote:
> On Thursday, 8 April 2021 at 20:48:29 UTC+2, Ruvim wrote:
>> On 2021-04-08 16:21, P Falth wrote:
>>> On Thursday, 8 April 2021 at 13:32:10 UTC+2, Anton Ertl wrote:
>> [...]
>>>> A simpler implementation of what you want is, after loading VFX's
>>>> xchar.fth:
>>>>
>>>> : emit xemit ;
>>>>
>>>> Or you could just use XEMIT directly, which is what I would recommend
>>>> if you want to deal with an xchar.
>>>
>>> I could also do
>>> : EMIT dup $80 < if emit else xemit then ;
>>>
>>> But this is silly!
>> Having EMIT that is equivalent to XEMIT is also silly.
>> Do you think it's worth deprecating XEMIT?
>
> Yes, and also XKEY; that is why I am responding to this thread.

But then we will have the problem (1) below.


[...]
>>> This works on my systems. (I hope Google does not mess this up too much.)
>>> 'A' emit A ok
>>> 'Ä' emit Ä ok

>>> On Gforth I get
>>> 'A' emit A ok
>>> 'Ä' emit � ok

>> The optional Extended-Character word set suggests that [CHAR] and
>> character literal return xchar (code point) and then a program should
>> use XEMIT to print it as:
>>
>> 'Ä' xemit
>>
>> Perhaps emit may throw an exception if the given pchar cannot be a part
>> of a correct xchar in the sequence.


>>> If I can enter a character from my keyboard I also expect EMIT to display it.
>> Yes.
>> So the sequence "KEY EMIT" should always be correct (ditto "XKEY XEMIT").
>>
>> Hence if EMIT handles only pchar, then KEY should also return only pchar.
>>
>>
>>
>> The idea is that the expressions
>>
>> s" Ä" type
>>
>> s" Ä" over 1 type 1 /string type
>>
>> s" Ä" drop dup c@ emit char+ c@ emit
>>
>>
>> should all produce the same result when UTF-8 encoding is used.
>
> My Windows system uses UTF-16 for TYPE. The UTF-8 string is converted
> to UTF-16 before calling the OS WriteConsoleW function.
> The Linux version types the UTF-8 string directly.
>
> One of the problems with EMIT being restricted to pchars is that you need
> to know the encoding of the underlying string as in your example above.

Actually, you don't need to know the encoding. I mentioned encoding just
for the sake of the third expression, but it can be replaced by the
following variant:

s" Ä" over c@ emit 1 /string type


Now these three expressions should produce the same result regardless of
the encoding. Also, the result should be the same for any non-empty string.

And that is why correct programs can continue to use EMIT and KEY.


If your system produces different results, how can you explain that? (1)


--
Ruvim

P Falth

Apr 8, 2021, 5:30:37 PM
No, this still depends on knowing the encoding of the string. The right way to
write it is

s" Ä" over xc@+ emit drop +x/string type

or with xemit if your emit and xemit are not the same
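For UTF-8 strings, the XC@+ ( xc-addr1 -- xc-addr2 xchar ) used in the line above has to decode one multi-byte sequence and step past it. A minimal sketch of that decoding in standard Forth, assuming well-formed UTF-8 and 1 char = 1 byte; `utf8@+`, `utf8-len` and `lead-mask` are hypothetical names, not Forth-2012 words, and no validation is done.

```forth
\ Hypothetical UTF-8 decoder modeled on XC@+; assumes well-formed input.

\ Sequence length implied by the lead byte (a stray continuation byte
\ is treated as length 1).
: utf8-len ( b -- n )
  dup $C0 < if drop 1 exit then
  dup $E0 < if drop 2 exit then
  $F0 < if 3 else 4 then ;

\ Mask for the payload bits of the lead byte of an n-byte sequence.
: lead-mask ( n -- mask )  dup 1 = if drop $7F else $FF swap 1+ rshift then ;

: utf8@+ ( addr -- addr' xc )
  dup c@ utf8-len >r                  \ R: n
  dup c@ r@ lead-mask and             ( addr xc )
  r@ 1 ?do                            \ fold in continuation bytes 1..n-1
    over i + c@ $3F and  swap 6 lshift or
  loop
  swap r> + swap ;
```

The UTF-8 counterpart of +X/STRING is then just `over c@ utf8-len /string`.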

Peter

Ruvim

Apr 8, 2021, 7:30:14 PM
On 2021-04-09 00:30, P Falth wrote:
> On Thursday, 8 April 2021 at 22:57:25 UTC+2, Ruvim wrote:
>> On 2021-04-08 22:28, P Falth wrote:
>>> On Thursday, 8 April 2021 at 20:48:29 UTC+2, Ruvim wrote:
[...]
>>>> The idea is that the expressions
>>>>
>>>> s" Ä" type
>>>>
>>>> s" Ä" over 1 type 1 /string type
>>>>
[...]

>>> One of the problems with EMIT being restricted to pchars is that you need
>>> to know the encoding of the underlying string as in your example above.
>> Actually, you don't need to know the encoding.
[...]
>>
>> s" Ä" over c@ emit 1 /string type
>
> No, this still depends on knowing the encoding of the string.

If EMIT is restricted to pchar, why does this depend on the encoding
that a Forth system uses under the hood?

In a standard Forth system the results should be the same independently
of the encoding. Otherwise the system just is not standard compliant in
this aspect.




> The right way to write it is
>
> s" Ä" over xc@+ emit drop +x/string type
>
> or with xemit if your emit and xemit are not the same

Of course, by Forth-2012 you have to use xemit in this case. And
then this variant is also possible.




>> Now these three expressions should produce the same result regardless of
>> the encoding. Also, the result should be the same for any non-empty string.

>> If your system produces different results, how can you explain that?


Don't you think that your system, which uses the same encoding
independently of the platform, should produce the same result in Windows
and in Linux for each expression from my three above?

If not, what are your grounds?


--
Ruvim

P Falth

Apr 9, 2021, 2:06:10 AM
On Friday, 9 April 2021 at 01:30:14 UTC+2, Ruvim wrote:
> On 2021-04-09 00:30, P Falth wrote:
> > On Thursday, 8 April 2021 at 22:57:25 UTC+2, Ruvim wrote:
> >> On 2021-04-08 22:28, P Falth wrote:
> >>> On Thursday, 8 April 2021 at 20:48:29 UTC+2, Ruvim wrote:
> [...]
> >>>> The idea is that the expressions
> >>>>
> >>>> s" Ä" type
> >>>>
> >>>> s" Ä" over 1 type 1 /string type
> >>>>
> [...]
> >>> One of the problems with EMIT being restricted to pchars is that you need
> >>> to know the encoding of the underlying string as in your example above.
> >> Actually, you don't need to know the encoding.
> [...]
> >>
> >> s" Ä" over c@ emit 1 /string type
> >
> > No, this still depends on knowing the encoding of the string.
> If EMIT is restricted to pchar, why does this depend on the encoding
> that a Forth system uses under the hood?

You use c@ to access a string you do not know the encoding of!


> In a standard Forth system the results should be the same independently
> of the encoding. Otherwise the system just is not standard compliant in
> this aspect.
> > The right way to write it is
> >
> > s" Ä" over xc@+ emit drop +x/string type
> >
> > or with xemit if your emit and xemit are not the same
> Of course, by Forth-2012 you have to use xemit in this case. And
> then this variant is also possible.
> >> Now these three expressions should produce the same result regardless of
> >> the encoding. Also, the result should be the same for any non-empty string.
> >> If your system produces different results, how can you explain that?
> Don't you think that your system, which uses the same encoding
> independently of the platform, should produce the same result in Windows
> and in Linux for each expression from my three above?
>
> If not, what are your grounds?

Internally, both my Linux and Windows systems use UTF-8 encoded strings.
But the Windows system translates these to a 16-bit char representation
inside TYPE, to be able to write them to the screen with the WriteConsoleW
OS function. If you remove one byte of a multibyte char and send the remaining
string to TYPE, it will see an illegal UTF-8 char to translate and will fail.
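The failure described here can be made concrete with a structural UTF-8 check of the kind a transcoding TYPE has to make, at least implicitly. A minimal sketch, assuming 1 char = 1 byte; `utf8-valid?`, `utf8-lead-len` and `utf8-cont?` are hypothetical names, and the check does not reject overlong forms or encoded surrogates.

```forth
\ Hypothetical structural UTF-8 check (no overlong/surrogate checks).

: utf8-cont? ( b -- flag )  $C0 and $80 = ;    \ 10xxxxxx

\ Length implied by a lead byte, or 0 if b cannot start a sequence.
: utf8-lead-len ( b -- n )
  dup $80 < if drop 1 exit then
  dup $C0 < if drop 0 exit then          \ continuation byte in lead position
  dup $E0 < if drop 2 exit then
  dup $F0 < if drop 3 exit then
  $F8 < if 4 else 0 then ;

: utf8-valid? ( addr u -- flag )
  begin dup 0> while
    over c@ utf8-lead-len                      ( addr u n )
    dup 0= if drop 2drop false exit then       \ stray continuation byte
    2dup < if drop 2drop false exit then       \ sequence cut off at the end
    dup 1 ?do                                  \ bytes 1..n-1 must be 10xxxxxx
      2 pick i + c@ utf8-cont? 0= if unloop drop 2drop false exit then
    loop
    /string
  repeat
  2drop true ;
```

With UTF-8 strings, in `s" Ä" over 1 type 1 /string type` the first TYPE receives only $C3 (a truncated sequence) and the second only $84 (a stray continuation byte); both fail this check, while the whole two-byte string passes.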

Br
Peter

