Case sensitivity

Brad Eckert

Mar 12, 2018, 3:20:26 PM
Hi All,

What are some advantages/disadvantages to using case-sensitive Forth? I can think of a few:

Advantages:
It behaves like most mainstream languages (C, Java, Python, etc.).

Disadvantages:
Standard words are in uppercase, which makes your code yell a lot.

But, what's the experience of people who have actually used case-sensitive Forth?

minf...@arcor.de

Mar 12, 2018, 3:31:53 PM
Like in F83 I use a CAPS variable. In practice it is only used when
storing strings like file names in the directory.

My favourite writing style is to write all IMMEDIATE words, such as flow
control words, in uppercase - and all other words in lower case. But they
are converted to upper case when entered into the dictionary.

menti...@gmail.com

Mar 12, 2018, 4:32:26 PM
On Monday, March 12, 2018 at 12:20:26 PM UTC-7, Brad Eckert wrote:
> Hi All,
>
> What are some advantages/disadvantages to using case-sensitive Forth?
> [...]
> But, what's the experience of people who have actually used case-sensitive Forth?

Sometimes there is an advantage in making a Forth word with two separated uppercase elements, as in "MindBoot" or "InFerence", so that the same ForthWord will automatically become a clickable item on a wiki-page, as can be seen with the word "EnThink" for English-thinking module on the

http://theai.wiki/Consciousness

wiki-page of the AI Wiki.

Arthur
--
http://www.researchgate.net/publication/220178410_MindForth_Thoughts_on_Artificial_Intelligence_and_Forth

Albert van der Horst

Mar 12, 2018, 5:56:20 PM
In article <1e5fce9c-da42-4106...@googlegroups.com>,
I think that case-insensitivity is very parochial: it makes no sense
for Chinese and it presents problems with diacritical marks.
That alone is enough reason to hate it.

lina/wina is case-sensitive. You get used to the yelling, especially
if you started in FORTRAN. What remains annoying is the need to switch
case and the situation that the CAPS lock remains on longer than
expected. I never got used to that, so in interactive work
I switch to CASE-SENSITIVE (a loadable extension).

I expected that the revised standard would have required from a
compiler that it accepts standard words in upper and lower case.

Groetjes Albert
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

dxf...@gmail.com

Mar 12, 2018, 8:03:57 PM
In my case-insensitive system, words appear in
the dictionary as written. Typically end-user
words are defined in uppercase and so-called
helpers in lower. Thus WORDS shows at a glance
which are important.

minf...@arcor.de

Mar 12, 2018, 8:21:22 PM
On Monday, March 12, 2018 at 22:56:20 UTC+1, Albert van der Horst wrote:
> In article <1e5fce9c-da42-4106...@googlegroups.com>,
> Brad Eckert <hwf...@gmail.com> wrote:
> >Hi All,
> >
> >What are some advantages/disadvantages to using case-sensitive Forth? I can think of a few:
> >
> >Advantages:
> > It behaves like most mainstream languages (C, Java, Python, etc.).
> >
> >Disadvantages:
> > Standard words are in uppercase, which makes your code yell a lot.
> >
> >But, what's the experience of people who have actually used case-sensitive Forth?
>
> I think that case-insensitivity is very parochial: it makes no sense
> for Chinese and it presents problems with diacritical marks.
> That alone is enough reason to hate it.
>
> lina/wina is case-sensitive. You get used to the yelling, especially
> if you started in FORTRAN. What remains annoying is the need to switch
> case and the situation that the CAPS lock remains on longer than
> expected. I never got used to that, so in interactive work
> I switch to CASE-SENSITIVE (a loadable extension).
>
> I expected that the revised standard would have required from a
> compiler that it accepts standard words in upper and lower case.
>

Notwithstanding that case-sensitivity is implementation defined
the standard § 3.4.2 requires:

A system shall be capable of finding the definition names defined by this
standard when they are spelled with upper-case letters.

so:
finding dup is implementation defined
finding DUP is required, regardless of how it is stored in the dictionary

dxf...@gmail.com

Mar 12, 2018, 9:59:03 PM
On Tuesday, March 13, 2018 at 6:20:26 AM UTC+11, Brad Eckert wrote:
> Hi All,
>
> What are some advantages/disadvantages to using case-sensitive Forth? I can think of a few:
>
> Advantages:
> It behaves like most mainstream languages (C, Java, Python, etc.).
>
> Disadvantages:
> Standard words are in uppercase, which makes your code yell a lot.

The alternative - all lowercase including user-
defined names as in C - makes everything look
bland. Due to factoring, Forth tends to generate
lots of word names. When I look at Gforth code,
new definitions barely stand out from the pack
and the comment that often follows. YMMV

Anton Ertl

Mar 13, 2018, 5:08:09 AM
Brad Eckert <hwf...@gmail.com> writes:
>But, what's the experience of people who have actually used case-sensitive Forth?

Completely unusable:

SP-FORTH - ANS FORTH 94 for Linux
Open source project at http://spf.sf.net
Russian FIG at http://www.forth.org.ru ; Started by A.Cherezov
Version 4.20 Build 001 at 21.Jan.2009

65 emit
65 emit
^ ERROR #-2003
bye
bye
^ ERROR #-2003

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2017: http://euro.theforth.net/

Ron Aaron

Mar 13, 2018, 5:38:47 AM


On 13/03/2018 11:06, Anton Ertl wrote:
> Brad Eckert <hwf...@gmail.com> writes:
>> But, what's the experience of people who have actually used case-sensitive Forth?
>
> Completely unusable:

Huh?

8th is case-sensitive (though since I prefer lowercase, the words are
lc). It's only unusable if you insist on typing them incorrectly.

minf...@arcor.de

Mar 13, 2018, 6:06:30 AM
Would make sense but a bit convoluted:
If the SEARCH wordlist is present, let SEARCH-WORDLIST do a case-insensitive
search i.e. dup DuP DUP will all be found when wid = FORTH-WORDLIST.
For any other wid do a case-sensitive search.

Anton Ertl

Mar 13, 2018, 7:37:15 AM
minf...@arcor.de writes:
>Would make sense but a bit convoluted:
>If the SEARCH wordlist is present, let SEARCH-WORDLIST do a case-insensitive
>search i.e. dup DuP DUP will all be found when wid = FORTH-WORDLIST.
>For any other wid do a case-sensitive search.

The mainstream systems use case-insensitive search for all wordlists,
and many programs have an environmental dependency on
case-insensitivity for all wordlists. In Gforth WORDLIST produces a
case-insensitive wordlist, but it also has TABLE and now CS-WORDLIST
that create case-sensitive wordlists. The difference between TABLE
and CS-WORDLIST is that words in TABLEs are not LOCATEable etc.

Mark Humphries

Mar 13, 2018, 7:39:28 AM
My Forths are case sensitive. Although I code in lowercase I still like to keep the option of using uppercase for special purpose words if ever I feel the need.

Mark Wills

Mar 13, 2018, 7:55:01 AM
TurboForth is switchable via the CSEN variable.

Words are stored in the dictionary as entered, with case preserved.
The CSEN switch simply alters the behaviour of FIND.

In general, I use a Java-like naming convention:

Words: MixedCase
Variables: camelCase
Constants: UPPER_CASE

This gives me a clue later on when I'm reading source code without having
to scroll up/down to find the definition of something.

m...@iae.nl

Mar 13, 2018, 8:12:42 AM
On Tuesday, March 13, 2018 at 12:37:15 PM UTC+1, Anton Ertl wrote:
> minf...@arcor.de writes:
> >Would make sense but a bit convoluted:
> >If the SEARCH wordlist is present, let SEARCH-WORDLIST do a case-insensitive
> >search i.e. dup DuP DUP will all be found when wid = FORTH-WORDLIST.
> >For any other wid do a case-sensitive search.
>
> The mainstream systems use case-insensitive search for all wordlists,
> and many programs have an environmental dependency on
> case-insensitivity for all wordlists. In Gforth WORDLIST produces
> case-insensitive wordlist, but it also has TABLE and now CS-WORDLIST
> that create case-sensitive wordlists. The difference between TABLE
> and CS-WORDLIST is that words in TABLEs are not LOCATEable etc.

iForth has the CASESENSITIVE variable. I find it hard to grasp
what is meant when somebody says their system is casesensitive,
so here's an example:
FORTH> casesensitive ? 0 ok
FORTH> ' DUP . 19141584 ok
FORTH> ' dup . 19141584 ok
FORTH> ' DuP . 19141584 ok
FORTH> casesensitive on ok
FORTH> ' DUP . 19141584 ok
FORTH> ' dup .
Error -13
dup ?

The underlying implementation searches
the dictionaries twice (typically it takes
only two probes to find a word). Words are
entered in the dictionary exactly as written.

For my work, I frequently need to specify
if words are found considering case or not.
This has resulted in iForth having SEARCH-NC
and COMPARE-NC.

To specify filenames it is necessary to have
case-sensitive input. It is also handy in a
metacompiler or when writing parsers for
other languages.

I do not like source code in all lower or upper
case. Over the years it has proved to be most
efficient to write Forth words exactly as spelled
in the standard. (This is not exactly a popular
format.) Constants are (mostly) uppercase and
variables (mostly) lowercase.

-marcel

Albert van der Horst

Mar 13, 2018, 9:59:43 AM
In article <c009ac31-4e24-4c69...@googlegroups.com>,
I understand this. What I mean by saying that ciforth is case-sensitive is
that it is so if you run it without options or inclusions.

>
>The underlying implementation searches
>the dictionaries twice (typically it takes
>only two probes to find a word). Words are
>entered in the dictionary exactly as written.
>
>For my work, I frequently need to specify
>if words are found considering case or not.
>This has resulted in iForth having SEARCH-NC
>and COMPARE-NC.
>
>To specify filenames it is necessary to have
>case-sensitive input. It is also handy in a
>metacompiler or when writing parsers for
>other languages.
>
>I do not like source code in all lower or upper
>case. Over the years it has proved to be most
>efficient to write Forth words exactly as spelled
>in the standard. (This is not exactly a popular
>format.) Constants are (mostly) uppercase and
>variables (mostly) lowercase.

I agree with all of this. I think that systems that cannot
be in a case sensitive mode are defective.
All modern languages are case sensitive. (FORTRAN and LISP
are not modern).
Certainly there is loss of expressivity.

Like you I don't like specifying " this requires
case-insensitivity " in all my sources in order to write
drop instead of DROP .

>
>-marcel


groetjes Albert

Alex

Mar 13, 2018, 10:27:18 AM
My Forth (https://github.com/alextangent/wf32) uses a class-like
wordlist that provides 3 methods;

Define word (used by CREATE et al)
Search (used by SEARCH-WORDLIST)
Traversal (used by TRAVERSE-WORDLIST)

Each of these methods is a deferred word, so this

VOCABULARY TEST CASE-ASIS

uses case-sensitive define, search and traverse. The default uses a
lowercase define, and a case-insensitive search and traverse. That is,
using the word SEARCH-WORDLIST will use that wordlist's or vocabulary's
particular search.

CASE-ASIS wordlists are used by the Windows FFI to support mixed case
words like GetProcAddress LoadLibrary GetLastError etc that are defined
in Windows DLLs.


--
Alex

Alex

Mar 13, 2018, 10:34:11 AM
I agree with Anton to the degree that opening up a console-like window
and having to CAPS EVERYTHING TO GET IT TO UNDERSTAND is a bit 1950s
teletype.

--
Alex

Anton Ertl

Mar 13, 2018, 12:11:17 PM
m...@iae.nl writes:
>Over the years it has proved to be most
>efficient to write Forth words exactly as spelled
>in the standard.

The result is something that I then probably won't read, but at least
a standard Forth system can understand it, and that's much better than
replacing "constant" with "=:".

>(This is not exactly a popular
>format.)

Right. There's a reason why all serious systems are case-insensitive.

Matthias Koch

Mar 13, 2018, 12:34:18 PM
> Right. There's a reason why all serious systems are case-insensitive.

Does case-insensitivity only hold true for 'A' to 'Z' and 'a' to 'z',
or are there systems around which also handle additional characters
like Ää Öö Üü ẞß Ðð Þþ Яя Σσ ?

Albert van der Horst

Mar 13, 2018, 1:47:38 PM
In article <2018Mar1...@mips.complang.tuwien.ac.at>,
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>m...@iae.nl writes:
>>Over the years it has proved to be most
>>efficient to write Forth words exactly as spelled
>>in the standard.
>
>The result is something that I then probably won't read, but at least
>a standard Forth system can understand it, and that's much better than
>replacing "constant" with "=:".
>
>>(This is not exactly a popular
>>format.)
>
>Right. There's a reason why all serious systems are case-insensitive.

Let me guess.
The reason is that computing started in an
English-speaking culture, whose language has no diacritical marks.
Bits being in short supply, this allowed early systems to get
by with as little as 5.3 bits per character, as long as case was ignored.
Because uppercase characters could be represented in
a 5 by 7 matrix and lowercase could not, programming languages
tended to use uppercase characters, much like early
teletypes, the successor of Morse code.

Enter the era of personal computers. The predominant
OS of the time was MS-DOS, with primitive BASIC as its computer
language and case-insensitive filenames.
This has evolved into MS-Windows, but the errors of the past
have not been remedied to the present day, much to everybody's
chagrin.
>
>- anton

Groetjes Albert

Anton Ertl

Mar 13, 2018, 1:51:26 PM
Matthias Koch <matthi...@hot.uni-hannover.de> writes:
>> Right. There's a reason why all serious systems are case-insensitive.
>
>Does case-insensitivity only hold true for 'A' to 'Z' and 'a' to 'z',

My recommendation is to have case-insensitivity for ASCII characters
(i.e., those above), and case-sensitivity for non-ASCII characters.

Advantages:

* Standard words would be matched case-insensitively, making the large
number of programs that spell "CREATE" as "create" or "Create"
standard.

* The case-handling would not depend on the encoding (i.e., if
somebody gives you a program encoded in Latin1 and you load it in a
system set up for Latin2, the case-handling does not change).

Disadvantage:

* Users may find it surprising that case insensitivity does not extend
to Umlauts or accented characters.
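
The rule recommended above can be sketched in a few lines of standard Forth. This is only an illustration under assumptions: the names ascii-upper and istr= are made up for this example and are not taken from Gforth or any other system, and a character is assumed to occupy one address unit. Bytes outside a-z pass through unchanged, so every byte of a multi-byte UTF-8 sequence (or a Latin-1 umlaut) is compared exactly, i.e. case-sensitively.

```forth
\ Fold only the ASCII letters a-z to upper case; leave all other bytes alone.
: ascii-upper ( c -- c' )
  dup [char] a [char] z 1+ within if 32 - then ;

\ Compare two strings, ignoring case for ASCII letters only (hypothetical name).
: istr= ( c-addr1 u1 c-addr2 u2 -- flag )
  rot over <> if 2drop drop false exit then   \ different lengths: not equal
  0 ?do                                       ( c-addr1 c-addr2 )
    over i + c@ ascii-upper
    over i + c@ ascii-upper
    <> if 2drop false unloop exit then        \ first mismatch: not equal
  loop
  2drop true ;
```

With these definitions, s" CREATE" s" Create" istr= leaves a true flag, while two names that differ only in the case of a non-ASCII letter compare as different.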

Paul Rubin

Mar 13, 2018, 4:00:48 PM
> My recommendation is to have case-insensitivity for ASCII characters
> (i.e., those above), and case-sensitivity for non-ASCII characters.

Do any Forths support non-ASCII characters at all? Do they give any
attention to encodings? One approach might be treat all chars as opaque
8-bit codes and leave the rest to the display device, basically relying
on something like Latin-1. The trend in other languages is to use
Unicode for everything, but by Forth standards, that's very complicated.

peter....@gmail.com

Mar 13, 2018, 4:19:56 PM
Mine also handles upper case for non-ASCII characters:

ntf64 version 0.9 compiled on 2017-12-11, ForthDll version 0.9 compiled on 2017-12-02
Running on Windows 64bit ver 6.2 build 9200
Current directory: d:\dev\forth\lxf64v9
ok
: FÄLTH ." Peter" ; ok
Fälth Peter ok

The only encoding supported is Unicode. Strings are UTF-8.

I started this many years ago to be able to get my name correct between
Windows and Linux.

BR
Peter

Matthias Trute

Mar 13, 2018, 4:34:29 PM
On Tuesday, March 13, 2018 at 21:00:48 UTC+1, Paul Rubin wrote:
> > My recommendation is to have case-insensitivity for ASCII characters
> > (i.e., those above), and case-sensitivity for non-ASCII characters.
>
> Do any Forths support non-ASCII characters at all? Do they give any
> attention to encodings?

My amforth is at least capable of using Chinese (what? words, signs?);
see a screenshot of a working program at
https://sourceforge.net/projects/amforth/ It's open source, but hardly
readable ;)

> The trend in other languages is to use
> Unicode for everything, but by Forth standards, that's very complicated.

IMHO only if Forth insists on case-insensitivity, which is easy to
achieve in English/plain ASCII only, as Anton and Matthias pointed out.

Matthias (another one)

dxf...@gmail.com

Mar 13, 2018, 8:55:26 PM
On Wednesday, March 14, 2018 at 3:11:17 AM UTC+11, Anton Ertl wrote:
> m...@iae.nl writes:
> >Over the years it has proved to be most
> >efficient to write Forth words exactly as spelled
> >in the standard.
>
> The result is something that I then probably won't read, but at least
> a standard Forth system can understand it, and that's much better than
> replacing "constant" with "=:".
>
> >(This is not exactly a popular
> >format.)
>
> Right. There's a reason why all serious systems are case-insensitive.

The more claims made about what a 'serious forth
system' should have, the more autocratic and
less appealing Forth becomes. It's not ancient
Rome. Forthers do have more than one choice.

Ron Aaron

Mar 14, 2018, 12:45:03 AM
Yes. 8th uses UTF-8 in strings (and identifiers). It knows about UCS-2
as well, natively.

minf...@arcor.de

Mar 14, 2018, 3:11:26 AM
UTF-8 practically rules the digital text world.
Any decent modern desktop Forth ought to support UTF-8.

UTF-8 chars < 128 are identical to ASCII chars, so it wouldn't break
legacy standard names and strings.

A nice little overview:
https://www.cprogramming.com/tutorial/unicode.html

Ron Aaron

Mar 14, 2018, 3:24:26 AM


On 14/03/2018 9:11, minf...@arcor.de wrote:

> UTF-8 practically rules the digital text world.
> Any decent modern desktop Forth ought to support UTF-8.

That was my thought :)

In particular, it makes interacting with modern OSes easier regarding
text display etc. And there's no good reason to avoid it.

Paul Rubin

Mar 14, 2018, 3:52:29 AM
Ron Aaron <ramb...@gmail.com> writes:
> In particular, it makes interacting with modern OSes easier regarding
> text display etc. And there's no good reason to avoid it.

In Forth it seems excessive. You no longer know how many bytes are
required to store a given 20 character string, because of variable
length encoding. Plus there's all the stuff with canonical
representations, composing characters, yada yada. Text as used by
humans in the real world is complicated, so Unicode is also necessarily
complicated because it tries to completely handle the complexity of
human-directed text. Forth aims at a more limited audience and I'm
surprised that there's much Unicode support in Forth implementations. I
wouldn't count 8th for this purpose, since 8th is a Forth descendant
several generations away from traditional Forths where a cell is a
machine word, etc.

minf...@arcor.de

Mar 14, 2018, 8:51:35 AM
On Wednesday, March 14, 2018 at 08:52:29 UTC+1, Paul Rubin wrote:
> Ron Aaron <ramb...@gmail.com> writes:
> > In particular, it makes interacting with modern OSes easier regarding
> > text display etc. And there's no good reason to avoid it.
>
> In Forth it seems excessive. You no longer know how many bytes are
> required to store a given 20 character string, because of variable
> length encoding.

This is easy. Just use 2 definitions of string length: number of bytes
and number of chars. The only problem is that one has to overallocate buffers.
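
Both lengths are cheap to obtain. A minimal sketch, assuming the string is valid UTF-8, that a character occupies one address unit, and that the system accepts the Forth-2012 $ prefix for hex numbers; the names utf8-length and name-buf are invented here. Instead of using the xchar wordset, it counts code points by counting every byte that is not a continuation byte.

```forth
\ Count the code points in a UTF-8 string by counting every byte
\ that is not a continuation byte (binary 10xxxxxx).
: utf8-length ( c-addr u -- n )
  0 swap 0 ?do                     ( c-addr n )
    over i + c@ $C0 and $80 <> if 1+ then
  loop nip ;

\ "Fälth" encoded as UTF-8: the ä takes two bytes (C3 A4).
create name-buf  $46 c, $C3 c, $A4 c, $6C c, $74 c, $68 c,
name-buf 6 utf8-length .   \ byte length is 6, the count printed is 5
```

The byte count is what buffer allocation needs; the code point count only matters for things like column alignment, and even there composing characters can make it differ from what the user sees.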

lehs

Mar 14, 2018, 10:19:52 AM
There are options for case insensitivity. SP-Forth is a very fast and free Forth. Just saying!

Anton Ertl

Mar 14, 2018, 10:40:24 AM
Paul Rubin <no.e...@nospam.invalid> writes:
>> My recommendation is to have case-insensitivity for ASCII characters
>> (i.e., those above), and case-sensitivity for non-ASCII characters.
>
>Do any Forths support non-ASCII characters at all?

Yes.

>Do they give any
>attention to encodings?

Not that much, but that's mostly unnecessary. The ones that I tested
work nicely with 8-bit encodings (such as Latin-1), and mostly work
with UTF-8. The problems that Gforth <0.7, SwiftForth, and VFX have
with UTF-8 are in command-line editing, and in pointing to the
erroneous word in error messages. We fixed that in Gforth 0.7, the
others still have these issues. But apart from these minor blemishes,
they can all deal with UTF-8, and I have tested
<http://rosettacode.org/wiki/Unicode_variable_names#Forth> in all of
them.

>One approach might be treat all chars as opaque
>8-bit codes and leave the rest to the display device, basically relying
>on something like Latin-1. The trend in other languages is to use
>Unicode for everything, but by Forth standards, that's very complicated.

Actually, Unicode encoded as UTF-8 is designed so well that it is
mostly transparent to the code that handles it.

Forth-2012 has <http://forth-standard.org/standard/xchar> for dealing
with variable-width encodings such as UTF-8, and it has been
implemented in Gforth for at least a decade, and one would use it for
dealing with individual code points, but I find that that's rarely
necessary. The few times that you deal with individual characters
(e.g., in number conversion), they are usually ASCII characters. Most
of the time you deal with strings, and there you don't care whether
it's a string of ASCII characters, a string of UTF-8 characters, or
some other string.
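
A small illustration of that transparency, as a sketch rather than anything system-specific: the standard byte-oriented SEARCH finds a multi-byte UTF-8 character without knowing anything about the encoding. The buffer names hay and needle are made up for the example, and the $ hex prefix is assumed to be available.

```forth
\ "Fälth" as UTF-8 bytes, and the two-byte sequence for "ä".
create hay     $46 c, $C3 c, $A4 c, $6C c, $74 c, $68 c,
create needle  $C3 c, $A4 c,

\ SEARCH ( c-addr1 u1 c-addr2 u2 -- c-addr3 u3 flag ) compares bytes only.
\ Here the flag is true and the remaining string starts at the ä, 5 bytes long.
hay 6 needle 2 search
```

Because a valid UTF-8 sequence can never match in the middle of another character, a byte-wise match of a whole character is always a real match.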

Anton Ertl

Mar 14, 2018, 10:54:12 AM
Paul Rubin <no.e...@nospam.invalid> writes:
>In Forth it seems excessive. You no longer know how many bytes are
>required to store a given 20 character string, because of variable
>length encoding.

So what? Who cares about characters? You deal with strings of bytes,
and you know how many bytes your 25-byte string has: 25. If you
really want to know the number of code points, you can use the xchar
wordset for that; characters are harder in Unicode, thanks to
composing characters, but supposedly "character"-oriented stuff like
UTF-32 and the Python string representation does not solve this
problem, either (they, too, are about Unicode code points).

>Plus there's all the stuff with canonical
>representations, composing characters, yada yada.

Here we take the Forth solution: simplify the problem by just treating
the strings as strings of bytes.

>Text as used by
>humans in the real world is complicated, so Unicode is also necessarily
>complicated because it tries to completely handle the complexity of
>human-directed text.

Yes, but fortunately UTF-8 is designed such that you don't have to
deal with that complexity most of the time.

Anton Ertl

Mar 14, 2018, 10:58:53 AM
dxf...@gmail.com writes:
>On Wednesday, March 14, 2018 at 3:11:17 AM UTC+11, Anton Ertl wrote:
>> m...@iae.nl writes:
[standard words in upper case]
>> >(This is not exactly a popular
>> >format.)
>>
>> Right. There's a reason why all serious systems are case-insensitive.
>
>The more claims made about what a 'serious forth
>system' should have, the more autocratic and
>less appealing Forth becomes. It's not ancient
>Rome. Forthers do have more than one choice.

Yes, and looking at the published Forth code, most Forthers choose to
write many standard words in lower case or in mixed case. A Forth
system that is case-sensitive won't load this code, and I certainly
don't take such a system seriously.

john

Mar 14, 2018, 11:25:30 AM
In article <2018Mar1...@mips.complang.tuwien.ac.at>,
an...@mips.complang.tuwien.ac.at says...
>
> Yes, but fortunately UTF-8 is designed such that you don't have to
> deal with that complexity most of the time.
>
> - anton
>

This is true, Anton, but it only applies if you are trading, selling or offering your
software to Anglo-oriented countries. UTF8 alone doesn't account for most of the
planet.

I understand the US administration is this week looking into restrictive practices
in the Chinese communications markets. This is all well and good, but if Western
software industries (US/UK/France/Germany/Switzerland/Netherlands etc. and all the
other Anglo countries) don't have software products capable of being
exported, there isn't much point in having governments trying to support them.

If your interest is only in tinkering around in your own back yard then all well
and good - no change required.

But let's not forget - the communication industry is about ... communication.

Not bothering because it's "complicated" is an excuse I've seen in this
newsgroup before. And we can all see how far that has got Forth and
all its potential.

--

john

=========================
http://johntech.co.uk
=========================

Gerard Sontag

Mar 14, 2018, 11:59:56 AM
On Tuesday, March 13, 2018 at 21:34:29 UTC+1, Matthias Trute wrote:
...
> my amforth is at least capable to use Chinese (what? words, signs?)
ideogram
...
> Matthias (another one)

a...@littlepinkcloud.invalid

Mar 14, 2018, 12:11:24 PM
Paul Rubin <no.e...@nospam.invalid> wrote:
> Ron Aaron <ramb...@gmail.com> writes:
>> In particular, it makes interacting with modern OSes easier regarding
>> text display etc. And there's no good reason to avoid it.
>
> In Forth it seems excessive. You no longer know how many bytes are
> required to store a given 20 character string, because of variable
> length encoding.

You don't really need to know how many characters are in a string.
Input and output can be UTF-8: all you need to know is the length of
the string in pchars. A pchar is a primitive character one byte long,
i.e. what Forth used to call a character. I/O words such as KEY,
EMIT, TYPE, and READ-FILE operate on pchars. How a terminal displays
such things is up to it.

Andrew.


http://lars.nocrew.org/forth2012/xchar.html

minf...@arcor.de

Mar 14, 2018, 1:21:22 PM
It is not that unambiguous.

Do UTF-8 source files have to be marked with a BOM by prepending hex EF BB BF ?

a...@littlepinkcloud.invalid

Mar 14, 2018, 1:34:25 PM
john <jo...@example.com> wrote:
> In article <2018Mar1...@mips.complang.tuwien.ac.at>,
> an...@mips.complang.tuwien.ac.at says...
>>
>> Yes, but fortunately UTF-8 is designed such that you don't have to
>> deal with that complexity most of the time.
>
> UTF8 alone doesn't account for most of the planet.

Why not? You can certainly argue that it's not optimal, but that's
another matter.

Andrew.

Anton Ertl

Mar 14, 2018, 1:37:00 PM
minf...@arcor.de writes:
>Do UTF-8 source files have to be marked with a BOM by prepending hex EF BB BF ?

Why would UTF-8 need a BOM (byte order mark)? There is only one byte
order for UTF-8.

Gforth currently does not know how to ignore such garbage, and I guess
that SwiftForth and VFX don't know it, either.

So don't put a BOM at the start of a Forth file.
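
If files from an editor that inserts a BOM have to be loaded anyway, a loader could strip it before interpreting. A minimal sketch, assuming the file contents are already in memory as a byte string and that the $ hex prefix is available; skip-bom is a made-up name, not a word from Gforth, SwiftForth or VFX.

```forth
\ Drop a leading UTF-8 BOM (hex EF BB BF) from a string, if present.
: skip-bom ( c-addr u -- c-addr' u' )
  dup 3 u< if exit then              \ too short to hold a BOM
  over c@ $EF =                      \ first byte
  2 pick 1 chars + c@ $BB = and      \ second byte
  2 pick 2 chars + c@ $BF = and      \ third byte
  if 3 /string then ;                \ skip the three BOM bytes
```

Strings without a BOM are returned unchanged, so the word can be applied unconditionally.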

a...@littlepinkcloud.invalid

Mar 14, 2018, 1:38:30 PM
minf...@arcor.de wrote:
> On Wednesday, March 14, 2018 at 17:11:24 UTC+1, a...@littlepinkcloud.invalid wrote:
>> Paul Rubin <no.e...@nospam.invalid> wrote:
>> > Ron Aaron <ramb...@gmail.com> writes:
>> >> In particular, it makes interacting with modern OSes easier regarding
>> >> text display etc. And there's no good reason to avoid it.
>> >
>> > In Forth it seems excessive. You no longer know how many bytes are
>> > required to store a given 20 character string, because of variable
>> > length encoding.
>>
>> You don't really need to know how many characters are in a string.
>> Input and output can be UTF-8: all you need to know is the length of
>> the string in pchars. A pchar is a primitive character one byte long,
>> i.e. what Forth used to call a character. I/O words such as KEY,
>> EMIT, TYPE, and READ-FILE operate on pchars. How a terminal displays
>> such things is up to it.
>
> It is not that unambiguous.
>
> Do UTF-8 source files have to be marked with a BOM by prepending hex
> EF BB BF ?

No. That means that you have to know how to interpret the file: you
have to know its encoding. That's also true if you want to display it
on a terminal.

Andrew.

Anton Ertl

Mar 14, 2018, 1:56:43 PM
john <jo...@example.com> writes:
>In article <2018Mar1...@mips.complang.tuwien.ac.at>,
>an...@mips.complang.tuwien.ac.at says...
>>
>> Yes, but fortunately UTF-8 is designed such that you don't have to
>> deal with that complexity most of the time.
>>
>> - anton
>>
>
>This is true Anton but it only applies if you are trading/selling or offering your
>software to anglo oriented countries. UTF8 alone doesn't account for most of the
>planet.

Actually, Unicode (including UTF-8) does account for all of the
planet, and, since you mention China, Unicode is the currently
relevant standard in the PRC; the PRC has standardized the GB 18030
encoding of Unicode instead of UTF-8 (I expect that UTF-8 also plays a
role for the PRC, though). GB 18030 is also designed such that you
don't have to deal with the complexity most of the time: it is an
ASCII-compatible multi-byte encoding.

>I understand the US administration is this week looking into restrictive practices
>in the Chinese communications markets - this is all well and good but if western
>(US/UK/France/Germany/Switzerland/Netherlands etc. and all the other anglo
>countries software industries)

How did France, Germany, Switzerland, and the Netherlands become
Anglo?

>don't have software products capable of being
>exported.

Actually the biggest known Forth program, from CCS, has been used in
the construction of the Hong Kong airport IIRC. That actually was even
before the xchars wordset etc.

>Not bothering because it's "complicated" is an excuse I've seen in this
>newsgroup before. And we can all see how far that has got Forth and
>all its potential.

If we need the complexity, we will implement it. But as long as we
don't, there is no point.

minf...@arcor.de

Mar 14, 2018, 3:32:07 PM
Whether debatable or not, here it is stated that UTF-8 accounts for 90%
of the planet's web character encoding:

https://w3techs.com/technologies/overview/character_encoding/all

minf...@arcor.de

Mar 14, 2018, 3:40:08 PM
On Wednesday, March 14, 2018 at 18:37:00 UTC+1, Anton Ertl wrote:
> minf...@arcor.de writes:
> >Do UTF-8 source files have to be marked with a BOM by prepending hex EF BB BF ?
>
> Why would UTF-8 need a BOM (byte order mark)? There is only one byte
> order for UTF-8.
>
> Gforth currently does not know how to ignore such garbage, and I guess
> that SwiftForth and VFX don't know it, either.
>
> So don't put a BOM at the start of a Forth file.

Sure. I don't intend to do so. And gforth is only your playhorse.
But thanks anyhow.

I just wondered because I was surprised recently to see that the
notepad++ editor recognizes BOMs.

john

Mar 15, 2018, 8:12:04 AM
> Actually, Unicode (including UTF-8) does account for all of the
> planet, and, since you mention China, Unicode is the currently
> relevant standard in the PRC;

UTF8 is not just another name for Unicode.
My response was to the suggestion that UTF8 was sufficient.
It isn't.

> The PRC has standardized the GB 18030
> encoding of Unicode instead of UTF-8

QED

Perhaps a little less history and a bit more looking forward?

When you fail to implement functionality because you're afraid of complexity
you end up with stone age tools in the AI millennium.

And - it seems - a lot of hot air.

Gerard Sontag

Mar 15, 2018, 8:52:38 AM
On Wednesday, March 14, 2018 at 18:56:43 UTC+1, Anton Ertl wrote:
> john <jo...@example.com> writes:
> >In article <2018Mar1...@mips.complang.tuwien.ac.at>,
> >an...@mips.complang.tuwien.ac.at says...
....
> >I understand the US administration is this week looking into restrictive practices
> >in the Chinese communications markets - this is all well and good but if western
> >(US/UK/France/Germany/Switzerland/Netherlands etc. and all the other anglo
> >countries software industries)
>
> How did France, Germany, Switzerland, and the Netherlands become
> Anglo?
>
Please tell us!
....

Anton Ertl

Mar 15, 2018, 9:03:16 AM
john <jo...@example.com> writes:
>In article <2018Mar1...@mips.complang.tuwien.ac.at>,
>an...@mips.complang.tuwien.ac.at says...
>>
>
>> Actually, Unicode (including UTF-8) does account for all of the
>> planet, and, since you mention China, Unicode is the currently
>> relevant standard in the PRC;
>
>UTF8 is not just another name for Unicode.
>My response was to the suggestion that UTF8 was sufficient.

There was no such suggestion.

>> The PRC has standardized the GB 18030
>> encoding of Unicode instead of UTF-8
>
>QED

However, GB 18030 is not very popular on the WWW. UTF-8 dominates
there, and older Chinese encodings are rare, but still more frequent
than GB 18030. As an example, the official Chinese site
<http://www.china.com.cn/> uses UTF-8, while <https://cn.china.cn/>
(apparently a shopping site) uses GBK.

>When you fail to implement functionality because you're afraid of complexity
>you end up with stone age tools in the AI millennium.

Whatever that may mean.

The nice thing about avoiding complexity by treating strings as
strings of bytes (which is possible most of the time) is that the
programs work for GB 18030 as well as for UTF-8.

And if that is not sufficient, the xchar wordset provides an interface
that can work with 8bit-encodings, UTF-8, GBK, GB18030, and other
ASCII-compatible encodings. Currently I am aware only of
implementations for 8bit and UTF-8, so if you think that we will also
need GB18030, go ahead and implement that.
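
The byte-string approach can be illustrated with a small Python sketch. It assumes a toy dictionary keyed by raw name bytes and a parser that splits on ASCII whitespace only; the names tokens and all_found are hypothetical. As far as I know, neither UTF-8 nor GB 18030 uses whitespace byte values inside a multi-byte sequence, which is why the same code serves both.

```python
def tokens(line: bytes) -> list:
    """Split a source line on ASCII whitespace, never looking inside names."""
    return line.split()


def all_found(encoding: str) -> bool:
    """Define a non-ASCII name and look it up again, using bytes only."""
    # Toy dictionary: raw name bytes mapped to a placeholder token.
    dictionary = {
        b"dup": "xt-of-dup",
        "平方".encode(encoding): "xt-of-user-word",
    }
    line = "dup 平方".encode(encoding)
    return all(name in dictionary for name in tokens(line))


results = {enc: all_found(enc) for enc in ("utf-8", "gb18030")}
```

The lookup never decodes anything, so it works as long as the definition and the use of a name are written in the same encoding.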

Matthias Koch

Mar 16, 2018, 9:38:05 AM
Thank you, Anton. This is what I currently use.


Yes, all Forths of the Mecrisp family are using UTF-8 encoding.

Mecrisp-Stellaris was tested with German, French, Russian and Japanese.

There are two places in the cores which need special handling beyond 8 bit
clean string handling to be Unicode ready:

When the user backspaces a character (fully implemented), and
for case-insensitivity. This is currently done as Anton recommends,
lowercasing 7-bit ASCII only.
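
A rough sketch of ASCII-only lowercasing on UTF-8 bytes, written in Python rather than taken from the Mecrisp sources. The name ascii_fold is an assumption for illustration. Only the bytes 0x41..0x5A (A to Z) are changed, so the bytes of multi-byte UTF-8 sequences, which are all 0x80 or above, pass through untouched.

```python
# Translation table: map 'A'..'Z' to 'a'..'z', leave every other byte alone.
_ASCII_LOWER = bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in range(256))


def ascii_fold(name: bytes) -> bytes:
    """Lowercase the 7-bit ASCII letters in a byte string, nothing else."""
    return name.translate(_ASCII_LOWER)


def same_word(a: bytes, b: bytes) -> bool:
    """Case-insensitive comparison for ASCII letters only."""
    return ascii_fold(a) == ascii_fold(b)
```

With this, DUP and dup compare equal, while ÄPFEL and äpfel do not, because the two-byte sequence for Ä is left as it is.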

Unicode specifies that beyond upper- and lowercase, there is titlecase:

"Default Case Operations"
http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf#G33992

Proper handling is difficult, but for the sake of case-insensitive string
comparison, there is a simplified method, which relies on a character table:

http://www.unicode.org/Public/8.0.0/ucd/CaseFolding.txt

But this table includes 1350 entries, a heavy weight for a small
embedded Forth. No simple logic operation exists on Unicode code points
for case conversion.
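
To make the table-driven method concrete, here is a Python sketch that reads lines in the CaseFolding.txt format (code; status; mapping; # name) and applies simple case folding, that is, only the entries with status C or S. The five sample lines are a hand-picked excerpt for illustration, and the function names are assumptions; a real implementation would load the whole file.

```python
SAMPLE = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
01C5; C; 01C6; # LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""


def load_simple_folding(text: str) -> dict:
    """Build a code point to code point map from the C and S entries."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comment and blanks
        if not line:
            continue
        code, status, mapping = [f.strip() for f in line.split(";")[:3]]
        if status in ("C", "S"):  # one-to-one mappings only
            table[int(code, 16)] = int(mapping, 16)
    return table


def simple_fold(s: str, table: dict) -> str:
    """Replace each character by its folded form, if the table has one."""
    return "".join(chr(table.get(ord(ch), ord(ch))) for ch in s)


table = load_simple_folding(SAMPLE)
```

The F entries expand one character into several and are skipped here, which keeps every mapping one-to-one at the cost of not treating "ß" and "ss" as equal.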

Matthias

hughag...@gmail.com

Mar 17, 2018, 3:14:47 PM
On Monday, March 12, 2018 at 5:03:57 PM UTC-7, dxf...@gmail.com wrote:
> In my case-insensitive system, words appear in
> the dictionary as written. Typically end-user
> words are defined in uppercase and so-called
> helpers in lower. Thus WORDS shows at a glance
> which are important.

What I did in MFX is to give each dictionary header a flag called SMUDGE-READY (in UR/Forth there were some unused bits in the flags byte of the header).
I had a LOCAL that would set the SMUDGE-READY flag of the last word defined.
I had END-MODULE that would search the entire dictionary looking for any SMUDGE-READY words.
If it found a word with SMUDGE-READY set, it would clear the SMUDGE-READY flag and set the SMUDGE flag.

In each module (source file) I would put LOCAL after each of the "helper words" that were only used within the module.
After the module was debugged, I would put END-MODULE at the end of the file.
This way, when the file was compiled with INCLUDE, only the API words would be exposed in the dictionary and all the helper words were smudged.
This helps a lot in reducing namespace pollution, which is a problem in a large program.
This also makes the program more readable because somebody looking at a source-file knows to only learn the API words but not bother learning the local words.
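
The mechanism described above can be modelled roughly as follows. This is a Python sketch under stated assumptions, not the MFX code: the Header and Dictionary classes and their method names are hypothetical, and the flags are plain booleans instead of bits in a flags byte.

```python
class Header:
    """One dictionary header with the two flags described above."""

    def __init__(self, name: str):
        self.name = name
        self.smudge_ready = False  # marked by LOCAL
        self.smudged = False       # hidden from searches once set


class Dictionary:
    def __init__(self):
        self.headers = []

    def define(self, name: str) -> None:
        self.headers.append(Header(name))

    def local(self) -> None:
        """LOCAL: mark the most recently defined word as smudge-ready."""
        self.headers[-1].smudge_ready = True

    def end_module(self) -> None:
        """END-MODULE: turn every smudge-ready word into a smudged one."""
        for header in self.headers:
            if header.smudge_ready:
                header.smudge_ready = False
                header.smudged = True

    def find(self, name: str):
        """Search newest first, skipping smudged headers."""
        for header in reversed(self.headers):
            if header.name == name and not header.smudged:
                return header
        return None
```

Helper words stay findable while the module is being compiled and disappear from searches only when end_module runs, which matches the order of events in the description.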

In Straight Forth I will have two levels of smudge-ready words rather than just one.
This way the private parts of a class definition can be smudged by the END-CLASS word.
Also, I can still have LOCAL and END-MODULE for the whole file.

Forth-200x doesn't have any way to smudge words from the dictionary --- it will only support small programs because of the namespace pollution problem ---
this is similar to ANS-Forth.

Rod Pemberton

Mar 19, 2018, 9:43:20 AM
On Mon, 12 Mar 2018 12:20:25 -0700 (PDT)
Brad Eckert <hwf...@gmail.com> wrote:

> What are some advantages/disadvantages to using case-sensitive Forth?
> I can think of a few:
>
> Advantages:
> It behaves like most mainstream languages (C, Java, Python, etc.).
>
> Disadvantages:
> Standard words are in uppercase, which makes your code yell a lot.
>

...

> But, what's the experience of people who have actually used
> case-sensitive Forth?

Did you mean case-insensitive? Aren't most Forths already
case-sensitive? I.e., DUP will find DUP, while dup will not. That's
case sensitive. If dup finds DUP and DUP finds DUP, then it's
case-insensitive.
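
The distinction can be sketched in Python with two lookup functions over a list of stored names. The function names are made up for illustration, and the comparison assumes ASCII-only names.

```python
def find_sensitive(names, name):
    """Case-sensitive: the name must match exactly as stored."""
    return name if name in names else None


def find_insensitive(names, name):
    """Case-insensitive: compare lowercased forms, return the stored spelling."""
    wanted = name.lower()
    for stored in names:
        if stored.lower() == wanted:
            return stored
    return None


names = ["DUP", "swap"]
```

The insensitive version returns the name as it was originally written, so a word list can still show the spelling chosen at definition time.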

I do think that Forth would look more modern and be easier to read, if
it was purely lower case. Or, perhaps, core words could be upper, and
user words could be lower. This could help by providing a more
readable format.

ASCII seems to be sufficient as long as you don't need Forth in an
Asian language.


Rod Pemberton
--
feedback loop: Russian aggression -> World complains -> Russian
paranoia -> Russian threats -> repeat
