
Example of large Forth project with source


Brian Fox
Nov 22, 2021, 8:59:12 AM
Over on a hobby forum that I frequent, a couple competent people
who have written their own Forth systems but work professionally in C,
Verilog and other more popular languages are asking for examples of large
Forth projects. They know what Forth is but want to see how fluent
users deal with large projects. One quote was "I understand Forth but feel
like I still don't 'get it'"

I know that much of what they would want to see is proprietary, but
are there any examples online that I could point them to?

Nickolay Kolchin
Nov 22, 2021, 9:14:17 AM

Marcel Hendrix
Nov 22, 2021, 12:42:40 PM
I am working on iSPICE, a circuit simulator specialized for power electronics
and digital control. Like Forth, it will never be ready.

Unfortunately I have not published anything on it yet. It is currently in a state
where it replaces PLECS, NGSPICE, and LTspice for me personally (it is
about 10x faster overall).

-marcel

Brian Fox
Nov 22, 2021, 1:15:12 PM
Thanks Nickolay and Marcel for these examples.

minf...@arcor.de
Nov 22, 2021, 2:39:58 PM

David Schultz
Nov 22, 2021, 3:12:00 PM
Define "large".

I don't consider it very large but...

http://davesrocketworks.com/electronics/logger/TeensyLog.html


--
http://davesrocketworks.com
David Schultz

A.T. Murray
Nov 22, 2021, 3:34:57 PM

Ian Yellowley
Nov 23, 2021, 10:13:40 PM
The following describes a fairly large system developed over several years:
https://doi.org/10.24908/pceea.v0i0.4004
There are also many papers that describe the details... CNC language, process planning,
process optimisation, high-speed profiling, etc. I think the link given might
give some insight as to where FORTH can be extremely useful at the core of
a large, complex system...
Ian

minf...@arcor.de
Nov 24, 2021, 6:58:53 AM
Unsurprisingly (at least for me), one core element (eCAF) is a Forth system (pfe) written in C
running on top of a multitasking OS.

However, GUI management is supported by Python scripts.

Hugh Aguilar
Nov 28, 2021, 7:30:33 PM
On Monday, November 22, 2021 at 6:59:12 AM UTC-7, Brian Fox wrote:
> Over on a hobby forum that I frequent...

What hobby is this forum dedicated to?
I mean, is this robotics, board-games, or what?

> a couple competent people
> who have written their own Forth systems but work professionally in C,
> Verilog and other more popular languages are asking for examples of large
> Forth projects. They know what Forth is but want to see how fluent
> users deal with large projects.

Most likely they want to see an example of Forth in a large project
because they don't believe that Forth has ever been used for a large project.

As a practical matter, a large project takes a lot of sweat-equity,
so the programmer wants to get paid --- meaning that it will be closed-source.

> One quote was "I understand Forth but feel
> like I still don't 'get it'"

I don't think that looking at someone else's code is a good way to learn.
Programmers have to learn programming by programming.

As a practical matter, to write a large project (or a medium-sized project)
successfully, it is necessary to have a code-library available that provides
general-purpose data-structures --- all non-trivial projects work with data,
and they need some way to store this data and work with it.

The student doesn't necessarily have to look at the source-code for the
code-library to use the code-library. Looking at this source-code isn't a good idea
anyway, because it is not representative of application programming.
The way that I wrote the internal workings of the novice-package is not the same
way that I write application programs --- my application programs have readability
as a priority --- the internal workings mostly had efficiency as a priority (and also
include a lot of kludges to work around the major flaws in ANS-Forth).

Brian Fox
Nov 28, 2021, 7:55:10 PM
On Sunday, November 28, 2021 at 7:30:33 PM UTC-5, Hugh Aguilar wrote:
> On Monday, November 22, 2021 at 6:59:12 AM UTC-7, Brian Fox wrote:
> > Over on a hobby forum that I frequent...
>
> What hobby is this forum dedicated to?
> I mean, is this robotics, board-games, or what?

> Most likely they want to see an example of Forth in a large project
> because they don't believe that Forth has ever been used for a large project.
>
> As a practical matter, a large project takes a lot of sweat-equity,
> so the programmer wants to get paid --- meaning that it will be closed-source.
> > One quote was "I understand Forth but feel
> > like I still don't 'get it'"
> I don't think that looking at someone else's code is a good way to learn.
> Programmers have to learn programming by programming.
>
> As a practical matter, to write a large project (or a medium-sized project)
> successfully, it is necessary to have a code-library available that provides
> general-purpose data-structures --- all non-trivial projects work with data,
> and they need some way to store this data and work with it.
>
> The student doesn't necessarily have to look at the source-code for the
> code-library to use the code-library. Looking at this source-code isn't a good idea
> anyway, because it is not representative of application programming.
> The way that I wrote the internal workings of the novice-package is not the same
> way that I write application programs --- my application programs have readability
> as a priority --- the internal workings mostly had efficiency as a priority (and also
> include a lot of kludges to work around the major flaws in ANS-Forth).

It is for retro computers: Atari, Apple, Commodore, Tandy, TI, etc.
I think you are right. It's hard for people to understand how you could
do big projects in Forth when they don't see the things they think are
essential. They see this thing that barely qualifies as a "language"
to most people. :-)

I agree that good libraries make the heavy lifting a little lighter.
It seems that most old hands at Forth "roll their own", or they use
the commercial system's offerings, or both?



Hugh Aguilar
Nov 28, 2021, 8:15:31 PM
On Sunday, November 28, 2021 at 5:55:10 PM UTC-7, Brian Fox wrote:
> It is for retro computers: Atari, Apple, Commodore, Tandy, TI, etc.

Why would anybody want to do a big project on a retro computer?
There are some serious hardware limitations on how big of a project
can be done --- choice of language isn't the major problem!

> I think you are right. It's hard for people to understand how you could
> do big projects in Forth when they don't see the things they think are
> essential. They see this thing that barely qualifies as a "language"
> to most people. :-)
>
> I agree that good libraries make the heavy lifting a little lighter.

Yes.
Nobody has the time or the patience to re-invent the wheel every time.

This is certainly true of professional work where you get paid by the week
and there is an expectation of some results --- if a program would take
one hour in another language, you can't take all day and then say:
"Well, I have a self-balancing binary tree now, which I intend to use to
hold the data --- I might be done with the program by the end of the week."

Hobbyists' time is less valuable (has zero monetary value). Even by the
low standards of hobbyists though, spending weeks dinking around with
a simple program is a waste of time.

> It seems that most old hands at Forth "roll their own", or they use
> the commercial system's offerings, or both?

I'm not aware of any commercial system that has any offerings
in regard to general-purpose data-structures.
SwiftForth and VFX are both completely devoid of offerings.
Stephen Pelc claims that anybody can write a better string-stack
than mine, but he doesn't have one. Also, mine has COW (copy-on-write),
so it doesn't have to shuffle entire strings around, but usually only
has to move pointers --- nobody else has done this --- but the
"old hands at Forth" on comp.lang.forth totally denounce it.
Forth fails due to bad culture, not lack of technical capability.
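
A minimal sketch of the COW idea in C (illustrative only: the names and
layout are invented, not the actual novice-package internals). Each string
cell shares a buffer and a reference count; a real copy happens only when a
shared string is about to be modified.

#include <stdlib.h>
#include <string.h>

/* a string cell: shared buffer plus a shared reference count */
typedef struct {
    char   *buf;
    size_t  len;
    int    *refs;
} str;

str str_new(const char *src, size_t len) {
    str s = { malloc(len), len, malloc(sizeof(int)) };
    memcpy(s.buf, src, len);
    *s.refs = 1;
    return s;
}

/* DUP-like operation: share the buffer, bump the count */
str str_dup(str s) {
    (*s.refs)++;
    return s;
}

/* called before any write: copy only when the buffer is shared */
str str_unshare(str s) {
    if (*s.refs > 1) {
        char *copy = malloc(s.len);
        memcpy(copy, s.buf, s.len);
        (*s.refs)--;
        s.buf   = copy;
        s.refs  = malloc(sizeof(int));
        *s.refs = 1;
    }
    return s;
}

void str_drop(str s) {
    if (--*s.refs == 0) { free(s.buf); free(s.refs); }
}

With this layout, DUP and OVER cost a pointer copy and an increment; only a
word that actually modifies a shared string pays for a real copy.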

dxforth
Nov 28, 2021, 9:07:56 PM
FWIW here's the 1986 StarFlight source (some of it at least). It was written
in a modified PolyForth.

https://web.archive.org/web/20210313153919/http://www.oocities.org/timessquare/maze/4979/SFFiles.zip

dxforth
Nov 28, 2021, 9:16:51 PM
On 29/11/2021 12:15, Hugh Aguilar wrote:
> ...
> Forth fails due to bad culture, not lack of technical capability.

The tradition of stoning prophets is probably bad form :)

Brian Fox
Nov 29, 2021, 8:58:31 AM
On Sunday, November 28, 2021 at 8:15:31 PM UTC-5, Hugh Aguilar wrote:
> Why would anybody want to do a big project on a retro computer?

They don't want to do big projects on retro computers.

After being exposed to Forth they want to know how anyone can
make a large project with such a strange thing. They did not
believe it was possible.

Nickolay Kolchin
Nov 29, 2021, 9:43:14 AM
There were many Forth applications between 1980 and 1990. Actually, C
compilers were dumb, processors were poorly suited to running C/Pascal code,
and Forth was the only sane way to develop for embedded. Forth was really
big at that time.

But I doubt that a "large project" (in modern terms, like the Linux kernel for
example) in Forth ever existed.

Anton Ertl
Nov 29, 2021, 10:28:12 AM
Nickolay Kolchin <nbko...@gmail.com> writes:
>But I doubt that a "large project" (in modern terms, like the Linux kernel for
>example) in Forth ever existed.

There is one large project, the CCS software, which had 1.2M lines of
code last time the number was reported here. Linux had 27M lines of
code in 2020, but I think that 1.2M lines is plenty large.

For comparison, a recent Gforth development version contains 108,650
lines in .fs files and 3,309 lines in .4th files. Looking at only
files included in the AMD64 image, it's 19,867 lines of .fs files (no
.4th files). The rest are tools, test files, files for other
architectures, Minos2, etc.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2021: https://euro.theforth.net/2021

Nickolay Kolchin
Nov 29, 2021, 11:29:48 AM
On Monday, November 29, 2021 at 6:28:12 PM UTC+3, Anton Ertl wrote:
> Nickolay Kolchin <nbko...@gmail.com> writes:
> >But I doubt that a "large project" (in modern terms, like the Linux kernel for
> >example) in Forth ever existed.
> There is one large project, the CCS software, which had 1.2M lines of
> code last time the number was reported here. Linux had 27M lines of
> code in 2020, but I think that 1.2M lines is plenty large.

I'm curious: have they been developing in Forth ever since 1978? But this is
a "large" (>1M SLOC) application. I actually downloaded the trial to check if
it is still in Forth. It is. :)

Probably their lead developer is a huge Forth fan...

dxforth
Nov 29, 2021, 8:45:20 PM
MPE regularly quotes CCS' Forth application as having a million lines
of code. I gather the latter has been around since the 1980s.
I imagine writing a large application is like running a large business,
i.e. good management. Not that I could do either.

dxforth
Nov 30, 2021, 4:56:26 AM

Nickolay Kolchin
Nov 30, 2021, 5:13:32 AM
Well, can somebody ask Willem Botha to write a postmortem of their
software history? They are probably the largest Forth users in the world,
like Jane Street for OCaml.

Albert van der Horst
Dec 14, 2021, 6:42:29 AM
On Monday, November 22, 2021 at 2:59:12 PM UTC+1, Brian Fox wrote:
Most examples are tools. That is not interesting, because they cannot serve
as examples of how to write applications.

The following come with executables that you can just run.

A disassembler/reverse-engineering tool:
https://github.com/albertvanderhorst/ciasdis
Famous for disassembling colorforth.

The win32forth distribution has several applications, such as an interactive sudoku
solver with colors, changing sizes, menus, what not.

Then there is manx2. Originally intended to play mechanical instruments,
it plays musical scores on the internal multimedia of MS-Windows as one
of the instruments. It awaits publication.
The program runs on 32/64-bit Linux (Intel and ARM) and on 32-bit Windows,
as a standalone program, despite the hardware differences.

[The original iForth comes with manx, but that is not a standalone program and
you have to buy an iForth license; the manx source is available, though.
This is a substantial program, but probably not what you're after.
A plethora of iForthisms, among other things, makes it unportable
(and I have tried, and I have decades of experience in the maintenance of often
poorly documented/designed software).
manx2 is a rewrite, keeping the original format of musical scores.]

Brian Fox
Dec 14, 2021, 8:22:39 PM
Thanks Albert.

I think I made some headway after giving links to VFX, SwiftForth, and the
NASA projects, and a link I found to the SP-Forth libraries, which are quite
comprehensive.

Most of the forum people did not know there were commercial dev. systems
or compilers that went beyond the hobby systems one finds online or that
Forth was ever used for anything serious.

dxforth
Dec 14, 2021, 9:07:27 PM
On 15/12/2021 12:22, Brian Fox wrote:
> ...
> Most of the forum people did not know there were commercial dev. systems
> or compilers that went beyond the hobby systems one finds online or that
> Forth was ever used for anything serious.

They'd not heard of FORTH Inc? Must be some youngsters in that group :)

minf...@arcor.de
Dec 15, 2021, 4:00:45 AM
You mean below 60? Unbelievable!

Hugh Aguilar
Dec 16, 2021, 1:37:32 AM
When has a Forth Inc. system (PolyForth or SwiftForth) ever been used for anything serious?
I'm only aware of crap such as this:
https://groups.google.com/g/comp.lang.forth/c/D8s9cQwQGjQ/m/UDXMAIeARGIJ
Also, Elizabeth Rather endlessly brags about the Saudi Arabia airport project that was
written in PolyForth for the PDP-11 --- that is ancient history --- continuing to brag about this
in the 21st century inspires the question: "What have you done after Charles Moore left?"

To the best of my knowledge, the only commercial Forth system that was ever used
routinely for serious commercial programming was UR/Forth from LMI. I wrote MFX in UR/Forth.
In the early 1990s, getting a job as a Forth programmer primarily involved passing a job interview
in which you were required to demonstrate knowledge of UR/Forth internal workings.
In the good old days, UR/Forth was the only serious Forth system available --- this largely
kept Forth afloat (other Forth systems, such as those by Tom Almy, did get some serious use,
but they weren't keeping the leaky Forth boat afloat).

The purpose of ANS-Forth was to declare by fiat that UR/Forth was non-standard and that
Forth Inc. defined what was standard Forth --- when ANS-Forth came out, LMI went out of business
and Forth died --- Forth was never used for a serious project again.

I wrote MFX in 1994 after LMI went out of business. My employer Testra had the source-code
for UR/Forth, so UR/Forth was actually used for a serious project after ANS-Forth had killed
UR/Forth. MFX was the last hurrah for Forth! The MiniForth processor was the last serious
project done in Forth. Testra continues to use UR/Forth today (they have the source-code
and they upgraded UR/Forth to work under Windows 10) --- so, just like in a George Romero
movie, the dead continue to march forward into the future --- but John Hart is approaching
80 years old, so the UR/Forth zombie won't continue forever, but will soon be d-d-d-dead!

To the best of my knowledge, John Hart has retired from Forth --- he is riding his
pro-life hobby-horse now and no longer programming --- he was the last serious
Forth programmer, so Forth died when he retired.

I have said before (https://groups.google.com/g/comp.lang.forth/c/wydQr643gX0/m/mcIND8h6CgAJ)
that the Forth community today is similar to the Donner Party. Forth is no longer viable,
but the self-proclaimed Forth experts strive to survive by cannibalizing those
Forth programmers (such as myself) who had actual achievements. For example,
VFX is just a pirated version of the public-domain SP-Forth that MPE sells for big $$$.
Similarly, MPE is selling a "clean-room implementation" of the proprietary RTX-2000
(https://www.mpeforth.com/software/rtx2000-and-rtx2010rh-tools/) --- nobody does
original Forth programming anymore, but the Forth-200x committee are just
disgusting carrion eaters surviving on other people's accomplishments.

Nickolay Kolchin
Dec 16, 2021, 1:56:20 AM
On Thursday, December 16, 2021 at 9:37:32 AM UTC+3, Hugh Aguilar wrote:
>
> To the best of my knowledge, the only commercial Forth system that was ever used
> routinely for serious commercial programming was UR/Forth from LMI. I wrote MFX in UR/Forth.
> In the early 1990s, getting a job as a Forth programmer primarily involved passing a job interview
> in which you were required to demonstrate knowledge of UR/Forth internal workings.
> In the good old days, UR/Forth was the only serious Forth system available --- this largely
> kept Forth afloat (other Forth systems, such as those by Tom Almy, did get some serious use,
> but they weren't keeping the leaky Forth boat afloat).

https://ribccs.com/ -- This is written in MPEForth. You can download the trial version
and check for yourself. And it is huge...

> I have said before (https://groups.google.com/g/comp.lang.forth/c/wydQr643gX0/m/mcIND8h6CgAJ)
> that the Forth community today is similar to the Donner Party. Forth is no longer viable,
> but the self-proclaimed Forth experts strive to survive by cannibalizing those
> Forth programmers (such as myself) who had actual achievements. For example,
> VFX is just a pirated version of the public-domain SP-Forth that MPE sells for big $$$.
> Similarly, MPE is selling a "clean-room implementation of the proprietary RTX-2000
> (https://www.mpeforth.com/software/rtx2000-and-rtx2010rh-tools/) --- nobody does
> original Forth programming anymore, but the Forth-200x committee are just
> disgusting carrion eaters surviving on other people's accomplishments.

Doubt.

1. VFX is the only Forth that does some kind of control-flow analysis during
its optimisation pass.

2. VFX is ported to several architectures: AMD64, ARM. This is impossible
with SP-Forth, because its optimiser works on machine-code sequences. AFAIK,
SP-Forth wasn't even ported to 64 bits despite several attempts.

dxforth
Dec 16, 2021, 5:48:13 AM
"Why, sometimes I've believed as many as six impossible things before breakfast."
- Lewis Carroll

Marcel Hendrix
Dec 16, 2021, 6:06:37 AM
On Thursday, December 16, 2021 at 7:56:20 AM UTC+1, Nickolay Kolchin wrote:
[..]
> 1. VFX is the only Forth that does some kind of control flow analyse during
> optimisation pass.
[..]
[1] iForth x64 server 1.32 (console), Jul 11 2021, 09:43:54.
[2] Stuffed iForth at $0109D7E0 [entry: $01100000]
[3] Having a Windows terminal.
[4] Console is active.
[5] Sound devices are internal.

iForth version 6.9.109, generated 18:39:31, September 27, 2021.
x86_64 binary, native floating-point, extended precision.
Copyright 1996 - 2021 Marcel Hendrix.
[6] Use --- iForth.prf
Creating --- Locate support Version 2.01 ---
Creating --- Several utilities Version 3.53 ---
Creating --- Extended OS words Version 3.19 ---
Creating --- Terminal Driver Version 3.60 ---
Creating --- Command line Editor Version 1.36 ---
Creating --- Online help Version 1.36 ---
Creating --- Glossary Generator Version 1.05 ---
Creating --- Introspection Version 0.01 ---
Creating --- Disassembler Version 2.41 ---

FORTH> : test1 cr timer-reset 0 #1000000000 0 do i + loop .elapsed space . ; ok
FORTH> : test2 cr timer-reset #1000000000 1- 0 begin over while over + -1 under+ repeat nip .elapsed space . ; ok
FORTH> test1 test2
0.516 seconds elapsed. 499999999500000000
0.432 seconds elapsed. 499999999500000000 ok
FORTH> see test1
Flags: ANSI
$0133D800 : test1
$0133D80A lea rbp, [rbp -8 +] qword
$0133D80E mov [rbp 0 +] qword, $0133D81B d#
$0133D816 jmp CR+10 ( $012C8F2A ) offset NEAR
$0133D81B lea rbp, [rbp -8 +] qword
$0133D81F mov [rbp 0 +] qword, $0133D82C d#
$0133D827 jmp TIMER-RESET+10 ( $012CA52A ) offset NEAR
$0133D82C push 0 b#
$0133D82E mov rcx, $3B9ACA00 d#
$0133D835 xor rbx, rbx
$0133D838 call (DO) offset NEAR
$0133D842 nop
$0133D843 lea rax, [rax 0 +] qword
$0133D848 mov rdi, [rbp 0 +] qword
$0133D84C lea rbx, [rbx rdi*1] qword
$0133D850 add [rbp 0 +] qword, 1 b#
$0133D855 add [rbp 8 +] qword, 1 b#
$0133D85A jno $0133D848 offset NEAR
$0133D860 add rbp, #24 b#
$0133D864 push rbx
$0133D865 lea rbp, [rbp -8 +] qword
$0133D869 mov [rbp 0 +] qword, $0133D876 d#
$0133D871 jmp .ELAPSED+10 ( $012CA6BA ) offset NEAR
$0133D876 lea rbp, [rbp -8 +] qword
$0133D87A mov [rbp 0 +] qword, $0133D887 d#
$0133D882 jmp SPACE+10 ( $01249A3A ) offset NEAR
$0133D887 jmp .+10 ( $0124A102 ) offset NEAR
$0133D88C ;
FORTH> see test2
Flags: ANSI
$0133D8C0 : test2
$0133D8CA lea rbp, [rbp -8 +] qword
$0133D8CE mov [rbp 0 +] qword, $0133D8DB d#
$0133D8D6 jmp CR+10 ( $012C8F2A ) offset NEAR
$0133D8DB lea rbp, [rbp -8 +] qword
$0133D8DF mov [rbp 0 +] qword, $0133D8EC d#
$0133D8E7 jmp TIMER-RESET+10 ( $012CA52A ) offset NEAR
$0133D8EC mov rcx, $3B9AC9FF d#
$0133D8F3 xor rbx, rbx
$0133D8F6 nop
$0133D8F7 nop
$0133D8F8 cmp rcx, 0 b#
$0133D8FC push rcx
$0133D8FD je $0133D913 offset NEAR
$0133D903 pop rdi
$0133D904 lea rax, [rdi -1 +] qword
$0133D908 push rax
$0133D909 lea rbx, [rbx rdi*1] qword
$0133D90D pop rcx
$0133D90E jmp $0133D8F8 offset SHORT
$0133D910 push rcx
$0133D911 push rbx
$0133D912 pop rbx
$0133D913 pop rdi
$0133D914 push rbx
$0133D915 lea rbp, [rbp -8 +] qword
$0133D919 mov [rbp 0 +] qword, $0133D926 d#
$0133D921 jmp .ELAPSED+10 ( $012CA6BA ) offset NEAR
$0133D926 lea rbp, [rbp -8 +] qword
$0133D92A mov [rbp 0 +] qword, $0133D937 d#
$0133D932 jmp SPACE+10 ( $01249A3A ) offset NEAR
$0133D937 jmp .+10 ( $0124A102 ) offset NEAR
$0133D93C ;
FORTH>

-marcel

Nickolay Kolchin
Dec 16, 2021, 6:24:07 AM
Sorry, I don't have access to this version. iForth4 doesn't have any...

Anton Ertl
Dec 16, 2021, 7:39:37 AM
Nickolay Kolchin <nbko...@gmail.com> writes:
>1. VFX is the only Forth that does some kind of control-flow analysis during
>its optimisation pass.

This claim contains two subclaims:

1a. VFX does some kind of control flow analysis during optimisation.

Can you elaborate on this claim? What kind of control-flow analysis
does VFX perform, and do you have an example where this shows up in the
resulting code?

1b. There are no other Forth systems that perform any kind of
control-flow analysis.

On what basis do you make this claim?

My impression is that VFX does not do anything with control flow
except inlining. My impression is that iForth does funny things, and
I would not be surprised if it did things that can be called
"control-flow analysis".

Nickolay Kolchin
Dec 16, 2021, 8:34:49 AM
On Thursday, December 16, 2021 at 3:39:37 PM UTC+3, Anton Ertl wrote:
> Nickolay Kolchin <nbko...@gmail.com> writes:
> >1. VFX is the only Forth that does some kind of control-flow analysis during
> >its optimisation pass.
> This claim contains two subclaims:
>
> 1a. VFX does some kind of control flow analysis during optimisation.
>
> Can you elaborate on this claim? What kind of control-flow analysis
> does VFX perform, and do you have an example where this shows up in the
> resulting code?
>
> 1b. There are no other Forth systems that perform any kind of
> control-flow analysis.
>
> On what basis do you make this claim?
>
> My impression is that VFX does not do anything with control flow
> except inlining. My impression is that iForth does funny things, and
> I would not be surprised if it did things that can be called
> "control-flow analysis".
> - anton

1. I only have access to iForth4 and it doesn't look competitive against
lxf and vfxforth.

2. I've made simple tests (can't find ones based on IF/ELSE, sorry)

\ Only VFXForth is clever enough to simplify no-operation to single RET
: pure2 ( n n - ) DUP * + ;
: no-operation ( -- ) 5 4 pure2 DROP ;

\ code for ttt1 and ttt2 must be equal
\ Only VfxForth and LXF do this
: ttt1 3 + ;
: ttt2 2 swap 3 + nip ;

VFX is OK at dead-code removal.

3. From my impressions, VfxForth is currently by far the most
sophisticated Forth implementation.

4. Only two Forths can be called "optimizing compilers": VfxForth
and LXF.

5. The claim that Pelc periodically made about "as complicated as C
compilers" is complete bullshit. No Forths are even near modern (this
century) C compilers.

minf...@arcor.de
Dec 16, 2021, 8:55:23 AM
Well... I don't want to fan any flames. Just for the sake of completeness:
At least MinForth compiles standard Forth words to simple C and leaves
all higher optimization to a C compiler. The idea came up (long time ago)
also because modern C compilers can do much better optimization "for free".
Trying to reinvent all optimization techniques in Forth would never have paid off.

Marcel Hendrix
Dec 16, 2021, 9:03:15 AM
On Thursday, December 16, 2021 at 2:34:49 PM UTC+1, Nickolay Kolchin wrote:
[..]
> 1. I only have access to iForth4 and it doesn't look competitive against
> lxf and vfxforth.
>
> 2. I've made simple tests (can't find ones based on IF/ELSE, sorry)
>
[..]
> 4. Only two Forths can be called "optimizing compilers": VfxForth
> and LXF.
FORTH> : pure2 ( n n - ) DUP * + ; ok
FORTH> : no-operation ( -- ) 5 4 pure2 DROP ; ok
FORTH> ' no-operation idis
$0133D780 : no-operation
$0133D78A ;
FORTH> see pure2
Flags: TOKENIZE, ANSI
: pure2 DUP * + ; ok
FORTH> ' pure2 idis
$0133D700 : pure2
$0133D70A pop rbx
$0133D70B imul rbx, rbx
$0133D70F pop rdi
$0133D710 lea rbx, [rdi rbx*1] qword
$0133D714 push rbx
$0133D715 ;
FORTH> \ code for ttt1 and ttt2 must be equal ok
FORTH> \ Only VfxForth and LXF do this ok
FORTH> : ttt1 3 + ; ok
FORTH> : ttt2 2 swap 3 + nip ; ok
FORTH> ' ttt1 idis
$0133D800 : ttt1
$0133D80A pop rbx
$0133D80B lea rbx, [rbx 3 +] qword
$0133D80F push rbx
$0133D810 ;
FORTH> : ttt2 2 swap 3 + nip ;
Redefining `ttt2` ok
FORTH> ' ttt2 idis
$01340100 : ttt2
$0134010A pop rbx
$0134010B lea rbx, [rbx 3 +] qword
$0134010F push rbx
$01340110 ;

There is a difference between SEE (tries to show source)
and IDIS (disassembly). Furthermore, the inlining process takes
the operand types and the stack picture of the calling words
into account.

>
> 5. The claim that Pelc periodically made about "as complicated as C
> compilers" is complete bullshit. No Forths are even near modern (this
> century) C compilers.

There may be a difference between "as complicated as"
and "as good as" or "as efficient at getting the job done."


-marcel

Nickolay Kolchin
Dec 16, 2021, 9:17:57 AM
On Thursday, December 16, 2021 at 5:03:15 PM UTC+3, Marcel Hendrix wrote:
> On Thursday, December 16, 2021 at 2:34:49 PM UTC+1, Nickolay Kolchin wrote:
> [..]
> > 1. I only have access to iForth4 and it doesn't look competitive against
> > lxf and vfxforth.
> >
> > 2. I've made simple tests (can't find ones based on IF/ELSE, sorry)
> >
> [..]
> > 4. Only two Forths can be called "optimizing compilers": VfxForth
> > and LXF.
>
> There is a difference between SEE (tries to show source)
> and IDIS (disassembly). Furthermore, the inlining process takes
> the operand types and the stack picture of the calling words
> into account.

I know about idis, but on I4 I've got this and assumed that idis doesn't
work.

FORTH> ' no-operation idis
$01208400 : [trashed] 488BC04883ED088F4500 H.@H.m..E.

> >
> > 5. The claim that Pelc periodically made about "as complicated as C
> > compilers" is complete bullshit. No Forths are even near modern (this
> > century) C compilers.
> There may be a difference between "as complicated as"
> and "as good as" or "as efficient at getting the job done."
>

I think you understand what I mean.

Anton Ertl
Dec 16, 2021, 11:34:42 AM
"minf...@arcor.de" <minf...@arcor.de> writes:
>Nickolay Kolchin schrieb am Donnerstag, 16. Dezember 2021 um 14:34:49 UTC+1:
>> On Thursday, December 16, 2021 at 3:39:37 PM UTC+3, Anton Ertl wrote:
>> > Nickolay Kolchin <nbko...@gmail.com> writes:
>> > >1. VFX is the only Forth that does some kind of control-flow analysis during
>> > >its optimisation pass.
...
>> > My impression is that VFX does not do anything with control flow
>> > except inlining.
...
>> 2. I've made simple tests (can't find ones based on IF/ELSE, sorry)
>>
>> \ Only VFXForth is clever enough to simplify no-operation to single RET
>> : pure2 ( n n - ) DUP * + ;
>> : no-operation ( -- ) 5 4 pure2 DROP ;

I don't see any control-flow analysis here, only inlining.

Interestingly, if I inline this manually, the constant folder of
Gforth also optimizes this away (which says more about this test than
about the strength of Gforth's optimizations):

Gforth 0.7.9_20211118
: no-operation 5 4 DUP * + DROP ; ok
see no-operation
: no-operation ; ok

Concerning iForth, here's iForth 2.1.2541:

FORTH> : pure2 ( n n - ) DUP * + ; ok
FORTH> : no-operation ( -- ) 5 4 pure2 DROP ; ok
FORTH> ' no-operation idis
$08166300 : [trashed] 8BC083ED048F4500 .@.m..E.
$08166308 ; 8B450083C504 .E..E. ok

No code between : and ;, like for VFX (but iForth uses a different
definition entry and exit).

>> \ code for ttt1 and ttt2 must be equal
>> \ Only VfxForth and LXF do this
>> : ttt1 3 + ;
>> : ttt2 2 swap 3 + nip ;

These are straight-line words, no control-flow analysis necessary.

And here's iForth 2.1.2541:

$08166380 : [trashed] 8BC083ED048F4500 .@.m..E.
$08166388 pop ebx 5B [
$08166389 lea ebx, [ebx 3 +] dword 8D5B03 .[.
$0816638C push ebx 53 S
$0816638D ; 8B450083C504 .E..E. ok
FORTH> ' ttt2 idis
$08166400 : [trashed] 8BC083ED048F4500 .@.m..E.
$08166408 pop ebx 5B [
$08166409 lea ebx, [ebx 3 +] dword 8D5B03 .[.
$0816640C push ebx 53 S
$0816640D ; 8B450083C504 .E..E. ok

>> 4. Only two Forths can be called "optimizing compilers": VfxForth
>> and LXF.

Lxf did not inline last I looked, iForth does. All three are analytic
about the data stack (within straight-line code). Concerning being
analytic about other data:

              VFX  LXF  iForth
return stack  no   yes  part
FP stack      no   no   no
locals        no   yes  no

Concerning control flow, my impression is that VFX does nothing except
inlining, LXF does nothing, and iForth inlines and sometimes does
something else (or I just don't understand what it is doing).

The versions used are:

VFX Forth 64 5.11 RC2 [build 0112]
lxf 1.6-982-823
iforth 5.1-mini

Test definitions:
: return-stack >r r> ;
: fp-stack fswap fswap ;
: locals {: a :} a ; \ : locals locals| a | a ;

The disassemblies are:

VFX:
see return-stack
RETURN-STACK
( 004E3E60 53 ) PUSH RBX
( 004E3E61 5B ) POP RBX
( 004E3E62 C3 ) RET/NEXT
( 3 bytes, 3 instructions )
ok
see fp-stack
FP-STACK
( 004E3EA0 E87BF7FDFF ) CALL 004C3620 FSWAP
( 004E3EA5 E876F7FDFF ) CALL 004C3620 FSWAP
( 004E3EAA C3 ) RET/NEXT
( 11 bytes, 3 instructions )
ok
see locals
LOCALS
( 004E3EE0 488BD4 ) MOV RDX, RSP
( 004E3EE3 53 ) PUSH RBX
( 004E3EE4 52 ) PUSH RDX
( 004E3EE5 57 ) PUSH RDI
( 004E3EE6 488BFC ) MOV RDI, RSP
( 004E3EE9 4881EC00000000 ) SUB RSP, # 00000000
( 004E3EF0 488B5D00 ) MOV RBX, [RBP]
( 004E3EF4 488D6D08 ) LEA RBP, [RBP+08]
( 004E3EF8 488D6DF8 ) LEA RBP, [RBP+-08]
( 004E3EFC 48895D00 ) MOV [RBP], RBX
( 004E3F00 488B5F10 ) MOV RBX, [RDI+10]
( 004E3F04 488B6708 ) MOV RSP, [RDI+08]
( 004E3F08 488B3F ) MOV RDI, 0 [RDI]
( 004E3F0B C3 ) RET/NEXT
( 44 bytes, 14 instructions )

LXF (I used shorter word names because pasting into LXF does not work):
see r
8691F10 804FBE4 1 88C8000 5 normal R

804FBE4 C3 ret near
ok
see f
8691F24 804FBE5 5 88C8000 5 normal F

804FBE5 D9C9 fxch ST(1)
804FBE7 D9C9 fxch ST(1)
804FBE9 C3 ret near
ok
see l
8691F38 804FBEA 1 88C8000 5 normal L

804FBEA C3 ret near
ok

iForth
$10226000 : return-stack 488BC04883ED088F4500 H.@H.m..E.
$1022600A mov rbx, [rsp] qword
488B1C64 H..d
$1022600E ; 488B45004883C508FFE0 H.E.H.E..`
$10226018 nop 90 . ok
FORTH> see fp-stack
Flags: TOKENIZE, ANSI
: fp-stack FSWAP FSWAP ; ok
FORTH> ' fp-stack idis
$10226080 : fp-stack 488BC04883ED088F4500 H.@H.m..E.
$1022608A fpop, 41DB6D00D9C94D8D6D10 A[m.YIM.m.
$10226094 fxch ST(1) D9C9 YI
$10226096 fxch ST(1) D9C9 YI
$10226098 fpush, 4D8D6DF0D9C941DB7D00 M.mpYIA[}.
$102260A2 ; 488B45004883C508FFE0 H.E.H.E..` ok
FORTH> ' locals idis
$10226100 : locals 488BC04883ED088F4500 H.@H.m..E.
$1022610A pop rbx 5B [
$1022610B lea rsi, [rsi #-16 +] qword
488D76F0 H.vp
$1022610F mov [esi] dword, rbx
48891E H..
$10226112 push [rsi] qword FF36 .6
$10226114 add rsi, #16 b# 4883C610 H.F.
$10226118 ; 488B45004883C508FFE0 H.E.H.E..` ok

Nickolay Kolchin
Dec 16, 2021, 12:10:43 PM
You are probably right. I thought too highly of VFX. This makes
LXF the most advanced (of the Forths I have access to).

Marcel Hendrix
Dec 16, 2021, 2:27:04 PM
On Thursday, December 16, 2021 at 3:17:57 PM UTC+1, Nickolay Kolchin wrote:
[..]
> I know about idis, but on I4 I've got this and assumed that idis doesn't
> work.
>
> FORTH> ' no-operation idis
> $01208400 : [trashed] 488BC04883ED088F4500 H.@H.m..E.

The IDIS ( addr -- ) in the current release will find the name (if there
is one) of the code snippet passed by the address addr. *Not* having the
name but only "[thrashed]", combined with the fact that two lines are
snipped and the dump is mangled by improper tab and space
expansion/compression in the terminal and/or by reader software,
causes the current confusion.

FORTH> : pure2 ( n n - ) DUP * + ; ok
FORTH> : no-operation ( -- ) 5 4 pure2 DROP ; ok
FORTH> ' no-operation idis
\ 23456789012345678901234567890123456789012345678901234567890123456789|
$0133D780 : no-operation 488BC04883ED088F4500 H.@H.m..E.
$0133D78A ; 488B45004883C508FFE0 H.E.H.E..`
$0133D794 nop 90 . ok

At address $0133D780 we have the iForth prelude. Its hex code
is shown from column 34 onwards (488BC0 ...). This code is re-interpreted
as text from column 60 onwards.

The iForth postlude is shown from address $0133D78A onwards (488B45..).
For this definition there is nothing between prelude and postlude, which means
that no bytes will be inlined when no-operation is used in other colon
definitions.

This might be clearer when the address of no-operation is not passed exactly:

FORTH> ' no-operation 6 - idis
$0133D77A nop 90 .
$0133D77B nop 90 .
$0133D77C nop 90 .
$0133D77D nop 90 .
$0133D77E nop 90 .
$0133D77F nop 90 .
$0133D780 : no-operation 488BC04883ED088F4500 H.@H.m..E.
$0133D78A ; 488B45004883C508FFE0 H.E.H.E..`
$0133D794 nop 90 .
$0133D795 nop 90 .
$0133D796 nop 90 .
$0133D797 nop 90 .

> > There may be a difference between "as complicated as"
> > and "as good as" or "as efficient at getting the job done."
> >
> I think you understand what I mean.

No, you have not been posting here long enough for me to be able
to do that.

At this point in time, I suspect you are deliberately abrasive
and wilfully misunderstanding in order to cause discussion
on subjects that you find interesting. I don't mind at all, as
you appear to do this armed with observations that take
considerable time for you to research.

-marcel

dxforth
Dec 16, 2021, 6:23:40 PM
On 17/12/2021 01:17, Nickolay Kolchin wrote:
> On Thursday, December 16, 2021 at 5:03:15 PM UTC+3, Marcel Hendrix wrote:
>> On Thursday, December 16, 2021 at 2:34:49 PM UTC+1, Nickolay Kolchin wrote:
>> [..]
>> >
>> > 5. The claim that Pelc periodically made about "as complicated as C
>> > compilers" is complete bullshit. No Forths are even near modern (this
>> > century) C compilers.
>> There may be a difference between "as complicated as"
>> and "as good as" or "as efficient at getting the job done."
>>
>
> I think you understand what I mean.
>

I'm somewhat confused. If the only measure between Forth and C is the compiler,
then the implication is a Forth programmer has no more control over outcome than
a C programmer. While such may be the end result of Standard Forth, which promotes
a mindless conformity, we're not there yet by any stretch.

Paul Rubin
Dec 16, 2021, 10:22:21 PM
dxforth <dxf...@gmail.com> writes:
> I'm somewhat confused. If the only measure between Forth and C is the
> compiler, then the implication is a Forth programmer has no more
> control over outcome than a C programmer. While such may be the end
> result of Standard Forth, which promotes a mindless conformity, we're
> not there yet by any stretch.

I read the exchange somewhat differently:

1) Forth has traditionally been implemented as an interpreter or very
simple compiler (see the sketch after this list).

2) But in fact, relatively fancy Forth optimizing compilers exist, such
as VFX. They do the kinds of optimizations you can find in compiler
books like the Dragon book, and produce target code comparable to that
of (say) fancy C compilers from the 1980s or 1990s.

3) The claim that might be under dispute is that C compilers have
advanced since the 1990s, so Forth compilers are somewhat behind C
compilers in terms of optimizations.
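
To make 1) concrete, here is a toy threaded-code inner interpreter in C (a
sketch only, not any particular system's code): a definition is an array of
cells, each holding a primitive's address or inline data, and the whole
dispatch loop is a few lines.

#include <stdint.h>
#include <stdio.h>

/* Each cell of a "definition" holds either a primitive's address or
   inline data.  (Function-pointer/integer casts are not strictly
   portable C, but this is how threaded code works in practice.) */
typedef void (*prim)(void);

static intptr_t stack[16], *sp = stack;  /* data stack        */
static intptr_t *ip;                     /* threaded-code ptr */

static void lit(void) { *++sp = *ip++; }               /* push inline literal */
static void add(void) { sp--; *sp += sp[1]; }          /* Forth +             */
static void dot(void) { printf("%ld ", (long)*sp--); } /* Forth .             */
static void bye(void) { }                              /* end-of-code marker  */

int main(void) {
    /* threaded code for the Forth phrase:  2 3 + .  */
    intptr_t code[] = { (intptr_t)lit, 2, (intptr_t)lit, 3,
                        (intptr_t)add, (intptr_t)dot, (intptr_t)bye };

    /* NEXT, the whole inner interpreter: fetch a cell, call it */
    for (ip = code; (prim)*ip != bye; ) {
        prim p = (prim)*ip++;
        p();
    }
    return 0;
}

Real systems implement this dispatch (NEXT) in two or three machine
instructions, which is part of why even "very simple" Forth compilers
stayed usable for so long.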

My own feeling is that this just isn't important. For almost all
application areas (measuring the space of application areas by the
number of programmers working in a given area at a given moment),
extreme optimization of the target code is just not that important.

I'm currently involved in 3 different unrelated software projects. One
is in Python, one is in C, and one is in C++.

a) The Python one could run 10x faster by rewriting in C++, but it's
not worth the effort. The Python one takes a few seconds to run
and that is fast enough for the purpose.

b) The C++ one really was written for speed (it was originally in
Python), and I'd happily take another 10% speedup if it just meant
upgrading compilers, but that doesn't matter so much either. It is
probably within 2x of the fastest feasible implementation and that
is good enough for me. I can get a 2x speedup by simply running it
on two computers in parallel, after all. Or a 100x speedup by
running it on 100 computers. Computer cycles are cheap enough that
this is the most cost-effective approach an awful lot of the time.

c) The C one runs on a very resource constrained MCU, but the main
constraint is code space rather than CPU cycles. So it is compiled
with gcc -Os (optimize for size), which generates inefficient code
at times. The program has some inline assembler in a few places
where this matters, and that too is good enough.

Nickolay Kolchin
Dec 16, 2021, 10:42:43 PM
The confusion comes from two parts:

1. In my iForth version I get this:

FORTH> ' no-operation idis
$01208400 : [trashed] 488BC04883ED088F4500 H.@H.m..E.

2. In samples you post to mailing list:

FORTH> ' no-operation idis
$0133D780 : no-operation
$0133D78A ;

I.e.:

- There is no word name in my output, just "[trashed]".
- Your output doesn't have the hex/ascii part.

But now I see that after the "[trashed]" line we get proper
assembler.

> > > There may be a difference between "as complicated as"
> > > and "as good as" or "as efficient at getting the job done."
> > >
> > I think you understand what I mean.
> No, you have not been posting here long enough for me to be able
> to do that.
>

I thought I was clear enough: the optimisations made by Forth
compilers are primitive and shouldn't even be compared to
modern C compilers.

I.e. this sentence is a blatant lie:

"Modern Forth implementations usually generate optimised
native code, and the good ones produce code of the same
quality as the good C compilers."

And this is a sad thing, because 30 years ago the situation
was different. From ForthCMP Readme:

"When CFORTH was sold, it ran faster than C compilers, and created
executables that were far smaller and faster. C compiler technology is
catching up; for instance the most recent versions of Microsoft and
Borland Cs allow passing arguments in registers, something that CFORTH
did back in 1984. But the C compilers have gotten much bigger and
slower over the years. ForthCMP still runs fine on a machine without a
hard disk. The compiler is only about 32k bytes long."

FLK from mid-nineties still stand strong against current Forth
compilers in benchmarks.

From my point of view, this means that Forth ecosystem stopped
evolving about 30 years ago and I'm trying to understand what
happened.

> At this point in time, I suspect you are deliberately abrasive
> and wilfully misunderstanding in order to cause discussion
> on subjects that you find interesting. I don't mind at all, as
> you appear to do this armed with observations that take
> considerable time for you to research.
>

Now I'm confused. I've reread your text several times and
still don't get your point.

Nickolay Kolchin
Dec 16, 2021, 10:44:05 PM
Elaborate please, I don't get you.

Nickolay Kolchin
Dec 16, 2021, 11:05:55 PM
On Friday, December 17, 2021 at 6:22:21 AM UTC+3, Paul Rubin wrote:
> dxforth <dxf...@gmail.com> writes:
> > I'm somewhat confused. If the only measure between Forth and C is the
> > compiler, then the implication is a Forth programmer has no more
> > control over outcome than a C programmer. While such may be the end
> > result of Standard Forth, which promotes a mindless conformity, we're
> > not there yet by any stretch.
> I read the exchange somewhat differently:
>
> 1) Forth has traditionally been implemented as an interpreter or very
> simple compiler.
>
> 2) But in fact, relatively fancy Forth optimizing compilers exist, such
> as VFX. They do the kinds of optimizations you can find in compiler
> books like the Dragon book, and produce target code comparable to that
> of (say) fancy C compilers from the 1980s or 1990s.

Wrong. Ertl just pointed out that VFX does nothing beyond stack analysis and
inlining. Also, when we talk about compiler optimisations, we should
reference not the Dragon Book but at least Muchnick.

>
> 3) The claim that might be under dispute is that C compilers have
> advanced since the 1990s, so Forth compilers are somewhat behind C
> compilers in terms of optimizations.

More correctly -- "are ages behind". Do you admit that Forth stopped
evolving in the 1990s?

>
> My own feeling is that this just isn't important. For almost all
> application areas (measuring the space of application areas by the
> number of programmers working in a given area at a given moment),
> extreme optimization of the target code is just not that important.

No. We are talking about things that we get for "free", not with
"extreme optimization" effort.

> ....
>
> c) The C one runs on a very resource constrained MCU, but the main
> constraint is code space rather than CPU cycles. So it is compiled
> with gcc -Os (optimize for size), which generates inefficient code
> at times. The program has some inline assembler in a few places
> where this matters, and that too is good enough.

And with something like Keil you may get much better code size. GCC
is not the best "size-optimising" compiler.

Paul Rubin
Dec 16, 2021, 11:25:11 PM
Nickolay Kolchin <nbko...@gmail.com> writes:
> FLK from mid-nineties still stand strong against current Forth
> compilers in benchmarks.
>
> From my point of view, this means that Forth ecosystem stopped
> evolving about 30 years ago and I'm trying to understand what
> happened.

Well, 1) what are the benchmarks measuring, and 2) how much have C
compilers evolved in the same period? How does GCC from 30 years ago
compare with GCC today?

Compilers today burn a lot of cycles and memory doing expensive
optimizations, relying on today's faster and bigger computers to make
compile times tolerable. New optimizations (at least for C) tend to
bring a few percent more application performance per major compiler
release. So there's maybe 1.5x speedup since 1990s compilers. I think
this is not enough to really be important. If I have to speed up a
too-slow program it's not really worth thinking about significant
optimization efforts unless I think I can get 2x from them. Otherwise
I'd rather put the effort into parallelization rather than optimization.

Fwiw here's a Forth implementation that compiles with LLVM:

https://github.com/Reschivon/movForth

I haven't looked at it at all, I don't know what Forth dialect it uses,
benchmark results, or anything else about it right now.

Nickolay Kolchin
Dec 16, 2021, 11:52:01 PM
On Friday, December 17, 2021 at 7:25:11 AM UTC+3, Paul Rubin wrote:
> Nickolay Kolchin <nbko...@gmail.com> writes:
> > FLK from mid-nineties still stand strong against current Forth
> > compilers in benchmarks.
> >
> > From my point of view, this means that Forth ecosystem stopped
> > evolving about 30 years ago and I'm trying to understand what
> > happened.
> Well, 1) what are the benchmarks measuring, and 2) how much have C
> compilers evolved in the same period? How does GCC from 30 years ago
> compare with GCC today?

We can take any benchmark that has both C and Forth implementations.
Unfortunately, Forth doesn't have a complete set of "Shootout" benchmarks.

>
> Compilers today burn a lot of cycles and memory doing expensive
> optimizations, relying on today's faster and bigger computers to make
> compile times tolerable. New optimizations (at least for C) tend to
> bring a few percent more application performance per major compiler
> release. So there's maybe 1.5x speedup since 1990s compilers. I think
> this is not enough to really be important. If I have to speed up a
> too-slow program it's not really worth thinking about significant
> optimization efforts unless I think I can get 2x from them. Otherwise
> I'd rather put the effort into parallelization rather than optimization.
>

SSA was a game changer. For floating point you can easily expect a 10x
advantage for modern GCC due to SIMD usage.
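
A sketch of the kind of loop I mean (illustrative; the actual gain depends
on the data type, alignment, and target flags):

/* saxpy.c -- modern GCC/Clang autovectorize this at -O3
   (try: gcc -O3 -march=native -S saxpy.c and look for packed
   instead of scalar float instructions).  restrict promises the
   arrays don't overlap, which is what lets the vectorizer run. */
void saxpy(float *restrict y, const float *restrict x, float a, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

A Forth compiler that only analyses the stack inside one basic block cannot
discover this kind of loop-level parallelism at all.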

> Fwiw here's a Forth implementation that compiles with LLVM:
>
> https://github.com/Reschivon/movForth
>
> I haven't looked at it at all, I don't know what Forth dialect it uses,
> benchmark results, or anything else about it right now.

I've looked at it briefly. It is in its early stages and won't compile even
simple programs yet.

Nickolay Kolchin
Dec 17, 2021, 1:35:24 AM
Sorry, I missed your post.

To achieve "optimizations" in C you must also convert code from stack
form to variable form. And you don't do that, from what I see in the mf3.c file.

A C compiler won't be able to optimize "mfpush(2); mfpush(3); mf2E0C9DAA;" to
a simple "5".

Marcel Hendrix
Dec 17, 2021, 2:42:22 AM
On Friday, December 17, 2021 at 5:25:11 AM UTC+1, Paul Rubin wrote:
> Nickolay Kolchin <nbko...@gmail.com> writes:
> > FLK from mid-nineties still stand strong against current Forth
> > compilers in benchmarks.
> >
> > From my point of view, this means that Forth ecosystem stopped
> > evolving about 30 years ago and I'm trying to understand what
> > happened.

No, make that 20, or 10 years ago.

> Well, 1) what are the benchmarks measuring, and 2) how much have C
> compilers evolved in the same period? How does GCC from 30 years ago
> compare with GCC today?

Anton Ertl has reported that GCC's performance (and certainly its usability)
is decreasing and, IIRC, he has the benchmarks to prove it.

-marcel

Nickolay Kolchin
Dec 17, 2021, 3:12:42 AM
On Friday, December 17, 2021 at 10:42:22 AM UTC+3, Marcel Hendrix wrote:
> On Friday, December 17, 2021 at 5:25:11 AM UTC+1, Paul Rubin wrote:
> > Nickolay Kolchin <nbko...@gmail.com> writes:
> > > FLK from mid-nineties still stand strong against current Forth
> > > compilers in benchmarks.
> > >
> > > From my point of view, this means that Forth ecosystem stopped
> > > evolving about 30 years ago and I'm trying to understand what
> > > happened.
> No, make that 20, or 10 years ago.

Great. We agree that the problem exists.

> > Well, 1) what are the benchmarks measuring, and 2) how much have C
> > compilers evolved in the same period? How does GCC from 30 years ago
> > compare with GCC today?
> Anton Ertl has reported that GCC's performance (and certainly its usability)
> is decreasing and, IIRC, he has the benchmarks to prove it.
>

It will be interesting to see. IMHO, the gcc and llvm/clang infrastructures
develop at a high pace. Sanitizers are the best thing for developing
C/C++ applications, and I really don't understand how we debugged apps
without them.

Each gcc/clang release is deeply investigated for performance regressions
on sites like Phoronix, so I don't believe that serious regressions exist. There
may be changes related to "new standard understanding" (like aliasing), but
those are part of language evolution.


Brian Fox
Dec 17, 2021, 9:40:10 AM
On Friday, December 17, 2021 at 3:12:42 AM UTC-5, Nickolay Kolchin wrote:

A couple of comments related to this.
There is a post on comp.lang.forth from many years ago where Anton transpiled
Forth to C. I can't find the topic here with a quick search.
The compiled C code ran faster than some of the Forth systems.

I think the current concerns about C producing unintended
effects when optimisation is pushed hard make a case for a compiler
like Forth, where one knows what code will be generated because it is not
pushed to the extreme.

Nickolay Kolchin
Dec 17, 2021, 9:59:45 AM
There is CompCert -- a formally verified C compiler.

minf...@arcor.de
Dec 17, 2021, 11:08:16 AM
Yes, good point. Indeed there is still room for peephole optimization at the Forth level,
before invoking the C compiler.

But tests with a 3-instruction pipeline a few years ago yielded only a little speed
improvement from peephole optimization, so that idea was dropped.

Anyhow, it is very easy to code new application-specific primitives in C when
they have to be faster. There you can use registers as much as you like.

Stephen Pelc
Dec 17, 2021, 11:21:12 AM
On 16 Dec 2021 at 14:34:48 CET, "Nickolay Kolchin" <nbko...@gmail.com>
wrote:

> 5. The claim that Pelc periodically made about "as complicated as C
> compilers" is complete bullshit. No Forths are even near modern (this
> century) C compilers.

I have never (I hope) claimed that the VFX code generator is as complicated
as C compilers, especially since the only models we had were LCC and GCC.
However, I have claimed that VFX uses many algorithms from the usual compiler
set. Note that VFX development started in the late 1990s, and the comparisons
of VFX against C compilers date from then, probably gcc 2.9.5 as far as I
remember.

VFX only optimises basic blocks, but the tokeniser makes a big difference when
unfolding small procedures.

We'll get around to updating VFX when we have plenty of spare time (or
funding) and there's a commercial reason for it. So far, VFX has done very
well. A major client is much more interested in compilation speed than in
execution speed. He compiles 1.2 million lines of Forth source code in 29
seconds.

Stephen
--
Stephen Pelc, ste...@vfxforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, +44 (0)78 0390 3612, +34 649 662 974
http://www.mpeforth.com - free VFX Forth downloads

Anton Ertl
Dec 17, 2021, 12:24:56 PM
"minf...@arcor.de" <minf...@arcor.de> writes:
>At least MinForth compiles standard Forth words to simple C and leaves
>all higher optimization to a C compiler. The idea came up (long time ago)
>also because modern C compilers can do much better optimization "for free".

Can you give examples of how that works out? E.g., show the resulting
code and/or compare performance against other Forth systems.

Anton Ertl
Dec 17, 2021, 12:51:37 PM
Paul Rubin <no.e...@nospam.invalid> writes:
>2) But in fact, relatively fancy Forth optimizing compilers exist, such
>as VFX. They do the kinds of optimizations you can find in compiler
>books like the Dragon book, and produce target code comparable to that
>of (say) fancy C compilers from the 1980s or 1990s.

That's an overstatement IMO. Sure you may be able to point to some
places in a compiler textbook for the things that VFX does, but there
are also things that VFX does not do: register allocation across basic
blocks (aka global register allocation); register allocation for
locals. And anything that requires data-flow analysis across basic
blocks.

GCC did all of that in the late 1980s and all through the 1990s (but
it was pretty much a pioneer; I certainly remember working at HP in
1988 and 1989, and one of the people there commented that gcc trounced
their C compiler, and the HP engineers had put in a lot of work to
catch up, but had not quite caught up at the time).

>My own feeling is that this just isn't important. For almost all
>application areas (measuring the space of application areas by the
>number of programmers working in a given area at a given moment),
>extreme optimization of the target code is just not that important.

There are also comments by Stephen Pelc along these lines, but then
one could also interpret that as sour grapes. After all, similar
things were said about threaded-code Forth systems before the move to
native-code systems.

> I can get a 2x speedup by simply running it
> on two computers in parallel, after all. Or a 100x speedup by
> running it on 100 computers.

Amdahl's Law not at work?
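
For reference: with a parallelizable fraction p of the run time and n
machines, Amdahl's Law gives

    S(n) = 1 / ((1 - p) + p / n)

so 100 computers deliver a 100x speedup only when p is essentially 1,
i.e. when the workload is embarrassingly parallel.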

minf...@arcor.de
Dec 17, 2021, 12:53:10 PM
Anton Ertl schrieb am Freitag, 17. Dezember 2021 um 18:24:56 UTC+1:
> "minf...@arcor.de" <minf...@arcor.de> writes:
> >At least MinForth compiles standard Forth words to simple C and leaves
> >all higher optimization to a C compiler. The idea came up (long time ago)
> >also because modern C compilers can do much better optimization "for free".
> Can you give examples of how that works out? E.g., show the resulting
> code and/or compare performance against other Forth systems.
> - anton

Obviously unlike many others, I don't run speed tests against other Forths
because I am more interested in ease of use and portability. "Good enough"
is good enough for me.

But just for the fun of it I checked Nickolay's notion with Godbolt. Here is C code
equivalent to the transpiler output:
// +++ MinForth ctest.c

#include <stdlib.h>
#include <stdio.h>

#define push(x) *++sp=x

int stk[10], *sp=stk;

static inline void add(void) {   /* Forth + : fold top two stack cells */
    sp--;
    *sp += sp[1];
}

int main(void) {
    push(2);
    push(3);
    add();
    putchar('0' + *sp);          /* prints 5 */
}
// +++ end

With Clang compiler and flag -O2 compiles to:

main: # @main
push rax
mov rax, qword ptr [rip + sp]
lea rcx, [rax + 4]
mov dword ptr [rax + 8], 3
mov qword ptr [rip + sp], rcx
mov dword ptr [rax + 4], 5
mov rsi, qword ptr [rip + stdout]
mov edi, 53
call putc
xor eax, eax
pop rcx
ret
stk:
.zero 40
sp:
.quad stk

So Clang pre-calculates that 2+3=5 and fills the global stack accordingly.
But it does not know that stk[10] is a stack; it treats it as an array, and
(in)correctly assumes that it must also place the 3 into stk[2]. So this is just
a partial optimization.

GCC performs worse and pushes the operands onto the stack before
doing the addition, like most other compilers.


Nickolay Kolchin
Dec 17, 2021, 1:03:58 PM
On Friday, December 17, 2021 at 7:21:12 PM UTC+3, Stephen Pelc wrote:
> On 16 Dec 2021 at 14:34:48 CET, "Nickolay Kolchin" <nbko...@gmail.com>
> wrote:
> > 5. The claim that Pelc periodically made about "as complicated as C
> > compilers" is complete bullshit. No Forths are even near modern (this
> > century) C compilers.
> I have never (I hope) claimed thaat the VFX code generator is as complicated
> as C compilers, especially since the only models we had were LCC and GCC.
> However, I have claimed that VFX uses many algorithms from the usual compiler
> set. Note that VFX development started in the late 1990s, and the comparisons
> of VFX against C compilers date from then, probably gcc 2.9.5 as far as I
> remember.

This is from your book, dated 2005, last updated 2011 (or 2018?):

"Modern Forth implementations usually generate optimised native code, and the good
ones produce code of the same quality as the good C compilers." (page 16)

It even has your photo on the cover...

Or did we misunderstand you, and VFX shouldn't be treated as "Modern Forth"?

> We'll get around to updating VFX when we have plenty of spare time (or
> funding) and there's a commercial reason for it. So far, VFX has done very
> well. A major client is much more interested in compilation speed than in
> execution speed. He compiles 1.2 million lines of Forth source code in 29
> seconds.
>

Well, according to this https://openbenchmarking.org/test/pts/build-llvm you
are only 1.5 times slower than an optimising C++ compiler... (LLVM is about
10M lines).

Nickolay Kolchin

unread,
Dec 17, 2021, 1:18:37 PM12/17/21
to
https://godbolt.org/z/385P83crE

That is much better than I expected. What exactly don't you like in the GCC
output?

Anton Ertl

unread,
Dec 17, 2021, 1:33:42 PM12/17/21
to
Nickolay Kolchin <nbko...@gmail.com> writes:
>And this is a sad thing, because 30 years ago the situation
>was different. From ForthCMP Readme:
>
>"When CFORTH was sold, it ran faster than C compilers, and created
>executables that were far smaller and faster. C compiler technology is
>catching up; for instance the most recent versions of Microsoft and
>Borland Cs allow passing arguments in registers, something that CFORTH
>did back in 1984. But the C compilers have gotten much bigger and
>slower over the years. ForthCMP still runs fine on a machine without a
>hard disk. The compiler is only about 32k bytes long."
>
>FLK from the mid-nineties still stands strong against current Forth
>compilers in benchmarks.
>
>From my point of view, this means that Forth ecosystem stopped
>evolving about 30 years ago and I'm trying to understand what
>happened.

But did CForth/ForthCMP make great inroads among C programmers? Did
it make great inroads among Forth programmers?

If you only look at execution speed, yes, there may not have been much
advance since ForthCMP. Hmm, did ForthCMP inline?

As for what happened: Apparently there was not enough interest in
higher Forth performance to make the additional work worthwhile for
commercial Forth vendors. And the rest of us also have not found the
time for doing it yet; I have planned to do it for many years, but
other things have always pushed the project from the top of my ToDo
list.

As for the 30 years of analytic Forth compilers:

1984? cforth
1986 jForth (AFAIK analytic, not sure if from the start)
1990 MOPS (AFAIK analytic)
1998 VFX
1998 FLK
2005 LXF/NTF
2011 Mecrisp

Did I forget any released ones?

Nickolay Kolchin

unread,
Dec 17, 2021, 2:55:17 PM12/17/21
to
On Friday, December 17, 2021 at 9:33:42 PM UTC+3, Anton Ertl wrote:
> Nickolay Kolchin <nbko...@gmail.com> writes:
> >And this is a sad thing, because 30 years ago the situation
> >was different. From ForthCMP Readme:
> >
> >"When CFORTH was sold, it ran faster than C compilers, and created
> >executables that were far smaller and faster. C compiler technology is
> >catching up; for instance the most recent versions of Microsoft and
> >Borland Cs allow passing arguments in registers, something that CFORTH
> >did back in 1984. But the C compilers have gotten much bigger and
> >slower over the years. ForthCMP still runs fine on a machine without a
> >hard disk. The compiler is only about 32k bytes long."
> >
> >FLK from mid-nineties still stand strong against current Forth
> >compilers in benchmarks.
> >
> >From my point of view, this means that Forth ecosystem stopped
> >evolving about 30 years ago and I'm trying to understand what
> >happened.
> But did CForth/ForthCMP make great inroads among C programmers? Did
> it make great inroads among Forth programmers?
>
> If you only look at execution speed, yes, there may not have been much
> advance since ForthCMP. Hmm, did ForthCMP inline?

I've actually downloaded it and run some tests. No, it doesn't inline, and
the generated assembler looks stupid. For this:

\ simple test
I80186
100 MSDOS
: TTT DUP * ;
: XXX TTT . ;
: MAIN 2 XXX ;
INCLUDE FORTHLIB
END

It generated this code (only interesting part):

00000124 58 pop ax ; TTT: save return address
00000125 8946FE mov [bp-0x2],ax ; on the BP-based
00000128 83C5FE add bp,byte -0x2 ; return stack
0000012B 58 pop ax ; DUP *: n into ax...
0000012C 50 push ax
0000012D 5B pop bx ; ...and bx
0000012E F7EB imul bx ; n*n
00000130 50 push ax ; push result
00000131 8B4600 mov ax,[bp+0x0] ; pop return address
00000134 83C502 add bp,byte +0x2
00000137 FFE0 jmp ax ; and return
00000139 58 pop ax ; XXX: same prologue
0000013A 8946FE mov [bp-0x2],ax
0000013D 83C5FE add bp,byte -0x2
00000140 E8E1FF call 0x124 ; TTT
00000143 58 pop ax ; result into ax
00000144 E80E00 call 0x155 ; .
00000147 8B4600 mov ax,[bp+0x0] ; same epilogue
0000014A 83C502 add bp,byte +0x2
0000014D FFE0 jmp ax
0000014F 6A02 push byte +0x2 ; ENTRY POINT: MAIN, literal 2
00000151 E8E5FF call 0x139 ; XXX
00000154 C3 ret

I'm not sure that it does any fancy optimisations at all.

But I remember that we were impressed by 4c's application size and
performance compared to Turbo Pascal.

>
> As for what happened: Apparently there was not enough interest in
> higher Forth performance to make the additional work worthwhile for
> commercial Forth vendors. And the rest of us also has not found the
> time for doing it yet; I have planned to do it for many years, but
> other things have always pushed the project from the top of my ToDo
> list.

When I talk about "ecosystem", I mean not only compiler optimisations,
but a lot of other basic tools. For example, AFAIK, only GForth has some
support for "coverage analysis".

>
> As for the 30 years of analytic Forth compilers:
>
> 1984? cforth
> 1986 jForth (AFAIK analytic, not sure if from the start)
> 1990 MOPS (AFAIK analytic)
> 1998 VFX
> 1998 FLK
> 2005 LXF/NTF
> 2011 Mecrisp
>

The term "analytic compiler" is new to me. Do you call so compilers
that make some advanced stack to registers mappings? I.e. avoiding
stack operations in generated code.

minf...@arcor.de

unread,
Dec 17, 2021, 3:00:11 PM12/17/21
to
Thanks, I should have played more with compiler flags.

GCC 11.2 with -O1:
main:
sub rsp, 8
mov rax, QWORD PTR sp[rip]
lea rdx, [rax+4]
mov QWORD PTR sp[rip], rdx
mov DWORD PTR [rax+4], 2
mov rax, QWORD PTR sp[rip]
lea rdx, [rax+4]
mov QWORD PTR sp[rip], rdx
mov DWORD PTR [rax+4], 3
mov rax, QWORD PTR sp[rip]
lea rdx, [rax-4]
mov QWORD PTR sp[rip], rdx
mov edx, DWORD PTR [rax]
add DWORD PTR [rax-4], edx
mov rax, QWORD PTR sp[rip]
mov edi, DWORD PTR [rax]
add edi, 48
mov rsi, QWORD PTR stdout[rip]
call putc
mov eax, 0
add rsp, 8
ret
sp:
.quad stk
stk:
.zero 40


Whereas GCC 11.2 with -O2:
main:
sub rsp, 8
mov rax, QWORD PTR sp[rip]
mov edi, 53
movabs rcx, 12884901893
mov rsi, QWORD PTR stdout[rip]
lea rdx, [rax+4]
mov QWORD PTR sp[rip], rdx
mov QWORD PTR [rax+4], rcx
call putc
xor eax, eax
add rsp, 8
ret
sp:
.quad stk
stk:
.zero 40

This is even more optimized than with Clang!

Paul Rubin

unread,
Dec 17, 2021, 5:32:19 PM12/17/21
to
Nickolay Kolchin <nbko...@gmail.com> writes:
>> I think the current concerns about C producing un-intended
>> effects where optimising is pushed hard...
> There is compcert -- formally verified C compiler.

The issue in that debate was what happens when the target program's
execution triggers undefined behaviour according to the C standard.
Compcert, like most compilers, takes no responsibility for that
situation.

Paul Rubin

unread,
Dec 17, 2021, 5:36:56 PM12/17/21
to
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> 1986 jForth (AFAIK analytic, not sure if from the start)

Is that supposed to say iForth?

dxforth

unread,
Dec 17, 2021, 8:09:00 PM12/17/21
to
On 17/12/2021 14:22, Paul Rubin wrote:
> dxforth <dxf...@gmail.com> writes:
>> I'm somewhat confused. If the only measure between Forth and C is the
>> compiler, then the implication is a Forth programmer has no more
>> control over outcome than a C programmer. While such may be the end
>> result of Standard Forth, which promotes a mindless conformity, we're
>> not there yet by any stretch.
>
> I read the exchange somewhat differently:
>
> 1) Forth has traditionally been implemented as an interpreter or very
> simple compiler.
>
> 2) But in fact, relatively fancy Forth optimizing compilers exist, such
> as VFX. They do the kinds of optimizations you can find in compiler
> books like the Dragon book, and produce target code comparable to that
> of (say) fancy C compilers from the 1980s or 1990s.
>
> 3) The claim that might be under dispute is that C compilers have
> advanced since the 1990s, so Forth compilers are somewhat behind C
> compilers in terms of optimizations.
>
> My own feeling is that this just isn't important. For almost all
> application areas (measuring the space of application areas by the
> number of programmers working in a given area at a given moment),
> extreme optimization of the target code is just not that important.
> ...

I don't think there's serious interest either. While MPE has been able
to use code efficiency for marketing purposes it hasn't resulted in
competition or outright advantage. The opposition still gets contract
work for significant projects - which suggests to me the quality of the
programmer is of greater consequence than the quality of the compiler
(in Forth at least). All this assumes the hardware is appropriate. If
the CPU is underpowered, there's little headroom for improvement.

Nickolay Kolchin

unread,
Dec 17, 2021, 11:26:11 PM12/17/21
to
Actually, they are working on a C subset for this reason, i.e. they
guarantee that compiler optimisations don't depend on UB. Bulletproof C
development can be achieved using CompCert as the compiler and Frama-C
for static analysis.

But this is pointless discussion because:

1. Forth "ambiguous conditions" are no better than C "undefined behaviour".
2. Formal Verified Forth doesn't exist. :)

Nickolay Kolchin

unread,
Dec 17, 2021, 11:58:37 PM12/17/21
to
Well, there is no serious interest in Forth now. Poor compilers are just
one reason. The expectation that a Forth programmer is a 200-IQ superman
is another... The inability to use the hardware fully due to language
limitations (SIMD) is a third.

Marcel Hendrix

unread,
Dec 18, 2021, 3:42:01 AM12/18/21
to
On Saturday, December 18, 2021 at 5:58:37 AM UTC+1, Nickolay Kolchin wrote:
[..]
> Inability to use hardware at full due to language limitations
> (SIMD) is the third.

You mentioned this more than once. I have looked into SIMD quite
extensively. In C it is obvious that the compiler uses the new instruction
sets for everything when directed to, but my tests with the NGSPICE
source code fail to show a significant speed increase. It is maybe
5 to 10%, certainly not the expected factor of 4 to 8.

In assembly language called from Forth my best results are about a
factor of 2, at the cost of annoying alignment restrictions to program
around. It is hard to see a speedup in *programs* using these optimized
words, unless it is something obvious like a matrix multiplication
benchmark.
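
To make the shape of the problem concrete, here is a minimal sketch of
such an AVX2 inner loop (illustrative only; vadd is a made-up example,
compile with -mavx2). The unaligned loads and the scalar tail for the
last few elements are where the annoying bookkeeping lives:

#include <immintrin.h>
#include <stddef.h>

void vadd(double *restrict c, const double *a, const double *b, size_t n)
{
    size_t i = 0;
    /* 4 doubles per step; _mm256_loadu_pd tolerates unaligned data,
       but aligned arrays still run measurably faster */
    for (; i + 4 <= n; i += 4) {
        __m256d va = _mm256_loadu_pd(a + i);
        __m256d vb = _mm256_loadu_pd(b + i);
        _mm256_storeu_pd(c + i, _mm256_add_pd(va, vb));
    }
    for (; i < n; i++)   /* scalar tail for the last few elements */
        c[i] = a[i] + b[i];
}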

Having an AVX2 optimized C library called from Forth is also
disappointing because the extra overhead going through
the OS calling convention layers becomes noticeable.

It would be interesting to see non-obvious examples where SIMD
brings a significant advantage, and even more interesting to see how
extensively the compiler has to reorder the source code to get that
result.

-marcel

Anton Ertl

unread,
Dec 18, 2021, 4:19:00 AM12/18/21
to
Paul Rubin <no.e...@nospam.invalid> writes:
>Nickolay Kolchin <nbko...@gmail.com> writes:
>> FLK from the mid-nineties still stands strong against current Forth
>> compilers in benchmarks.
>>
>> From my point of view, this means that Forth ecosystem stopped
>> evolving about 30 years ago and I'm trying to understand what
>> happened.
>
>Well, 1) what are the benchmarks measuring

Yes, that's always a good question.

> and 2) how much have C
>compilers evolved in the same period? How does GCC from 30 years ago
>compare with GCC today?

Favourably. Basically, today you need to wave all kinds of flags at
gcc to make it usable. 30 years ago you didn't.

As for comparing performance, this brings us back to question 1. I
have done some measurements with programs that the gcc maintainers
don't use as benchmarks, and you can find them in

https://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf

For Figure 1 (p. 13) I used gcc-2.7.2.3 from 1997, egcs-1.1.2 from
1999, and gcc-5.2.0 from 2015. The programs I compiled are
transliterations of Jon Bentley's traveling salesman example from
"Writing Efficient Programs"
<https://www.complang.tuwien.ac.at/anton/lvas/effizienz/tsp.html>.

For tsp6 the code by gcc-5.2.0 -O3 is quite a bit faster than the code
by gcc-2.7.2.3 -O3 (not by a factor 1.5, though), but for tsp8 the
code by gcc-2.7.2.3 -O3 is slightly faster than the code by gcc-5.2.0.
The rest is in-between. Note that this problem can benefit from SIMD,
but gcc has failed to auto-vectorize it up to gcc-10.3.

For Figure 2 (p. 15) I used a number of gcc versions between 2.95
(1999) and 4.4.0 (2009). As you can see, for IA-32 gcc-2.95 dominates
over the other versions. It's not available for AMD64, and there 4.0
gives a pretty solid performance (which it did not on IA-32).
On both architectures, the latest version 4.4.0 was not close to
the top.

>Compilers today burn a lot of cycles and memory doing expensive
>optimizations, relying on today's faster and bigger computers to make
>compile times tolerable. New optimizations (at least for C) tend to
>bring a few percent more application performance per major compiler
>release.

On what basis do you make this claim? Do you confuse application
performance with benchmark performance?

>So there's maybe 1.5x speedup since 1990s compilers.

That statement suggests a uniform improvement that is not there.
There are singular improvements that can be quite a bit bigger than a
factor 1.5 (e.g., when you get lucky with auto-vectorization), among a
sea of cases where you see hardly any improvements, or sometimes even
slowdowns. Of course the advocates of these developments always
point to the big improvements, and the maintainers make sure that the
benchmarks they use are not affected by slowdowns.

>Fwiw here's a Forth implementation that compiles with LLVM:
>
> https://github.com/Reschivon/movForth

And it does it by mapping stack items to registers, so the result
should perform well. According to the README it supports only a very
limited number of words, so it's only a proof-of-concept for now.

Compared to forth2c, the question is if LLVM is a better base to work
from than C.

Nickolay Kolchin

unread,
Dec 18, 2021, 4:23:19 AM12/18/21
to
This is a complex question which I'm not prepared to answer immediately.
I promise to start another thread on SIMD in Forth.

Anton Ertl

unread,
Dec 18, 2021, 4:50:06 AM12/18/21
to
Nickolay Kolchin <nbko...@gmail.com> writes:
>On Friday, December 17, 2021 at 7:25:11 AM UTC+3, Paul Rubin wrote:
>> Nickolay Kolchin <nbko...@gmail.com> writes:
>> > FLK from the mid-nineties still stands strong against current Forth
>> > compilers in benchmarks.
>> >
>> > From my point of view, this means that Forth ecosystem stopped
>> > evolving about 30 years ago and I'm trying to understand what
>> > happened.
>> Well, 1) what are the benchmarks measuring, and 2) how much have C
>> compilers evolved in the same period? How does GCC from 30 years ago
>> compare with GCC today?
>
>We can take any benchmark that has C and Forth implementations.
>Unfortunately Forth doesn't have a complete set of "Shootout" benchmarks.

Forth is not included in the current shootout languages
<https://benchmarksgame-team.pages.debian.net/benchmarksgame/index.html>,
and AFAIK has not been for many years. Forth used to be pretty well
represented in the original shootout
<https://web.archive.org/web/20010124090400/http://www.bagley.org/~doug/shootout/>,
but that project was abandoned (in 2002), and later (2004) someone
else picked it up who did not want to invest the time to make it work
with Forth. The most recent instance seems to have few languages and
few programs, which indicates how much work the original project by
Doug Bagley must have been.

Anyway, concerning benchmarks that work on C and Forth, you can find
some at <http://www.complang.tuwien.ac.at/forth/bench.zip>: sieve,
bubble-sort, (integer) matrix-mult, fib. You find Forth versions
there, as well as manually written C as well as C generated from the
Forth programs with forth2c.

>SSA was a game changer.

SSA made it easier to add optimizations that help a few programs (in
particular, benchmarks). This would not be so bad if the compiler
maintainers were satisfied with that, but they also got the idea that
they could "optimize" based on the assumption that C programs don't
perform undefined behaviour. I guess that some of the damage that
came out of that idea would also have occured without SSA form, but
SSA form made it easy to make more such "optimizations". So it
changed the game to the worse.
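
A stock example of the kind of "optimization" meant here (a made-up
sketch, not taken from the paper): because *p is dereferenced before
the check, the compiler may assume p is non-null and delete the check.

#include <stddef.h>

int first_elem(int *p)
{
    int v = *p;        /* UB if p == NULL ...                     */
    if (p == NULL)     /* ... so gcc/clang at -O2 may assume this */
        return -1;     /* branch is dead and silently drop it    */
    return v;
}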

>For floating point you can easily expect 10x
>modern GCC advantage due to SIMD usage.

Easily? All the tsp programs on
<https://www.complang.tuwien.ac.at/anton/lvas/effizienz/tsp.html> are
FP programs. What's more, they can be (manually) vectorized to
benefit from SIMD instructions; even with gcc, I have not seen that
these programs benefit from SIMD instructions, much less by 10x.

Ok, so let's take an example that should be the bread-and-butter of
auto-vectorization: FP matrix multiplication; you can find results on
slide 76 of https://www.complang.tuwien.ac.at/anton/lvas/effizienz.pdf

Here you can see 6 different variants of coding FP matrix
multiplication. You can see that for the two middle variants -O3
performs significantly better than -O2. That's because gcc managed to
auto-vectorize it. But it did not manage to auto-vectorize the other
four variants. For the versions that it managed to auto-vectorize, the
speedup is 2.29x (middle upper) and 1.74x (middle lower); that's on an
Ivy Bridge CPU. 10x? Maybe if you use binary base.

Anton Ertl

unread,
Dec 18, 2021, 5:15:32 AM12/18/21
to
Nickolay Kolchin <nbko...@gmail.com> writes:
>It will be interesting to see. IMHO, gcc and llvm/clang infrastructures
>develop at high pace. Sanitizers are the best thing for developing
>C/C++ applications and I really don't understand how we debug apps
>without them.

The way that sanitizers have been presented to me are as the tool to
work around the problems that the gcc and clang maintainers are
producing themselves by pretending that C programs never perform
undefined behaviours. The idea is that the C compiler maintainers
throw in another "optimization" based on this assumption, and when the
complaints come in that this "optimization" breaks legacy code, they
point to a sanitizer that might have pointed out the problem to the
programmer. The problems with that approach are:

* The sanitizer tends to be released at the same time or after the
breaking change, long after the legacy code.

* The sanitizers perform run-time checks, so they will only discover
problems that occur during the test runs.

* What's worse, not all sanitizers can be combined. So you need
multiple compilations and test runs with different sets of
sanitizers. And the code with the sanitizers compiled in is slow.

So calling it "the best thing" to me sounds like a case of Stockholm
Syndrome.

[Exception: Valgrind can be useful, and some sanitizers implement functionality
also present in valgrind, so that is a good feature.]
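
For concreteness, a minimal sketch of such a run-time check (an
invented example; compile with cc -O2 -fsanitize=undefined):

#include <limits.h>
#include <stdio.h>

int wraps(int x)
{
    return x + 1 > x;  /* often folded to 1 at -O2; UB when x == INT_MAX */
}

int main(void)
{
    /* UBSan reports at run time: signed integer overflow:
       2147483647 + 1 cannot be represented in type 'int' */
    printf("%d\n", wraps(INT_MAX));
    return 0;
}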

>Each gcc/clang release is deeply investigated for performance regressions
>on sites like phoronix, so I don't believe that serious regressions exist.

Deeply investigated? Phoronix is just Michael Larabel, and covers
many other topics; what he does is to use the compiler on a set of
benchmarks and compare the performance results of that, on one CPU
(e.g.,
<https://www.phoronix.com/scan.php?page=article&item=gcc-12-alderlake&num=1>).
This only shows how the compiler performs on these benchmarks, on this
CPU.

>There
>may be changes related to "new standard understanding" (like aliasing), but
>those are part of language evolution.

If breaking existing, tested production programs is part of language
evolution, that's a part we can do without.

Nickolay Kolchin

unread,
Dec 18, 2021, 5:28:46 AM12/18/21
to
We have automated testing and coverage for that. Combined with sanitizers,
we can be sure that our application doesn't have UB defects. Valgrind is
just too slow and has worse integration with the language.

> >Each gcc/clang release is deeply investigated for performance regressions
> >on sites like phoronix, so I don't believe that serious regressions exist.
> Deeply investigated? Phoronix is just Michael Larabel, and covers
> many other topics; what he does is to use the compiler on a set of
> benchmarks and compare the performance results of that, on one CPU
> (e.g.,
> <https://www.phoronix.com/scan.php?page=article&item=gcc-12-alderlake&num=1>).
> This only shows how the compiler performs on these benchmarks, on this
> CPU.

You can download phoronix-testsuite and run them yourself. Many people do this.

> >There
> >may be changes related to "new standard understanding" (like aliasing), but
> >those are part of language evolution.
> If breaking existing, tested production programs is part of language
> evolution, that's a part we can do without.

Wrong. To move forward it may be necessary to break something. For legacy
builds, Docker exists.

Now I'm starting to understand what happened to Forth. You cared about old
programs so much that you missed the point when they became obsolete and
vanished. And new ones didn't appear because you didn't care about them.

Anton Ertl

unread,
Dec 18, 2021, 5:32:55 AM12/18/21
to
Brian Fox <bria...@brianfox.ca> writes:
>On Friday, December 17, 2021 at 3:12:42 AM UTC-5, Nickolay Kolchin wrote:
>
>A couple of comments related to this.
>There is a post on comp.lang.forth, many years ago where Anton transpiled
>Forth to C. I can't find the topic here with a quick search.

Martin Maierhofer wrote forth2c, a compiler for (a very limited batch
version of) Forth to C. You can find it on

<http://www.complang.tuwien.ac.at/forth/forth2c.tar.gz>

You can find the results of applying forth2c to bubble-sort, fib,
matrix-mult, and sieve in

<http://www.complang.tuwien.ac.at/forth/bench.zip>

in the directory bench/c-generated (there are also Forth and manually
written C variants of these benchmarks).

You can find my paper about this in

@InProceedings{ertl&maierhofer95,
author = {M. Anton Ertl and Martin Maierhofer},
title = {Translating {Forth} to Efficient {C}},
crossref = {euroforth95},
url = {http://www.complang.tuwien.ac.at/papers/ertl%26maierhofer95.ps.gz},
url2 = {http://www.complang.tuwien.ac.at/papers/ertl%26maierhofer95.pdf},
abstract = {An automatic translator can translate Forth into C
code which the current generation of optimizing C
compilers compiles to efficient machine code. I.e.,
the resulting code keeps stack items in registers
and rarely updates the stack pointer. This paper
presents a simple translation method that produces
efficient C code, describes an implementation of the
method and presents results achieved with this
implementation: The translated code is 4.5--7.5
times faster than Gforth (the fastest measured
interpretive system), 1.3--3 times faster than
BigForth 386 (a native code compiler), and smaller
than Gforth's threaded code.}
}

@Proceedings{euroforth95,
title = "EuroForth~'95 Conference Proceedings",
booktitle = "EuroForth~'95 Conference Proceedings",
year = "1995",
key = "EuroForth '95",
address = "Schloss Dagstuhl, Germany",
}

You can find slightly more recent performance results for these
benchmarks on a 486DX2/66 (with various Forth systems for running the
Forth programs) on

<http://www.complang.tuwien.ac.at/forth/performance.html>
<http://www.complang.tuwien.ac.at/forth/performance.eps>

>The compiled C code ran faster than some of the Forth systems.

The C code generated by Forth2c was about the same speed as the
manually written C code (when both are optimized). All other Forth
systems were significantly slower for sieve, bubble, and matmul, but FLK and
mxForth beat gcc on fib.

>I think the current concerns about C producing un-intended
>effects where optimising is pushed hard makes a case for a compiler
>like Forth where one knows what code will be generated because it is not
>pushed to the extreme.

I subscribe to that POV, but it's a feature of the compilers, not of
the language. Also, some people think that the stuff that, say VFX is
doing is already too involved, and they argue for simpler, more easily
predictable systems.

Nickolay Kolchin

unread,
Dec 18, 2021, 5:51:53 AM12/18/21
to
On Saturday, December 18, 2021 at 12:50:06 PM UTC+3, Anton Ertl wrote:
> Nickolay Kolchin <nbko...@gmail.com> writes:
> >On Friday, December 17, 2021 at 7:25:11 AM UTC+3, Paul Rubin wrote:
> >> Nickolay Kolchin <nbko...@gmail.com> writes:
> >> > FLK from the mid-nineties still stands strong against current Forth
> >> > compilers in benchmarks.
> >> >
> >> > From my point of view, this means that Forth ecosystem stopped
> >> > evolving about 30 years ago and I'm trying to understand what
> >> > happened.
> >> Well, 1) what are the benchmarks measuring, and 2) how much have C
> >> compilers evolved in the same period? How does GCC from 30 years ago
> >> compare with GCC today?
> >
> >We can take any benchmark that has C and Forth implementations.
> >Unfortunately Forth doesn't have a complete set of "Shootout" benchmarks.
> Forth is not included in the current shootout languages
> <https://benchmarksgame-team.pages.debian.net/benchmarksgame/index.html>,
> and AFAIK has not been for many years. Forth used to be pretty well
> represented in the original shootout
> <https://web.archive.org/web/20010124090400/http://www.bagley.org/~doug/shootout/>,
> but that project was abandoned (in 2002), and later (2004) someone
> else picked it up who did not want to invest the time to make it work
> with Forth. The most recent instance seems to have few languages and
> few programs, which indicates how much work the original project by
> Doug Bagley must have been.

Factor is also not included. But it ships with shootout benchmarks in
distribution.

>
> Anyway, concerning benchmarks that work on C and Forth, you can find
> some at <http://www.complang.tuwien.ac.at/forth/bench.zip>: sieve,
> bubble-sort, (integer) matrix-mult, fib. You find Forth versions
> there, as well as manually written C as well as C generated from the
> Forth programs with forth2c.
> >SSA was a game changer.
> SSA made it easier to add optimizations that help a few programs (in
> particular, benchmarks). This would not be so bad if the compiler
> maintainers were satisfied with that, but they also got the idea that
> they could "optimize" based on the assumption that C programs don't
> perform undefined behaviour. I guess that some of the damage that
> came out of that idea would also have occured without SSA form, but
> SSA form made it easy to make more such "optimizations". So it
> changed the game to the worse.

This is dubious. SSA greatly simplified CFA. If you have a better solution,
I'm all ears.

> >For floating point you can easily expect 10x
> >modern GCC advantage due to SIMD usage.
> Easily? All the tsp programs on
> <https://www.complang.tuwien.ac.at/anton/lvas/effizienz/tsp.html> are
> FP programs. What's more, they can be (manually) vectorized to
> benefit from SIMD instructions; Even with gcc, I have not seen that
> these programs benefit from SIMD instructions, much less by 10x.
>
> Ok, so let's take an example that should be the bread-and-butter of
> auto-vectorization: FP matrix multiplication; you can find results on
> slide 76 of https://www.complang.tuwien.ac.at/anton/lvas/effizienz.pdf
>
> Here you can see 6 different variants of coding FP matrix
> multiplication You can see that for the two middle variants -O3
> performs significantly better than -O2. That's because gcc managed to
> auto-vectorize it. But it did not manage to auto-vectorize the other
> four variants. For the version that it managed to auto-vectorize the
> speedup is 2.29x (middle upper) and 1.74x (middle lower); that's on an
> Ivy Bridge CPU. 10x? Maybe if you use binary base.

I'm not prepared to discuss SIMD in depth now. Will post reply on separate
thread.

Marcel Hendrix

unread,
Dec 18, 2021, 7:09:53 AM12/18/21
to
On Saturday, December 18, 2021 at 10:19:00 AM UTC+1, Anton Ertl wrote:
[..]
> As for comparing performance, this brings us back to question 1. I
> have done some measurements with programs that the gcc maintainers
> don't use as benchmarks, and you can find them in
>
> https://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf
[..]

Recommended! (I somehow missed that one.)

Quote:
"A tool could also check whether a load in an inner loop always
produces the same result during a run, and, if so, suggest to the programmer to
move the load out of the loop manually; the source-level optimization also works
in cases where “optimization” does not work because there is a store in the loop
to the same type as the load."

I wonder if this happens much in Forth programs. Anyway, it is an interesting
idea to have an option in Forth to add diagnostic code that advises the
programmer. It might be relatively easy to add, even in high-level code.

-marcel

Anton Ertl

unread,
Dec 18, 2021, 9:13:53 AM12/18/21
to
The idea is to represent stack items by local variables, not by
something in memory (e.g., an array element). That's because the
compiler has a much easier time understanding a local variable whose
address is not taken than some memory location. The compiler can put
such a local in a register without much analysis, whereas it takes a
lot of analysis to put an array element in a register (and often the
compiler cannot achieve that). The compiler can also perform other
optimizations if it knows all places where a variable changes.

The cost is that the compiler has to keep track of stack items, i.e.,
an analytic compiler. You can achieve a part of the benefit with
peephole optimization, but a proper analytic compiler provides more.
The cool thing is that, when going through C, it's relatively cheap to
be analytic across basic block boundaries (e.g., forth2c does that):
You can implement the reorganization of stack items at basic block
boundaries as local-to-local assignment, and can leave the
optimization of these assignments to the C compiler (whereas a direct
native-code compiler has to do that itself).
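
As a minimal sketch of the effect (reusing the 2 3 + example from
above; this is hand-written in the forth2c style, not actual forth2c
output): with stack items as locals whose addresses are never taken,
there is no observable global stack, and the compiler constant-folds
the whole thing.

#include <stdio.h>

int main(void)
{
    int x0, x1;          /* stack items as locals */
    x0 = 2;              /* 2 */
    x1 = 3;              /* 3 */
    x0 = x0 + x1;        /* + : the top two items become one */
    putchar('0' + x0);   /* emits '5'; folds to a constant at -O2 */
    return 0;
}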

Anton Ertl

unread,
Dec 18, 2021, 9:35:07 AM12/18/21
to
Ok, I see the kind of code your system produces.

>With Clang compiler and flag -O2 compiles to:
>
>main: # @main
> push rax
> mov rax, qword ptr [rip + sp]
> lea rcx, [rax + 4]
> mov dword ptr [rax + 8], 3
> mov qword ptr [rip + sp], rcx
> mov dword ptr [rax + 4], 5
> mov rsi, qword ptr [rip + stdout]
> mov edi, 53
> call putc
> xor eax, eax
> pop rcx
> ret
>stk:
> .zero 40
>sp:
> .quad stk
>
>So Clang pre-calculates that 2+3=5 and fills the global stack accordingly.
>But it does not know that stk[10] is a stack: it treats it as a plain array
>and (in)correctly assumes that it must still place the 3 into stk[2].

You instructed it to place 3 there, so it's hard for Clang to avoid that.

For comparison, let's see what forth2c does. I don't have the Forth
code for your example (and am also too lazy to get a runnable
forth2c), so I'll use the following example:

: max1
    2dup < if
        swap
    endif
    drop ;

forth2c compiles this to:

Cell max1(Cell p0, Cell p1)
{
  Cell _c_result;
  Cell x0;
  Cell x1;

  { /* 2dup */
    Cell n1, n2;
    n1 = p0;
    n2 = p1;
    p1 = n2;
    p0 = n1;
    x0 = n2;
    x1 = n1;
  }
  { /* less-than */
    Cell n1, n2, n;
    n1 = x1;
    n2 = x0;
    n = FLAG(n2 < n1);
    x0 = n;
  }
  if (!x0) goto label0;
  { /* swap */
    Cell n1, n2;
    n1 = p0;
    n2 = p1;
    p1 = n1;
    p0 = n2;
  }
label0:
  { /* drop */
  }
  { /* exit */
    _c_result = p1;
    return (_c_result);
  }
}

Note the lack of memory references. Only local variables are used.

gcc-10.2.1 compiles this to:

max1:
cmpq %rsi, %rdi
movq %rsi, %rax
cmovge %rdi, %rax
ret

Bottom line: to use "optimization for free" from C compilers, you need
to know what C compilers can do and what they are not so good at.

Anton Ertl

unread,
Dec 18, 2021, 10:42:12 AM12/18/21
to
Not sure how you compute the 1.5 times. What I see there is that the
fastest system (a 2x64-core EPYC 7763 system) builds LLVM in 99s. My
guess is that Stephen Pelc's client is more interested in building his
application than in building LLVM, and I expect that your optimizing C++
compiler is not up to the task.

Plus, your optimizing C++ compiler still takes more than three times
as long for its task than VFX takes for its task. If you think that a
smaller code base leads to smaller compile times, that's not
necessarily the case: Build a recent gforth with clang
(./BUILD-FROM-SCRATCH CC=clang), and see it crawl; I just did so on a
Ryzen 5800X, and it took 2161s real time, and 9100s CPU time (28s of
that was system time). There are 31694 lines of .c files and 21868 of
.h files in Gforth when the C compiler is invoked, so a lines/s metric
does not look so great in this case. Using gcc instead results in
more acceptable build times (48s real, 86u+6s CPU).

And in any case, if you compare clang/llvm building itself, shouldn't
you compare that to, say, VFX building itself (ok, no sources
available, so that's not going to fly), or maybe SwiftForth or FLK
building itself?

For Gforth much of the time is taken by configure, the C compiler, and
texinfo. Let's try to get numbers for the Forth compiler part only:

rm kernl64l.fi*; time make kernl64l.fi
...
real 0m0.227s
user 0m0.185s
sys 0m0.001s

rm gforth.fi; time make gforth.fi
...
real 0m0.641s
user 0m0.498s
sys 0m0.017s

Quite a bit faster than LLVM recompiling itself.

Anton Ertl

unread,
Dec 18, 2021, 10:49:07 AM12/18/21
to
It looked much better on paper:-) Thanks for checking.

>When I talk about "ecosystem", I mean not only compiler optimisations,
>but a lot of other basic tools. For example, AFAIK, only GForth has some
>support for "coverage analysis".

That's a rather recent addition.

>> As for the 30 years of analytic Forth compilers:
>>
>> 1984? cforth
>> 1986 jForth (AFAIK analytic, not sure if from the start)
>> 1990 MOPS (AFAIK analytic)
1995 iForth
>> 1998 VFX
>> 1998 FLK
>> 2005 LXF/NTF
>> 2011 Mecrisp
>>
>
>The term "analytic compiler" is new to me. Do you call so compilers
>that make some advanced stack to registers mappings?

Compilers that model at compile time what happens on the stack at
run-time, and use that to keep stack items in registers. Not sure who
first coined the term.
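
A toy sketch of such compile-time modelling (invented for
illustration; none of the systems above work exactly like this): the
code generator tracks which register holds each stack item, so SWAP
costs nothing at run time and + emits a single instruction.

#include <stdio.h>

/* the compile-time model: a stack of register names */
static const char *stk[16];
static int depth;

static void lit(const char *reg, int n)  /* literal: claim a register */
{
    printf("  mov %s, %d\n", reg, n);
    stk[depth++] = reg;
}

static void plus(void)                   /* + : combine top two items */
{
    const char *b = stk[--depth];
    const char *a = stk[--depth];
    printf("  add %s, %s\n", a, b);      /* no memory stack traffic */
    stk[depth++] = a;
}

static void swap(void)                   /* SWAP: model update only, */
{                                        /* no code is emitted */
    const char *t = stk[depth-1];
    stk[depth-1] = stk[depth-2];
    stk[depth-2] = t;
}

int main(void)                           /* "compile" 2 3 swap + */
{
    lit("eax", 2);
    lit("ecx", 3);
    swap();
    plus();
    return 0;
}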

Anton Ertl

unread,
Dec 18, 2021, 10:50:22 AM12/18/21
to
No, it's supposed to say JForth <http://jforth.org/>. I should also
have added iForth.

Anton Ertl

unread,
Dec 18, 2021, 11:06:31 AM12/18/21
to
Nickolay Kolchin <nbko...@gmail.com> writes:
>1. Forth "ambiguous condition" are no better than C "undefined behaviour".

There are two reasons why it is better:

a) C's undefined behaviour allows time travel, Forth's ambiguous
conditions don't. So if a program does this and that and then runs
into an ambiguous condition, it at least has to exhibit all the
visible behaviour of this and that. E.g., even an adversarial
standard-compliant Forth compiler could not compile something like the
SATD example (from page 4 of
<https://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf>)
into an empty endless loop, while a conforming C implementation can.
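
A minimal sketch of such time travel (a made-up example, not the SATD
code from the paper): the later division by zero is undefined
behaviour, so a conforming C implementation need not perform the
earlier printf; a standard Forth system would still have to perform
the corresponding TYPE before hitting the ambiguous condition.

#include <stdio.h>

int main(void)
{
    int x = 0;
    printf("before\n");  /* may never appear: the UB below makes */
    return 1 / x;        /* the whole execution undefined        */
}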

b) Forth implementations avoid such shenanigans. Of course that is
weak. C implementations did not use to do such shenanigans, either,
and then they started, produced lots of propaganda on why such
shenanigans are desirable and proper, and that C never was meant
differently, and now have a large following of devout believers in
these claims. But maybe reason a) helps in making that path look less
attractive.

>2. Formal Verified Forth doesn't exist. :)

There has been some work in that direction, mainly from Bill Stoddart
and his associates.

Anton Ertl

unread,
Dec 18, 2021, 11:19:25 AM12/18/21
to
Nickolay Kolchin <nbko...@gmail.com> writes:
>Inability to use hardware at full due to language limitations
>(SIMD) is the third.

It seems to me that Forth sits in the same boat here as everybody
else. No language has standardized SIMD extensions; Fortran has the
array sub-language, but unfortunately it works on arrays (with alias
problems), not on some opaque type, so it's hardly easier to compile
than auto-vectorizing loops (and indeed, I hear that gfortran compiles
the array sublanguage into loops and then hopes that the
auto-vectorizer manages to vectorize them). GCC has vector types, but
they are limited (e.g., fixed-size) and seem to receive little love
from the gcc maintainers (after all, they are not used in the relevant
benchmarks).
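
For reference, the gcc vector types mentioned look like this (a
minimal sketch; note that the size is fixed at compile time, which is
one of the limitations):

/* gcc/clang vector extension: a fixed 16-byte (4 x float) vector */
typedef float v4sf __attribute__((vector_size(16)));

v4sf scale(v4sf x, v4sf y)
{
    return x * y;  /* one SIMD multiply (e.g. mulps) on x86 with SSE */
}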

I have done some work on using SIMD in Forth, but it has not gotten
into the production stage yet:

@InProceedings{ertl17,
author = {M. Anton Ertl},
title = {{SIMD} and Vectors},
crossref = {euroforth17},
pages = {25--36},
url = {http://www.euroforth.org/ef17/papers/ertl.pdf},
video = {https://wiki.forth-ev.de/lib/exe/fetch.php/events:ef2017:simd-vectors.mp4},
OPTnote = {refereed},
abstract = {Many programs have parts with significant data
parallelism, and many CPUs provide SIMD instructions
for processing data-parallel parts faster. The weak
link in this chain is the programming language. We
propose a vector wordset so that Forth programmers
can make use of SIMD instructions to speed up the
data-parallel parts of their applications. The
vector wordset uses a separate vector stack
containing opaque vectors with run-time determined
length. Preliminary results using one benchmark
show a factor~8 speedup of a simple vector
implementation over scalar Gforth code, a smaller
(factor 1.8) speedup over scalar VFX code; another
factor of 3 is possible on this benchmark with a
more sophisticated implementation. However, vectors
have an overhead; this overhead is amortized in this
benchmark at vector lengths between 3 and 250
(depending on which variants we compare).}
}

@Proceedings{euroforth17,
title = {33rd EuroForth Conference},
booktitle = {33rd EuroForth Conference},
year = {2017},
key = {EuroForth'17},
url = {http://www.complang.tuwien.ac.at/anton/euroforth/ef17/papers/proceedings.pdf}
}

@InProceedings{ertl18manlang,
author = {M. Anton Ertl},
title = {Software Vector Chaining},
booktitle = {15th International Conference on Managed Languages &
Runtimes (Manlang'18)},
year = {2018},
pages = {Article-18},
url = {http://www.complang.tuwien.ac.at/papers/ertl18manlang.pdf},
doi = {10.1145/3237009.3237021},
abstract = {Providing vectors of run-time determined length as
opaque value types is a good interface between the
machine-level SIMD instructions and portable
application-oriented programming languages.
Implementing vector operations requires a loop that
breaks the vector into SIMD-register-sized chunks.
A compiler can fuse the loops of several vector
operations together. However, during normal
compilation this is only easy if no other control
structures are involved. This paper explores an
alternative: collect a trace of vector operations at
run-time (following the program control flow during
this collecting step), and then perform the combined
vector loop. This arrangement has a certain
run-time overhead, but its implementation is simpler
and can happen independently, in a library.
Preliminary performance results indicate that the
overhead makes this approach beneficial only for
long vectors ($>1$KB). For shorter vectors, unfused
loops should be used in a library setting.
Fortunately, this choice can be made at run time,
individually for each vector operation.}
}

@InProceedings{ertl18chaining,
author = {M. Anton Ertl},
title = {Software Vector Chaining},
crossref = {euroforth18},
pages = {54-55},
url = {http://www.euroforth.org/ef18/papers/ertl-chaining.pdf},
url-slides = {http://www.euroforth.org/ef18/papers/ertl-chaining-slides.pdf},
video = {https://wiki.forth-ev.de/doku.php/events:ef2018:vectors},
OPTnote = {presentation slides, paper published at Manlang'18},
abstract = {Providing vectors of run-time determined length as
opaque value types is a good interface between the
machine-level SIMD instructions and portable
application-oriented programming languages.
Implementing vector operations requires a loop that
breaks the vector into SIMD-register-sized chunks.
A compiler can fuse the loops of several vector
operations together. However, during normal
compilation this is only easy if no other control
structures are involved. This paper explores an
alternative: collect a trace of vector operations at
run-time (following the program control flow during
this collecting step), and then perform the combined
vector loop. This arrangement has a certain
run-time overhead, but its implementation is simpler
and can happen independently, in a library.
Preliminary performance results indicate that the
overhead makes this approach beneficial only for
long vectors ($>1$KB). For shorter vectors, unfused
loops should be used in a library setting.
Fortunately, this choice can be made at run time,
individually for each vector operation.}
}

@Proceedings{euroforth18,
title = {34th EuroForth Conference},
booktitle = {34th EuroForth Conference},
year = {2018},
key = {EuroForth'18},
url = {http://www.euroforth.org/ef18/papers/proceedings.pdf}
}

Github page:

https://github.com/AntonErtl/vectors

Nickolay Kolchin

unread,
Dec 18, 2021, 12:08:53 PM12/18/21
to
On Saturday, December 18, 2021 at 6:42:12 PM UTC+3, Anton Ertl wrote:
> Nickolay Kolchin <nbko...@gmail.com> writes:
> >On Friday, December 17, 2021 at 7:21:12 PM UTC+3, Stephen Pelc wrote:
> >> A major client is much more interested in compilation speed than in
> >> execution speed. He compiles 1.2 million lines of Forth source code in 29
> >> seconds.
> >>
> >
> >Well, according to this https://openbenchmarking.org/test/pts/build-llvm you
> >are only 1.5 times slower than optimising C++ compiler... (LLVM is about 10M
> >lines).
> Not sure how you compute the 1.5 times. What I see there is that the
> fastest system (a 2x64-core EPYC 7763 system) builds LLVM in 99s. My
> guess is that Stephen Pelc's client is more interested in building his
> application than in building LLVM, and I expect that your optimizing C++
> compiler is not up to the task.

I've just divided the estimated line count by the compilation time to get
lines per second (for VFX: 1.2M lines / 29 s, about 41k lines/s).

>
> Plus, your optimizing C++ compiler still takes more than three times
> as long for its task than VFX takes for its task. If you think that a
> smaller code base leads to smaller compile times, that's not
> necessarily the case: Build a recent gforth with clang
> (./BUILD-FROM-SCRATCH CC=clang), and see it crawl; I just did so on a
> Ryzen 5800X, and it took 2161s real time, and 9100s CPU time (28s of
> that was system time). There are 31694 lines of .c files and 21868 of
> .h files in Gforth when the C compiler is invoked, so a lines/s metric
> does not look so great in this case. Using gcc instead results in
> more acceptable build times (48s real, 86u+6s CPU).
>
> And in any case, if you compare clang/llvm building itself, shouldn't
> you compare that to, say, VFX building itself (ok, no sources
> available, so that's not going to fly), or maybe SwiftForth or FLK
> building itself?

No, I just wanted to show that this metric is pretty useless. I.e. we can
build a complex project like LLVM, written in a language that is hard to
parse/compile/etc., at a better lines/s ratio.

>
> For Gforth much of the time is taken by configure, the C compiler, and
> texinfo. Let's try to get numbers for the Forth compiler part only:
>
> rm kernl64l.fi*; time make kernl64l.fi
> ...
> real 0m0.227s
> user 0m0.185s
> sys 0m0.001s
>
> rm gforth.fi; time make gforth.fi
> ...
> real 0m0.641s
> user 0m0.498s
> sys 0m0.017s
>
> Quite a bit faster than LLVM recompiling itself.

I think we both understand that C++ is not the fastest language in the
world to compile. It is possible to write a small (several-hundred-line)
file that will compile for minutes. Just mix sol2 and exprtk together.

So, let's compare with a modern language that actually cared about build
speed during design. Take Go, for example: on my machine, Go compiles
itself in 38s at a line count of ~2M, i.e. about 53k lines/s.

I.e. Pelc's numbers are not that impressive.

Anton Ertl

unread,
Dec 18, 2021, 12:15:31 PM12/18/21
to
Marcel Hendrix <m...@iae.nl> writes:
>In assembly language called from Forth my best results are about a
>factor of 2, at the cost of annoying alignment restrictions to program
>around.

There are alignment restrictions in SSE/SSE2, but not in AVX*. Of
course, for performance it's helpful if accesses are aligned in AVX*.
There also tend to be problems with the last few elements of an array.
Supposedly AVX-512 solves this, but I have yet to see if this is
efficient.

>It would be interesting to see non-obvious examples where SIMD
>brings a significant advantage

My C transliteration of Jon Bentley's traveling salesman problem
program that I showed earlier is actually vectorizable.

You can find an extended discussion in the thread starting with
<2016Nov1...@mips.complang.tuwien.ac.at>, with the most
interesting results in <2016Nov1...@mips.complang.tuwien.ac.at>
(Google groups seems to show the whole thread at
<https://groups.google.com/g/comp.arch/c/BpKnUXKkBNk/m/YhHACqb7BAAJ>)

Xeon E3-1220 i7-4690K
Sandy Bridge Haswell
cycles cycles instructions
240,874,925 192,411,043 413,323,795 scalar smart
153,192,044 136,464,596 181,520,934 AVX smart with prefetch
131,286,268 95,393,579 206,593,195 AVX hard with prefetch
128,785,936 123,517,484 230,002,974 AVX "branchless"

So, a factor of 2 for the best AVX version on Haswell compared to the
scalar version.

> and even more interesting to see how
>extensively the compiler has to reorder the source code to get that
>result.

This is all assembly (or intrinsic) work. I tried to transform the
program (array-of-structs into two arrays
<https://www.complang.tuwien.ac.at/anton/lvas/effizienz/tspb.c>) in
the hope that gcc would auto-vectorize it, but at least at the time,
gcc did not go for it. And testing it now with gcc-10.2.1, I also
don't see any significant vectorization, and an actual slowdown. On a
Skylake:

cycles:u instructions:u
184,533,267 410,833,651 tsp9.c (scalar with array-of-structs)
190,896,279 360,826,233 tspb.c (two arrays)

#compiled with
gcc -Wall -O3 -march=native -mtune=native tsp9.c -lm
#measured with
LC_NUMERIC=en_US.utf8 perf stat -e cycles:u -e instructions:u a.out 10000 >/dev/null

Nickolay Kolchin

unread,
Dec 18, 2021, 12:33:22 PM12/18/21
to
Have you looked at what Factor do? They have several benchmark
samples in simd and normal form. For example, nbody. Their bench
suite report this timings:

benchmark.nbody 0.928451599
benchmark.nbody-simd 0.066342094

I've briefly compared the source code and found no non-obvious differences.
I haven't investigated this and don't know what happens there internally.

P.S. You asked for a 10x difference? Here it is (0.93/0.066, about 14x)! :)

Anton Ertl

unread,
Dec 18, 2021, 12:53:50 PM12/18/21
to
Nickolay Kolchin <nbko...@gmail.com> writes:
>On Saturday, December 18, 2021 at 1:15:32 PM UTC+3, Anton Ertl wrote:
>> The problems with that approach are:
>>
>> * The sanitizer tends to be released at the same time or after the
>> breaking change, long after the legacy code.
>>
>> * The sanitizers perform run-time checks, so they will only discover
>> problems that occur during the test runs.
>>
>> * What's worse, not all sanitizers can be combined. So you need
>> multiple compilations and test runs with different sets of
>> sanitizers. And the code with the sanitizers compiled in is slow.
>>
>> So calling it "the best thing" to me sounds like a case of Stockholm
>> Syndrome.
>>
>> [Exception: Valgrind can be useful, and some sanitizers implement functionality
>> also present in valgrind, so that is a good feature.]
>
>We have automated testing and coverage for that.

Interesting. How much does that cost?

>Combined with sanitizers
>we can be sure that our application doesn't have UD defects.

Or at least you have done all you can. Some smart compiler maintainer
may already be working on the next "optimization" based on the
assumption that programs do not perform undefined behaviour, for some
other undefined behaviour that's not covered by the currently existing
sanitizers.

And 100% coverage for all the defensive programming stuff that should
never happen is hard to believe. Or do you leave it out, because it
should never happen?

>> >Each gcc/clang release is deeply investigated for performance regressions
>> >on sites like phoronix, so I don't believe that serious regressions exist.
>> Deeply investigated? Phoronix is just Michael Larabel, and covers
>> many other topics; what he does is to use the compiler on a set of
>> benchmarks and compare the performance results of that, on one CPU
>> (e.g.,
>> <https://www.phoronix.com/scan.php?page=article&item=gcc-12-alderlake&num=1>).
>> This only shows how the compiler performs on these benchmarks, on this
>> CPU.
>
>You can download phoronix-testsuite and run them yourself. Many people do this.

And it will still only show results for these benchmarks. And when
someone discovers a regression for some CPU, will that regression be
"deeply analysed" on phoronix? Even someone as high-profile as Agner
Fog seems to have trouble coming through when he discusses how Intel's
compiler or library always uses the slow path on non-Intel CPUs.

>> >There
>> >may be changes related to "new standard understanding" (like aliasing), but
>> >those are part of language evolution.
>> If breaking existing, tested production programs is part of language
>> evolution, that's a part we can do without.
>
>Wrong. To move forward it may be necessary to break something.

It may be. But in these cases it isn't, and they are not moving
forward.

>For legacy
>builds -- docker exist.

What if docker or the OS kernel take the same cavalier attitude
towards breaking changes?

>Now I'm starting to understand what happened to Forth. You cared about old
>programs so much, that missed the point when they become obsolete and
>vanished. And new ones didn't appear because you didn't care about them.

Actually I think the cavalier attitude towards breaking existing code
shown in Forth-83 standardization was quite hurtful for Forth's
popularity in the second half of the 1980s. Not sure if it was
decisive, but it sure did not help.

Otherwise, I don't remember any proposal for standardization that
really needed to break things. If lack of modern features is an
issue, the problem was that the need for such features was not seen
(by users, consequently by system implementors, and consequently by
standardization committee), or (to a lesser extent) that people could
not agree on a common approach towards such a feature, or that people
simply lost the drive to make proposals, not that some existing
feature was in the way.

However, I think that the main issue is that Forth throve on small
systems with little RAM (or ROM). Its advantages for such systems no
longer played a role when more RAM became available, and Forth
programmers sometimes were not so great at adapting to larger hardware
(Valdocs) and Forth advocates continued to preach that small is
beautiful, not how to make use of the additional capabilities to make
great applications.

But if you look at what happened to similar languages which embraced
the additional hardware capabilities, like Fifth (Paul Snow and Cliff
Click), Factor, Eight, and OForth, they seem to stay niche languages,
like Forth itself, rather than eclipsing Forth and displacing C++ or
Java.

minf...@arcor.de

unread,
Dec 18, 2021, 12:56:26 PM12/18/21
to
Of course. Although with the necessary memory references for a global
interpreter, one would have to add code for loading/unloading stack items
to local variables. But it is still impressive.

FWIW the equivalent MinForth code _with_ stack referencing is:
main: # @main
mov rdx, qword ptr [rip + sp]
lea rax, [rdx - 4]
mov ecx, dword ptr [rdx]
cmp dword ptr [rdx - 4], ecx
jge .LBB0_2
mov dword ptr [rax], ecx
.LBB0_2:
mov qword ptr [rip + sp], rax
xor eax, eax
ret
stk:
.zero 40

But to be fair, MAX in high-level code would compile worse to:
main:
mov rax, QWORD PTR sp[rip]
xor ecx, ecx
mov edx, DWORD PTR [rax]
mov esi, DWORD PTR [rax-4]
cmp edx, esi
mov DWORD PTR [rax+8], edx
setg cl
neg ecx
mov DWORD PTR [rax+4], ecx
cmp edx, esi
jle .L2
mov DWORD PTR [rax-4], edx
mov DWORD PTR [rax], esi
.L2:
sub rax, 4
mov QWORD PTR sp[rip], rax
xor eax, eax
ret
sp:
.quad stk
stk:
.zero 40

Anton Ertl

unread,
Dec 18, 2021, 1:03:55 PM12/18/21
to
Marcel Hendrix <m...@iae.nl> writes:
>> https://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf
>[..]
>
>Recommended! (I somehow missed that one.)
>
>Quote:
>"A tool could also check whether a load in an inner loop always
>produces the same result during a run, and, if so, suggest to the programmer
>to
>move the load out of the loop manually
...
>I wonder if this happens much in Forth programs.

I expect so.

If you have a loop

begin
...
a @
... while
b @
...
c @
...
... !
repeat

many Forth programmers (including me) consider it bad style to replace
this with

a @ b @ c @
... begin
...
6 pick
... while
4 pick
...
4 pick
...
... !
repeat

because it's hard to know what the PICKs refer to. Many Forth
programmers (excluding me) also consider it bad style to replace this
with

a @ b @ c @ {: a1 b1 c1 :}
begin
...
a1
... while
b1
...
c1
...
... !
repeat

and on many current Forth systems this will run slower (even if it's
easier to get this to run very fast than to get the first version to
run very fast).

Paul Rubin

unread,
Dec 18, 2021, 2:03:08 PM12/18/21
to
Nickolay Kolchin <nbko...@gmail.com> writes:
> Wrong. To move forward it may be necessary to break something.

Ehh not exactly. If the program is not broken you should not break it.
But if it is doing UB, it is already broken, and it can be necessary or
useful to make the brokenness present itself differently. The idea that
you are supposed to preserve the semantics of UB is just bizarre to me.
UB by definition is unpredictable and has no semantics.

I've been playing with an Ada program written in the 1980s and first I
cringed at how rigid it was, but then came to appreciate how bulletproof
it was. For one thing it compiled and worked immediately with current
GNAT.

I notice that the recent Pegasus spyware relied on an exploit of an
unsigned int overflow in an iOS messaging utility. That is not UB
(unsigned int overflow in C is defined as wraparound) but it is
mathematically silly (if x is an integer, x+1 should never be less than
x). Trying to use static analysis tools like Frama-C or SPARK to prove
this never happens in a program is enough hassle to only be worth it for
the most critical systems that can never crash.

Ada does a reasonable thing instead, which is to emit a runtime overflow
check and raise an exception on overflow (people unfortunately tend to turn that
off by enabling an unsafe optimization). Then the overflow bug causes
an obvious crash and diagnostic, instead of a silently exploitable
misbehaviour.
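
The same style of checking is available in C via the gcc/clang
overflow builtins (a sketch; checked_add is an invented name):

#include <stdio.h>
#include <stdlib.h>

/* Ada-style: detect the overflow and fail loudly instead of wrapping */
int checked_add(int a, int b)
{
    int r;
    if (__builtin_add_overflow(a, b, &r)) {
        fprintf(stderr, "integer overflow in checked_add\n");
        abort();
    }
    return r;
}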

It wouldn't surprise me if this bug has gotten people killed, as the
usual baddies deploy the spyware to monitor and round up their critics.

Paul Rubin

unread,
Dec 18, 2021, 2:11:24 PM12/18/21
to
Nickolay Kolchin <nbko...@gmail.com> writes:
> So, lets compare with modern language that actually cared about build
> speed during design. Let's take Go for example: on my machine, Go compile
> itself for 38s, line count ~2M.

Go is an exceptionally fast compiler. When I saw the claim about LLVM
compiling itself in 99s, I had to wonder what the hardware was: in
particular, how many cores were used in parallel, if more than 1, and
how that compared to what was used for the big Forth app. I've built
LLVM on my wimpy 4-core i7 a few times, and it took a lot longer than
99s, iirc.

Nickolay Kolchin

unread,
Dec 18, 2021, 2:12:12 PM12/18/21
to
On Saturday, December 18, 2021 at 8:53:50 PM UTC+3, Anton Ertl wrote:
> Nickolay Kolchin <nbko...@gmail.com> writes:
> >On Saturday, December 18, 2021 at 1:15:32 PM UTC+3, Anton Ertl wrote:
> >> The problems with that approach are:
> >>
> >> * The sanitizer tends to be released at the same time or after the
> >> breaking change, long after the legacy code.
> >>
> >> * The sanitizers perform run-time checks, so they will only discover
> >> problems that occur during the test runs.
> >>
> >> * What's worse, not all sanitizers can be combined. So you need
> >> multiple compilations and test runs with different sets of
> >> sanitizers. And the code with the sanitizers compiled in is slow.
> >>
> >> So calling it "the best thing" to me sounds like a case of Stockholm
> >> Syndrome.
> >>
> >> [Exception: Valgrind can be useful, and some sanitizers implement functionality
> >> also present in valgrind, so that is a good feature.]
> >
> >We have automated testing and coverage for that.
> Interesting. How much does that cost?

~120 Euro/month for an AMD 5950X with 128GB of RAM on Hetzner. But this
appeared only after we started working with LLVM/Clang. Before that, we
just had a number of old Atoms and RPis sitting in the corner. You just
put gitlab-runner on them and you are in business.

> >Combined with sanitizers
> >we can be sure that our application doesn't have UD defects.
> Or at least you have done all you can. Some smart compiler maintainer
> may already be working on the next "optimization" based on the
> assumption that programs do not perform undefined behaviour, for some
> other undefined behaviour that's not covered by the currently existing
> sanitizers.

You are absolutely correct -- "we have done all we can".

>
> And 100% coverage for all the defensive programming stuff that should
> never happen is hard to believe. Or do you leave it away, because it
> should never happen?

100% coverage is almost impossible. Only DO-178 projects go for that. 70/80% is
a more realistic estimate.

In a somewhat embellished version of reality, it works like this:

1. We have a running test system (this may involve building a test stand).
2. We have a number of tests that check that we satisfy the project requirements.
3. If bugs are found, they are first reproduced on the test system and stay there
forever, so we can be sure they don't reappear.

> >> >Each gcc/clang release is deeply investigated for performance regressions
> >> >on sites like phoronix, so I don't believe that serious regressions exist.
> >> Deeply investigated? Phoronix is just Michael Larabel, and covers
> >> many other topics; what he does is to use the compiler on a set of
> >> benchmarks and compare the performance results of that, on one CPU
> >> (e.g.,
> >> <https://www.phoronix.com/scan.php?page=article&item=gcc-12-alderlake&num=1>).
> >> This only shows how the compiler performs on these benchmarks, on this
> >> CPU.
> >
> >You can download phoronix-testsuite and run them yourself. Many people do this.
> And it will still only show results for these benchmarks. And when
> someone discovers a regression for some CPU, will that regression be
> "deeply analysed" on phoronix? Even someone as high-profile as Agner
> Fog seems to have trouble coming through when he discusses how Intel's
> compiler or library always uses the slow path on non-Intel CPUs.

There will be some noise about serious regressions, which leads to one of two
possible outcomes:

- Developers will fix the regression.
- We will be prepared and skip the problematic release.

In my experience, it takes less time to keep source code compatible with
current developer tools than to sit on one version for 10 years and then
try to build with modern tools.

> >> >There
> >> >may be changes related to "new standard understanding" (like aliasing), but
> >> >those are part of language evolution.
> >> If breaking existing, tested production programs is part of language
> >> evolution, that's a part we can do without.
> >
> >Wrong. To move forward it may be necessary to break something.
> It may be. But in these cases it isn't, and they are not moving
> forward.
> >For legacy
> >builds -- docker exist.
> What if docker or the OS kernel take the same cavalier attitude
> towards breaking changes?

We'll go to full virtualisation. As you know, I've just run ForthCMP on ARM.

> >Now I'm starting to understand what happened to Forth. You cared about old
> >programs so much, that missed the point when they become obsolete and
> >vanished. And new ones didn't appear because you didn't care about them.
> Actually I think the cavalier attitude towards breaking existing code
> shown in Forth-83 standardization was quite hurtful for Forth's
> popularity in the second half of the 1980s. Not sure if it was
> decisive, but it sure did not help.
>
> Otherwise, I don't remember any proposal for standardization that
> really needed to break things. If lack of modern features is an
> issue, the problem was that the need for such features was not seen
> (by users, consequently by system implementors, and consequently by
> standardization committee), or (to a lesser extent) that people could
> not agree on a common approach towards such a feature, or that people
> simply lost the drive to make proposals, not that some existing
> feature was in the way.
>

But at present, there are almost no users... Secondly, this is Forth:
compatibility with previous standards can be achieved by a simple
wordlist order replacement...

> However, I think that the main issue is that Forth throve on small
> systems with little RAM (or ROM). Its advantages for such systems no
> longer played a role when more RAM became available, and Forth
> programmers sometimes were not so great at adapting to larger hardware
> (Valdocs) and Forth advocates continued to preach that small is
> beautiful, not how to make use of the additional capabilities to make
> great applications.

This is part of evolution. You should adapt or die. I still think Forth
hasn't fully reached its potential. For example, Dart died, but was
resurrected by Flutter.

>
> But if you look at what happened to similar languages which embraced
> the additional hardware capabilities, like Fifth (Paul Snow and Cliff
> Click), Factor, Eight, and OForth, they seem to stay niche languages,
> like Forth itself, rather than eclipsing Forth and displacing C++ or
> Java.

I prefer to look at successful languages and at how they introduce breaking
changes:

- Fortran (let's pretend it is successful; at least it is the oldest active PL): Hey
guys, here is a new standard. It is completely different from the previous ones. Live
with it.

- C/C++: This is the new standard. Compatibility is the compiler developers' problem.

- Python: This is the new standard. We have a script that will help you port code
from the old one. It doesn't work most of the time, but at least we tried.

- Ada: Here is a new standard and military requirements. You have two weeks
to satisfy them (I actually don't know if Ada ever had breaking changes).

I mean that there is nothing dangerous in "breaking" changes. And yet you spend
years discussing word naming and put '{:' in the standard instead of the commonly
used '{', because of some commercial vendor.

Paul Rubin

unread,
Dec 18, 2021, 2:12:24 PM12/18/21
to
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> Compilers that model at compile time what happens on the stack at
> run-time, and use that to keep stack items in registers. Not sure who
> first coined the term.

This seems to make the Forth stack into an abstraction inversion.

Nickolay Kolchin

unread,
Dec 18, 2021, 2:42:30 PM12/18/21
to
On Saturday, December 18, 2021 at 10:03:08 PM UTC+3, Paul Rubin wrote:
> Nickolay Kolchin <nbko...@gmail.com> writes:
> > Wrong. To move forward it may be necessary to break something.
> Ehh not exactly. If the program is not broken you should not break it.
> But if it is doing UB, it is already broken, and it can be necessary or
> useful to make the brokenness present itself differently. The idea that
> you are supposed to preserve the semantics of UB is just bizarre to me.
> UB by definition is unpredictable and has no semantics.
>

Your point is valid, but I'm talking about different things. For example,
making STATE read-only instead of a variable, or removing it altogether.

> I've been playing with an Ada program written in the 1980s and first I
> cringed at how rigid it was, but then came to appreciate how bulletproof
> it was. For one thing it compiled and worked immediately with current
> GNAT.
>
> I notice that the recent Pegasus spyware relied on an exploit of an
> unsigned int overflow in an iOS messaging utility. That is not UB
> (unsigned int overflow in C is defined as wraparound) but it is
> mathematically silly (if x is an integer, x+1 should never be less than
> x). Trying to use static analysis tools like Frama-C or SPARK to prove
> this never happens in a program is enough hassle to only be worth it for
> the most critical systems that can never crash.
>

The cost of verification goes down each year. Since we now have access to
almost infinite computing power, we are able to brute-force many things.
And automatic provers like Z3 have also progressed greatly in recent years.

> Ada does a reasonable thing instead, which is to emit a runtime overflow
> check and raise an exception on overflow (people unfortunately tend to
> turn that off by enabling an unsafe optimization). Then the overflow bug causes
> an obvious crash and diagnostic, instead of a silently exploitable
> misbehaviour.

This is not that simple. You must also do some error recovery afterwards.
This can be very complex, introduce new bugs, and require additional months
of testing. With an automatic prover you are on a straight path, without
all these troubles.

Nickolay Kolchin

unread,
Dec 18, 2021, 3:31:11 PM12/18/21
to
On Saturday, December 18, 2021 at 7:06:31 PM UTC+3, Anton Ertl wrote:
> Nickolay Kolchin <nbko...@gmail.com> writes:
> >1. Forth "ambiguous condition" are no better than C "undefined behaviour".
> There are two reasons why it is better:
>
> a) C's undefined behaviour allows time travel, Forth's ambiguous
> conditions don't. So if a program does this and that and then runs
> into an ambiguous condition, it at least has to exhibit all the
> visible behaviour of this and that. E.g., even an adversarial
> standard-compliant Forth compiler could not compile something like the
> SATD example (from page 4 of
> <https://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf>)
> into an empty endless loop, while a conforming C implementation can.

I'm not sure that your example is valid. GCC had a bug there. The logic
must look like this:

- d[k]: k is in bounds
- k+=1: we don't care about k's value now.
- k < 16: we terminate the loop here, keeping k in bounds.

KForth may have this bug, though... if it implements that level of CFA. :)

>
> b) Forth implementations avoid such shenanigans. Of course that is
> weak. C implementations did not use to do such shenanigans, either,
> and then they started, produced lots of propaganda on why such
> shenanigans are desirable and proper, and that C never was meant
> differently, and now have a large following of devout believers in
> these claims. But maybe reason a) helps in making that path look less
> attractive.

I always have a good time watching the crypto guys cry when a new gcc
version breaks their time-based loops.
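
The classic case is a comparison written to run in constant time; a C
sketch of the pattern (illustrative, not any particular library's code):

#include <stddef.h>

/* Intended to take the same time whether the inputs differ at
   byte 0 or byte n-1. Nothing in the C standard obliges the
   compiler to keep this branch-free, so an optimizer that proves
   'acc' can be decided early is allowed to insert an early exit
   and quietly reintroduce the timing side channel. */
int ct_equal(const unsigned char *a, const unsigned char *b, size_t n)
{
    unsigned char acc = 0;
    for (size_t i = 0; i < n; i++)
        acc |= a[i] ^ b[i];
    return acc == 0;
}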

> >2. Formal Verified Forth doesn't exist. :)
> There has been some work in that direction, mainly from Bill Stoddart
> and his associates.

It looks purely academic. RVM has had no public releases?

Paul Rubin

unread,
Dec 18, 2021, 4:43:37 PM12/18/21
to
Nickolay Kolchin <nbko...@gmail.com> writes:
> - d[k]: k is in bounds
> - k+=1: we don't care about k's value now.
> - k < 16: we terminate the loop here, keeping k in bounds.

The code in question said:

int d[16];
int SATD (void)
{
    int satd = 0, dd, k;
    for (dd=d[k=0]; k<16; dd=d[++k]) {
        satd += (dd < 0 ? -dd : dd);
    }
    return satd;
}

Look at the iteration where k=15. This is less than 16, so the loop
body runs. At the end of the loop body, the increment dd=d[++k] runs.
That is, dd=d[16]. That is an OOB access, which is UB, so you are on
your own. What we really want here is checked arrays, but C doesn't
have them. That is, C sucks.
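
For reference, a sketch of the same loop with the test done before the
access, so k never reaches d[16] (one obvious rewrite, reusing the same
global d as above):

/* Bounds-safe version: check k first, then read d[k]. */
int SATD_fixed(void)
{
    int satd = 0;
    for (int k = 0; k < 16; k++) {
        int dd = d[k];
        satd += (dd < 0 ? -dd : dd);
    }
    return satd;
}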

I haven't tried Rust yet. In C++, I generally use vector.at(), which is
checked, instead of [], which is not. Ada was always supposed to check
stuff like this. Ada is ugly and clumsy in many ways, but there are
things to like about it.

Hugh Aguilar

unread,
Dec 18, 2021, 8:07:50 PM12/18/21
to
On Saturday, December 18, 2021 at 10:53:50 AM UTC-7, Anton Ertl wrote:
> However, I think that the main issue is that Forth throve on small
> systems with little RAM (or ROM). Its advantages for such systems no
> longer played a role when more RAM became available, and Forth
> programmers sometimes were not so great at adapting to larger hardware
> (Valdocs) and Forth advocates continued to preach that small is
> beautiful, not how to make use of the additional capabilities to make
> great applications.

There is no such word as: "throve." The word is: "thrived."
Anyway, what Anton Ertl says about Forth thriving in small-RAM systems
is total baloney. I suspect that he gets such nonsense from Elizabeth
Rather, who was not a programmer.

Forth was a terrible language for processors with a small RAM because
those processors (such as the 8051) typically only support
the direct addressing-mode (8-bit address) and do not support the
indexed addressing-mode, so they don't support the Forth data-stack very well.
This continues to be a problem today because the indexed addressing-mode is typically
a lot slower than the direct addressing-mode because it does an addition internally.
It is possible to store local variables in direct-memory and use the direct addressing-mode
(you have to make sure that functions in the same call-chain use different variables
for their locals so they don't overwrite each other's data, but this information is available
at compile-time) --- Walter Banks had a C compiler that did this, and there were others.
It would be difficult or impossible to write a Forth compiler to do this. This problem
can be fixed in an FPGA because it can have the entire Forth data-stack in registers and
it can transfer multiple registers into other registers to make room on the data-stack
or to consume an element from the data-stack (the data-stack is typically limited to
a depth of eight elements, which may be a problem, but the same idea could be done
with a deeper stack).
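
The allocation trick described above can be hand-simulated in C (a toy
sketch of the idea only, not Walter Banks' actual compiler output; the
names are invented): functions that can never be live on the same call
chain share the same fixed "direct page" addresses for their locals.

/* leaf_a and leaf_b never call each other and are never live at
   the same time, so their 'locals' may share slot0. parent is
   live around both calls, so it needs its own slot1. A compiler
   derives this overlay layout from the call graph at compile
   time; no stack and no indexed addressing required. */
static unsigned char slot0;   /* shared by leaf_a and leaf_b */
static unsigned char slot1;   /* reserved for parent */

unsigned char leaf_a(unsigned char x) { slot0 = x;     return slot0 + 1; }
unsigned char leaf_b(unsigned char x) { slot0 = x * 2; return slot0 - 1; }

unsigned char parent(unsigned char x)
{
    slot1 = x;                              /* parent's local */
    return leaf_a(slot1) + leaf_b(slot1);
}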

What I remember from the late 1980s is that Forth was a terrible fit for most processors
because they had too few registers. C uses one register for both the data-stack and the
return-stack, but Forth uses separate registers for the data-stack and the return-stack.
The MC6809 had its U and S registers, so it was a good choice for Forth. The MC6809
became obsolete though and was replaced by the MC68HC08 and MC68HC12, both of which
were obviously designed for C and not for Forth.

In regard to desktop computers, Forth would be a better choice than C given my
rquotations --- the Forth-200x committee failed to include rquotations, and all they have
are the Paysan-faked quotations that are just syntactic sugar for :NONAME and are
pretty much worthless for supporting general-purpose data-structures. My rquotations
are more important to the future of Forth on desktop computers than the entire
Forth-200x committee put together. For Forth to succeed, it is necessary to have
rquotations, but the Forth-200x committee can be discarded as less than worthless.

dxforth

unread,
Dec 18, 2021, 9:11:57 PM12/18/21
to
On 19/12/2021 12:07, Hugh Aguilar wrote:
> ...
> Forth was a terrible language for processors with a small RAM because
> those processors (such as the 8051) typically only support
> the direct addressing-mode (8-bit address) and do not support the
> indexed addressing-mode, so they don't support the Forth data-stack very well.

Early microcontrollers (ROM, RAM, and CPU on one chip) were severely
resource-limited. If you wanted to run an HLL on them, you accepted the
resulting performance. BASIC interpreters were all the rage at the time,
so Intel created one for the 8051 series (MCS BASIC-52). Inevitably Forth
got ported too, but perhaps the most interesting port was Myforth:

http://www.kiblerelectronics.com/myf/myf.shtml

Nickolay Kolchin

unread,
Dec 18, 2021, 9:32:17 PM12/18/21
to
On Sunday, December 19, 2021 at 12:43:37 AM UTC+3, Paul Rubin wrote:
> Nickolay Kolchin <nbko...@gmail.com> writes:
> > - d[k]: k is in bounds
> > - k+=1: we don't care about k's value now.
> > - k < 16: we terminate the loop here, keeping k in bounds.
> The code in question said:
>
> int d[16];
> int SATD (void)
> {
>     int satd = 0, dd, k;
>     for (dd=d[k=0]; k<16; dd=d[++k]) {
>         satd += (dd < 0 ? -dd : dd);
>     }
>     return satd;
> }
>
> Look at the iteration where k=15. This is less than 16, so the loop
> body runs. At the end of the loop body, the increment dd=d[++k] runs.
> That is, dd=d[16]. That is an OOB access, which is UB, so you are on
> your own. What we really want here is checked arrays, but C doesn't
> have them. That is, C sucks.
>

Oops, my bad. You are right. I misread the preincrement as a postincrement.

Ron AARON

unread,
Dec 18, 2021, 11:40:21 PM12/18/21
to


On 19/12/2021 3:07, Hugh Aguilar wrote:
> On Saturday, December 18, 2021 at 10:53:50 AM UTC-7, Anton Ertl wrote:
>> However, I think that the main issue is that Forth throve on small
>> systems with little RAM (or ROM). Its advantages for such systems no
>> longer played a role when more RAM became available, and Forth
>> programmers sometimes were not so great at adapting to larger hardware
>> (Valdocs) and Forth advocates continued to preach that small is
>> beautiful, not how to make use of the additional capabilities to make
>> great applications.
>
> There is no such word as: "throve." The word is: "thrived."

Maybe you should consult a dictionary first, before opening your trap.

dxforth

unread,
Dec 18, 2021, 11:46:13 PM12/18/21
to
IIUC that code is defective; in which case, why should it matter if a dumb
compiler generates one thing and a smart compiler something else? Garbage
in, garbage out.