
Are stack machines inherently faster?


polymorph self

Nov 13, 2016, 9:33:26 PM
I am noticing 15,000 context switches per second on my Debian testing Linux 4.7 six-CPU AMD FX machine.

Are stack chips MISC (minimal instruction set computers), and thus, with fewer instructions, does a 300 MHz stack machine get a lot more done per cycle? Or does it do less work to accomplish the same thing?

rickman

Nov 13, 2016, 11:02:32 PM
No.

--

Rick C

polymorph self

Nov 14, 2016, 8:00:01 PM
MISC, having few instructions to get things done,
sounds like less
gets more done

rickman

Nov 14, 2016, 10:27:53 PM
No.

--

Rick C

Ilya Tarasov

Nov 17, 2016, 9:26:52 AM
> > misc having few instructions to get things done
> > sounds like less
> > gets more done
>
> No.

Yes :)

foxaudio...@gmail.com

Nov 17, 2016, 11:08:08 AM
Customer: "That's not an argument, that's just contradiction."
Counselor: "No it isn't."
Monty Python, The Argument


From what I remember of Koopman's Stack Computers, it really depends on what
problem you are trying to solve. Stack machines can be remarkably
faster at servicing a simple hardware interrupt, but register machines can win
the day in general computation benchmarks and so on.

Disclaimer: I did not check the text.

My own experience is that in real-world projects even ITC Forth performed very
well, if you allow for a hand-coded primitive or two or three; assembler can
even be eliminated by inlining Forth primitives without NEXT, where that is
possible in critical inner loops.

IMHO Forth and Forth CPUs seem to be better in practice than theory would
indicate, but I don't have the studies to prove that.

BF
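Fox's "inlining Forth primitives without NEXT" is easier to picture with a toy model of an indirect-threaded inner interpreter. This is a sketch in Python, not any real Forth system; the names and the thread representation are invented for illustration:

```python
# Toy model of an indirect-threaded (ITC) inner interpreter.
# Every primitive returns to the dispatch loop below -- that loop is NEXT.
# Inlining a primitive "without NEXT" means splicing its body into the
# caller, so this per-word dispatch disappears.

def run(thread, stack):
    """Execute a thread (list of primitive references) one cell at a time."""
    ip = 0                       # instruction pointer into the thread
    while ip < len(thread):      # fetch / advance / execute: this is NEXT
        word = thread[ip]
        ip += 1
        word(stack)
    return stack

def dup(stack):                  # Forth DUP
    stack.append(stack[-1])

def plus(stack):                 # Forth +
    stack.append(stack.pop() + stack.pop())

# ": double  dup + ;" compiled as a thread of primitive references
double = [dup, plus]
```

Here `run(double, [21])` leaves `[42]`; a native-code compiler would instead emit the bodies of DUP and + back to back, with no run loop at all.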

m...@iae.nl

Nov 17, 2016, 1:52:47 PM
On Thursday, November 17, 2016 at 5:08:08 PM UTC+1, foxaudio...@gmail.com wrote:
[..]
> My own experience is that in real world projects, even ITC Forth
> performed very well, if you allow for a hand coded primitive or
> two or three and assembler can even be eliminated by inlining Forth
> primitives without next where that is
> possible in critical inner loop stuff.

It depends on what you call "very well." I would say that you're
describing a popular Forth myth. It is not true.

There have been (very, very long ago), CLF threads that posed
problems that piqued people's interest and resulted in a flurry
of smart algorithmic tricks that would have been a lot more
cumbersome to come up with, design, and test in other languages.

-marcel

Elizabeth D. Rather

Nov 17, 2016, 2:57:37 PM
On 11/17/16 8:52 AM, m...@iae.nl wrote:
> On Thursday, November 17, 2016 at 5:08:08 PM UTC+1, foxaudio...@gmail.com wrote:
> [..]
>> My own experience is that in real world projects, even ITC Forth
>> performed very well, if you allow for a hand coded primitive or
>> two or three and assembler can even be eliminated by inlining Forth
>> primitives without next where that is
>> possible in critical inner loop stuff.
>
> It depends on what you call "very well." I would say that you're
> describing a popular Forth myth. It is not true.

Well, it was my experience as well. But in a real world project, good
design of the software is a far greater determinant of performance than
anything else.

> There have been (very, very long ago), CLF threads that posed
> problems that piqued people's interest and resulted in a flurry
> of smart algorithmic tricks that would have been a lot more
> cumbersome to come up with, design, and test in other languages.
>
> -marcel
>

Agreed.



--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

hughag...@gmail.com

Nov 17, 2016, 3:50:17 PM
On Thursday, November 17, 2016 at 9:08:08 AM UTC-7, foxaudio...@gmail.com wrote:
> My own experience is that in real world projects, even ITC Forth performed very
> well, if you allow for a hand coded primitive or two or three...

This works well with a fat Standard, because you have more primitives.

Straight Forth is a fat Standard because it includes two data-structures: the chain and the array. ANS-Forth is a thin Standard because it doesn't include any data-structures, so all data-structures have to be custom written in Forth by the application programmer. This is why I'm expecting the ITC version of Straight Forth to generate code that is about twice the speed of SwiftForth code (SwiftForth is STC but has very little optimization) --- of course, I'm also expecting that 99% of Straight Forth programs will use the built-in data-structures --- if you need a different data-structure, and you write it yourself in Forth, then the fact that the Forth code is ITC is going to cause it to be slow (as compared to hand-written assembly-language).

Also, despite the fact that Straight Forth is a fat Standard and ANS-Forth is a thin Standard, I expect Straight Forth to be smaller than ANS-Forth --- this is because I discard all of the weird and useless cruft that is in ANS-Forth --- for example, DO loops get discarded, and I don't care if they were invented by Charles Moore in the 1970s and everybody expects them to be part of any Forth.

Paul Rubin

Nov 17, 2016, 6:58:35 PM
m...@iae.nl writes:
>> My own experience is that in real world projects, even ITC Forth
>> performed very well, if you allow for a hand coded primitive ...
> It depends on what you call "very well." I would say that you're
> describing a popular Forth myth. It is not true.

I don't know about Foxaudio, but I'd say "very well" means the ITC
performance is enough to satisfy users' real-world requirements. Most
of my stuff these days is written in Python and it's fast enough for my
purposes, and Gforth-itc is probably >10x faster than Python. I've
never written a Forth program that wasn't a benchmark, that used enough
computation time for me to notice.

Even big problems that need a lot of computation can often be
parallelized. Computers are cheap and it's often better business to
just buy more of them, than to increase software development costs or
decrease maintainability.
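The "buy more computers" strategy works whenever the per-item work is independent. A minimal sketch, with an invented stand-in workload, using threads only to show the structure (CPU-bound Python would need processes or separate machines):

```python
# If each unit of work is independent, it can be farmed out to a pool of
# workers with no change to the algorithm itself.
from concurrent.futures import ThreadPoolExecutor

def cost(n):
    """Stand-in for one expensive, independent unit of work."""
    return sum(i * i for i in range(n))

def serial(jobs):
    return [cost(n) for n in jobs]

def parallel(jobs, workers=4):
    # same results, but the jobs run concurrently across the pool
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(cost, jobs))
```

The point is structural: `parallel` needs no new algorithmic insight, only more workers.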

> CLF threads that posed problems that piqued people's interest and
> resulted in a flurry of smart algorithmic tricks

Sure, good algorithms can produce a far bigger speedup than careful
optimization of dumb algorithms can possibly hope to.

> that would have been a lot more cumbersome to come up with, design,
> and test in other languages.

I'd be interested in seeing an example.

foxaudio...@gmail.com

Nov 17, 2016, 7:29:27 PM
That's what I should have said, Paul.

Thank you.

BF

john

Nov 18, 2016, 3:28:31 AM
In article <87vavld...@nightsong.com>, no.e...@nospam.invalid says...
> Even big problems that need a lot of computation can often be
> parallelized. Computers are cheap and it's often better business to
> just buy more of them, than to increase software development costs or
> decrease maintainability.
>

To some degree that is something I've been considering myself.
But care needs to be taken to stop it becoming habitual or inappropriate.

Can I take this opportunity to let you all know that Jim - who
has been staying with me and managing my web site
for the last couple of weeks and posting here - has had some bad news
and had to fly home.
If anyone was expecting replies from him, that will be why he isn't
responding. I'm expecting him back in a couple of weeks.


If you want to know what we are working on, I've put a preliminary
release on the web site (library and Jim's forum). As I've told Jim, any
input from readers of this group would be more than welcome.

john


=========================
http://johntech.co.uk

"Bleeding Edge Forum"
http://johntech.co.uk/forum/

=========================

Stephen Pelc

Nov 18, 2016, 6:17:01 AM
On Thu, 17 Nov 2016 15:58:33 -0800, Paul Rubin
<no.e...@nospam.invalid> wrote:

>I don't know about Foxaudio, but I'd say "very well" means the ITC
>performance is enough to satisfy users' real-world requirements. Most
>of my stuff these days is written in Python and it's fast enough for my
>purposes, and Gforth-itc is probably >10x faster than Python. I've
>never written a Forth program that wasn't a benchmark, that used enough
>computation time for me to notice.

One of our clients has a recalculation that used to take 15 or so
seconds on our DTC ProForth for Windows. The boss excitedly took
me to his office to show that the conversion to VFX Forth took
about 1.5 seconds. To him, that was important.

These days, there are plenty of algorithms that do not follow the
"90% of the time in 10% of the code" model.

Stephen

--
Stephen Pelc, steph...@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads

foxaudio...@gmail.com

Nov 18, 2016, 9:37:37 AM
That's a better ratio of improvement over DTC on Intel than I would have
expected, Stephen, so VFX is quite efficient.

I have a question.

Let me begin with the fact that I have nothing but praise for the excellent
work MPE has done in pushing the envelope in native code generation with Forth
compilers. Hear, hear.

I find myself wondering what you would have done with that client if they had
requested a speedup on that recalculation and you had NOT developed VFX.
(I am assuming they were prepared to pay for the work.)

Could you have hand coded a few time critical/inner loop routines and achieved those results?

It's a moot point now since you have invested the time to create VFX but I am
curious nevertheless.


BF





Stephen Pelc

Nov 18, 2016, 10:30:16 AM
On Fri, 18 Nov 2016 06:37:35 -0800 (PST), foxaudio...@gmail.com
wrote:

>That's a better ratio of improvement over DTC in intel than I would have
>expected Stephen, so VFX is quite efficient.

Our numbers suggest (it was a long time ago, and CPUs have changed)
that going from DTC to STC gains you about 2.2:1. Adding fairly plain
vanilla optimisation gains you about another 2:1 by simple inlining.
Beyond this, you are in the territory of lots of grunt work. Both VFX
and iForth also tokenise small words and expand them inline, which can
pay big dividends, depending very much on CPU and coding style.
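Stephen's DTC-to-STC-to-inlining progression can be sketched abstractly. This Python model (all names invented) only mimics the machine-level dispatch that the real techniques remove, but it shows what each step eliminates:

```python
# Three renderings of ": square  dup * ;" with progressively less dispatch.

def dup(s):  s.append(s[-1])               # primitive DUP
def star(s): s.append(s.pop() * s.pop())   # primitive *

# Threaded: data (a table of word references) drives a dispatch loop.
SQUARE_THREAD = (dup, star)
def square_threaded(s):
    for word in SQUARE_THREAD:             # one indirect dispatch per word
        word(s)
    return s

# Subroutine-threaded (STC): the compiler emits direct calls instead.
def square_stc(s):
    dup(s)
    star(s)
    return s

# Inlined: word bodies expanded in place; no calls remain, and an
# optimiser can now simplify across the former word boundaries.
def square_inlined(s):
    x = s.pop()
    s.append(x * x)
    return s
```

All three leave the same result on the stack; the differences are purely in how much dispatch overhead stands between the primitives.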

>I find myself wondering what you would have done with that client if they had
>requested a speedup on that recalculation and you had NOT developed VFX?
>( I am assuming they were prepared to pay for the work)

We could have spent several months writing code routines for the
recalculation engine (effectively a spreadsheet), but it would still
not have been as fast as the VFX solution. Sometimes, the correct
solution to a coding problem is to rewrite the compiler. The MPE
benchmark suite is on the MPE website. We gave up benchmarking
ITC Forths some years ago when VFX was more than 12 times faster
than the popular ITC Forth.

The good business decisions we made in developing VFX were to decide
to do it properly and get funding for the development.

>Could you have hand coded a few time critical/inner loop routines and
> achieved those results?

Probably not. Remember also that assembler code is harder to write and
to maintain, and affects both cost and time to market. As I keep
pointing out, I haven't written an interrupt routine in assembler for
an ARM for about 15 years ... except to prove that I can still do it.
A good code generator is an enabling technology.

m...@iae.nl

Nov 18, 2016, 1:58:06 PM
In the past I posted on pinholed eForth and mxForth. When developing these I enabled specific optimizations with options so that I could test all the combinations. These postings back up Stephen's numbers. They also back up the observation that at some point the Forth does not get faster any more and one needs to overhaul the code generator from scratch.

When I have some time again I will give iForth speculative threads and rewrite the code generator for AVX2. But first there will be SPICE written in Forth :-)

-marcel


HAA

Nov 18, 2016, 7:30:01 PM
Stephen Pelc wrote:
> On Thu, 17 Nov 2016 15:58:33 -0800, Paul Rubin
> <no.e...@nospam.invalid> wrote:
>
> >I don't know about Foxaudio, but I'd say "very well" means the ITC
> >performance is enough to satisfy users' real-world requirements. Most
> >of my stuff these days is written in Python and it's fast enough for my
> >purposes, and Gforth-itc is probably >10x faster than Python. I've
> >never written a Forth program that wasn't a benchmark, that used enough
> >computation time for me to notice.
>
> One of our clients has a recalculation that used to take 15 or so
> seconds on our DTC ProForth for Windows. The boss excitedly took
> me to his office to show that the conversion to VFX Forth took
> about 1.5 seconds. To him, that was important.
>
> These days, there are plenty of algorithms that do not follow the
> "90% of the time in 10% of the code" model.
>
> Stephen

No doubt your client could also demonstrate that VFX is faster than
SwiftForth. Yet Forth Inc still exists and presumably gets work.
The model which states 'fastest is best' appears not to hold.
What happens when 1.5 seconds is not good enough? Discard Forth
for C? That's not unreasonable either. CPU, clock, language, facilities:
these I rank above compiler technology. Get those wrong and it won't
matter whether it's DTC or NCC.



Elizabeth D. Rather

Nov 18, 2016, 11:18:29 PM
SwiftForth is substantially faster than ITC Forths. Its compiler isn't
as complex as VFX, but it delivers satisfactory performance. Most of our
users say they prefer SwiftForth's user interface, particularly as a
host for our interactive cross compilers.

Cheers,
Elizabeth

Paul Rubin

Nov 18, 2016, 11:27:12 PM
ste...@mpeforth.com (Stephen Pelc) writes:
> One of our clients has a recalculation that used to take 15 or so
> seconds on our DTC ProForth for Windows. The boss excitedly took
> me to his office to show that the conversion to VFX Forth took
> about 1.5 seconds. To him, that was important.

Of course this happens sometimes. In the programs that most people I
know work on, it doesn't happen often: the resource constraints are
rarely computational.

In the application you mentioned, if it's spreadsheet-like, I wonder
whether algorithmic improvements (like finding the dependency graph
between formulas and updating only the necessary cells after a change)
might have helped more than better code generation. OTOH you may
already have been doing that.
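The dependency-graph idea can be sketched quickly. This is a hypothetical toy, not the client's application; the cell names and formula representation are invented:

```python
# Toy spreadsheet that recomputes only cells downstream of a change,
# instead of recalculating the whole sheet.
from collections import defaultdict

class Sheet:
    def __init__(self):
        self.value = {}                     # cell name -> cached value
        self.formula = {}                   # cell name -> (func, input cells)
        self.dependents = defaultdict(set)  # cell name -> cells that read it

    def set_value(self, cell, v):
        self.value[cell] = v
        self._recalc_downstream(cell)       # touch only affected cells

    def set_formula(self, cell, func, inputs):
        self.formula[cell] = (func, inputs)
        for i in inputs:
            self.dependents[i].add(cell)
        self._recalc(cell)
        self._recalc_downstream(cell)

    def _recalc(self, cell):
        func, inputs = self.formula[cell]
        self.value[cell] = func(*(self.value[i] for i in inputs))

    def _recalc_downstream(self, start):
        # walk the dependency graph (assumed acyclic)
        pending = list(self.dependents[start])
        while pending:
            cell = pending.pop()
            self._recalc(cell)
            pending.extend(self.dependents[cell])
```

After defining `b1` as ten times `a1`, changing `a1` recomputes only `b1`; a naive recalculation would visit every formula in the sheet on every change.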

Paul Rubin

Nov 19, 2016, 3:31:15 AM
"HAA" <som...@microsoft.com> writes:
> What happens when 1.5 seconds is not good enough? Discard Forth for
> C? That's not unreasonable either. CPU, clock, language, facilities,
> I rank greater than compiler technology. Get those wrong and it won't
> matter whether it's DTC or NCC.

I have to wonder what kind of hardware Stephen's client was running DTC
Forth on. If it's old enough then a 10x or more speedup might be had by
just upgrading it.

VFX code output should run at comparable speed to the output of a decent
C compiler. A fancy optimizing C compiler might get a little more
speedup, but the next thing to pursue is more likely parallelism, GPU's,
etc. That assumes that the algorithm is already decent, the existing
host PC is reasonably fast by today's standards, etc.

Stephen Pelc

Nov 19, 2016, 12:42:25 PM
On Sat, 19 Nov 2016 11:29:47 +1100, "HAA" <som...@microsoft.com>
wrote:

>No doubt your client could also demonstrate that VFX is faster than
>SwiftForth. Yet Forth Inc still exists and presumably gets work.
>The model which states 'fastest is best' appears not to follow.
>What happens when 1.5 seconds is not good enough? Discard Forth
>for C? That's not unreasonable either. CPU, clock, language, facilities,
>I rank greater than compiler technology. Get those wrong and it won't
>matter whether it's DTC or NCC.

I was asked a question about performance and I answered it. If
you want to compare performance between two Forth systems, try
downloading the MPE benchmark suite ... and using it.

That performance is just one factor in tool chain selection
is obvious. Another is the GUI. Another is the level of library
support. And so on, and so on.

It is also a fallacy to assume that the commercial Forth
vendors are in competition with each other. We are very rarely
in competition for the same client. Our competition is the
toolchains for other languages.

Anton Ertl

Nov 20, 2016, 11:50:19 AM
Paul Rubin <no.e...@nospam.invalid> writes:
>VFX code output should run at comparable speed to the output of a decent
>C compiler.

Quite doubtful, because a decent C compiler performs "global" (i.e.,
intraprocedural, but across basic block boundaries) register
allocation, while VFX performs only local register allocation (and
only for data stack items AFAICS).

But let's see. I'll use the benchmarks in
<http://www.complang.tuwien.ac.at/forth/bench.zip>, because they come
in both Forth and C versions (plus another C version produced by
compiling the Forth version with forth2c):

As C compiler I use gcc-3.4.0 -m32 -O (using gcc-4.9 for invoking the
linker), the oldest gcc I could get to run without much ado, and the
lowest optimization option that includes register allocation.

Note that the startup overhead including compilation of the benchmark
is 12M-13M cycles and 9M-10M instructions on VFX; you can subtract
that from the VFX numbers in the following, but it does not change
much.

cycles on Skylake (Core i7-6700K):
fib sieve bubble-sort matrix-mult
103,918,294 90,766,932 130,074,697 37,612,032 gcc-3.4.0
114,757,879 208,334,551 250,370,641 81,795,422 vfxlin
101,972,847 120,005,096 117,031,810 27,122,738 forth2c+gcc-3.4.0

instruction counts (might provide insight into the cycle numbers):
fib sieve bubble-sort matrix-mult
277,239,469 158,698,839 136,036,718 91,708,093 gcc-3.4.0
193,834,212 190,441,316 313,932,162 173,199,451 vfxlin
304,920,087 272,868,329 144,091,955 107,261,137 forth2c+gcc-3.4.0

Discussion:

VFX does nicely on fib, with only the startup+compile overhead as
difference, but is significantly slower than gcc and forth2c+gcc on
the other benchmarks.

Forth2c is relatively close to the manually written C code in cycles,
although it has some overhead in instructions; the speedup over
c-manual in matrix-mult may by due to inner-product() returning its
result by writing to a passed pointer in c-manual, while the forth
version passes the result on the stack (and c-generated therefore
passes it in a register). The forth2c speedup over c-manual for
bubble-sort appears due to a freak slowdown for c-manual/bubble-sort
when enabling -fomit-frame-pointer that does not affect
c-generated/bubble-sort.

Conclusion: global register allocation pays off.

BTW, forth2c is neither commercial nor proprietary. You can download
it at <http://www.complang.tuwien.ac.at/forth/forth2c.tar.gz>. It's
proof-of-concept software, that's 20 years old, however.

One other BTW is that the C code that compiled fine 20 years ago
needed some massaging to compile today, while the Forth programs just
compiled fine. The C problems mainly have to do with the way that
stan.c (the Stanford benchmarks) are written, but the kind of failing
I saw is pretty C-specific, so a part of the blame is also in the
language design. Those who want to experience the failure can get the
old version at <http://www.complang.tuwien.ac.at/forth/bench.tar.gz>.

The command lines I used are:

#in c-manual:
for i in fib sieve bubble-sort matrix-mult; do gcc-3.4.0 -fomit-frame-pointer -m32 -O -DUNIX -c $i.c; gcc -m32 $i.o -o $i; echo $i; perf stat -B -r100 -e cycles -e instructions $i >/dev/null; done
#in forth:
for i in fib sieve bubble-sort matrix-mult; do echo $i; perf stat -B -r100 -e cycles -e instructions vfxlin "include $i.fs bye" >/dev/null; perf stat -B -r100 -e cycles -e instructions vfxlin "include $i.fs main bye" >/dev/null; done
#in c-generated:
for i in fib sieve bubble-sort matrix-mult; do gcc-3.4.0 -fomit-frame-pointer -m32 -O -DUNIX -c $i.c; gcc -m32 $i.o -o $i; echo $i; perf stat -B -r100 -e cycles -e instructions $i >/dev/null; done

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2016: http://www.euroforth.org/ef16/

Paul Rubin

Nov 21, 2016, 2:16:46 AM
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> VFX does nicely on fib, with only the startup+compile overhead as
> difference, but is significantly slower than gcc and forth2c+gcc on
> the other benchmarks.

Thanks, that's more of a difference than I expected. Can you try -O0
and -Os ?

> BTW, forth2c is neither commercial nor proprietary.

I may have mentioned it before, but here's another backend that might
interest you: http://c9x.me/compile/

hughag...@gmail.com

Nov 21, 2016, 5:26:19 PM
On Saturday, November 19, 2016 at 10:42:25 AM UTC-7, Stephen Pelc wrote:
> It is also a fallacy to assume that the commercial Forth
> vendors are in competition with each other. We are very rarely
> in competition for the same client. Our competition is the
> toolchains for other languages.

Your goal in supporting ANS-Forth seems to be to make the Forth community look stupid. Good job! Having the parameters backwards in LOCALS| was, by itself, the killing stroke for the entire Forth community.

You are not in competition with Forth Inc.. I think you rely on them to do your sales, then when they find a customer they outsource the actual programming to you, so you both make money.

You are in competition with Forth programmers such as myself. The whole purpose of ANS-Forth is to allow you to tell everybody in the Forth community that they are non-standard unless they adhere to the idiotic restrictions of Elizabeth Rather's toy language.

You don't adhere to ANS-Forth though. You routinely abandon ANS-Forth compatibility, even when doing so is not necessary. For example, in regard to SYNONYM and ALIAS, you told me (quite condescendingly) to RTFM. You meant the VFX manual, where you provide a SYNONYM that is VFX-specific and would get me trapped in vendor lock-in. I wrote SYNONYM and ALIAS in ANS-Forth though --- it was easy --- it did require me to have the disambiguifiers already installed.

Here is the code, so RTFM yourself:

: :synonym { xt flg str wid -- } \ FLG is 1 for immediate and -1 for non-immediate
str wid :name
flg 1 = if xt lit, execute, ;, immediate exit then
flg -1 = if xt compile, ;, exit then
true abort" *** :SYNONYM given an invalid xt ***" ;

: :synonym-fast-check ( -- )
state @ 0= abort" *** a word created by :SYNONYM-FAST can't be used in interpretive mode ***" ;

: :synonym-fast { xt flg str wid -- } \ FLG is 1 for immediate and -1 for non-immediate
str wid :name
flg 1 = if xt lit, execute, ;, immediate exit then
flg -1 = if postpone :synonym-fast-check
xt lit, postpone compile, ;, immediate exit then
true abort" *** :SYNONYM-FAST given an invalid xt ***" ;

: 'find ( -- xt flg ) \ stream: name \ FLG is 1 for immediate and -1 for non-immediate
bl word find dup 0= abort" *** 'FIND couldn't find the word ***" ;

: synonym { wid | new -- } \ stream: new-name old-name \ the new word is compiled into the WID word-list
bl word hstr to new
'find new wid :synonym
new dealloc ;

: synonym-fast { wid | new -- } \ stream: new-name old-name \ the new word is compiled into the WID word-list
bl parse <hstr> to new
'find new wid :synonym-fast
new dealloc ;

\ :SYNONYM-FAST generates faster executing code than :SYNONYM but the words can't be used in interpretive mode.
\ This may not be necessary with a good optimizing compiler, but I'm not aware of any at this time.

1234512345 constant alias-id \ an arbitrary number used to identify alias'd definitions

0
w field alias.xt \ this should be the first field so the DOES> portion of ALIAS will be fast (no addition needed)
w field alias.adr \ the address of the body, used to identify alias'd definitions
w field alias.id \ the constant ALIAS-ID, used to identify alias'd definitions
constant alias-struct

: alias ( wid -- ) \ stream: new-name old-name \ the new word is compiled into the WID word-list
get-current swap set-current create set-current
here >r alias-struct allot
'find 1 = if immediate then r@ alias.xt !
r@ r@ alias.adr !
alias-id r@ alias.id !
rdrop
does>
alias.xt @ execute ;

\ ALIAS does the same thing as SYNONYM but has the advantage that >BODY will work on it.
\ It is slower executing though (especially under SwiftForth in which CREATE DOES> words are very inefficient).

: >body ( xt -- adr )
>body >r
r@ alias.adr @ r@ = if r@ alias.id @ alias-id = if \ is this an alias'd word?
r> alias.xt @ >body exit then then \ return the body of the original word
r> ; \ return the body of this word

\ Note that >BODY still has an undefined result if used on a word that wasn't defined with CREATE or isn't an alias of such a word.

hughag...@gmail.com

Nov 21, 2016, 5:35:25 PM
On Saturday, November 19, 2016 at 10:42:25 AM UTC-7, Stephen Pelc wrote:
> That performance is just one factor in tool chain selection
> is obvious. Another is the GUI. Another is the level of library
> support. And so on, and so on.

This is what Elizabeth Rather says:
--------------------------------------------------------------------------
...in Forth it's so easy to build data structures that are exactly right for
the particular application at hand that worrying about what pre-built
structures you have and how to use them is just not worth the bother.
--------------------------------------------------------------------------

What "level of library support" does ANS-Forth have? Zilch! Elizabeth Rather has spent her entire career railing against code-libraries. There are no code libraries.

> It is also a fallacy to assume that the commercial Forth
> vendors are in competition with each other. We are very rarely
> in competition for the same client. Our competition is the
> toolchains for other languages.

What "toolchain" does ANS-Forth have? You can't have toolchains without having code-libraries.

I think you see me and my novice-package as your competition --- because this package is provided to allow common Forth programmers to write application programs --- this is what you and Elizabeth Rather are primarily opposed to.

HAA

Nov 22, 2016, 1:43:38 AM
Stephen Pelc wrote:
> On Sat, 19 Nov 2016 11:29:47 +1100, "HAA" <som...@microsoft.com>
> wrote:
>
> >No doubt your client could also demonstrate that VFX is faster than
> >SwiftForth. Yet Forth Inc still exists and presumably gets work.
> >The model which states 'fastest is best' appears not to follow.
> >What happens when 1.5 seconds is not good enough? Discard Forth
> >for C? That's not unreasonable either. CPU, clock, language, facilities,
> >I rank greater than compiler technology. Get those wrong and it won't
> >matter whether it's DTC or NCC.
>
> I was asked a question about performance and I answered it.

You responded to a comment in which the poster considered ITC
performance good enough with an 'apples and oranges' speed
comparison of DTC vs. NCC.

> If
> you want to compare performance between two Forth systems, try
> downloading the MPE benchmark suite ... and using it.

No thanks. Your prospective clients may wish to do so.



Anton Ertl

Nov 22, 2016, 5:22:41 AM
Paul Rubin <no.e...@nospam.invalid> writes:
>an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>> VFX does nicely on fib, with only the startup+compile overhead as
>> difference, but is significantly slower than gcc and forth2c+gcc on
>> the other benchmarks.
>
>Thanks, that's more of a difference than I expected. Can you try -O0
>and -Os ?

cycles on Skylake (Core i7-6700K):
fib sieve bubble-sort matrix-mult
114,757,879 208,334,551 250,370,641 81,795,422 vfxlin
103,918,294 90,766,932 130,074,697 37,612,032 gcc-3.4.0 -O
110,917,296 142,023,925 127,515,900 40,082,593 gcc-3.4.0 -Os
117,783,304 282,271,421 230,553,817 72,302,347 gcc-3.4.0 -O0
101,972,847 120,005,096 117,031,810 27,122,738 forth2c+gcc-3.4.0 -O
105,767,852 123,021,305 121,547,076 27,284,693 forth2c+gcc-3.4.0 -Os
635,649,325 646,799,258 946,796,814 563,021,402 forth2c+gcc-3.4.0 -O0

instruction counts (might provide insight into the cycle numbers):
fib sieve bubble-sort matrix-mult
193,834,212 190,441,316 313,932,162 173,199,451 vfxlin
277,239,469 158,698,839 136,036,718 91,708,093 gcc-3.4.0 -O
258,784,624 190,090,078 172,001,838 75,431,668 gcc-3.4.0 -Os
268,012,628 310,659,184 290,144,319 188,642,125 gcc-3.4.0 -O0
304,920,087 272,868,329 144,091,955 107,261,137 forth2c+gcc-3.4.0 -O
286,467,831 244,777,496 162,397,933 99,100,847 forth2c+gcc-3.4.0 -Os
1,006,307,340 1,005,039,966 1,202,060,005 887,457,536 forth2c+gcc-3.4.0 -O0

I am surprised by the sometimes large differences between -O and -Os
for the manually written code (I expected small difference like those
we see for the forth2c-generated code). For -O0, which does not
allocate variables into registers, I am surprised that it does not
suffer more against VFX. OTOH, forth2c and gcc -O0 is the perfect
mismatch, leading to a factor >20 slowdown for matrix-mult; it is a
mismatch, because forth2c was written to rely on register allocation
and copy propagation, and gcc -O0 does neither.

>I may have mentioned it before, but here's another backend that might
>interest you: http://c9x.me/compile/

Thanks. Looks like it suffers from the same ABI-calls-only
disadvantage that LLVM suffers from, and it only has one target, but
it's still under development. Still, an interesting project; maybe we
get a useful C compiler from it.

Gerry Jackson

Nov 22, 2016, 5:38:22 AM
Because it compiles a colon definition for the synonym I don't think
your definition of SYNONYM works in all cases. For example (ignoring the
wid which is non-standard anyway):

123 value foo
synonym bar foo
456 to bar \ Fails as TO can't save a value to a colon definition

Similarly

defer x
:noname ... ; is x
synonym y x
:noname ... ; is y \ Would fail

Probably a minor restriction.

--
Gerry

john

Nov 22, 2016, 7:53:35 AM
In article <o10pen$p8s$1...@gioia.aioe.org>, som...@microsoft.com says...
Perhaps if you all considered application efficiency rather than speed you could
all stop waffling. VFX is by far the best commercial Forth I have ever seen
and the only one I would consider for any commercial Forth project.
I'll say that again - the only one.

Is it better than using another language? - well that depends on factors
totally unrelated to the language itself.
I'll say that again as well - totally unrelated.

Of course wringing the best out of a processor is important but it's pretty
pointless if there's no support, no backup no documentation no toolchain.

I've just been looking at Jim's selection for our project and it saddens me this
was the best he could find, for all manner of reasons. Not because it's bad
but just because it's so promising and yet unfinished, while people are
poncing about with benchmarks that are nothing of the sort anyway
and it seems still worrying about whether to use upper or lower case.
(I was convinced Jim was extracting the P when he told me this.)

Are stack machines inherently faster?
The first thing you all need to answer is "faster at what?"
and does the "what" you come up with have any validity in the real world?

--

hughag...@gmail.com

unread,
Nov 24, 2016, 3:46:52 PM11/24/16
to
SYNONYM generates fast code, but it doesn't work with >BODY.

ALIAS compiles a CREATE DOES> definition so it generates slow code. It works with >BODY because I wrote a new >BODY for it --- supporting >BODY was the whole point of ALIAS.

The problem with supporting VALUE in a similar way, is that VALUE isn't defined to necessarily use CREATE DOES> internally. In SwiftForth VALUE is just a CREATE DOES> definer, so this does work:

12345 value vvv ok
vvv . 12345 ok
' vvv >body @ . 12345 ok
get-current alias new-vvv vvv ok
new-vvv . 12345 ok
' new-vvv >body @ . 12345 ok

This doesn't work in VFX though because VALUE is not a CREATE DOES> definer. VFX is ANS-Forth compliant in this case because the ANS-Forth document doesn't require VALUE to use CREATE DOES> and it is okay for VFX to do something else internally.

Anyway --- VALUE and TO are abominations --- I never use them (except for locals which have to use TO because the abomination is required).

Anyway, I hate ANS-Forth and I'm never going to become an ANS-Forth programmer. I just wrote SYNONYM and ALIAS as a joke because Stephen Pelc had said that this was impossible in ANS-Forth --- so I wrote it in ANS-Forth and got >BODY to work, although as you said: VALUE still doesn't work.

BTW: I found it rather humorous that you said: "ignoring the wid which is non-standard anyway." What??? SYNONYM and ALIAS are not in the ANS-Forth Standard. What Standard is my wid feature being non-standard to???

Cecil Bayona

unread,
Nov 24, 2016, 4:15:57 PM11/24/16
to
On 11/24/2016 2:46 PM, hughag...@gmail.com wrote:

>
> SYNONYM is generates fast code, but it doesn't work with >BODY.
>
> ALIAS compiles a CREATE DOES> definition so it generates slow code. I works with >BODY because I wrote a new >BODY for it --- supporting >BODY was the whole point of ALIAS.
>
> The problem with supporting VALUE in a similar way, is that VALUE isn't defined to necessarily use CREATE DOES> internally. In SwiftForth VALUE is just a CREATE DOES> definer, so this does work:
>
> 12345 value vvv ok
> vvv . 12345 ok
> ' vvv >body @ . 12345 ok
> get-current alias new-vvv vvv ok
> new-vvv . 12345 ok
> ' new-vvv >body @ . 12345 ok
>
> This doesn't work in VFX though because VALUE is not a CREATE DOES> definer. VFX is ANS-Forth compliant in this case because the ANS-Forth document doesn't require VALUE to use CREATE DOES> and it is okay for VFX to do something else internally.
>
> Anyway --- VALUE and TO are abominations --- I never use them (except for locals which have to use TO because the abomination is required).
>
> Anyway, I hate ANS-Forth and I'm never going to become an ANS-Forth programmer. I just wrote SYNONYM and ALIAS as a joke because Stephen Pelc had said that this was impossible in ANS-Forth --- so I wrote it in ANS-Forth and got >BODY to work, although as you said: VALUE still doesn't work.
>
> BTW: I found it rather humorous that you said: "ignoring the wid which is non-standard anyway." What??? SYNONYM and ALIAS are not in the ANS-Forth Standard. What Standard is my wid feature being non-standard to???
>

Pardon my ignorance, but in what way does ANSI Forth create slow code by
using CREATE DOES> ?

With the older Forths I'm used to, CREATE DOES> creates code as efficient
as one wants, no overhead of any kind, only the code you use, so it's up
to the programmer to create an efficient definition.

--
Cecil - k5nwa

hughag...@gmail.com

unread,
Nov 24, 2016, 4:30:38 PM11/24/16
to
On Tuesday, November 22, 2016 at 5:53:35 AM UTC-7, john wrote:
> Perhaps if you all considered application efficiency rather than speed you could
> all stop waffling. VFX is by far the best commercial Forth I have ever seen
> and the only one I would consider for any commercial forth project.
> I'l say that again - the only one.
>
> Is it better than using another language? - well that depends on factors
> totally unrelated to the language itself.
> I'll say that again as well - totally unrelated.
>
> Of course wringing the best out of a processor is important but it's pretty
> pointless if there's no support, no backup no documentation no toolchain.

I agree that VFX is very impressive. I think that Stephen Pelc is a very talented programmer.

I also think that it is a waste of talent to write an ANS-Forth compiler. QBASIC was better than PolyForth in every way. What qualification did Elizabeth Rather have for writing the ANS-Forth Standard and imposing it on the entire Forth community? None!

Way back in 1984 when I first got into Forth, I could see that Forth-83 had a lot of problems. I was told that a new standard was coming and that it would fix these problems. What I didn't realize is that Charles Moore had been kicked out of Forth Inc. in 1982 --- the Forth-83 standard was written by Forth Inc. without his assistance and that was why it was no good --- in 1994 the ANS-Forth standard was written by Forth Inc. without his assistance and that is why it was no good either.

In 1984 I did not imagine a future in which I myself would have to write a Forth standard in the year 2016. I really expected some leadership in the Forth community, but none was ever forth-coming (so to speak). What is sad about this is that most of the ideas in Straight Forth are obvious. Of course, I think there are some clever parts (I would think that; I wrote it!) --- honestly though, most of it is very straight-forward (hence the name: Straight Forth) --- anybody with a modicum of Forth application-programming experience would have thought up most of this easily. The two built-in data-structures are chains and arrays --- what else would anybody consider? --- this is an obvious design!

> Are stack machines inherently faster?
> The first thing you all need to answer is "faster at what?"
> and does the "what" you come up with have any validity in the real world?

When Testra started the MiniForth project, they were using an 80c320 (LMI Forth code with a heavy dose of assembly-language) in their motion-control board for the laser-etcher --- the competition was using an MC68000 (C code with some assembly-language) in their motion-control board for the laser-etcher --- Testra's board was less expensive, but it was not competitive in regard to speed.

The MiniForth board was both less-expensive than the MC68000 board (about the same cost as the 80c320 board) and significantly faster than the MC68000. The competition just abandoned the laser-etcher market because they were no longer competitive in any way. Does that sound like "validity" to you?

Writing MFX was something that I was proud of. ANS-Forth killed Forth though --- nobody ever took Forth seriously after ANS-Forth became the Standard.

Brad Eckert

unread,
Nov 24, 2016, 5:16:20 PM11/24/16
to
Thanks for the benchmarks. It's interesting that a good analytical Forth compiler comes to under 2:1 compared to GCC. It seems this is a small difference that could be swamped by other factors such as better algorithm polishing or explicit memory layout to minimize cache misses, both strengths of Forth.

As for target architectures, stack machines seem to have about the same semantic density as register machines. A dense stack machine language is about as good as a dense register machine language (like Thumb or some CISCs). The difference is that a Forth compiler is much simpler for the former.

hughag...@gmail.com

unread,
Nov 24, 2016, 5:30:41 PM11/24/16
to
On Thursday, November 24, 2016 at 2:15:57 PM UTC-7, Cecil - k5nwa wrote:
> Pardon my ignorance but in what way does ANSI Forth create slow code by
> using CREATE DOES ?
>
> With older Forth that I'm used to CREATE DOES creates as efficient code
> as one wants, no overhead of any kind, only the code you use so it's up
> to the programmer to create an efficient definition.

CREATE DOES> is inefficient because, in most cases, you are storing constant data in the >BODY struct (the data is known at compile-time and it never changes). At run-time you have to fetch this data out of memory as if it were mutable. It would be a lot more efficient if this data were compiled into the code as a literal.
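A minimal sketch of the difference in standard Forth (word names are illustrative; an optimizing compiler can only inline the value in the second case):

```forth
\ Children of CONST1 fetch the value from the body on every call:
: const1 ( x "name" -- )  create ,  does> ( -- x ) @ ;

\ Children of CONST2 are colon definitions that push a literal,
\ so the value can be compiled directly into code that uses them:
: const2 ( x "name" -- )  >r : r> postpone literal postpone ; ;

5 const1 five1   \ FIVE1 fetches 5 from memory at run-time
5 const2 five2   \ FIVE2 has 5 compiled into it as a literal
```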

I brought this up a long time ago (2009) and Stephen Pelc argued vigorously that CREATE DOES> is efficient. Weirdly though, he didn't provide source-code using CREATE DOES> that did the same thing as my code that didn't use CREATE DOES>. Finally, he did provide source-code --- it was VFX-specific and not ANS-Forth compliant! --- VFX has some non-standard way to declare data in the >BODY struct as being immutable. By comparison, all of my code in the novice-package is ANS-Forth compliant. It seems strange to me that I write ANS-Forth compliant code but I'm not considered to be an ANS-Forth programmer, whereas Stephen Pelc writes code that is not ANS-Forth compliant but he is considered to be an expert on ANS-Forth (that is why Elizabeth Rather appointed him to the Forth-200x committee).

CREATE DOES> is typically implemented with the code compiled by CREATE doing a CALL to the code that precedes the DOES> code and falls through into it. The CALL pushes the address after the CALL onto the return-stack. This is not code however; this is the address of the >BODY struct where data is held. The code that precedes the DOES> code pops this address off the return-stack and pushes it onto the data-stack so the DOES> code will have it. Both VFX and SwiftForth are implemented like this. This technique goes back to the 1970s when Charles Moore thought it up.

On a modern processor however, this is inefficient. The x86 has RET prediction (up to 16 levels of function calls get predicted correctly), but if you pop an address off the return-stack and use it as data, the whole RET prediction thing fails and all of that data is lost. The x86 is no longer able to predict RET destinations! Failure to predict branches is a big speed-killer on the x86 --- if the x86 predicts branches correctly then it is able to compile trace-code ahead of time and have it ready to go --- if the x86 mispredicts a branch though, then the whole trace-cache has to be abandoned and the machine-code at the actual destination has to be compiled into trace-code before it can execute (this is not being done in parallel with the execution of previous trace-code and hence it causes a huge delay in execution).
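The layout described above can be sketched like this (addresses, labels, and mnemonics are illustrative, not taken from any particular system):

```forth
\ Typical memory layout of a CREATE ... DOES> child, as described above:
\
\   child-xt:    CALL does-code     \ the "return address" pushed by CALL
\   child-body:  <data cells...>    \   is really the body address
\
\   does-code:   POP  tmp           \ pop body address off the return stack
\                PUSH-DATA tmp      \ push it onto the data stack
\                <code after DOES>> \ runs with ( -- body-addr ) supplied
\
\ Using a return address as data is what defeats the CPU's RET
\ predictor, causing the misprediction cost described above.
```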

Anyway, getting back to the first point I made about CREATE DOES> treating immutable data as mutable, here is a quote from the ANS-Forth "leading expert:"

On Sunday, August 9, 2015 at 7:30:10 PM UTC-7, Elizabeth D. Rather wrote:
> Here's [an array definer] presented in my Forth Application Techniques that I think is
> far more useful than the FSL version:
>
> : ARRAY ( n -- ) CREATE DUP , CELLS ALLOT
> DOES> ( n -- a) SWAP OVER @ OVER < OVER
> 1 < OR ABORT" Out of Range" CELLS + ;

Note that this doesn't work properly, because it is using 1 as the base index rather than 0 which is more common. Most likely, Elizabeth Rather wrote it this way because COBOL uses 1 as the base index for arrays and COBOL is the only programming language that she knows. Another related problem is that her range-checking is done with two signed comparisons --- it should be done with one unsigned comparison.

Here I have fixed her code for her:

: array ( n -- )
create dup , cells allot
does> ( index base-adr -- )
2dup @ u< 0= abort" *** ARRAY out of range ***"
swap 1+ cells + ;

None of these CREATE DOES> definers are any good though. The limit value that is comma'd into the >BODY struct is known at compile-time and it never changes --- it should be a literal --- instead it is treated as if it were mutable.

In my novice-package I have 1ARRAY that is for defining one-dimensional arrays. It is general-purpose in that you can use any size of element, not just cell-sized elements. Also, you can turn off the range-checking if you don't need it. Here is a demonstration of 1ARRAY in SwiftForth (with error-checking turned on) --- notice the instruction A # EBX MOV in which the limit value is treated as being immutable.

10 w 1array mmm ok
0 mmm . 4782256 ok
9 mmm . 4782292 ok
10 mmm . mmm *** array index 1 out of bounds ***
-1 mmm . mmm *** array index 1 out of bounds ***
see mmm
48F90F 4 # EBP SUB 83ED04
48F912 EBX 0 [EBP] MOV 895D00
48F915 A # EBX MOV BB0A000000
48F91A 48BA2F ( check1 ) CALL E810C1FFFF
48F91F EBX 2 # SHL C1E302
48F922 48F8B0 # EBX ADD 81C3B0F84800
48F928 RET C3 ok
see check1
48BA2F 0 [EBP] EBX CMP 3B5D00
48BA32 0 # EBX MOV BB00000000
48BA37 48BA3A JBE 7601
48BA39 EBX DEC 4B
48BA3A -1 # EBX XOR 83F3FF
48BA3D 4076BF ( (S") ) CALL E87DBCF7FF
48BA42 "*** array index 1 out of bounds ***"
48BA67 407E9F ( ?ABORT ) JMP E933C4F7FF ok

Here it is with range-checking turned off:

false to bounds-check? ok
10 w 1array nnn ok
0 nnn . 4782572 ok
9 nnn . 4782608 ok
see nnn
48FA2F EBX 2 # SHL C1E302
48FA32 48F9EC # EBX ADD 81C3ECF94800
48FA38 RET C3 ok

Note that 1ARRAY 2ARRAY etc. are obsolete now --- I have ARY that is more convenient --- ARY also produces fast code.

Gerry Jackson

unread,
Nov 24, 2016, 5:44:08 PM11/24/16
to
On 24/11/2016 20:46, hughag...@gmail.com wrote:
> BTW: I found it rather humorous that you said: "ignoring the wid which is non-standard anyway." What??? SYNONYM and ALIAS are not in the ANS-Forth Standard. What Standard is my wid feature being non-standard to???

Forth 2012, see http://forth-standard.org/standard/tools/SYNONYM

--
Gerry

Anton Ertl

unread,
Nov 24, 2016, 6:16:10 PM11/24/16
to
Gerry Jackson <ge...@jackson9000.fsnet.co.uk> writes:
>Because it compiles a colon definition for the synonym I don't think
>your definition of SYNONYM works in all cases. For example (ignoring the
>wid which is non-standard anyway):
>
>123 value foo
>synonym bar foo
>456 to bar \ Fails as TO can't save a value to a colon definition
>
>Similarly
>
>defer x
>:noname ... ; is x
>synonym y x
>:noname ... ; is y \ Would fail
>
>Probably a minor restriction.

In Forth-2012 SYNONYM copies only the interpretation and compilation
semantics, not the TO <name> run-time semantics, and not the property
"defined by DEFER", nor (not relevant above) "defined by CREATE"; so
the examples above are not required to work in Forth-2012. Maybe we
will change that in the next standard.

hughag...@gmail.com

unread,
Nov 24, 2016, 7:38:04 PM11/24/16
to
On Thursday, November 24, 2016 at 4:16:10 PM UTC-7, Anton Ertl wrote:
> Gerry Jackson <ge...@jackson9000.fsnet.co.uk> writes:
> >Because it compiles a colon definition for the synonym I don't think
> >your definition of SYNONYM works in all cases. For example (ignoring the
> >wid which is non-standard anyway):
> >
> >123 value foo
> >synonym bar foo
> >456 to bar \ Fails as TO can't save a value to a colon definition
> >
> >Similarly
> >
> >defer x
> >:noname ... ; is x
> >synonym y x
> >:noname ... ; is y \ Would fail
> >
> >Probably a minor restriction.
>
> In Forth-2012 SYNONYM copies only the interpretation and compilation
> semantics, not the TO <name> run-time semantics, and not the property
> "defined by DEFER", nor (not relevant above) "defined by CREATE"; so
> the examples above are not required to work in Forth-2012. Maybe we
> will change that in the next standard.

Maybe you should stop using the word "standard" --- this is an insult to the entire Forth community --- just because Elizabeth Rather appointed you to be the chairperson of the Forth-200x committee doesn't mean that you have any authority over Forth programmers such as myself.

Elizabeth Rather's toy language was a failure at everything. The fact that the parameters for LOCALS| were backwards was ridiculous; that alone makes ANS-Forth a joke. Then you wrote {: to define locals. You had to change the name from { despite the fact that { (the John Hopkin's format) had been in use since prior to ANS-Forth. This was because SwiftForth uses { for multi-line comments. You also screwed up the definition of {: because you failed to zero out the locals to the right of the | --- you said that these values were "undefined," meaning that they could be anything. When I pointed out the obvious problem with this (programs don't necessarily do the same thing from one compilation to the next, or even one run to the next), you said that your code was correct because it was written to spec. What spec? Obviously, you are just Leon Wagner's code monkey --- he writes the spec --- you implement the code without concern for obvious blunders in the spec.

I will never become an ANS-Forth programmer --- not Forth-200x either --- these are both just marketing gimmicks from Forth Inc. whose sole purpose is to convince the world that Forth Inc. owns the Forth language and that the entire Forth community is dependent upon Forth Inc. for leadership.

Gerry Jackson

unread,
Nov 25, 2016, 2:28:40 AM11/25/16
to
On 24/11/2016 23:11, Anton Ertl wrote:
> Gerry Jackson <ge...@jackson9000.fsnet.co.uk> writes:
>> Because it compiles a colon definition for the synonym I don't think
>> your definition of SYNONYM works in all cases. For example (ignoring the
>> wid which is non-standard anyway):
>>
>> 123 value foo
>> synonym bar foo
>> 456 to bar \ Fails as TO can't save a value to a colon definition
>>
>> Similarly
>>
>> defer x
>> :noname ... ; is x
>> synonym y x
>> :noname ... ; is y \ Would fail
>>
>> Probably a minor restriction.
>
> In Forth-2012 SYNONYM copies only the interpretation and compilation
> semantics, not the TO <name> run-time semantics, and not the property
> "defined by DEFER", nor (not relevant above) "defined by CREATE"; so
> the examples above are not required to work in Forth-2012. Maybe we
> will change that in the next standard.

Be that as it may, my comments to Hugh still stand, as his SYNONYM is
incomplete, as is the Forth 2012 standard. As a user I would expect a
word defined as a synonym to behave and be treated exactly as the parent
word, i.e. to be able to use it (or not use it because of restrictions
on the parent) in all situations that the parent word is used. So yes I
would support changing the standard. The alternative is yet another
ambiguous condition and I think we're all agreed there are too many of
them already.

I see that GForth, VFX Forth, Win32 Forth and my system all handle TO
and IS on a synonym correctly (Swiftforth hasn't implemented SYNONYM) so
there shouldn't be too much opposition.


--
Gerry

foxaudio...@gmail.com

unread,
Nov 25, 2016, 7:24:25 AM11/25/16
to
I heartily agree. There has to be some accommodation for common sense. Forth has enough hurdles for the new user without adding one that is so counter-intuitive.

Ilya Tarasov

unread,
Nov 25, 2016, 8:37:06 AM11/25/16
to
On Friday, November 25, 2016 at 1:30:41 AM UTC+3, hughag...@gmail.com wrote:
> On Thursday, November 24, 2016 at 2:15:57 PM UTC-7, Cecil - k5nwa wrote:
> > Pardon my ignorance but in what way does ANSI Forth create slow code by

A small but important question: why does performance get so much attention? In the '90s, performance was a priority for programming, but no longer. With CPUs ranging from a 1.5 GHz Atom to a 4 GHz Core i7, we should keep in mind that our program will run on an average-speed CPU, and the user will not count every frame or every loop iteration per second. Having purely interpreted code is no good, but having 2x-3x slower compiled code is still acceptable if you can highlight other important features of your program/language.

Alex

unread,
Nov 25, 2016, 9:37:41 AM11/25/16
to
On 11/25/2016 12:24, foxaudio...@gmail.com wrote:
> I heartily agree. There has to be some accommodation for common sense. Forth has enough hurdles for the new user without adding one that is so counter-intuitive.
>

Heartily agree with what? You've cut away the entire thread.

--
Alex

Anton Ertl

unread,
Nov 25, 2016, 10:58:37 AM11/25/16
to
Brad Eckert <hwf...@gmail.com> writes:
>Thanks for the benchmarks. It's interesting that a good analytical Forth co=
>mpiler comes to under 2:1 compared to GCC.

Actually, the good analytical Forth compiler achieves around 1:1
compared to GCC. The other analytical Forth compiler (VFX) comes to
under 2:1. The main reasons for VFX's worse performance are that it
does not keep stack items other than TOS in registers across control
flow, and that it does not keep return stack items in registers at
all. E.g., something like

: foo >r r> ;

becomes

FOO
( 080C08A0 53 ) PUSH EBX
( 080C08A1 5B ) POP EBX
( 080C08A2 C3 ) NEXT,

I had not expected that to result in such large differences, and I had
expected VFX to significantly outperform gcc -O0. My guess is that
the reason for the large differences to gcc -O is recurrences through
memory in the inner loop (which means at least 5 cycles per
iteration); and the reason for gcc -O0 not being worse is that having
two or three recurrences is not much worse than having one, as long as
they don't depend on each other: they all wait in parallel.

>It seems this is a small differe=
>nce that could be swamped by other factors such as better algorithm polishi=
>ng or explicit memory layout to minimize cache misses, both strengths of Fo=
>rth.

C is just as strong at these as Forth is, and my impression is that
there is a certain ignorant attitude in the Forth community for such
topics (the C community with its faith in compiler optimization
miracles is getting there, too, however). E.g., 20 years after sharing
of instructions and data was pointed out as a performance problem,
most Forth systems still mix instructions and data in the dictionary,
with a bit of occasional padding as a workaround that helps in some
cases.

Andrew Haley

unread,
Nov 25, 2016, 10:58:42 AM11/25/16
to
Ilya Tarasov <ilya74....@gmail.com> wrote:
> On Friday, November 25, 2016 at 1:30:41 AM UTC+3, hughag...@gmail.com wrote:
>> On Thursday, November 24, 2016 at 2:15:57 PM UTC-7, Cecil - k5nwa wrote:
>> > Pardon my ignorance but in what way does ANSI Forth create slow code by
>
> Small, but importan question. Why performance takes so much care? In
> 90-th, performance was a priority for programming, but no
> longer. Having a variety of CPU from 1.5 GHz Atom to 4 GHz Core-i7,
> we should keep in mind our program will run on an average-speed CPU
> and user will not count every frame or every loop iteration per
> second. Having pure interpreted code is no good, but having x2-x3
> slower compiled code is still acceptable if you can highlight other
> important features of your program/language.

It's an interesting question. Firstly, Stephen has already answered
part of that by pointing out that he hasn't written an interrupt
handler in assembly code for many, many years. But also, we're
competing in a marketplace for bang for the buck. People have power
budgets, not only in mobile applications but increasingly also in
server farms. If your program runs only once this sort of
consideration perhaps doesn't matter. But if it runs many times, then
it matters a lot more.

Andrew.

Stephen Pelc

unread,
Nov 25, 2016, 11:10:00 AM11/25/16
to
On Thu, 24 Nov 2016 15:15:53 -0600, Cecil Bayona <cba...@cbayona.com>
wrote:

>Pardon my ignorance but in what way does ANSI Forth create slow code by
>using CREATE DOES ?

Hugh's folderol of beliefs and assumptions leads him to assume that
children of CREATE ... DOES> are slow ... because a fetch has to be
performed for some data, e.g. when defining a CONSTANT.

: constant create , does> @ ; \ x -- ; -- x

The fault is not that of CREATE ... DOES> but of the lack of
memory address qualifiers. In other languages, qualifiers such
as const and volatile give the compiler hints or instructions
about memory.

The problem is actually easier for cross compilers which already
describe memory type. However, since cross compilers treat Flash as
writable, when can a read from Flash be considered as reading a
constant?

On desktops, all application memory is RAM, so how do we tell Forth
when it is read-only?

If anyone was actually to try such a system in Forth, Hugh would
then complain that the notation was not part of the now obsolete
ANS Forth standard.

Anton Ertl

unread,
Nov 25, 2016, 11:31:27 AM11/25/16
to
Cecil Bayona <cba...@cbayona.com> writes:
>Pardon my ignorance but in what way does ANSI Forth create slow code by
>using CREATE DOES ?

: field1 ( n "name" -- )
create ,
does> ( n1 -- n2 )
@ + ;
here .

5 field1 foo1
here .

: bar1 foo1 @ ;

: field2 ( n "name" -- )
>r : r> postpone literal postpone + postpone ; ;

here .
5 field2 foo2
here .

: bar2 foo2 @ ;

On VFX this produces:

BAR1
( 080C0930 031D10090C08 ) ADD EBX, [080C0910]
( 080C0936 8B1B ) MOV EBX, 0 [EBX]
( 080C0938 C3 ) NEXT,
( 9 bytes, 3 instructions )

BAR2
( 080C09D0 8B5B05 ) MOV EBX, [EBX+05]
( 080C09D3 C3 ) NEXT,
( 4 bytes, 2 instructions )

And that's not a failure of VFX (here it does as well as possible),
it's due to the way CREATE..DOES> is specified: you can change the
data, so the compiler cannot perform the @ earlier.

In Forth-94/2012 you can use ":" to get around the issue, but that
typically costs more memory than a CREATE...DOES>-defined word.

One way around the memory consumption problem is to optimize the
CREATE...DOES>-created word with a SET-OPTIMIZER method:

: field3 ( n "name" -- )
\ you must not change the body of "name"
create ,
[: @ + ;] set-does>
[: >body @ postpone literal postpone + ;] set-optimizer ;

5 field3 foo3

This means that in the normal case, when FOO3 is performed, it does
the usual pushing of the body address and then performs @ +; when foo3
is COMPILE,d, the optimizer routine kicks in and it first uses >BODY @
to get the 5 on the top of stack, then compiles it with LITERAL and
compiles a +; so the fetch overhead is not present in the compiled
code. Some systems use a mechanism like this internally, for defining
various defining words.
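A sketch of what this buys when FOO3 is used inside a definition (assuming the SET-DOES> / SET-OPTIMIZER mechanism above; the comment describes the intended compilation, not verified output):

```forth
: bar3 ( addr1 -- addr2 )  foo3 ;
\ When FOO3 is COMPILE,d inside BAR3, the optimizer xt runs instead:
\ >BODY @ fetches the offset 5 at compile time, LITERAL inlines it,
\ and + is compiled, so BAR3 contains no run-time fetch of the offset.
```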

Cecil Bayona

unread,
Nov 25, 2016, 1:26:59 PM11/25/16
to
A desktop computer is not the only place where Forth is used, on small
embedded CPUs, performance can be important.

--
Cecil - k5nwa

hughag...@gmail.com

unread,
Nov 25, 2016, 7:51:13 PM11/25/16
to
On Friday, November 25, 2016 at 9:10:00 AM UTC-7, Stephen Pelc wrote:
> On Thu, 24 Nov 2016 15:15:53 -0600, Cecil Bayona <cba...@cbayona.com>
> wrote:
>
> >Pardon my ignorance but in what way does ANSI Forth create slow code by
> >using CREATE DOES ?
>
> Hugh's folderol of beliefs and assumptions leads him to assume that
> children of CREATE ... DOES> are slow ... because a fetch has to be
> performed for some data, e.g. when defining a CONSTANT.
>
> : constant create , does> @ ; \ x -- ; -- x

I had to look up "folderol" --- you are confusing me with big words --- the definition is: "trivial or nonsensical fuss."

That isn't true. Memory access kills the speed.

> The fault is not that of CREATE ... DOES> but of the lack of
> memory address qualifiers. In other languages, qualifiers such
> as const and volatile give the compiler hints or instructions
> about memory.

The only language I'm aware of that has compile-time declarations such as CONST and VOLATILE is C.

> If anyone was actually to try such a system in Forth, Hugh would
> then complain that the notation was not part of the now obsolete
> ANS Forth standard.

Forth-200x is mandated to be 100% compatible with Elizabeth Rather's toy language.

Anyway, another problem with CREATE DOES> is that there is only one action associated with the data. To get other actions you need to use >BODY to obtain the base address, which the DOES> code gets automatically.
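A minimal sketch of that point (word names are illustrative): the DOES> action is the only behavior bound to the child, so any additional operation on its data must reach the body explicitly with ' and >BODY:

```forth
: konstant ( x "name" -- )  create ,  does> ( -- x ) @ ;
5 konstant five
\ A second "action" has to be wired up by hand via the body address:
: retarget ( x "name" -- )  ' >body ! ;
7 retarget five  \ FIVE now returns 7 (on systems where the body is writable)
```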

Yet another problem is that CREATE builds the struct in the dictionary at compile-time --- you almost always need to build structs in the heap at run-time --- CREATE DOES> is only useful in programs, such as assemblers, when all the data (in the case of an assembler, all the opcodes) are known at compile-time and speed isn't an issue (because all of these constants will be fetched from memory at run-time rather than get compiled into the code as literals).

All in all, CREATE DOES> is a very crude kind of OOP --- I already posted code for an OOP system that does everything that CREATE DOES> can do and a lot more, and does it more efficiently (also, more efficiently than Bernd Paysan's MiniOOF that relies on CREATE DOES> internally) --- CREATE DOES> is a product of the 1970s when Charles Moore first invented Forth.

Albert van der Horst

unread,
Nov 26, 2016, 6:52:39 AM11/26/16
to
In article <58385882....@news.eternal-september.org>,
Stephen Pelc <ste...@mpeforth.com> wrote:
>On Thu, 24 Nov 2016 15:15:53 -0600, Cecil Bayona <cba...@cbayona.com>
>wrote:
>
>>Pardon my ignorance but in what way does ANSI Forth create slow code by
>>using CREATE DOES ?
>
>Hugh's folderol of beliefs and assumptions leads him to assume that
>children of CREATE ... DOES> are slow ... because a fetch has to be
>performed for some data, e.g. when defining a CONSTANT.
>
>: constant create , does> @ ; \ x -- ; -- x
>
>The fault is not that of CREATE ... DOES> but of the lack of
>memory address qualifiers. In other languages, qualifiers such
>as const and volatile give the compiler hints or instructions
>about memory.
>
>The problem is actually easier for cross compilers which already
>describe memory type. However, since cross compilers treat Flash as
>writable, when can a read from Flash be considered as reading a
>constant?
>
>On desktops, all application memory is RAM, so how to tell Forth
>when it is read-only.
>
>If anyone was actually to try such a system in Forth, Hugh would
>then complain that the notation was not part of the now obsolete
>ANS Forth standard.

In my experimental optimiser I have an analyser that finds out all
properties of all words (with some manual help).
This may be different from the usual ways. There is no automatic
optimisation, you have to ask for it. Emphasis is knowledge about
words, not the compilation process.
The stack effect is indicated by coloring of the first and last
characters.

12 CONSTANT AAP
LATEST ID.
AAP ( the first A and last P are purple: stack unknown.)

FILL-ALL ( Add information, based on existing information)
LATEST ID.
AAP ( A's are white, P is aqua : nothing in,one out)

Other flags remember that AAP is a compile time constant.
: test AAP 3 * ;
SEE test
: test
AAP 0000,0003 *
;
test

'test OPTIMISE
SEE test
: test
0000,0024
;

>
>Stephen

Groetjes Albert
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Ilya Tarasov

unread,
Nov 26, 2016, 1:39:07 PM11/26/16
to
> It's an interesting question. Firstly, Stephen has already answered
> part of that by pointing out that he hasn't written an interrupt
> handler in assembly code for many, many years. But also, we're
> competing in a marketplace for bang for the buck. People have power
> budgets, not only in mobile applications but increasingly also in
> server farms. If your program runs only once this sort of
> consideration perhaps doesn't matter. But if it runs many times, then
> it matters a lot more.

We should separate application domains and goals. If you plan to use Forth in supercomputing applications or CPU-based calculations (a GPU is also possible to use here), you need to consider some trade-offs. Forth already has these trade-offs: if we want to keep the translator as simple as possible, we must accept the other features/misfeatures. Accept performance 'as is' while you try to get maximum profit from other features.

The idea that a '1% performance increase means thousands of years of overall PC runtime worldwide' is a myth. First, in general you do not have thousands of PCs running your program all the time. For Forth it is especially important :) Second, not all programs perform heavy calculations 100% of their runtime. It may be more important for users to have some program now instead of having a 10% faster program later.

Performance was a key feature in the '90s. DSLs and custom syntax may be more interesting today.

foxaudio...@gmail.com

unread,
Nov 26, 2016, 1:54:10 PM11/26/16
to
I agreed with Gerry's comments.

(didn't realize my new phone posts only my comments.)

Ilya Tarasov

unread,
Nov 26, 2016, 1:57:20 PM11/26/16
to

> A desktop computer is not the only place where Forth is used, on small
> embedded CPUs, performance can be important.

Since embedded is my preferred area, I can't completely agree with you, for many reasons. From my experience, I have never had any kind of success story from claiming Forth as a tool for achieving maximum performance.

1. MCUs often have enough performance for simple tasks. If not, we must review the system architecture and use some kind of acceleration. It is very risky to use, say, a 100 MHz MCU and rely on programmers' skill to keep the performance level up from scratch to final product. 99% of programmers prefer to have some slack while developing.
2. In embedded applications, we can always switch to assembler. If not, don't complain about Forth performance.
3. The main and very important feature of Forth is the ability to create your own code as fast as possible. We may argue for money saving (less relevant with so many freeware development tools) or for having our own toolchain with better system integration. This may be summarized as 'we need our own tool and can accept the misfeatures of this approach'.

Cecil Bayona

unread,
Nov 26, 2016, 2:40:13 PM11/26/16
to
A guess on my part, but I will attribute your answer to language issues,
because you are responding as if I had made claims that I did not make.

I did not state that Forth's performance is superior to anything else,
or that performance is necessary in all cases, but that it "can be
important" in some cases.

One can't always change the CPU on a project; an example is modifying the
firmware on an existing product to add more features. This could be a
product bought from someone else, so changing the hardware is too much
trouble and not worth the effort.

Switching to assembler can be avoided if the Forth compiler is
efficient. Not everyone is making a project that involves building a lot
of units; in my case most projects are one of a kind, never to be duplicated.


--
Cecil - k5nwa

hughag...@gmail.com

unread,
Nov 26, 2016, 6:09:23 PM11/26/16
to
On Friday, November 25, 2016 at 5:51:13 PM UTC-7, hughag...@gmail.com wrote:
> Anyway, another problem with CREATE DOES> is that there is only one action associated with the data. To get other actions you need to use >BODY to obtain the base address, which the DOES> code gets automatically.
>
> Yet another problem is that CREATE builds the struct in the dictionary at compile-time --- you almost always need to build structs in the heap at run-time --- CREATE DOES> is only useful in programs, such as assemblers, when all the data (in the case of an assembler, all the opcodes) are known at compile-time and speed isn't an issue (because all of these constants will be fetched from memory at run-time rather than get compiled into the code as literals).

I think CREATE DOES> was invented solely for supporting arrays. Arrays are the only data-structure described in "Starting Forth."

Arrays are usually defined at compile-time because the size is known at compile-time --- also, arrays can't easily change size anyway, so putting them in the heap is pretty much pointless.

Elizabeth Rather says:
--------------------------------------------------------------------------
Virtually every Forth application needs some kind of array structures. The
reason that some general-purpose one might be "little used" is because it's
so easy to make a version that does *exactly* what the application needs
rather than some generic definition. ... The objective is to master the
toolset and be able to think clearly about exactly what kind of arrays will
be useful in your application and then build exactly that kind.
--------------------------------------------------------------------------

I almost never use arrays. I use LIST.4TH or ASSOCIATION.4TH, both of which allow insertion and deletion of nodes. Both of these use the heap, so they can't be defined with CREATE DOES> which uses ALLOT to provide memory for the data-structure.

I very much doubt that Elizabeth Rather knows how to implement any data-structure other than the array, or even knows that there are other data-structures.

All of this discussion of CREATE DOES> seems like nonsense to me --- this is 1970s technology!

I might put <BUILDS DOES> in Straight Forth --- I might discard it as cruft though (I already discarded DO loops) --- as a practical matter, porting ANS-Forth programs over to Straight Forth isn't very important, because there aren't any ANS-Forth programs.

> All in all, CREATE DOES> is a very crude kind of OOP --- I already posted code for an OOP system that does everything that CREATE DOES> can do and a lot more, and does it more efficiently (also, more efficiently than Bernd Paysan's MiniOOF that relies on CREATE DOES> internally) --- CREATE DOES> is a product of the 1970s when Charles Moore first invented Forth.

This OOP system should work well for almost all programs.

polymorph self

unread,
Nov 27, 2016, 3:22:06 AM11/27/16
to
On Thursday, November 17, 2016 at 3:50:17 PM UTC-5, hughag...@gmail.com wrote:
> On Thursday, November 17, 2016 at 9:08:08 AM UTC-7, foxaudio...@gmail.com wrote:
> > My own experience is that in real world projects, even ITC Forth performed very
> > well, if you allow for a hand coded primitive or two or three...
>
> This works well with a fat Standard, because you have more primitives.
>
> Straight Forth is a fat Standard because it includes two data-structures: the chain and the array. ANS-Forth is a thin Standard because it doesn't include any data-structures, so all data-structures have to be custom written in Forth by the application programmer. This is why I'm expecting the ITC version of Straight Forth to generate code that is about twice the speed of SwiftForth code (SwiftForth is STC but has very little optimization) --- of course, I'm also expecting that 99% of Straight Forth programs will use the built-in data-structures --- if you need a different data-structure, and you write it yourself in Forth, then the fact that the Forth code is ITC is going to cause it to be slow (as compared to hand-written assembly-language).
>
> Also, despite the fact that Straight Forth is a fat Standard and ANS-Forth is a thin Standard, I expect Straight Forth to be smaller than ANS-Forth --- this is because I discard all of the weird and useless cruft that is in ANS-Forth --- for example, DO loops get discarded, and I don't care if they were invented by Charles Moore in the 1970s and everybody expects them to be part of any Forth.

what forth exactly do you use?
do you run any blog or website on forth?

polymorph self

unread,
Nov 27, 2016, 3:22:58 AM11/27/16
to
On Thursday, November 17, 2016 at 6:58:35 PM UTC-5, Paul Rubin wrote:
> m...@iae.nl writes:
> >> My own experience is that in real world projects, even ITC Forth
> >> performed very well, if you allow for a hand coded primitive ...
> > It depends on what you call "very well." I would say that you're
> > describing a popular Forth myth. It is not true.
>
> I don't know about Foxaudio, but I'd say "very well" means the ITC
> performance is enough to satisfy users' real-world requirements. Most
> of my stuff these days is written in Python and it's fast enough for my
> purposes, and Gforth-itc is probably >10x faster than Python. I've
> never written a Forth program that wasn't a benchmark, that used enough
> computation time for me to notice.
>
> Even big problems that need a lot of computation can often be
> parallelized. Computers are cheap and it's often better business to
> just buy more of them, than to increase software development costs or
> decrease maintainability.
>
> > CLF threads that posed problems that piqued people's interest and
> > resulted in a flurry of smart algorithmic tricks
>
> Sure, good algorithms can produce a far bigger speedup than careful
> optimization of dumb algorithms can possibly hope to.
>
> > that would have been a lot more cumbersome to come up with, design,
> > and test in other languages.
>
> I'd be interested in seeing an example.

so use forth and throw hardware at the problem?

polymorph self

unread,
Nov 27, 2016, 3:23:54 AM11/27/16
to
On Friday, November 18, 2016 at 6:17:01 AM UTC-5, Stephen Pelc wrote:
> On Thu, 17 Nov 2016 15:58:33 -0800, Paul Rubin
> <no.e...@nospam.invalid> wrote:
>
> >I don't know about Foxaudio, but I'd say "very well" means the ITC
> >performance is enough to satisfy users' real-world requirements. Most
> >of my stuff these days is written in Python and it's fast enough for my
> >purposes, and Gforth-itc is probably >10x faster than Python. I've
> >never written a Forth program that wasn't a benchmark, that used enough
> >computation time for me to notice.
>
> One of our clients has a recalculation that used to take 15 or so
> seconds on our DTC ProForth for Windows. The boss excitedly took
> me to his office to show that the conversion to VFX Forth took
> about 1.5 seconds. To him, that was important.
>
> These days, there are plenty of algorithms that do not follow the
> "90% of the time in 10% of the code" model.
>
> Stephen
>
> --
> Stephen Pelc, steph...@mpeforth.com
> MicroProcessor Engineering Ltd - More Real, Less Time
> 133 Hill Lane, Southampton SO15 5AF, England
> tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
> web: http://www.mpeforth.com - free VFX Forth downloads

do you code in gforth on debian?
do you think websites can be easily made with such?

polymorph self

unread,
Nov 27, 2016, 3:25:24 AM11/27/16
to
On Friday, November 18, 2016 at 10:30:16 AM UTC-5, Stephen Pelc wrote:
> On Fri, 18 Nov 2016 06:37:35 -0800 (PST), foxaudio...@gmail.com
> wrote:
>
> >That's a better ratio of improvement over DTC in intel than I would have
> >expected Stephen, so VFX is quite efficient.
>
> Our numbers suggest (it was a long time ago, and CPUs have changed)
> that going from DTC to STC gains you about 2.2:1. Adding fairly plain
> vanilla optimisation gains you about another 2:1 by simple inlining.
> Beyond this, you are in the territory of lots of grunt work. Both VFX
> and iForth also tokenise small words and expand them inline, which can
>
> pay big dividends, depending very much on CPU and coding style.
>
> >I find myself wondering what you would have done with that client if they had
> >requested a speedup on that recalculation and you had NOT developed VFX?
> >( I am assuming they were prepared to pay for the work)
>
> We could have spent several months writing code routines for the
> recalculation engine (effectively a spreadsheet), but it would still
> not have been as fast as the VFX solution. Sometimes, the correct
> solution to a coding problem is to rewrite the compiler. The MPE
> benchmark suite is on the MPE website. We gave up benchmarking
> ITC Forths some years ago when VFX was more than 12 times faster
> than the popular ITC Forth.
>
> The good business decisions we made in developing VFX were to decide
> to do it properly and get funding for the development.
>
> >Could you have hand coded a few time critical/inner loop routines and
> > achieved those results?
>
> Probably not. Remember also that assembler code is harder to write and
> to maintain, and affects both cost and time to market. As I keep
> pointing out, I haven't written an interrupt routine in assembler for
> an ARM for about 15 years ... except to prove that I can still do it.
> A good code generator is an enabling technology.
>
> Stephen
>
> --
> Stephen Pelc, steph...@mpeforth.com
> MicroProcessor Engineering Ltd - More Real, Less Time
> 133 Hill Lane, Southampton SO15 5AF, England
> tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
> web: http://www.mpeforth.com - free VFX Forth downloads

by the gods I can't even imagine writing an interrupt in assembler
how the hell did you learn such?

polymorph self

unread,
Nov 27, 2016, 3:27:55 AM11/27/16
to
On Monday, November 21, 2016 at 5:26:19 PM UTC-5, hughag...@gmail.com wrote:
> On Saturday, November 19, 2016 at 10:42:25 AM UTC-7, Stephen Pelc wrote:
> > It is also a fallacy to assume that the commercial Forth
> > vendors are in competition with each other. We are very rarely
> > in competition for the same client. Our competition is the
> > toolchains for other languages.
>
> Your goal in supporting ANS-Forth seems to be to make the Forth community look stupid. Good job! Having the parameters backwards in LOCALS| was, by itself, the killing stroke for the entire Forth community.
>
> You are not in competition with Forth Inc.. I think you rely on them to do your sales, then when they find a customer they outsource the actual programming to you, so you both make money.
>
> You are in competition with Forth programmers such as myself. The whole purpose of ANS-Forth is to allow you to tell everybody in the Forth community that they are non-standard unless they adhere to the idiotic restrictions of Elizabeth Rather's toy language.
>
> You don't adhere to ANS-Forth though. You routinely abandon ANS-Forth compatibility, even when doing so is not necessary. For example, in regard to SYONYM and ALIAS, you told me (quite condescendingly) to RTFM. You meant the VFX manual where you provide a SYNONYM that is VFX-specific and would get me trapped in vendor lock-in. I wrote SYNONYM and ALIAS in ANS-Forth though --- it was easy --- it did require me to have the disambiguifiers already installed.
>
> Here is the code, so RTFM yourself:
>
> : :synonym { xt flg str wid -- } \ FLG is 1 for immediate and -1 for non-immediate
> str wid :name
> flg 1 = if xt lit, execute, ;, immediate exit then
> flg -1 = if xt compile, ;, exit then
> true abort" *** :SYNONYM given an invalid xt ***" ;
>
> : :synonym-fast-check ( -- )
> state @ 0= abort" *** a word created by :SYNONYM-FAST can't be used in interpretive mode ***" ;
>
> : :synonym-fast { xt flg str wid -- } \ FLG is 1 for immediate and -1 for non-immediate
> str wid :name
> flg 1 = if xt lit, execute, ;, immediate exit then
> flg -1 = if postpone :synonym-fast-check
> xt lit, postpone compile, ;, immediate exit then
> true abort" *** :SYNONYM-FAST given an invalid xt ***" ;
>
> : 'find ( -- xt flg ) \ stream: name \ FLG is 1 for immediate and -1 for non-immediate
> bl word find dup 0= abort" *** 'FIND couldn't find the word ***" ;
>
> : synonym { wid | new -- } \ stream: new-name old-name \ the new word is compiled into the WID word-list
> bl word hstr to new
> 'find new wid :synonym
> new dealloc ;
>
> : synonym-fast { wid | new -- } \ stream: new-name old-name \ the new word is compiled into the WID word-list
> bl parse <hstr> to new
> 'find new wid :synonym-fast
> new dealloc ;
>
> \ :SYNONYM-FAST generates faster executing code than :SYNONYM but the words can't be used in interpretive mode.
> \ This may not be necessary with a good optimizing compiler, but I'm not aware of any at this time.
>
> 1234512345 constant alias-id \ an arbitrary number used to identify alias'd definitions
>
> 0
> w field alias.xt \ this should be the first field so the DOES> portion of ALIAS will be fast (no addition needed)
> w field alias.adr \ the address of the body, used to identify alias'd definitions
> w field alias.id \ the constant ALIAS-ID, used to identify alias'd definitions
> constant alias-struct
>
> : alias ( wid -- ) \ stream: new-name old-name \ the new word is compiled into the WID word-list
> get-current swap set-current create set-current
> here >r alias-struct allot
> 'find 1 = if immediate then r@ alias.xt !
> r@ r@ alias.adr !
> alias-id r@ alias.id !
> rdrop
> does>
> alias.xt @ execute ;
>
> \ ALIAS does the same thing as SYNONYM but has the advantage that >BODY will work on it.
> \ It is slower executing though (especially under SwiftForth in which CREATE DOES> words are very inefficient).
>
> : >body ( xt -- adr )
> >body >r
> r@ alias.adr @ r@ = if r@ alias.id @ alias-id = if \ is this an alias'd word?
> r> alias.xt @ >body exit then then \ return the body of the original word
> r> ; \ return the body of this word
>
> \ Note that >BODY still has an undefined result if used on a word that wasn't defined with CREATE or isn't an alias of such a word.

Elizabeth seems nice; why not be more polite and work together to make money while replacing all the crappy Java?

Rod Pemberton

unread,
Nov 27, 2016, 5:08:26 AM11/27/16
to
On Sun, 27 Nov 2016 00:25:22 -0800 (PST)
polymorph self <jack...@gmail.com> wrote:

> > Probably not. Remember also that assembler code is harder to write
> > and to maintain, and affects both cost and time to market. As I keep
> > pointing out, I haven't written an interrupt routine in assembler
> > for an ARM for about 15 years ... except to prove that I can still
> > do it. A good code generator is an enabling technology.
> >
> by the gods I can't even imagine writing an interrupt in assembler

It's no different than coding a subroutine or procedure in assembly.
You just have to take care of some extra steps.

There are a bunch of people who have done so for the x86 processor on
alt.os.development. I have yet to see anyone on a.o.d. who codes for
the ARM processor.

> how the hell did you learn such?

You read the programmer instruction manuals from the processor
manufacturer. You teach yourself how to program in assembly. Then,
like everything else in life, you solve the puzzle. You make it work by
doing what needs to be done. Think of it as a procedure with a few
extra layers wrapped around the code for a procedure.

For x86, the basic sequence is:

disable interrupts
preserve registers
(optionally) switch stacks
do what needs to be done for the interrupt procedure
(optionally) restore stacks
restore registers
enable interrupts
return from interrupt

Of course, you have to do that sequence in assembly. But, you don't
have to code the interrupt in assembly. You can code most of it
in a high level language like C. However, most high level languages
don't provide a mechanism to implement interrupt routines. So, you
have to find a method to transfer from assembly to the high level
language, and a way to transfer back too. This is more assembly code.
This code is custom to each compiler.


Rod Pemberton

franck....@gmail.com

unread,
Nov 27, 2016, 1:11:45 PM11/27/16
to
Le samedi 26 novembre 2016 12:52:39 UTC+1, Albert van der Horst a écrit :
>
> In my experimental optimiser I have an analyser that finds out all
> properties of all words (with some manual help).
> This may be different from the usual ways. There is no automatic
> optimisation, you have to ask for it. Emphasis is knowledge about
> words, not the compilation process.
> The stack effect is indicated by coloring of the first and last
> char's.
>
> 12 CONSTANT AAP
> LATEST ID.
> AAP ( the first A and last P are purple: stack unknown.)
>
> FILL-ALL ( Add information, based on existing information)
> LATEST ID.
> AAP ( A's are white, P is aqua : nothing in,one out)
>
> Other flags remember that AAP is a compile time constant.
> : test AAP 3 * ;
> SEE test
> : test
> AAP 0000,0003 *
> ;
> test
>
> 'test OPTIMISE
> See AAP
> : test
> 0000,0024
> ;
>
>
> Groetjes Albert
> --
> Albert van der Horst, UTRECHT,THE NETHERLANDS
> Economic growth -- being exponential -- ultimately falters.
> albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

In Oforth, a word in the dictionary is an object, and, as
all objects, has a type.

The #interpret method is polymorphic and declared for each
word type. It is called by the outer interpreter if FIND
retrieves a word from the dictionary.

For constants objects, #interpret adds its value as literal
to the current definition.

For instance (version in current development)

8 const: A
ok
#A dup value .s
[1] (Integer) 8
[2] (Constant) A
ok

: test A 5 A ;
ok
#test see

c7:47:fc: 8 : 0 : mov c0000008, -4(edi)
c7:47:f8: 8 : 7 : mov c0000005, -8(edi)
c7:47:f4: 8 : E : mov c0000008, -12(edi)
83:ef: c:c3 : 15 : sub c, edi
c3: 0: 0: 0 : 18 : ret
ok

Franck
http://www.oforth.com

luser droog

unread,
Nov 27, 2016, 1:56:26 PM11/27/16
to
If you're using google-groups, you have to select
the Desktop Version of the page for it to do quoting.
Zooming and scrolling will be more of a chore. :(

luser droog

unread,
Nov 27, 2016, 2:42:22 PM11/27/16
to
On Sunday, November 27, 2016 at 2:27:55 AM UTC-6, polymorph self wrote:
> On Monday, November 21, 2016 at 5:26:19 PM UTC-5, hughag...@gmail.com wrote:

> Elizabeth seems nice why not be more polite and work together to make money while replacing all the crappy java?

This is pretty telling. The trolling is bothering the trolls.
Try not using anybody's name for like, a week or two.
Just talk code.

--
"I am the man with no name.... Zapp Branigan, at your service."

Elizabeth D. Rather

unread,
Nov 27, 2016, 8:12:54 PM11/27/16
to
It can be even simpler. In SwiftX we follow the convention that a task
managing an interrupt-driven device initiates an operation and then
"sleeps" until an interrupt signals its completion. The interrupt
routine does only the minimum required (tally a counter, read a number
and store it, etc.) and wakes the task, which does any other processing
in high-level Forth. It isn't necessary to save & restore all registers,
and typical interrupt routines use only one (often none, depending on
the processor). Often interrupt routines are only a few instructions
long (4-6, sometimes fewer).

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

hughag...@gmail.com

unread,
Nov 28, 2016, 12:06:50 AM11/28/16
to
On Sunday, November 27, 2016 at 6:12:54 PM UTC-7, Elizabeth D. Rather wrote:
> It can be even simpler. In SwiftX we follow the convention that a task
> managing an interrupt-driven device initiates an operation and then
> "sleeps" until an interrupt signals its completion. The interrupt
> routine does only the minimum required (tally a counter, read a number
> and store it, etc.) and wakes the task, which does any other processing
> in high-level Forth. It isn't necessary to save & restore all registers,
> and typical interrupt routines use only one (often none, depending on
> the processer). Often interrupt routines are only a few instructions
> long (4-6, sometimes fewer).

This is nonsense, as usual.

An ISR (interrupt service routine) typically uses a circular buffer. An ISR doing input would put the data into the buffer, and the main program would remove data from the buffer (hopefully before it gets full). An ISR doing output would take data from the buffer, and the main program would put data into the buffer (usually at a much higher speed than the output port is operating at).

An ISR is not going to be 4-6 instructions long --- it is going to be much longer, and it is going to use some registers --- you can't work with a circular buffer without at least two registers, and that would be pretty restrictive.

Most likely, you don't know what a circular buffer is. Your ISRs are super-short because they just use a single variable as their buffer. The problem with this, of course, is that the main-program has to be reading from or writing to that variable as fast as the data comes in or goes out. This has almost no benefit compared to the main-program directly accessing the I/O port. The only benefit is that the processor can be put in low-power sleep mode while it waits for an interrupt, which would be useful if the I/O is very slow. This is a minor benefit though. You should just learn what a circular buffer is so your main-program doesn't have to be totally on top of the I/O but it can just let I/O be done in the background while it does more important work.

In the 1980s the 6502 was commonly used as a micro-controller. It had absolute-indexed addressing which I think was provided primarily for implementing 256-byte circular buffers. Everybody knew about circular buffers in the 1980s. I think they were well-known in the late 1970s too --- maybe in the early 1970s, or the 1960s, they were unknown --- of course, COBOL programmers such as yourself don't know any computer-science whatsoever, then or now.
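For readers who haven't met one, a minimal circular buffer along the lines described above looks like this in C. The names, the power-of-two size, and the free-running-index style are illustrative choices, not code from any post in this thread:

```c
#include <assert.h>
#include <stdint.h>

#define BUF_SIZE 256            /* power of two, like the 6502-era 256-byte buffers */

typedef struct {
    uint8_t  data[BUF_SIZE];
    uint16_t head;              /* advanced by the producer (e.g. the ISR)    */
    uint16_t tail;              /* advanced by the consumer (the main program) */
} ringbuf_t;

/* Producer side: returns 0 if the buffer is full. */
static int rb_put(ringbuf_t *rb, uint8_t byte) {
    if ((uint16_t)(rb->head - rb->tail) >= BUF_SIZE) return 0;
    rb->data[rb->head++ & (BUF_SIZE - 1)] = byte;   /* mask does the wrap-around */
    return 1;
}

/* Consumer side: returns 0 if the buffer is empty. */
static int rb_get(ringbuf_t *rb, uint8_t *byte) {
    if (rb->head == rb->tail) return 0;
    *byte = rb->data[rb->tail++ & (BUF_SIZE - 1)];
    return 1;
}
```

With one producer and one consumer, each index is written on only one side, which is what makes the ISR/main-program split workable.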

john

unread,
Nov 28, 2016, 7:22:35 AM11/28/16
to
> In article <41768c5d-fc4b-4050...@googlegroups.com>, hughag...@gmail.com says...
> An ISR is not going to be 4-6 instructions long --- it is going to be much longer, and it is going to use some registers --- you can't work with a circular buffer without at least two registers, and that would be pretty restrictive.
>
>

Another method for ISRs is:
set a flag for the main program, clear the interrupt flag, then exit the ISR.
This results in very short ISRs.
The main program checks the flag, clears it,
and performs the actual service required.
This reduces the time spent handling an interrupt and the number
of re-entries that might otherwise interfere with main operation.
And, perhaps crucially, there is less chance of missing an interrupt.
It does have the drawback of having to check flags regularly, though.
Horses for courses, as they say.
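That flag scheme can be sketched in C as follows. The `uart_isr` name and the counter are invented for illustration; on real hardware the flag needs `volatile`, as here, plus whatever atomicity the target requires:

```c
#include <assert.h>
#include <stdbool.h>

static volatile bool service_flag = false;  /* set by the ISR, cleared by main */
static int services_done = 0;

/* Interrupt service routine: do the bare minimum and get out.
   On real hardware the peripheral's interrupt flag would also be cleared here. */
static void uart_isr(void) {
    service_flag = true;
}

/* Main-loop side: check the flag, clear it, do the real work. */
static void poll_once(void) {
    if (service_flag) {
        service_flag = false;
        services_done++;        /* the actual service the ISR deferred */
    }
}
```

A single flag coalesces bursts of interrupts into one service pass, which is exactly the "how do I see the third one?" limitation raised later in the thread.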

--
john

=========================
http://johntech.co.uk

"Bleeding Edge Forum"
http://johntech.co.uk/forum/

=========================

Alex

unread,
Nov 28, 2016, 8:43:08 AM11/28/16
to
You are very confused about this, and predictably didactic & rude to
Elizabeth.

Circular buffers are but one method for handling incoming or outgoing
data, and they have been used for a lot longer than you realise.

A circular buffer can be as simple as a block of data, addressed by an
index modulo the number of buffers. I have no idea why you think a pair
of registers is required. Neither is data movement needed, since we can
use a circular buffer of pointers to data.

ISRs are typically very short and do as little work as possible. The
general rule is to minimise the amount of time spent in interrupts,
especially if they block other activity, and to do the bulk of the work
outside of the interrupt. An ISR may be as simple as moving a task
pointer from a blocked queue to a ready queue as part of a scheduler, with
the work done later by the task.
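That last example, an ISR whose whole job is moving a task from a blocked queue to a ready queue, can be sketched as follows (the one-field `task_t` and the queue names are invented for illustration; a real scheduler carries more state and needs interrupt-safe queue access):

```c
#include <assert.h>
#include <stddef.h>

typedef struct task { struct task *next; } task_t;  /* intrusive singly linked list */

static task_t *blocked = NULL;   /* tasks waiting on an event      */
static task_t *ready   = NULL;   /* tasks the scheduler may run    */

/* The entire "ISR": unlink the head of the blocked queue
   and push it onto the ready queue. */
static void wake_one(void) {
    task_t *t = blocked;
    if (t == NULL) return;
    blocked = t->next;
    t->next = ready;
    ready = t;
}
```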

--
Alex

Andrew Haley

unread,
Nov 28, 2016, 9:52:13 AM11/28/16
to
Alex <al...@rivadpm.com> wrote:
>
> You are very confused about this, and predictably didactic & rude to
> Elizabeth.
>
> Circular buffers are but one method for handling incoming or
> outgoing data, and they have been used for a lot longer than you
> realise.

I've often found circular buffers to be more trouble than they
are worth. Of course, you have to guess how big to make them, but
that's not all. When a protocol handler starts a transaction, the
first thing you sometimes have to do is clear outstanding garbage from
the input buffer. I found myself doing this and realized that it
makes no sense at all: you might as well receive directly into the
receive packet buffer, handling any checksum as each character is
received, and awaken the task once the packet is ready. (This is for
a single-duplex protocol: other protocols are available.)

> ISRs are typically very short and do as little work as is possible.
> The general rule is to try and minimise the amount of time in
> interrupts, esecialy if they block other activity, and to do the
> bulk of the work outside of the interrupt. An ISR may be as simple
> as moving a task pointer from a blocked to a ready queue as part of
> a scheduler, and the work is done later by the task.

Yes indeed. In the Classic Forth case that's a single store
instruction.

Andrew.

Alex

unread,
Nov 28, 2016, 12:19:20 PM11/28/16
to
On 11/28/2016 14:52, Andrew Haley wrote:
Traditionally data has been moved over the wire, into a buffer on the
network card, from the network card to memory managed by the OS, and
then moved again to the application's buffers. Now we have technologies
like RDMA (remote direct memory access) which gives zero copy networking
straight into application memory, and iWarp and RoCE that provide the
protocol stack.

With hardware support, data can be moved with very small latencies in
the hundreds of ns range. The software stack and handling of interrupts
has become a sizeable percentage of the overhead. You really don't get
to do that much work in nanoseconds.

--
Alex

Andrew Haley

unread,
Nov 28, 2016, 12:58:17 PM11/28/16
to
Alex <al...@rivadpm.com> wrote:
>
> Traditionally data has been moved over the wire, into a buffer on
> the network card, from the network card to memory managed by the OS,
> and then moved again to the application's buffers.

Yep. That's the traditional non-Forth-OS way of doing it.

> Now we have technologies like RDMA (remote direct memory access)
> which gives zero copy networking straight into application memory,
> and iWarp and RoCE that provide the protocol stack.

Mmm, yes, at the big end.

Andrew.

Manuel Rodriguez

unread,
Nov 28, 2016, 2:00:03 PM11/28/16
to
Am Montag, 14. November 2016 03:33:26 UTC+1 schrieb polymorph self:
> I am noticing 15,000 context switches per second on my debian testing linux 4.7 6 cpu amd fx machine.
>
> Are stack chips misc (?) and thus since fewer instructions, a 300mhz chip that is a stack machine gets a lot more done per cycle? or does less work to accomplish the same thing?

Recently there was a similar question here in the newsgroup. The thread developed after a while in a direction that used Bitcoin mining as a performance index. The idea was to compare a stack machine like the GA144 chipset against a normal CPU or an ASIC, on the assumption that the stack machine is so fast that it would outperform every other solution. The conclusion of that thread was that stack machines have no significant advantage over ASIC mining, so it makes no sense to switch to Forth only for speed reasons.

Albert van der Horst

unread,
Nov 28, 2016, 2:29:24 PM11/28/16
to
In article <xNGdnSCP3ogL2aHF...@supernews.com>,
Andrew Haley <andr...@littlepinkcloud.invalid> wrote:
>Alex <al...@rivadpm.com> wrote:
>>
>> You are very confused about this, and predictably didactic & rude to
>> Elizabeth.
>>
>> Circular buffers are but one method for handling incoming or
>> outgoing data, and they have been used for a lot longer than you
>> realise.
>
>I've often found circular buffers often to be more trouble than they
>are worth. Of course, you have to guess how big to make them, but
>that's not all. When a protocol handler starts a transaction, the
>first thing you sometimes have to do is clear outstanding garbage from
>the input buffer. I found myself doing this and realized that it
>makes no sense at all: you might as well receive directly into the
>receive packet buffer, handling any checksum as each character is
>received, and awaken the task once the packet is ready. (This is for
>a single-duplex protocol: other protocols are available.)

In manx (control of musical instruments) I buffer about a dozen to
a few dozen MIDI messages with time stamps. That is hard without
a circular buffer.

>
>> ISRs are typically very short and do as little work as is possible.
>> The general rule is to try and minimise the amount of time in
>> interrupts, esecialy if they block other activity, and to do the
>> bulk of the work outside of the interrupt. An ISR may be as simple
>> as moving a task pointer from a blocked to a ready queue as part of
>> a scheduler, and the work is done later by the task.
>
>Yes indeed. In the Classic Forth case that's a single store
>instruction.

Then there are systems with multiple interrupt levels where the main
program can be locked out by lesser interrupts. There I found it better
to do substantial calculations under interrupt. In this case a simulation
of the expected behaviour of a carriage plus mirror. That calculation
must be finished before the next command to move the mirror at
2 ms intervals.
This was the optical delay line at the VLT (very large telescope) of
ESO at Paranal.

So let's not be dogmatic about this.

[There is a pitfall: one must make sure the interrupt routine saves
and restores the floating-point registers.]

>
>Andrew.

Walter Banks

Nov 28, 2016, 4:10:16 PM
to
As one more data point. In the ISA development work I have done stack
machines can often have very compact code but have slower execution in
the benchmarks we have run.

w..

Paul Rubin

Nov 28, 2016, 4:33:35 PM
to
Manuel Rodriguez <a...@gmx.net> writes:
> The idea is, to compare a stackmachine like the GA144 chipset with a
> normal CPU or a ASIC ... stackmachine have no significant advantage
> to ASIC mining so it makes no sense to switch to forth only for speed
> reasons.

Of course an ASIC will be faster. Much more interesting is how GA144
compares to a normal CPU or FPGA with a comparable fab process.

john

Nov 28, 2016, 4:59:08 PM
to
In article <o1i15m$o7k$1...@cherry.spenarnc.xs4all.nl>,
alb...@cherry.spenarnc.xs4all.nl says...
>
> Then there are systems with multiple interrupt levels where the main
> program can be locked out by lesser interrupts. There I found it better
> to do substantial calculations under interrupt. In this case a simulation
> of the expected behaviour of a carriage plus mirror. That calculation
> must be finished before the next command to move the mirror at
> 2 ms intervals.
> This was the optical delay line at the VLT (very large telescope) of
> ESO at Paranal.
>

curious - how does such a system ensure it doesn't miss an interrupt?
I can see clearing the int flag on entry to the interrupt routine
but what if 2 more come along - you can reset while still processing
the first to see the second on leaving but I can't see how you would recognise
the third.
Or doesn't it matter in this application?

Albert van der Horst

Nov 28, 2016, 5:07:12 PM
to
In article <87eg1v8...@nightsong.com>,
http://www.greenarraychips.com/home/documents/greg/PB001-100503-GA144-1-10.pdf

After 6 years, still PRELIMINARY.

When was the last time someone tried to purchase one?

Albert van der Horst

Nov 28, 2016, 5:12:11 PM
to
In article <MPG.32a6b880c...@news.virginmedia.com>,
john <an...@example.com> wrote:
>In article <o1i15m$o7k$1...@cherry.spenarnc.xs4all.nl>,
>alb...@cherry.spenarnc.xs4all.nl says...
>>
>> Then there are systems with multiple interrupt levels where the main
>> program can be locked out by lesser interrupts. There I found it better
>> to do substantial calculations under interrupt. In this case a simulation
>> of the expected behaviour of a carriage plus mirror. That calculation
>> must be finished before the next command to move the mirror at
> >> 2 ms intervals.
>> This was the optical delay line at the VLT (very large telescope) of
>> ESO at Paranal.
>>
>
>curious - how does such a system ensure it doesn't miss an interrupt?
>I can see clearing the int flag on entry to the interrupt routine
>but what if 2 more come along - you can reset while still processing
>the first to see the second on leaving but I can't see how you would recognise
>the third.
>Or doesn't it matter in this application?

On each level the time between interrupts is known. So whatever
happens at lower priority interrupts, a higher level interrupt knows
it has the time to complete before the interrupt at that level
repeats.
You're right that this is particular to this situation.

>
>--
>john

hughag...@gmail.com

Nov 28, 2016, 10:02:23 PM
to
I'm not "very confused" about this --- you are just brown-nosing Elizabeth Rather --- are you an employee of Forth Inc.?

> Circular buffers are but one method for handling incoming or outgoing
> data, and they have been used for a lot longer than you realise.

Circular buffers are the primary (as a practical matter, the only) method for handling I/O --- they have likely been used for as long as computers have existed --- I said the 1960s (I doubt that the computers of the 1950s had interrupts).

Elizabeth Rather clearly doesn't know what circular buffers are --- she should learn about these basic concepts --- or just stop pretending to be a programmer, which annoys the heck out of me.

> A circular buffer can be as simple as a block of data, addressed by an
> index modulo the number of buffers. I have no idea why you think a pair
> of registers is required. Neither is data movement needed, since we can
> use a circular buffer of pointers to data.

One accumulator to hold the datum, and one index register to point into the buffer --- on the 6502 these would be A and Y.

I didn't say data movement is needed. The ISR just loads or stores a single byte from or to the buffer. The main program needs to do data movement because it typically loads or stores a block of data from or to the buffer.

I don't know why you would want a circular buffer of pointers to data --- each datum is typically one or two bytes --- you are not really making any sense.

> ISRs are typically very short and do as little work as is possible. The
> general rule is to try and minimise the amount of time in interrupts,
> esecially if they block other activity, and to do the bulk of the work
> outside of the interrupt. An ISR may be as simple as moving a task
> pointer from a blocked to a ready queue as part of a scheduler, and the
> work is done later by the task.

I didn't say that ISRs should be long and do unnecessary work.

hughag...@gmail.com

Nov 28, 2016, 10:37:35 PM
to
On Monday, November 28, 2016 at 7:52:13 AM UTC-7, Andrew Haley wrote:
> Alex <al...@rivadpm.com> wrote:
> >
> > You are very confused about this, and predictably didactic & rude to
> > Elizabeth.
> >
> > Circular buffers are but one method for handling incoming or
> > outgoing data, and they have been used for a lot longer than you
> > realise.
>
> I've often found circular buffers often to be more trouble than they
> are worth.

I wonder if Alex is an employee of Forth Inc.. I know that Andrew Haley is an employee of Forth Inc though --- he has admitted it --- and, of course, that is why Elizabeth Rather appointed him to the Forth-200x committee.

Consider this thread:
https://groups.google.com/forum/#!topic/comp.lang.forth/pqhqqvWg9zk%5B51-75%5D
Here Elizabeth Rather said that a UART driver should only support half-duplex unless the application needs full-duplex, but that this is never needed.

On Monday, April 25, 2016 at 5:07:26 PM UTC-7, Elizabeth D. Rather wrote:
> I can't think offhand of a case in which a
> full-duplex port literally has concurrent streams of data flowing in and
> out.

LOL --- that is the definition of "full-duplex" --- she should look these terms up in a technical dictionary before she uses them.

But, Andrew Haley predictably supported her:

On Tuesday, April 26, 2016 at 2:04:29 AM UTC-7, Andrew Haley wrote:
> Paul Rubin <no.e...@nospam.invalid> wrote:
> > Maybe if I know the application only uses half-duplex, I can
> > implement some hacky thing that relies on that fact. But if I'm
> > supplying a library facility it wouldn't occur to me to limit it
> > like that.
>
> I'm sure it wouldn't, because you don't really think in the Forth way.
> Don't speculate!

In that thread it became obvious that Elizabeth Rather doesn't know what full-duplex is, and Andrew Haley obediently supported her.

Now in this thread it is obvious that Elizabeth Rather doesn't know what a circular buffer is, and Andrew Haley is obediently supporting her.

It makes sense that Andrew Haley should support Elizabeth Rather --- he is an employee of Forth Inc. --- she signs his paychecks.

But why should anybody else in the Forth community support Elizabeth Rather?

ANS-Forth is a cult --- all of the ANS-Forth programmers obediently support Elizabeth Rather's idiotic pronouncements regarding computer I/O --- this is very similar to how the Heaven's Gate cult members supported their leaders' idiotic pronouncements regarding the Hale-Bopp comet.

Ultimately, all of the ANS-Forth cult members' arguments boil down to telling everybody else: "You don't really think in the Forth way!"

john

Nov 29, 2016, 6:09:08 AM
to
> In article <o1iamk$p2s$1...@cherry.spenarnc.xs4all.nl>, alb...@cherry.spenarnc.xs4all.nl says...
>
> In article <MPG.32a6b880c...@news.virginmedia.com>,
> john <an...@example.com> wrote:
> >In article <o1i15m$o7k$1...@cherry.spenarnc.xs4all.nl>,
> >alb...@cherry.spenarnc.xs4all.nl says...
> >>
> >> Then there are systems with multiple interrupt levels where the main
> >> program can be locked out by lesser interrupts. There I found it better
> >> to do substantial calculations under interrupt. In this case a simulation
> >> of the expected behaviour of a carriage plus mirror. That calculation
> >> must be finished before the next command to move the mirror at
> >> 2 ms intervals.
> >> This was the optical delay line at the VLT (very large telescope) of
> >> ESO at Paranal.
> >>
> >
> >curious - how does such a system ensure it doesn't miss an interrupt?
> >I can see clearing the int flag on entry to the interrupt routine
> >but what if 2 more come along - you can reset while still processing
> >the first to see the second on leaving but I can't see how you would recognise
> >the third.
> >Or doesn't it matter in this application?
>
> On each level the time between interrupts is known.

Ahh .. I see. That makes all the difference of course.

> So whatever
> happens at lower priority interrupts, a higher level interrupt knows
> it has the time to complete before the interrupt at that level
> repeats.
> You're right that this is particular to this situation.
>
> >
> >--
> >john
>
> Groetjes Albert


--

Alex

Nov 29, 2016, 9:41:11 AM
to
On 11/29/2016 03:02, hughag...@gmail.com wrote:
> On Monday, November 28, 2016 at 6:43:08 AM UTC-7, Alex wrote:
>> On 11/28/2016 05:06, hughag...@gmail.com wrote:

>
>> You are very confused about this, and predictably didactic & rude
>> to Elizabeth.
>
> I'm not "very confused" about this --- you are just brown-nosing
> Elizabeth Rather --- are you an employee of Forth Inc.?

Yes, of course you are confused. It goes with you being a furtlewangling
paranoid wanktit. No, I am not an employee of Forth Inc.

>
>> Circular buffers are but one method for handling incoming or
>> outgoing data, and they have been used for a lot longer than you
>> realise.
>
> Circular buffers are the primary (as a practical matter, the only)
> method for handdling I/O --- they have likely been used for as long
> as computers have existed --- I said the 1960s (I doubt that the
> computers of the 1950s had interrupts).

You said they were unknown until the late 1970s originally. Just a
couple of decades out.

Have you never heard of FIFO queues for IO? Hugely popular. For example,
the Linux IO scheduler uses FIFO queues, and SCSI disk commands are
queue-based. They mainly have a place for slower byte-oriented devices
like tape, cards, keyboards and so on.

>
>> A circular buffer can be as simple as a block of data, addressed by
>> an index modulo the number of buffers. I have no idea why you think
>> a pair of registers is required. Neither is data movement needed,
>> since we can use a circular buffer of pointers to data.
>
> One accumulator to hold the datum, and one index register to point
> into the buffer --- on the 6502 these would be A and Y.

That's an implementation detail specific to some set of hardware, not a
description of the algorithm for a circular buffer.

>
> I didn't say data movement is needed. The ISR just loads or stores a
> single byte from or to the buffer. The main-program needs to do data
> movement because it typically loads or stores a block of daa from or
> to the buffer.

Why do all that data movement? Can you think of a way of avoiding it?

>
> I don't know why you would want a circular buffer of pointers to data
> --- each datum is typically one or two bytes --- you are not really
> making any sense.

Is a read off disk or from a network just one or two bytes per interrupt?

The solution to avoiding lots of redundant data movement is to use
pointers. Write it once, point to it and avoid moving the data again.
But we've already had this discussion some time ago about your sort
routine that shuffled the data rather than pointers to the data. You
didn't seem to understand then either.

--
Alex

Andrew Haley

Nov 29, 2016, 10:44:54 AM
to
Albert van der Horst <alb...@cherry.spenarnc.xs4all.nl> wrote:
> In article <xNGdnSCP3ogL2aHF...@supernews.com>,
> Andrew Haley <andr...@littlepinkcloud.invalid> wrote:
>>Alex <al...@rivadpm.com> wrote:
>>>
>>> You are very confused about this, and predictably didactic & rude to
>>> Elizabeth.
>>>
>>> Circular buffers are but one method for handling incoming or
>>> outgoing data, and they have been used for a lot longer than you
>>> realise.
>>
>>I've often found circular buffers often to be more trouble than they
>>are worth.
>
>>> ISRs are typically very short and do as little work as is possible.
>>> The general rule is to try and minimise the amount of time in
>>> interrupts, especially if they block other activity, and to do the
>>> bulk of the work outside of the interrupt. An ISR may be as simple
>>> as moving a task pointer from a blocked to a ready queue as part of
>>> a scheduler, and the work is done later by the task.
>>
>>Yes indeed. In the Classic Forth case that's a single store
>>instruction.

> Then there are systems with multiple interrupt levels where the main
> program can be locked out by lesser interrupts. There I found it better
> to do substantial calculations under interrupt. In this case a simulation
> of the expected behaviour of a carriage plus mirror. That calculation
> must be finished before the next command to move the mirror at
> 2 ms intervals.
> This was the optical delay line at the VLT (very large telescope) of
> ESO at Paranal.
>
> So let's not be dogmatic about this.

Oh, sure. If you need the result of the calculation right now, you do
that calculation right now.

Note the "often" in the sentence above: no dogmatism intended. I'm
saying that I've seen people carefully use circular buffers to make
sure "nothing gets lost" when getting rid of stale bytes is precisely
what is required. The point I'm making is do what is required: no
more, no less.

Andrew.

Brad Eckert

Nov 29, 2016, 12:17:10 PM
to
Since the GA144 is built on 180nm and CPUs and FPGAs are built on much smaller process, we may never know. Every time I've tried to justify a GA144 in a project, I could never do it. I think, though, that the GA144's claim to fame is power savings through asynchronous logic. Chips are getting so dense nowadays that you can only pack circuits so tightly because of the heat they generate. Async processing solves this. Unfortunately, making tiny little Forth machines throws away the efficiency gain by simulating the hardware you want instead of implementing it directly. OKAD should be used to design cores that people want, such as GPUs.

Specialized processing, such as the GPU in your phone, is what's keeping Moore's Law alive. The cost per computation is still halving every 18 months even though process shrinks are approaching a dead end because more and more processing is done on dedicated hardware rather than general purpose CPUs.

Speaking of Moore's Law, the cost per calculation curve goes back through vacuum tubes, relays, and gears to the quill pen. In the quill pen domain, it goes back through the discovery of ever more powerful mathematical methods. For all we know, it could go back to people writing tally marks on a cave wall. BTW, the Moore's Law expression of 2^1.5*y is curiously close to Phi^y. It could be that cost per computation goes down by the golden ratio Phi year after year as one of nature's Fibonacci growth spirals. If you like a little mysticism with your computing.

rickman

Nov 29, 2016, 12:43:03 PM
to
I'm not sure why the emphasis of this discussion is on moving data in
the interrupt routine. These days all but the low end MCUs use hardware
to move data to/from an I/O device and memory. We often call that DMA
although in some cases the hardware used is actually a very simple
programmable processor. The interrupt routine has only to pass through
to the task the fact that the transfer has completed and it is time to
awaken the task that uses the data.

You likely won't have this in a $0.50 processor used to control a toy or
control panel for a microwave, but then those devices aren't going to be
moving much data and likely won't be using interrupts.

A/D and D/A conversion is a common example of I/O that requires blocks
of data to be moved. This is handled by hardware in a great number of
processors even at the low end. An extreme example is the GA144 where
*all* I/O is done by allocating to each I/O task the hardware of a
dedicated processor. Then the comms between the processors is handled
by dedicated hardware. But then the GA144 is not exactly a low end device.

--

Rick C

Alex

Nov 29, 2016, 1:38:29 PM
to
On 11/29/2016 17:43, rickman wrote:
> On 11/28/2016 8:43 AM, Alex wrote:

>>
>> ISRs are typically very short and do as little work as is possible. The
>> general rule is to try and minimise the amount of time in interrupts,
>> especially if they block other activity, and to do the bulk of the work
>> outside of the interrupt. An ISR may be as simple as moving a task
>> pointer from a blocked to a ready queue as part of a scheduler, and the
>> work is done later by the task.
>
> I'm not sure why the emphasis of this discussion is on moving data in
> the interrupt routine. These days all but the low end MCUs use hardware
> to move data to/from an I/O device and memory. We often call that DMA
> although in some cases the hardware used is actually a very simple
> programmable processor. The interrupt routine has only to pass through
> to the task the fact that the transfer has completed and it is time to
> awaken the task that uses the data.
>

By move I meant load and store.

--
Alex

rickman

Nov 29, 2016, 1:57:58 PM
to
I don't know of any particular feature of a stack machine that allows it
to have more compact code. The way they are often used to implement
Forth may result in smaller code as the typical Forth program uses
smaller modules with little overhead that can encourage reuse.

Why do you say stack machines result in smaller code?

--

Rick C

rickman

Nov 29, 2016, 1:58:28 PM
to
Why is an ASIC automatically faster?

--

Rick C

rickman

Nov 29, 2016, 2:46:38 PM
to
On 11/29/2016 12:17 PM, Brad Eckert wrote:
> On Monday, November 28, 2016 at 2:33:35 PM UTC-7, Paul Rubin wrote:
>> Manuel Rodriguez <a...@gmx.net> writes:
>>> The idea is, to compare a stackmachine like the GA144 chipset
>>> with a normal CPU or a ASIC ... stackmachine have no significant
>>> advantage to ASIC mining so it makes no sense to switch to forth
>>> only for speed reasons.
>>
>> Of course an ASIC will be faster. Much more interesting is how
>> GA144 compares to a normal CPU or FPGA with a comparable fab
>> process.
>
> Since the GA144 is built on 180nm and CPUs and FPGAs are built on
> much smaller process, we may never know. Every time I've tried to
> justify a GA144 in a project, I could never do it. I think, though,
> that the GA144's claim to fame is power savings through asynchronous
> logic.

I think the low power aspect of the GA144 is greatly overstated. While
it looks good on paper, there are other aspects of the GA144 that reduce
the benefit. For example, to communicate, any nodes that are not
adjacent must use other nodes as passthroughs. Each of those nodes
draws some mA while running, so the impact can be considerable in some designs.


> Chips are getting so dense nowadays that you can only pack
> circuits so tightly because of the heat they generate. Async
> processing solves this.

Async processing does nothing to solve this. The circuit is only low
power when it is not running. If you have trouble getting rid of the
heat when running full tilt you have a chip that you can't use to its
full capacity.


> Unfortunately, making tiny little Forth
> machines throws away the efficiency gain by simulating the hardware
> you want instead of implementing it directly. OKAD should be used to
> design cores that people want, such as GPUs.

I'm not sure what you mean by this. A stack processor is a processor
the same as any other. Why do you need to simulate hardware on it?


> Specialized processing, such as the GPU in your phone, is what's
> keeping Moore's Law alive. The cost per computation is still halving
> every 18 months even though process shrinks are approaching a dead
> end because more and more processing is done on dedicated hardware
> rather than general purpose CPUs.

I don't follow this conclusion either. Why would Moore's law only apply
to CPUs and not other hardware?


> Speaking of Moore's Law, the cost per calculation curve goes back
> through vacuum tubes, relays, and gears to the quill pen. In the
> quill pen domain, it goes back through the discovery of ever more
> powerful mathematical methods. For all we know, it could go back to
> people writing tally marks on a cave wall. BTW, the Moore's Law
> expression of 2^1.5*y is curiously close to Phi^y. It could be that
> cost per computation goes down by the golden ratio Phi year after
> year as one of nature's Fibonacci growth spirals. If you like a
> little mysticism with your computing.

I'm a bit lost. The golden ratio is around 1.6. 2^1.5 is 2.8. That's
not very close. I think your formula should have been 2^(y/1.5). That
would be 1.587^y which is very close to the golden ratio.

--

Rick C

Brad Eckert

Nov 29, 2016, 4:44:55 PM
to
On Tuesday, November 29, 2016 at 12:46:38 PM UTC-7, rickman wrote:
> On 11/29/2016 12:17 PM, Brad Eckert wrote:
> > Unfortunately, making tiny little Forth
> > machines throws away the efficiency gain by simulating the hardware
> > you want instead of implementing it directly. OKAD should be used to
> > design cores that people want, such as GPUs.
>
> I'm not sure what you mean by this. A stack processor is a processor
> the same as any other. Why do you need to simulate hardware on it?
>

Take for example a digital filter I coded in 486 assembly. Lots of code is spent moving data around when a few DSP instructions would have done the job. The way I see it, I'm simulating DSP hardware. In fact, if it had to be crazy fast, real hardware could execute a pipelined version in one clock cycle. Instead, I simulate that with code.

>
> > Specialized processing, such as the GPU in your phone, is what's
> > keeping Moore's Law alive. The cost per computation is still halving
> > every 18 months even though process shrinks are approaching a dead
> > end because more and more processing is done on dedicated hardware
> > rather than general purpose CPUs.
>
> I don't follow this conclusion either. Why would Moore's law only apply
> to CPUs and not other hardware?
>

There are various versions of Moore's Law. The cost per computation is the one that goes back to mechanical calculating machines. People are always saying Moore's law will poop out somewhere below 10nm, but I think efficiency improvements will push that out a couple of decades. The CPU makers sometimes give apps a huge performance boost (at not much extra cost) by adding custom instruction set extensions for crypto or multimedia. The size of the consumer electronics industry will support the shift to more hardware-based solutions. The case today is that designers have more transistors than they know what to do with. The easiest thing to do, but far from the most efficient, is put multiple CPU cores on a chip.

>
> > Speaking of Moore's Law, the cost per calculation curve goes back
> > through vacuum tubes, relays, and gears to the quill pen. In the
> > quill pen domain, it goes back through the discovery of ever more
> > powerful mathematical methods. For all we know, it could go back to
> > people writing tally marks on a cave wall. BTW, the Moore's Law
> > expression of 2^1.5*y is curiously close to Phi^y. It could be that
> > cost per computation goes down by the golden ratio Phi year after
> > year as one of nature's Fibonacci growth spirals. If you like a
> > little mysticism with your computing.
>
> I'm a bit lost. The golden ratio is around 1.6. 2^1.5 is 2.8. That's
> not very close. I think your formula should have been 2^(y/1.5). That
> would be 1.587^y which is very close to the golden ratio.

Derp. Mistranslated "doubles every 18 months".

Brad Eckert

Nov 29, 2016, 4:57:04 PM
to
On Tuesday, November 29, 2016 at 10:43:03 AM UTC-7, rickman wrote:
> > ISRs are typically very short and do as little work as is possible. The
> > general rule is to try and minimise the amount of time in interrupts,
> > esecialy if they block other activity, and to do the bulk of the work
> > outside of the interrupt. An ISR may be as simple as moving a task
> > pointer from a blocked to a ready queue as part of a scheduler, and the
> > work is done later by the task.
>
> I'm not sure why the emphasis of this discussion is on moving data in
> the interrupt routine. These days all but the low end MCUs use hardware
> to move data to/from an I/O device and memory. We often call that DMA
> although in some cases the hardware used is actually a very simple
> programmable processor. The interrupt routine has only to pass through
> to the task the fact that the transfer has completed and it is time to
> awaken the task that uses the data.
>
I think Elizabeth was referring to the high-priority / low-priority task split that worked so well on ancient hardware (and works well today). The low priority task is serviced by a rapid cooperative multitasker. The technique can be used outside of Forth. One time our development team farmed out development of a communication stack (in C). The code worked, but it was blocking our code in a lot of places. I wrote a little cooperative task switcher in assembly to make it yield to our code.

rickman

Nov 29, 2016, 6:39:54 PM
to
On 11/29/2016 4:44 PM, Brad Eckert wrote:
> On Tuesday, November 29, 2016 at 12:46:38 PM UTC-7, rickman wrote:
>> On 11/29/2016 12:17 PM, Brad Eckert wrote:
>>> Unfortunately, making tiny little Forth machines throws away the
>>> efficiency gain by simulating the hardware you want instead of
>>> implementing it directly. OKAD should be used to design cores
>>> that people want, such as GPUs.
>>
>> I'm not sure what you mean by this. A stack processor is a
>> processor the same as any other. Why do you need to simulate
>> hardware on it?
>>
>
> Take for example a digital filter I coded in 486 assembly. Lots of
> code is spent moving data around when a few DSP instructions would
> have done the job. The way I see it, I'm simulating DSP hardware. In
> fact, if it had to be crazy fast, real hardware could execute a
> pipelined version in one clock cycle. Instead, I simulate that with
> code.

I still don't know what you were referring to in the original statement.
What "efficiency gain" were you talking about that gets thrown away?


>>> Specialized processing, such as the GPU in your phone, is what's
>>> keeping Moore's Law alive. The cost per computation is still
>>> halving every 18 months even though process shrinks are
>>> approaching a dead end because more and more processing is done
>>> on dedicated hardware rather than general purpose CPUs.
>>
>> I don't follow this conclusion either. Why would Moore's law only
>> apply to CPUs and not other hardware?
>>
>
> There are various versions of Moore's Law. The cost per computation
> is the one that goes back to mechanical calculating machines. People
> are always saying Moore's law will poop out somewhere below 10nm, but
> I think efficiency improvements will push that out a couple of
> decades. The CPU makers sometimes give apps a huge performance boost
> (at not much extra cost) by adding custom instruction set extensions
> for crypto or multimedia. The size of the consumer electronics
> industry will support the shift to more hardware-based solutions. The
> case today is that designers have more transistors than they know
> what to do with. The easiest thing to do, but far from the most
> efficient, is put multiple CPU cores on a chip.

Moore's law only referred to the number of transistors on a chip. As
far as I know, that still holds true today. Again, I don't know what
"efficiency gains" you are talking about.

The part I am really confused about is about the process shrinks
approaching a dead end because of dedicated hardware. Hardware of all
types benefit from smaller transistors.


>>> Speaking of Moore's Law, the cost per calculation curve goes
>>> back through vacuum tubes, relays, and gears to the quill pen. In
>>> the quill pen domain, it goes back through the discovery of ever
>>> more powerful mathematical methods. For all we know, it could go
>>> back to people writing tally marks on a cave wall. BTW, the
>>> Moore's Law expression of 2^1.5*y is curiously close to Phi^y. It
>>> could be that cost per computation goes down by the golden ratio
>>> Phi year after year as one of nature's Fibonacci growth spirals.
>>> If you like a little mysticism with your computing.
>>
>> I'm a bit lost. The golden ratio is around 1.6. 2^1.5 is 2.8.
>> That's not very close. I think your formula should have been
>> 2^(y/1.5). That would be 1.587^y which is very close to the golden
>> ratio.
>
> Derp. Mistranslated "doubles every 18 months".

What is "Derp."?

--

Rick C

hughag...@gmail.com

Nov 30, 2016, 12:32:59 AM
to
On Tuesday, November 29, 2016 at 7:41:11 AM UTC-7, Alex wrote:
> On 11/29/2016 03:02, hughag...@gmail.com wrote:
> > On Monday, November 28, 2016 at 6:43:08 AM UTC-7, Alex wrote:
> >> On 11/28/2016 05:06, hughag...@gmail.com wrote:
>
> >
> >> You are very confused about this, and predictably didactic & rude
> >> to Elizabeth.
> >
> > I'm not "very confused" about this --- you are just brown-nosing
> > Elizabeth Rather --- are you an employee of Forth Inc.?
>
> Yes, of course you are confused. It goes with you being a furtlewangling
> paranoid wanktit. No, I am not an employee of Forth Inc.

Oh, you're Alex McDonald! LOL

You are the same Alex who sent me this death threat by email:
-----------------------------------------------------------------------------
On Wed, Jun 8, 2016 at 2:29 PM, Alex McDonald <al...@rivadpm.com> wrote:
On 01/06/16 16:20, hughag...@gmail.com wrote:
> The RfD was going to be a slam-dunk from a ladder for Alex McDonald
> --- but then Alex McDonald disappeared --- now the RfD is left in
> limbo while Anton and Bernd search for a new pocket-boy who will put
> his name on the RfD.

> Then the Forth-200x committee had Alex McDonald pretending to represent
>the "Forth community," but Alex "I do standards for a living" McDonald
> disappeared. This is a chronic problem for the Forth-200x committee
> --- their pocket boys disappear --- then they have to find new
> pocket boys to put their names on the RfDs.


Unfortunately, you're still here, and I suspect you won't do me the tremendous favour of fucking off or dieing any time soon. Ah well.

I still occasionally read clf. Personal reasons and your continued existence have stopped me being an active contributor. I'll continue to lurk for the moment.

--
Alex "I still do standards for a living" McDonald
-----------------------------------------------------------------------------


So, do you still do standards for a living? Who signs your paycheck --- Forth Inc., I would assume.

> The solution to avoiding lots of redundant data movement is to use
> pointers. Write it once, point to it and avoid moving the data again.
> But we've already had this discussion some time ago about your sort
> routine that shuffled the data rather than pointers to the data. You
> didn't seem to understand then either.

I remember that you criticized my SORT function and said that I had a: "serious misunderstanding of how pointers work." That was truly the stupidest thing you have ever said, and you have said a lot of stupid things.

This is my SORT function from the novice package:

\ All of these macros use the locals from SORT, and can only be called from SORT.

macro: adr ( index -- adr )
recsiz * array + ;

macro: left ( x -- y ) 2* 1+ ;

macro: right ( x -- y ) 2* 2 + ;

macro: heapify ( x -- )
dup >r begin \ r: -- great
dup left dup limit < if dup adr rover adr 'comparer execute if rdrop dup >r then then drop
dup right dup limit < if dup adr r@ adr 'comparer execute if rdrop dup >r then then drop
dup r@ <> while
adr r@ adr recsiz exchange
r@ repeat
drop rdrop ;

macro: build-max-heap ( -- )
limit 1- 2/ begin dup 0>= while dup heapify 1- repeat drop ;

: sort { array limit recsiz 'comparer -- }
recsiz [ w 1- ] literal and abort" *** SORT: record size must be a multiple of the cell size ***"
build-max-heap
begin limit while -1 +to limit
0 adr limit adr recsiz exchange
0 heapify repeat ;

\ The SORT locals:
\ array \ the address of the 0th element
\ limit \ the number of records in the array
\ recsiz \ the size of a record in the array \ this must be a multiple of W (FIELD assures this)
\ 'comparer \ adrX adrY -- X>Y?

\ Note for the novice:
\ This code was originally written with colon words rather than macros, and using items rather than local variables.
\ After it was debugged, it was changed to use macros and locals so that it would be fast and reentrant.
\ One of the reasons why the heap-sort was chosen is because it is not recursive, which allows macros to be used.
\ Using macros allows the data (array, limit, recsiz, 'comparer) to be held in locals rather than items, which is reentrant.
\ See the file SORT.4TH for an early version.
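As an aside, the pointer-versus-data dispute above can be sketched in a few lines of Python (an illustration only, not part of the novice package; the variable names are made up). Sorting a small array of indices, which play the role of pointers, leaves the bulky records untouched:

```python
# Illustrative sketch of the pointer-based approach rickman describes:
# instead of exchanging whole records, sort a small array of indices
# ("pointers") and leave the bulky records where they are.
records = [(3, "c" * 64), (1, "a" * 64), (2, "b" * 64)]  # key + bulky payload

# Permute the indices only; no record data moves.
order = sorted(range(len(records)), key=lambda i: records[i][0])

print(order)                            # permutation of indices
print([records[i][0] for i in order])   # keys read back in sorted order
```

The trade-off is real on both sides: the pointer approach moves less data per exchange, while sorting the records themselves leaves them contiguous in memory afterwards, which can matter for later sequential access.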

Cecil Bayona

unread,
Nov 30, 2016, 1:46:27 AM11/30/16
to
One of the Forth CPUs I've been playing with, the ep32, packs 5 instructions
into a 32-bit word; no statistics or anything, but that has to save some space.

--
Cecil - k5nwa
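For what it's worth, the packing Cecil describes can be sketched in Python. This is a hedged illustration: the 6-bit slot width and the high-to-low slot order are assumptions on my part, not taken from ep32 documentation (five 6-bit slots fill 30 of the 32 bits, leaving 2 unused):

```python
# Sketch: packing five 6-bit opcode slots into one 32-bit word,
# roughly the scheme Cecil describes for the ep32 (widths assumed).
def pack(slots):
    assert len(slots) == 5 and all(0 <= s < 64 for s in slots)
    word = 0
    for s in slots:
        word = (word << 6) | s        # slot 0 ends up in the high bits
    return word                       # top 2 bits of the 32-bit word unused

def unpack(word):
    return [(word >> (6 * (4 - i))) & 0x3F for i in range(5)]

print(unpack(pack([1, 2, 3, 4, 5])))  # round-trips the five slots
```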

rickman

unread,
Nov 30, 2016, 2:01:54 AM11/30/16
to
I don't follow at all. I have looked at instruction sizes from 4 bits
to 18 bits. My consideration of instruction sizes indicates the amount
of compression is fairly constant, in that the smaller instructions are
more limited. A larger instruction size allows for a lot more variation,
and so more can happen in a single instruction... or what happens is
more tailored. For example, consider the variations of add and subtract,
with and without carry. The GA144 dispenses with all that and just has
an add instruction, expecting the programmer to add other instructions to
implement the variations. The resulting size of your code depends on
whether you are utilizing the features available when a larger
instruction size is used, for example whether your code needs subtract as
often as add, or uses carry, as in double-precision operations.

The observations I made were that code density is impacted more by the
instruction set fitting the application more than anything else.
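rickman's GA144 example can be made concrete with a small sketch (Python standing in for machine code; the 18-bit cell width matches the GA144, but the helper name is mine). With only a plain add and no add-with-carry variant, a double-precision add has to recover the carry in software:

```python
# Sketch: synthesizing a double-precision add from a carry-less ADD,
# as a GA144 programmer would. 18-bit cells assumed (GA144 word size).
MASK = (1 << 18) - 1

def dadd(lo1, hi1, lo2, hi2):
    lo = lo1 + lo2
    carry = lo >> 18                       # carry recovered from the wide sum
    return lo & MASK, (hi1 + hi2 + carry) & MASK

print(dadd(MASK, 0, 1, 0))   # low cell overflows into the high cell
```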

--

Rick C

Brad Eckert

unread,
Nov 30, 2016, 6:00:14 PM11/30/16
to
On Tuesday, November 29, 2016 at 4:39:54 PM UTC-7, rickman wrote:
>
> I still don't know what you were referring to in the original statement.
> What "efficiency gain" were you talking about that gets thrown away?
>
Each core is a highly tuned, compact layout of a little CPU. Very energy efficient. But simulating a wire to make the layout work isn't very efficient. And speaking of async computing being much better, maybe it's not. Clock trees are only a problem due to their implementation. That can change. Besides, things like Booth multipliers are fabrics of gates that have no clock. They're only synchronous around the edges.

Chuck is being, well, Chuck. He's all cowboy. He could have gone back to build up Forth Inc after his foray into hardware, but that wouldn't be very cowboy.

>
> >>> Specialized processing, such as the GPU in your phone, is what's
> >>> keeping Moore's Law alive. The cost per computation is still
> >>> halving every 18 months even though process shrinks are
> >>> approaching a dead end because more and more processing is done
> >>> on dedicated hardware rather than general purpose CPUs.
> >>
> >> I don't follow this conclusion either. Why would Moore's law only
> >> apply to CPUs and not other hardware?
> >>
> >
> > There are various versions of Moore's Law. The cost per computation
> > is the one that goes back to mechanical calculating machines. People
> > are always saying Moore's law will poop out somewhere below 10nm, but
> > I think efficiency improvements will push that out a couple of
> > decades. The CPU makers sometimes give apps a huge performance boost
> > (at not much extra cost) by adding custom instruction set extensions
> > for crypto or multimedia. The size of the consumer electronics
> > industry will support the shift to more hardware-based solutions. The
> > case today is that designers have more transistors than they know
> > what to do with. The easiest thing to do, but far from the most
> > efficient, is put multiple CPU cores on a chip.
>
> Moore's law only referred to the number of transistors on a chip. As
> far as I know, that still holds true today. Again, I don't know what
> "efficiency gains" you are talking about.
>
> The part I am really confused about is about the process shrinks
> approaching a dead end because of dedicated hardware. Hardware of all
> types benefit from smaller transistors.
>

It's probably in the way I process generalities. It's more a phenomenon related to Moore's Law. Gordon Moore was a transistor guy who noticed the number of transistors on a chip following an exponential trend. The human cost per computation can be shown to follow the same curve, approximately 1.618^-y. Y goes up, cost goes down. That curve goes back a century at least. If you include the mathematicians and their discoveries, maybe much further.

My point is that as it gets harder to make transistors cheaper, and Gordon Moore's law flattens, the difference will be made up in the way transistors are used. The big silicon economy can afford to put resources into using them more efficiently. Using them to make a large SRAM for a general purpose processor does a little something for everyone. Lots of room for improvement for specialized needs. More customers with deep pockets means further decreases in cost per computation even if further process shrinks stop.

The real question is what comes after the process shrinks have stopped and the transistors are used as efficiently as possible? Will the 1.618^-y still hold? What if it's a metaphysical thing?
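Brad's two figures can be checked against each other in a couple of lines (illustrative arithmetic only): a cost curve of 1.618^-y, with y in years, implies a halving period of ln 2 / ln 1.618 years, which is roughly the quoted 18 months:

```python
import math

# Cost curve 1.618**-y (y in years) vs. "halving every 18 months":
# solve 1.618**t == 2 for the halving period t.
halving_years = math.log(2) / math.log(1.618)

print(round(halving_years, 2))        # ~1.44 years
print(round(halving_years * 12))      # ~17 months, close to the quoted 18
```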

rickman

unread,
Nov 30, 2016, 6:41:58 PM11/30/16
to
On 11/30/2016 6:00 PM, Brad Eckert wrote:
> On Tuesday, November 29, 2016 at 4:39:54 PM UTC-7, rickman wrote:
>>
>> I still don't know what you were referring to in the original
>> statement. What "efficiency gain" were you talking about that gets
>> thrown away?
>>
> Each code a highly tuned, compact layout of a little CPU. Very energy
> efficient. But simulating a wire to make the layout work isn't very
> efficient. And speaking of async computing being much better, maybe
> it's not. Clock trees are only a problem due to their implementation.
> That can change. Besides, things like Booth multipliers are fabrics
> of gates that have no clock. They're only synchronous around the
> edges.

So you mean a software-driven device loses efficiency compared to
dedicated hardware? Every approach has shortcomings. Comms is a problem
in the GA144 as messages have to travel through a path of nodes to get
anywhere. That works ok for a data flow application or for some types of
signal processing, especially if a fast signal is being decimated.


> Chuck is being, well, Chuck. He's all cowboy. He could have gone back
> to build up Forth Inc after his foray into hardware, but that
> wouldn't be very cowboy.

I'm not aware that Forth, Inc needs building up. Aren't they doing ok
as it is? Certainly MPE is doing ok as well.
Are you referring to the NRE associated with developing the next process
node? That *will* have a limit, maybe not technically, but
economically. It was over 10 years ago the mask set cost rose to over a
million dollars. That makes it hard to justify a custom chip.
Foundries cost billions of dollars. That makes it hard to justify a
switch to a new node. At some point the cost of developing and
switching to the next node will be so large the amortization will make
the next node chips as costly as the present node chips and there will
be no reason to continue.


> The real question is what comes after the process shrinks have
> stopped and the transistors are used as efficiently as possible? Will
> the 1.618^-y still hold? What if it's a metaphysical thing?

I'm not sure why it would be a metaphysical thing. There may be new
paradigms that facilitate processing with greatly reduced costs. Maybe
instead of everyone having processing in their hands, they will have
great comms and all the real processing money will go into remote
servers. Then the economics are a bit different.

--

Rick C

Elizabeth D. Rather

unread,
Nov 30, 2016, 7:26:20 PM11/30/16
to
On 11/30/16 1:42 PM, rickman wrote:
> On 11/30/2016 6:00 PM, Brad Eckert wrote:
...
>> Chuck is being, well, Chuck. He's all cowboy. He could have gone back
>> to build up Forth Inc after his foray into hardware, but that
>> wouldn't be very cowboy.
>
> I'm not aware that Forth, Inc needs building up. Aren't they doing ok
> as it is? Certainly MPE is doing ok as well.

Doing fine, thank you! Several major contracts under way.

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

polymorph self

unread,
Dec 1, 2016, 3:19:32 AM12/1/16
to
On Sunday, November 27, 2016 at 5:08:26 AM UTC-5, Rod Pemberton wrote:
> On Sun, 27 Nov 2016 00:25:22 -0800 (PST)
> polymorph self <jack...@gmail.com> wrote:
>
> > > Probably not. Remember also that assembler code is harder to write
> > > and to maintain, and affects both cost and time to market. As I keep
> > > pointing out, I haven't written an interrupt routine in assembler
> > > for an ARM for about 15 years ... except to prove that I can still
> > > do it. A good code generator is an enabling technology.
> > >
> > by the gods I can't even imagine writing an interrupt in assembler
>
> It's no different than coding a subroutine or procedure in assembly.
> You just have to take care of some extra steps.
>
> There are a bunch of people who have done so for the x86 processor on
> alt.os.development. I have yet to see anyone on a.o.d. who codes for
> the ARM processor.
>
> > how the hell did you learn such?
>
> You read the programmer instruction manuals from the processor
> manufacturer. You teach yourself how to program in assembly. Then,
> like everything else in life, you solve the puzzle. You make it work by
> doing what needs to be done. Think of it as a procedure with a few
> extra layers wrapped around the code for a procedure.
>
> For x86, the basic sequence is:
>
> disable interrupts
> preserve registers
> (optionally) switch stacks
> do what needs to be done for the interrupt procedure
> (optionally) restore stacks
> restore registers
> enable interrupts
> return from interrupt
>
> Of course, you have to do that sequence in assembly. But, you don't
> have to code the interrupt in assembly. You can code most of it
> in a high level language like C. However, most high level languages
> don't provide a mechanism to implement interrupt routines. So, you
> have to find a method to transfer from assembly to the high level
> language, and a way to transfer back too. This is more assembly code.
> This code is custom to each compiler.
>
>
> Rod Pemberton

Interesting...so humans like me could learn it given some time and effort....
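Rod's entry/exit sequence can be modeled as a toy sketch (Python standing in for assembly; the register names and handler are made up) showing why the save and restore steps must mirror each other:

```python
# Toy model (not real x86) of the interrupt sequence: state is saved
# on a stack at entry and restored in reverse order at exit, so the
# interrupted code never sees the handler's clobbering.
def run_interrupt(regs, handler):
    stack = []
    for name in ("ax", "bx", "cx"):      # preserve registers
        stack.append((name, regs[name]))
    handler(regs)                        # handler body may clobber anything
    while stack:                         # restore registers, reverse order
        name, value = stack.pop()
        regs[name] = value
    return regs                          # stands in for "return from interrupt"

regs = {"ax": 1, "bx": 2, "cx": 3}
run_interrupt(regs, lambda r: r.update(ax=99, bx=99, cx=99))
print(regs)   # original values restored
```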

Ilya Tarasov

unread,
Dec 1, 2016, 8:35:56 AM12/1/16
to
> A guess on my part but I will attribute your answer to language issues,
> because you are responding as I was making claims that I did not make.
>
> I did not state that performance by Forth is superior to anything else,
> or that performance is necessary in all cases but that it "can be
> important" in some cases.

If you think I'm typing randomly, or that an evil Siberian bear is forcing me to drink vodka, you are probably wrong. I can copy your previous message:

> >> A desktop computer is not the only place where Forth is used, on small
> >> embedded CPUs, performance can be important.

This is a very tricky move, to point to performance optimization when someone wants to freeze a project. 'Why do you write a compiler instead of finishing your project? - Because performance can be important!' Is it important right now? Also, estimate performance at an early stage and choose tasks where Forth performance will be enough.

> Switching to assembler could be avoided if the Forth compiler is
> efficient, not everyone is making a project that involves making a lot
> of units, in my case most projects are one of a kind not to be duplicated.

Switching to assembler is not a disaster. 'Right tool for the right task'.

Walter Banks

unread,
Dec 1, 2016, 1:01:22 PM12/1/16
to
The comment came out of experience we have had with ISA development. Part
of it is the instruction packing possibilities; some of it is the code
generation tools that can take advantage of very good data flow
analysis. The trade-off between addressing fields and pure operations
favors operation-only instructions for application code.

All things being equal, the execution turned out slower. We do know that
this was generally because of the overhead of the extra steps needed
to execute code.

w..

hughag...@gmail.com

unread,
Dec 1, 2016, 9:38:29 PM12/1/16
to
On Thursday, December 1, 2016 at 6:35:56 AM UTC-7, Ilya Tarasov wrote:
> > A guess on my part but I will attribute your answer to language issues,
> > because you are responding as I was making claims that I did not make.
> >
> > I did not state that performance by Forth is superior to anything else,
> > or that performance is necessary in all cases but that it "can be
> > important" in some cases.
>
> If you think ... evil siberian bear forcing me to drink vodka you are probably wrong.

Probably wrong? Does that mean that there is a probability less than 0.5 that an evil siberian bear is forcing you to drink vodka?

> > Switching to assembler could be avoided if the Forth compiler is
> > efficient, not everyone is making a project that involves making a lot
> > of units, in my case most projects are one of a kind not to be duplicated.
>
> Switching to assembler is not a disaster. 'Right tool for the right task'.

For a long time, I thought of Forth as being an over-grown macro-assembler. There was no issue of "switching to assembler" --- writing code in assembly was the focus throughout.

That seemed to be the idea at Testra where I wrote MFX for their MiniForth processor --- I wrote quite a lot of assembly code, and I also wrote Forth code that metacompiled assembly (called "macros" in a traditional macro-assembler) --- there wasn't any distinction between hand-writing assembler code and compiling assembler code (because I was the one who wrote the compiler).