
DTC


Mark Wills

Nov 27, 2012, 11:01:26 AM
Very nice write-up here about the Forth virtual machine, complete with
links to our very own Anton Ertl's pages:

http://www.wordiq.com/definition/Forth_virtual_machine

One thing that struck me, it mentions that direct threaded is not as
"flexible" as ITC. I was wondering in what respect DTC is less
flexible? Anyone have any opinions/comments?

For context, here is the description of DTC from the above link:

"Direct threading: The addresses in the code are actually the address
of machine language. This is a compromise between speed and space. The
indirect data pointer is lost, at some loss in the language's
flexibility, and this may need to be corrected by a type tag in the
data areas, with an auxiliary table. Some Forth systems have produced
direct-threaded code. On many machines direct-threading is faster than
subroutine threading (see reference below)."

The reference in the above paragraph points to a paper by Anton on
threading benchmarks.
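
To make the comparison concrete, here is a minimal C sketch of the
two dispatch styles (illustrative only; the names are conventional,
not from any particular system). In ITC each cell of the thread
holds the address of a code field, which in turn holds the address
of the routine to run; in DTC each cell holds the routine address
directly, saving one fetch but losing the patchable pointer:

  #include <stdio.h>

  typedef void (*prim_t)(void);

  void hello(void) { puts("hello"); }
  void bye(void)   { puts("bye"); }

  /* ITC: code fields are patchable cells holding routine addresses */
  prim_t hello_cf = hello;
  prim_t bye_cf   = bye;

  void run_itc(prim_t **ip) {     /* thread of code-field addresses */
      prim_t *cf;
      while ((cf = *ip++) != NULL)
          (*cf)();                /* extra fetch through the field */
  }

  void run_dtc(prim_t *ip) {      /* thread of routine addresses */
      prim_t p;
      while ((p = *ip++) != NULL)
          p();                    /* one fetch, nothing left to patch */
  }

  int main(void) {
      prim_t *itc[] = { &hello_cf, &bye_cf, NULL };
      prim_t  dtc[] = { hello, bye, NULL };
      run_itc(itc);
      hello_cf = bye;             /* ITC re-vectoring: patch the field */
      run_itc(itc);
      run_dtc(dtc);
      return 0;
  }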

Paul Rubin

Nov 27, 2012, 11:55:48 AM
Mark Wills <forth...@gmail.com> writes:
> Very nice write-up here about the Forth virtual machine, complete with
> links to our very own Anton Ertl's pages:
> http://www.wordiq.com/definition/Forth_virtual_machine

Note: that is a mirror of the Wikipedia article of the same title. (I
won't attempt answering the ITC vs DTC question as I'd probably get
something wrong, but I'm sure others here can explain it.)

Mark Wills

Nov 27, 2012, 12:01:15 PM
On Nov 27, 4:55 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
Ah! I didn't realise, thanks. Guess I should have checked. Thanks for
pointing it out! :-)

Hugh Aguilar

Nov 27, 2012, 8:42:38 PM
On Nov 27, 9:01 am, Mark Wills <forthfr...@gmail.com> wrote:
> Very nice write-up here about the Forth virtual machine, complete with
> links to our very own Anton Ertl's pages:
>
> http://www.wordiq.com/definition/Forth_virtual_machine
>
> One thing that struck me, it mentions that direct threaded is not as
> "flexible" as ITC. I was wondering in what respect DTC is less
> flexible? Anyone have any opinions/comments?

With ITC it is possible to change how a word is interpreted by
changing the pointer at the cfa. With DTC, by comparison, you don't
have a pointer to the code that interprets the word, but rather you
have the code itself pasted in there. It is a major hassle to patch
this code to change how the word is interpreted. This is all academic
anyway --- I've never heard of anybody doing this. I think that it
would be done to provide a debug-interpretation in which the threaded
code is single-stepped through --- that is the only purpose I can
think of, but I haven't done it. I did write a single-step debugger
for my 65c02 cross-compiler, but it was subroutine-threaded --- if I
wanted to debug, I would recompile with the debug option turned on,
which would cause a BRK instruction to get compiled between every
chunk of code (representing a Forth word in the source-code). My
compiler would keep track of where all of these BRK instructions were,
so that when single-stepping through the program it would display the
correct block of source-code with a smiley-face showing where in the
block we were. It would put the user (me) into query-interpret, so I
could examine the 65c02 as necessary. Also, basic information such as
the parameter and return stacks, and some watch variables, was
continually displayed underneath the display of the source-code block.
This was all written in 16-bit UR/Forth, and the target was an Apple-
IIe computer, and they communicated with an RS-232 serial cable. I
wrote that back in maybe 1989. My application program was a symbolic
math program that would do calculus --- I got as far as determining
the derivative of a function, and reducing the equation to simplest
terms, but never got as far as symbolic integration of functions,
which is much more difficult.

I don't mess with debuggers nowadays --- Paul Rubin may find this hard
to believe, but it is not because I don't know how to write a
debugger, but it is because I find testing functions at the console
immediately after writing them to be more efficient.

BTW: The article listed "return threading" under the heading: "Less
often used are." This most likely is a reference to what I recently
learned over on clax and which those guys called "stack threading."
This is what I'm doing in HostForth. The processor return-stack
pointer (rsp on the 64-bit x86) is used as the Forth IP register. This
only works on big processors that don't use the application program's
return stack for their interrupts --- it won't work on micro-
controllers because an interrupt would overwrite the threaded Forth
code that is executing at the time that the interrupt occurs. I'm just
using it in HostForth because it is convenient and reasonably fast.
NEXT is just a single RET instruction. By comparison, in subroutine-
threading, NEXT is a CALL and a RET, so stack threading is faster for
executing colon words containing mostly primitives. For executing
colon words, stack threading requires DOCOLON code pasted in front of
the threaded code, whereas subroutine-threading still just uses a CALL
and a RET, so subroutine-threading is faster for executing colon words
containing mostly other colon words. Also, with subroutine-threading
you get to compile simple primitives as inline machine-code, which
speeds things up a lot. Mostly what kills the speed in any threaded
system is that branch prediction doesn't work, and so iteration comes
out slow --- all threaded schemes are slow because of this --- but I
don't care with HostForth because the only program that will ever be
written in HostForth is the cross-compiler TargForth, which is not
speed critical as it is all compile-time. TargForth will generate
subroutine-threaded code for the micro-controllers, as that code is
speed critical.

Mark Wills

Nov 28, 2012, 2:50:22 AM
On Nov 28, 1:42 am, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
> On Nov 27, 9:01 am, Mark Wills <forthfr...@gmail.com> wrote:
>
> > Very nice write-up here about the Forth virtual machine, complete with
> > links to our very own Anton Ertl's pages:
>
> >http://www.wordiq.com/definition/Forth_virtual_machine
>
> > One thing that struck me, it mentions that direct threaded is not as
> > "flexible" as ITC. I was wondering in what respect DTC is less
> > flexible? Anyone have any opinions/comments?
>
> With ITC it is possible to change how a word is interpreted by
> changing the pointer at the cfa. With DTC, by comparison, you don't
> have a pointer to the code that interprets the word, but rather you
> have the code itself pasted in there.

I don't think that's correct, Hugh. Unless I'm mistaken, you're
thinking of native compiled code.

With DTC, a definition is still a 'thread' of addresses, but they are
the addresses of code, rather than the addresses of addresses of code;
a single cell references exactly one definition, same as ITC.

Maybe the article is mistaken. But I'm trying to think of what the
disadvantages of DTC are. I presume there *are* disadvantages,
otherwise ITC would not have evolved to be the de facto standard that
it was during the 70s and 80s.

The Rodriguez article sums it up quite nicely:

http://www.bradrodriguez.com/papers/moving1.htm

In the article, Rodriguez states that DTC can result in larger code
size:

"This costs space: every high-level definition in a Z80 Forth (for
example) is now one byte longer, since a 2-byte address has been
replaced by a 3-byte call. But this is not universally true. A 32-bit
68000 Forth may replace a 4-byte address with a 4-byte BSR
instruction, for no net loss. And on the Zilog Super8, which has
machine instructions for DTC Forth, the 2-byte address is replaced by
a 1-byte ENTER instruction, making a DTC Forth smaller on the Super8!"

But there is no mention of a loss of flexibility, which is what the
Wikipedia article states. I can't see any reason for a lack of
flexibility myself.

I guess the original article is simply erroneous in stating that DTC
is less flexible than ITC.

humptydumpty

Nov 28, 2012, 5:04:12 AM
On Tuesday, November 27, 2012 6:01:26 PM UTC+2, M.R.W Wills wrote:
> Very nice write-up here about the Forth virtual machine, complete with
> links to our very own Anton Ertl's pages:
>
> http://www.wordiq.com/definition/Forth_virtual_machine
>
> One thing that struck me, it mentions that direct threaded is not as
> "flexible" as ITC. I was wondering in what respect DTC is less
> flexible? Anyone have any opinions/comments?

Adding a level of indirection could lead to easy re-vectoring.

Have a nice day,
humptydumpty

Andrew Haley

Nov 28, 2012, 5:41:49 AM
Mark Wills <forth...@gmail.com> wrote:
> On Nov 28, 1:42 am, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
>
>> With ITC it is possible to change how a word is interpreted by
>> changing the pointer at the cfa. With DTC, by comparison, you don't
>> have a pointer to the code that interprets the word, but rather you
>> have the code itself pasted in there.
>
> I don't think that's correct, Hugh.

I'm sure it is.

Andrew.

Mark Wills

Nov 28, 2012, 5:48:04 AM
On Nov 28, 10:41 am, Andrew Haley <andre...@littlepinkcloud.invalid>
wrote:
Eh?

Andrew Haley

Nov 28, 2012, 6:27:21 AM
Mark Wills <forth...@gmail.com> wrote:
> On Nov 28, 10:41 am, Andrew Haley <andre...@littlepinkcloud.invalid>
> Eh?

I don't understand the problem you're having with my reply. With ITC
it is possible to change how a word is interpreted by changing the
pointer at the cfa. This is simply true, there is no doubt about it,
and your comment is incorrect.

Andrew.

Mark Wills

Nov 28, 2012, 6:51:15 AM
On Nov 28, 11:27 am, Andrew Haley <andre...@littlepinkcloud.invalid>
wrote:
> Andrew.

Oh. Okay. Yes, I see. I should have read Hugh's reply more closely
before posting.

So, in the CFA of an ITC system there is a pointer to DOCOL, DOVAR
whatever/etc. In DTC it's slightly more complicated, since the CFA
field would contain executable code. In my particular processor of
choice, the CFA field would be two cells wide. Yes, I can see that
patching it to change how it is interpreted could be a pain. Though I
suppose a dedicated helper word(s) could be provided to facilitate it.
Like Hugh says, it would be quite a rare occurrence.

Okay. I get it. Sorry for the confusion.

Mark

Rod Pemberton

Nov 28, 2012, 6:58:44 AM
"Hugh Aguilar" <hughag...@yahoo.com> wrote in message
news:0782b510-4476-423b...@r10g2000pbd.googlegroups.com...
> On Nov 27, 9:01 am, Mark Wills <forthfr...@gmail.com> wrote:
...

> > One thing that struck me, it mentions that direct threaded is
> > not as "flexible" as ITC. I was wondering in what respect
> > DTC is less flexible? Anyone have any opinions/comments?
>
> With ITC it is possible to change how a word is interpreted by
> changing the pointer at the cfa. With DTC, by comparison, you
> don't have a pointer to the code that interprets the word, but
> rather you have the code itself pasted in there. It is a major
> hassle to patch this code to change how the word is interpreted.

True.

With DTC, you need an assembler, since the code is inlined with
the definition's data.  With ITC, the Forth can be written in an
HLL or in assembly.  This is because of the CFA pointer
allowing you to assign a function or primitive or low-level word
etc which determines how each word is processed. The CFA pointer
allows the common code routines to be separated from the
definition's data. By common code routines, I mean DOCOL or
ENTER, DOSEMIS or EXIT, DOVAR, DOCON, etc. If there are CFA
"primitives" for constants and variables, why it there is no DOSTR
(do string) in Forth? And, if there is no DOSTR, e.g., Forth has
S" , then are DOCON and DOVAR really needed?

> My application program was a symbolic
> math program that would do calculus --- I got as far as
> determining the derivative of a function, and reducing the
> equation to simplest terms, but never got as far as symbolic
> integration of functions, which is much more difficult.

You should talk about that instead of your slide-rule or novice
packages.

Did you use Laplace transforms to solve the calculus equations?
If you're not familiar with them, they can convert many, but not
all, calculus problems into algebra problems. There is a set of
constraints which must be true before using the transforms.

> I don't mess with debuggers nowadays --- Paul Rubin may find
> this hard to believe, but it is not because I don't know how to
> write a debugger, but it is because I find testing functions at
> the console immediately after writing them to be more efficient.

If you can display data and rewrite and/or recompile the code, who
needs a debugger?


Rod Pemberton





Mark Wills

Nov 28, 2012, 7:50:27 AM
On Nov 28, 11:58 am, "Rod Pemberton" <do_not_h...@notemailnotz.cnm>
wrote:
>
> With DTC, you need an assembler, since the code is inlined with
> the definition's data.  With ITC, the Forth can be written in a
> HLL language or assembly.  This is because of the CFA pointer
> allowing you to assign a function or primitive or low-level word
> etc which determines how each word is processed.  The CFA pointer
> allows the common code routines to be separated from the
> definition's data.  By common code routines, I mean DOCOL or
> ENTER, DOSEMIS or EXIT, DOVAR, DOCON, etc.  If there are CFA
> "primitives" for constants and variables, why it there is no DOSTR
> (do string) in Forth?  And, if there is no DOSTR, e.g., Forth has
> S" , then are DOCON and DOVAR really needed?
>

Forth *does* have a DOSTR. In my system it's called (S"). It gets
compiled into a definition with the string data immediately after it:

: test S" HELLO" ;

see test

test: docol (s") 5 h e l l o exit
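
A rough C sketch of how such a primitive consumes its inline operand
(illustrative, not my system's actual code; the stack push is
elided):

  typedef unsigned char byte;

  byte *ip;                     /* threaded-code instruction pointer */

  /* (S")-style runtime: the counted string sits inline in the
     thread right after this primitive, so take its address and
     length, then step IP past it before NEXT runs. */
  void do_squote(void) {
      byte  len  = *ip++;       /* inline count byte */
      byte *addr = ip;          /* inline string data */
      ip += len;
      /* push addr and len on the data stack here */
      (void)addr;
  }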

Regarding DTC, dealing with the CFA for patching purposes is only
*marginally* more complex, and depending on the processor may not be
any more complex at all.

Like you say, in an ITC you have a *pointer* to code that either calls
DOCOL, DOVAR, etc., or, if it's a primitive, the pointer points at the
machine code.

In a DTC, the CFA is a CALL or a BRANCH/JMP etc to the
'handler' (DOCOL et al). The rest of the thread is a thread of
addresses. See the Rodriguez article I linked above. I'm sure you've
seen it before.

On my system, the CFA would need 2 cells; 1 cell holds the branch
instruction, the other holds the address of the 'handler'. So, to
patch it, I'd need to target the 2nd cell of the CFA. Words could be
provided to do this without the programmer having to worry about the
internals.

In my case, I *could* get away with a single cell CFA, but I'd need to
store the addresses of the routines in registers and use an indirect
branch. But that's a waste of registers.

So, Hugh was spot on. The *particular example he cited* is an example
of some additional *complexity*, though the DTC method (IMO) remains no
less *flexible* - there's nothing that DTC can't do that ITC can (as
far as I can discern).

Even vectoring (for example) via a variable should work just fine:

variable vector
' thing vector !
vector @ execute

I can see no complicating factor in the above in the context of a DTC
system.

I'm simply interested, because in my case, I can get some performance
improvements. I can (I think) lose NEXT altogether. NEXT would
effectively become a single assembly language instruction:

B *IP+

(branch to the *contents* of IP register and increment IP register).

If it's a single instruction, then obviously you just inline it at the
end of all your words, you don't branch to it!

Also, from a boot-strapping perspective, I see no limitation in coding
words in Forth itself once the nucleus is up and running; a definition
is (with the exception of the CFA) a thread of addresses. In the
source code of the kernel, these would simply be symbolic addresses.

Andrew Haley

Nov 28, 2012, 8:08:25 AM
Mark Wills <forth...@gmail.com> wrote:

> In a DTC, the CFA is a CALL or a BRANCH/JMP etc to the
> 'handler' (DOCOL et al).

No: in some cases it's the actual code. Consider a CONSTANT, for
example.

Andrew.

Albert van der Horst

Nov 28, 2012, 8:54:01 AM
In article <25-dnfvDKpqEaCjN...@supernews.com>,
This may be a good opportunity to show how nice ITC is in this
respect.

The ability to "patch" an existing word is used in ciforth all
over the place.
Technically it patches the dfa, not the cfa, as it replaces a pointer
to high-level code, so no change to assembler code is needed.

1. You can turn the dictionary (FIND) into a case-insensitive one.
2. You can print the stack after each execution, or change the prompt.
3. You can turn a word into a postfix word for once; then it flips
back.
4. It is used for turnkeys, by patching into ABORT.
5. ?ERROR is patched to allow better error recovery.
6. Revectoring I/O by patching TYPE. (Intentionally, all output is via TYPE.)
7. ALIAS works by copying three words; it works for low level too.
8. Run-time buffers that don't take up space in the executable (dfa).
9. Changing a CONSTANT afterwards.

--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Mark Wills

Nov 28, 2012, 9:02:32 AM
On Nov 28, 1:08 pm, Andrew Haley <andre...@littlepinkcloud.invalid>
wrote:
Good point. I think I see where you're leading me with this :-)

It could lead to complications with words like >BODY for example,
where a CFA might not strictly be a 1 or 2 cell wide field.

Again, in my case, I could get around that:

10 CONSTANT TEN

Would lay down something like:

  DOCON NOP    10
  ~~~~~~~~~    ~~
      |         |
  2-cell CFA   PFA

The NOP wouldn't actually be executed - it's just filler to line the
PFA up to make the implementation of words such as >BODY simpler.

IIRC I came across this particular issue early on in my ITC system
when implementing DOES>

Anton Ertl

Nov 28, 2012, 9:18:29 AM
Why constants? I would expect that on most DTC systems there is a
"JMP/CALL DOCON" at the start of a constant.

For primitives, though, the code address points to the actual code,
not to a jump to the code.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2012: http://www.euroforth.org/ef12/

Andrew Haley

Nov 28, 2012, 9:23:40 AM
Mark Wills <forth...@gmail.com> wrote:
> On Nov 28, 1:08 pm, Andrew Haley <andre...@littlepinkcloud.invalid>
> wrote:
>> Mark Wills <forthfr...@gmail.com> wrote:
>> > In a DTC, the CFA is a CALL or a BRANCH/JMP etc to the
>> > 'handler' (DOCOL et al).
>>
>> No: in some cases it's the actual code. Consider a CONSTANT, for
>> example.
>
> Good point. I think I see where you're leading me with this :-)
>
> It could lead to complications with words like >BODY for example,
> where a CFA might not strictly be a 1 or 2 cell wide field.

In DTC there may not be a CFA.

> Again, in my case, I could get around that:
>
> 10 CONSTANT TEN
>
> Would lay down something like:
>
> DOCON NOP 10
> ~~~~~~~~~ ~~
> | |
> 2 field PFA
> CFA
>
> The NOP wouldn't actually be executed - it's just filler to line the
> PFA up to make the implementation of words such as >BODY simpler.

Well, you could, but it would be rather lame IMO: you'd be throwing
away an excellent optimization opportunity, and for what? CONSTANTs
don't have bodies anyway.

Andrew.

Andrew Haley

Nov 28, 2012, 9:32:14 AM
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>Mark Wills <forth...@gmail.com> wrote:
>>
>>> In a DTC, the CFA is a CALL or a BRANCH/JMP etc to the
>>> 'handler' (DOCOL et al).
>>
>>No: in some cases it's the actual code. Consider a CONSTANT, for
>>example.
>
> Why constants? I would expect that on most DTC systems there is a
> "JMP/CALL DOCON" at the start of a constant.

It's possible, but a pretty crappy implementation. Why would you do
that when there's such an obvious speedup?

Andrew.

Anton Ertl

Nov 28, 2012, 9:21:54 AM
Mark Wills <forth...@gmail.com> writes:
>Very nice write-up here about the Forth virtual machine, complete with
>links to our very own Anton Ertl's pages:
>
>http://www.wordiq.com/definition/Forth_virtual_machine
>
>One thing that struck me, it mentions that direct threaded is not as
>"flexible" as ITC. I was wondering in what respect DTC is less
>flexible? Anyone have any opinions/comments?

You would best ask the author of that page what is meant.

According to Wikipedia:

|A famous aphorism of David Wheeler goes: "All problems in computer
|science can be solved by another level of indirection"

The indirection in ITC was not introduced gratuitously; in
particular, it allows words to be treated more uniformly. I don't
know if that is the flexibility that is meant there, though.

A typical DTC approach in Forth (used in gforth-0.5 on some machines)
would be to emulate ITC by replacing the indirect pointer in the code
field with a jump (or call) to the routine. One loss of flexibility
here is that creating a non-primitive now requires machine-dependent
code.

Another approach is the primitive-centric approach used since
gforth-0.6. There the threaded code contains only primitives and
their immediate arguments, and other words are compiled to a primitive
and an argument, e.g., a call to a colon definition is compiled to the
primitive CALL plus the body address of the colon definition. This
does not require machine-dependent code, but one cannot patch, e.g., a
variable into a deferred word in a way that affects existing uses of
the variable (whereas that works in ITC). Also, COMPILE, now becomes
much more complex.
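
A sketch of that compilation step in C (my own rendering, with
invented opcode names):

  typedef long cell;
  enum { OP_CALL = 1 };             /* invented opcode number */

  cell thread[1024];
  int  here;
  void emit(cell x) { thread[here++] = x; }

  /* Primitive-centric COMPILE,: a primitive goes into the thread
     as-is; a colon definition becomes the CALL primitive followed
     by its body address as an inline argument. */
  void compile_comma(int is_primitive, cell xt, cell body_addr) {
      if (is_primitive)
          emit(xt);
      else {
          emit(OP_CALL);
          emit(body_addr);
      }
  }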

You might be interested in

@InProceedings{ertl02,
author = {M. Anton Ertl},
title = {Threaded Code Variations and Optimizations (Extended
Version)},
booktitle = {Forth-Tagung 2002},
year = {2002},
address = {Garmisch-Partenkirchen},
url = {http://www.complang.tuwien.ac.at/papers/ertl02.ps.gz},
abstract = {Forth has been traditionally implemented as indirect
threaded code, where the code for non-primitives is
the code-field address of the word. To get the
maximum benefit from combining sequences of
primitives into superinstructions, the code produced
for a non-primitive should be a primitive followed
by a parameter (e.g., \code{lit} \emph{addr} for
variables). This paper takes a look at the steps
from a traditional threaded-code implementation to
superinstructions, and at the size and speed effects
of the various steps.\comment{It also compares these
variants of Gforth to various other Forth
implementations on contemporary machines.} The use
of superinstructions gives speedups of up to a
factor of 2 on large benchmarks on processors with
branch target buffers, but requires more space for
the primitives and the optimization tables, and also
a little more space for the threaded code.}
}

Anton Ertl

Nov 28, 2012, 10:00:18 AM
Because it's closest in many respects to the traditional ITC
implementation. Because it allows defining

: COMPILE, , ;

> when there's such an obvious speedup?

What obvious speedup? How do you get it? How big is it? If it's so
obvious, big, and has no disadvantages, why is it not used in ITC?

Andrew Haley

Nov 28, 2012, 10:18:15 AM
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>>>Mark Wills <forth...@gmail.com> wrote:
>>>>
>>>>> In a DTC, the CFA is a CALL or a BRANCH/JMP etc to the
>>>>> 'handler' (DOCOL et al).
>>>>
>>>>No: in some cases it's the actual code. Consider a CONSTANT, for
>>>>example.
>>>
>>> Why constants? I would expect that on most DTC systems there is a
>>> "JMP/CALL DOCON" at the start of a constant.
>>
>>It's possible, but a pretty crappy implementation. Why would you do
>>that
>
> Because it's closest in many respects to the traditional ITC
> implementation.

I assume that if you want a traditional ITC implementation you'll use
ITC.

> Because it allows defining
>
> : COMPILE, , ;

It allows that anyway, assuming that ' returns the address of the
start of the code.

> What obvious speedup?

So that

1 CONSTANT 1

generates

mov r0, #1
push r0
next

or perhaps

mov r0, #1
jmp push

or something similar.

> How do you get it? How big is it?

That depends on the architecture.

> If it's so obvious, big, and has no disadvantages, why is it not
> used in ITC?

ITC can't do it as easily.

Andrew.

Anton Ertl

Nov 28, 2012, 11:36:09 AM
Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>> What obvious speedup?
>
>So that
>
> 1 CONSTANT 1
>
>generates
>
> mov r0, #1
> push r0
> next
>
>or perhaps
>
> mov r0, #1
> jmp push
>
>or something similar.

I did not think of that. So much for "obvious".

>> How do you get it? How big is it?
>
>That depends on the architecture.

On IA-32 and AMD-64, you would get a big slowdown for many programs
unless you separate the generated native code from the threaded code
(which requires additional work that has not made it into a number of
native code systems yet, 16 years after the issue became known).

>> If it's so obvious, big, and has no disadvantages, why is it not
>> used in ITC?
>
>ITC can't do it as easily.

I don't think that adding a code field makes a significant difference.

1 constant 1 generates

dw *+cell
mov r0, #1
push r0
next

or perhaps

dw *+cell
mov r0, #1
jmp push

or something similar.

Andrew Haley

Nov 28, 2012, 12:02:04 PM
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>> What obvious speedup?
>>
>>So that
>>
>> 1 CONSTANT 1
>>
>>generates
>>
>> mov r0, #1
>> push r0
>> next
>>
>>or perhaps
>>
>> mov r0, #1
>> jmp push
>>
>>or something similar.
>
> I did not think of that. So much for "obvious".
>
>>> How do you get it? How big is it?
>>
>>That depends on the architecture.
>
> On IA-32 and AMD-64, you would get a big slowdown for many programs
> unless you separate the generated native code from the threaded code
> (which requires additional work that has not made it into a number of
> native code systems yet, 16 years after the issue became known).

Err, why would

mov r0, #1
jmp push

be slower than

call docon
... the constant 1

? The latter mixes code and data in the same memory area, the former
doesn't. Besides, last time I looked the only penalty was for a write
to the same cache line as the code; that's irrelevant here.

>>> If it's so obvious, big, and has no disadvantages, why is it not
>>> used in ITC?
>>
>>ITC can't do it as easily.
>
> I don't think that adding a code field makes a significant difference.
>
> 1 constant 1 generates
>
> dw *+cell
> mov r0, #1
> push r0
> next
>
> or perhaps
>
> dw *+cell
> mov r0, #1
> jmp push
>
> or something similar.

I suppose it depends on how you count "significant". In the simple
DTC case of

mov r0, #1
jmp push

you've got (depending on architecture) low or zero code expansion with
some performance benefit. It's a tradeoff, like everything else in
interpreter design.

Andrew.

Anton Ertl

Nov 28, 2012, 12:13:55 PM
Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>> On IA-32 and AMD-64, you would get a big slowdown for many programs
>> unless you separate the generated native code from the threaded code
>> (which requires additional work that has not made it into a number of
>> native code systems yet, 16 years after the issue became known).
>
>Err, why would
>
> mov r0, #1
> jmp push
>
>be slower than
>
> call docon
> ... the constant 1
>
>? The latter mixes code and data in the same memory area, the former
>doesn't.

Good point. Yes, I found that ITC-like DTC is slower than ITC,
because of this issue.

>Besides, last time I looked the only penalty was for a write
>to the same cache line as the code; that's irrelevant here.

I guess that myopic view is what causes the persistence of this
problem. Now consider:

variable foo
1 constant bar
variable boing
: flip bar foo ! bar boing ! ;
flip

Many native code Forth systems have bet on written data not being in
the same cache line as code, and lost; and actually "not being in the
same cache line" is not enough, thanks to prefetching.

>>>ITC can't do it as easily.
>>
>> I don't think that adding a code field makes a significant difference.
...
>I suppose it depends on how you count "significant". In the simple
>DTC case of
>
> mov r0, #1
> jmp push
>
>you've got (depending on architecture) low or zero code expansion with
>some performance benefit.

It depends more on what you mean by "easy". For me it has to do
with programmer effort, not memory usage.

Andrew Haley

Nov 28, 2012, 1:03:30 PM
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>
>>Besides, last time I looked the only penalty was for a write
>>to the same cache line as the code; that's irrelevant here.
>
> I guess that myopic view is what causes the persistence of this
> problem.

In what way is this view "myopic"? It is true.

> Now consider:
>
> variable foo
> 1 constant bar
> variable boing
> : flip bar foo ! bar boing ! ;
> flip

BOING is a variable. The problem is that BOING's data shares a cache
line with some code. That's irrelevant here: we're talking about
constants. Their data, which does not change, may be freely mixed
with code.

> Many native code Forth systems have bet on written data not being in
> the same cache line as code, and lost; and actually "not being in
> the same cache line" is not enough, thanks to prefetching.

If a processor prefetches you have to consider its accesses beyond
those explicitly written, but the issue is still that of code and
writable data in the same cache line.

Andrew.

Anton Ertl

Nov 28, 2012, 1:12:54 PM
Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>
>>>Besides, last time I looked the only penalty was for a write
>>>to the same cache line as the code; that's irrelevant here.
>>
>> I guess that myopic view is what causes the persistence of this
>> problem.
>
>In what way is this view "myopic"? It is true.
>
>> Now consider:
>>
>> variable foo
>> 1 constant bar
>> variable boing
>> : flip bar foo ! bar boing ! ;
>> flip
>
>BOING is a variable. The problem is that BOING's data shares a cache
>line with some code. That's irrelevant here: we're talking about
>constants. Their data, which does not change, may be freely mixed
>with code.

It's myopic, because it looks only at the constant, as if it existed
in isolation. So I show an example with some surroundings, and you
insist that the surroundings are irrelevant. They are not. They are
the reason why bigForth is 5 times slower than Gforth on cd16sim and
about 3 times for brew.

>If a processor prefetches you have to consider its accesses beyond
>those explicitly written, but the issue is still that of code and
>writable data in the same cache line.

No, the code and the data can be in different cache lines, and you can
still have cache consistency overhead. Also, on older processors (P5,
K6) even read-only data caused this problem.

Andrew Haley

Nov 28, 2012, 1:32:37 PM
In the case above it does not matter whether the constant is
implemented with a separate data field or not: there will be a
slowdown if a variable is written in the same cache line as the code.
What does matter is where you put the data for the variable, not the
data for the constant.

>>If a processor prefetches you have to consider its accesses beyond
>>those explicitly written, but the issue is still that of code and
>>writable data in the same cache line.
>
> No, the code and the data can be in different cache lines, and you can
> still have cache consistency overhead.

You'll have to explain that a bit more; I don't understand the point
you're making.

Andrew.

Alex McDonald

Nov 28, 2012, 2:29:13 PM
On Nov 28, 5:13 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
wrote:
variable foo ok
1 constant bar ok
variable boing ok
: flip bar foo ! bar boing ! ; ok
see flip
: flip ( ? -- ? )
\ std call compiles; code=$41A76C len=20 type=1
\ defined in (console)
( $0 )  mov dword { $805164 } $1   \ C7056451800001000000
( $A )  mov dword { $805168 } $1   \ C7056851800001000000
( $14 ) ret                        \ C3 ( end ) ok

There are separate code and data areas in this native code Forth
(actually, it's a mixture of ITC and STC) as can be seen from the code
address and the addresses of the variables. On the micro benchmark
from Stephen Pelc's website, the speedup in having code and data in
cache-line-distant areas is huge. If I force the code section into the
data section and use a traditional single space, the slowdown is 2.5-3
times, and the timings are very variable from run to run.

There are other advantages, although you can't do anything with this
"feature" given the ANS spec:

create x
10 value a
20 value b

A and B are adjacent cells, so x 2@ works as though this had been
specified as

create x 10 , 20 ,

[snip]

Hugh Aguilar

Nov 29, 2012, 12:56:41 AM
On Nov 28, 4:51 am, Mark Wills <forthfr...@gmail.com> wrote:
> On Nov 28, 11:27 am, Andrew Haley <andre...@littlepinkcloud.invalid>
> wrote:
>
> > Mark Wills <forthfr...@gmail.com> wrote:
> > > On Nov 28, 10:41 am, Andrew Haley <andre...@littlepinkcloud.invalid>
> > > wrote:
> > >> Mark Wills <forthfr...@gmail.com> wrote:
> > >> > On Nov 28, 1:42 am, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
>
> > >> >> With ITC it is possible to change how a word is interpreted by
> > >> >> changing the pointer at the cfa. With DTC, by comparison, you don't
> > >> >> have a pointer to the code that interprets the word, but rather you
> > >> >> have the code itself pasted in there.
>
> > >> > I don't think that's correct, Hugh.
>
> > >> I'm sure it is.
>
> > > Eh?
>
> > I don't understand the problem you're having with my reply.  With ITC
> > it is possible to change how a word is interpreted by changing the
> > pointer at the cfa.  This is simply true, there is no doubt about it,
> > and your comment is incorrect.
>
> > Andrew.
>
> Oh. Okay. Yes, I see. I should have read Hugh's reply more closely
> before posting.
>
> So, in the CFA of an ITC system there is a pointer to DOCOL, DOVAR
> whatever/etc. In DTC it's slightly more complicated, since the CFA
> field would contain executable code. In my particular processor of
> choice, the CFA field would be two cells wide. Yes, I can see that
> patching it to change how it is interpreted could be a pain. Though I
> suppose a dedicated helper word(s) could be provided to facilitate it.
> Like Hugh says, it would be quite a rare occurence.
>
> Okay. I get it. Sorry for the confusion.
>
> Mark

I think that you have got it, but I'm not sure, so I'll go over it
again:

1.) In ITC what you have in front of the threaded code of the colon
word (or the body of the whatever), is a single pointer to the
interpreter for that kind of word (DOCOLON for colon words, etc.). It
is easy to store a different pointer in that slot, so the word will be
interpreted differently (DOCOLON-WITH-SINGLE-STEPPING for colon words,
for example).

2.) In DTC what you have in front of the threaded code of the colon
word (or the body of the whatever), is the actual machine-code of the
interpreter for that kind of word (DOCOLON for colon words, etc.). It
is difficult to patch this code, because it is actual code, rather
than a pointer to some code.

3.) There is a kind of hybrid between ITC and DTC. This is DTC in the
sense that we have machine-code in front of each word. However, this
machine-code always consists of a single CALL instruction to DOCOLON
etc. When a CALL is executed, it puts the address just after itself
on the processor return-stack. Normally this is for RET to use to go
back. Here is the clever part though: this is the address of the body
of the Forth word. DOCOLON can load this address into the IP and begin
interpreting. In this case, you get DTC which is faster than ITC, but
you also get an easy way to change how a word is interpreted (just
store a new pointer into the operand of the CALL instruction).

The PDP-11 had an interesting feature. The JSR (its term for CALL)
would store the address after itself into a register, and it would
first push that register onto the return-stack. Effectively, the top
value of the return-stack was held in a register. But that register
could be your IP! You have DTC code and just do a JSR to DOCOLON (#3
above), and DOCOLON automatically gets the address of the threaded
code loaded into the IP. I figured this out way back in 1985 when I
was taking a class in assembly-language at the city college, which was
PDP-11. This works so well, that I had to suppose that the designers
of the PDP-11 were Forth programmers, or at least, were trying to
support DTC threaded code. I've never seen this feature on any other
processor. Even in 1985 though, the PDP-11 was obsolete --- the city
college was still teaching it just because they had all the textbooks,
but the professor cheerfully admitted that the PDP-11 was obsolete and
we would never use what we learned in the real world. I've always
thought that the PDP-11 was pretty cool though --- I wish somebody
would come out with a micro-controller that runs PDP-11 code and RT11
and all that --- maybe on an FPGA.

BTW: There is a discussion of threading over on comp.lang.asm.x86:
https://groups.google.com/group/comp.lang.asm.x86/browse_thread/thread/971adcb57df96272

Mark: Since your TI Forth system is ITC, why don't you take a stab at
writing a single-step source-level debugger? As I mentioned, I wrote
one for my 65c02 system. It is not as difficult as you might suppose.
I did it with screen-file source-code. It can be done with seq-file
source-code though, I would suppose. I don't think that a debugger is
all that useful, but writing one is pretty interesting --- and your
users will be impressed. :-)

Have fun! Hugh

Hugh Aguilar

Nov 29, 2012, 1:25:42 AM
On Nov 28, 4:58 am, "Rod Pemberton" <do_not_h...@notemailnotz.cnm>
wrote:
> "Hugh Aguilar" <hughaguila...@yahoo.com> wrote in message
> > My application program was a symbolic
> > math program that would do calculus --- I got as far as
> > determining the derivative of a function, and reducing the
> > equation to simplest terms, but never got as far as symbolic
> > integration of functions, which is much more difficult.
>
> You should talk about that instead of your slide-rule or novice
> packages.
>
> Did you use Laplace transforms to solve the calculus equations?
> If you're not familiar with them, they can convert many, but not
> all, calculus problems into algebra problems.  There is a set of
> constraints which must be true before using the transforms.

Yes, I used the Laplace transforms.

BTW: There is an analogy between slide-rules and calculus.

A slide-rule is an analog computer. It theoretically provides infinite
precision, but in practice precision depends upon how thin the marks
are and how sharp your eyes are, which is good for about 3 digits. The
slide-rule has been obsoleted by digital computers which have finite
precision (a 64-bit mantissa on the x86), but in practice this
precision is more than enough.

Calculus is an analog technology. It theoretically provides infinite
precision because you get a function which is the integral of the
function that you want integrated. In practice, it is a hassle,
because you have to use your brain to integrate the function and get
the integral. Calculus has been obsoleted by digital computers which
just run a numeric integration on the original function. You get the
result to a reasonable precision --- without ever obtaining the actual
function for the integral, and without ever using your brain at all.
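
For example, a dozen lines of C do the whole job numerically (a toy
sketch of the idea):

  #include <stdio.h>
  #include <math.h>

  /* Trapezoid rule: approximate the integral of f over [a,b]
     without ever finding a symbolic antiderivative. */
  double integrate(double (*f)(double), double a, double b, int n) {
      double h = (b - a) / n;
      double sum = (f(a) + f(b)) / 2.0;
      for (int i = 1; i < n; i++)
          sum += f(a + i * h);
      return sum * h;
  }

  int main(void) {
      /* integral of sin from 0 to pi is exactly 2 */
      printf("%f\n", integrate(sin, 0.0, 3.141592653589793, 1000));
      return 0;
  }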

I don't think that anybody cares about calculus nowadays. Being able
to do integrals is not a practical skill --- it is about as useful as
being able to operate a slip-stick --- not something that you want to
mention during a job interview.

I was interested in calculus at the time because my brother was going
to college as a math major. He has long since graduated and forgotten
about all that stuff, and I've forgotten about it too --- that was
over 20 years ago. Also, I don't really know anything about math
beyond calculus, which is freshman-level math --- so I'm not a good
candidate for writing a symbolic math program that does anything
beyond freshman-level math. That is like standing on the beach with
the waves coming up to your knees, as compared to actually swimming in
the ocean.

Rod Pemberton

Nov 29, 2012, 4:12:16 AM
"Hugh Aguilar" <hughag...@yahoo.com> wrote in message
news:bc947560-31a1-455d...@nl3g2000pbc.googlegroups.com...
...

> I don't think that anybody cares about calculus nowadays. Being
> able to do integrals is not a practical skill --- it is about
> as useful as being able to operate a slip-stick --- not
> something that you want to mention during a job interview.

It's true that few like doing calculus and true that many never
needed it in their careers. But, computers also offer the ability
to calculate or compute just about anything, e.g., Wolfram Alpha:

http://www.wolframalpha.com/

Personally, I see no point in teaching calculus, taxes, or even
cursive writing, but not because they're not important. Latin is
important too, if you're, say, a Bible scholar. If students spend
their time learning calculus, that's time they won't have for
other tasks. So, if a computer can do the algebra and calculus,
then they can learn or do something more advanced. There is no
point in wasting time doing a task that a machine can do for you.
This is the computer equivalent of the industrial revolution.


Rod Pemberton





Rod Pemberton

Nov 29, 2012, 4:15:03 AM
"Mark Wills" <forth...@gmail.com> wrote in message
news:9247b279-1506-4cbe...@w7g2000vbb.googlegroups.com...
> On Nov 28, 11:58 am, "Rod Pemberton"
> <do_not_h...@notemailnotz.cnm> wrote:
...

> > With DTC, you need an assembler, since the code is inlined
> > with the definition's data. With ITC, the Forth can be written
> > in a HLL language or assembly. This is because of the CFA
> > pointer allowing you to assign a function or primitive or
> > low-level word etc which determines how each word is
> > processed. The CFA pointer allows the common code routines to
> > be separated from the definition's data. By common code
> > routines, I mean DOCOL or ENTER, DOSEMIS or EXIT, DOVAR,
> > DOCON, etc. If there are CFA "primitives" for constants and
> > variables, why is there no DOSTR (do string) in Forth? And,
> > if there is no DOSTR, e.g., Forth has S" , then are DOCON and
> > DOVAR really needed?
>
>
> Forth *does* have a DOSTR. In my system it's called (S"). It
> gets compiled into a definition with the string data immediately
> after it:

Ok. I didn't recognize that as a CFA "primitive" ...

I used (S") to implement S" 's run-time. ( ... ) seems to be de
facto notation for naming Forth word's run-time... But, I wrote
(S") in Forth. Should it have been a "primitive"?

> See the Rodriguez article I linked above. I'm sure
> you've seen it before.

Yes.

Sorry, nothing else to add.


Rod Pemberton



Mark Wills

Nov 29, 2012, 4:26:30 AM
> BTW: There is a discussion of threading over on comp.lang.asm.x86:
> https://groups.google.com/group/comp.lang.asm.x86/browse_thread/threa...
>
> Mark: Since your TI Forth system is ITC, why don't you take a stab at
> writing a single-step source-level debugger? As I mentioned, I wrote
> one for my 65c02 system. It is not as difficult as you might suppose.
> I did it with screen-file source-code. It can be done with seq-file
> source-code though, I would suppose. I don't think that a debugger is
> all that useful, but writing one is pretty interesting --- and your
> users will be impressed. :-)
>
> Have fun!  Hugh

Hi Hugh,

Thanks for the clarification.

I'm going to take a serious look at DTC because in my case, it's low
hanging fruit in terms of a 'cheap' way to gain a performance boost.
It's already very fast for what it is, running on a 3 MHz 16-bit chip
with a multiplexed 8 bit data bus (the fastest Forth ever produced for
that machine).

I figure I can effectively lose DOCOL and EXIT altogether. What I mean
is, normally DOCOL and EXIT are written as subroutines that each colon
definition calls, as described by you above. Well, a branch
instruction on the 9900 is a 4 byte instruction; 2 bytes for the
instruction, and two bytes for the address (it's only a 2 byte
instruction if jumping via a register, but I digress).

The code for DOCOL would be:

DECT RP ; create entry on return stack (DECT=decrement by two)
MOV IP,*RP ; move instruction pointer value to return stack

That's also four bytes (2x 2byte instructions). So no point jumping to
a subroutine; doing so slows things down!

EXIT becomes

MOV *RP+,IP ; pop return address into IP
B *IP ; go run that code

Also 4 bytes. So, again this gets inlined.

So there's a speed advantage right there. In addition, the single
level of indirection in DTC will increase performance (of the
"interpreter") by around 50% I think.

I'll sit down with a pencil and paper (my favourite method of working
stuff out) and the instruction set cycle counts at the weekend and
work out the saving of the overhead of DOCOL and EXIT. I think it'll
be a no-brainer though.

Also, primitives will not need a NEXT subroutine. Currently, in my
ITC, all primitives call NEXT at the end which moves along the thread
and causes the next word in the thread to be executed. Of course, the
branch to NEXT is again 4 bytes. But in DTC, I think it'll be two
bytes:

B *IP+ \ branch to the next address in the thread

So each primitive executes the next word in the thread. There is no
"NEXT" - it's a single instruction. Again, huge payoff.

The more I think about it, the more I convince myself!

As I say, DTC is relatively low-hanging fruit; I don't know if there would
be major complications with converting things like DOES> over. I'll
take a look at that.

The next step after that would be to have the compiler generate
machine code (subroutine threaded) but, on the TMS9900, which has no
stacks at all, there is I think no point in this. STC would possibly
be slower than DTC code.

Interesting stuff.

I'm not man enough to look at things like peephole optimisation on
native-code generating compilers. That's above my paygrade for now ;-)
I'm very interested in it though. And I wouldn't spend the time for
that on a 30 year old hobby. I'd move to an ARM project board and
study it in the context of an ARM based system.

But that's for another day. When I win the lottery, then I'll get onto
it!

Anton Ertl

Nov 29, 2012, 9:30:18 AM
Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
[...]
>In the case above it does not matter whether the constant is
>implemented with a separate data field or not: there will be a
>slowdown if a variable is written in the same cache line as the code.
>What does matter is where you put the data for the variable, not the
>data for the constant.

Certainly. Traditionally Forth systems have just one space where they
put things, and that tradition persists even into native code systems,
even to this day (although some have been fixed). So a DTC system
will likely have just one space, too, and will run into this problem.

You are right though, that ITC-emulating DTC systems will have this
problem anyway. Primitive-centric systems to a lesser degree (only
for EXECUTE etc.), and hybrid direct/indirect threaded systems only
for CODE words.

>> No, the code and the data can be in different cache lines, and you can
>> still have cache consistency overhead.
>
>You'll have to explain that a bit more; I don't understand the point
>you're making.

If there is a cache line containing code followed by a cache line
containing data, on (at least) some CPUs the code prefetcher will
prefetch the data line and incur cache consistency overhead. See
<http://www.complang.tuwien.ac.at/misc/pentium-switch/> for data.

Bernd Paysan

Nov 29, 2012, 12:05:30 PM
Andrew Haley wrote:
> You'll have to explain that a bit more; I don't understand the point
> you're making.

Let's give some typical code you can find in many applications; in this
case we use it as a micro-benchmark. It's a variable and the corresponding
modifier function side by side:

Variable foo#
: foo+ 1 foo# +! ;
: foos 0 ?DO foo+ LOOP ;

Test 1 is with foo#'s slot in a different cache line from foo+ on a
Core2:

!time #1000000 foos .time 0,005371 sec ok
' foo+ . 1003350C ok
foo# . 100334F8 ok

Test 2 is with foo#'s slot and foo+ in the same cache line, same Core2:

!time #1000000 foos .time 0,185575 sec ok
' foo+ . 1003351C ok
foo# . 10033508 ok

Both tests done with bigForth. Factor 37. This is really significant.

Go to a different CPU: instead of the Core2, we chose an AMD Zacate, to
see how that is affected by the problem:

Test 1: 0,008208 sec ok
Test 2: 0,319169 sec ok

Factor 39. Not that different.

Same test with vfxForth on the same Zacate, for timing I use

LocalExtern: gettimeofday int gettimeofday ( int * , int * );
2variable tz 2variable td
: @time td tz gettimeofday drop td 2@ 1000000 um* rot 0 d+ ;

Test 1: 0.007708s
Test 2: 0.319617s

Factor 41.5 (the factor is higher, because VFX creates code that is a
bit faster than bigForth's code).

Test 2 in VFX is more difficult to achieve than in bigForth. First of
all, Stephen *does* now use a different memory pool for variables and
buffers, so if you declare foo# as VARIABLE, it will get allocated from
somewhere else (no problem). If you however declare foo# with CREATE
FOO# 0 , it will be close to the executed code, but the tokenizer pads
enough between FOO# and the actual increment instruction that it doesn't
affect Zacate.

The lessons learned are:

* You should separate code and data
* VFX does a good job now on variables, but on CREATEd words, it is
accidental
* bigForth doesn't, and as it compiles dense code without additional
information, it quickly exposes the problem
* If you use such a system, declare your variables first, add maybe some
padding, and then write your code

If I was going to write another code generator, I'd very likely use
Gforth's approach: EXECUTE has an indirection (this is pretty cheap on
current CPUs), and the dictionary contains variables, headers, and
a pointer to the actual code, but not the code itself. It might also be
possible to change CREATE so that it uses a different area for code and
data, so that having a direct EXECUTE is possible. Separated headers
might give some benefits, but with a hash table, you don't pollute the
cache much.
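
Sketched as a C struct (illustrative, not Gforth's actual layout):

  /* Header and data stay in the dictionary; the native code lives
     in a separate region, reached through one indirection. EXECUTE
     then costs one extra fetch, but written data never shares cache
     lines with code. */
  typedef struct word {
      struct word *link;            /* dictionary chain */
      const char  *name;
      void       (*code)(void);     /* pointer into the code region */
      long         data[];          /* body: variable slots etc. */
  } word;

  void execute(word *xt) { xt->code(); }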

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/

Elizabeth D. Rather

Nov 29, 2012, 1:52:24 PM
Yes, that's a common and helpful naming convention. It's also useful for
naming the core functionality of something that has a wrapper (named
without the parens) with external stuff such as setup, error checking,
or a CATCH.

That doesn't necessarily imply that the lower-level word is a
"primitive", although factoring it like that makes it easier to recode
the low-level word if necessary for performance.

Cheers,
Elizabeth



--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

Alex McDonald

Nov 29, 2012, 2:19:43 PM
The design decision for my modified Win32Forth was to have 3 areas;
code, dictionaries and data. Although the dictionaries and data could
be in one area, the advantages of having an identifiable range of
addresses for each outweigh the very slight (for the implementer) to
non-existent (for the user) inconvenience.

create x 0 , ok
see x
create x ( addr $805164 ) ( 0 -- 1 )
\ ' (comp-cons) compiles; code=$41A71A len=10 type=13
\ defined in (console)
( $0 ) mov ecx $805164 \ B964518000
( $5 ) jmp ' dovar \ E9E068FEFF
( end ) ok

The data address is an immediate. CREATE VARIABLE VALUE and their
children can all use >BODY as the code is identical for each; the JMP
is to DOVAR (sets top of stack from ecx) or DOVAL (fetches tos from
[ecx]). CONSTANTs have no data area, and the value is the constant
directly.

22 constant y ok
see y
22 constant y ( 0 -- 1 )
\ ' (comp-cons) compiles; code=$41A728 len=10 type=11
\ defined in (console)
( $0 ) mov ecx $16 \ B916000000
( $5 ) jmp ' dovar \ E9D268FEFF
( end ) ok

Compiled code eliminates all but references to the data area; the
interpret code shown is not used for these primitives.

: foo x y ; ok
see foo
: foo ( ? -- ? )
\ std call compiles; code=$41A75C len=18 type=1
\ defined in (console)
( $0 ) mov dword { $-4 ebp } eax \ 8945FC
( $3 ) mov eax $16 \ B816000000
( $8 ) mov dword { $-8 ebp } $805164 \ C745F864518000
( $F ) sub ebp $8 \ 83ED08
( $12 ) ret \ C3 ( end ) ok





Hugh Aguilar

Nov 30, 2012, 1:34:21 AM
On Nov 29, 2:26 am, Mark Wills <forthfr...@gmail.com> wrote:
> On Nov 29, 5:56 am, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
> > Mark: Since your TI Forth system is ITC, why don't you take a stab at
> > writing a single-step source-level debugger? As I mentioned, I wrote
> > one for my 65c02 system. It is not as difficult as you might suppose.
> > I did it with screen-file source-code. It can be done with seq-file
> > source-code though, I would suppose. I don't think that a debugger is
> > all that useful, but writing one is pretty interesting --- and your
> > users will be impressed. :-)

This suggestion was a bad idea. I wasn't thinking straight when I said
that.

I was able to write a source-level debugger because my 65c02 Forth was
a cross-compiler and it was running on an MS-DOS machine (it was
written in UR/Forth). The compiler needs to generate a large data
structure containing the addresses of every word compiled in every
definition. When the single-stepper is running, it stops at every one
of these (a BRK instruction in my system, although in an ITC system
DOCOLON will stop on every word). This address is looked up and the
corresponding source-code is displayed. The host computer has to have
a lot of memory for that gigantic data-structure, and it has to be
pretty fast. You don't want to try this on a 1980s vintage TI99/4A
computer --- you don't have the memory or the speed to do this --- it
taxed the limits of the 80386 computer that I was using as a host.

When I wrote that suggestion, I had forgotten that you don't have a
cross-compiler, but have an on-board Forth.

> Hi Hugh,
>
> Thanks for the clarification.
>
> I'm going to take a serious look at DTC because in my case, it's low
> hanging fruit in terms of a 'cheap' way to gain a performance boost.
> It's already very fast for what it is, running on a 3 mHZ 16-bit chip
> with a multiplexed 8 bit data bus (the fastest Forth ever produced for
> that machine).

For your old 16-bit computer, DTC should help to speed up the system.
It will also make the programs slightly larger. Instead of a pointer
in front of each colon word, you have a chunk of code. It is true that
NEXT should be smaller, so every primitive will be slightly smaller,
but this won't reduce the size of the system very much --- overall,
more memory will be needed.

A better way to boost the speed is with some optimization. In many
cases, there are pairs of producers and consumers. For example, LIT is
a producer because it produces some data for the parameter stack, and
+ is a consumer because it consumes some data from the parameter
stack. These pairs are inefficient because the producer pushes data
onto the stack, and the consumer immediately pops that data off the
stack. The solution is to combine them into a single word. For
example, write a primitive LIT_+ that combines what LIT and + do. It
would hold the literal value in a register rather than push it onto
the stack and then pop it off again.
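
The combined word carries its operand inline in the thread. On a
classic threaded system, where the return address inside a high-level
word points at the cell that follows it, the idea can even be sketched
in high-level code --- a non-standard trick, shown only to make the
mechanism concrete:

: lit_+ ( n1 -- n2 )  r> dup cell+ >r  @ + ;  \ fetch inline literal, skip it, add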

Even with ITC, it is possible to optimize pairs like this. Make your
compiler smart enough to remember what the last word it compiled was.
When it is ready to compile the next word, it checks what the last
word was and, if they are an optimizable pair, it compiles the combo
instead. For example, if the last thing you did was LIT, when you are
about to compile + your compiler will instead back up and get rid of
the LIT and replace it with LIT_+. This kind of peephole-optimization
not only makes your program faster, but smaller as well.
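
In code, the back-up-and-replace step might look like this --- a
minimal sketch, assuming an ITC system where a literal is laid down as
LIT's xt followed by the value cell, and where LIT has a findable
header; LIT_+ is the combined primitive sketched above, and REMEMBER,
LITERAL', and +, are invented names:

variable last-xt   0 last-xt !      \ xt most recently laid down, or 0

: remember, ( xt -- )  dup compile,  last-xt ! ;  \ every xt goes through this

: literal', ( n -- )  ['] lit remember,  , ;      \ how a literal gets compiled

: +, ( -- )                         \ compile + , pairing it with a preceding LIT
   last-xt @ ['] lit = if
      here 1 cells - @              \ recover the literal's value
      -2 cells allot                \ back up over LIT and its value
      ['] lit_+ compile,  ,         \ lay down the combined primitive instead
      0 last-xt !
   else
      ['] + remember,
   then ;

The same pattern extends to ! and +! and the rest; the only care
needed is that anything else that writes to the dictionary must reset
LAST-XT.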

You are right though, that DTC is low-hanging fruit, and much easier
to implement. Peephole-optimization is somewhat more difficult, but
not unreasonably difficult. You can do the peephole-optimization on a
piece-meal basis. Start with + and make it smart enough to combine
with all the likely producers, then do ! and +! and so forth --- you
don't have to do everything at once, just doing + should boost the
speed significantly, and you can go from there.

BTW: I'm switching from DTC to ITC on my system. This is because I
realize (from reading this thread!), that with DTC the DOCOLON code is
scattered all around and won't be in the code cache, whereas with ITC
the whole VM should be in the code cache. That is only an issue on big
processors such as the modern x86 --- it is not an issue on the
TI99/4A. Also, listening to Paul Rubin promote single-step debuggers
has made me feel inclined to provide one for my system, and that is
easier with ITC than DTC --- I don't really like to use a debugger,
but other people do, and it isn't difficult to implement, so I might
as well go ahead and provide one.

Andrew Haley

unread,
Nov 30, 2012, 4:14:46 AM11/30/12
to
Bernd Paysan <bernd....@gmx.de> wrote:
> Andrew Haley wrote:
>> You'll have to explain that a bit more; I don't understand the point
>> you're making.
>
> Let's give some typical code you can find in many applications; in this
> case we use it as micro-benchmark, it's a variable and the corresponding
> modifier function side by side:

Umm no, that's not what I was asking. I was asking why Anton said no
when as far as I could tell he was agreeing with me. But never mind.

> The lesson learned is that:
>
> * You should separate code and data
> * VFX does a good job now on variables, but on CREATEd words, it is
> accidental
> * bigForth doesn't, and as it compiles dense code without additional
> information, it quickly exposes the problem
> * If you use such a system, declare your variables first, add maybe some
> padding, and then write your code
>
> If I was going to write another code generator, I'd very likely use
> Gforth's approach: EXECUTE has an indirection (this is pretty cheap
> on current CPUs), and the dictionary contains variables, headers,
> and pointer to the actual code, but not the code itself. It might
> also be possible to change CREATE so that it uses a different area
> for code and data, so that having a direct EXECUTE is possible.

I don't understand the point of this. Surely you just put everything
write-once (which includes the code and the dictionary entries) in one
area and everything variable (HERE and ALLOT) in another. EXECUTE
doesn't need an indirection because ' can return a code address;
only >BODY needs the indirection.

Andrew.

Mark Wills

unread,
Nov 30, 2012, 4:42:20 AM11/30/12
to
On Nov 30, 6:34 am, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
> On Nov 29, 2:26 am, Mark Wills <forthfr...@gmail.com> wrote:
>
> > On Nov 29, 5:56 am, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
> > > Mark: Since your TI Forth system is ITC, why don't you take a stab at
> > > writing a single-step source-level debugger? As I mentioned, I wrote
> > > one for my 65c02 system. It is not as difficult as you might suppose.
> > > I did it with screen-file source-code. It can be done with seq-file
> > > source-code though, I would suppose. I don't think that a debugger is
> > > all that useful, but writing one is pretty interesting --- and your
> > > users will be impressed. :-)
>
> This suggestion was a bad idea. I wasn't thinking straight when I said
> that.
>

Oh. Well. Now you've gone and thrown the gauntlet down, haven't
you?! ;-)

> I was able to write a source-level debugger because my 65c02 Forth was
> a cross-compiler and it was running on an MS-DOS machine (it was
> written in UR/Forth). The compiler needs to generate a large data
> structure containing the addresses of every word compiled in every
> definition. When the single-stepper is running, it stops at every one
> of these (a BRK instruction in my system, although in an ITC system
> DOCOLON will stop on every word). This address is looked up and the
> corresponding source-code is displayed. The host computer has to have
> a lot of memory for that gigantic data-structure, and it has to be
> pretty fast. You don't want to try this on a 1980s vintage TI99/4A
> computer --- you don't have the memory or the speed to do this --- it
> taxed the limits of the 80386 computer that I was using as a host.
>

Well, I have already written a simple debugger. It's not a single
stepper, though. It works like this:

You load the debugger and it modifies : and ; such that each
subsequently defined colon definition makes a call into a word (can't
remember what it's called) that displays the name of the executing
word, and the data-stack.

As it goes, the depth of the return stack is measured, and it uses
this to produce indentations on the on-screen display - thus one can
see how one's program nests and un-nests.

A new word is defined, BREAK, which stops the program, returning to
the command line and giving a full return stack dump. You can scatter
BREAKs around your code at points where you think it may be going awry.

For example:

: test swap break ;
: harry 3 test ;
: dick 2 harry ;
: tom 1 dick ;
tom

And you'd get output that looked like this:

>tom (1) 1
>dick (2) 1 2
>harry (3) 1 2 3
>test (3) 1 2 3
BREAK in test in harry in dick in tom

Without a break, if you just let the program run, you'd get:

>tom (1) 1
>dick (2) 1 2
>harry (3) 1 2 3
>test (3) 1 2 3
<test (3) 1 3 2
<harry (3) 1 3 2
<dick (3) 1 3 2
<tom (3) 1 3 2
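
The hook itself is only a few lines. A minimal sketch, assuming
Gforth-style LATEST ( -- nt ) and NAME>STRING, and using a counter for
the nesting depth rather than measuring the return stack; none of
these are TurboForth's actual names:

variable nest#   0 nest# !

: (trace-enter) ( c-addr u -- )     \ types the word's name, then the stack
   cr nest# @ 2* spaces
   [char] > emit  type  space
   [char] ( emit  depth 0 .r  [char] ) emit  space  .s
   1 nest# +! ;

: (trace-exit) ( -- )  -1 nest# +! ;

: :  ( "name" -- colon-sys )        \ shadow the system's :
   :                                \ start the definition as usual
   latest name>string postpone sliteral   \ compile the word's own name...
   postpone (trace-enter) ;         \ ...and the entry hook

: ;  ( colon-sys -- )               \ shadow the system's ;
   postpone (trace-exit)  postpone ;  ; immediate

Printing the "<name" lines on the way out just takes a second
SLITERAL before the exit hook, and BREAK is then little more than a
word that prints the return-stack dump and ABORTs.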

It's fairly simple to extend the above into a single step debugger. An
on-screen display showing the definition currently being executed with
a cursor pointing to the current word is less trivial, but it is
possible. Again, I have a starting point, in that I already have SEE
for my system. So I already have code to de-compile a word. So it's
possible, and doesn't require a large list/table in memory; it's just
a different technique. In fact it would be an interesting exercise!

There's just one little itsy bitsy problem: Since I wrote my TRACER
program, I've used it once. And that was to demo it to someone else.
And the only reason I was showing it was to show them that facilities
such as a tracer/debugger can be written in Forth itself and the Forth
environment simply augmented with the functionality (they were
suitably impressed). I don't think I've touched it since. I just debug
at the command line.

In fact, I rarely use SEE. I only use SEE if I'm debugging a compiling
word. The last time I used SEE was a couple of weeks ago when I was
implementing your MACRO: idea (duly implemented as a loadable
extension and working beautifully - thank you for the inspiration!).
SEE has limitations (at least on my system) because some subroutines
in my system are headerless (don't have dictionary entries) so they
display as a ? when de-compiled. It's no problem to me, since I know
what's going on, but a newbie would be mystified.
Unfortunately I don't have the ROM space available to allow headers
for everything. For example, DOES> compiles a DODOES, but DODOES is
headerless. This would be a problem in a single stepping debugger,
because it would not be possible to display the names for headerless
words. This is a limitation of my system due to memory constraints. I
only have 16K. My Forth system is implemented as a plug in cartridge:

http://turboforth.net/about_turboforth.html

>
> For your old 16-bit computer, DTC should help to speed up the system.
> It will also make the programs slightly larger. Instead of a pointer
> in front of each colon word, you have a chunk of code. It is true that
> NEXT should be smaller, so every primitive will be slightly smaller,
> but this won't reduce the size of the system very much --- overall,
> more memory will be needed.
>
I had a look at this yesterday and got myself tied up in knots. I
couldn't work out how to bootstrap the thing, to get it started: how
does the 'interpreter' for a high-level definition execute the words
in the thread? I couldn't figure it out in my lunch break and had to
junk what I had done. Need more time to concentrate. I was missing
something very fundamental. I was using : SQUARE DUP * ; as my
target but didn't get anywhere.

> A better way to boost the speed, is with some optimization. In many
> cases, there are pairs of producers and consumers. For example, LIT is
> a producer because it produces some data for the parameter stack, and
> + is a consumer because it consumes some data from the parameter
> stack. These pairs are inefficient because the producer pushes data
> onto the stack, and the consumer immediately pops that data off the
> stack. The solution is to combine them into a single word. For
> example, write a primitive LIT_+ that combines what LIT and + do. It
> would hold the literal value in a register rather than push it onto
> the stack and then pop it off again.

This is an excellent suggestion. Perhaps an easier way (at least, in
terms of performing optimisations) is to make every word in the
dictionary immediate. Then, every word can 'look ahead' and see what
is about to be compiled and intervene accordingly. It would be very
difficult to produce a standard Forth with such a system though! I
wonder if anyone has previously experimented with such a technique?

>
> Even with ITC, it is possible to optimize pairs like this. Make your
> compiler smart enough to remember what the last word it compiled was.
> When it is ready to compile the next word, it checks what the last
> word was and, if they are an optimizable pair, it compiles the combo
> instead. For example, if the last thing you did was LIT, when you are
> about to compile + your compiler will instead back up and get rid of
> the LIT and replace it with LIT_+. This kind of peephole-optimization
> not only makes your program faster, but smaller as well.
>
I'll add your peephole suggestion to my "things to look at in the next
version" list. The next version (V2.0) is the version that I tell
myself I'm *not* going to write, but I know I 99.9% probably will.
It's like a bloody drug. It's the classic symptom of wanting to start
with a clean sheet, to implement all the 'lessons learned' that you
spent blood, sweat and tears learning on the first implementation.
There are many aspects of V1.x that have been re-written a couple of
times as I learned (from other Forthers, some here on this list) or
simply discovered (as part of the Forth awakening process) better ways
to do things.

[ and to the nay-sayers: I *do* write Forth code too. Not just a
compiler. But my Forth coding is for fun. I'm still learning. I write
stuff like this:

http://turboforth.net/tutorials/darkstar.html ]

I also want to spend some time looking at Smalltalk though (a project
for 2013) - not writing a smalltalk system, just learning the
language. It's the OOP equivalent of Forth. It's beautiful (though
very slow, I believe). It looks very interesting indeed to me. Despite
being pure OO it shares the idea of terseness and brevity and total
simplicity that Forth has.

Things for 2013:
* VFX (I want to do some simple SCADA stuff using serial and IP comms)
* Smalltalk
* TurboForth V2.0 (maybe - it'll be a part-time when-feel-like-it
thing)

> You are right though, that DTC is low-hanging fruit, and much easier
> to implement. Peephole-optimization is somewhat more difficult, but
> not unreasonably difficult. You can do the peephole-optimization on a
> piece-meal basis. Start with + and make it smart enough to combine
> with all the likely producers, then do ! and +! and so forth --- you
> don't have to do everything at once, just doing + should boost the
> speed significantly, and you can go from there.
>
Yeah. You've got me thinking now! I need a lot more memory to do this
than I currently have, though. Still, I plan to make the next version a
64K EPROM but I can go up to 128K in an eprom if I need to. That's 16
8K pages which is a PITA, but doable.

> BTW: I'm switching from DTC to ITC on my system. This is because I
> realize (from reading this thread!), that with DTC the DOCOLON code is
> scattered all around and won't be in the code cache, whereas with ITC
> the whole VM should be in the code cache.

Well, if your high-level definitions make a CALL to DOCOLON then
there's no reason why DOCOLON would not be in the cache. The expense
of the call might be less than the delay induced by a cache miss.

However, I'd urge you to take a step back and a deep breath. I was
reading your post on the x86 group where you are discussing caching
etc. However, I have to point out that if the Forth you are intending
to produce is primarily for embedded systems then the chances of the
embedded system running on an x86 processor are quite low. It's much
more likely to be an ARM variant. In other words, don't allow key
decisions about the architecture of your system to be guided by
relatively unimportant architectural constraints of a particular
processor family.

I'd urge you to get out a notebook and pencil. Sit down somewhere
quiet and write a list of key things that you want the system to do.
Design goals. Then put the list away. Reflect on it for a couple of
days and go back and make changes. Iterate. Eventually your thoughts/
ideas/requirements will coalesce. There's your plan/design goals. When
you've got it done, pin it up on the wall above your computer. Let it
be your guide as you develop the *project*, and when you feel a knee-
jerk step-change coming on, consult the plan again! Don't be swayed.
Stick to the plan. Have faith in the design decisions you made
earlier, even if you've had a bright idea.

I failed to make a plan/design and ended up with many many more
iterations/builds/bugs/teeth-gnashing/wailing than I should have. It's
okay for me, because it's a hobby system and is given away for free.
Your aspirations are somewhat higher though, wanting a good system for
embedded targets. So I'd urge due consideration and diligence!

Just my two cents, FWIW!

Mark

Anton Ertl

unread,
Nov 30, 2012, 9:12:39 AM11/30/12
to
Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>I don't understand the point of this. Surely you just put everything
>write-once (which includes the code and the dictionary entries) in one
>area and everything variable (HERE and ALLOT) in another. EXECUTE
>doesn't need an indirection because ' can return a code address;
>only >BODY needs the indirection.

Surely? Gforth puts stuff that the CPU puts into the I-cache in one
area, and stuff that the CPU puts into the D-cache in a different
area:

Advantages of the Gforth approach:

+ Better utilization of I-Cache and D-cache

+ Also avoids the cache consistency issues on processors that
invalidate I-cache lines on data reads (Pentium, K6).

+ No indirection on >BODY

Disadvantage:

- Indirection on EXECUTE.

Albert van der Horst

unread,
Nov 30, 2012, 10:34:00 AM11/30/12
to
In article <DKKdnRsQzsib5CXN...@supernews.com>,
You don't seem to appreciate that some processors have their code and
data separate to the point that data can't be executed, and code can't
be read as data.

>
>Andrew.

Groetjes Albert
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Andrew Haley

unread,
Nov 30, 2012, 11:32:53 AM11/30/12
to
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>I don't understand the point of this. Surely you just put everything
>>write-once (which includes the code and the dictionary entries) in one
>>area and everything variable (HERE and ALLOT) in another. EXECUTE
>>doesn't need an indirection because ' can return a code address;
>>only >BODY needs the indirection.
>
> Surely? Gforth puts stuff that the CPU puts into the I-cache in one
> area, and stuff that the CPU puts into the D-cache in a different
> area:
>
> Advantages of the Gforth approach:
>
> + Better utilization of I-Cache and D-cache

That's interesting. I suppose Forth words are so extremely small that
you can get more than one of them in a single cache line, so it's
worth packing them together without intervening stuff. And you might
get good enough locality for this to be useful, with a following wind.
Is it really worth it?

> + Also avoids the cache consistency issues on processors that
> invalidate I-cache lines on data reads (Pentium, K6).

Uh, yeah. Strictly for retrocomputing fans.

> + No indirection on >BODY
>
> Disadvantage:
>
> - Indirection on EXECUTE.

But surely you can still have ' returning a code address even if
you have separated code from dictionary entries.

Andrew.

Andrew Haley

unread,
Nov 30, 2012, 11:36:06 AM11/30/12
to
Albert van der Horst <alb...@spenarnc.xs4all.nl> wrote:
>
> You don't seem to appreciate that some processor have their code and
> data separate to the point that data can't be reached to execute,
> and code can't be read.

You're right, I don't: even the likes of 8051 allow data to be put
into code space. Such bizarre animals as you describe, obviously,
require special-purpose handling, and you have to do whatever they
need. There's no point discussing what's best for them.

Andrew.

Anton Ertl

unread,
Nov 30, 2012, 11:40:37 AM11/30/12
to
Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>> + Also avoids the cache consistency issues on processors that
>> invalidate I-cache lines on data reads (Pentium, K6).
>
>Uh, yeah. Strictly for retrocomputing fans.

Supposedly Atoms and Larrabee use a core derived from the Pentium. I
would not rely on them not having this issue without doing
measurements first.

>> Disadvantage:
>>
>> - Indirection on EXECUTE.
>
>But surely you can still have ' returning a code address even if
>you have separated code from dictionary entries.

Yes, one can do this, but then a piece of code needs to be generated
for every word (variables, does> children, etc.), just for the sake of
EXECUTE and DEFER. Whether you want to do that is a tradeoff. In
Gforth we do the indirection.

Hugh Aguilar

unread,
Nov 30, 2012, 3:47:35 PM11/30/12
to
On Nov 30, 9:36 am, Andrew Haley <andre...@littlepinkcloud.invalid>
wrote:
Harvard architecture computers are "bizarre animals" to you? The
MiniForth was Harvard Architecture. It is a pretty common
architecture.

Modern processors such as the x86 are nominally von-Neumann
architecture, but they actually have a code cache and a data cache
which are distinct from each other --- so they work a lot more
efficiently if you keep your code and data separate, as if it were a
Harvard architecture computer.

Also, there are what I call "pseudo-Harvard" architectures. These
include the 8051 mentioned above. They have code and data in
separately addressed spaces, but they can only access one or the
other at a time, not both simultaneously as a true
Harvard architecture computer can. They do this primarily to double the
amount of memory that they can access. The 8051, although it has a 16-
bit address bus, can access 64K of code and 64K of data (plus the
direct memory, which is yet another address space). This was also done
with the 65c02 in a few cases --- there were computers that could
distinguish whether a bus access was code or data and would switch
memory banks appropriately --- none of the personal computers did it,
but this is something that I heard about.

Hugh Aguilar

unread,
Nov 30, 2012, 4:18:50 PM11/30/12
to
On Nov 30, 2:42 am, Mark Wills <forthfr...@gmail.com> wrote:
> On Nov 30, 6:34 am, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
> A new word is defined, BREAK, which stops the program, returning to
> the command line, and giving a full return stack dump. You can scatter
> BREAKs around your code at point where you think it may be going awry.

That is a good technique. I call it QI because it consists of QUERY
INTERPRET.
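
On a system that still has the classic QUERY and INTERPRET (both long
gone from standard Forth, so this is just a sketch of the shape of
it):

: qi ( -- )  cr ." BREAK " .s  begin cr query interpret again ;

Anything typed at that prompt is interpreted as usual, so you can poke
around; a word that THROWs or ABORTs gets you back out.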

> There's just one little itsy bitsy problem: Since I wrote my TRACER
> program, I've used it once. And that was to demo it to someone else.
> And the only reason I was showing it was to show them that facilities
> such as a tracer/debugger can be written in Forth itself and the Forth
> environment simply augmented with the functionality (they were
> suitably impressed). I don't think I've touched it since. I just debug
> at the command line.

I agree --- it is easiest to just debug at the command line.

The only Forth single-step source-level debugger that I've ever used
was the one that I wrote myself for my 65c02 cross-compiler --- that
was a long time ago!

> I had a look at this yesterday and got myself tied up in knots. I
> couldn't work out how to bootstrap the thing; to get it started. How
> does the 'interpreter' for a high-level definition execute the words
> in the thread. I couldn't figure it out in my lunch break and had to
> junk what I had done. Need more time to concentrate. I was missing
> something very fundamental. I was using the : SQUARE DUP * ; as my
> target but didn't get anywhere.

ITC is more complicated than DTC because there is an extra level of
indirection. If you can't figure out how DTC works, but you've got ITC
already working, then I'm guessing that you ported your ITC over from
somewhere without really understanding it.

Keep at it --- if you still can't figure it out, contact me by email
and I will show you some code in x86 or whatever assembly-language you
want (not TI9900 though, as I don't know that one).
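
The gist fits in a toy, standard-Forth model, with xts standing in for
the code-field addresses of real ITC (or the code addresses of DTC)
and 0 standing in for EXIT; all names are invented:

variable ip                               \ the virtual instruction pointer

: run-thread ( i*x thread -- j*x )
   ip @ >r  ip !                          \ DOCOLON: nest, point IP at the thread
   begin  ip @ @  ?dup  while             \ NEXT: fetch the next cell...
      1 cells ip +!  execute              \ ...advance IP, run the word
   repeat
   r> ip ! ;                              \ EXIT: un-nest

create square-thread  ' dup , ' * , 0 ,   \ the body of : SQUARE DUP * ;

5 square-thread run-thread . prints 25. In real ITC, NEXT does one
more fetch through the code field to find the machine code to run; in
DTC the thread cell is the code address itself, which is the
indirection that disappears.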

> > A better way to boost the speed, is with some optimization. In many
> > cases, there are pairs of producers and consumers. For example, LIT is
> > a producer because it produces some data for the parameter stack, and
> > + is a consumer because it consumes some data from the parameter
> > stack. These pairs are inefficient because the producer pushes data
> > onto the stack, and the consumer immediately pops that data off the
> > stack. The solution is to combine them into a single word. For
> > example, write a primitive LIT_+ that combines what LIT and + do. It
> > would hold the literal value in a register rather than push it onto
> > the stack and then pop it off again.
>
> This is an excellent suggestion. Perhaps an easier way (at least, in
> terms of performing optimisations) is to make every word in the
> dictionary immediate. Then, every word can 'look ahead' and see what
> is about to be compiled and intervene accordingly. It would be very
> difficult to produce a standard Forth with such a system though! I
> wonder if anyone has previously experimented with such a technique?

I don't recommend doing that. The way that I'm doing it in my own
system is to smarten up what COMPILE, does. This doesn't just compile
the xt that it is given, but instead it puts the xt into a queue. When
it does this, it looks to see what xt is already in the queue, and
combines them if possible. On my system, the queue is only a single
item in length, so it is actually a variable, not a queue --- I call it
LIMBO --- because the xt in there is in limbo, in the sense that it
has been handed to the compiler, but hasn't yet actually been laid down.
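
In outline, with a single pair hard-wired and every name invented (a
real system would drive this from a table, and must flush LIMBO at ;
and before control-flow words):

: over_+ ( n1 n2 -- n1 n3 )  over + ;  \ stand-in; really one code primitive

variable limbo   0 limbo !             \ one xt handed over, not yet laid down

: flush-limbo ( -- )
   limbo @ ?dup if compile, then  0 limbo ! ;

: smart-compile, ( xt -- )             \ what COMPILE, becomes
   dup ['] + =  limbo @ ['] over =  and if
      drop  0 limbo !  ['] over_+ compile,   \ combine the pair
   else
      flush-limbo  limbo !             \ defer this xt in turn
   then ;

With COMPILE, vectored through this, the source sequence OVER + lays
down the single xt OVER_+ .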

> I also want to spend some time looking at Smalltalk though (a project
> for 2013) - not writing a smalltalk system, just learning the
> language. It's the OOP equivalent of Forth. It's beautiful (though
> very slow, I believe). It looks very interesting indeed to me. Despite
> being pure OO it shares the idea of terseness and brevity and total
> simplicity that Forth has.

Smalltalk is where the idea of dynamic-OOP originated. For the most
part though, CLOS really is where dynamic-OOP got going (there are
dynamic-OOP systems for Scheme too).

I would recommend learning Scheme or Lisp instead of Smalltalk, as
these still have an active community, which I don't think Smalltalk
does. I'm learning Scheme --- if you learn it too, we could bounce
ideas back and forth by email.

Learning Lisp has always been on my bucket list, but now I'm finally
doing it. :-)

> > BTW: I'm switching from DTC to ITC on my system. This is because I
> > realize (from reading this thread!), that with DTC the DOCOLON code is
> > scattered all around and won't be in the code cache, whereas with ITC
> > the whole VM should be in the code cache.
>
> Well, if your high-level definitions make a CALL to DOCOLON then
> there's no reason why DOCOLON would not be in the cache. The expense
> of the call might be less than the delay induced by a cache miss.

Yes, but the CALL isn't in the code cache with DTC. With ITC however,
you don't have code that calls your interpreter, but each word has a
pointer in the front (at the cfa) that points to the interpreter.

> However, I'd urge you to take a step back and a deep breath. I was
> reading your post on the x86 group where you are discussing caching
> etc. However, I have to point out that if the Forth you are intending
> to produce is primarily for embedded systems then the chances of the
> embedded system running on an x86 processor are quite low. It's much
> more likely to be an ARM variant. In other words, don't allow key
> descisions about the architecture of your system to be guided by
> relatively un-important architectural constraints of a particular
> processor family.

I'm writing two Forth systems. HostForth runs on the host computer
(the x86). TargForth is written in HostForth and it generates the
micro-controller code. I'm just working on HostForth right now.

What I was primarily trying to figure out on comp.lang.asm.x86 is how
to support overlays. I think I've got that figured out now though.
That is pretty important! It involves a fundamental design feature. If
I hadn't thought about this early on, but had left it for later, I would
have been in trouble later on, when I discovered that supporting
overlays would be impossible without a complete rewrite. Fundamental
stuff like this really has to be figured out as early as possible.

I'm always able to learn something from those discussions on clax ---
most of those guys are really knowledgeable! --- unlike clf, where it
is mostly just baloney with b.s. frosting.

Mark Wills

unread,
Dec 1, 2012, 4:42:44 AM12/1/12
to
On Nov 30, 9:18 pm, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
> > I had a look at this yesterday and got myself tied up in knots. I
> > couldn't work out how to bootstrap the thing; to get it started. How
> > does the 'interpreter' for a high-level definition execute the words
> > in the thread. I couldn't figure it out in my lunch break and had to
> > junk what I had done. Need more time to concentrate. I was missing
> > something very fundamental. I was using the : SQUARE DUP * ; as my
> > target but didn't get anywhere.
>
> ITC is more complicated than DTC because there is an extra level of
> indirection. If you can't figure out how DTC works, but you've got ITC
> already working, then I'm guessing that you ported your ITC over from
> somewhere without really understanding it.
>
> Keep at it --- if you still can't figure it out, contact me by email
> and I will show you some code in x86 or whatever assembly-language you
> want (not TI9900 though, as I don't know that one).
>
Thanks. I got it working. It was much simpler than I thought.

It seems it doesn't make much difference on the 9900; it saves a
single MOV assembly instruction in NEXT (one less level of
indirection).

So, NEXT is two assembly instructions instead of three. That takes the
same space as a TMS9900 BRANCH instruction, so, at the end of a
primitive, rather than branching to NEXT, it can just be in-lined.
However, the kernel runs in 8-bit memory, and, doing the math, it
looks like it's faster to put NEXT in 16-bit (0 wait state) memory, and
have primitives branch to NEXT.

It all evens out to pretty much the same. For sure, *not* worth coding
a new system in DTC on the 9900. It's so marginal that it's just not
worth it.

Maybe I'll take a look at a system that generates native machine code.
That opens up all sorts of possibilities for optimisation, such as in-
lining. Words can have a bit reserved in their dictionary entry that
determines whether a word is to be in-lined or not. If not, a branch/call
is compiled; otherwise, the code is pasted into the current
definition.
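
For an ITC system the simplest version of that is copying the callee's
thread, minus its terminating EXIT, into the current definition --- a
sketch, assuming a plain colon word whose thread >BODY can reach, with
a findable EXIT, and containing no inline data (LIT values and branch
offsets would need special handling):

: inline, ( xt -- )
   >body                              \ the thread of the word to inline
   begin  dup @  ['] exit <>  while
      dup @ compile,  cell+           \ copy one xt into the current definition
   repeat
   drop ;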

It would be a nice learning exercise to learn about native code
compilers. Maybe I could incrementally add optimisations as I learn
the techniques, starting with the peephole approach you mentioned (I
got a list of about 25-30 words that could be optimised using the
combining-literal technique that you described). I was also reading
about constant folding on wikipedia. It's extremely clever, though I
can't currently see how it is actually implemented.
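
The trick is that it happens in the compiler, in the same style as the
combining-literal idea: hold recent literals back, and when an
operator arrives whose operands are all still pending, compute the
result then and there instead of compiling anything. A sketch in
standard Forth, folding just + , with invented names:

create lits 2 cells allot      \ up to two deferred, not-yet-compiled literals
variable #lits   0 #lits !

: flush-lits ( -- )            \ really compile whatever is still pending
   #lits @ 0 ?do  lits i cells + @  postpone literal  loop  0 #lits ! ;

: lit, ( n -- )                \ compiler hook, used in place of LITERAL
   #lits @ 2 = if flush-lits then
   lits #lits @ cells + !  1 #lits +! ;

: plus, ( -- )                 \ compiler hook, used where + would be compiled
   #lits @ 2 = if
      lits @  lits cell+ @ +   \ both operands known: fold now...
      0 #lits !  lit,          \ ...and the result stays foldable in turn
   else
      flush-lits  postpone +
   then ;

FLUSH-LITS also has to run at ; and before any word the hooks don't
know about; with that in place, 2 3 + in a definition compiles to the
single literal 5.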

Any optimising compiler that I wrote for the 4A would have to stick to
simple, low-hanging-fruit optimisations. It's not worth the effort to
write a compiler that compiles to intermediate language that is then
optimised. Simple optimisations would be the order of the day.

How far into HostForth are you?

Anton Ertl

unread,
Dec 1, 2012, 10:34:16 AM12/1/12
to
Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>> + Better utilization of I-Cache and D-cache
>
>That's interesting. I suppose Forth words are so extremely small that
>you can get more than one of them in a single cache line, so it's
>worth packing them together without intervening stuff. And you might
>get good enough locality for this to be useful, with a following wind.

The way I think about this is not on a per-word basis. Instead,
consider the working set of the application: Does it fit in the
I-Cache and in the D-Cache, or conversely, how big would they have to
be to fit? If you mix code with read-only data, the working-set will
grow (both in the I-cache and in the D-cache), and if the caches are
smaller than the working set, you will have more misses in these
caches. Actually, even if the caches are big enough for the working
set, you will have more misses (compulsory misses), although the
absolute number of misses will be small.

>Is it really worth it?

Given that, starting from the earlier Gforth implementation (which had
no separate headers and no native code apart from the ITC-emulating
DTC trampolines), this was easier to implement than your
approach, yes, this advantage is definitely worth the negative cost.

How small or big the advantage is depends on the application and can
only be determined by implementing both variants in the same Forth
system and measuring the result.

Bernd Paysan

unread,
Dec 1, 2012, 3:23:01 PM12/1/12
to
Anton Ertl wrote:
> The way I think about this is not on a per-word basis. Instead,
> consider the working set of the application: Does it fit in the
> I-Cache and in the D-Cache, or conversely, how big would they have to
> be to fit? If you mix code with read-only data, the working-set will
> grow (both in the I-cache and in the D-cache), and if the caches are
> smaller than the working set, you will have more misses in these
> caches. Actually, even if the caches are big enough for the working
> set, you will have more misses (compulsory misses), although the
> absolute number of misses will be small.

Using a separate region to allocate variables as VFX now does is
probably further reducing D-cache footprint: To access variables, you
don't actually need all its header and stuff - this part is only needed
for compilation, and should not pollute the cache during execution.

Gforth EC does provide such a capability, for flash systems, where you
need to separate write-once and write-many parts.

Andrew Haley

unread,
Dec 3, 2012, 5:50:47 AM12/3/12
to
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>> + Better utilization of I-Cache and D-cache
>>
>>That's interesting. I suppose Forth words are so extremely small
>>that you can get more than one of them in a single cache line, so
>>it's worth packing them together without intervening stuff. And you
>>might get good enough locality for this to be useful, with a
>>following wind.
>
> The way I think about this is not on a per-word basis. Instead,
> consider the working set of the application: Does it fit in the
> I-Cache and in the D-Cache, or conversely, how big would they have
> to be to fit? If you mix code with read-only data, the working-set
> will grow (both in the I-cache and in the D-cache), and if the
> caches are smaller than the working set, you will have more misses
> in these caches.

It's the cache line granularity that really matters, though: you won't
gain anything from separating headers and words unless more words in
your working set share the same cache lines. A word that's only 20
bytes long may occupy a line of its own regardless of whether its
headers are in the same space.

Andrew.

Bernd Paysan

unread,
Dec 3, 2012, 10:48:17 AM12/3/12
to
Andrew Haley wrote:
> It's the cache line granularity that really matters, though: you won't
> gain anything from separating headers and words unless more words in
> your working set share the same cache lines. A word that's only 20
> bytes long may occupy a line of its own regardless of whether its
> headers are in the same space.

But usually, you do have spatial locality: you have several variables
close together and you will use them together. Desktop CPUs usually have
an abundance of cache, but having to fetch a whole cache line for a
single variable just because it is surrounded by header and code field
is not very efficient.

Anton Ertl

unread,
Dec 3, 2012, 10:59:58 AM12/3/12
to
Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>> The way I think about this is not on a per-word basis. Instead,
>> consider the working set of the application: Does it fit in the
>> I-Cache and in the D-Cache, or conversely, how big would they have
>> to be to fit? If you mix code with read-only data, the working-set
>> will grow (both in the I-cache and in the D-cache), and if the
>> caches are smaller than the working set, you will have more misses
>> in these caches.
>
>It's the cache line granularity that really matters, though: you won't
>gain anything from separating headers and words unless more words in
>your working set share the same cache lines.

Sure. And if the code for a word does not just happen to start at the
start of a cache line and is exactly one cache line long, then yes,
more words in the working set do share the same cache lines.

>A word that's only 20
>bytes long may occupy a line of its own regardless of whether its
>headers are in the same space.

That leaves 12 or 44 bytes for the code of the next word.

If you have 3 20-byte words and you don't mix them up with headers,
together they can occupy 2 32-byte or 1 64-byte cache line. If you
pad the code to start each word at a new cache line, it occupies at
least 3 cache lines, if you mix code with headers, it's three cache
lines or more; if you mix headers with code and use padding, you need
three cache lines for the code, but the code is distributed more
sparsely, which leads to higher miss rates for low-associativity
caches.

Anton Ertl

unread,
Dec 3, 2012, 11:28:09 AM12/3/12
to
Bernd Paysan <bernd....@gmx.de> writes:
>Using a separate region to allocate variables as VFX now does is
>probably further reducing D-cache footprint: To access variables, you
>don't actually need all its header and stuff - this part is only needed
>for compilation, and should not pollute the cache during execution.

Yes, there are ways to reduce the D-cache footprint by separating
various kinds of data, and this one is probably a good one. But
separating the code from the data is a very obvious one, because the
code does not even live in the same L1 cache as the data.

>Gforth EC does provide such a capability, for flash systems, where you
>need to separate write-once and write-many parts.

That is a different kind of separation. It may be beneficial with
write-back caches, but if it separates data that is used at the same
time and would be close to each other without that separation, it's
not a clear win. Separating the headers is a win, because they are
normally not used during program execution.

Bernd Paysan

unread,
Dec 3, 2012, 12:44:05 PM12/3/12
to
Anton Ertl wrote:
>>Gforth EC does provide such a capability, for flash systems, where you
>>need to separate write-once and write-many parts.
>
> That is a different kind of separation. It may be beneficial with
> write-back caches, but if it separates data that is used at the same
> time and would be close to each other without that separation, it's
> not a clear win. Separating the headers is a win, because they are
> normally not used during program execution.

That's a side effect of this property: A variable consists of a header
(only used during compilation), a code field (only used for compilation
and EXECUTE, since Gforth is primitive centric), and a body, where the
actual value sits, and this is used during execution. That body lives
in RAM, the rest lives in flash.

Hugh Aguilar

unread,
Dec 3, 2012, 6:25:41 PM12/3/12
to
On Dec 1, 2:42 am, Mark Wills <forthfr...@gmail.com> wrote:
> Thanks. I got it working. It was much simpler than I thought.
>
> It seems it doesn't make much difference on the 9900; it saves a
> single MOV assembly instruction in NEXT (one less level of
> indirection).
>
> So, NEXT is two assembly instructions instead of three. That takes the
> same space as a TMS9900 BRANCH instruction, so, at the end of a
> primitive, rather than branching to NEXT, it can just be in-lined.
> However, the Kernal runs in 8-bit memory, and, doing the math, it
> looks like its faster to put next in 16-bit (0 wait state) memory, and
> have primitives branch to NEXT.
>
> It all evens out to pretty much the same. For sure, *not* worth coding
> a new system in DTC on the 9900. It's so marginal that it's just not
> worth it.

I'm glad you figured out DTC. Most things are simple after you figure
them out! :-)

You are right that there is often not much difference between ITC and
DTC in speed. There was more difference on primitive processors that
lacked addressing-modes and/or didn't have enough registers. That was
a different world --- nowadays, it is all about cache efficiency.

> Maybe I'll take a look at a system that generates native machine code.
> That opens up all sorts of possibilities for optimisation, such as in-
> lining. Words can have a bit reserved in their dictionary entry that
> determines is a word is to be in-lined or not. If not, a branch/call
> is compiled, otherwise, the code is pasted into the current
> definition.

Inlining isn't all that good of a technique on modern processors. Your
code becomes bloated, which causes it to thrash the cache (Hey! That
rhymed! I'm a poet!).

Also, on the modern x86, CALL and RET are very efficient. Doing a CALL
to a function, and it doing a RET back again, is almost as fast as
inlining that function --- and it saves a lot of memory.

SwiftForth inlines small functions, but it just pastes them in. You
see code that pushes a datum onto the stack from a particular
register, and then immediately pops the datum back into that same
register. That isn't optimization! You are better off to just leave
those functions as functions, so you reduce your bloat.

> It would be a nice learning exercise to learn about native code
> compilers. Maybe I could incrementally add optimisations as I learn
> the techniques. The peephole that you mentioned (I got a list of about
> 25-30 words that could be optimised using the combining literal
> technique that you described). I was also reading about constant
> folding on wikipedia. It's extremely clever, though I can't currently
> see how it is actually implemented.
>
> Any optimising compiler that I wrote for the 4A would have to be
> simple, low-hanging fruit optimisations. It's not worth the effort to
> write a compiler that compiles to intermediate language that is then
> optimised. Simple optimisations would be the order of the day.

Well, you could stick with ITC and make your peephole-optimizer
combine word pairs. For example, OVER + would get compiled as a single
function: OVER_+ .

This works quite well. Also, it is not processor dependent. If you get
this to work on your TI9900, you can later port it over directly to
your ARM Forth. You will have to write all of the functions, such as
OVER and + and OVER_+ in ARM assembly, but this is easy. The
complicated part, of recognizing the word pairs and combining them,
will be exactly the same no matter what processor is underneath the
hood.

I wouldn't recommend writing an optimizer for generating machine-code.
That requires a lot of knowledge of the processor under the hood. Very
little that you do on the TI9900 would port over to the ARM, as they
are quite different. Stick with ITC though, and you are largely
processor independent.

Are you planning on jumping to the ARM or to the MSP430 in the future?
You know, you have to abandon that TI99/4A someday! What if you drop
it and it breaks? You can't go to WalMart and buy another one...

> How far into HostForth are you?

Not too far.

I had HLA code for generating optimized machine-code, but that was
getting complicated and I got bogged down. Then I decided to rewrite
in traditional assembly language and generate ITC code. I will only
generate optimized machine-code for the micro-controllers, where it
matters.

I liked HLA, but it is limited to 32-bit x86, and I want to be cutting-
edge for once in my life --- so I'm going with 64-bit x86 instead.

I'm taking it slow on Straight Forth because I have a lot to learn. I
don't know much about low-level stuff, such as caches. I also don't
know much about high-level stuff, such as closures. There seems to be
only a narrow window of mid-level stuff that I know about. lol

It is always worthwhile to learn new ideas, and to better oneself!

Alex McDonald

unread,
Dec 3, 2012, 7:18:47 PM12/3/12
to
On Dec 3, 11:25 pm, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
> On Dec 1, 2:42 am, Mark Wills <forthfr...@gmail.com> wrote:


> > Maybe I'll take a look at a system that generates native machine code.
> > That opens up all sorts of possibilities for optimisation, such as in-
> > lining. Words can have a bit reserved in their dictionary entry that
> > determines is a word is to be in-lined or not. If not, a branch/call
> > is compiled, otherwise, the code is pasted into the current
> > definition.
>
> Inlining isn't all that good of a technique on modern processors. Your
> code becomes bloated, which causes it to thrash the cache (Hey! That
> rhymed! I'm a poet!).
>
> Also, on the modern x86, CALL and RET are very efficient. Doing a CALL
> to a function, and it doing a RET back again, is almost as fast as
> inlining that function --- and it saves a lot of memory.

No it doesn't.

>
> SwiftForth inlines small functions, but it just pastes them in. You
> see code that pushes a datum onto the stack from a particular
> register, and then immediately pops the datum back into that same
> register. That isn't optimization! You are better off to just leave
> those functions as functions, so you reduce your bloat.
>

STC Experimental 32bit: 0.06.05 Build: 363

With simple inlining of words <10 bytes long

Test (time includes overhead)              ms      times  ns (each)
Eratosthenes sieve 1899 Primes            120    8190000         14
Fibonacci recursion ( 35 -> 9227465 )      65    9227430          7
Hoare's quick sort (reverse order)        129    2000000         64
Generate random numbers (1024 kb array)    92     262144        350
LZ77 Comp. (400 kb Random Data Mem>Mem)   134          1
Dhrystone (integer)                        81     500000        162
                                          6172839 Dhrystones/sec
Total:                                    647          1
APP mem: 113,865, CODE mem: 15,273, SYS mem: 5,488 Total: 134,626

Without inlining

Test (time includes overhead)              ms      times  ns (each)
Eratosthenes sieve 1899 Primes            126    8190000         15
Fibonacci recursion ( 35 -> 9227465 )      66    9227430          7
Hoare's quick sort (reverse order)        252    2000000        126
Generate random numbers (1024 kb array)   104     262144        396
LZ77 Comp. (400 kb Random Data Mem>Mem)   155          1
Dhrystone (integer)                        88     500000        176
                                          5681818 Dhrystones/sec
Total:                                    800          1
APP mem: 113,865, CODE mem: 14,799, SYS mem: 5,488 Total: 134,152

An overall decrease of 25% in CPU time for 470 bytes of extra code.

Rod Pemberton

unread,
Dec 3, 2012, 9:20:20 PM12/3/12
to
"Hugh Aguilar" <hughag...@yahoo.com> wrote in message
news:9e679e24-97e8-4efb...@r10g2000pbd.googlegroups.com...
...

> Also, on the modern x86, CALL and RET are very efficient. Doing
> a CALL to a function, and it doing a RET back again, is almost
> as fast as inlining that function --- [...]

That may be almost true now. But, I don't know how true it is
that they're fast now. I haven't checked the x86 manuals in three
to five years. Even if they're fast now, CALL and RET still have
some overhead that's not present with inlining. Historically,
CALL and RET being fast on x86 wasn't true.

The problem with RET on modern x86 - according to those on
c.l.a.x. - is that it must be _matched_ with a CALL or it causes a
slowdown of the processor. E.g., the RET location was pushed
onto the stack via PUSH instead of by a CALL.


Rod Pemberton



Elizabeth D. Rather

unread,
Dec 3, 2012, 10:29:40 PM12/3/12
to
On 12/3/12 4:20 PM, Rod Pemberton wrote:
...
>> Also, on the modern x86, CALL and RET are very efficient. Doing
>> a CALL to a function, and it doing a RET back again, is almost
>> as fast as inlining that function --- [...]
>
> That may be almost true now. But, I don't know how true it is
> that they're fast now. I haven't checked the x86 manuals in three
> to five years. Even if they're fast now, CALL and RET still have
> some overhead that's not present with inlining. Historically,
> CALL and RET being fast on x86 wasn't true.

Whether it's worth inlining or not really depends on the length of the
code being inlined. If it's just a few instructions, the ratio of the
CALL/RET to the code is such that the inlining pays off. For a longer
sequence, it does not. It also depends on whether the CALL is set up by
C, which adds overhead for calling sequences that is missing in Forth
written in Forth/assembler.

Cheers,
Elizabeth

Mark Wills

unread,
Dec 4, 2012, 3:12:16 AM12/4/12
to
On Dec 4, 2:20 am, "Rod Pemberton" <do_not_h...@notemailnotz.cnm>
wrote:
> "Hugh Aguilar" <hughaguila...@yahoo.com> wrote in message
I would imagine (I have no experience) that with CALL/RET you also run
the risk of the routine that you are CALLing not being in the cache,
which adds a further performance penalty. At least if the code is
inlined (even with inefficiencies such as pushing to the data stack
and immediately popping again) there is a good chance it's running
from cache.

My knowledge of caches is very 1990s though; maybe they are a lot
cleverer these days.

What is the difference between a level 1 and a level 2 cache? Is the
level 2 cache a cache for the level 1 cache? So there are two caches
between the CPU and external memory? Is that how it works?

Do caches run 'metrics' on subroutines like "Hmmm... This subroutine
here seems to be called a lot more often than these others. I'll keep
it in my cache where I can access it quickly" or are they simply dumb,
where, if a section of memory is called for, and it's not in the
cache, it reads the memory, plus n bytes into the cache?

Guess I should read up on caches!

I don't have any such complications in my hobby system! It's nice and
simple. The only potential complication is the TMS9995 because it has
instruction prefetch. This means some self-modifying code can trip you
up, but only if you are modifying the instruction immediately in
front. A simple NOP between the instructions fixes that.

Hugh Aguilar

unread,
Dec 4, 2012, 11:17:30 PM12/4/12
to
On Dec 3, 7:20 pm, "Rod Pemberton" <do_not_h...@notemailnotz.cnm>
wrote:
> The problem with RET on modern x86 - according to those on
> c.l.a.x. - is that it must be _matched_ with a CALL or it causes a
> slowdown of the processor.  E.g., the RET location was pushed
> onto the stack via PUSH instead of by a CALL.

Most of my knowledge of this kind of thing comes from CLAX, most
likely from the same threads that you have been reading. I've also
read some of Intel's optimization manual. See 3.4.1.4 for a discussion
of inlining versus CALL/RET.

It is true that CALL and RET have to be paired for them to be
optimized. This means that my stack-threading, in which RSP is used as
the Forth IP, won't get optimized, because there are a lot of RET
instructions and no CALL instructions at all. That is not what we are
talking about here though. We are talking about subroutine-threading
(as done by SwiftForth), and the decision of whether to inline a
function or just CALL it. It ends in RET, so the CALL and RET are
paired, which will work. The optimization only works for 16 nesting
levels, but that is not generally a problem.

Mark is right that if the sub-function is not in the same 32KB memory
block, calling it will thrash the cache. If it is close though,
then calling it is better than inlining it. Also, if you mostly call
sub-functions rather than inline them, you save a lot of memory. I
don't know why Alex said that it doesn't save memory, as it obviously
does assuming that the function is larger than a CALL instruction and
it gets called more than once. Calling rather than inlining functions
makes the code less bloated and boosts the chance of the sub-function
being close to the calling function.

For the most part, I recommend against just blindly inlining
functions, as done in SwiftForth (for everything defined with ICODE).
You might as well just CALL them rather than inline them, if you
aren't going to do any optimization (in the sense of holding values in
registers, rather than pushing them onto the stack and pulling them
off again).

Generating machine-code for the x86 was somewhat beyond my knowledge,
which is why I reverted to ITC for HostForth. Nobody cares about the
x86 anyway --- all desktop-computer software is given away for free
--- the only thing that matters is micro-controllers. For this reason,
I have decided to make HostForth fairly simple, and get it to run.
Then I can focus most of my effort on TargForth that generates code
for the micro-controllers, to make it generate efficient code. That
makes sense economically, as micro-controllers get sold for money, so
quality is important. Also, it is easier, as micro-controllers don't
have complicated cache systems, and all of that other complicated
stuff that the x86 has. With the x86, nobody really knows what it does
internally, because the manufacturer won't say because they don't want
to give away trade secrets to their competitors --- the result is that
the assembly-language programmer gets a lot of vague heuristics about
how to optimize his code --- these are like voodoo rituals in that
they seem to work, but you don't know why. By comparison, with the
80486 we had the u and v pipes, and we could know at compile-time
pretty much exactly what the processor would do at run-time --- life
was much simpler then.

This is an example of SwiftForth just blindly inlining sub-functions:

: init-node ( node -- node )
0 over ! ; ok

see init-node
46E8BF 4 # EBP SUB 83ED04
46E8C2 EBX 0 [EBP] MOV 895D00
46E8C5 0 # EBX MOV BB00000000 ( 0 )
46E8CA 4 # EBP SUB 83ED04
46E8CD EBX 0 [EBP] MOV 895D00
46E8D0 4 [EBP] EBX MOV 8B5D04 ( OVER )
46E8D3 0 [EBP] EAX MOV 8B4500
46E8D6 EAX 0 [EBX] MOV 8903
46E8D8 4 [EBP] EBX MOV 8B5D04
46E8DB 8 # EBP ADD 83C508 ( ! )
46E8DE RET C3 ok

see over
40383F 4 # EBP SUB 83ED04
403842 EBX 0 [EBP] MOV 895D00
403845 4 [EBP] EBX MOV 8B5D04
403848 RET C3 ok

see !
40335F 0 [EBP] EAX MOV 8B4500
403362 EAX 0 [EBX] MOV 8903
403364 4 [EBP] EBX MOV 8B5D04
403367 8 # EBP ADD 83C508
40336A RET C3 ok


This is how VFX does it:

: init-code ( node -- node )
0 over ! ; ok
see init-code
INIT-CODE
( 004C7A90 C70300000000 ) MOV DWord Ptr 0 [EBX], 00000000
( 004C7A96 C3 ) NEXT,
( 7 bytes, 2 instructions )
ok

This is how I hand-wrote the same thing in my novice package for
SwiftForth:

icode init-node ( node -- node )
eax eax xor eax 0 [ebx] mov ret

I think VFX's generated code is actually more efficient than my hand-
written code. I recently read in the Intel optimization manual that
register-to-register instructions (such as my xor) are not efficient
nowadays. I wrote my function in the way that I would have written it
for the 80486 --- but that isn't necessarily the best way for the
modern x86. All of this x86 stuff is complicated now, and I don't
really know very much about the subject --- even a simple function
such as init-code is a mystery as to how to write it efficiently. :-(

Ron Aaron

unread,
Dec 5, 2012, 1:31:33 AM12/5/12
to
On 12/05/2012 06:17 AM, Hugh Aguilar wrote:

> Nobody cares about the
> x86 anyway --- all desktop-computer software is given away for free

I'm sure Microsoft and Apple would be fascinated to find out their
software is given away for free!

> With the x86, nobody really knows what it does
> internally, because the manufacturer won't say because they don't want
> to give away trade secrets to their competitors --- the result is that
> the assembly-language programmer gets a lot of vague heuristics about
> how to optimize his code --- these are like voodoo rituals in that
> they seem to work, but you don't know why. By comparison, with the
> 80486 we had the u and v pipes, and we could know at compile-time
> pretty much exactly what the processor would do at run-time --- life
> was much simpler then.

Modern desktop computer CPUs are complex and their microcode is complex,
making optimization of hand-written assembly more challenging -- that's
certainly true. However, the old technique of "measure your code's
performance" still works, even on the most advanced processors in
existence. Still and all, the best optimization is to use your head and
choose a better algorithm, rather than try to micro-optimize CPU pipelines.

Mark Wills

unread,
Dec 5, 2012, 2:48:45 AM12/5/12
to
On Dec 5, 6:31 am, Ron Aaron <rambam...@gmail.com> wrote:
>
> I'm sure Microsoft and Apple would be fascinated to find out their
> software is given away for free!
>

Well, not all software is given away for free, but an awful lot of it
is. I'd argue that practically all of Apple's software is in fact
given away for free. It's a loss leader to sell the hardware.

Ditto Microsoft: despite the many millions it must have cost MS to
develop IE, Media Player, Movie Maker et al, they are given away for
free to make the OS more attractive.

So yes, I'd agree in general terms with Hugh on that point. It's quite
hard to make money on x86 software sales IMHO when the big boys are
giving a lot of software (iTunes etc) away for free.

Even Winamp, which I am using right now, is free, relying on
donations. Same with the Linux vendors: Entire operating systems given
away for free. Personally I think it's bonkers, it's a race to the
bottom (and we're seeing the same in the mobile entertainment market:
Phone and tablet games) and I take my hat off to companies like Red
Hat who have managed to carve out a profit from their business.

Mozilla is not doing so well. IIRC they rely heavily on Google for
their income.

Just thought i'd chip in!

Mark Wills

unread,
Dec 5, 2012, 2:53:12 AM12/5/12
to
Related to my earlier post:

Half of all app store revenue goes to just 25 developers

http://www.theregister.co.uk/2012/12/04/top_25_app_devs_earn_half_of_revenue/

Alex McDonald

unread,
Dec 5, 2012, 7:03:23 AM12/5/12
to
On my Forth, 10 calls means 50 bytes, but 25 inlined @ operations is
the same number of bytes. That's shorter, not longer.
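
(For reference, the sizes that arithmetic assumes -- a sketch, with
TOS cached in EBX as in the listings upthread; exact encodings vary by
code generator:

  CALL FETCH        \ E8 xx xx xx xx -- 5 bytes per call
  EBX 0 [EBX] MOV   \ 8B 1B          -- 2 bytes per inlined @

So 10 calls = 50 bytes, and 25 inlined fetches = 50 bytes.)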

For 470 extra bytes (3% extra) by inlining words <10 bytes, there's a
25% decrease in CPU time. I can show this; you can't show squat
diddly, since you are pulling it out of your ass.


>
> For the most part, I recommend against just blindly inlining

Recommend? Snort!
About the only true statement in all of this.

> such as init-code is a mystery as to how to write it efficiently. :-(

Have you ever, ever, ever tested any -- even a single line -- of this
bloviating speculation by running a benchmark or pointing to a
benchmark or other evidence? Ever? At all? The same problem was
apparent in your symtab code; you couldn't demonstrate anything but an
ability to write screeds of unsubstantiated opinions masquerading as
hard evidence.

Much of the above is demonstrably and measurably wrong, and you can't
justify a word of it. That's because you assume your audience is just
like you. Too damn lazy and too damn stupid to check.

Paul Rubin

unread,
Dec 5, 2012, 2:35:44 PM12/5/12
to
Mark Wills <forth...@gmail.com> writes:
> What is the difference between a level 1 and a level 2 cache? ...
> Guess I should read up on caches!

This is a bit out of date, but it is still informative:

http://magazine.redhat.com/2007/10/26/ulrich-drepper-memory-article-series/

Hugh Aguilar

unread,
Dec 5, 2012, 3:13:05 PM12/5/12
to
On Dec 5, 12:48 am, Mark Wills <forthfr...@gmail.com> wrote:
> On Dec 5, 6:31 am, Ron Aaron <rambam...@gmail.com> wrote:
>
>
>
> > I'm sure Microsoft and Apple would be fascinated to find out their
> > software is given away for free!
>
> Well, not all software is given away for free, but an awful lot of it
> is. I'd argue that practically all of Apple's software is in fact
> given away for free. It's a loss leader to sell the hardware.

All software is a loss-leader to sell hardware.

The difference between micro-controllers and desktop-computers, is
that with micro-controllers you are selling the hardware, so it makes
sense for you to provide it with software to give it value. With
desktop-computer software, somebody else (Dell, etc.) is selling the
hardware, and you are providing software to give it value but not
getting paid. The reason why MicroSoft gets paid, is because they have
a deal to bundle their software with the hardware, and the hardware
vendors pay them for this. Theoretically, MicroSoft will eventually
fail when the hardware vendors switch over to bundling Linux, which
makes sense for them considering that the Linux programmers are
willing to work for free --- this will never happen though, because
the American federal government has standardized on Windows for their
own bureaus, meaning that MicroSoft has a monopoly --- Benito
Mussolini (who coined the term: "corporatism") would be proud!

Richard Stallman was right when he said that programming is a service,
but that programs aren't products. The days of programs getting sold
in shrink-wrapped packages containing a pile of disks and a stack of
manuals, is long gone (I bought TASM like that, in a store's going-out-
of-business sale). Nowadays, most programming is done writing software
that will be run once and will become obsolete a week later --- this
is typically software that supports a moving target, such as health-
care billing in which the regulations change almost daily at the whim
of a byzantine bureaucracy. I myself wrote IBM370 assembly-language
software for direct-mail, so the software became obsolete a week later
after the mailing was done --- the company that I worked for had one
of the few licenses from the Post Office to do that and get the big
discount on postage.

It is a battle between fascists and communists --- on one hand we have
the fascists who want MicroSoft to have a monopoly, and for everybody
to pay for their software --- on the other hand we have the GNU
communists who want all software to be given away for free, but for
programmers to get paid by the hour to make custom upgrades to the
software (after investing years of their own time in writing the
software for free). Neither plan is very good, but I would prefer
communism to fascism --- mostly because I don't think that MicroSoft
is going to hire me. By comparison, I expect that Elizabeth Rather
would prefer fascism --- mostly because she hopes to get the monopoly
on Forth programming (that is why she tried so hard to kill FIG in the
1980s) --- maybe the monopoly will even be government enforced, in the
sense that she could have me shot for writing Forth code without
permission. :-)

Alex McDonald

unread,
Dec 5, 2012, 6:26:51 PM12/5/12
to
On Dec 5, 8:13 pm, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
> On Dec 5, 12:48 am, Mark Wills <forthfr...@gmail.com> wrote:
>
> > On Dec 5, 6:31 am, Ron Aaron <rambam...@gmail.com> wrote:
>
> > > I'm sure Microsoft and Apple would be fascinated to find out their
> > > software is given away for free!
>
> > Well, not all software is given away for free, but an awful lot of it
> > is. I'd argue that practically all of Apple's software is in fact
> > given away for free. It's a loss leader to sell the hardware.
>
> All software is a loss-leader to sell hardware.

That's (almost) the wrong way round. There is little to no margin in
most hardware.

>
> The difference between micro-controllers and desktop-computers, is
> that with micro-controllers you are selling the hardware, so it makes
> sense for you to provide it with software to give it value. With
> desktop-computer software, somebody else (Dell, etc.) is selling the
> hardware, and you are providing software to give it value but not
> getting paid.

Dell's 2012 Q2 had a consumer products operating margin of 0.5
percent.

> The reason why MicroSoft gets paid, is because they have
> a deal to bundle their software with the hardware, and the hardware
> vendors pay them for this. Theoretically, MicroSoft will eventually
> fail when the hardware vendors switch over to bundling Linux, which
> makes sense for them considering that the Linux programmers are
> willing to work for free

Most Linux programmers -- the vast majority -- work for companies and
are paid by them to write code for Linux. My company funds 5 full time
Linux developers, and many others contributing to other open source
projects like OpenStack and so on. There are good business reasons for
doing so which will be beyond your understanding.


> --- this will never happen though, because
> the American federal government has standardized on Windows for their
> own bureaus, meaning that MicroSoft has a monopoly --- Benito
> Mussolini (who coined the term: "corporatism") would be proud!

Linux federal users;

US Dept of Defense; "single biggest install base for Red Hat Linux"
"RedHat's largest customer"
US Navy
US Postal Service
US Federal Aviation Authority

There are many more.

Mussolini did not coin the term corporatism (first used c. 1890),
although he was one of the first to attempt to effect it.

>
> Richard Stallman was right when he said that programming is a service,
> but that programs aren't products. The days of programs getting sold
> in shrink-wrapped packages containing a pile of disks and a stack of
> manuals, is long gone (I bought TASM like that, in a store's going-out-
> of-business sale).

Much open source software can be bought with maintenance contracts.
RedHat is an example; Linux is free, and RedHat support Fedora that
way. But commercial support will cost you money for RedHat Enterprise
Linux (RHEL).

> Nowadays, most programming is done writing software
> that will be run once and will become obsolete a week later --- this
> is typically software that supports a moving target, such as health-
> care billing in which the regulations change almost daily at the whim
> of a byzantine bureaucracy.

Complete fantasy.

> I myself wrote IBM370 assembly-language
> software for direct-mail, so the software became obsolete a week later
> after the mailing was done --- the company that I worked for had one
> of the few licenses from the Post Office to do that and get the big
> discount on postage.
>
> It is a battle between fascists and communists --- on one hand we have
> the fascists who want MicroSoft to have a monopoly, and for everybody
> to pay for their software --- on the other hand we have the GNU
> communists who want all software to be given away for free, but for
> programmers to get paid by the hour to make custom upgrades to the
> software (after investing years of their own time in writing the
> software for free). Neither plan is very good, but I would prefer
> communism to fascism --- mostly because I don't think that MicroSoft
> is going to hire me.

I can guarantee that no-one will hire you while you spout such
ridiculous and easily disproven "facts".

> By comparison, I expect that Elizabeth Rather
> would prefer fascism --- mostly because she hopes to get the monopoly
> on Forth programming (that is why she tried so hard to kill FIG in the
> 1980s) --- maybe the monopoly will even be government enforced, in the
> sense that she could have me shot for writing Forth code without
> permission. :-)

I do hope so. Actually, you writing anything at all would seem to be a
really good excuse.

Rod Pemberton

unread,
Dec 5, 2012, 8:39:26 PM12/5/12
to
"Hugh Aguilar" <hughag...@yahoo.com> wrote in message
news:3cfcf1ea-df96-45e4...@m4g2000pbd.googlegroups.com...
> On Dec 5, 12:48 am, Mark Wills <forthfr...@gmail.com> wrote:
> > On Dec 5, 6:31 am, Ron Aaron <rambam...@gmail.com> wrote:
...

> > > I'm sure Microsoft and Apple would be fascinated to find
> > > out their software is given away for free!
>
> > Well, not all software is given away for free, but an awful
> > lot of it is. I'd argue that practically all of Apple's
> > software is in fact given away for free. It's a loss leader to
> > sell the hardware.
>
> All software is a loss-leader to sell hardware.

It's generally the reverse, but can be either, both or neither.

> The difference between micro-controllers and desktop-computers,
> is that with micro-controllers you are selling the hardware, so
> it makes sense for you to provide it with software to give it
> value. With desktop-computer software, somebody else (Dell,
> etc.) is selling the hardware, and you are providing software to
> give it value but not getting paid. The reason why MicroSoft
> gets paid, is because they have a deal to bundle their software
> with the hardware, and the hardware vendors pay them for this.
> Theoretically, MicroSoft will eventually fail when the hardware
> vendors switch over to bundling Linux, which makes sense for
> them considering that the Linux programmers are willing to work
> for free --- [...]

Was that the theory? Lol! I'll ignore the fact that the "theory" is
faulty. Linux has had two decades now ... What happened?

> [...] this will never happen though, because
> the American federal government has standardized on Windows
> for their own bureaus, meaning that MicroSoft has a monopoly
> [...]

That's the first I've heard of that rationale for MS having a
monopoly. But, I'd say it's a very good idea to have those who
once "attacked" you via abuse of authority as a complicit partner
in your "crimes" of monopoly.

> Richard Stallman was right when he said that programming is a
> service, but that programs aren't products.

Is he confused?

> The days of programs getting sold
> in shrink-wrapped packages containing a pile of disks and a
> stack of manuals, is long gone (I bought TASM like that, in
> a store's going-out-of-business sale).

True.

> Nowadays, most programming is done writing software
> that will be run once and will become obsolete a week later ---

True.

> [...] this
> is typically software that supports a moving target, such as
> health-care billing in which the regulations change almost
> daily at the whim of a byzantine bureaucracy. I myself wrote
> IBM370 assembly-language software for direct-mail, so the
> software became obsolete a week later after the mailing was
> done --- the company that I worked for had one of the few
> licenses from the Post Office to do that and get the
> big discount on postage.

So true.

Business has to do what is legally required. So, the
irrationality inherent in the law is therefore irrelevant.

I had to do much the same when programming for the brokerage
industry about a decade ago. I had to implement any SEC or
regulatory changes in securities law as code to prohibit stock
traders from breaking the laws. Of course, I had generally zero
help from the legal department to help determine what the laws
meant. They were too busy. My direct management was utterly
incompetent, i.e., 4 hours of fantasy football, 1 hour of
bullshitting, 1 hour in a random meeting that had nothing to do
with our department, and a 2 hour lunch with drinking during
the allotted one hour lunch break. So, there was generally no
help there either. Sometimes, it was impossible to implement.
The entire application would've needed to be scrapped. The one
time the legal department decided the issue was important enough
to "help" me, our in-house legal team spent nearly all day arguing
about what the law actually meant. Then they spent an hour on
how it could affect the company, and what it would cost them under
various scenarios. Eventually, five minutes before it was time to
go home for the day (hint), they decided they'd rather just fight
it out in court for a violation of the law than have me implement
the law in code. That was the first time I truly wondered why I
was there. It was farcical. I felt like becoming an overpaid,
underworked, lazy, incompetent, corporate lawyer, just like them.
I had no understanding of the law, but I could still do no worse.

> It is a battle between fascists and communists --- on one hand
> we have the fascists who want MicroSoft to have a monopoly,
> and for everybody to pay for their software --- on the other
> hand we have the GNU communists who want all software to
> be given away for free, but for programmers to get paid by the
> hour to make custom upgrades to the software (after investing
> years of their own time in writing the software for free).

Fascists? Did you mean Socialists? Or, perhaps Capitalists ... ?

If not, you may wish to review fascism and socialism:
http://en.wikipedia.org/wiki/Fascism
http://en.wikipedia.org/wiki/Socialism

I.e., communism and socialism are related to an economic system,
whereas fascism isn't ...

> Neither plan is very good, but I would prefer
> communism to fascism ---

I prefer capitalism. I'm not quite sure why you mentioned fascism
and communism together instead of socialism and communism ...

> [...] mostly because I don't think that MicroSoft
> is going to hire me.

Irrelevant.

There are lots of tech companies making money, e.g., Apple,
Google. Recently, Apple earned more than Microsoft, Google, eBay,
Yahoo!, Facebook, and Amazon combined. If you can program well,
someone will hire you. You might not be paid as well as you
should be. But, in overpriced or expensive west coast and east
coast markets, you can still do well.


Rod Pemberton


Ron Aaron

unread,
Dec 5, 2012, 11:32:50 PM12/5/12
to
On 12/06/2012 01:26 AM, Alex McDonald wrote:
> On Dec 5, 8:13 pm, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
>>
>> All software is a loss-leader to sell hardware.
>
> That's (almost) the wrong way round. There is no to little margin in
> most hardware.

I've been working as a programmer almost thirty years now; none of the
software I wrote was "a loss-leader to sell hardware".

>> The reason why MicroSoft gets paid, is because they have
>> a deal to bundle their software with the hardware, and the hardware
>> vendors pay them for this. Theoretically, MicroSoft will eventually
>> fail when the hardware vendors switch over to bundling Linux, which
>> makes sense for them considering that the Linux programmers are
>> willing to work for free
>
> Most Linux programmers -- the vast majority -- work for companies and
> are paid by them to write code for Linux. My company funds 5 full time
> Linux developers, and many others contributing to other open source
> projects like OpenStack and so on. There are good business reasons for
> doing so which will be beyond your undestanding.

In fact, for the past several years I've been working almost exclusively
on *commercial* Linux software. Not free, and not open-source, but
commercial or proprietary. The company I'm at has a team of five Linux
programmers, and looking to hire more.

> Linux federal users;
>
> US Dept of Defense; "single biggest install base for Red Hat Linux"
> "RedHat's largest customer"
> US Navy
> US Postal Service
> US Federal Aviation Authority
>
> There are many more.

Indeed there are, especially if you look outside the US.

>> Nowadays, most programming is done writing software
>> that will be run once and will become obsolete a week later --- this
>> is typically software that supports a moving target, such as health-
>> care billing in which the regulations change almost daily at the whim
>> of a byzantine bureaucracy.
>
> Complete fantasy.

Did you expect something based in reality from him?

My personal experience has not involved writing health-care-billing
programs. The world of programming is much wider than Hugh seems to
think it is.

Mark Wills

unread,
Dec 6, 2012, 4:07:23 AM12/6/12
to
On Dec 5, 11:26 pm, Alex McDonald <b...@rivadpm.com> wrote:
>
> That's (almost) the wrong way round. There is no to little margin in
> most hardware.
>
Snort!

Really?

I must let Allen Bradley, ABB and Siemens know.

Alex McDonald

unread,
Dec 6, 2012, 7:23:50 AM12/6/12
to
Go look at the figures from HP, Dell, IBM and any other number of
computer hardware suppliers. Tin doesn't pay. Here's a classic one-
liner from IBM's last earnings call: "We continue to see value
shifting to software."

Out of curiosity I went and looked at the companies you mentioned.

Rockwell (owners of Allen Bradley) provide a mixture of software,
hardware and services and don't break out the mix of each in their two
major divisions. Sales rose but profits dropped in their last quarter.

Siemens is a large and diverse German company, and I can't get any
sense of their specific computer hardware sales vs expenses from their
annual report; their business sectors aren't structured that way,
since computer hardware is not their primary business. Ditto for the
Swiss-Swedish company ABB.



Albert van der Horst

unread,
Dec 6, 2012, 8:13:35 AM12/6/12
to
In article <dabba9b6-c352-4cd4...@r20g2000yql.googlegroups.com>,
Alex McDonald <bl...@rivadpm.com> wrote:
>On Dec 5, 8:13 pm, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
>> On Dec 5, 12:48 am, Mark Wills <forthfr...@gmail.com> wrote:
>>
>> > On Dec 5, 6:31 am, Ron Aaron <rambam...@gmail.com> wrote:
>>
>> > > I'm sure Microsoft and Apple would be fascinated to find out their
>> > > software is given away for free!
>>
>> > Well, not all software is given away for free, but an awful lot of it
>> > is. I'd argue that practically all of Apple's software is in fact
>> > given away for free. It's a loss leader to sell the hardware.
>>
>> All software is a loss-leader to sell hardware.
>
>That's (almost) the wrong way round. There is no to little margin in
>most hardware.

Maybe we should start to make a third category: extortionware.
We passed long ago from a free-market society to an extortion
society. Extortionware is where you have government-guaranteed
rights to customer money. It is clear that if you control the
government (and they do!) this paying for IP rights is most
profitable (though profit in the free-market sense is no longer
an appropriate word).

"Hardware and software both are loss-leaders to extortionware."
I can't stand a full 100% behind that, but it is certainly more
truthful than what Hugh says.
The big money is in extortionware: films, records, medicines.
You can also think of the mobile phone market as almost
extortionware: the phone is free, then you pay for the right to use
an (originally publicly funded!) phone net.

Microsoft stuff ("Windows 7") is some 78% extortionware, 12%
software. Etc.

Groetjes Albert
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Bernd Paysan

unread,
Dec 6, 2012, 9:46:10 AM12/6/12
to
Mark Wills wrote:
> Even Winamp, which I am using right now, is free, relying on
> donations. Same with the Linux vendors: Entire operating systems given
> away for free. Personally I think it's bonkers, it's a race to the
> bottom (and we're seeing the same in the mobile entertainment market:
> Phone and tablet games) and I take my hat off to companies like Red
> Hat who have managed to carve out a profit from their business.

What kind of strange misunderstanding of free market economy is behind
words like "race to the bottom"? The cost to copy some software is
almost zero, which means that the price limit for software sold in high
volumes is about zero, and free market economies tend to reach that
price limit. It's not a race to the bottom, as this software still is
high quality. The business model in 90% of all software development
projects is either "pay us for the service" or "pay us for the
development", the shelf-ware software had been a small niche, mostly
occupied by Microsoft, Adobe, and a few others. Nowadays, we have Apps,
but most of them are free-to-download and ad-driven.

Bernd Paysan

unread,
Dec 6, 2012, 9:49:49 AM12/6/12
to
Alex McDonald wrote:
> Siemens is a large and diverse German company, and I can't get any
> sense of their specific computer hardware sales vs expenses from their
> annual report; their business sectors aren't structured that way,
> since computer hardware is not their primary business.

They did have that (making PCs, mainframes and Unix/Sinix workstations),
and sold them a long time ago to Japanese companies like Fujitsu. No
wonder you don't find computer hardware sales any more.

Mark Wills

unread,
Dec 6, 2012, 10:42:20 AM12/6/12
to
ABB, Allen Bradley, and Siemens all make and sell industrial PLC's
(programmable logic controllers). They cost an absolute arm and a leg.
They're not computers. It's true that they do have other divisions
(well, not really in Allen Bradley's case). ABB have a Power division,
Turbines division etc. Siemens have power, communications etc.

But take it from me, their hardware costs an absolute freaking
fortune. So, there is money to be made in hardware! Just not PC's,
which are consumer products, no different to a VDR or a DVD player.

Mark Wills

unread,
Dec 6, 2012, 10:47:27 AM12/6/12
to
No misunderstanding at all. It's a race to the bottom for the software
developers writing the free products. They are actually in competition
with their "competitors" to give stuff away for free.

"Here, here, take mine, it's free."
"No, take mine. It's also free!"

Bonkers.

Meanwhile, their mothers and fathers are feeding and clothing them to
develop software to be given away for free.

It's software-socialism, and, like we have seen with all countries
that have developed a socialist/communist society, it's a race to the
bottom.

That's why it's bonkers.

Alex McDonald

unread,
Dec 6, 2012, 11:30:49 AM12/6/12
to
On Dec 6, 3:42 pm, Mark Wills <forthfr...@gmail.com> wrote:
> On Dec 6, 2:49 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
>
> > Alex McDonald wrote:
> > > Siemens is a large and diverse German company, and I can't get any
> > > sense of their specific computer hardware sales vs expenses from their
> > > annual report; their business sectors aren't structured that way,
> > > since computer hardware is not their primary business.
>
> > They did have that (making PCs, mainframes and Unix/Sinix workstations),
> > and sold them a long time ago to Japanese companies like Fujitsu.  No
> > wonder you don't find computer hardware sales any more.

Yes, I'd forgotten.

>
> > --
> > Bernd Paysan
> > "If you want it done right, you have to do it yourself"http://bernd-paysan.de/
>
> ABB, Allen Bradley, and Siemens all make and sell industrial PLC's
> (programmable logic controllers). They cost an absolute arm and a leg.
> They're not computers. It's true that they do have other divisions
> (well, not really in Allen Bradley's case). ABB have a Power division,
> Turbines division etc. Siemens have power, communications etc.
>
> But take it from me, their hardware costs an absolute freaking
> fortune. So, there is money to be made in hardware! Just not PC's,
> which are consumer products, no different to a VDR or a DVD player.

Ah, PLCs. Not quite what I had in mind.

PS Ever heard of Sun?

Alex McDonald

unread,
Dec 6, 2012, 11:36:06 AM12/6/12
to
It's only bonkers if it's true. Which it isn't; it's a gross
mischaracterisation, extrapolating a small percentage of software
development to an entire industry. Software with no value never gets
used. The race is to the top in terms of quality, regardless of price.

Paul Rubin

unread,
Dec 6, 2012, 12:45:59 PM12/6/12
to
Mark Wills <forth...@gmail.com> writes:
> ABB, Allen Bradley, and Siemens all make and sell industrial PLC's
> (programmable logic controllers). They cost an absolute arm and a leg.

I've been wondering why anyone buys these things other than
traditionalism. Surely there are some high-assurance microprocessors by
now. Ladder logic doesn't seem like a particularly safer programming
language than, well, you know.

Elizabeth D. Rather

unread,
Dec 6, 2012, 1:01:53 PM12/6/12
to
Make no mistake, a PLC is a very software-intensive device! The fact
that it is bundled with the hardware conceals that, but a PLC is
basically a specialized computer with elaborate process-control software.

Albert van der Horst

unread,
Dec 6, 2012, 4:41:44 PM12/6/12
to
In article <7xfw3jn...@ruckus.brouhaha.com>,
Last job I was in they were switching from PLC to industrial PC's running
Windows and emulating PLC. Now that is progress ;-)

A. K.

unread,
Dec 6, 2012, 5:15:06 PM12/6/12
to
On 06.12.2012 22:41, Albert van der Horst wrote:
> Last job I was in they were switching from PLC to industrial PC's running
> Windows and emulating PLC. Now that is progress ;-)
>
> Groetjes Albert
>

A bit oversimplifying, aren't we? ;-)

Windows (or Linux) for browser-based man-machine-interface. By the way
tablet PCs as maintenance tools have also become very common.

Intel-based hardware (say industrial PCs) as PLC CPUs but with
(redundant) industrial interface to inputs/outputs or communication
links to 3rd party systems. So it's really not desktop PC technology,
and it's really not Windows as realtime OS.

Special hardware and software for any safety-related PLCs.

Mark Wills

unread,
Dec 6, 2012, 5:16:16 PM12/6/12
to
On Dec 6, 9:41 pm, alb...@spenarnc.xs4all.nl (Albert van der Horst)
wrote:
> In article <7xfw3jnkfc....@ruckus.brouhaha.com>,
> Paul Rubin  <no.em...@nospam.invalid> wrote:
>
> >Mark Wills <forthfr...@gmail.com> writes:
> >> ABB, Allen Bradley, and Siemens all make and sell industrial PLC's
> >> (programmable logic controllers). They cost an absolute arm and a leg.
>
> >I've been wondering why anyone buys these things other than
> >traditionalism.  Surely there are some high-assurance microprocessors by
> >now.  Ladder logic doesn't seem like a particularly safer programming
> >language than, well, you know.
>
> Last job I was in they were switching from PLC to industrial PC's running
> Windows and emulating PLC. Now that is progress ;-)
>
> Groetjes Albert
> --
> Albert van der Horst, UTRECHT,THE NETHERLANDS
> Economic growth -- being exponential -- ultimately falters.
> albert@spe&ar&c.xs4all.nl &=nhttp://home.hccnet.nl/a.w.m.van.der.horst

We have the same things here, emulating SLC-500's (Allen Bradley).
They are termed 'Soft PLCs'.

They crash every seven days.

Progress indeed!

Mark Wills

unread,
Dec 6, 2012, 5:18:03 PM12/6/12
to
> Los Angeles, CA 90045  http://www.forth.com
>
> "Forth-based products and Services for real-time
> applications since 1973."
> ==================================================

Sure. I'm aware of that. I wrote my first ladder logic in 1989 on
Omron PLCs. Rockwell (which absorbed Allen Bradley) actually made a
PLC with a specialised Forth processor for a while.

Mark Wills

unread,
Dec 6, 2012, 5:21:45 PM12/6/12
to
On Dec 6, 5:45 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
They are *extremely* reliable. They have been known to run > 20 years
without ever being powered off.

The key to ladder logic is that it's not considered to be programming
(although it clearly is). It is easily understood by electrical
engineers, because the code looks like many parallel electric
circuits. Because of this, an engineer can turn up and debug or change
ladder code very easily indeed without having ever seen the code
before. I know, because I've been that guy in the past! In fact,
that's exactly how I was offered my current position with the company
I'm working for!
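
For anyone who hasn't seen ladder logic, here is the classic
start/stop seal-in rung sketched in ASCII (the contact and coil names
are made up for illustration):

  |----[ START ]----+----[/STOP ]----( MOTOR )----|
  |                 |
  |----[ MOTOR ]----+

Read it like a circuit diagram: pressing START energises the MOTOR
coil, the parallel MOTOR contact then seals the rung in, and the
normally-closed STOP contact breaks it. That's why an electrical
engineer can pick it up cold.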

Elizabeth D. Rather

unread,
Dec 6, 2012, 6:48:14 PM12/6/12
to
On 12/6/12 12:18 PM, Mark Wills wrote:
> On Dec 6, 6:01 pm, "Elizabeth D. Rather" <erat...@forth.com> wrote:
...
>> Make no mistake, a PLC is a very software-intensive device! The fact
>> that it is bundled with the hardware conceals that, but a PLC is
>> basically a specialized computer with elaborate process-control software.
>
> Sure. I'm aware of that. I wrote my first ladder logic in 1989 on
> Omron PLCs. Rockwell (now absorbed in Allen Bradley) actually made a
> PLC with a specialised Forth processor for a while.
>

Yes, they were the most important design win for the Harris RTX, and a
strong indicator that the RTX could have survived if Harris hadn't had
corporate indigestion at higher levels.

Rod Pemberton

unread,
Dec 6, 2012, 7:03:08 PM12/6/12
to
"Alex McDonald" <bl...@rivadpm.com> wrote in message
news:d6da8672-b061-41c8...@b11g2000yqh.googlegroups.com...
...

> Here's a classic one-liner from IBM's last earning's call;
> "We continue to see value shifting to software."

That quote is entirely out of context. To what do they refer?

The quote provided doesn't say whether they meant big world-class
software (nuclear, weather, oil) on powerful IBM mainframes, or x86
PCs, or Internet servers, or embedded PCs, or ECMs for automobiles,
or PLCs for industry, or x86 PC software -- and whether the software
where they "see value shifting" is for the business market or the
home market, etc.


Rod Pemberton



Clyde W. Phillips Jr.

unread,
Dec 6, 2012, 11:16:31 PM12/6/12
to
On Wednesday, November 28, 2012 7:54:01 AM UTC-6, Albert van der Horst wrote:
> In article <25-dnfvDKpqEaCjN...@supernews.com>,
> Andrew Haley <andr...@littlepinkcloud.invalid> wrote:
> >Mark Wills <forth...@gmail.com> wrote:
> >> On Nov 28, 10:41 am, Andrew Haley <andre...@littlepinkcloud.invalid>
> >> wrote:
> >>> Mark Wills <forthfr...@gmail.com> wrote:
> >>> > On Nov 28, 1:42 am, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
> >>> >> With ITC it is possible to change how a word is interpreted by
> >>> >> changing the pointer at the cfa. With DTC, by comparison, you don't
> >>> >> have a pointer to the code that interprets the word, but rather you
> >>> >> have the code itself pasted in there.
> >>> > I don't think that's correct, Hugh.
> >>> I'm sure it is.
> >> Eh?
> >I don't understand the problem you're having with my reply. With ITC
> >it is possible to change how a word is interpreted by changing the
> >pointer at the cfa. This is simply true, there is no doubt about it,
> >and your comment is incorrect.
>
> This may be a good opportunity to show how nice ITC is in this
> respect.
>
> The possibility to "patch" an existing word is used in ciforth all
> over the place.
> Technically it patches the dfa not the cfa, as it replaces a pointer
> to high level code. So no change to assembler code is needed.
>
> 1. You can turn the dictionary (FIND) into a case insensitive one.
> 2. You can print the stack after each execution, or change the prompt.
> 3. You can turn a word into a postfix word for once, then it flips
> back.
> 4. It is used for turnkeys, by patching into ABORT.
> 5. ?ERROR is patched to allow better error recovery.
> 6. Revectoring I/O by patching TYPE. (Intentionally all output is via TYPE.)
> 7. ALIAS works by copying three words; works for low level too.
> 8. Run time buffers, that don't take up space in the executable (dfa).
> 9. Changing a CONSTANT afterwards.
>
> >Andrew.
> --
> Albert van der Horst, UTRECHT,THE NETHERLANDS
> Economic growth -- being exponential -- ultimately falters.
> albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

I like your points here. In ITC it is as simple as this: the cfa
either points directly to inline asm code, or to shared system code
such as DOCOL's asm routine, other primitives, and such.
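
Something like this, for a colon definition (a sketch -- the field
names and DOCOL are generic, and real systems differ in header
layout):

  link:  pointer to previous word   \ dictionary link
  name:  counted string
  cfa:   address of DOCOL           \ ITC: patch this one cell to change
                                    \ how the word is interpreted
  body:  cfa of WORD1               \ the threaded code
         cfa of WORD2
         cfa of EXIT

In DTC the cfa slot holds actual machine code rather than an address,
which is why patching it is so much more awkward.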

Clyde

Clyde W. Phillips Jr.

unread,
Dec 6, 2012, 11:30:16 PM12/6/12
to
On Wednesday, November 28, 2012 11:56:41 PM UTC-6, Hugh Aguilar wrote:
> On Nov 28, 4:51 am, Mark Wills <forthfr...@gmail.com> wrote:
> > On Nov 28, 11:27 am, Andrew Haley <andre...@littlepinkcloud.invalid>
> > wrote:
> > > Mark Wills <forthfr...@gmail.com> wrote:
> > > > On Nov 28, 10:41 am, Andrew Haley <andre...@littlepinkcloud.invalid>
> > > > wrote:
> > > >> Mark Wills <forthfr...@gmail.com> wrote:
> > > >> > On Nov 28, 1:42 am, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
> > > >> >> With ITC it is possible to change how a word is interpreted by
> > > >> >> changing the pointer at the cfa. With DTC, by comparison, you don't
> > > >> >> have a pointer to the code that interprets the word, but rather you
> > > >> >> have the code itself pasted in there.
> > > >> > I don't think that's correct, Hugh.
> > > >> I'm sure it is.
> > > > Eh?
> > > I don't understand the problem you're having with my reply.  With ITC
> > > it is possible to change how a word is interpreted by changing the
> > > pointer at the cfa.  This is simply true, there is no doubt about it,
> > > and your comment is incorrect.
> > > Andrew.
> >
> > Oh. Okay. Yes, I see. I should have read Hugh's reply more closely
> > before posting.
> >
> > So, in the CFA of an ITC system there is a pointer to DOCOL, DOVAR
> > whatever/etc. In DTC it's slightly more complicated, since the CFA
> > field would contain executable code. In my particular processor of
> > choice, the CFA field would be two cells wide. Yes, I can see that
> > patching it to change how it is interpreted could be a pain. Though I
> > suppose a dedicated helper word(s) could be provided to facilitate it.
> > Like Hugh says, it would be quite a rare occurrence.
> >
> > Okay. I get it. Sorry for the confusion.
> >
> > Mark
>
> I think that you have got it, but I'm not sure, so I'll go over it
> again:
>
> 1.) In ITC what you have in front of the threaded code of the colon
> word (or the body of the whatever), is a single pointer to the
> interpreter for that kind of word (DOCOLON for colon words, etc.). It
> is easy to store a different pointer in that slot, so the word will be
> interpreted differently (DOCOLON-WITH-SINGLE-STEPPING for colon words,
> for example).
>
> 2.) In DTC what you have in front of the threaded code of the colon
> word (or the body of the whatever), is the actual machine-code of the
> interpreter for that kind of word (DOCOLON for colon words, etc.). It
> is difficult to patch this code, because it is actual code, rather
> than a pointer to some code.
>
> 3.) There is a kind of hybrid between ITC and DTC. This is DTC in the
> sense that we have machine-code in front of each word. However, this
> machine-code always consists of a single CALL instruction to DOCOLON
> etc. When a CALL is executed, it puts the address just after itself
> on the processor return-stack. Normally this is for RET to use to go
> back. Here is the clever part though: this is the address of the body
> of the Forth word. DOCOLON can load this address into the IP and begin
> interpreting. In this case, you get DTC which is faster than ITC, but
> you also get an easy way to change how a word is interpreted (just
> store a new pointer into the operand of the CALL instruction).
>
> The PDP-11 had an interesting feature. The JSR (its term for CALL)
> would store the address after itself into a register, and it would
> first push that register onto the return-stack. Effectively, the top
> value of the return-stack was held in a register. But that register
> could be your IP! You have DTC code and just do a JSR to DOCOLON (#3
> above), and DOCOLON automatically gets the address of the threaded
> code loaded into the IP. I figured this out way back in 1985 when I
> was taking a class in assembly-language at the city college, which was
> PDP-11. This works so well, that I had to suppose that the designers
> of the PDP-11 were Forth programmers, or at least, were trying to
> support DTC threaded code. I've never seen this feature on any other
> processor. Even in 1985 though, the PDP-11 was obsolete --- the city
> college was still teaching it just because they had all the textbooks,
> but the professor cheerfully admitted that the PDP-11 was obsolete and
> we would never use what we learned in the real world. I've always
> thought that the PDP-11 was pretty cool though --- I wish somebody
> would come out with a micro-controller that runs PDP-11 code and RT11
> and all that --- maybe on an FPGA.
>
> BTW: There is a discussion of threading over on comp.lang.asm.x86:
> https://groups.google.com/group/comp.lang.asm.x86/browse_thread/thread/971adcb57df96272
>
> Mark: Since your TI Forth system is ITC, why don't you take a stab at
> writing a single-step source-level debugger? As I mentioned, I wrote
> one for my 65c02 system. It is not as difficult as you might suppose.
> I did it with screen-file source-code. It can be done with seq-file
> source-code though, I would suppose. I don't think that a debugger is
> all that useful, but writing one is pretty interesting --- and your
> users will be impressed. :-)
>
> Have fun! Hugh

Good points, Hugh. I too did the PDP-11 asm test. You get lambasted a lot, but this is cool with me.
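
To make Hugh's point 3 concrete, the CALL-threaded hybrid looks
roughly like this (a sketch in x86-ish assembly; IP and W are the
conventional Forth VM register names, not any particular system's
code):

  FOO:      CALL DOCOLON   ; E8 rel32 -- patch the rel32 to re-vector
            dd WORD1       ; threaded code; CALL pushed its own address
            dd WORD2
            dd EXIT

  DOCOLON:  POP  W         ; address of FOO's body, courtesy of CALL
            PUSH IP        ; save the old IP on the return stack
            MOV  IP, W     ; point the interpreter at FOO's body
            NEXT           ; carry on in the inner interpreter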

Alex McDonald

unread,
Dec 7, 2012, 5:59:24 AM12/7/12
to
On Dec 7, 12:03 am, "Rod Pemberton" <do_not_h...@notemailnotz.cnm>
wrote:
> "Alex McDonald" <b...@rivadpm.com> wrote in message
IBM do nothing with embedded PCs, PLCs, ECMs for cars, and have no
home market.

http://www.eweek.com/it-management/ibms-3q-earnings-flat-as-revenues-dip-5-percent/

Brad Eckert

unread,
Dec 7, 2012, 12:00:26 PM12/7/12
to
On Thursday, December 6, 2012 2:41:44 PM UTC-7, Albert van der Horst wrote:
> In article <7xfw3jn...@ruckus.brouhaha.com>,
>
> Last job I was in they were switching from PLC to industrial PC's running
> Windows and emulating PLC. Now that is progress ;-)
>
If an hour of down time costs as much as several real PLCs, it may not be progress in the right direction.

Albert van der Horst

unread,
Dec 7, 2012, 8:27:25 PM12/7/12
to
In article <50c118ea$0$6549$9b4e...@newsspool4.arcor-online.net>,
A. K. <a...@nospam.org> wrote:
>On 06.12.2012 22:41, Albert van der Horst wrote:
>> Last job I was in they were switching from PLC to industrial PC's running
>> Windows and emulating PLC. Now that is progress ;-)
>>
>> Groetjes Albert
>>
>
>A bit oversimplifying, aren't we? ;-)

Stating a fact cannot be oversimplification.

>
>Windows (or Linux) for browser-based man-machine-interface. By the way
>tablet PCs as maintenance tools have also become very common.
>
>Intel-based hardware (say industrial PCs) as PLC CPUs but with
>(redundant) industrial interface to inputs/outputs or communication
>links to 3rd party systems. So it's really not desktop PC technology,
>and it's really not Windows as realtime OS.
>
>Special hardware and software for any safety-related PLCs.

Industrial PC's, with specifically designed hardware yes, and
OEM supplied drivers yes, but running MS-Windows.

A. K.

unread,
Dec 8, 2012, 5:19:28 AM12/8/12
to
On 08.12.2012 02:27, Albert van der Horst wrote:
> Industrial PC's, with specifically designed hardware yes, and
> OEM supplied drivers yes, but running MS-Windows.
>
> Groetjes Albert

In most industries you have to follow the NERC safety standards when
selling PLCs.

I wonder how they got along with it using a vulnerable OS like Windows
as controller OS.

Regards
Andreas


Mark Wills

unread,
Dec 8, 2012, 6:55:24 AM12/8/12
to
Simple. They're not 'on the critical path' - in other words, if they
fall over, nobody dies.

Industrial PC's are used a lot for things like HMI's/MMI's or data
gathering systems. But anything that is mission critical will *not* be
*controlled* by a PC based system or *rely* on a PC based system*. It
will be controlled by an appropriately rated system, perhaps with a
SIL rating, and the application code will have been designed,
documented, and written in accordance with appropriate standards, e.g.
IEC 61508.

https://en.wikipedia.org/wiki/IEC_61508

* they may be *monitored* by PC based systems, in fact, they almost
always are, but *monitoring* is not *controlling*. For example, every
major DCS vendor (ABB, Emerson, Kongsberg Maritime, Honeywell,
Invensys et al) has autonomous industrial controllers (non PC) hooked
up to PC based (and in every case, Windows based) monitoring systems.

Bernd Paysan

unread,
Dec 8, 2012, 7:37:49 AM12/8/12
to
A. K. wrote:
> I wonder how they got along with it using a vulnerable OS like Windows
> as controller OS.

Hahaha, have you ever heard of Stuxnet? The worm written by the Mossad
and the CIA to attack the Siemens PLCs that controlled the uranium
enrichment plant in Iran? Note that Stuxnet actually managed to get
from the Windows PC (the primary infection target) onto the PLC, too.

The assumption is that such an industrial control system is not
vulnerable, because it is not directly connected to the Internet, and no
stupid idiot is plugging in USB sticks...

Security? Through obscurity. The CIA did obtain internal documents
about the Siemens PLC.

They get along with Windows by being ignorant and stupid, just like the
rest of the world does.

Bernd Paysan

unread,
Dec 8, 2012, 7:44:28 AM12/8/12
to
Mark Wills wrote:
> Simple. They're not 'on the critical path' - in other words, if they
> fall over, nobody dies.

You are wrong, and you apparently didn't hear about Stuxnet. Well,
nobody died, but that's not the point: The attack did cause considerable
damage, and was a significant setback in the Iranian Uranium enrichment
program. It's *not* safe and secure against a malicious attacker;
though of course, the Stuxnet attack was quite expensive.

A. K.

unread,
Dec 8, 2012, 8:41:15 AM12/8/12
to
On 08.12.2012 13:37, Bernd Paysan wrote:
> A. K. wrote:
>> I wonder how they got along with it using a vulnerable OS like Windows
>> as controller OS.
>
> Hahaha, have you ever heard of Stuxnet? The work written by Mossad and
> CIA, to attack the Siemens PLCs that controlled the Uranium enrichment
> plant in Iran? Note that Stuxnet actually managed to get from the
> Windows PC (the primary infection target) onto the PLC, too.

Yes, the Stuxnet worm used the S7 WinCC HMI running on Windows,
exploiting Windows' "USB plug'n'play" feature. Once in the system it
migrated to the PLC level and took control of some control blocks.

One could imagine a small injected Forth kernel could be an excellent
malware carrier of that sort too.

By the way, original Siemens PCS7 installations disable USB drivers and
ports. But the Iranians must have "cloned" it with standard PCs. They
had to, because of the embargo.



Mark Wills

unread,
Dec 8, 2012, 9:05:54 AM12/8/12
to
The nuclear industry, with respect to safety-instrumented-systems is
governed by IEC-61513 and 61511. 61508 must figure in the mix too,
since nuclear systems will be a minimum of SIL-3. Paul Bennett may
have more background than me in nuclear. I'm from the oil and gas
industry. We have SIL-3 systems (HIPPS systems) but they are hard-
wired discrete-logic systems. No software and no CPU. Anyway, as far
as I can remember, none of the standards mentioned above even mention
network security. It's all about (very basically) redundancy and
reducing the probability of failure.

In the case of Stuxnet, the virus (which was unwittingly carried on
USB sticks owned by the maintenance engineers onto the engineering
network (note: not the process network)) was programmed to target a
communications card in the Siemens PLC racks. That comms card comes
with a factory default password (you can telnet into it). The virus
used that password to 'log in' to the PLC, root it with hacked
firmware, and upload subtly different process control logic.

So, in this case, the 'cause' of the problem was, once again, humans,
who couldn't be bothered (or didn't know) to change the default
password. It was no failure of the PLC system per se. There's a nice
article about it on Wikipedia but it doesn't mention the TCP/IP card
password issue, possibly at Siemens' request? Dunno. I'm speculating
on that point. I heard first-hand (sort of) from the Siemens rep that
visits us once every few months. IIRC I think the virus also replaced
certain DLLs in the Siemens PC based PLC programming software (called
Step-7), though I can't recall the reason. It might have been to hide
the fact that the PLC logic and/or firmware itself had been changed -
so that when engineers reviewed the code running in the PLC, it
*looked* correct, but it was in fact running different logic.

So. Human Factors were to blame. The engineers carrying USB sticks
around, and not setting passwords. In the late 80's it used to be
sport to dial around the MilNet looking for VAX machines and log in
with a username of engineer and a blank password. See
https://en.wikipedia.org/wiki/The_Cuckoo%27s_Egg

It's the same thing IMO. Ignorance.

It's also ironic that it's precisely because PLCs have become more
sophisticated, in particular, opening themselves up to TCP/IP and OPC
that attacks like this are now possible. They were more secure a
decade ago when they each ran their own proprietary protocols on
closed networks. Those days are gone though.

Mark Wills

unread,
Dec 8, 2012, 9:10:16 AM12/8/12
to
On Dec 8, 12:44 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
>It's *not* safe and secure against a malicious attacker;
> though of course, the Stuxnet attack was quite expensive.

Subtle change in topic in order to score a point noted.

I never mentioned security. I mentioned reliability.

A PLC will run your code for 20 years without falling over. A Windows
PC won't.
