well formed flag ?

Chris Hinsley

unread,

Oct 27, 2012, 10:59:48 AM10/27/12

to

Folks, I'm doing some tweaking to my hobby Forth compiler and just been
doing some inlineing work.

I'm not very happy about the size of the x86 code for comparision words
(I know I could reduce the code size if I had a full optimizing code
generator, but I don't yet…) this is my code for "<"

defword ">", 0, WORD_GT, WORD_INLINE_COMMA
POPDSP eax
cmp eax, ebx
setg bl
movzx ebx, bl
neg ebx
ret
defword_end

I followed the 'well formed flag' comments in Ms Rathers book for this,
but the IF and TRUE, FALSE words only talk about 0 for false or not
zero for true. Is it definately the case that Forth needs -1 to come
from the comparason words ?

Are you allowed to write code that _needs_ the -1 from the ">" ? ie if
it produce just a 1, would that be incorrect ?

Chris

Chris Hinsley

unread,

Oct 27, 2012, 11:03:10 AM10/27/12

to

I might also add that my 0BRANCH word is:

defword "0BRANCH", 0, WORD_ZBRANCH, WORD_INLINE_COMMA
mov eax, ebx
POPDSP ebx
test eax, eax
jz strict near i_jmp
ret
defword_end

Regards

Chris

Alex McDonald

unread,

Oct 27, 2012, 11:33:03 AM10/27/12

to

: < ( 2 -- 1 )
\ ' nseopt compiles; code=$4014DF len=16 type=1
\ defined in src\kernel\gkernel32.f at line 564
( $0 ) mov edx eax \ 8BD0
( $2 ) or ecx $-1 \ 83C9FF
( $5 ) xor eax eax \ 33C0
( $7 ) cmp dword { $0 ebp } edx \ 395500
( $A ) cmovl eax ecx \ 0F4CC1
( $D ) add ebp $4 \ 83C504
( $10 ) ret \ C3 ( end ) ok

EAX is top of stack, EBP is the stack pointer and { EBP } is next to
top.

I'd definitely stick to well formed flags since this;

variable var 0 var !
: foo 10 < negate var +! ;

is well-formed Forth.

Alex McDonald

unread,

Oct 27, 2012, 12:06:07 PM10/27/12

to

There's also

: >= ( 2 -- 1 )
( $0 ) sub eax dword { $0 ebp } \ 2B4500
( $3 ) cdq \ 99
( $4 ) mov eax edx \ 8BC2
( $6 ) add ebp $4 \ 83C504
( $9 ) ret \ C3 ( end ) ok

and correspondingly < is neg eax

Alex McDonald

unread,

Oct 27, 2012, 12:26:24 PM10/27/12

to

Sorry, wrong way round. I should cut & paste more accurately...

: < ( 2 -- 1 )

\ ' nseopt compiles; code=$4014DF len=9 type=1
\ defined in src\kernel\gkernel32.f at line 574
( $0 ) sub dword { $0 ebp } eax \ 294500

Mark Wills

unread,

Oct 27, 2012, 12:31:24 PM10/27/12

to

Yes, you want well formed flags otherwise code such as the following
might not do what you want:

BEGIN
MyFile EOF? NOT WHILE
PAD MyFile #Get PAD COUNT TYPE
REPEAT
MyFile #Close

I had this issue on Thursday! EOF was not returning a boolean value
for EOF. It was returning 8. Ouch.

Chris Hinsley

unread,

Oct 27, 2012, 12:44:47 PM10/27/12

to

One thing I just noticed is that NASM isn't encodeing small constant
adds as a single byte, it's useing the full 4 byte encodeing ! I don't
recall there being an option to tell NASM to optimise constants, I
thought there was only -O flags to say optimize branches ?

Need to double check the NASM manual...

A. K.

unread,

Oct 27, 2012, 1:21:33 PM10/27/12

to

On 27.10.2012 18:31, Mark Wills wrote:
>
> Yes, you want well formed flags otherwise code such as the following
> might not do what you want:
>
> BEGIN
> MyFile EOF? NOT WHILE
> PAD MyFile #Get PAD COUNT TYPE
> REPEAT
> MyFile #Close
>
> I had this issue on Thursday! EOF was not returning a boolean value
> for EOF. It was returning 8. Ouch.
>

In that case not EOF? is to blame, but NOT. I should produce a boolean.

Mark Wills

unread,

Oct 27, 2012, 1:33:30 PM10/27/12

to

Not in Forth 83 it isn't. It's a ones compliment. My system is a Forth
83 system.

http://forthworks.com/standards/F83/fst83-12.htm#not

Chris Hinsley

unread,

Oct 27, 2012, 2:25:28 PM10/27/12

to

>> : < ( 2 -- 1 )
>> \ ' nseopt compiles; code=$4014DF len=16 type=1
>> \ defined in src\kernel\gkernel32.f at line 564
>> ( $0 ) mov edx eax \ 8BD0
>> ( $2 ) or ecx $-1 \ 83C9FF
>> ( $5 ) xor eax eax \ 33C0
>> ( $7 ) cmp dword { $0 ebp } edx \ 395500
>> ( $A ) cmovl eax ecx \ 0F4CC1
>> ( $D ) add ebp $4 \ 83C504
>> ( $10 ) ret \ C3 ( end ) ok
>>
>> EAX is top of stack, EBP is the stack pointer and { EBP } is next to
>> top.

defword "<", 0, WORD_LT, WORD_INLINE_COMMA
cmp [ebp], ebx
setl bl
sub ebp, byte 4

movzx ebx, bl
neg ebx
ret
defword_end

This is shorter at 14 bytes rather than 16 ! ;)

Chris

Chris Hinsley

unread,

Oct 27, 2012, 2:33:44 PM10/27/12

to

> defword "<", 0, WORD_LT, WORD_INLINE_COMMA
> cmp [ebp], ebx
> setl bl
> sub ebp, byte 4
> movzx ebx, bl
> neg ebx
> ret
> defword_end
>
> This is shorter at 14 bytes rather than 16 ! ;)
>
> Chris

That should of course be add ebp, byte 4 !!!

Chris

Rod Pemberton

unread,

Oct 27, 2012, 2:52:49 PM10/27/12

to

"Chris Hinsley" <chris....@gmail.com> wrote in message
news:2012102717444755036-chrishinsley@gmailcom...

> One thing I just noticed is that NASM isn't encodeing small constant
> adds as a single byte, it's useing the full 4 byte encodeing !

Old style NASM produced the most generic instruction encoding, i.e., large
offsets. With NASM, you need BYTE keyword for imm8 and maybe mem8
encodings. That's before H.P.A. started messing with NASM's syntax and made
it more MASM like. So, for new versions of NASM w/64-bit support, I'd check
the manual for any syntax changes.

NASM requires you to specify data sizes when it's indeterminable from the
instruction. This is the opposite of assemblers like MASM where you use
keywords to specify pointer types. GAS (GCC) places sizes on the
instruction name. Unfortunately, that produces naming conflicts with Intel
instructions names for a few instructions.

> I don't
> recall there being an option to tell NASM to optimise constants, I
> thought there was only -O flags to say optimize branches ?
>

Maybe, ask on comp.lang.asm.x86 or alt.lang.asm.

Rod Pemberton

Elizabeth D. Rather

unread,

Oct 27, 2012, 2:54:10 PM10/27/12

to

On 10/27/12 4:59 AM, Chris Hinsley wrote:
> Folks, I'm doing some tweaking to my hobby Forth compiler and just been
> doing some inlineing work.
>
> I'm not very happy about the size of the x86 code for comparision words
> (I know I could reduce the code size if I had a full optimizing code

> generator, but I don't yetâ€¦) this is my code for "<"

>
> defword ">", 0, WORD_GT, WORD_INLINE_COMMA
> POPDSP eax
> cmp eax, ebx
> setg bl
> movzx ebx, bl
> neg ebx
> ret
> defword_end
>
> I followed the 'well formed flag' comments in Ms Rathers book for this,
> but the IF and TRUE, FALSE words only talk about 0 for false or not zero
> for true. Is it definately the case that Forth needs -1 to come from the
> comparason words ?
>
> Are you allowed to write code that _needs_ the -1 from the ">" ? ie if
> it produce just a 1, would that be incorrect ?

Yes, the standard (everything since Forth83) provides that the
comparison words return well-formed flags, so they can be used with
things like AND, as masks, etc., even though IF, WHILE, and UNTIL don't
require well-formed flags.

In SwiftForth:
see >
354F EBX 0 [EBP] CMP 395D00
3552 0 # EBX MOV BB00000000
3557 355A JLE 7E01
3559 EBX DEC 4B
355A 4 # EBP ADD 83C504
355D RET C3 ok

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

Chris Hinsley

unread,

Oct 27, 2012, 3:06:31 PM10/27/12

to

Thanks Rod, I found the relavent section in the NASM manual. You do
need to specify 'byte' on constant loads of a 32bit reg from an imm8 !
So I changed my macro to do this:

; adjust data stack
%macro ADDDSP 1
%if ((%1 >= -128) && (%1 < 128))
add ebp, byte %1
%else
add ebp, %1
%endif
%endm

This has shorten quite a few words ! :)

Chris

Chris Hinsley

unread,

Oct 27, 2012, 3:11:25 PM10/27/12

to

> In SwiftForth:
> see >
> 354F EBX 0 [EBP] CMP 395D00
> 3552 0 # EBX MOV BB00000000
> 3557 355A JLE 7E01
> 3559 EBX DEC 4B
> 355A 4 # EBP ADD 83C504
> 355D RET C3 ok
>
> Cheers,
> Elizabeth

Ouch, not keen on the JLE there ! I no branch predicators have got better but !

How about useing 'xor ebx,ebx' rather than the mov too !

Chris

Chris Hinsley

unread,

Oct 27, 2012, 3:33:45 PM10/27/12

to

> : < ( 2 -- 1 )
> \ ' nseopt compiles; code=$4014DF len=9 type=1
> \ defined in src\kernel\gkernel32.f at line 574
> ( $0 ) sub dword { $0 ebp } eax \ 294500
> ( $3 ) cdq \ 99
> ( $4 ) mov eax edx \ 8BC2
> ( $6 ) add ebp $4 \ 83C504
> ( $9 ) ret \ C3 ( end ) ok

Here your wringting back the result of the subtract to [ebp], which is
not required, wasting memory bandwidth. Why not change it to a 'cmp
[ebp], eax' ?

I like the trick with cdq ! Neat !

Chris

Chris Hinsley

unread,

Oct 27, 2012, 3:38:40 PM10/27/12

to

Hang on, I may have got my src and dest back to front ! Sorry, it
actually subtrating [ebp] from eax ! And then setting edx to 0/-1 by
extending the top bit of eax !

Apologies !

Chris

Alex McDonald

unread,

Oct 27, 2012, 3:52:02 PM10/27/12

to

That doesn't set EAX, just the flags, so CDQ generates the wrong sign
in EDX. SUB is required.

>
> I like the trick with cdq ! Neat !
>
> Chris

Ignore it anyway. It fails the Hsyes core tests due to overflow & an
incorrect sign on maximum sized 32bit integers.

Chris Hinsley

unread,

Oct 27, 2012, 4:01:40 PM10/27/12

to

Phew ! I was getting jelous and starting to wonder why I picked ebx as
top of stack rather than eax ! ;)

Chris

Coos Haak

unread,

Oct 27, 2012, 4:18:49 PM10/27/12

to

Op Sat, 27 Oct 2012 21:01:40 +0100 schreef Chris Hinsley:

Because it was natural in (your old, my current) 16 bit x86 implementation?

> Chris

--
Coos

CHForth, 16 bit DOS applications
http://home.hccnet.nl/j.j.haak/forth.html

Coos Haak

unread,

Oct 27, 2012, 4:21:31 PM10/27/12

to

Op Sat, 27 Oct 2012 10:33:30 -0700 (PDT) schreef Mark Wills:

Since F83 there is no NOT
Try 0= or INVERT

Alex McDonald

unread,

Oct 27, 2012, 6:08:25 PM10/27/12

to

Yes, but it may run slower.

Chris Hinsley

unread,

Oct 27, 2012, 7:53:11 PM10/27/12

to

Not sure about that, its 1 instruction less, and should probably
shedule ok on a modern micro op x86 design. Only hard benchmarks would
tell. I'll probably leave it as that for now, better than what I had.

Regards

Chris

Ed

unread,

Oct 27, 2012, 8:05:55 PM10/27/12

to

Chris Hinsley wrote:
> ...

> Are you allowed to write code that _needs_ the -1 from the ">" ? ie if
> it produce just a 1, would that be incorrect ?

A well-formed flag benefits the programmer since flags are often used
to logically mask previous values on the stack. The well-formed flag
was introduced in Forth-83.

That Forth requires flags and results be left as discrete values on a
stack is one of the reasons Forth won't beat C in the speed stakes.
Highly optimized forths such as VFX go to length to eliminate these
by in-lining sequences of forth words as pure machine code. Thus
the result of a comparison is reflected at the CPU level, not at the
Forth level. It's only the start/end values that appear on the stack.
The downside is that a good forth optimizer isn't cheap. IIRC Stephen
once quoted a figure of 50,000 lines of code. And still it may not be
as fast as a good C compiler.

On the plus side, Forth has incremental compile and test and an
assembler for writing code words. For many, this compensates
for the less than stellar speed performance.

Ed

unread,

Oct 27, 2012, 9:33:24 PM10/27/12

to

Yes, and it was considered one the worst decisions of Forth-83 - even
before the ink had dried.

Elizabeth D. Rather

unread,

Oct 27, 2012, 9:46:07 PM10/27/12

to

And I agree with that wholeheartedly. By the time we started work on ANS
Forth, though, it had become common practice. The solution was to
replace NOT with either NEGATE (arithmetic inversion) or INVERT (invert
bits). All compromises have flaws, but at least you know what you're
getting.

Rod Pemberton

unread,

Oct 28, 2012, 4:19:57 AM10/28/12

to

"Chris Hinsley" <chris....@gmail.com> wrote in message

news:2012102721014029342-chrishinsley@gmailcom...

> Phew ! I was getting jelous and starting to wonder why I picked ebx as
> top of stack rather than eax ! ;)
>

It seems that only XLAT uses EBX. So, it's more flexible in terms of
instruction choice, or more likely to not be clobbered. ECX or CL is used
by loop, branching, and rotate or shift instructions. EDX is used by
multiply and division instructions. EAX is used by multiply, division,
string instructions, etc. If you're moving immediates into registers, quite
a few x86 instructions have shorter forms for the EAX register.

Rod Pemberton

Stephen Pelc

unread,

Oct 28, 2012, 7:53:04 AM10/28/12

to

On Sun, 28 Oct 2012 11:05:55 +1100, "Ed" <inv...@nospam.com> wrote:

>The downside is that a good forth optimizer isn't cheap. IIRC Stephen
>once quoted a figure of 50,000 lines of code.

Aout 5,000 lines for many CPUs. 6,000 lines for the x86 version.
Then add the code for the assembler and disassembler.

>And still it may not be as fast as a good C compiler.

The important comparison is the number of hours spent on the
code generators for C and Forth.

Stephen

--
Stephen Pelc, steph...@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads

Bernd Paysan

unread,

Oct 28, 2012, 9:17:39 AM10/28/12

to

Stephen Pelc wrote:
> The important comparison is the number of hours spent on the
> code generators for C and Forth.

IMHO VFX is limited on x86 by the numbers of registers. As are C
compilers. When I wrote my Wurstkessel cryptography core, Marcel did
compare the C compiler result with iForth on x64, where more registers
are available, and the speed was about the same.

On x86, Andy Glew (one of the designers of the Pentium Pro) suggested
adding an instruction for "well formed flags" in the Forth sense - which
could be used for bit-wise operations. It got turned down by senior
management, because C compilers couldn't make any benefit from it;
though assembly code would.

Later, in the SSE instruction set, the idea was revived, and indeed
implemented: SSE compares store their result flags as either all bits
set or all bits cleared in the destination register.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/

Marcel Hendrix

unread,

Oct 28, 2012, 10:59:16 AM10/28/12

to

Bernd Paysan <bernd....@gmx.de> writes Re: well formed flag ?

> Stephen Pelc wrote:
>> The important comparison is the number of hours spent on the
>> code generators for C and Forth.

> IMHO VFX is limited on x86 by the numbers of registers. As are C
> compilers. When I wrote my Wurstkessel cryptography core, Marcel did
> compare the C compiler result with iForth on x64, where more registers
> are available, and the speed was about the same.

I redid these just now on an i7 2.66 GHz machine:

64bits - 0.985 seconds; Speed: 518,743,667 bytes/sec.
32bits - 2.981 seconds; Speed: 171,696,847 bytes/sec. ( rngs_wurst in CODE )
32bits - 4.741 seconds; Speed: 107,971,320 bytes/sec. ( high-level )

> On x86, Andy Glew (one of the designers of the Pentium Pro) suggested
> adding an instruction for "well formed flags" in the Forth sense - which
> could be used for bit-wise operations. It got turned down by senior
> management, because C compilers couldn't make any benefit from it;
> though assembly code would.

Rngs_wurst is in CODE because the carry bit is needed.

: rngs_wurst ( ud1 index -- ud2 )
64s ( 8 * ) 'rngs + 64@
2>R
DUP 0< >R D2* R> DUP D- 2R>
ROT XOR >R XOR R> ; PRIVATE

> Later, in the SSE instruction set, the idea was revived, and indeed
> implemented: SSE compares store their result flags as either all bits
> set or all bits cleared in the destination register.

-marcel

Bernd Paysan

unread,

Oct 28, 2012, 1:37:20 PM10/28/12

to

Marcel Hendrix wrote:
>> IMHO VFX is limited on x86 by the numbers of registers. As are C
>> compilers. When I wrote my Wurstkessel cryptography core, Marcel did
>> compare the C compiler result with iForth on x64, where more
>> registers are available, and the speed was about the same.
>
> I redid these just now on an i7 2.66 GHz machine:
>
> 64bits - 0.985 seconds; Speed: 518,743,667 bytes/sec.
> 32bits - 2.981 seconds; Speed: 171,696,847 bytes/sec. ( rngs_wurst in
> CODE ) 32bits - 4.741 seconds; Speed: 107,971,320 bytes/sec. (
> high-level )

In theory, Wurstkessel should take about twice the time on a 32 bit
processor than on a 64 bit processor. However, when we compile high-
level code for both, it's a factor 5. The way the code is arranged
gives a benefit on x64 with 8 accumulator registers - a version tuned
for 32 bits would use one accumulator after the other (or maybe two
accumulators, 4 registers), but will have a lot less intrinsic
parallelism.

Ed

unread,

Oct 29, 2012, 8:46:51 AM10/29/12

to

Bernd Paysan wrote:
> Stephen Pelc wrote:
> > The important comparison is the number of hours spent on the
> > code generators for C and Forth.
>
> IMHO VFX is limited on x86 by the numbers of registers. As are C
> compilers. When I wrote my Wurstkessel cryptography core, Marcel did
> compare the C compiler result with iForth on x64, where more registers
> are available, and the speed was about the same.

How many C compilers were tested?

> On x86, Andy Glew (one of the designers of the Pentium Pro) suggested
> adding an instruction for "well formed flags" in the Forth sense - which
> could be used for bit-wise operations. It got turned down by senior
> management, because C compilers couldn't make any benefit from it;
> though assembly code would.
>
> Later, in the SSE instruction set, the idea was revived, and indeed
> implemented: SSE compares store their result flags as either all bits
> set or all bits cleared in the destination register.

I expect C (and similar languages) to generate better code, more easily,
because they operate at a higher level of abstraction than does Forth.
Unlike other languages, Forth is (as you point out) highly reliant on registers
and optimizers. It needs these to eliminate the inefficiency of pushing items
around a virtual stack - something which conventional cpu's were never
designed to do.

Ed

unread,

Oct 29, 2012, 8:47:23 AM10/29/12

to

Stephen Pelc wrote:
> On Sun, 28 Oct 2012 11:05:55 +1100, "Ed" <inv...@nospam.com> wrote:
>
> >The downside is that a good forth optimizer isn't cheap. IIRC Stephen
> >once quoted a figure of 50,000 lines of code.
>
> Aout 5,000 lines for many CPUs. 6,000 lines for the x86 version.
> Then add the code for the assembler and disassembler.

No doubt you're glad it's not 50,000 :)

Andrew Haley

unread,

Oct 29, 2012, 9:33:53 AM10/29/12

to

Ed <inv...@nospam.com> wrote:
> I expect C (and similar languages) to generate better code, more
> easily, because they operate at a higher level of abstraction than
> does Forth.

On what is this opinion based?

> Unlike other languages, Forth is (as you point out) highly reliant
> on registers and optimizers.

And C isn't?

> It needs these to eliminate the inefficiency of pushing items around
> a virtual stack - something which conventional cpu's were never
> designed to do.

They weren't designed to evaluate expression trees either. What you
perhaps don't realize is that expression trees and RPN are equivalent:
there is a 1:1 mapping between the two. In both cases you have to
convert the abstract form into instructions that push data between
memory and registers.

Andrew.

Paul Rubin

unread,

Oct 29, 2012, 11:25:17 AM10/29/12

to

Andrew Haley <andr...@littlepinkcloud.invalid> writes:
> What you perhaps don't realize is that expression trees and RPN are
> equivalent: there is a 1:1 mapping between the two.

I think not quite right--RPN is more like linear logic, unless you have
locals or something like PICK. This is apparent from Koopman's book
where he talks about ways to compile C into Forth. There's also a cool
paper by Henry Baker:

http://home.pipeline.com/~hbaker1/ForthStack.html

> In both cases you have to convert the abstract form into instructions
> that push data between memory and registers.

If the Forth code is written in the classic style (using global
variables to hold temporary values when stack shuffling gets too
complex) then it may be harder to compile good code.

Andrew Haley

unread,

Oct 29, 2012, 1:27:11 PM10/29/12

to

Paul Rubin <no.e...@nospam.invalid> wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>> What you perhaps don't realize is that expression trees and RPN are
>> equivalent: there is a 1:1 mapping between the two.
>
> I think not quite right--RPN is more like linear logic, unless you have
> locals or something like PICK.

Of course you have locals. We've had locals in Forth for 20 years.

> This is apparent from Koopman's book where he talks about ways to
> compile C into Forth.

OK, I shouldn't have written that without a list of caveats. :-)

If we're going to nitpick, and I guess we are, expression trees and
RPN are equivalent in the absence of stack order manipulation and
loops: in that case the algorithm to convert between the two is pretty
trivial. In the presence of stack manipulation things get a bit more
complicated but it's a fairly simple matter of copying and creating
temporaries. (We did all of this in gcj, which converts stack code to
expression trees in a GCC front end.) There's certainly nothing
inherent to generating code for a register machine from stack code
that would make generation less efficient. Data flow analysis on the
stack is no different from data flow analysis between a bunch of local
variables. As Stephen has pointed out on many occasions, the only
additional thing you need to do in a Forth optimizer is a stack
shuffle at block boundaries.

> There's also a cool paper by Henry Baker:
>
> http://home.pipeline.com/~hbaker1/ForthStack.html
>
>> In both cases you have to convert the abstract form into instructions
>> that push data between memory and registers.
>
> If the Forth code is written in the classic style (using global
> variables to hold temporary values when stack shuffling gets too
> complex)

'tain't my classic style! I always had to bear multi-tasking in mind,
anyway.

> then it may be harder to compile good code.

The same applies to C, surely.

Andrew.

Paul Rubin

unread,

Oct 30, 2012, 11:18:56 AM10/30/12

to

Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>> expression trees and RPN are equivalent: ...

> Of course you have locals. We've had locals in Forth for 20 years.

OK, in this case there's equivalence, it's just no longer pure RPN, but
no problem.

>> If the Forth code is written in the classic style (using global
>> variables to hold temporary values when stack shuffling gets too
>> complex)
>
> 'tain't my classic style! I always had to bear multi-tasking in mind,
> anyway.

Oh good point, by "global" I just meant "not local". In particular if
you store to a variable, any subroutine you call might access or modify
it, so you need whole-program analysis to tell if that has happeened.

> The same applies to C, surely.

There are a bunch of rules in C about when the compiler can assume
variables aren't aliased. I don't think Forth has anything like that.

Andrew Haley

unread,

Oct 30, 2012, 11:33:46 AM10/30/12

to

Paul Rubin <no.e...@nospam.invalid> wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>>> expression trees and RPN are equivalent: ...
>> Of course you have locals. We've had locals in Forth for 20 years.
>
> OK, in this case there's equivalence, it's just no longer pure RPN, but
> no problem.

In what sense is it not "pure" RPN? IMO,

A B +

is still RPN, even if A and B are local variables. RPN is just how
you write expressions. It's no different from A + B or (+ A B) .

>>> If the Forth code is written in the classic style (using global
>>> variables to hold temporary values when stack shuffling gets too
>>> complex)
>>
>> 'tain't my classic style! I always had to bear multi-tasking in mind,
>> anyway.
>
> Oh good point, by "global" I just meant "not local". In particular if
> you store to a variable, any subroutine you call might access or modify
> it, so you need whole-program analysis to tell if that has happeened.

Sure, as in any language. That's my point: there isn't really much of
a difference in practice.

Andrew.

Paul Rubin

unread,

Oct 30, 2012, 11:48:13 AM10/30/12

to

Andrew Haley <andr...@littlepinkcloud.invalid> writes:
> In what sense is it not "pure" RPN? IMO,
> A B +
> is still RPN, even if A and B are local variables.

The impurity happens when you have to store to locals (or to the
interior of the stack) in the middle of the evaluation. I don't think
pure RPN has a way to do that.

Andrew Haley

unread,

Oct 30, 2012, 12:45:23 PM10/30/12

to

No, I'm sure it doesn't. But pure RPN doesn't have a way to store
into a global either: all RPN can do is expressions.

Andrew.

Stephen Pelc

unread,

Oct 30, 2012, 2:02:29 PM10/30/12

to

Oh, yes.

Bernd Paysan

unread,

Oct 30, 2012, 3:10:42 PM10/30/12

to

Paul Rubin wrote:
> There are a bunch of rules in C about when the compiler can assume
> variables aren't aliased. I don't think Forth has anything like that.

Standard Forth has IMHO 6 separate memory areas:

* dictionary + heap
* code memory
* data stack
* return stack
* floating point stack
* locals

You can only address the dictionary and heap with @ and !.

In any case, it would be *much* better for an optimizer if you
deliberately and explicitely can specify unaliased variables if it can
help the compiler.

Languages without arbitrary pointers like traditional Fortran have much
stricter rules for unaliased memories.

Alex McDonald

unread,

Oct 30, 2012, 6:10:23 PM10/30/12

to

On Oct 30, 8:10 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> Paul Rubin wrote:
> > There are a bunch of rules in C about when the compiler can assume
> > variables aren't aliased. I don't think Forth has anything like that.
>
> Standard Forth has IMHO 6 separate memory areas:
>
> * dictionary + heap
> * code memory
> * data stack
> * return stack
> * floating point stack
> * locals
>
> You can only address the dictionary and heap with @ and !.
>
> In any case, it would be *much* better for an optimizer if you
> deliberately and explicitely can specify unaliased variables if it can
> help the compiler.

DEFER causes an aliasing problem. Beyond DEFER, I don't think there
are any other cases.

Elizabeth D. Rather

unread,

Oct 30, 2012, 6:50:49 PM10/30/12

to

On 10/30/12 9:10 AM, Bernd Paysan wrote:
> Paul Rubin wrote:
>> There are a bunch of rules in C about when the compiler can assume
>> variables aren't aliased. I don't think Forth has anything like that.
>
> Standard Forth has IMHO 6 separate memory areas:
>
> * dictionary + heap
> * code memory

The correct ANS terminology for those to is "data space" and "code
space". Data space is accessible to user programs. The "dictionary" is
in two parts, "name space" (headers, etc.) and "code space" (regardless
of whether they're in the same address space). Neither name space nor
code space guaranteed accessible to user programs.

Ed

unread,

Oct 30, 2012, 8:44:30 PM10/30/12

to

Stephen Pelc wrote:
> On Mon, 29 Oct 2012 23:47:23 +1100, "Ed" <inv...@nospam.com> wrote:
>
> >Stephen Pelc wrote:
> >> On Sun, 28 Oct 2012 11:05:55 +1100, "Ed" <inv...@nospam.com> wrote:
> >>
> >> >The downside is that a good forth optimizer isn't cheap. IIRC Stephen
> >> >once quoted a figure of 50,000 lines of code.
> >>
> >> Aout 5,000 lines for many CPUs. 6,000 lines for the x86 version.
> >> Then add the code for the assembler and disassembler.
> >
> >No doubt you're glad it's not 50,000 :)
>
> Oh, yes.

You mention the disassembler. Is this used by the code generator
in any way?

Paul Rubin

unread,

Oct 30, 2012, 9:28:43 PM10/30/12

to

Alex McDonald <bl...@rivadpm.com> writes:
> DEFER causes an aliasing problem. Beyond DEFER, I don't think there
> are any other cases.

variable x
variable y x y ! \ y points at x
3 y @ ! \ sets x to 3

Stephen Pelc

unread,

Oct 31, 2012, 5:25:04 AM10/31/12

to

On Wed, 31 Oct 2012 11:44:30 +1100, "Ed" <inv...@nospam.com> wrote:

>You mention the disassembler. Is this used by the code generator
>in any way?

It's used by the humans testing the code generator.

Anton Ertl

unread,

Oct 31, 2012, 10:03:34 AM10/31/12

to

Paul Rubin <no.e...@nospam.invalid> writes:
>There are a bunch of rules in C about when the compiler can assume
>variables aren't aliased. I don't think Forth has anything like that.

Wrong: If we applied the same attitude as seems to be dominating among
C compiler writers (that miscompiling non-standard programs is ok or
even encouraged), there is no aliasing beween the memory accesses in
the sequence "F! ! c@". And something like that is pretty much all
that a C compiler can assume, too, even with the attitude above.
Fortunately the Forth implementors are saner and don't have that
attitude; or maybe they would be just as insane, if limits on compile
time and compiler writer time did not prevent them from implementing
optimizations that would profit from knowledge about non-aliasing.

If you are thinking of the keyword "restrict" in C, that is hardly
used. One might introduce such a feature in Forth, too, but currently
systems would not do anything with it, and even if they did, would
programmers use it? Judging from the C experience, most probably
would not.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2012: http://www.euroforth.org/ef12/

Anton Ertl

unread,

Oct 31, 2012, 10:57:54 AM10/31/12

to

Bernd Paysan <bernd....@gmx.de> writes:
>Languages without arbitrary pointers like traditional Fortran have much
>stricter rules for unaliased memories.

Fortran has one particular such rule: The compiler may pass parameters
by reference, but a program must not pass the same array to a
functioon through two parameters.

That rule is not present in other languages without arbitrary
pointers, such as Algol, Pascal or Java. In particular, Jensen's
Device (Algol 60) relies on two parameters referencing the same
variable through name calling.

Conversely, one can have a rule like the Fortran rule in a language
with arbitrary pointers like Forth, or add arbitrary pointers to
Fortran. The generalization of Fortrans parameter rule would be: two
parameters would not be allowed to point to the same contiguous region
(in Forth terminology). I think that it would be a bad idea to add
such a rule to Forth, though, even if we would not have to care about
backwards compatibility.

Rod Pemberton

unread,

Oct 31, 2012, 12:06:28 PM10/31/12

to

"Paul Rubin" <no.e...@nospam.invalid> wrote in message
news:7xhapcg...@ruckus.brouhaha.com...
...

> There are a bunch of rules in C about when the compiler can assume
> variables aren't aliased. I don't think Forth has anything like that.

After Anton pointed it out, I have to ask what you're talking about too.

Except for 'restrict' keyword, a pointer may point to or within any C
object, i.e., aliased. Or, a pointer may point to a special location
referred to as NULL. NULL is typically zero but not required to be.
NULL just needs to be a location where no valid C object can be located.
A pointer can usually point to non-C objects too. While that's usually a
standard C feature, it's implementation specific according to the
specifications.

Rod Pemberton

Paul Rubin

unread,

Oct 31, 2012, 12:31:14 PM10/31/12

to

"Rod Pemberton" <do_no...@notemailnotz.cnm> writes:
> Except for 'restrict' keyword, a pointer may point to or within any C
> object, i.e., aliased.

I thought the compiler could assume that (e.g.) local variables were
unaliased, unless the code actually took the address. E.g., if you say

int x = 3;
foo();
print(x);

I have the impression the compiler is allowed to constant-propagate x,
giving

foo();
print(3);

obviously this is not allowed if you do something with &x that might
change the value of x.

To prevent the propagation from happening (among other uses), the
"volatile" keyword was introduced.

Bernd Paysan

unread,

Oct 31, 2012, 12:58:20 PM10/31/12

to

Anton Ertl wrote:
> Conversely, one can have a rule like the Fortran rule in a language
> with arbitrary pointers like Forth, or add arbitrary pointers to
> Fortran. The generalization of Fortrans parameter rule would be: two
> parameters would not be allowed to point to the same contiguous region
> (in Forth terminology). I think that it would be a bad idea to add
> such a rule to Forth, though, even if we would not have to care about
> backwards compatibility.

Usually, applying the same parameter twice to a Fortran function, even
when the function would be compiled in a way that it treats aliasing
well, doesn't make much sense. And so it would in a comparable Forth
function. E.g.

: matmul ( a b c -- )
\ matrix multiply a x b, store in matrix c

What would you expect as result if you use this to, let's say, rotate
Matrix A by 90°, stored in place?

a (( (( 0 1 ))
(( -1 0 )) )) a matmul

It quite likely will not give what you wanted, aliasing considered or
not. So a programmer rather would declare

: matmul ( a b -- c )
\ matrix multiply a x b, result is newly allocated matrix c

The possible aliasing between a an b is no problem, you can say "Ok, I
want a 180° rotation, let's use the 90°, dup it and matmul it with
itself".

(( (( 0 1 ))
(( -1 0 )) )) dup matmul

Should work.

The rule I would add to the compiler to optimize for such cases is that
"no pointer resulting from allocate is an alias to any pointer that
already existed, or to any other pointer that was returned by another
allocate". Then the compiler knows that it can reoder the read accesses
into the matrix multiplication as it likes, and e.g. use cache-sized
chunks of the matrix or apply recursive divided "fast" matrix
multiplications, which have a slightly lower O than O(n³).

Anton Ertl

unread,

Oct 31, 2012, 1:02:53 PM10/31/12

to

Paul Rubin <no.e...@nospam.invalid> writes:
>I thought the compiler could assume that (e.g.) local variables were
>unaliased, unless the code actually took the address. E.g., if you say
>
> int x = 3;
> foo();
> print(x);
>
>I have the impression the compiler is allowed to constant-propagate x,
>giving
>
> foo();
> print(3);

There is no memory involved here, so no pointers that might be
aliased. If you do the equivalent in Forth:

3 { x }
foo
x print

there is no memory or aliasing, either.

Anton Ertl

unread,

Oct 31, 2012, 1:39:46 PM10/31/12

to

Bernd Paysan <bernd....@gmx.de> writes:
>Usually, applying the same parameter twice to a Fortran function, even
>when the function would be compiled in a way that it treats aliasing
>well, doesn't make much sense. And so it would in a comparable Forth
>function. E.g.
>
>: matmul ( a b c -- )
>\ matrix multiply a x b, store in matrix c
>
>What would you expect as result if you use this to, let's say, rotate
>Matrix A by 90°, stored in place?
>
>a (( (( 0 1 ))
> (( -1 0 )) )) a matmul

Ideally the rotated matrix. But yes, in a typical implementation that
passes parameters by reference this will not work.

OTOH, if you do, say,

a a c matmul

the Fortran equivalent would be non-standard, and the gcc maintainers
would feel encouraged to miscompile it.

In any case, from what I read, there are cases where people passed the
same parameter twice to Fortran function, and got problems on some
compilers.

>The rule I would add to the compiler to optimize for such cases is that
>"no pointer resulting from allocate is an alias to any pointer that
>already existed, or to any other pointer that was returned by another
>allocate".

Yes, that's what the standard says; and I also think that programs
actually comply with this (instead of using knowledge about the
internals to go beyond the memory coming from an ALLOCATE).

Rod Pemberton

unread,

Oct 31, 2012, 8:06:24 PM10/31/12

to

"Paul Rubin" <no.e...@nospam.invalid> wrote in message

news:7xy5imk...@ruckus.brouhaha.com...
> "Rod Pemberton" <do_no...@notemailnotz.cnm> writes:
...

> > Except for 'restrict' keyword, a pointer may point to or within any C
> > object, i.e., aliased.
>
> I thought the compiler could assume that (e.g.) local variables were
> unaliased, unless the code actually took the address. E.g., if you say
>
> int x = 3;
> foo();
> print(x);
>
> I have the impression the compiler is allowed to constant-propagate x,
> giving
>
> foo();
> print(3);
>

...

> obviously this is not allowed if you do something with &x that might
> change the value of x.

int x = 3;
int *y;

printf("%d\n",x);
y=&x;
*y=5;
printf("%d\n",x);

This prints 3 and 5 here ... Of course, the contents of 'x' is being
fetched in order to display it, due to the string conversion. Yes?

Odd ... I don't know if C can actually print or display a value without
going through a string conversion. I don't believe that there is a way. At
the moment, I can't recall any method to do so, if there is ... I think
that the ability to directly display an integer would be required in order
to do a constant-propagation to a print-like statement that you've
demonstrated.

So, the question becomes how do you detect if a constant has been propagated
or is being fetched for display? Any internal reference to 'x' in order to
check it's value would be detected by the compiler and might change the code
accordingly. So, this would require something outside the C context to
modify the contents of the 'x' variable. Of course, that something needs to
"know" exactly where to store the value. That leads into the problem of how
do you do know the location without referencing 'x' in a way that the C
compiler or linker can detect the reference, in order to not affect the
outcome ...

> To prevent the propagation from happening (among other uses), the
> "volatile" keyword was introduced.

Well, I'm not familiar with that, or perhaps forgot it ... A few years
back, I familiarized myself with "restrict". I know there were some changes
made for implementing "restrict", but I don't recall what they were. Maybe,
it's related to those changes. I don't use "restrict", keeping mostly to
ANSI C.

AIUI, the "volatile" keyword is to indicate that a variable's value can be
modified by something outside the C context. This prevents optimizing away
the read or write of a variables' value from or to memory, e.g., by holding
the value in a register. E.g., it can be used to ensure reading/writing of
correct values from memory mapped I/O devices, e.g., screen, accessing
values passed from assembly routines, e.g., interrupts, etc.

Rod Pemberton

Alex McDonald

unread,

Nov 2, 2012, 8:28:05 AM11/2/12

to

On Oct 31, 1:28 am, Paul Rubin <no.em...@nospam.invalid> wrote:

Y is not an alias of X, it's a pointer to X.

Paul Rubin

unread,

Nov 2, 2012, 11:13:01 AM11/2/12

to

Alex McDonald <bl...@rivadpm.com> writes:
> Y is not an alias of X, it's a pointer to X.

y @ is an alias of x.

David Thompson

unread,

Nov 16, 2012, 11:21:59 PM11/16/12

to

On Wed, 31 Oct 2012 14:57:54 GMT, an...@mips.complang.tuwien.ac.at
(Anton Ertl) wrote:

> Bernd Paysan <bernd....@gmx.de> writes:
> >Languages without arbitrary pointers like traditional Fortran have much
> >stricter rules for unaliased memories.
>
> Fortran has one particular such rule: The compiler may pass parameters
> by reference, but a program must not pass the same array to a
> functioon through two parameters.
>

Not quite. A compiler may pass by reference or by copy, and a program
must not depend on which. This means if the same actual is used for
multiple parameters (in Fortran terminology dummy arguments), OR a
parameter and also accessed by shared COMMON since forever or shared
MODULE or 'host-association' since F90; AND you write one path and
read another, it's not safe. You can have multiple paths as long as
there are no writes to any of them; or all writes and reads use one
path (but in that case why did you bother with the other paths?).

If you use certain newer features, they override. If you use POINTER
it requires by-reference and aliasing works, and may reduce
optimization accordingly. If you use VALUE it requires by-value.

> That rule is not present in other languages without arbitrary
> pointers, such as Algol, Pascal or Java. In particular, Jensen's
> Device (Algol 60) relies on two parameters referencing the same
> variable through name calling.
>

And COBOL, FWIW. Java (at least the standard JVM implementation of
Java) doesn't even have a mechanism to do class objects by-copy; it
must alias. Pascal does have arbitrary (but typesafe) pointers *in
addition to* by-reference VAR parameters: new(p) dispose(p) p^ .