> Yes, and it was considered one the worst decisions of Forth-83 - even
> before the ink had dried.
And I agree with that wholeheartedly. By the time we started work on ANS Forth, though, it had become common practice. The solution was to replace NOT with either NEGATE (arithmetic inversion) or INVERT (invert bits). All compromises have flaws, but at least you know what you're getting.
Cheers,
Elizabeth
-- ==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com
"Forth-based products and Services for real-time
applications since 1973."
==================================================
> Phew ! I was getting jelous and starting to wonder why I picked ebx as
> top of stack rather than eax ! ;)
It seems that only XLAT uses EBX. So, it's more flexible in terms of
instruction choice, or more likely to not be clobbered. ECX or CL is used
by loop, branching, and rotate or shift instructions. EDX is used by
multiply and division instructions. EAX is used by multiply, division,
string instructions, etc. If you're moving immediates into registers, quite
a few x86 instructions have shorter forms for the EAX register.
On Sun, 28 Oct 2012 11:05:55 +1100, "Ed" <inva...@nospam.com> wrote:
>The downside is that a good forth optimizer isn't cheap. IIRC Stephen
>once quoted a figure of 50,000 lines of code.
Aout 5,000 lines for many CPUs. 6,000 lines for the x86 version.
Then add the code for the assembler and disassembler.
>And still it may not be as fast as a good C compiler.
The important comparison is the number of hours spent on the
code generators for C and Forth.
Stephen
-- Stephen Pelc, stephen...@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads
Stephen Pelc wrote:
> The important comparison is the number of hours spent on the
> code generators for C and Forth.
IMHO VFX is limited on x86 by the numbers of registers. As are C compilers. When I wrote my Wurstkessel cryptography core, Marcel did compare the C compiler result with iForth on x64, where more registers are available, and the speed was about the same.
On x86, Andy Glew (one of the designers of the Pentium Pro) suggested adding an instruction for "well formed flags" in the Forth sense - which could be used for bit-wise operations. It got turned down by senior management, because C compilers couldn't make any benefit from it; though assembly code would.
Later, in the SSE instruction set, the idea was revived, and indeed implemented: SSE compares store their result flags as either all bits set or all bits cleared in the destination register.
-- Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/
Bernd Paysan <bernd.pay...@gmx.de> writes Re: well formed flag ?
> Stephen Pelc wrote:
>> The important comparison is the number of hours spent on the
>> code generators for C and Forth.
> IMHO VFX is limited on x86 by the numbers of registers. As are C > compilers. When I wrote my Wurstkessel cryptography core, Marcel did > compare the C compiler result with iForth on x64, where more registers > are available, and the speed was about the same.
> On x86, Andy Glew (one of the designers of the Pentium Pro) suggested > adding an instruction for "well formed flags" in the Forth sense - which > could be used for bit-wise operations. It got turned down by senior > management, because C compilers couldn't make any benefit from it; > though assembly code would.
Rngs_wurst is in CODE because the carry bit is needed.
> Later, in the SSE instruction set, the idea was revived, and indeed > implemented: SSE compares store their result flags as either all bits > set or all bits cleared in the destination register.
Marcel Hendrix wrote:
>> IMHO VFX is limited on x86 by the numbers of registers. As are C
>> compilers. When I wrote my Wurstkessel cryptography core, Marcel did
>> compare the C compiler result with iForth on x64, where more
>> registers are available, and the speed was about the same.
> I redid these just now on an i7 2.66 GHz machine:
In theory, Wurstkessel should take about twice the time on a 32 bit processor than on a 64 bit processor. However, when we compile high-
level code for both, it's a factor 5. The way the code is arranged gives a benefit on x64 with 8 accumulator registers - a version tuned for 32 bits would use one accumulator after the other (or maybe two accumulators, 4 registers), but will have a lot less intrinsic parallelism.
-- Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/
Bernd Paysan wrote:
> Stephen Pelc wrote:
> > The important comparison is the number of hours spent on the
> > code generators for C and Forth.
> IMHO VFX is limited on x86 by the numbers of registers. As are C
> compilers. When I wrote my Wurstkessel cryptography core, Marcel did
> compare the C compiler result with iForth on x64, where more registers
> are available, and the speed was about the same.
How many C compilers were tested?
> On x86, Andy Glew (one of the designers of the Pentium Pro) suggested
> adding an instruction for "well formed flags" in the Forth sense - which
> could be used for bit-wise operations. It got turned down by senior
> management, because C compilers couldn't make any benefit from it;
> though assembly code would.
> Later, in the SSE instruction set, the idea was revived, and indeed
> implemented: SSE compares store their result flags as either all bits
> set or all bits cleared in the destination register.
I expect C (and similar languages) to generate better code, more easily,
because they operate at a higher level of abstraction than does Forth.
Unlike other languages, Forth is (as you point out) highly reliant on registers
and optimizers. It needs these to eliminate the inefficiency of pushing items
around a virtual stack - something which conventional cpu's were never
designed to do.
Ed <inva...@nospam.com> wrote:
> I expect C (and similar languages) to generate better code, more
> easily, because they operate at a higher level of abstraction than
> does Forth.
On what is this opinion based?
> Unlike other languages, Forth is (as you point out) highly reliant
> on registers and optimizers.
And C isn't?
> It needs these to eliminate the inefficiency of pushing items around
> a virtual stack - something which conventional cpu's were never
> designed to do.
They weren't designed to evaluate expression trees either. What you
perhaps don't realize is that expression trees and RPN are equivalent:
there is a 1:1 mapping between the two. In both cases you have to
convert the abstract form into instructions that push data between
memory and registers.
Andrew Haley <andre...@littlepinkcloud.invalid> writes:
> What you perhaps don't realize is that expression trees and RPN are
> equivalent: there is a 1:1 mapping between the two.
I think not quite right--RPN is more like linear logic, unless you have
locals or something like PICK. This is apparent from Koopman's book
where he talks about ways to compile C into Forth. There's also a cool
paper by Henry Baker:
> In both cases you have to convert the abstract form into instructions
> that push data between memory and registers.
If the Forth code is written in the classic style (using global
variables to hold temporary values when stack shuffling gets too
complex) then it may be harder to compile good code.
Paul Rubin <no.em...@nospam.invalid> wrote:
> Andrew Haley <andre...@littlepinkcloud.invalid> writes:
>> What you perhaps don't realize is that expression trees and RPN are
>> equivalent: there is a 1:1 mapping between the two.
> I think not quite right--RPN is more like linear logic, unless you have
> locals or something like PICK.
Of course you have locals. We've had locals in Forth for 20 years.
> This is apparent from Koopman's book where he talks about ways to
> compile C into Forth.
OK, I shouldn't have written that without a list of caveats. :-)
If we're going to nitpick, and I guess we are, expression trees and
RPN are equivalent in the absence of stack order manipulation and
loops: in that case the algorithm to convert between the two is pretty
trivial. In the presence of stack manipulation things get a bit more
complicated but it's a fairly simple matter of copying and creating
temporaries. (We did all of this in gcj, which converts stack code to
expression trees in a GCC front end.) There's certainly nothing
inherent to generating code for a register machine from stack code
that would make generation less efficient. Data flow analysis on the
stack is no different from data flow analysis between a bunch of local
variables. As Stephen has pointed out on many occasions, the only
additional thing you need to do in a Forth optimizer is a stack
shuffle at block boundaries.
Andrew Haley <andre...@littlepinkcloud.invalid> writes:
>>> expression trees and RPN are equivalent: ...
> Of course you have locals. We've had locals in Forth for 20 years.
OK, in this case there's equivalence, it's just no longer pure RPN, but
no problem.
>> If the Forth code is written in the classic style (using global
>> variables to hold temporary values when stack shuffling gets too
>> complex)
> 'tain't my classic style! I always had to bear multi-tasking in mind,
> anyway.
Oh good point, by "global" I just meant "not local". In particular if
you store to a variable, any subroutine you call might access or modify
it, so you need whole-program analysis to tell if that has happeened.
> The same applies to C, surely.
There are a bunch of rules in C about when the compiler can assume
variables aren't aliased. I don't think Forth has anything like that.
Paul Rubin <no.em...@nospam.invalid> wrote:
> Andrew Haley <andre...@littlepinkcloud.invalid> writes:
>>>> expression trees and RPN are equivalent: ...
>> Of course you have locals. We've had locals in Forth for 20 years.
> OK, in this case there's equivalence, it's just no longer pure RPN, but
> no problem.
In what sense is it not "pure" RPN? IMO,
A B +
is still RPN, even if A and B are local variables. RPN is just how
you write expressions. It's no different from A + B or (+ A B) .
>>> If the Forth code is written in the classic style (using global
>>> variables to hold temporary values when stack shuffling gets too
>>> complex)
>> 'tain't my classic style! I always had to bear multi-tasking in mind,
>> anyway.
> Oh good point, by "global" I just meant "not local". In particular if
> you store to a variable, any subroutine you call might access or modify
> it, so you need whole-program analysis to tell if that has happeened.
Sure, as in any language. That's my point: there isn't really much of
a difference in practice.
Andrew Haley <andre...@littlepinkcloud.invalid> writes:
> In what sense is it not "pure" RPN? IMO,
> A B +
> is still RPN, even if A and B are local variables.
The impurity happens when you have to store to locals (or to the
interior of the stack) in the middle of the evaluation. I don't think
pure RPN has a way to do that.
Paul Rubin <no.em...@nospam.invalid> wrote:
> Andrew Haley <andre...@littlepinkcloud.invalid> writes:
>> In what sense is it not "pure" RPN? IMO,
>> A B +
>> is still RPN, even if A and B are local variables.
> The impurity happens when you have to store to locals (or to the
> interior of the stack) in the middle of the evaluation. I don't think
> pure RPN has a way to do that.
No, I'm sure it doesn't. But pure RPN doesn't have a way to store
into a global either: all RPN can do is expressions.
Paul Rubin wrote:
> There are a bunch of rules in C about when the compiler can assume
> variables aren't aliased. I don't think Forth has anything like that.
Standard Forth has IMHO 6 separate memory areas:
* dictionary + heap
* code memory
* data stack
* return stack
* floating point stack
* locals
You can only address the dictionary and heap with @ and !.
In any case, it would be *much* better for an optimizer if you deliberately and explicitely can specify unaliased variables if it can help the compiler.
Languages without arbitrary pointers like traditional Fortran have much stricter rules for unaliased memories.
-- Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/
> Paul Rubin wrote:
> > There are a bunch of rules in C about when the compiler can assume
> > variables aren't aliased. I don't think Forth has anything like that.
> Standard Forth has IMHO 6 separate memory areas:
> You can only address the dictionary and heap with @ and !.
> In any case, it would be *much* better for an optimizer if you
> deliberately and explicitely can specify unaliased variables if it can
> help the compiler.
DEFER causes an aliasing problem. Beyond DEFER, I don't think there
are any other cases.
> Paul Rubin wrote:
>> There are a bunch of rules in C about when the compiler can assume
>> variables aren't aliased. I don't think Forth has anything like that.
> Standard Forth has IMHO 6 separate memory areas:
> * dictionary + heap
> * code memory
The correct ANS terminology for those to is "data space" and "code space". Data space is accessible to user programs. The "dictionary" is in two parts, "name space" (headers, etc.) and "code space" (regardless of whether they're in the same address space). Neither name space nor code space guaranteed accessible to user programs.
Cheers,
Elizabeth
-- ==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com
"Forth-based products and Services for real-time
applications since 1973."
==================================================
Paul Rubin <no.em...@nospam.invalid> writes:
>There are a bunch of rules in C about when the compiler can assume
>variables aren't aliased. I don't think Forth has anything like that.
Wrong: If we applied the same attitude as seems to be dominating among
C compiler writers (that miscompiling non-standard programs is ok or
even encouraged), there is no aliasing beween the memory accesses in
the sequence "F! ! c@". And something like that is pretty much all
that a C compiler can assume, too, even with the attitude above.
Fortunately the Forth implementors are saner and don't have that
attitude; or maybe they would be just as insane, if limits on compile
time and compiler writer time did not prevent them from implementing
optimizations that would profit from knowledge about non-aliasing.
If you are thinking of the keyword "restrict" in C, that is hardly
used. One might introduce such a feature in Forth, too, but currently
systems would not do anything with it, and even if they did, would
programmers use it? Judging from the C experience, most probably
would not.
Bernd Paysan <bernd.pay...@gmx.de> writes:
>Languages without arbitrary pointers like traditional Fortran have much >stricter rules for unaliased memories.
Fortran has one particular such rule: The compiler may pass parameters
by reference, but a program must not pass the same array to a
functioon through two parameters.
That rule is not present in other languages without arbitrary
pointers, such as Algol, Pascal or Java. In particular, Jensen's
Device (Algol 60) relies on two parameters referencing the same
variable through name calling.
Conversely, one can have a rule like the Fortran rule in a language
with arbitrary pointers like Forth, or add arbitrary pointers to
Fortran. The generalization of Fortrans parameter rule would be: two
parameters would not be allowed to point to the same contiguous region
(in Forth terminology). I think that it would be a bad idea to add
such a rule to Forth, though, even if we would not have to care about
backwards compatibility.
> There are a bunch of rules in C about when the compiler can assume
> variables aren't aliased. I don't think Forth has anything like that.
After Anton pointed it out, I have to ask what you're talking about too.
Except for 'restrict' keyword, a pointer may point to or within any C
object, i.e., aliased. Or, a pointer may point to a special location
referred to as NULL. NULL is typically zero but not required to be.
NULL just needs to be a location where no valid C object can be located.
A pointer can usually point to non-C objects too. While that's usually a
standard C feature, it's implementation specific according to the
specifications.