I didn't use CREATE DOES> anywhere in this file. For example, I have
these words:
: <1array> ( dim1 size name -- )
: <2array> ( dim1 dim2 size name -- )
: <3array> ( dim1 dim2 dim3 size name -- )
: <4array> ( dim1 dim2 dim3 dim4 size name -- )
: <5array> ( dim1 dim2 dim3 dim4 dim5 size name -- )
: <6array> ( dim1 dim2 dim3 dim4 dim5 dim6 size name -- )
: 1array ( dim1 size -- )
: 2array ( dim1 dim2 size -- )
: 3array ( dim1 dim2 dim3 size -- )
: 4array ( dim1 dim2 dim3 dim4 size -- )
: 5array ( dim1 dim2 dim3 dim4 dim5 size -- )
: 6array ( dim1 dim2 dim3 dim4 dim5 dim6 size -- )
Consider what happens when you define an array like this:
10 w 1array aaa \ W is the size of a cell (4)
This generates several colon words:
1.) ^AAA ( -- adr ) Returns the base-address of the array.
2.) LIM-AAA ( -- adr ) Returns the limit-address of the array.
3.) AAA-ZERO ( -- ) Fills the entire array with zeros.
4.) AAA-DIM ( -- dim1 ) Returns the dimension of the array (10).
5.) AAA ( index -- adr ) Returns the address of an element at the
index.
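Roughly speaking, the effect is as if you had written these by hand
(this is only an illustration, not the actual generated code, and
(AAA-DATA) is a made-up helper name; since W is one cell here, the
data area is 10 cells and indexing scales by CELLS):
create (aaa-data)   10 cells allot
: ^aaa      ( -- adr )        (aaa-data) ;
: lim-aaa   ( -- adr )        (aaa-data) 10 cells + ;
: aaa-zero  ( -- )            (aaa-data) 10 cells erase ;
: aaa-dim   ( -- dim1 )       10 ;
: aaa       ( index -- adr )  cells (aaa-data) + ;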
With CREATE DOES>, you would only get action #5. With :NAME (what I
used to define my array definers), you can have as many actions
associated with your data type as you need (five in my arrays). These
actions are similar to the method functions that are associated with
data types in C++ (called "methods" in other OOP languages). My :NAME
is not OOP, but it is a big step in that direction. By comparison,
CREATE DOES> is a primitive technique invented by Chuck Moore in the
1970s, before OOP became popular.
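To see the contrast, here is what a typical CREATE DOES> array
definer looks like (the usual textbook pattern, written out here only
for illustration; I call it DOES-1ARRAY so it isn't confused with my
1ARRAY, and it is not code from novice.4th). All it gives you is the
indexing action:
: does-1array ( dim1 size "name" -- )
   create  over , dup ,  * allot     \ store dim1 and size, reserve the data
   does> ( index -- adr )
      tuck cell+ @ * +  2 cells + ;  \ fetch size at run-time, skip the header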
My package also includes the word FIELD. Most Forth programmers would
define FIELD like this:
: field ( offset size -- new-offset )
create
over , +
does> ( record -- field-adr )
@ + ;
It could be used like this:
0
w field .aaa
w field .bbb
constant rrr
create sss rrr allot
: ttt ( record -- aaa+bbb )
dup .aaa @ swap .bbb @ + ;
Using SwiftForth, the CREATE-DOES> version of FIELD generates this
code:
see field
4756FF 40DDBF ( CREATE ) CALL E8BB86F9FF
475704 4 # EBP SUB 83ED04
475707 EBX 0 [EBP] MOV 895D00
47570A 4 [EBP] EBX MOV 8B5D04
47570D 40828F ( , ) CALL E87D2BF9FF
475712 0 [EBP] EBX ADD 035D00
475715 4 # EBP ADD 83C504
475718 40C2CF ( (;CODE) ) CALL E8B26BF9FF
47571D 4 # EBP SUB 83ED04 .AAA and .BBB call this
475720 EBX 0 [EBP] MOV 895D00
475723 EBX POP 5B EBX now the base-adr
475724 0 [EBX] EBX MOV 8B1B
475726 0 [EBP] EBX ADD 035D00
475729 4 # EBP ADD 83C504
47572C RET C3
see .aaa
47573F 47571D ( field +1E ) CALL E8D9FFFFFF
see .bbb
47575F 47571D ( field +1E ) CALL E8B9FFFFFF
see ttt
47579F 4 # EBP SUB 83ED04
4757A2 EBX 0 [EBP] MOV 895D00
4757A5 47573F ( .aaa ) CALL E895FFFFFF
4757AA 0 [EBX] EAX MOV 8B03
4757AC 0 [EBP] EBX MOV 8B5D00
4757AF EAX 0 [EBP] MOV 894500
4757B2 47575F ( .bbb ) CALL E8A8FFFFFF
4757B7 0 [EBX] EBX MOV 8B1B
4757B9 0 [EBP] EBX ADD 035D00
4757BC 4 # EBP ADD 83C504
4757BF RET C3
There are 11 instructions in TTT, 1 each in .AAA and .BBB, and 7 in
the DOES> part of FIELD (called by both .AAA and .BBB). This results
in 11+1+1+7+7 = 27 instructions executed. By comparison, when my
version of FIELD is used, TTT looks like this:
see ttt
47577F 4 # EBP SUB 83ED04
475782 EBX 0 [EBP] MOV 895D00
475785 0 [EBX] EBX MOV 8B1B
475787 0 [EBP] EAX MOV 8B4500
47578A EBX 0 [EBP] MOV 895D00
47578D EAX EBX MOV 8BD8
47578F 4 # EBX ADD 83C304
475792 0 [EBX] EBX MOV 8B1B
475794 0 [EBP] EBX ADD 035D00
475797 4 # EBP ADD 83C504
47579A RET C3
Now there are only 11 instructions executed. This is less than half of
what the CREATE DOES> version requires. This is for SwiftForth on the
Pentium, but results should be similar with other compilers and
processors. The speed difference is much greater than the 27/11 ratio
implies, though. As a rule of thumb, what primarily kills the speed on
microprocessors is jumps, calls and returns. On processors with a
prefetch-queue (such as the 8088), jumps empty out the prefetch-queue.
On modern processors that execute instructions concurrently, jumps
prevent that concurrent execution. This is also why the
MACRO: word improves the speed --- because it gets rid of one CALL and
one RET instruction. See Michael Abrash's books on 8088 and Pentium
assembly-language for a more in-depth discussion of this effect. The
upshot that the novice needs to remember is that on most modern
processors the CREATE DOES> version of FIELD will be an order of
magnitude slower than my version.
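(My MACRO: is in novice.4th and is not reproduced here. For readers
who only want the flavor of text-macro inlining, the widely
circulated pattern below does the same basic job; it is not my
implementation. The macro's text gets compiled inline wherever the
macro is used, so there is no CALL/RET pair at run-time.)
: macro ( "name" "<char> text<char>" -- )
   :  char parse  postpone sliteral  postpone evaluate
   postpone ;  immediate ;
macro 2+ " 2 + "   \ using 2+ now compiles 2 + inline, or evaluates it interpretively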
CREATE DOES> is obsolete and should never be used under any
circumstance. Any programmer who uses CREATE DOES> is a novice. She
may have 30 years of experience programming Forth, but she is still a
novice so long as she continues to use CREATE DOES>.
Very odd. Why not have a word:
: m-array ( dimN dimN-1... dim1 N <name> -- ) ... ;
Using CREATE...DOES> you abstract out the common functionality of all
'array' type items, you have one word which creates any number and
dimension of array...
Having special accessors for each specific type of array is foolish
and a waste of code. Have "array-zero", etc, as words which operate
on *any* kind of array. i.e.:
30 30 2 m-array my-2d-array
my-2d-array array-zero ( now my-2d-array has been zeroed out)
1 4 my-2d-array array@ ( just got my-2d-array[1][4] )
..etc...
Since the "CREATE" part would store away the dimensions of the array
in that object, each array knows just how big it is. The accessor
words operate on that knowledge so that you don't have to rewrite the
world with each new array size.
> CREATE DOES> is obsolete and should never be used under any
> circumstance. Any programmer who uses CREATE DOES> is a novice. She
> may have 30 years of experience programming Forth, but she is still
> a novice so long as she continues to use CREATE DOES>.
Really. I've been programming professionally for 25+ years in a wide
variety of languages... and yet, I often use CREATE...DOES>
when I write FORTH code !
Perhaps it would be wiser for you to acquire more programming
experience before rendering judgment on others?
>My package also includes the word FIELD. Most Forth programmers would
>define FIELD like this:
>
>: field ( offset size -- new-offset )
> create
> over , +
> does> ( record -- field-adr )
> @ + ;
>By comparison, when my
>version of FIELD is used, TTT looks like this:
>
>see ttt
>47577F 4 # EBP SUB 83ED04
>475782 EBX 0 [EBP] MOV 895D00
>475785 0 [EBX] EBX MOV 8B1B
>475787 0 [EBP] EAX MOV 8B4500
>47578A EBX 0 [EBP] MOV 895D00
>47578D EAX EBX MOV 8BD8
>47578F 4 # EBX ADD 83C304
>475792 0 [EBX] EBX MOV 8B1B
>475794 0 [EBP] EBX ADD 035D00
>475797 4 # EBP ADD 83C504
>47579A RET C3
>Now there are only 11 instructions executed. This is less than half of
>what the CREATE DOES> version requires. This is for SwiftForth on the
>Pentium, but results should be similar with other compilers and
>processors.
Be careful what you wish for! Using VFX Forth out of the box:
0 ok-1
cell field .aaa ok-1
cell field .bbb ok-1
constant rrr ok
ok
create sss rrr allot ok
: t dup .aaa @ swap .bbb @ + ; ok
ok
dis t
T
( 004BEC60 8B13 ) MOV EDX, 0 [EBX]
( 004BEC62 035304 ) ADD EDX, [EBX+04]
( 004BEC65 8BDA ) MOV EBX, EDX
( 004BEC67 C3 ) NEXT,
( 8 bytes, 4 instructions )
ok
>CREATE DOES> is obsolete and should never be used under any
>circumstance. Any programmer who uses CREATE DOES> is a novice. She
>may have 30 years of experience programming Forth, but she is still a
>novice so long as she continues to use CREATE DOES>.
I therefore conclude that using CREATE ... DOES> is three times
as good as your method. My conclusion is as well justified as your
assertion.
Stephen
--
Stephen Pelc, steph...@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads
> I therefore conclude that using CREATE ... DOES> is three times
> as good as your method. My conclusion is as well justified as your
> assertion.
Nice optimizer :)
With the CREATE...DOES> definition of FIELD? How does VFX adjust this
code when I do
4 ' .aaa >body !
When I use vfxlin 4.30 RC1 [build 0324], I see:
see t
T
( 080B74B0 8BD3 ) MOV EDX, EBX
( 080B74B2 031D00740B08 ) ADD EBX, [080B7400]
( 080B74B8 031530740B08 ) ADD EDX, [080B7430]
( 080B74BE 8B0A ) MOV ECX, 0 [EDX]
( 080B74C0 030B ) ADD ECX, 0 [EBX]
( 080B74C2 8BD9 ) MOV EBX, ECX
( 080B74C4 C3 ) NEXT,
( 21 bytes, 7 instructions )
That's much easier to adjust to changes in the data (just store the
data in the body of .aaa or .bbb, and this code will perform as it
should).
Now the same source code, except that the definition of FIELD is
replaced with:
: field ( offset size -- new-offset )
>r >r : r@ postpone literal postpone + postpone ; r> r> + ;
see t
T
( 080B7490 8B13 ) MOV EDX, 0 [EBX]
( 080B7492 035304 ) ADD EDX, [EBX+04]
( 080B7495 8BDA ) MOV EBX, EDX
( 080B7497 C3 ) NEXT,
( 8 bytes, 4 instructions )
If there is no intention to change the data, colon definitions allow
the generation of faster code than CREATE...DOES>; but they
probably also consume more space.
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2009: http://www.euroforth.org/ef09/
>When I use vfxlin 4.30 RC1 [build 0324], I see:
As the release notes suggest, v4.3 was a "rapid technology
development" release and not for production use. All but final
versions of v4.4 are on the web/ftp sites. Please upgrade and
repeat the test. The official v4.4 release will be announced
when we have finished a thrash test.
>If there is no intention to change the data, colon definitions allow
>the generation of more faster code than CREATE...DOES>; but they
>probably also consume more space.
With tokenising of DOES> clauses enabled, that ain't necessarily
so.
This is wildly over-engineered. Instead of this you could just use
['] AAA BASE ( -- adr )
['] AAA LIMIT ( -- adr )
['] AAA ZERO ( -- )
['] AAA DIM ( -- dim1 )
AAA
So, rather than five new words you'd have just one, the array.
Andrew.
When I have the time.
But in the meantime, maybe you could answer my question:
|How does VFX adjust this code when I do
|
|4 ' .aaa >body !
And what does SEE T show then?
> But in the meantime, maybe you could answer my question:
> |How does VFX adjust this code when I do
> |4 ' .aaa >body !
> And what does SEE T show then?
VFX is not alone. iForth has had a tokenizer since 2.00, although it is quite
conservative.
: FIELD ( offset size -- new-offset )
CREATE OVER , +
DOES> ( record -- field-adr ) @ + ;
0
cell FIELD .aaa
cell FIELD .bbb
CONSTANT rrr
CREATE sss rrr ALLOT
: t DUP .aaa @ SWAP .bbb @ + ;
: uu sss t 2* 33 + . ;
( Standard conservative solution that allows changing data fields with >body ! )
SEE uu
\ Flags: ANSI
\ $005C8E40 : uu
\ $005C8E48 mov eax, $005C8040 dword-offset
\ $005C8E4E mov ebx, $005C8060 dword-offset
\ $005C8E54 mov eax, [eax $005C8900 +] dword
\ $005C8E5A add eax, [ebx $005C8900 +] dword
\ $005C8E60 lea ebx, [eax*2 #33 +] dword
\ $005C8E67 push ebx
\ $005C8E68 jmp .+8 ( $004E90F8 ) offset NEAR
( The alternative code with a colon definition is slow because it uses LITERAL )
( The below code is not really a good idea when done by hand )
: FIELDI ( offset size -- new-offset )
>R >R : R@ (H.) S" + ; " $+ EVALUATE
R> R> + ;
0
cell FIELDI .aaai
cell FIELDI .bbbi
CONSTANT rrri
CREATE sssi rrri ALLOT
: ti DUP .aaai @ SWAP .bbbi @ + ;
: uui sssi ti 2* 33 + . ;
SEE uui
\ Flags: ANSI
\ $005CD0C0 : uui
\ $005CD0C8 mov ebx, $005CCC00 dword-offset
\ $005CD0CE add ebx, $005CCC04 dword-offset
\ $005CD0D4 lea ebx, [ebx*2 #33 +] dword
\ $005CD0DB push ebx
\ $005CD0DC jmp .+8 ( $004E90F8 ) offset NEAR
\ $005CD0E1 ;
-marcel
>|How does VFX adjust this code when I do
>|
>|4 ' .aaa >body !
>
>And what does SEE T show then?
Version: 4.40 [build 2992]
0 ok-1
cell field .aaa ok-1
cell field .bbb ok-1
constant rrr ok
8 ' .aaa >body ! ok
: t dup .aaa @ swap .bbb @ + ; ok
dis t
T
( 004BF290 8BD3 ) MOV EDX, EBX
( 004BF292 8B4A04 ) MOV ECX, [EDX+04]
( 004BF295 034B08 ) ADD ECX, [EBX+08]
( 004BF298 8BD9 ) MOV EBX, ECX
( 004BF29A C3 ) NEXT,
( 11 bytes, 5 instructions )
ok
Yes, the first instruction is redundant and the fetches are
reordered. In essence the action of children of field when
compiling is:
... DOES> @ postpone literal postpone + ;
VFX knows that DOES> returns a constant address (literal)
and so can do things with that literal when it needs to.
Show us some code.
I agree that "rewriting the world" for each new array size is not very
robust. On the other hand though, it was pretty easy to do, and now
that it is written it doesn't need to be done again. This approach
makes more sense to me than trying to automate the generation of code
--- that would be more realistic if there were dozens or hundreds of
similar words. With only six words to write, it seemed easier to just
write the words by hand.
You didn't provide us with a definition of FIELD. Did you use the
FIELD definition involving CREATE DOES> that I mentioned, or does VFX
provide FIELD "out of the box?"
> > Very odd. Why not have a word:
> >
> > : m-array ( dimN dimN-1... dim1 N <name> -- ) ... ;
> >
> > Using CREATE...DOES> you abstract out the common functionality of
> > all 'array' type items, you have one word which creates any number
> > and dimension of array...
> >
> > Having special accessors for each specific type of array is foolish
> > and a waste of code. Have "array-zero", etc, as words which operate
> > on *any* kind of array. i.e.:
> >
> > 30 30 2 m-array my-2d-array
> >
> > my-2d-array array-zero ( now my-2d-array has been zeroed out)
> > 1 4 my-2d-array array@ ( just got my-2d-array[1][4] )
> > ..etc...
> >
> > Since the "CREATE" part would store away the dimensions of the array
> > in that object, each array knows just how big it is. The accessory
> > words operate on that knowledge so that you don't have to rewrite
> > the world with each new array size.
>
> I agree that "rewriting the world" for each new array size is not very
> robust. On the other hand though, it was pretty easy to do, and now
> that it is written it doesn't need to be done again. This approach
> makes more sense to me than trying to automate the generation of code
> --- that would be more realistic if there were dozens or hundreds of
> similar words. With only six words to write, it seemed easier to just
> write the words by hand.
I have mixed feelings about this.
On the one hand, it seems absurd to build giant complex mechanisms to do
lots of different single things.
On the other hand, it gets tiresome to repeatedly define essentially the
same thing over and over. Every time I publish sample code that has an
array in it I wind up redefining ARRAY and the new versions are no
better than the old ones, it's just boilerplate.
ARRAY may be a little too high-level to standardise, particularly when
some people have their own distinctive syntax that they prefer to any
possible standard version. But everybody uses arrays and there's no
standard. People tend to use the syntax that popular Forth systems
provide, but if you want to port your code and you don't want to depend
on some other system to provide it, then you're left defining it
yourself. And if you do that you might as well do it as simply as you
can so that it does what you need it to, rather than copy your system
code.
It's easy to quote Chuck Moore's philosophy. And on the other hand, if
you're actually going to have a lot of different types of arrays, then
it makes sense to factor the code that creates them, and the factored
code viewed as a unit will look like a giant machine that does a lot of
different things. And in practice if you want to publish portable code
there's no elegant way to do it.
I guess there's something to be said for just writing 7 ARRAY with a
note that it's defined the usual way, and let people who want to port
your code deal with it. Most of them have an ARRAY that can do what
yours does, and it's likely as easy for them to use it and comment out
your version as it is to use yours. After all, unless you put your
ARRAY into a special wordlist it will replace their sophisticated ARRAY
in things they compile after they compile your code, and pretty often
that will not be what they want.
I use two array classes, one that checks that accesses are in bounds,
and a faster one that doesn't. I use the first until I'm satisfied that
there are no out-of-bounds accesses, then I switch to the second. For
compactness, I use 1ARRAY, 2ARRAY, etc.; the code to accommodate n
dimensions turns out to be too large.
Jerry
--
Engineering is the art of making what you want from things you can get.
I can't think of any reason why you would want to have distinctive
syntax between one application and another. My arrays allow any size
of record, which makes them work with any application. I also allow
them to be defined either with a name from the stack, or from the
input stream. That pretty much covers what people want from their
arrays.
The big weakness of my arrays is that they only provide the five
actions that I mentioned above. You might want to have more actions
than this. For example, a word that traverses through a 2-dimensional
array diagonally. In an OOP language, it is easy to derive a new type
from the basic array type, and give it some new member functions. With
my system, this can't be done. The only way to add new actions is to
paste them into the array definer with :NAME --- the same as how I
wrote the five actions that I did provide. Cut and paste programming
has a bad reputation though --- it is very error-prone. All in all, my
arrays aren't very robust. Their primary virtue is speed --- they
should be an order of magnitude faster than CREATE DOES> defined
arrays on most systems.
The problem with CREATE DOES> is that all of the parameters are
comma'd into memory, and they have to be fetched out during run-time
and then used in arithmetic. That is slow. In my system, all of these
parameters are literals inside of a colon definition. That makes for
very fast code. Given a reasonably good optimizing compiler, it is
going to be as fast as hand-coded assembly-language. I really can't
imagine any way that my system could be improved in regard to speed.
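The simplest illustration of this difference is CONSTANT itself.
These are sketches for illustration only, with made-up names, not
code from novice.4th:
: const-slow ( n "name" -- )  create ,  does> ( -- n ) @ ;
: const-fast ( n "name" -- )  >r  :  r>  postpone literal  postpone ; ;
A child of CONST-SLOW has to fetch its value out of memory every time
it runs; a child of CONST-FAST is an ordinary colon word containing a
literal, which a good compiler can inline and fold.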
CREATE DOES> code can also be difficult to read. You may have comma'd
quite a lot of parameters into memory after the CREATE. With a six-
dimensional array, this is a lot of data! In the code after DOES>, you
have to fetch those parameters out. This can be very confusing and you
can get your parameters tangled up pretty easily. By comparison, in my
system all of the parameters become literals in a colon definition.
They all are local variables in the defining word, so the result is
quite readable. Take a look at my <6ARRAY> word. It is pretty long,
but it is quite readable. You could easily write a <7ARRAY> word using
<6ARRAY> as a guide. Imagine what a mess this would be if it were
written using CREATE DOES>. Better yet, try writing 6ARRAY using
CREATE DOES> --- good luck with that! Also note that you *can't*
write <6ARRAY> using CREATE DOES> because CREATE necessarily obtains
its name from the input stream. There is no way to write a CREATE
DOES> word that obtains its name from the stack the way that <6ARRAY>
does.
With my system, you set the value BOUNDS-CHECK to TRUE if you want
bounds checking, or to FALSE when you are ready for a production
compile. There is no need to modify your source code at all --- just
set BOUNDS-CHECK correctly before you compile.
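Just to illustrate the general idea (a rough sketch with made-up
names, not the code in novice.4th): the flag is tested when the
defining word runs, so a production compile contains no checking code
at all.
false value bounds-check
: (check) ( index dim -- index )
   over u> 0= abort" array index out of range" ;
: example-1array ( dim size "name" -- )
   2dup *  align here swap allot             \ reserve dim*size bytes of data
   >r >r >r                                  \ R: data size dim
   :                                         \ begin the child:  ( index -- adr )
   bounds-check if
      r@ postpone literal  postpone (check)  \ compile the check only if wanted
   then  r> drop
   r> postpone literal  postpone *           \ compile:  size *
   r> postpone literal  postpone +           \ compile:  data +
   postpone ; ;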
That's not quite what I meant. What I meant was:
: field ( offset size -- new-offset )
create
over , +
does> ( record -- field-adr )
@ + ;
0
cell field .aaa
cell field .bbb
constant rrr
create sss 5 , 7 ,
: t dup .aaa @ swap .bbb @ + ;
sss t .
see t
4 ' .aaa >body !
sss t .
see t
>In essence the action of children of field when
>compiling is:
> ... DOES> @ postpone literal postpone + ;
If FIELD is defined as above, this fetches too early.
I have thought a little about how one might fetch early and still be
correct: When compiling such code for a child A of a CREATE...DOES>
word, remember that you compiled A there. If >BODY is applied to the
execution token of A, invalidate all the code compiled for A, and
recompile it in the conservative manner (fetching at run-time).
This makes >BODY a bit slower for all uses (because you need a check
or indirect jump to be prepared for words like A), and a lot more
expensive in the case where the recompilation is needed, but makes
many children of DOES> a bit faster. It probably still results in a
speedup, because children of DOES> are executed more frequently than
>BODY.
One can do variations on that: E.g., instead of triggering the
conservative recompile on >BODY, one could trigger it on providing an
xt to the user program (through ticking, FIND, or SEARCH-WORDLIST);
that would avoid the cost on >BODY, and would shift it to ' and
friends, but these are slow operations anyway; the downside is that
this would make the children of DOES> slower even in cases where the
xt is not used with >BODY, but with, e.g., EXECUTE.
Or one can make >BODY fast in those cases where it is known that it is
not applied to an optimized child of DOES>.
The question is: Is this optimization worth the complexity it entails?
I don't think so. The better way is to tell the compiler what we want
instead of having it guess and correcting the guess later. E.g., we
can use : instead of CREATE to avoid having to fetch the offset from
memory. And if we want to stay with something more like
CREATE...DOES>, there are two (currently non-standard) approaches:
1) IIRC you told me that VFX allows the program to declare a range of
memory as unchangeable or somesuch, so the compiler can move the @
around as it likes. I guess one would use such a declaration in the
CREATE part.
2) A little more removed from CREATE-DOES>, there is CONST-DOES>,
implemented in Gforth
<http://www.complang.tuwien.ac.at/forth/gforth/Docs-html/Const_002ddoes_003e.html>.
>create sss 5 , 7 ,
>: t dup .aaa @ swap .bbb @ + ;
>sss t .
>see t
>4 ' .aaa >body !
>sss t .
>see t
>
>>In essence the action of children of field when
>>compiling is:
>> ... DOES> @ postpone literal postpone + ;
>
>If FIELD is defined as above, this fetches too early.
What you're doing is changing the offset of .aaa. Why would I want to
do this? If one had used the Forth200x +FIELD, would you consider it
correct to permit changes to the offset of a single field in a
structure? It may be an interesting quibble for discussion, but if I
saw such a thing in production code I would regard it as a crash
waiting to happen. Although the standard and usage documentation
may imply that the data item is the offset, nowhere does it say
that it must be so. In all practical terms a child of FIELD is
syntactic sugar for
<const> +
and should be treated in the same way as a child of CONSTANT.
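That is, for all practical purposes .BBB behaves as if it had been
written as
   : .bbb   cell + ;
and .AAA (offset zero) as a no-op on the address.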
Phrases such as
' <foo> >body !
are only really useful when written by the author of the
defining word and should be in syntactic sugar very close
to the defining word in the source code. To do otherwise
leads to maintenance nightmares.
>1) IIRC you told me that VFX allows the program to declare a range of
>memory as unchangeable or somesuch, so the compiler can move the @
>around as it likes. I guess one would use such a declaration in the
>CREATE part.
We do support disambiguation. However, we do not support reordering
in embedded systems because of memory mapped peripherals, nor do
we eliminate phrases such as
<addr> @ drop
which may be used as triggers.
You are probably referring to the SECTION directives which allow
a cross compiler to specify read/write and initialisation
attributes of regions of memory.
>2) A little more removed from CREATE-DOES>, there is CONST-DOES>,
>implemented in Gforth
><http://www.complang.tuwien.ac.at/forth/gforth/Docs-html/Const_002ddoes_003e.html>.
Hmmm ...
> Show us some code.
I don't have time to write a whole example before shabbat starts, but
here is a quick outline of how it might work (this is not tested at
all):
: array ( dimN dimN-1... dim1 N size <name> -- )
create
2dup , , ( 0=size 1=N )
swap
0 do
\ dimN... dimX size
over ,
*
loop
\ size of array data, allot (could allocate just as easily)
allot
does> \ do nothing; use the address given for accessors
;
: array-dim ( a -- n ) cell+ @ ;
: array-size ( a -- m ) @ ;
The 'array@' word has to do a little more work to get to the exact
location of a particular item, but it's not terribly difficult.
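Something like this, equally untested, for the two-dimensional case;
the general case just loops over the N stored dimensions. It assumes
the index nearest the array indexes dim1, and ARRAY@ assumes
cell-sized elements:
: array-element ( i2 i1 a -- addr )   \ 2-D only, for illustration
   >r  r@ 2 cells + @                 \ dim1
   rot * +                            \ i1 + i2*dim1 = element number
   r@ @ *                             \ scale by the element size
   r> dup cell+ @ 2 + cells + + ;     \ add the start of the data area
: array@ ( i2 i1 a -- x )  array-element @ ;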
As an alternative, the 'does>' of the array could access the item
desired; but then one has to know how a create...does> word is laid
out in order to get the data fields (in Reva it would probably not be
the same as in gforth, for example).
Maybe you wouldn't, but the compiler cannot know this and has to work
correctly if you do.
>If one had used the Forth200x +FIELD, would you consider it
>correct to permit changes to the offset of a single field in a
>structure?
There is no mechanism defined in Forth200x for changing the offset of
a field defined with +FIELD. If a system provides a mechanism for
changing the offset (e.g., by permitting >BODY ! for such fields and
guaranteeing that this changes the behaviour of the field everywhere),
using that mechanism would be non-portable, but correct for that
system. If a system provides no such mechanism, it can produce faster
code. It's very similar to CONSTANT.
For the FIELD defined with CREATE...DOES>, there is a standard
(Forth-94 and Forth200x) mechanism for changing the offset. And every
system that claims to be standard has to support it. Just as the only
difference between CONSTANT and VALUE defined with standard
CREATE...DOES> is the name.
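For example (illustration only, with made-up names):
: my-constant ( n "name" -- )  create ,  does> @ ;
: my-value    ( n "name" -- )  create ,  does> @ ;  \ TO would be done with >BODY !
A standard system has to honour >BODY ! on children of both of these,
so it cannot constant-fold either of them.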
>Phrases such as
> ' <foo> >body !
>are only really useful when written by the author of the
>defining word and should be in syntactic sugar very close
>to the defining word in the source code. To do otherwise
>leads to maintenance nightmares.
The compiler does not know who wrote the code it compiles, and should
not care. And even if some usage may lead to maintenance nightmares,
the compiler still has to produce code for it that behaves as
specified.
If we want to avoid a run-time @ in fields, we should use a mechanism
that allows avoiding it, instead of wishing that >BODY ! did not
exist; and we should certainly not write compilers as if this wish was
true.
I think this idea should be explored further. Factor has "immutable"
data, and a lot of other functional LISP-derived languages do too.
Think Erlang. If we had some way of declaring data to be immutable
after being initialized, this would open the door to quite a lot of
compiler optimization and compile-time verification of data.
BTW, you use the acronym IIRC all of the time, and I don't know what
that means. I'm not very experienced with internet communication. :-)
One other thing --- the word I mentioned previously is supposed to be
BOUNDS-CHECK? rather than BOUNDS-CHECK --- don't forget the query mark
at the end.
> You didn't provide us with a definition of FIELD. Did you use the
> FIELD definition involving CREATE DOES> that I mentioned, or does VFX
> provide FIELD "out of the box?"
You are not really interested in how FIELD is implemented. When VFX
uses CREATE DOES> it must have been written by a novice. With your
experience, you are way beyond that.
--
Coos
CHForth, 16 bit DOS applications
http://home.hccnet.nl/j.j.haak/forth.html
It doesn't make sense to present the machine-code of a word using
FIELD, and to not present the source-code for FIELD. That is why I
asked to see how FIELD was implemented.
Also, it doesn't make sense to compare VFX code to SwiftForth code. I
am not promoting SwiftForth. It may very well be that VFX generates
better machine-code than SwiftForth does. My argument is against
CREATE DOES>. I wrote an example function TTT using both my own FIELD
and a more typical FIELD written with CREATE DOES>, and I compared the
results. A similar comparison could be made using VFX or any other
compiler. I think that it is reasonable to expect that my method will
be found to be more efficient than CREATE DOES> on *every* Forth
system in existence. In most systems it should be about an order of
magnitude more efficient, which is what I found in regard to
SwiftForth.
"more efficent" is a nebulous concept. In code space? In runtime?
In programmer-hours to create? In programmer-hours to maintain?
Define "efficient" as you intend it, please.
>There is no mechanism defined in Forth200x for changing the offset of
>a field defined with +FIELD. If a system provides a mechanism for
>changing the offset (e.g., by permitting >BODY ! for such fields and
>guaranteeing that this changes the behaviour of the field everywhere),
>using that mechanism would be non-portable, but correct for that
>system. If a system provides no such mechanism, it can produce faster
>code. It's very similar to CONSTANT.
So it doesn't matter how it is implemented, even with CREATE ...
DOES>. What are appear to be arguing for is documentation
clarity, which I fully support.
Remember that the OP asked for code generation examples. I
provided one for VFX Forth's equivalent. Internally that
implementation may use CREATE ... DOES>, but nowhere does the
documentation say so.
>The compiler does not know who wrote the code it compiles, and should
>not care. And even if some usage may lead to maintenance nightmares,
>the compiler still has to produce code for it that behaves as
>specified.
This only applies when the user can assume that the word is defined
using CREATE ... DOES>.
>If we want to avoid a run-time @ in fields, we should use a mechanism
>that allows avoiding it, instead of wishing that >BODY ! did not
>exist; and we should certainly not write compilers as if this wish was
>true.
Are you arguing for CREATE-CONST or some such?
When I said that my method was an order of magnitude more efficient
than CREATE DOES> I was referring to run-time execution speed.
I used FIELD as my example. I chose this mostly because it is simple
enough to make for short disassembly listings. I didn't want to have a
post with pages of disassembly, as nobody would bother to read it all.
On the other hand, a more complicated example might be interesting. I
would suggest that somebody who likes CREATE DOES> and who also
considers himself to be past the novice level, should write a defining
word for six-dimensional arrays using CREATE DOES>. A comparison can
then be made between that code and my own code, using SwiftForth and
various other Forth compilers. Of course, for the comparison to be
meaningful, all of the code has to be ANS-Forth. Good luck!
In regard to code size, it is also going to be better with my method,
but not necessarily a full order of magnitude. In the SwiftForth code
my method had 11 instructions compared to 27 for CREATE DOES>. Code
size isn't very important because nobody cares about a few words of
memory being used on a machine with megabytes of RAM. Code size is
only interesting because it provides a loose prediction of execution
speed, which does matter.
In regard to the more nebulous metric of programmer-hours to create
and maintain, I think that my method is also better than CREATE DOES>.
The parameters are referenced at compile-time by local variables with
names, and they become literals in the run-time code. This results in
much clearer code than what we see in CREATE DOES> in which the
parameters are comma'd into memory after the CREATE and are then
fetched out of memory by the run-time DOES> code. If there are a lot
of parameters, the DOES> code can be confusing --- it is not at all
clear which parameter is being fetched out. This code can be error-
prone because it is easy for the programmer to get his parameters
tangled up. Measuring readability and reliability is very subjective
though, so it would be meaningless to say that it is "an order of
magnitude" better.
For the +FIELD provided by the system, it does not matter how it is
implemented. For that +FIELD it matters how it is specified.
However, if the user himself implements +FIELD (or CONSTANT or VALUE),
then it does matter how it is implemented. If the user implements it
with CREATE, then he can use >BODY ! to change the offset or value,
and all uses of words defined with such defining words have to see the
change; the name of the defining word does not matter in that case.
>Remember that the OP asked for code generation examples. I
>provided one for VFX Forth's equivalent. Internally that
>implementation may use CREATE ... DOES>, but nowhere does the
>documentation say so.
He provided a defining word that used CREATE...DOES>, and used that
defining word; and the examples I posted all included that definition
or a replacement for it. In no case was the system's +FIELD used in
my examples.
>This only applies when the user can assume that the word is defined
>using CREATE ... DOES>.
If the user has defined the defining word himself using
CREATE...DOES>, he can assume that.
>>If we want to avoid a run-time @ in fields, we should use a mechanism
>>that allows avoiding it, instead of wishing that >BODY ! did not
>>exist; and we should certainly not write compilers as if this wish was
>>true.
>
>Are you argung for CREATE-CONST or some such?
Mainly I am arguing for correct compilers.
Concerning the mechanism that allows avoiding @ at run-time, we have
one: Colon definitions with LITERALs. An example I gave was:
: field ( offset size -- new-offset )
>r >r : r@ postpone literal postpone + postpone ; r> r> + ;
I don't think we need another mechanism badly enough to standardize
it, but if someone else dislikes both the colon definition and the
CREATE...DOES> approach strongly enough, then Gforth's CONST-DOES>
could be an alternative.
If anybody were to examine my novice.4th file (nobody has yet AFAIK),
you would find this definition of FIELD:
: +, ( -- ) \ runtime: a b -- a+b
postpone + ;
: lit, ( val -- ) \ runtime: -- val
postpone literal ;
: lit+, ( addend -- ) \ runtime: n -- n+addend
lit, +, ;
: <field> ( offset -- ) \ run-time: struct-adr -- field-adr
?dup if >r
: state@, if, r@ lit, postpone lit+,
else, r@ lit+, then, ;, immediate
r> drop
else
: ;, immediate \ the first field does nothing
then ;
: field ( offset size -- new-offset )
over <field> + ;
The above definition includes a few words (IF, ;, etc.) that have been
left out of this post for brevity.
They are in: http://www.rosycrew.com/novice.4th
My definition has several advantages over yours:
1.) My field words work in interpretive mode, which is very important
for debugging.
2.) I don't compile colon words that are executed at run-time. I
compile macros that will (at compile-time) compile code into the colon
word that they are being used in. This is much faster because there is
no CALL and RETURN executed at run-time --- as I have said elsewhere,
getting rid of CALL and RETURN is the most important speed-improvement
technique available.
3.) I don't compile any code at all when the offset is zero. By
comparison, you generate a colon word whose only purpose is to add
zero to the address on the stack --- it accomplishes *nothing* at all!
It is abundantly obvious that you never looked at my code, or you
would not have posted your own code that is grossly inferior in every
way. My enthusiasm for posting code on C.L.F. is really dwindling.
Nobody ever looks at it! Apparently, on C.L.F. what people hate and
fear the most is source-code.
novice --- A programmer who uses CREATE DOES>, and who does not use
POSTPONE.
intermediate --- A programmer who no longer uses CREATE DOES>, but who
uses POSTPONE instead.
advanced --- A programmer who uses POSTPONE on words that themselves
use POSTPONE (see my <FIELD> that uses POSTPONE on LIT+,).
Let everybody on C.L.F. determine for themselves which category they
are in. :-)
: postpone| ( "many word|" -- )
BEGIN >in @ bl word count s" |" compare WHILE
>in ! postpone postpone
REPEAT drop ; immediate
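With it, for example,
   : compile-offset+  ( offset -- )  postpone| literal + | ;
compiles the same thing as
   : compile-offset+  ( offset -- )  postpone literal  postpone + ;
(COMPILE-OFFSET+ is just a made-up name for the example.)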
"Hugh Aguilar" <hugoa...@rosycrew.com> a �crit dans le message de
news:
063186d6-0ca0-44f4...@n35g2000yqm.googlegroups.com...
A slightly more common syntax for this kind of stuff is:
]] ... [[
It's available in Gforth, and recently
<2009Nov2...@mips.complang.tuwien.ac.at> I announced an
implementation in standard Forth as part of the compat library
<http://www.complang.tuwien.ac.at/forth/compat.zip>.
0) You posted it without inserting quoted-printable encoding junk,
whereas you inserted it in my code.
>1.) My field words work in interpretive mode
So do mine. If you define a word with
4 4 field foo
the resulting word FOO is equivalent to
: foo 4 + ;
and of course you can interpret FOO.
>2.) I don't compile colon words that are executed at run-time. I
>compile macros that will (at compile-time) compile code into the colon
>word that they are being used in.
You generate STATE-smart colon definitions that one had better not
tick or POSTPONE, to avoid trouble down the line. No thanks.
>This is much faster because there is
>no CALL and RETURN executed at run-time --- as I have said elsewhere,
>getting rid of CALL and RETURN is the most important speed-improvement
>technique available.
And because it is somewhat important, Forth systems written for
performance (such as VFX) perform inlining. And since you care so
much for performance, you use such a system, no?
Looking at the code generated by VFX for the T example
<2009Dec...@mips.complang.tuwien.ac.at>, VFX successfully inlines
the resulting colon definitions and optimizes the resulting code,
producing exactly the same code as it does for words generated through
(I guess) its built-in +FIELD. Try your code on any Forth system, and
report if the result is any shorter than what VFX produced with my
FIELD.
>3.) I don't compile any code at all when the offset is zero. By
>comparison, you generate a colon word whose only purpose is to add
>zero to the address on the stack --- it accomplishes *nothing* at all!
It's easy to add such special-casing, but at least on VFX there is no
difference in the result, so the special-casing just complicates the
code without benefit.
>It is abundantly obvious that you never looked at my code
I have not. If you want your code to be discussed, post it. And even
posting code will often generate few responses if the code is long.
Posting a link produces the occasional comment, but not often; that's
more for when you just want people to download and use the code.
I don't know what you are talking about here.
> >1.) My field words work in interpretive mode
>
> So do mine.
I was wrong when I said that your field words can't be used
interpretively.
I was right, however, when I said that your field words are slow.
Consider this code (compiled under SwiftForth):
: field ( offset size -- new-offset )
>r >r : r@ postpone literal
postpone + postpone ; r> r> + ;
0
w field .aaa
w field .bbb
constant rrr
create sss rrr allot
: ttt ( record -- aaa+bbb )
dup .aaa @ swap .bbb @ + ;
see ttt
476B7F 4 # EBP SUB 83ED04
476B82 EBX 0 [EBP] MOV 895D00
476B85 476AFF ( .aaa ) CALL E875FFFFFF
476B8A 0 [EBX] EAX MOV 8B03
476B8C 0 [EBP] EBX MOV 8B5D00
476B8F EAX 0 [EBP] MOV 894500
476B92 476B1F ( .bbb ) CALL E888FFFFFF
476B97 0 [EBX] EBX MOV 8B1B
476B99 0 [EBP] EBX ADD 035D00
476B9C 4 # EBP ADD 83C504
476B9F RET C3
see .aaa
476AFF 0 # EBX ADD 83C300
476B02 RET C3
see .bbb
476B1F 4 # EBX ADD 83C304
476B22 RET C3
TTT has 11 instructions, .AAA and .BBB have two each, for a total of
15 instructions. As I said before though, what primarily kills the
speed on processors is jumps, calls and returns. Your two CALL
instructions are what hurts the speed the most. The CREATE DOES>
version was maybe an order of magnitude slower than my code. Your
version is not that bad --- maybe about twice as slow as mine. Here is
my disassembly again:
see ttt
476B3F 4 # EBP SUB 83ED04
476B42 EBX 0 [EBP] MOV 895D00
476B45 0 [EBX] EBX MOV 8B1B
476B47 0 [EBP] EAX MOV 8B4500
476B4A EBX 0 [EBP] MOV 895D00
476B4D EAX EBX MOV 8BD8
476B4F 4 # EBX ADD 83C304
476B52 0 [EBX] EBX MOV 8B1B
476B54 0 [EBP] EBX ADD 035D00
476B57 4 # EBP ADD 83C504
476B5A RET C3
> >It is abundantly obvious that you never looked at my code
>
> I have not. If you want your code to be discussed, post it. And even
> posting code will often generate few responses if the code is long.
> Posting a link produces the occasional comment, but not often; that's
> more for when you just want people to download and use the code.
I do want people to download and use my code. I also want people to
criticize my code, but only after they download and use it --- or at
least look at it.
In regard to VFX, gforth, iForth, etc., I would be interested in
seeing a comparison made between my FIELD, your FIELD and the CREATE
DOES> version of FIELD. In order for this to happen though, somebody
in possession of one or more of those systems is going to have to
actually download my code and compile it, and do a disassembly of the
result.
25 years ago was 1984. I was a senior in H.S. and I preferred Forth
over all other programming languages that I knew (Basic, Pascal and
6502 assembly). I remember that I wrote a paper for some class in
which I compared all of the popular programming languages of the day
and made a prediction for what would be popular in the future. I
predicted that C++ and Lisp would dominate in desktop programming,
with Lisp being used for anything with an A.I. aspect, and C++ being
used for more mundane software in which speed is the primary issue. I
predicted that Forth would dominate in micro-controller programming. I
predicted that C and Pascal would die quickly and without lamentation,
as neither of them is as good as C++ for desktop programming, or as
good as Forth for micro-controller programming.
That was 1984. Now it is 2009 and GCC dominates in micro-controller
programming. Meanwhile, the Forth community is still struggling to
figure out how to write FIELD. The most common implementation of FIELD
is done with CREATE DOES>, which is about an order of magnitude slower
than my implementation posted in this thread. This thread would have
made sense in 1984, but it is passing bizarre in 2009. It is like
Groundhog Day --- we continue to ponder how to write FIELD a quarter
of a century after we should have figured out the problem and moved on
to bigger and better things.
I think that CREATE DOES>, more than anything else, was what killed
Forth. A lot of C programmers have tried to learn Forth, but they only
got as far as CREATE DOES>. Contrary to popular opinion, C programmers
aren't idiots. They look at the DOES> code and they ask: "Why is it
necessary to do all of this work at run-time, when the parameters are
known at compile-time?" They ask: "Why should I have to fetch these
parameters out of memory, and risk getting them tangled up, when I
know what they are and could just write them as literals?" There are
no good answers to these questions. The result is that the C
programmers give up on Forth and go back to using C.
When I wrote that paper in 1984, I didn't know what OOP was. All I
knew was that everybody was describing OOP as the greatest thing since
sliced bread. I considered CREATE DOES> to be pretty cool at that
time. The idea of attaching an action to a data type made sense to me
and was a big part of why I liked Forth better than Pascal. It didn't
occur to me that attaching multiple actions to a data type was what
OOP was all about. CREATE DOES> was invented in the 1970s prior to the
rise of OOP. By 1970s standards, it was pretty cool. By 2009 standards
however, it is extremely limited and primitive. Even by 1983 the
writing was on the wall in regard to OOP and the Forth community
should have dropped CREATE DOES> in favor of a more OOP-like solution,
although they can be forgiven for sticking with 1970s technology.
There can be no forgiveness for the perpetrators of the ANS-Forth
fiasco however. By 1994 the whole world had moved on. It was time to
jettison CREATE DOES> along with bell-bottom pants, heavy blue
eyeshadow, mega-churches, and every other weird idea that came out of
the 1970s. For us to be still using CREATE DOES> in 2009 is
ridiculous.
Since this seems to be the nostalgia hour, I'll tell another story
about 1984. I had a Basic program from a book that simulated a
predator/prey environment. Each creature's reality could be described
by eight parameters. Each creature had an eight-dimensional array
associated with it, and this array described *all* of the possible
predicaments that the creature could find itself in. The array
contained good responses to every situation. For predators, good meant
getting closer to a prey animal (vice-versa for the prey). Over time,
as the creatures experienced more and more of the world, the creatures
would gradually fill their array with data. This resulted in
remarkably life-like "learning" behavior. I was fascinated by this
because I knew that there was no A.I. involved; certainly nobody
considers a big array to be A.I.! Another guy and I implemented
this program in Forth on the Vic-20 in order to obtain a huge speed
increase, which made the program a lot more fun to watch. The problem
was implementing an eight-dimensional array. Doing this with CREATE
DOES> didn't work at all; I became confused and I couldn't figure it
out. I ended up implementing the array in a manner remarkably similar
to the method you see now in novice.4th, although not as general-
purpose. Also, I had to use constants because I didn't have local
variables. Still though, it was pretty similar. That is the origin for
my dislike for CREATE DOES> --- I have a long history of disliking
CREATE DOES>, and I don't expect to ever be converted into a CREATE
DOES> believer.
>The most common implementation of FIELD
>is done with CREATE DOES>, which is about an order of magnitude slower
>than my implementation posted in this thread.
Utter rubbish. I posted an example with VFX that was 3 or 4 times
faster than yours on the same CPU. The FIELD definition is a
CREATE ... DOES> definition with bells on.
What you completely ignore is that modern compilers such as VFX and
iForth know how to tokenise/inline the DOES> clauses of children of
defining words. VFX has done this for over ten years.
In terms of how VFX does it, you just have to read the classical
compiler books such as Aho and Ullman, Fischer and LeBlanc, Fraser
and Hanson and so on. Yes, we bought a lot of books and spent
a great deal of time and money on the VFX design and prototyping
it for three architectures. There is only one algorithm particular
to Forth in VFX, and even that now has analogues in some C compilers.
>: field ( offset size -- new-offset )
> >r >r : r@ postpone literal
> postpone + postpone ; r> r> + ;
>
>0
> w field .aaa
> w field .bbb
>constant rrr
>
>create sss rrr allot
>
>: ttt ( record -- aaa+bbb )
> dup .aaa @ swap .bbb @ + ;
Under VFX v4.4 after compiling the snippet above
see .aaa
.AAA
( 004BF230 C3 ) NEXT,
( 1 bytes, 1 instructions )
ok
see .bbb
.BBB
( 004BF250 83C304 ) ADD EBX, 04
( 004BF253 C3 ) NEXT,
( 4 bytes, 2 instructions )
ok
see ttt
TTT
( 004BF2E0 8B13 ) MOV EDX, 0 [EBX]
( 004BF2E2 035304 ) ADD EDX, [EBX+04]
( 004BF2E5 8BDA ) MOV EBX, EDX
( 004BF2E7 C3 ) NEXT,
( 8 bytes, 4 instructions )
ok
As I posted before, the result using the VFX version of FIELD
was:
: t dup .aaa @ swap .bbb @ + ; ok
ok
dis t
T
( 004BEC60 8B13 ) MOV EDX, 0 [EBX]
( 004BEC62 035304 ) ADD EDX, [EBX+04]
( 004BEC65 8BDA ) MOV EBX, EDX
( 004BEC67 C3 ) NEXT,
( 8 bytes, 4 instructions )
ok
This demonstrates that there need be no difference in performance
between using CREATE ... DOES> and your technique. Especially when
testing compilers, measurement beats hand-waving every time.
Why do so many try to enlighten Hugo? He touts the superiority of
flintlocks vis-a-vis matchlocks in a time of assault rifles. His
opinions are worthless, his "facts" orthogonal to reality, and his ego
keeps him from seeing any of that. Ignore him.
>Ignore him.
"The baseless optimism that so degrades the human condition"
Graham Greene (?)
I disagree with that - there is no "especially" about it. Measurement
beats hand-waving every time from studying the Yanomami to checking the
financials of supposedly "solid, well managed" firms through testing
compilers and then all the way through all the hard sciences.
You are still not providing the source-code for your FIELD! This
thread is about my technique compared to CREATE DOES>. If your
FIELD was not written using CREATE DOES> and/or is not ANS-Forth
compatible then it is irrelevant to the thread. Also, the code that
you compared to above is not my technique, it is Anton Ertl's. My
technique is provided in novice.4th, which you apparently haven't
looked at yet.
On Dec 8, 3:34 pm, stephen...@mpeforth.com (Stephen Pelc) wrote:
> On Tue, 8 Dec 2009 13:23:28 -0800 (PST), Hugh Aguilar
>
> <hugoagui...@rosycrew.com> wrote:
> >The most common implementation of FIELD
> >is done with CREATE DOES>, which is about an order of magnitude slower
> >than my implementation posted in this thread.
>
> Utter rubbish. I posted an example with VFX that was 3 or 4 times
> faster than yours on the same CPU. The FIELD definition is a
> CREATE ... DOES> definition with bells on.
Your claim that your technique is 3 to 4 times faster than mine makes
no sense. My disassembly comes from SwiftForth and yours from VFX.
This thread is not a comparison of SwiftForth versus VFX however, it
is a comparison of my FIELD versus a FIELD written with CREATE DOES>.
This comparison is only meaningful if both techniques are compiled on
the *same* compiler. I have done this with SwiftForth already. I would
still be interested in seeing this done in VFX, iForth and gforth.
What you have provided so far has just been a lot of hand-waving. You
aren't showing us the source-code for your FIELD (but intriguingly
hint at the existence of "bells and whistles") and you aren't
compiling the source-code for my FIELD that I did provide --- and you
accuse *me* of hand-waving! LOL
> In terms of how VFX does it, you just have to read the classical
> compiler books such as Aho and Ullman, Fischer and LeBlanc, Fraser
> and Hanson and so on. Yes, we bought a lot of books...
Why tell me about all of the books that you bought? Are you planning
on throwing them at me? Some of those tomes are heavy hardcovers, so
this might be your best chance at defeating me. Besides that, Aho and
Ullman's book has a dragon on the cover --- that might scare me!
I have performed the task you asked of me, and installed the latest
version of VFX that I found on your server:
Version: 4.40 [build 0404]
Build date: 7 December 2009
I am pleased to report that it produces correct code for the version
of FIELD that uses CREATE...DOES>:
: field ( offset size -- new-offset )
create
over , +
does> ( record -- field-adr )
@ + ;
0
cell field .aaa
cell field .bbb
constant rrr
create sss rrr allot
: t dup .aaa @ swap .bbb @ + ;
see t
outputs:
T
( 080B9E20 8BD3 ) MOV EDX, EBX
( 080B9E22 031D709D0B08 ) ADD EBX, [080B9D70]
( 080B9E28 0315A09D0B08 ) ADD EDX, [080B9DA0]
( 080B9E2E 8B0A ) MOV ECX, 0 [EDX]
( 080B9E30 030B ) ADD ECX, 0 [EBX]
( 080B9E32 8BD9 ) MOV EBX, ECX
( 080B9E34 C3 ) NEXT,
( 21 bytes, 7 instructions )
I am annoyed because this is exactly the same code that vfxlin 4.30
RC1 [build 0324] produces; and when I reported that, you advised me to
install a newer version of VFX.
I conclude that you had not read or not understood what I had written.
This is not the first time I had that impression, and such behaviour
certainly makes serious discussion hard. We have another such
participant in this group: Hugh Aguilar. Maybe you two could talk
past each other in the future, and the rest of us might be wise enough
to leave you two to this pastime.
>I conclude that you had not read or not understood what I had written.
>This is not the first time I had that impression, and such behaviour
>certainly makes serious discussion hard.
4.3 was a "rapid technology development" release, and we no longer
use it in-house.
>We have another such participant in this group: Hugh Aguilar.
Ooh, nasty!
The reason you get the results you display is that you have done
nothing to tell the compiler that DOES> @ represents a constant,
i.e. the data will not be changed. Since we're not so worried by
interpretation speed, we use a notation to tell the system that
the DOES> clause is for interpretation/execution of the child.
Compilation action is defined by setting a compiler.
Taking your example:
: field ( offset size -- new-offset )
create
over , +
does> ( record -- field-adr )
@ + ;
Now add a compiler for it:
: FieldChildComp, \ xt --
>body @ ?dup
if postpone literal postpone + then ;
: field ( offset size -- new-offset )
create
over , +
['] FieldChildComp, set-compiler
interp> ( record -- field-adr )
@ + ;
set-compiler ( xt -- ) causes the system to use the given xt
as the compiler for the children of the previous CREATE.
Now that we have multiple systems that separate interpretation
and compilation behaviours without involving IMMEDIATE (a side
effect of your "state-smart words are evil" campaign), we
need common practice to deal with it. I'm not yet ready to
propose a wordset. So far INTERP> and SET-COMPILER are enough
for our purposes. To avoid changing the definition of CREATE
it may be sufficient to introduce <CONST as a synonym of CREATE
but indicating that later use of >BODY ! is invalid. Tests with
our version of FIELD (not quite as above) indicate that >BODY !
works as expected, providing that you do not expect the system
to backpatch code that has already been compiled.
I am quite pleased to find out what code VFX produces for CREATE DOES>
words, which is what I've repeatedly asked to see. By c.l.f.
standards, actually compiling source-code is a pretty big
accomplishment! I'm also pleased to note that the generated code is
about twice as slow as the code generated by my FIELD, which is what I
predicted. This is not the order of magnitude difference that I was
expecting, but it is still significant. Most likely, the order of
magnitude difference is only going to be seen on compilers such as
SwiftForth that do minimal code-optimization, and there will be a less
drastic difference on compilers such as VFX that do more thorough code-
optimization.
> I conclude that you had not read or not understood what I had written.
> This is not the first time I had that impression, and such behavious
> certainly makes serious discussion hard. We have another such
> participant in this group: Hugh Aguilar.
I consider this to be a rather nasty jibe --- to compare me to Stephen
Pelc! Mr. Pelc has consistently refused to show us the source-code to
his FIELD (making us suspect that it is not ANS-Forth standard). By
comparison, I do provide ANS-Forth source-code, and I compile it, and
I provide the disassembly. Mr. Pelc's arguments are based upon hand-
waving and concealment, whereas mine are backed up by posted code.
Earlier you provided the disassembly of machine-code, but remarkably
did NOT provide the source-code. This made me highly suspicious that
you were desperately trying to conceal the fact that your source-code
was not ANS-Forth. Thank you very much for finally confirming my
suspicion.
All of the code in novice.4th is ANS-Forth. On a public forum such as
comp.lang.forth, it is necessary to post only ANS-Forth code so that
the code can be used by everybody no matter what Forth system they are
using. I could have written novice.4th in hand-coded assembly-language
and produced code that is extremely fast, but my effort would
necessarily be tied to a specific compiler, which would exclude
everybody who uses other compilers. For example, if I wrote my
assembly-language under SwiftForth, this would exclude everybody who
is unwilling to pay the $500 that Forth Inc. asks for SwiftForth. The
same thing is true about posting VFX-specific code, as VFX is also a
commercial product. I think that everybody can agree that using
comp.lang.forth as a marketing platform for a commercial compiler is
inappropriate.
In the future, please only post ANS-Forth code. It is okay to be a
salesman for VFX, but only in regard to showing how VFX compiles ANS-
Forth code. VFX actually does better code-optimization than
SwiftForth, and it costs less money, so you are on pretty solid ground
in a VFX versus SwiftForth comparison. When you become enthusiastic
and begin posting non-ANS-Forth VFX-specific code however, then you
are wasting everybody's time.
Is SwiftForth a flintlock or a matchlock? Either way, that is bad news
for me considering that I paid $500 for my copy of SwiftForth. You
were clumsily trying to insult me, and you only succeeded in insulting
Elizabeth Rather. I think that you need to apologize. Don't bother
apologizing to me however, I don't care about flames such as this ---
apologize to Elizabeth Rather and everybody else at Forth Inc.
BTW, why are you being Stephen Pelc's sycophant? Are you one of his
customers? Are you one of his employees? Do you want to be his friend?
Which is it?
I am aware of the problems with state-smart words in regard to
obtaining their xt. See the "LC53 Statistics" thread for my thoughts
on a solution to this problem. In the meantime though, this is the
code that I provided.
I am also aware that some compilers (VFX, for example) perform
inlining. Not everybody on comp.lang.forth can afford a commercial
compiler unfortunately. Because of this, I feel obliged when posting
code such as novice.4th to aim for the lowest-common-denominator, and
assume that the user has a simple compiler that doesn't perform
inlining.
If I were writing for myself, and I knew that my program would be
compiled on a good compiler such as VFX, then I would have written
FIELD the way that you did, rather than the way that I did. The result
would be simpler code and an avoidance of the state-smart problem. I
wrote novice.4th for general use (hence the name), and so I felt
obliged to make novice.4th useful to as many people as possible. I
don't think that it is fair of you to put me down for this decision.
It is certainly unfair to imply that I don't know how to program,
considering that the solution I presented required me to write more
advanced code than the solution that you provided. You are relying on
having an advanced compiler such as VFX available to you --- VFX has
become your crutch --- by comparison, my source-code generates hella-
fast code on *any* Forth compiler, including the primitive ones. I am
doing everything that I can to help the novice generate quality code,
but all I get is criticism from elitists such as yourself.
> Now that we have multiple systems that separate interpretation and
> compilation behaviours without involving IMMEDIATE (a side effect of
> your "state-smart words are evil" campaign), we need common practise
> to deal with it.
> I'm not yet ready to propose a wordset. So far INTERP> and
> SET-COMPILER are enough for our purposes. To avoid changing the
> definition of CREATE it may be sufficient to introduce <CONST as a
> synonym of CREATE but indicating that later use of >BODY ! is
> invalid.
I agree. A smart compiler should be able to figure it all out, given
the information that the offset is constant.
We could do it without introducing any new words with
: FIELD: \ n1 n2 "name" -- n3 ; addr -- addr+n1
OVER CONSTANT + DOES> @ + ;
This works on many Forths already. If we need more than a single-cell
constant then all that is needed is a word that creates multi-cell
constants.
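For example (a quick sketch, on a system that accepts DOES> after
CONSTANT, which is the extension under discussion; the names here are
illustrative):
0
1 CELLS FIELD: .X
1 CELLS FIELD: .Y
CONSTANT /POINT
CREATE P /POINT ALLOT
42 P .X !  P .X @ .  \ prints 42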
Andrew.
This rests on the assumption that CONSTANT is made by CREATE DOES>
I do have an implementation that does not work with your example.
It looks something like this
: CONSTANT >R : R> POSTPONE LITERAL POSTPONE ; ;
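Concretely (FIVE is just an example name), with that CONSTANT
5 CONSTANT FIVE
builds essentially the same code as
: FIVE 5 ;
so there is no CREATE-style data field for a later DOES> to patch.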
--
Coos
CHForth, 16 bit DOS applications
http://home.hccnet.nl/j.j.haak/forth.html
> > Stephen Pelc <steph...@mpeforth.com> wrote:
> >
> >> Now that we have multiple systems that separate interpretation and
> >> compilation behaviours without involving IMMEDIATE (a side effect of
> >> your "state-smart words are evil" campaign), we need common practise
> >> to deal with it.
> >
> >> I'm not yet ready to propose a wordset. So far INTERP> and
> >> SET-COMPILER are enough for our purposes. To avoid changing the
> >> definition of CREATE it may be sufficient to introduce <CONST as a
> >> synonym of CREATE but indicating that later use of >BODY ! is
> >> invalid.
> >
> > I agree. A smart compiler should be able to figure it all out, given
> > the information that the offset is constant.
> >
> > We could do it without introducing any new words with
> >
> >: FIELD: \ n1 n2 "name" -- n3 ; addr -- addr+n1
> > OVER CONSTANT + DOES> @ + ;
> >
> > This works on [many] Forths already. If we need more than a
> > single-cell constant then all that is needed is a word that
> > creates multi-cell constants.
>
> This rests on the assumption that CONSTANT is made by CREATE DOES>
No, it just assumes that the value of the constant is in its data
field.
> I do have an implementation that does not work with your example.
> It looks something like this
> : CONSTANT >R : R> POSTPONE LITERAL POSTPONE ; ;
I'm sure there are many implementations that don't work today, but
that's fixable. The behaviour of DOES> is not specified with anything
other than CREATE, so extending it to allow CONSTANT as well wouldn't
break existing code.
Andrew.
But in this case it's difficult: the colon definition that is built
with my CONSTANT is shorter than one built with DOES>, so DOES> would
write over the next definition's code.
I certainly hope that Stephen ignores this. I am a person who
cares a lot about ANS-Forth-compatible source, but I am also very
interested in seeing under-the-hood techniques. Surely you
don't intend to say that it would be inappropriate to post code
implementing structures in say, colorforth? [Leaving aside the
question whether structures are all that popular in colorforth.
:-) ]
Most of us are capable of ignoring occasional mild commercials
from Elizabeth and Stephen, if that's appropriate. IMO they
don't cross the line, and serve as constructive indications of
Forth health, which is important to all of us.
-- David
Sure, but no *code* would be broken. Existing standard code would
continue to run on your system and other standard systems. New code that
depended on finding the data field of CONSTANTs would break on your
system.
You could of course write a FIELD: that checks whether the word was made
by CREATE or by CONSTANT and then finds the correct offset, assuming
that your system allows you to write into the code laid down by : .
> > Coos Haak <chf...@hccnet.nl> wrote:
> >> On Thu, 10 Dec 2009 04:18:43 -0600, Andrew Haley wrote:
> >
> >>>
> >>> We could do it without introducing any new words with
> >>>
> >>>: FIELD: \ n1 n2 "name" -- n3 ; addr -- addr+n1
> >>> OVER CONSTANT + DOES> @ + ;
> >>>
> >
> >>> This works on [many] Forths already. If we need more than a
> >>> single-cell constant then all that is needed is a word that
> >>> creates multi-cell constants.
> >>
> >> This rests on the assumption that CONSTANT is made by CREATE DOES>
> >
> > No, it just assumes that the value of the constant is in its data
> > field.
> >
> >> I do have an implementation that does not work with your example.
> >> It looks something like this
> >> : CONSTANT >R : R> POSTPONE LITERAL POSTPONE ; ;
> >
> > I'm sure there are many implementations that don't work today, but
> > that's fixable. The behaviour of DOES> is not specified with anything
> > other than CREATE, so extending it to allow CONSTANT as well wouldn't
> > break existing code.
> >
> But in this case it's difficult, the colon definition that is built
> with my CONSTANT is shorter than one built with DOES>. So DOES> would
> write over the next definition's code.
I don't really understand how that can be. Your CONSTANT has just
been CREATEd, so there is nothing following it in the dictionary.
It's not legal to create any words between CREATE and DOES> . So,
just back up the dictionary pointer and rewrite it.
Andrew.
I fail to see the connection to my statement.
>Taking your example:
>: field ( offset size -- new-offset )
> create
> over , +
> does> ( record -- field-adr )
> @ + ;
>
>Now add a compiler for it:
>
>: FieldChildComp, \ xt --
> >body @ ?dup
> if postpone literal postpone + then ;
>
>: field ( offset size -- new-offset )
> create
> over , +
> ['] FieldChildComp, set-compiler
> interp> ( record -- field-adr )
> @ + ;
>
>set-compiler ( xt -- ) causes the system to use the given xt
>as the compiler for the children of the previous CREATE.
>
>Now that we have multiple systems that separate interpretation
>and compilation behaviours without involving IMMEDIATE (a side
>effect of your "state-smart words are evil" campaign), we
>need common practise to deal with it.
I am not convinced that this is a worthwhile feature, even though I
have provided INTERPRET/COMPILE: and CREATE-INTERPRET/COMPILE in
Gforth
<http://www.complang.tuwien.ac.at/forth/gforth/Docs-html/Combined-words.html>.
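For reference, a FIELD along those lines would look roughly like this
(a sketch only; CREATE-INTERPRET/COMPILE is the Gforth word mentioned
above, but check the manual linked above for the exact bracketing
words before relying on them):
: field ( offset size -- new-offset )
    create-interpret/compile
    over , +
interpretation> ( addr -- addr' )
    @ +
<interpretation
compilation> ( compilation: -- ; run-time: addr -- addr' )
    @ postpone literal postpone +
<compilation ;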
In the present case one can write a standard definition of FIELD that
produces the same code quality on VFX:
: field ( offset size -- new-offset )
>r >r : r@ postpone literal postpone + postpone ; r> r> + ;
or in the macros syntax from
<http://www.complang.tuwien.ac.at/forth/compat.zip>:
: field ( offset size -- new-offset )
>r >r : r@ ]]L + ; [[ r> r> + ;
or with locals:
: field { offset size -- new-offset }
: offset ]]L + ; [[ offset size + ;
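With any of these, each field is just an ordinary colon definition.
For example (a sketch with made-up names, assuming 4-byte cells):
0
1 cells field .aaa
1 cells field .bbb
constant /rec
\ .aaa now behaves like  : .aaa 0 + ;  and .bbb like  : .bbb 4 + ;
\ so a native-code compiler can inline them and fold the offsets.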
One advantage this approach has over the approaches that specify
separate interpretation and compilation behaviours is that there is no
redundancy (double specification of equivalent behaviour); redundancy
can lead to errors.
If we are not happy with the colon definition approach for some
reason, we should find a way to address that reason without requiring
redundancy.
>To avoid changing the definition of CREATE
>it may be sufficient to introduce <CONST as a synonym of CREATE
>but indicating that later use of >BODY ! is invalid.
Yes, that would be better than the approaches that require redundancy.
The implementation of this may be complex, though, because the
compiler would have to analyse which uses of @ in the DOES> part refer
to data belonging to the word defined with <CONST, and which uses of @
belong to other words. OK, a simple approach to that would be good
enough in most cases, and the slowdown for the other cases would not
be that bad.
The legality of creating words between CREATE and DOES> is neither
here nor there when CONSTANT itself does not call CREATE.
(EG: it compiles into the dictionary the equivalent of
LDA #>val
LDY #<val
JMP DOCONSTANT
for, in that peculiar setting, saving over 50% of the execution time
of a CREATEd constant at a cost of two bytes.)
More to the point is that the code *itself* does not do any
code-writing into the dictionary, so a "DOES>" that read a variable to
see whether the last thing created was a CREATE or a CONSTANT would
work with *that* code.
Of course, then that variable would have to be set, crudely something
like
VARIABLE entry-type
1 CONSTANT CREATE-entry#
2 CONSTANT CONSTANT-entry#
...
: CREATE CREATE-entry# entry-type ! CREATE ;
: CONSTANT CONSTANT-entry# entry-type ! CONSTANT ;
: DOES> entry-type @
DUP CREATE-entry# = IF DROP DOES> EXIT THEN
 DUP CONSTANT-entry# = IF DROP DOES-CONSTANT> EXIT THEN
...
; IMMEDIATE
... all in the implementation-specific prelude, of course.
It's just that DOES> does not work on my CONSTANT, or on colon
definitions for that matter. Neither is made by CREATE: they don't
have a 'code field', and the 'parameter field' is right at the xt.
Only words defined by CREATE have a code field followed by the
parameters.
As a reference, look at Lennart Benschop's SOD32 compiler.
My 16 bit implementation of CONSTANT does not have this behaviour and
its children can be patched by DOES>.
I'm not opposed to people posting non-ANS-Forth code so long as they
are upfront about it. The problem here was that I started this thread
with the title, "don't use CREATE DOES>," and Stephen Pelc presented
disassembly of some code with no source-code provided that he claimed
refuted my argument against CREATE DOES>. Eventually he produced the
source-code and we discovered that it didn't use CREATE DOES> and it
wasn't ANS-Forth compliant. His argument was highly deceptive. He was
wasting everybody's time, as the essential facts were bound to come
out sooner or later. I get told that my opinion is "worthless," but
this is all just intimidation. I'm like the boy who said, "The emperor
has no clothes!" and everybody begins screaming insults at me,
deriding me as crazy. This is futile though. In my own book I
described myself as an "insane man living on an insane planet," so how
likely is it that I'm going to be intimidated by screaming fools? I
don't really believe in anything. What other people call common sense
I call superstition, and what other people call righteousness I
recognize as fear. To paraphrase Agent Smith from the Matrix movie: "I
can't stand this place; it's the smell, if there is such a thing."
I remember at one time bringing up the subject of Factor and Anton
Ertl suggested that Factor-related posts should have the tag [FACTOR]
in the title so that they could be ignored by people uninterested in
Factor. Assuming that the number of these posts doesn't become
overwhelming, this would allow two completely different languages to
be discussed on the same forum without people getting in each other's
way. I agree with this proposal. The same thing can be done with
VFX-related posts: they can have the tag [VFX] in the title. I didn't
put any such tag on the title of this thread because the code that I
posted was ANS-Forth --- and I expected people responding to the
thread to also restrict themselves to ANS-Forth code.
I am in favor of a lot more posting of code, and a lot less hand-
waving. I would really prefer that the code be ANS-Forth when possible
in order to make it accessible to as many people as possible. I don't
much like the ANS-Forth standard, but it is all that we've got at this
time.
> > > But in this case it's difficult, the colon definition that is built
> > > with my CONSTANT is shorter than one built with DOES>. So DOES> would
> > > write over the next definition's code.
> > I don't really understand how that can be. Your CONSTANT has just
> > been CREATEd, so there is nothing following it in the dictionary.
> > It's not legal to create any words between CREATE and DOES> . So,
> > just back up the dictionary pointer and rewrite it.
> The legality of creating words between CREATE and DOES> is neither
> here nor there when CONSTANT itself does not call CREATE.
> (EG: it compiles into the dictionary the equivalent of
> LDA #>val
> LDY #<val
> JMP DOCONSTANT
> for, in that peculiar setting, saving over 50% of the execution time
> of a CREATEd constant at a cost of two bytes.)
But as soon as we hit DOES> , we can just back up the dictionary
pointer and overwrite that code.
For example,
: foo constant does> @ bar ;
foo myname
At the point CONSTANT executes, we have
LDA #>val
LDY #<val
JMP DOCONSTANT
Then, when DOES> executes, it deletes all that code and replaces it with
blah
JMP DODOES
Andrew.
ANS-ColorForth does not exist? Mmmm... better that it doesn't.
>Earlier you provided the disassembly of machine-code, but remarkably
>did NOT provide the source-code. This made me highly suspicious that
>you were desperately trying to conceal the fact that your source-code
>was not ANS-Forth. Thank you very much for finally confirming my
>suspicion.
You are wrong again. What I previously posted was an implementation
of Anton's example. The definition of FIELD in the VFX kernel is:
: field \ n <"name"> -- ; Exec: addr -- 'addr
\ *G Create a new field within a structure definition of size n bytes.
create
over , +
does>
@ +
;
It is later redefined as:
: field \ addr n -- addr n' ; optimising version of FIELD
field ['] o_field set-compiler
;
We simply find the INTERP> notation better than having an
auxiliary definition, e.g. the first of the two FIELDs in this
post.
> On Wed, 9 Dec 2009 20:15:56 -0800 (PST), wrote:
[..]
> You are wrong again. What I previously posted was an implementation
> of Anton's example. The definition of FIELD in the VFX kernel is:
Well, instead of counting meaningless bytes (on desktop machines at least),
it might be instructive to really execute some code and actually measure
run-times. Attached is a simple benchmark for all the three variants
that I found worth saving.
On iForth (and I'm sure also VFX) there is no measurable gain to be had
from trying to smarten up CREATE .. DOES>. Without measuring, one will
even postpone one's foot pretty badly, I think.
-marcel
-- field2.frt -----------------------------------------------------------
ANEW -fields2
: FIELD1 ( offset size -- new-offset )
CREATE OVER , +
DOES> ( record -- field-adr ) @ + ;
0
cell FIELD1 .aaa1
cell FIELD1 .bbb1
CONSTANT rrr
: field2 ( offset size -- new-offset )
>r >r : r@ postpone literal postpone + postpone ; r> r> + ;
0
cell FIELD2 .aaa2
cell FIELD2 .bbb2
CONSTANT rrr
: FIELD3 ( offset size -- new-offset )
>R >R : R@ (H.) S" + ; " $+ EVALUATE
R> R> + ;
0
cell FIELD3 .aaa3
cell FIELD3 .bbb3
CONSTANT rrr
CREATE sss rrr ALLOT
: t1 DUP .aaa1 @ SWAP .bbb1 @ + 2* 33 + ;
: t2 DUP .aaa2 @ SWAP .bbb2 @ + 2* 33 + ;
: t3 DUP .aaa3 @ SWAP .bbb3 @ + 2* 33 + ;
: TEST-FIELDS ( #iters -- )
LOCALS| #iters |
CR TIMER-RESET ." \ field1 : " #iters 0 ?DO sss t1 DROP LOOP .ELAPSED
CR TIMER-RESET ." \ field2 : " #iters 0 ?DO sss t2 DROP LOOP .ELAPSED
CR TIMER-RESET ." \ field3 : " #iters 0 ?DO sss t3 DROP LOOP .ELAPSED ;
FORTH> 10000000 TEST-FIELDS ( iForth 4.0, Intel PIV 3 GHz )
\ field1 : 0.020 seconds elapsed.
\ field2 : 0.114 seconds elapsed.
\ field3 : 0.020 seconds elapsed. ok
FORTH> #1000000000 TEST-FIELDS
\ field1 : 2.018 seconds elapsed.
\ field2 : 11.466 seconds elapsed.
\ field3 : 2.017 seconds elapsed. ok
> For example,
> : foo constant does> @ bar ;
> foo myname
> At the point CONSTANT executes, we have
> LDA #>val
> LDY #<val
> JMP DOCONSTANT
> Then, when DOES> executes, it deletes all that code and replaces it with
> blah
> JMP DODOES
Except you've thrown away the value of the constant, since it's embedded
in the code. ``LDA #<val; LDY ...'' is sitting where the ``JSR
dodoes'' goes, normally replacing ``JSR doavar''; ``#>val'' is sitting
where the low byte of a variable would be, and some opcode is sitting
where the high byte of a variable would be.
A DOES> that knows that it's this constant that has been used could
indeed extract the value and put it into a cell in the right location.
If CREATE stores the xt of NOP in a ``does-handler'' variable, then
CONSTANT could store the xt of the word that fixes up the CONSTANT
code to put the value in the right place, set the dictionary pointer
to the right place, and then DOES> does its work.
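Crudely, the hook might look like this (all the names here --
DOES-HANDLER, FIX-NOTHING, FIX-CONSTANT -- are made up for
illustration, and the body of FIX-CONSTANT is the system-specific
part, indicated only by a comment):
VARIABLE DOES-HANDLER
: FIX-NOTHING ( -- ) ;
: FIX-CONSTANT ( -- )
   \ system-specific: pull the value back out of the inline LDA/LDY
   \ code, rewind the dictionary pointer, and lay the word down again
   \ as if it had been CREATEd with that value in a normal data field
;
: CREATE   CREATE   ['] FIX-NOTHING  DOES-HANDLER ! ;
: CONSTANT CONSTANT ['] FIX-CONSTANT DOES-HANDLER ! ;
The run-time action compiled by DOES> would then do
DOES-HANDLER @ EXECUTE before its usual patching.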
> > For example,
> > : foo constant does> @ bar ;
> > foo myname
> > At the point CONSTANT executes, we have
> > LDA #>val
> > LDY #<val
> > JMP DOCONSTANT
> > Then, when DOES> executes, it deletes all that code and replaces it with
> > blah
> > JMP DODOES
> Except you've thrown away the value of the constant, since it's embedded
> in the code.
It's scarcely beyond the wit of our highly gifted scientists to get
the value of a constant.
Like this:
: does> (was the last word a constant?) if last @ execute ...
... then rewrite it ...
Andrew.
Well, of course the benchmark doesn't run on VFX, because there are a
number of iForth-specific words in it.
So I did a variation on it:
: FIELD1 ( offset size -- new-offset )
CREATE OVER , +
DOES> ( record -- field-adr ) @ + ;
0
cell FIELD1 .aaa1
cell FIELD1 .bbb1
CONSTANT rrr
: field2 ( offset size -- new-offset )
>r >r : r@ postpone literal postpone + postpone ; r> r> + ;
0
cell FIELD2 .aaa2
cell FIELD2 .bbb2
CONSTANT rrr
CREATE sss here 5 - , 5 ,
: t1 DUP .aaa1 @ SWAP .bbb1 @ + ;
: t2 DUP .aaa2 @ SWAP .bbb2 @ + ;
: foo1+ 0 1000000000 0 do sss t1 + loop . ;
: foo2+ 0 1000000000 0 do sss t2 + loop . ;
: foo1drop 0 1000000000 0 do sss t1 drop loop . ;
: foo2drop 0 1000000000 0 do sss t2 drop loop . ;
: foo1dep sss 1000000000 0 do t1 loop . ;
: foo2dep sss 1000000000 0 do t2 loop . ;
The DROP benchmark is closest to your version, but a sufficiently
smart compiler could optimize it away completely. The others are
versions that introduce data flow from one iteration to the next. In
the + variant, each invocation of T1/T2 is still completely
independent of the previous invocation, only the + depends on the
result of the + in the previous iteration. In the DEP variant, the
input to T1/T2 itself depends on the previous iteration.
I measured these benchmarks with
for i in 1 2; do for j in drop + dep; do /usr/bin/time -f "%U foo$i$j" vfxlin "include field2.frt foo"$i$j" bye" >/dev/null; done; done
for i in 1 2; do for j in drop + dep; do /usr/bin/time -f "%U foo$i$j" iforth-2.1 "1 cells constant cell include field2.frt foo"$i$j" bye" >/dev/null; done; done
On a 2 GHz Opteron 270 the results with vfxlin 4.40 [build 0404] are as
follows:
benchmark field1 field2
drop 4.06 4.44
+ 4.06 4.06
dep 3.53 3.25
On a 3 GHz Xeon 5450 (like a Core 2 Quad) the results with vfxlin 4.40
[build 0404] are as follows:
benchmark field1 field2
drop 2.01 1.69
+ 2.01 1.73
dep 2.90 2.25
Apparently the Opteron has enough underutilized load units so it can
absorb the overhead of the additional fetches, while the Xeon (with
its single load port) benefits from having to perform fewer loads.
On the same 3GHz Xeon 5450 the results with iForth 2.1.2541 are as
follows:
benchmark field1 field2
drop 2.04 7.74
+ 2.04 7.57
dep 2.82 11.29
So VFX is not slow on FIELD1 (it's as fast there as iforth-2.1), and
the speedup of VFX on FIELD2 is genuine, not an artifact of some
compiler limitation.
Just for completeness, here are the results from iforth 2.1.2541 on a
2.26GHz Pentium 4:
benchmark field1 field2
drop 3.67 15.52
+ 3.97 15.07
dep 3.01 13.81
The inner loop of FOO1DEP executes in 6.8 cycles on average and is:
$08167920 mov ecx, ebx                    8BCB
$08167922 add ecx, $081667C0 dword-ptr    030DC0671608
$08167928 add ebx, $081667E0 dword-ptr    031DE0671608
$0816792E mov ecx, [ecx 0 +] dword        8B09
$08167930 add ecx, [ebx 0 +] dword        030B
$08167932 mov ebx, ecx                    8BD9
$08167934 add [ebp 0 +] dword, 1 b#       83450001
$08167938 add [ebp 4 +] dword, 1 b#       83450401
$0816793C jno $08167920 offset NEAR       0F81DEFFFFFF
More than 1 instruction per cycle, not bad for a Pentium 4.
Hmm, looking at the VFX code for FOO1DEP, I see a similar
implementation of LOOP:
( 080BA148 8BD3 ) MOV EDX, EBX
( 080BA14A 031D809D0B08 ) ADD EBX, [080B9D80]
( 080BA150 0315B09D0B08 ) ADD EDX, [080B9DB0]
( 080BA156 8B0A ) MOV ECX, 0 [EDX]
( 080BA158 030B ) ADD ECX, 0 [EBX]
( 080BA15A 83042401 ) ADD [ESP], 01
( 080BA15E 8344240401 ) ADD [ESP+04], 01
( 080BA163 8BD9 ) MOV EBX, ECX
( 080BA165 71E1 ) JNO 080BA148
Some years ago I did an experiment where a memory cell was read,
modified, and written in a tight loop, and the CPUs at that time (IIRC
Pentium 3 and Athlon) took surprisingly long for that (significantly
longer than just adding up the usual latencies given in the books).
Maybe current CPUs are better at this kind of code, but using such a
loop might introduce a lower bound on the results.
So let's do another experiment:
: foo1dep2 sss 1000000000 0 do t1 t1 loop . ;
: foo2dep2 sss 1000000000 0 do t2 t2 loop . ;
: foo1dep3 sss 1000000000 0 do t1 t1 t1 loop . ;
: foo2dep3 sss 1000000000 0 do t2 t2 t2 loop . ;
: foo1dep4 sss 1000000000 0 do t1 t1 t1 t1 loop . ;
: foo2dep4 sss 1000000000 0 do t2 t2 t2 t2 loop . ;
On the 3GHz Xeon 5450 with vfxlin:
benchmark field1 field2
dep2 4.85 3.64
dep3 7.00 5.66
dep4 9.18 7.14
The difference from the foo1dep experiment on vfxlin is:
field1 field2
dep2-dep 1.95 1.39
dep3-dep2 2.15 2.02
dep4-dep3 2.18 1.48
Hmm, some part of this might be explained by the loop overlapping more
than one invocation of t1/t2, but I have no explanation for the
difference between the FIELD2 outcomes of dep3-dep2 and dep4-dep3.
Let's see how the 2GHz Opteron is affected:
benchmark field1 field2
dep 3.54 3.36
dep2 6.04 4.00
dep3 8.56 6.50
dep4 11.04 7.98
The FIELD2 results show a pattern similar to the Xeon, but I still
have no explanation for that.
>m...@iae.nl (Marcel Hendrix) writes:
>>Well, instead of counting meaningless bytes (on desktop machines at least),
>>it might be instructive to really execute some code and actually measure
>>run-times. Attached is a simple benchmark for all the three variants
>>that I found worth saving.
>>On iForth (and I'm sure also VFX) there is no measurable gain to be had
>>from trying to smarten up CREATE .. DOES>.
> Well, of course the benchmark doesn't run on VFX, because there are a
> number of iForth-specific words in it.
That's life.
Here are your tests (at least what was originally intended to be
tested -- constant vs variable field offsets) for an Intel PIV and
for a Core i7 920 at 2.66 GHz. On somewhat less challenging code
(e.g. the mm.frt benchmark) the i7 is twice as fast as the PIV.
On a 3 GHz PIV the results with iForth32 4.0 are as follows:
benchmark field1 field2
foo + : 2044 2014
foo drop : 2022 2011
foo dep : 2044 2018
On a 3 GHz PIV the results with iForth32 4.0 are as follows:
benchmark field1 field2
dep2 : 3376 6724
dep3 : 4722 8752
dep4 : 6188 12429 ok
On a 2.66 GHz Core i7 920 the results with iForth64 4.0 are as follows:
benchmark field1 field2
foo + : 1808 1788
foo drop : 1825 1428
foo dep : 2671 2128
On a 2.66 GHz Core i7 920 the results with iForth64 4.0 are as follows:
benchmark field1 field2
dep2 : 4751 8314
dep3 : 7422 13061
dep4 : 9498 16621 ok
The i7 is in pretty bad shape here. Your dep tests unearth a new class
of problems for this CPU variant. The only nice thing is that there was
absolutely no difference between 32 and 64-bit runtimes.
-marcel
-- --------------
ANEW -fields3
#1000000000 \ DROP #10000000
CONSTANT #iters
: .MS ( -- ) MS? 5 U.R 2 SPACES ;
: FIELD1 ( offset size -- new-offset )
CREATE OVER , +
DOES> ( record -- field-adr ) @ + ;
0
cell FIELD1 .aaa1
cell FIELD1 .bbb1
CONSTANT rrr
0 [IF]
: field2 ( offset size -- new-offset )
>r >r : r@ postpone literal postpone + postpone ; r> r> + ;
[ELSE]
: FIELD2 ( offset size -- new-offset )
>R >R : R@ (H.) S" + ; " $+ EVALUATE
R> R> + ;
[THEN]
0
cell FIELD2 .aaa2
cell FIELD2 .bbb2
CONSTANT rrr
CREATE sss here 5 - , 5 ,
#4096 CELLS ALLOT
: t1 DUP .aaa1 @ SWAP .bbb1 @ + ;
: t2 DUP .aaa2 @ SWAP .bbb2 @ + ;
: foo1+ CR ." foo + : " TIMER-RESET 0 #iters 0 do sss t1 + loop DROP .MS ;
: foo2+ TIMER-RESET 0 #iters 0 do sss t2 + loop DROP .MS ;
: foo1drop CR ." foo drop : " TIMER-RESET 0 #iters 0 do sss t1 drop loop DROP .MS ;
: foo2drop TIMER-RESET 0 #iters 0 do sss t2 drop loop DROP .MS ;
: foo1dep CR ." foo dep : " TIMER-RESET sss #iters 0 do t1 loop DROP .MS ;
: foo2dep TIMER-RESET sss #iters 0 do t2 loop DROP .MS ;
64BIT? 0= [IF]
CR .( On a 3 GHz PIV the results with iForth32 4.0 are as follows: )
CR
CR .( benchmark field1 field2 )
foo1+ foo2+
foo1drop foo2drop
foo1dep foo2dep
[ELSE]
CR .( On a 2.66 GHz Core i7 920 the results with iForth64 4.0 are as follows: )
CR
CR .( benchmark field1 field2 )
foo1+ foo2+
foo1drop foo2drop
foo1dep foo2dep
[THEN]
\ another experiment:
: foo1dep2 CR ." dep2 : " TIMER-RESET sss #iters 0 do t1 t1 loop DROP .MS ;
: foo2dep2 sss #iters 0 do t2 t2 loop DROP .MS ;
: foo1dep3 CR ." dep3 : " TIMER-RESET sss #iters 0 do t1 t1 t1 loop DROP .MS ;
: foo2dep3 sss #iters 0 do t2 t2 t2 loop DROP .MS ;
: foo1dep4 CR ." dep4 : " TIMER-RESET sss #iters 0 do t1 t1 t1 t1 loop DROP .MS ;
: foo2dep4 sss #iters 0 do t2 t2 t2 t2 loop DROP .MS ;
64BIT? 0= [IF]
CR
CR .( On a 3 GHz PIV the results with iForth32 4.0 are as follows: )
CR
CR .( benchmark field1 field2 )
foo1dep2 foo2dep2
foo1dep3 foo2dep3
foo1dep4 foo2dep4
[ELSE]
CR
CR .( On a 2.66 GHz Core i7 920 the results with iForth64 4.0 are as follows: )
CR
CR .( benchmark field1 field2 )
foo1dep2 foo2dep2
foo1dep3 foo2dep3
foo1dep4 foo2dep4
[THEN]
> Like this:
> : does> (was the last word a constant?) if last @ execute ...
> ... then rewrite it ...
Now that's what I said, wasn't it? "A DOES> that knows that it's this
constant that has been used could indeed extract the value and put it
into a cell in the right location."
... in response to the advice that "But as soon as we hit DOES> , we
can just back up the dictionary pointer and overwrite that code,"
which, given the above, the highly gifted scientists wisely ignore,
first checking whether just backing up would throw away information,
and then retrieving that information.