
About stack access profundity


Pablo Hugo Reda

Jan 27, 2013, 3:11:54 PM


Of two definitions that do the same thing, the one that reaches less deeply into the stack is more optimal: it needs fewer registers when compiled, and so on. So it is convenient to write definitions with shallow stack access.
I chose to limit the word PICK and make it static: I have only PICK2, PICK3, and PICK4, and no more.
Well, until now I have not needed more, but now I have found a word that needs deeper PICKs.

I reimplemented the Quadratic Bezier Curve word and am now trying to draw Cubic Bezier Curves.
I use recursion, only integers, and just add, shift, and compare.

Because I use recursion, I need 6 parameters (3 2D points), and the routine accesses positions up to 8 deep (I would need a PICK8 definition).

I have thought about workarounds for the problem:

If I compress each 2D vector to use one stack cell, the word only needs PICK4 and works in the current system!
But this adds the time to change representation (2 cells to 1 cell..) or extract it (address to x y..).
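A minimal sketch of that packing idea in standard Forth (the word names are only illustrative, and it assumes non-negative coordinates below 65536 and cells of at least 32 bits):

: xy>p ( x y -- p )  16 lshift  swap 65535 and  or ;   \ y in the high half, x in the low half
: p>xy ( p -- x y )  dup 65535 and  swap 16 rshift ;   \ unpack in the same order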

With PICK8, however, the solution is more direct.

I need advice..
thanks

humptydumpty

Jan 28, 2013, 3:59:47 AM
Hi!
Could you split the P(X,Y) problem into a problem for X and a problem
for Y and solve them one at a time?
Have a nice day,
humptydumpty

Andrew Haley

Jan 28, 2013, 4:10:26 AM
I think the mistake you're making is keeping everything on the stack.
As you say, objects deep on the stack are hard to access.

Instead, use a structure with named offsets for your co-ordinates and
vectors. You'll find the code is much easier to read and write.
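A minimal sketch of the idea, using the Forth 2012 structure words (the accessor names are only illustrative):

begin-structure point
  field: p.x
  field: p.y
end-structure

: point! ( x y pt -- )  tuck p.y !  p.x ! ;      \ store both coordinates
: point@ ( pt -- x y )  dup p.x @  swap p.y @ ;  \ fetch both coordinates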

Andrew.

Mark Wills

Jan 28, 2013, 4:56:30 AM
Pass pointers, rather than the parameters directly?

Pablo Hugo Reda

Jan 28, 2013, 10:17:52 AM
> Hi!
> Could you split the P(X,Y) problem into a problem for X and a problem
> for Y and solve them one at a time?

You mean...
I have X Y X Y X Y and transform it to X X X Y Y Y. OK, that works until I need the distance to cut off the recursion.

thanks

Pablo Hugo Reda

Jan 28, 2013, 10:23:53 AM
> I think the mistake you're making is keeping everything on the stack.
> As you say, objects deep on the stack are hard to access.
>
> Instead, use a structure with named offsets for your co-ordinates and
> vectors. You'll find the code is much easier to read and write.

The stack is the fastest access, and I need it here because the recursion uses this structure.


> Pass pointers, rather than the parameters directly?

But with addresses I need @ and ! to load and store the values.

When the compiler optimizes, the best place is the stack.


Look at the quadratic Bezier:
-------------------------------------------------------------------
:sp-dist | x y xe ye -- x y xe ye dd
pick3 pick2 - abs pick3 pick2 - abs + ;

:sp-c | fx fy cx cy -- fx fy cx cy xn yn ; xn=(cx*2+fx+px)/4
pick3 pick2 2* + px + 2 >>
pick3 pick2 2* + py + 2 >> ;

:spl | fx fy cx cy --
sp-c sp-dist
4 <? ( drop line 2drop line ; ) drop
>r >r
pick3 pick2 + 2/ pick3 pick2 + 2/ | fx fy cx cy c2 c2
2swap | fx fy c2 c2 cx cy
py + 2/ swap px + 2/ swap | fx fy c2 c2 c1 c1
r> r> 2swap
spl spl ;

-------------------------------------------------------------------

Andrew Haley

Jan 28, 2013, 11:46:47 AM
Pablo Hugo Reda <pabl...@gmail.com> wrote:
>>
>> I think the mistake you're making is keeping everything on the stack.
>> As you say, objects deep on the stack are [hard] to access.
>>
>> Instead, use a structure with named offsets for your co-ordinates and
>> vectors. You'll find the code is much easier to read and write.
>
> the stack if the more fast access and I need here because the
> recursion use this estructure
>
>> Pass pointers, rather than the parameters directly?
>
> but with address need a @ and ! to load store values.
> when the compiler optimice the best place is the stack

That really isn't a good enough reason. You'll end up with very
hard-to-maintain code, like this:

> look the cuadric bezier
> -------------------------------------------------------------------
> :sp-dist | x y xe ye -- x y xe ye dd
> pick3 pick2 - abs pick3 pick2 - abs + ;
>
> :sp-c | fx fy cx cy -- fx fy cx cy xn yn ; xn=(cx*2+fx+px)/4
> pick3 pick2 2* + px + 2 >>
> pick3 pick2 2* + py + 2 >> ;
>
> :spl | fx fy cx cy --
> sp-c sp-dist
> 4 <? ( drop line 2drop line ; ) drop
> >r >r
> pick3 pick2 + 2/ pick3 pick2 + 2/ | fx fy cx cy c2 c2
> 2swap | fx fy c2 c2 cx cy
> py + 2/ swap px + 2/ swap | fx fy c2 c2 c1 c1
> r> r> 2swap
> spl spl ;
>
> -------------------------------------------------------------------

Experienced Forthers don't use PICK much, and almost never ROLL. I think
you need to concentrate on making things easy to read and write.
Optimize later, once you have working code.

In addition, there is no reason that a decent optimizer will produce
much worse code for @ than for PICK .

Andrew.

Anton Ertl

Jan 29, 2013, 3:20:47 AM
There's a good reason why decent compilers will produce worse code for
@ than for PICK. Whether the worseness is "much" or not can be
debated forever without resolution, so let's not go there.

We have a Forth language feature that makes it relatively easy for
compilers to produce as fast native code as when using PICK: locals.
I am not sure if there are Forth compilers yet that are up to that.
There are quite a number of locals-haters around here, some arguing
that locals are foreign (as in "from another programming language"),
some that they make it possible to get away with badly factored code.

For this application, the badly-factored code argument may be
applicable, but have we seen better-factored code? If we have
better-factored code that produces slower native code, we could think
about whether some new language feature might reconcile good factoring
with good native code.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2012: http://www.euroforth.org/ef12/

Mark Wills

Jan 29, 2013, 3:57:09 AM
This code is crying out for locals! Use locals. All your PICKs will go
away :-)
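A minimal sketch with standard Forth 2012 locals (a generic distance word, not Pablo's dialect): the inputs stay available by name and are simply pushed back for the caller, so no PICK is needed.

: manhattan ( x y xe ye -- x y xe ye dd )
  {: x y xe ye :}
  x y xe ye
  xe x - abs  ye y - abs  + ;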

Andrew Haley

Jan 29, 2013, 4:25:18 AM
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>Pablo Hugo Reda <pabl...@gmail.com> wrote:
>>In addition, there is no reason that a decent optimizer will produce
>>much worse code for @ than for PICK .
>
> There's a good reason why decent compilers will produce worse code for
> @ than for PICK. Whether the worseness is "much" or not can be
> debated forever without resolution, so let's not go there.

OK, so let's not mention it then, whatever it may be.

> We have a Forth language feature that makes it relatively easy for
> compilers to produce as fast native code as when using PICK: locals.
> I am not sure if there are Forth compilers yet that are up to that.
> There are quite a number of locals-haters around here, some arguing
> that locals are foreign (as in "from another programming language"),
> some that they make it possible to get away with badly factored code.
>
> For this application, the badly-factored code argument may be
> applicable, but have we seen better-factored code? If we have
> better-factored code that produces slower native code, we could think
> about whether some new language feature might reconcile good factoring
> with good native code.

I suspect that it's not to do with language features but code
generators.

Andrew.

Anton Ertl

Jan 29, 2013, 4:47:48 AM
Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>> For this application, the badly-factored code argument may be
>> applicable, but have we seen better-factored code? If we have
>> better-factored code that produces slower native code, we could think
>> about whether some new language feature might reconcile good factoring
>> with good native code.
>
>I suspect that it's not to do with language features but code
>generators.

Please elaborate on that. What better-factored code do you have in
mind? And how do code generators help?

Andrew Haley

Jan 29, 2013, 5:08:41 AM
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>> For this application, the badly-factored code argument may be
>>> applicable, but have we seen better-factored code? If we have
>>> better-factored code that produces slower native code, we could think
>>> about whether some new language feature might reconcile good factoring
>>> with good native code.
>>
>>I suspect that it's not to do with language features but code
>>generators.
>
> Please elaborate on that. What better-factored code do you have in
> mind? And how do code generators help?

You said "better-factored code that produces slower native code." My
response is that if better-factored code produces slower native code,
it's not necessarily a language issue. Maybe a more elaborate (or
just better) compiler is all that's needed.

Andrew.

Stephen Pelc

Jan 29, 2013, 5:08:22 AM
On Tue, 29 Jan 2013 08:20:47 GMT, an...@mips.complang.tuwien.ac.at
(Anton Ertl) wrote:

>There's a good reason why decent compilers will produce worse code for
>@ than for PICK. Whether the worseness is "much" or not can be
>debated forever without resolution, so let's not go there.

Please explain this.

Stephen

--
Stephen Pelc, steph...@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads

Pablo Hugo Reda

Jan 29, 2013, 10:10:28 AM
to steph...@invalid.mpeforth.com
> >There's a good reason why decent compilers will produce worse code for
> >@ than for PICK. Whether the worseness is "much" or not can be
> >debated forever without resolution, so let's not go there.

Stack operations disappear in the compiled code (in my implementation),
but memory accesses do not (for now).

Anton Ertl

Jan 29, 2013, 10:17:18 AM
steph...@mpeforth.com (Stephen Pelc) writes:
>On Tue, 29 Jan 2013 08:20:47 GMT, an...@mips.complang.tuwien.ac.at
>(Anton Ertl) wrote:
>
>>There's a good reason why decent compilers will produce worse code for
>>@ than for PICK. Whether the worseness is "much" or not can be
>>debated forever without resolution, so let's not go there.
>
>Please explain this.

A decent compiler will produce a load-from-memory for a @, while stack
items can be kept in registers and thus don't require a load (PICK and
ROLL with non-constant u are an exception, but that's not what the OP
did).
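A source-level sketch of the contrast (hypothetical words; the data is the same three cells, once on the stack and once in memory):

: third-stack ( a b c -- a b c a )  2 pick ;    \ constant index: can stay in registers
: third-mem   ( addr -- x )  2 cells + @ ;      \ the @ becomes a load from memory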

Anton Ertl

Jan 29, 2013, 10:21:33 AM
That depends on the way in which the code is factored. The one that I
had in mind was your suggestion of storing the stuff in memory and
passing addresses; not sure if it results in better-factored code, but
a decent compiler will produce slower code for that than for a version
using constant PICKs.

Anton Ertl

Jan 29, 2013, 10:30:37 AM
I published a paper that discusses various ways of dealing with
having too much data at once.

@InProceedings{ertl11euroforth,
author = {M. Anton Ertl},
title = {Ways to Reduce the Stack Depth},
crossref = {euroforth11},
pages = {36--41},
url = {http://www.complang.tuwien.ac.at/papers/ertl11euroforth.ps.gz},
url2 = {http://www.complang.tuwien.ac.at/anton/euroforth/ef11/papers/ertl.pdf},
OPTnote = {not refereed},
abstract = {Having to deal with many different data can lead to
problems in Forth: The data stack is the preferred
place to store data; on the other hand, dealing with
too many data stack items is cumbersome and usually
bad style. This paper presents and discusses ways to
unburden the data stack; some of them are used
widely, others are almost unknown or new.}

Andrew Haley

Jan 29, 2013, 10:57:48 AM
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>>>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>>>> For this application, the badly-factored code argument may be
>>>>> applicable, but have we seen better-factored code? If we have
>>>>> better-factored code that produces slower native code, we could think
>>>>> about whether some new language feature might reconcile good factoring
>>>>> with good native code.
>>>>
>>>>I suspect that it's not to do with language features but code
>>>>generators.
>>>
>>> Please elaborate on that. What better-factored code do you have in
>>> mind? And how do code generators help?
>>
>>You said "better-factored code that produces slower native code." My
>>response is that if better-factored code produces slower native code,
>>it's not necessarily a language issue. Maybe a more elaborate (or
>>just better) compiler is all that's needed.
>
> That depends on the way in which the code is factored. The one that I
> had in mind was your suggestion of storing the stuff in memory and
> passing addresses; not sure if it results in better-factored code, but
> a decent compiler will produce slower code for that than for a version
> using constant PICKs.

Maybe. I guess you're assuming that a decent compiler will never
hoist memory references into registers but will convert stack accesses
(even via PICK) into register moves rather than memory references.

If so, I'm going to suggest that adopting a coding style based on the
advantages and disadvantages of a particular style of Forth compiler
isn't really appropriate, and it's certainly not appropriate to add
new language features to work around such compilers.

Andrew.

Pablo Hugo Reda

Jan 29, 2013, 12:06:53 PM
Thanks, Anton

reading...

Stephen Pelc

Jan 29, 2013, 12:06:37 PM
On Tue, 29 Jan 2013 15:21:33 GMT, an...@mips.complang.tuwien.ac.at
(Anton Ertl) wrote:

>That depends on the way in which the code is factored. The one that I
>had in mind was your suggestion of storing the stuff in memory and
>passing addresses; not sure if it results in better-factored code, but
>a decent compiler will produce slower code for that than for a version
>using constant PICKs.

Nearly all PICKs that I have seen have used a literal index.

What happens is that PICKs produce fetches indexed from the data
stack pointer, and (assuming that the code generator gets the
structure address into a register) the structure is accessed as
offsets from a base register. If an item on the stack is DUPped
or OVERred, it's usually a big hint to a compiler that the item
should be in a register.

When we rewrote our PowerView embedded GUI to pass structures rather
than keep graphics coordinates on the stack, the code (for ARM
and Cortex) became shorter and faster. In our experience, your
assertion does not hold because the use of structures considerably
reduces the stack traffic.

I'm prepared to be convinced that there are conditions under which
our experience does not hold.

Anton Ertl

Jan 29, 2013, 12:01:56 PM
Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>Maybe. I guess you're assuming that a decent compiler will never
>hoist memory references into registers but will convert stack accesses
>(even via PICK) into register moves rather than memory references.

Yes.

>If so, I'm going to suggest that adopting a coding style based on the
>advantages and disadvantages of a particular style of Forth compiler
>isn't really appropriate,

This assuption is not based on a particular style of Forth compiler,
but on pretty fundamental issues:

In typical Forth code it is relatively easy to map stack items to
registers (there are exceptions, but they are rare and do not occur in
the case in point).

In contrast, many memory accesses cannot be turned into register
accesses, even with compiler heroics like alias analysis (on which
hundreds of papers have been written) or perversities like GCC's
strict aliasing.

Coming back to your statement, how would you suggest that words where
performance matters should be written?

>and it's certainly not appropriate to add
>new language features to work around such compilers.

It's certainly appropriate to add language features that allow
compilers to provide predictable performance, especially in a
low-level language like Forth.

Anton Ertl

Jan 29, 2013, 12:58:00 PM
steph...@mpeforth.com (Stephen Pelc) writes:
>What happens is that PICKs produce fetches indexed from the data
>stack pointer,

VFX is better than you give it credit for:

variable A
variable B
variable C
: bla A @ B @ 1 pick + swap drop C ! ;
see bla

shows

( 080BF3A0 8B153C240A08 ) MOV EDX, [080A243C]
( 080BF3A6 031540240A08 ) ADD EDX, [080A2440]
( 080BF3AC 891544240A08 ) MOV [080A2444], EDX
( 080BF3B2 C3 ) NEXT,

i.e., no fetch indexed from the data stack pointer (no reference to
the data stack at all). VFX recognizes that PICK accesses a stack
element in a register and optimizes it away.

>When we rewrote our PowerView embedded GUI to pass structures rather
>than keep graphics coordinates on the stack, the code (for ARM
>and Cortex) became shorter and faster. In our experience, your
>assertion does not hold because the use of structures considerably
>reduces the stack traffic.

I would have to look at the concrete code (before and after the
change) to give a proper comment on that. But if all that happens is
that you replace a "5 PICK" (which should produce a register reference
or, if there are not enough registers, a memory reference to the
memory part of the stack) with something like "DUP .X @", "OVER .X @",
"R@ .X @" (which should produce at least one memory reference, for the
@), the use of structures should not be shorter and faster.

I think that the limited scope of VFX's register allocator reduces the
benefit of stack references, but they still should not hurt.

Andrew Haley

Jan 29, 2013, 1:22:40 PM
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>Maybe. I guess you're assuming that a decent compiler will never
>>hoist memory references into registers but will convert stack accesses
>>(even via PICK) into register moves rather than memory references.
>
> Yes.
>
>>If so, I'm going to suggest that adopting a coding style based on the
>>advantages and disadvantages of a particular style of Forth compiler
>>isn't really appropriate,
>
> This assuption is not based on a particular style of Forth compiler,
> but on pretty fundamental issues:
>
> In typical Forth code it is relatively easy to map stack items to
> registers (there are exceptions, but they are rare and do not occur in
> the case in point).

I take your point. However, that is assuming a particular style of
Forth implementation. It certainly doesn't apply to traditional
threaded code or even to simple native code Forths; in some systems it
might well be the case that a fetch from a field is faster than a
PICK. We don't know.

> In contrast, many memory accesses cannot be turned into register
> accesses, even with compiler heroics like alias analysis (on which
> hundreds of papers have been written) or perversities like GCC's
> strict aliasing.

Eh? GCC's strict aliasing is type-based alias analysis, like many C
compilers. There's nothing special about it AFAIK.

> Coming back to your statement, how would you suggest that words where
> performance matters should be written?

If it's a non-portable problem, and I suggest it may well be, it can
be solved with non-portable words. Stephen's posting suggests very
strongly that it's not universally a problem with native code
generators.

Dreaming for a moment: there may, I suppose, be some mileage in a hint
to a compiler that some memory references really don't alias to
anything else.

>>and it's certainly not appropriate to add new language features to
>>work around such compilers.

> It's certainly appropriate to add language features that allow
> compilers to provide predictable performance, especially in a
> low-level language like Forth.

Sure, but it's not worth distorting Forth source to do it. As usual,
IMO, YMMV, etc.

Andrew.

Andrew Haley

Jan 29, 2013, 1:29:09 PM
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> steph...@mpeforth.com (Stephen Pelc) writes:
>
>>When we rewrote our PowerView embedded GUI to pass structures rather
>>than keep graphics coordinates on the stack, the code (for ARM
>>and Cortex) became shorter and faster. In our experience, your
>>assertion does not hold because the use of structures considerably
>>reduces the stack traffic.
>
> I would have to look at the concrete code (before and after the
> change) to give a proper comment on that. But if all that happens is
> that you replace a "5 PICK" (which should produce a register reference
> or, if there are not enough registers, a memory reference to the
> memory part of the stack) with something like "DUP .X @", "OVER .X @",
> "R@ .X @" (which should produce at least one memory reference, for the
> @), the use of structures should not be shorter and faster.

It's unlikely to be just "5 PICK", though. There will be writes too,
and that's either POKE (aargh) or lots of stack thrashing to get the
data into position.

Andrew.

Bernd Paysan

Jan 29, 2013, 2:17:54 PM
Anton Ertl wrote:
> I would have to look at the concrete code (before and after the
> change) to give a proper comment on that. But if all that happens is
> that you replace a "5 PICK" (which should produce a register reference
> or, if there are not enough registers, a memory reference to the
> memory part of the stack) with something like "DUP .X @", "OVER .X @",
> "R@ .X @" (which should produce at least one memory reference, for the
> @), the use of structures should not be shorter and faster.

Not convinced. VFX doesn't do that too well:

begin-structure point ok-2
field: .x ok-2
field: .y ok-2
end-structure ok
: test ( addr -- ) >r r@ .x @ r@ .y @ 2dup + r@ .x ! - r> .y ! ; ok
see test
TEST
( 080BC940 53 ) PUSH EBX
( 080BC941 8B1424 ) MOV EDX, [ESP]
( 080BC944 8B4A04 ) MOV ECX, [EDX+04]
( 080BC947 030B ) ADD ECX, 0 [EBX]
( 080BC949 8B0424 ) MOV EAX, [ESP]
( 080BC94C 8D6DF4 ) LEA EBP, [EBP+-0C]
( 080BC94F 894D00 ) MOV [EBP], ECX
( 080BC952 8B4A04 ) MOV ECX, [EDX+04]
( 080BC955 894D04 ) MOV [EBP+04], ECX
( 080BC958 8B13 ) MOV EDX, 0 [EBX]
( 080BC95A 895508 ) MOV [EBP+08], EDX
( 080BC95D 8BD8 ) MOV EBX, EAX
( 080BC95F 8B5500 ) MOV EDX, [EBP]
( 080BC962 8913 ) MOV 0 [EBX], EDX
( 080BC964 8B5D08 ) MOV EBX, [EBP+08]
( 080BC967 2B5D04 ) SUB EBX, [EBP+04]
( 080BC96A 5A ) POP EDX
( 080BC96B 895A04 ) MOV [EDX+04], EBX
( 080BC96E 8B5D0C ) MOV EBX, [EBP+0C]
( 080BC971 8D6D10 ) LEA EBP, [EBP+10]
( 080BC974 C3 ) NEXT,
( 53 bytes, 21 instructions )

That's 21 instructions, clearly not what I would have written by hand.

Compare that to bigForth, using the current object pointer OOP:

debugging class point ok
cell var .x ok
cell var .y ok
how: ok
public: : test >o .x @ .y @ 2dup + .x ! - .y ! o> ; ok
disw test Adresse : 268670656
100396C0: push EDI 57
100396C1: mov EDI,EAX 8BF8
100396C3: lodsd AD
100396C4: xchg ESP,ESI 87F4
100396C6: push EAX 50
100396C7: push DWORD PTR $04[EDI]
FF7704
100396CA: mov EAX,$08[EDI] 8B4708
100396CD: mov EDX,[ESP] 8B1424
100396D0: push EAX 50
100396D1: add EAX,EDX 03C2
100396D3: push EAX 50
100396D4: pop DWORD PTR $04[EDI]
8F4704
100396D7: pop EAX 58
100396D8: pop EDX 5A
100396D9: xchg EDX,EAX 92
100396DA: sub EAX,EDX 2BC2
100396DC: push EAX 50
100396DD: pop DWORD PTR $08[EDI]
8F4708
100396E0: pop EAX 58
100396E1: xchg ESP,ESI 87F4
100396E3: pop EDI 5F
100396E4: ret C3

22 instructions, room for improvement, because bigForth isn't an
analytical compiler. What I would expect is that apart from the struct
memory accesses, everything would fit into the registers.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/

Marcel Hendrix

Jan 29, 2013, 2:44:40 PM
Bernd Paysan <bernd....@gmx.de> writes Re: About stack access profundity

> Anton Ertl wrote:
>> I would have to look at the concrete code (before and after the
>> change) to give a proper comment on that. But if all that happens is
>> that you replace a "5 PICK" (which should produce a register reference
>> or, if there are not enough registers, a memory reference to the
>> memory part of the stack) with something like "DUP .X @", "OVER .X @",
>> "R@ .X @" (which should produce at least one memory reference, for the
>> @), the use of structures should not be shorter and faster.

> Not convinced. VFX doesn't do that too well:
[..]

iForth64 ...

FORTH> begin-structure point ok
[3]FORTH> field: .x ok
[3]FORTH> field: .y ok
[3]FORTH> end-structure ok
FORTH> ok
FORTH> : test ( addr -- ) >r r@ .x @ r@ .y @ 2dup + r@ .x ! - r> .y ! ; ok
FORTH> see test
Flags: TOKENIZE, ANSI
: test >R R@ .x @ R@ .y @ 2DUP + R@ .x ! - R> .y ! ; ok
FORTH> ' test idis
$01404300 : [trashed]
$0140430A pop rbx
$0140430B mov rdi, [rbx] qword
$0140430E add rdi, [rbx 8 +] qword
$01404312 mov rax, [rbx] qword
$01404315 mov rdx, [rbx 8 +] qword
$01404319 mov [rbx] qword, rdi
$0140431C sub rax, rdx
$0140431F mov [rbx 8 +] qword, rax
$01404323 ;
FORTH> create ape 2 cells allot ok
FORTH> : tt ape test ; ok
FORTH> see tt
Flags: TOKENIZE, ANSI
: tt ape test ; ok
FORTH> ' tt idis
$01404BC0 : [trashed]
$01404BCA mov rbx, $01404780 qword-offset
$01404BD1 add rbx, $01404788 qword-offset
$01404BD8 mov rdi, $01404780 qword-offset
$01404BDF mov rax, $01404788 qword-offset
$01404BE6 mov $01404780 qword-offset, rbx
$01404BED sub rdi, rax
$01404BF0 mov $01404788 qword-offset, rdi
$01404BF7 ;

-marcel

Bernd Paysan

Jan 29, 2013, 4:14:45 PM
Marcel Hendrix wrote:
> iForth64 ...

Great! That's pretty close to what I would write by hand.

Hand-code (let's assume rax is tos):

mov rbx, [rax]
mov rcx, [rax+8]
lea rdx, [rbx+rcx]
sub rbx, rcx
mov [rax], rdx
mov [rax+8], rbx

Approach: Don't load values twice, though on x86, you have quite a lot
of load units. Use lea for add when you need a three operand add.

Not sure why your compiler generates

mov rax, [rbx] qword
mov rdx, [rbx 8 +] qword
sub rax, rdx

instead of

mov rax, [rbx] qword
sub rax, [rbx 8 +] qword

Mark Wills

Jan 30, 2013, 2:12:29 AM
On Jan 29, 5:58 pm, an...@mips.complang.tuwien.ac.at (Anton Ertl)
wrote:
Exactly the point I was making on another thread. An obsessive
programmer might second-guess the compiler and, instead of writing
plain, vanilla, straight-ahead code, come up with something that he
*perceives* to be faster, but that is instead slower because the
optimiser can't identify what the programmer's real intention was.

Unless you are using a dumb ITC compiler, just write the code.

Anton Ertl

Jan 30, 2013, 11:29:42 AM
Bernd Paysan <bernd....@gmx.de> writes:
>Anton Ertl wrote:
>> I would have to look at the concrete code (before and after the
>> change) to give a proper comment on that. But if all that happens is
>> that you replace a "5 PICK" (which should produce a register reference
>> or, if there are not enough registers, a memory reference to the
>> memory part of the stack) with something like "DUP .X @", "OVER .X @",
>> "R@ .X @" (which should produce at least one memory reference, for the
>> @), the use of structures should not be shorter and faster.
>
>Not convinced.

Of what are you not convinced?
Yes, so VFX is not as great as we might like, but that does not tell
us anything about whether it does better for PICKing or for @ing code.

But let's try it. I wanted to use the original example
<61d42b93-0ac8-4df3...@googlegroups.com> for this, but
it's unclear to me what it does, in particular the line

4 <? ( drop line 2drop line ; ) drop

and PX and PY.

So I fell back to the good old rectangle example:

begin-structure point
field: point-x
field: point-y
end-structure

defer line ( p1 p2 -- )
defer make-point ( x y -- p )
defer free-point ( p -- )

: line-line ( p1 p2 p3 -- )
\ draw a line between p1 and p2
\ and one between p2 and p3
over line line ;

: rect-mem ( ll ur -- )
over point-x @ over point-y @ make-point
( ll ur ul )
>r 2dup r@ swap line-line r> free-point
over point-y @ over point-x @ swap make-point
( ll ur lr )
>r 2dup r@ swap line-line r> free-point ;

defer line-stack ( x1 y1 x2 y2 -- )

: rect-local {: x1 y1 x2 y2 -- :}
x1 y1 x1 y2 line-stack
x1 y2 x2 y2 line-stack
x2 y2 x2 y1 line-stack
x2 y1 x1 y1 line-stack ;

: rect-stack ( x1 y1 x2 y2 -- )
3 pick 3 pick over 3 pick line-stack
3 pick over 3 pick over line-stack
over over over 3 pick line-stack
over 3 pick 3 pick 3 pick line-stack
2drop 2drop ;

see rect-mem
see rect-local
see rect-stack

and SEEing the results showed:

RECT-MEM
...
( 147 bytes, 46 instructions )

RECT-LOCAL
...
( 166 bytes, 54 instructions )

RECT-STACK
...
( 121 bytes, 37 instructions )

If you want a rect-mem2 that calls line-stack and is thus closer to
rect-stack and rect-local in what it does internally, feel free to
post it and I'll run it through VFX.

Anton Ertl

Jan 30, 2013, 12:22:57 PM
Well, in Pablo Reda's code, and that's what we are talking about,
there were only PICKs (for the memory variant, @s), no STICKs/POKEs,
or ROLLs.

>and that's either POKE (aargh) or lots of stack thrashing to get the
>data into position.

Yes, if Stephen's code did that, that might explain the larger code.

Pablo Hugo Reda

Jan 30, 2013, 12:58:00 PM

> but
> it's unclear to me what it does, in particular the line
>
> 4 <? ( drop line 2drop line ; ) drop
>
> and PX and PY.

Sorry, I have a Forth dialect, not ANS Forth.
I take the ideas from colorForth: I don't use STATE, don't use DOES>, etc.


PX and PY are variables holding the initial point for the curve; LINE draws a line and updates PX and PY.

4 <? ( ..

goes inside the ( .. ) block when the top of stack is < 4 (and consumes the 4).

Here is the code generated by the compiler, in FASM syntax.
I removed the comments because with them this runs to 200 lines.
"uso" is the stack depth and "dD" is the stack variation.
------------------------------------------------------------
w24: ; ::: sp-dist ::: uso:-4 dD:1
mov ebx,dword [esi+8]
sub ebx,dword [esi]
mov edx,ebx
sar edx,31
add ebx,edx
xor ebx,edx
mov ecx,dword [esi+4]
sub ecx,eax
mov edx,ecx
sar edx,31
add ecx,edx
xor ecx,edx
add ebx,ecx
lea esi,[esi-4]
mov dword [esi],eax
mov eax,ebx
ret

w25: ; ::: sp-c ::: uso:-4 dD:2
mov ebx,dword [esi]
sal ebx,1
mov ecx,dword [esi+8]
add ecx,ebx
add ecx,dword [w9]
sar ecx,$2
mov ebx,eax
sal ebx,1
mov edx,dword [esi+4]
add edx,ebx
add edx,dword [wA]
sar edx,$2
lea esi,[esi-8]
mov dword [esi+4],eax
mov dword [esi],ecx
mov eax,edx
ret

w26: ; ::: spl ::: uso:-4 dD:-4
call w25
call w24
cmp eax,$4
jge _46
lea esi,[esi+4]
mov eax,dword [esi-4]
call w23
lea esi,[esi+8]
mov eax,dword [esi-4]
jmp w23
_46:
push dword [esi]
push dword [esi+4]
mov eax,dword [esi+20]
add eax,dword [esi+12]
sar eax,1
mov ebx,dword [esi+16]
add ebx,dword [esi+8]
sar ebx,1
mov ecx,dword [esi+8]
add ecx,dword [wA]
sar ecx,1
mov edx,dword [esi+12]
add edx,dword [w9]
sar edx,1
pop edi
mov [esi+8],eax
pop eax
lea esi,[esi-4]
xchg dword [esi+12],ebx
mov dword [esi+8],edi
mov dword [esi+4],eax
mov dword [esi],edx
mov dword [esi+16],ebx
mov eax,ecx
call w26
jmp w26
----------------------------------------------------

Sure, Hendrix's and the other compilers are better, but I am not finished; I have more ideas for improvement.

I don't have the solution yet, but thanks to everyone for the comments.



Anton Ertl

Jan 30, 2013, 12:40:13 PM
Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>> This assuption is not based on a particular style of Forth compiler,
>> but on pretty fundamental issues:
>>
>> In typical Forth code it is relatively easy to map stack items to
>> registers (there are exceptions, but they are rare and do not occur in
>> the case in point).
>
>I take your point. However, that is assuming a particular style of
>Forth implementation.

No, it's a statement about what is possible in an implementation.

> It certainly doesn't apply to traditional
>threaded code or even to simple native code Forths;

These don't make use of that property, then. Does it matter? If a
user wants high performance, will he use such a system? Probably not.
I see little point in optimizing for systems that are intentionally
slow, except if that is your particular target platform (but then we
are into platform-specific, not general optimizations).

>in some systems it
>might well be the case that a fetch from a field is faster than a
>PICK. We don't know.

Sure, it's possible to write a PICK that's so slow that this is true.
But why would you optimize for such an intentionally-slow system?

>> In contrast, many memory accesses cannot be turned into register
>> accesses, even with compiler heroics like alias analysis (on which
>> hundreds of papers have been written) or perversities like GCC's
>> strict aliasing.
>
>Eh? GCC's strict aliasing is type-based alias analysis, like many C
>compilers. There's nothing special about it AFAIK.

Just because there are other perverts does not make it any less
perverted.

>> Coming back to your statement, how would you suggest that words where
>> performance matters should be written?
>
>If it's a non-portable problem, and I suggest it may well be, it can
>be solved with non-portable words.

Like what? Pablo Reda's code obviously is specific to his system, but
there is nothing system-specific about cubic bezier curves, and such a
program can be written in standard Forth.
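A minimal sketch of what a standard-Forth version might look like (quadratic case, integer coordinates, recursive midpoint subdivision; the flatness threshold 4 mirrors the OP's sp-dist test, and DRAW-LINE is a deferred word the application would supply):

defer draw-line ( x1 y1 x2 y2 -- )

: qbezier ( x0 y0 x1 y1 x2 y2 -- )                  \ x1 y1 is the control point
  {: x0 y0 x1 y1 x2 y2 :}
  x0 x2 + 2/ x1 - abs  y0 y2 + 2/ y1 - abs +  4 < if
    x0 y0 x2 y2 draw-line                           \ flat enough: draw the chord
  else                                              \ split at t=1/2 and recurse
    x0 y0
    x0 x1 + 2/  y0 y1 + 2/                          \ first inner control point
    x0 x1 2* + x2 + 4 /  y0 y1 2* + y2 + 4 /        \ curve midpoint
    recurse
    x0 x1 2* + x2 + 4 /  y0 y1 2* + y2 + 4 /        \ curve midpoint again
    x1 x2 + 2/  y1 y2 + 2/                          \ second inner control point
    x2 y2
    recurse
  then ;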

More generally, is your suggestion that we should not use Forth if
we care about performance? What should we use?

>Stephen's posting suggests very
>strongly that it's not universally a problem with native code
>generators.

What is not universally a problem?

My rectangle example shows that Stephen's code generator produces
smaller code for code using PICK than for the memory-based solution.

>Dreaming for a moment: there may, I suppose, be some mileage in a hint
>to a compiler that some memory references really don't alias to
>anything else.

Hey, Andrew Haley suggests a language feature!

I am not sure if it's a good solution, though, and I don't want to
discuss this suggestion in detail at the moment, but currently I don't
have any other solution, either.

>> It's certainly appropriate to add language features that allow
>> compilers to provide predictable performance, especially in a
>> low-level language like Forth.
>
>Sure, but it's not worth distorting Forth source to do it.

What do you mean with "distorting Forth source"? Using your compiler
hint would certainly be visible in the source code, no?

Paul Rubin

Jan 30, 2013, 2:49:59 PM
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>>might well be the case that a fetch from a field is faster than a
>>PICK. We don't know.
>
> Sure, it's possible to write a PICK that's so slow that this is true.
> But why would you optimize for such an intentionally-slow system?

Stack-oriented hardware that is super fast at traditional stack
operations but slow at reaching into the stack? Such CPUs have figured
significantly into Forth history.

Andrew Haley

Jan 30, 2013, 5:32:40 PM
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>> This assuption is not based on a particular style of Forth compiler,
>>> but on pretty fundamental issues:
>>>
>>> In typical Forth code it is relatively easy to map stack items to
>>> registers (there are exceptions, but they are rare and do not occur in
>>> the case in point).
>>
>>I take your point. However, that is assuming a particular style of
>>Forth implementation.
>
> No, it's a statement about what is possible in an implementation.

Eh? Of course not. Hoisting memory references is far from
impossible.

>> It certainly doesn't apply to traditional
>>threaded code or even to simple native code Forths;
>
> These don't make use of that property, then. Does it matter?

Yes, of course. We're talking about contorting perfectly reasonable
Forth code (using PICKs) because doing it some other way that's easier
to write and understand (field accesses) is slower.

> If a user wants high performance, will he use such a system?
> Probably not. I see little point in optimizing for systems that are
> intentionally slow, except if that is your particular target
> platform (but then we are into platform-specific, not general
> optimizations).

I don't think that threaded-code implementations are intentionally
slow; they're optimizing for other things.

>>in some systems it might well be the case that a fetch from a field
>>is faster than a PICK. We don't know.
>
> Sure, it's possible to write a PICK that's so slow that this is true.
> But why would you optimize for such an intentionally-slow system?
>
>>> In contrast, many memory accesses cannot be turned into register
>>> accesses, even with compiler heroics like alias analysis (on which
>>> hundreds of papers have been written) or perversities like GCC's
>>> strict aliasing.
>>
>>Eh? GCC's strict aliasing is type-based alias analysis, like many C
>>compilers. There's nothing special about it AFAIK.
>
> Just because there are other perverts does not make it any less
> perverted.

Um, yes it does, by definition.

>>> Coming back to your statement, how would you suggest that words where
>>> performance matters should be written?
>>
>>If it's a non-portable problem, and I suggest it may well be, it can
>>be solved with non-portable words.
>
> Like what? Pablo Reda's code obviously is specific to his system, but
> there is nothing system-specific about cubic bezier curves, and such a
> program can be written in standard Forth.
>
> More generally, is your suggestion is that we should not use Forth if
> we care for performance? What should we use?

No, that is not my suggestion. Obviously.

I am certainly not convinced that caring for performance involves
always squeezing the last microsecond out of every operation. That
way leads to the enormous complexity of today's compilers for other
languages. I don't think that's appropriate for Forth. Forth gets
its performance from clarity and simplicity, by eschewing much of that
complexity.

>>Stephen's posting suggests very strongly that it's not universally a
>>problem with native code generators.
>
> What is not universally a problem?

The performance problem that causes PICK to be faster than field
accesses.

> My rectangle example shows that Stephen's code generator produces
> smaller code for code using PICK than for the memory-based solution.
>
>>Dreaming for a moment: there may, I suppose, be some mileage in a hint
>>to a compiler that some memory references really don't alias to
>>anything else.
>
> Hey, Andrew Haley suggests a language feature!

Eh? This discussion is becoming surreal.

> I am not sure if it's a good solution, though, and I don't want to
> discuss this suggestion in detail at the moment, but currently I don't
> have any other solution, either.

Indeed.

>>> It's certainly appropriate to add language features that allow
>>> compilers to provide predictable performance, especially in a
>>> low-level language like Forth.
>>
>>Sure, but it's not worth distorting Forth source to do it.
>
> What do you mean with "distorting Forth source"? Using your compiler
> hint would certainly be visible in the source code, no?

By "contorting" I mean replacing field acesses with picks and pokes.

Andrew.

Stephen Pelc

Jan 31, 2013, 7:32:16 AM
On Tue, 29 Jan 2013 20:17:54 +0100, Bernd Paysan <bernd....@gmx.de>
wrote:
The code from the LEA up to but not including the SUB is a stack
shuffle (we ran out of registers) followed by a store. Although it's
possible to do it without the shuffle, VFX's "rip up and retry"
rules are not clever enough here.

Marcel's iForth version is for 64 bit code with 16 registers. This
means more registers, so no shuffle, and because there are more
registers, some can be used for the return stack. Nice code.

Bernd Paysan

Jan 31, 2013, 9:37:22 AM
Stephen Pelc wrote:
> The code from the LEA up to but not including the SUB is a stack
> shuffle (we ran out of registers) followed by a store. Although it's
> possible to do it without the shuffle, VFX's "rip up and retry"
> rules are not clever enough here.

You need four free registers (including TOS) to fit this code in. In
bigForth, I have three free registers, because I have SP, RP, UP, OP
(object pointer) and an index for loops in a register. VFX doesn't have
an OP, and doesn't waste another register for the index, so it should be
possible to fit it into the available registers. The main thing you
need to do is to try hard to eliminate redundancy - all active values
should only occupy one single register, unless a copy is dearly needed.

> Marcel's iForth version is for 64 bit code with 16 registers. This
> means more registers, so no shuffle, and because there are more
> registers, some can be used for the return stack. Nice code.

Yes, x64 is a much better target, because you have 8 really free
registers (in x86, there is no register without a special role, though
EBX is only used in xlat, and xlat is really superfluous - however, you
can argue that x86 has 8 special purpose registers, and no general
purpose register at all).

This is also impacting C compilers. The string instructions use up ECX,
ESI, and EDI, multiplication uses up EAX and EDX; -fomit-frame-pointer
gives you EBP, so you have EBP and EBX, no more. In Gforth, we
therefore moved all string operations into real subroutines (even though
that makes them a bit slower), and only then the C compiler is able to
fit in the most important things.

Anton Ertl

Jan 31, 2013, 11:02:06 AM
Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>>> This assuption is not based on a particular style of Forth compiler,
>>>> but on pretty fundamental issues:
>>>>
>>>> In typical Forth code it is relatively easy to map stack items to
>>>> registers (there are exceptions, but they are rare and do not occur in
>>>> the case in point).
>>>
>>>I take your point. However, that is assuming a particular style of
>>>Forth implementation.
>>
>> No, it's a statement about what is possible in an implementation.
>
>Eh? Of course not. Hoisting memory references is far from
>impossible.

Eliminating memory accesses is possible in some cases, but there are
other cases where it is impossible; and whether a compiler actually
does eliminate the memory access in a case where it is possible is
hard to predict; also, I would not expect a Forth compiler to do it
because of the costs in complexity and compilation time (and AFAIK no
Forth compiler does it).

So, in my performance model for general use, a memory access on the
programming language level costs a memory access on the machine level;
if you are doing supercomputer-type programming, you may use a more
complex performance model, but that's too complex for most uses.

>>> It certainly doesn't apply to traditional
>>>threaded code or even to simple native code Forths;
>>
>> These don't make use of that property, then. Does it matter?
>
>Yes, of course. We're talking about contorting perfectly reasonable
>Forth code (using PICKs) because doing it some other way that's easier
>to write and understand (field accesses) is slower.

You are talking about that, but the OP actually has code that uses
PICKs and no code that uses memory, so he obviously has not contorted
memory-using code.

For the rectangle example, I had the memory and locals variants from
the paper
<http://www.complang.tuwien.ac.at/anton/euroforth/ef11/papers/ertl.pdf>
and derived the stack variant from the locals variant, not the memory
variant. You might consider the PICKing variant more contorted than
the locals variant, but the locals-haters may disagree.

BTW, I would expect picking and locals to be equally fast. Both can
be mapped to registers.

>> Just because there are other perverts does not make it any less
>> perverted.
>
>Um, yes it does, by definition.

The definition of "pervert" is not "singleton".

>I am certainly not convinced that caring for performace involves
>always squeezing the last microsecond out of every operation. That
>way leads to the enormous complexity of today's compilers for other
>languages. I don't think that's appropriate for Forth. Forth gets
>its performance from clarity and simplicity, by eschewing much of that
>complexity.

We seem to be on the same page here. And certainly, when doing the
Bezier thing for real, one would note that the LINE word uses so
much time that the time for dealing with the coordinates probably does
not matter (whether stack, locals, or memory).

But there are other cases where performance does matter, and there one
needs a performance model. Having that, we can consider if the
cleaner solution is expected to be faster (good), equally fast (good),
or slower, and in the latter case, which concern is more important.

And on a different level, if we find that there are many cases where
speed conflicts with clarity and simplicity, maybe there could be a
new language feature that allows us to reconcile both.

>>>Stephen's posting suggests very strongly that it's not universally a
>>>problem with native code generators.
>>
>> What is not universally a problem?
>
>The performance problem that causes PICK to be faster than field
>accesses.

His posting does not suggest that very much. He compared two
unspecified pieces of code, and for all we know (and as you
suggested), there might be a lot of differences other than just
between PICKs and field accesses.

>>>Dreaming for a moment: there may, I suppose, be some mileage in a hint
>>>to a compiler that some memory references really don't alias to
>>>anything else.
>>
>> Hey, Andrew Haley suggests a language feature!
>
>Eh? This discussion is becoming surreal.

Hmm? Ok, we started discussing your dreams, but it's a somewhat
realistic dream.

Pablo Hugo Reda

Feb 3, 2013, 9:22:51 AM
Well, at last.

Using some variables and reorganizing, I got a version without deeper picks.
Here is the test version:

#x1 #y1 #x2 #y2 #bx #by

:curve3 | xf yf x2 y2 x1 y1
pick3 pick2 + 2/ pick3 pick2 + 2/ 'by ! 'bx !
'y1 ! 'x1 !
pick3 pick2 + 2/ pick3 pick2 + 2/ 2swap
'y2 ! 'x2 !
over bx + 2/ over by + 2/
over x2 - abs over y2 - abs + >r
px x1 + 2/ py y1 + 2/
over bx + 2/ over by + 2/
over x1 - abs over y1 - abs + >r
2swap >r >r
pick3 pick2 + 2/ pick3 pick2 + 2/
2swap r> r>
r> 3 <? ( drop 4drop 2dup 'py ! 'px ! line )( drop curve3 )
r> 3 <? ( drop 4drop 2dup 'py ! 'px ! line ; )
drop curve3 ;

Mark Wills

Feb 3, 2013, 10:25:06 AM
This looks truly horrible to me! Utterly unreadable :-\

humptydumpty

Feb 3, 2013, 2:26:21 PM
On Sunday, January 27, 2013 10:11:54 PM UTC+2, Pablo Hugo Reda wrote:
> Of two definitions that do the same thing, the one that reaches less
> deeply into the stack is more optimal: it needs fewer registers when
> compiled, and so on. So it is convenient to write definitions with
> shallow stack access.
> I chose to limit the word PICK and make it static: I have only PICK2,
> PICK3, and PICK4, and no more.
> Well, until now I have not needed more, but now I have found a word
> that needs deeper PICKs.
>
> I reimplemented the Quadratic Bezier Curve word and am now trying to
> draw Cubic Bezier Curves.
> I use recursion, only integers, and just add, shift, and compare.
>
> Because I use recursion, I need 6 parameters (3 2D points), and the
> routine accesses positions up to 8 deep (I would need a PICK8
> definition).
>
> I have thought about workarounds for the problem:
>
> If I compress each 2D vector to use one stack cell, the word only
> needs PICK4 and works in the current system!
> But this adds the time to change representation (2 cells to 1 cell..)
> or extract it (address to x y..).
>
> With PICK8, however, the solution is more direct.
>
> I need advice..
> thanks

Hi!

An idea could be: derive from the point coordinates a function
of a parameter t that runs between 0 and a maximum number.
Use that function for the recursion.

Here is an implementation under `thisforth' - what I have
at hand now. On Gforth, use `]]',`[[' for the MACRO portions.
(I'm fully aware of how sensitive the compiling environment
is to `thisforth' macros.)

\ Quadratic spline
--> stack \ load stack tools
\ --- HELPER WORDS ---
: underdup
>r dup r> ;
: |<2^n ( n -- 2^m ; minimal 2^m > n )
1 BEGIN 2dup > WHILE 2* REPEAT nip ;
: s*/
over IF */ ELSE drop nip THEN ; \ safe `*/' on thisforth
: [sqp] ( ctime: c b a m -- ; compile square polynomial ; MACRO )
dup >R rot rot R> swap ( c m b m a )
PLEASE " dup >R literal literal s*/ literal + R> literal s*/ literal + " ; IMMEDIATE
\ ---
8 HERE STACK Q \ temporary stack to pass parameters
: >Q Q push ; \ to use at compile time
: Q> Q top Q pop ;
\ Maximum value of distance between same coordinates EndPoints
VARIABLE EPMAX 0 EPMAX !
: >epmax ( n -- ; store max. distance )
abs dup EPMAX @ > IF EPMAX ! ELSE drop THEN ;
VARIABLE STEPS \ Number of steps
\ Calculate square polynomial coefs and Store maximum distance
: coef ( p0 p2 p1 -- c b a )
tuck - \ p0 p1 p2-p1
dup >R >epmax
underdup - \ p0 p0-p1
dup dup >epmax \ p0 p0-p1 p0-p1
R> + >R \ p0 p0-p1
negate 2* R> ; \ p0 2[p1-p0] [p2-2p1+p0]
: pre ( xstart xend xcontrol ystart yend ycontrol -- )
coef >Q >Q >Q coef >Q >Q >Q
EPMAX @ 2* |<2^n STEPS ! ;
\ Leave an XT - function of parameter T between 0 and STEPS, from coordinates points
: newspline ( xstart xend xcontrol ystart yend ycontrol -- xt[t] steps ; 0<t<steps )
pre
PLEASE " :noname dup >R [ Q> Q> Q> STEPS @ ] [sqp] R> [ Q> Q> Q> STEPS @ ] [sqp] ; "
STEPS @ ;
\ --- TESTS ---
-1 [IF]

4 CONSTANT SF \ Scaling Factor
: scale>R ( n -- ; R: n*2^SF )
PLEASE " SF lshift >R " ; IMMEDIATE
: 6scale ( xs xe xc ys ye yc -- XS XE XC YS YE YC )
scale>R scale>R scale>R scale>R scale>R scale>R
R> R> R> R> R> R> ;
: unscale ( N -- n )
SF rshift ;

70 250 20 250 60 110 6scale newspline swap ( cr .s cr) 1+ 0 DO I over execute swap unscale . unscale . cr loop drop

\ echo 'plot "<aforth qspl.fo" with lines ; pause mouse ' | gnuplot ; pkill aforth

( cr .s cr ) bye
[THEN]

Have a nice day,
humptydumpty

Pablo Hugo Reda

Feb 3, 2013, 6:49:04 PM
Nice, but the integer (unscaled) recursive version should be faster (not tested, just my guess), and the recursion stops when the segment really is a line, or close to one, rather than using fixed steps.

My system does not work with STATE and has no interpretive mode; it is like colorForth.
I like your advice, very original.

Pablo Hugo Reda

Feb 3, 2013, 6:51:26 PM
To me it is clear as water :-\

Coos Haak

Feb 3, 2013, 7:12:05 PM
Op Sun, 3 Feb 2013 15:51:26 -0800 (PST) schreef Pablo Hugo Reda:

(snipped the dreaded google groups multiple empty lines)

> On Sunday, 3 February 2013 12:25:06 UTC-3, M.R.W Wills wrote:
>> On Feb 3, 2:22 pm, Pablo Hugo Reda <pablor...@gmail.com> wrote:
>> This looks truly horrible to me! Utterly unreadable :-\
>
> for me is clear water :-\

Perhaps, but why not for once use locals, as Mark Wills suggested earlier?

--
Coos

CHForth, 16 bit DOS applications
http://home.hccnet.nl/j.j.haak/forth.html

Mark Wills

unread,
Feb 4, 2013, 2:10:36 AM2/4/13
to
On Feb 3, 2:22 pm, Pablo Hugo Reda <pablor...@gmail.com> wrote:
I'm surprised that no one has mentioned factoring. This definition
looks to be too long, and is certainly much longer than a definition
following classical practice.

Surely this definition could benefit from factoring?

Andrew Haley

Feb 4, 2013, 4:26:40 AM
Mark Wills <forth...@gmail.com> wrote:
> On Feb 3, 2:22 pm, Pablo Hugo Reda <pablor...@gmail.com> wrote:
>> Well, at last
>>
>> using some vars and reorganize I get a version without more picks
>> there is the test version
>>
>> #x1 #y1 #x2 #y2 #bx #by
>>
>> :curve3 | xf yf x2 y2 x1 y1
>> pick3 pick2 + 2/ pick3 pick2 + 2/ 'by ! 'bx !
>> 'y1 ! 'x1 !
>> pick3 pick2 + 2/ pick3 pick2 + 2/ 2swap
>> 'y2 ! 'x2 !
>> over bx + 2/ over by + 2/
>> over x2 - abs over y2 - abs + >r
>> px x1 + 2/ py y1 + 2/
>> over bx + 2/ over by + 2/
>> over x1 - abs over y1 - abs + >r
>> 2swap >r >r
>> pick3 pick2 + 2/ pick3 pick2 + 2/
>> 2swap r> r>
>> r> 3 <? ( drop 4drop 2dup 'py ! 'px ! line )( drop curve3 )
>> r> 3 <? ( drop 4drop 2dup 'py ! 'px ! line ; )
>> drop curve3 ;
>
> I'm surprised that no one has mentioned factoring.

I'm not. This definition is so tragic it's beyond saving, and the OP
has refused to take any sensible advice because it's not "fast
enough". Why bother?

Andrew.

Anton Ertl

Feb 4, 2013, 11:39:41 AM
Paul Rubin <no.e...@nospam.invalid> writes:
>an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>>>might well be the case that a fetch from a field is faster than a
>>>PICK. We don't know.
>>
>> Sure, it's possible to write a PICK that's so slow that this is true.
>> But why would you optimize for such an intentionally-slow system?
>
>Stack-oriented hardware that is super fast at traditional stack
>operations

Not really. Even the fast stack operations (like DUP) take one cycle,
and SWAP is often more expensive.

> but slow at reaching into the stack?

Yes. There have been Forth CPUs that supported fast constant PICK,
but AFAIK they never got very far.

> Such cpu's have figured
>significantly into Forth history.

Whatever you consider history. They have a lot of mindshare in the
Forth community, but the practical relevance is pretty small. The
most-used one seems to be the RTX2000, and that has not been
manufactured for a long time, so AFAIK nobody does new projects with
it.
EuroForth 2013: http://www.euroforth.org/ef13/