> >> You are going in the wrong direction, at least for iForth.
> > Thanks for your answer. I'm not too surprised though, because even the
> > two versions of gforth/powerpc (i.e. gforth vs. gforth-fast) lead to
> > different "rankings":
> > It all just seems to confirm that manual micro-optimization is a gamble.
> > Cheers,
> > Alex
> Perhaps the difference is due to CPU architecture.
One part of it surely is, but - as I stated - the Forth compiler used
plays a major role too.
> AFAIK, the Pentium has a deeper pipeline than the PPC?
It used to be this way - my PPC is rather old anyway, so its pipeline
might look even shorter when compared with a modern x86 (but I haven't
followed such trends at all lately, so this is just a guess).
> So iForth is probably slower because of pipeline bubbles caused by branch misprediction? -AD
That's where guessing and gambling starts without access to everything
involved (iForth, x86).
In the end, there are a lot of factors, esp. when looking at today's
superscalar CPUs, and the real world might still look *totally*
different than a simplistic benchmark would suggest. (Different input
data, different call pattern, etc.).
To sum it up: Searching the "best" variant of such a function by using a
benchmark is nerd-fun, but that's probably it.
> Arnold Doray <inva...@invalid.com> wrote:
[..]
>> Perhaps the difference is due to CPU architecture.
> One part of it surely is, but - as I stated - the Forth compiler used
> plays a major role too.
[..]
>> So iForth is probably slower because of pipeline bubbles caused by branch misprediction? -AD
> That's where guessing and gambling starts without access to everything
> involved (iForth, x86).
Would it be so much less of a gamble if you *did* have access?
> In the end, there are a lot of factors, esp. when looking at today's
> superscalar CPUs, and the real world might still look *totally*
> different than a simplistic benchmark would suggest. (Different input
> data, different call pattern, etc.).
This exercise has shown that the difference between algorithms is (1) large,
and (2) unpredictable and counterintuitive. I looked again just now, and I would
not have expected lcsymbol4 to be the best (in iForth64), nor can I understand
its terrible performance in 32-bit mode.
> To sum it up: Searching the "best" variant of such a function by using a
> benchmark is nerd-fun, but that's probably it.
In Forth we can test such things at runtime very easily. Imagine
Forth applications that adapt themselves to their surroundings organically :-)
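A minimal sketch of the "test it at runtime" idea. It is written in Python rather than Forth to keep it short. At startup, time each candidate implementation on sample input, check that all candidates agree, and bind the fastest one. The two lowercasing variants and `pick_fastest` are hypothetical stand-ins, not the lcsymbol variants discussed in this thread. The choice is only as good as the sample input, which is the same caveat raised earlier about different input data and call patterns.

```python
import string
import time

# Hypothetical variant 1: character-by-character ASCII lowercasing.
def lower_loop(s):
    out = []
    for ch in s:
        if "A" <= ch <= "Z":
            out.append(chr(ord(ch) + 32))
        else:
            out.append(ch)
    return "".join(out)

# Hypothetical variant 2: table-driven translation of A-Z to a-z.
_TABLE = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def lower_table(s):
    return s.translate(_TABLE)

def pick_fastest(variants, sample, repeats=5, loops=100):
    """Return the variant with the lowest best-of-`repeats` time on `sample`.

    Every variant must give the same result as the first one on the sample,
    otherwise a ValueError is raised instead of silently picking a wrong one.
    """
    expected = variants[0](sample)
    best_fn, best_time = None, float("inf")
    for fn in variants:
        if fn(sample) != expected:
            raise ValueError(f"{fn.__name__} disagrees with {variants[0].__name__}")
        times = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            for _ in range(loops):
                fn(sample)
            times.append(time.perf_counter() - t0)
        if min(times) < best_time:
            best_fn, best_time = fn, min(times)
    return best_fn

# At startup: choose an implementation using data resembling the real workload.
sample_input = "Some_Mixed_CASE_Forth_Word_Names " * 20
lowercase = pick_fastest([lower_loop, lower_table], sample_input)
print(lowercase.__name__, lowercase("HeLLo"))
```

Which variant wins depends on the interpreter and the machine, which is the point. Taking the minimum of several repeats is one common way to damp timing noise, but it cannot compensate for a sample that does not match real usage.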
On Fri, 27 Jul 2012 15:39:37 +0200, Marcel Hendrix wrote:
>> To sum it up: Searching the "best" variant of such a function by using
>> a benchmark is nerd-fun, but that's probably it.
> In Forth we can test such things at runtime very easily. Imagine Forth
> applications that adapt themselves to their surroundings organically :-)
> -marcel
A tongue-in-cheek comment, but still an interesting thought.
JITs do this all the time, but there the direction for optimization is unambiguous. With multiple algorithms, it would take longer to test for optimality, and for the hypothetical "JIT" to kick in, unless there are some heuristics. A more obvious application is processing data across multiple nodes. Sometimes it's better to just stream the data from the "data node" to the "compute node"; at other times it's better to send the computation to the "data node" itself and relay the results back to the compute node.