Heureka - from the -Ofun department

Leopold Toetsch

unread,

Feb 8, 2006, 1:07:50 PM2/8/06

to Perl 6 Internals

Parrot runs the ackermann benchmark faster than C.
leo

Ofun.txt

Joshua Isom

unread,

Feb 8, 2006, 1:27:07 PM2/8/06

to Leopold Toetsch, Perl 6 Internals

I guess this is one place where being CISC really is better than being
RISC. But how much improvement with outputting to a pbc first? But a
couple notes, there's no --help-optimize like --help-debug, and as far
as I know, there's no way to disable optimizations completely, e.g.
this pir

.sub main :main
$N0 = pow 2.0, 5.0
.end

Is always converted to this.

main:
set N0, 32
end

Which can lead to misleading test results for when pow's actually
broken.

On Feb 8, 2006, at 7:07 AM, Leopold Toetsch wrote:

> Parrot runs the ackermann benchmark faster than C.
> leo

> Heureka - from the -Ofun department
>
> Or - running the ackermann function (and possibly other recursive
> functions) really fast.
>
> $ time ./parrot -Oc -C ack.pir 11
> Ack(3, 11) = 16381
>
> real 0m0.567s
> user 0m0.559s
> sys 0m0.008s
>
> $ time ./ack 11
> Ack(3,11): 16381
>
> real 0m0.980s
> user 0m0.978s
> sys 0m0.002s
>
> Parrot is svn head (not optimized BTW), the C version is compiled -O3
> with 3.3.5. Below is the C [1] and the PIR source [2]. The ack
> function is renamed to __pic_test - currently all the checks that
> would enable this optimization are missing, e.g. if only I and N
> registers are used and not too many, if there is no introspection and
> so on.
>
> The created JIT code for the ack function is also shown [3]. This is
> actually only the recursive part of it, there is a small stub that
> moves the arguments into registers and fetches the return result.
>
> ./parrot -Oc -C turns on tail-recursion optimization (recursive tail
> calls are converted to loops). -C is the CGP or direct-threaded
> runcore, which is able to recompile statically known stuff into faster
> versions by e.g. caching lookups and such.
>
> Have fun,
> leo

Jonathan Worthington

unread,

Feb 8, 2006, 1:55:39 PM2/8/06

to Joshua Isom, Leopold Toetsch, Perl 6 Internals

"Joshua Isom" <lon...@ritalin.shout.net> wrote:
>I guess this is one place where being CISC really is better than being
>RISC. But how much improvement with outputting to a pbc first? But a
>couple notes, there's no --help-optimize like --help-debug, and as far as I
>know, there's no way to disable optimizations completely, e.g. this pir
>
> .sub main :main
> $N0 = pow 2.0, 5.0
> .end
>
> Is always converted to this.
>
> main:
> set N0, 32
> end
>

I think that doesn't class as an optimization as such, but rather a way to
avoid having to have opcodes that take two constant arguments. Over a
number of instructions, that's a worthwhile saving. Ops cost space in the
Parrot executable, which means we fit less of Parrot in the CPU caches,
which is bad.

> Which can lead to misleading test results for when pow's actually broken.
>

Hopefully we have tests with just one constant argument to detect breakage
of this kind? It does bring out an interesting issue in so far as if we
don't promise the same behaviour for pow then you could end up with a hybrid
of the behaviour on the system the PBC was generated on and the system you
run it on, but I think we're aiming for consistent behaviour.

Jonathan

Leopold Toetsch

unread,

Feb 8, 2006, 3:23:19 PM2/8/06

to Joshua Isom, Perl 6 Internals

Joshua Isom wrote:
> I guess this is one place where being CISC really is better than being
> RISC.

It really depends on the hardare you are running. E.g.

add I0, I1, 20

translates either to something like:

lea %ecx, 20(%edx) # not yet but in case ..

or

ori r11, r31, 20 # r31 is 0
add r3, r4, r11 # a 2nd oris is needed for constants > 0xffff

If x86 were not the leading instruction set, I'd toss the opcode
variants with constants almost all.

> ... But how much improvement with outputting to a pbc first?

If you are gonna running the code multiple times, you save a few cycles
compilation time. JIT code is still created at runtime.

> ... But a

> couple notes, there's no --help-optimize like --help-debug, and as far
> as I know,

perldoc docs/runnning.pod - but these optimizations aren't settled nor
finished. You can ignore them for now.

> there's no way to disable optimizations completely, e.g. this
> pir
>
> .sub main :main
> $N0 = pow 2.0, 5.0
> .end
>
> Is always converted to this.
>
> main:
> set N0, 32
> end

This isn't an optimization. The 'pow' opcode is just run at compiletime.

> Which can lead to misleading test results for when pow's actually broken.

If 'pow' is broken then it's borken. The only case where that would
matter is, when you are compiling a PBC on a machine with a broken 'pow'
and ship that PBC.

leo

Leopold Toetsch

unread,

Feb 8, 2006, 4:16:45 PM2/8/06

to Joshua Isom, Perl 6 Internals

Leopold Toetsch wrote:

>
> ori r11, r31, 20 # r31 is 0
> add r3, r4, r11 # a 2nd oris is needed for constants > 0xffff

Well that's actually a bad example as there exists addi and addis
instructions.

But have a look at src/jit/arm/jit_emit.h emit_load_constant() and
follow the functions it's calling.

leo

Joshua Isom

unread,

Feb 8, 2006, 9:28:34 PM2/8/06

to Leopold Toetsch, Perl 6 Internals

But with jit enabled on x86/freebsd/openbsd, I was having problems with
some of the pow functions. The rt number is #38382. Because of the
compile time optimization, it made it trickier to work with because the
compile time was ok, but the jit runtime wasn't, and it took me a
little while to realize the compile time optimization. I'm not saying
no optimizations should be the default, but an option to disable
compile time optimizations would help with the testing the interpreter
instead of the compiler.

Leopold Toetsch

unread,

Feb 8, 2006, 10:35:20 PM2/8/06

to Joshua Isom, Perl 6 Internals

On Feb 8, 2006, at 22:28, Joshua Isom wrote:

> but an option to disable compile time optimizations would help with
> the testing the interpreter instead of the compiler

It's not an optimization and it can't be turned off, as there are no
such opcodes like 'pow_i_ic_ic'. And again - the evaluation of that
constant is using the parrot *runtime* (compilation is runtime, just
earlier ;) . And if a JITted program behaves differently that's still
another case.

See also compilers/imcc/optimizer.c:eval_ins and especially line 655.

leo

Leopold Toetsch

unread,

Feb 9, 2006, 11:11:02 AM2/9/06

to Markus Laire, Perl 6 Internals

Markus Laire wrote:

> Is there a reason why this can't be "turned off" like this:
> convert

> $N0 = pow 2.0, 5.0

> to
> $N9999 = 2.0
> $N0 = pow N9999, 5.0

That's exactly what is executed. Registers are filled with the value of
the constants and the 'pow' opcode is executed. The little difference is
that it is executed a bit earlier, that's all.

> Markus Laire

leo

Markus Laire

unread,

Feb 9, 2006, 9:49:39 AM2/9/06

to Leopold Toetsch, Perl 6 Internals

On 2/9/06, Leopold Toetsch <l...@toetsch.at> wrote:
>
> On Feb 8, 2006, at 22:28, Joshua Isom wrote:
>
> > but an option to disable compile time optimizations would help with
> > the testing the interpreter instead of the compiler
>
> It's not an optimization and it can't be turned off, as there are no
> such opcodes like 'pow_i_ic_ic'. And again - the evaluation of that
> constant is using the parrot *runtime* (compilation is runtime, just
> earlier ;) . And if a JITted program behaves differently that's still
> another case.

Is there a reason why this can't be "turned off" like this:
convert

$N0 = pow 2.0, 5.0

to
$N9999 = 2.0
$N0 = pow N9999, 5.0

where N9999 is an otherwise unused register.
If the only problem is that pow can't be called with 2 constants, this
would trivially solve it.

ps. The syntax might be wrong, as I don't program in parrot. I'm just
following the conversation.

--
Markus Laire

Andy Dougherty

unread,

Feb 10, 2006, 5:34:14 PM2/10/06

to Perl 6 Internals

On Wed, 8 Feb 2006, Leopold Toetsch wrote:

> Parrot runs the ackermann benchmark faster than C.
>

> $ time ./parrot -Oc -C ack.pir 11
> Ack(3, 11) = 16381
>
> real 0m0.567s
> user 0m0.559s
> sys 0m0.008s
>
> $ time ./ack 11
> Ack(3,11): 16381
>
> real 0m0.980s
> user 0m0.978s
> sys 0m0.002s

This looked like fun, so I tried it on Solaris/SPARC. Alas, I didn't
have such great luck. First, though the -C runcore made about a factor
of 2 difference in timings (gcc/SPARC), it's only only available (as
far as I can tell) with gcc. Still, you might as well use it if you
can.

On SPARC, I found Parrot took 12 times as long:

C: time ./ack 11
Ack(3,11): 16381

real 1m7.62s
user 1m7.36s
sys 0m0.05s

Parrot: time ./parrot -Oc -C ack.pir 11
Ack(3, 11) = 16381

real 11m56.24s
user 11m52.12s
sys 0m0.14s

Thinking it might have something to do with the SPARC architecture,
I tried it on x86, where Parrot took 80 times as long:

C: time ./ack 11
Ack(3,11): 16381

real 0m0.759s
user 0m0.758s
sys 0m0.002s

Parrot: time ./parrot -Oc -C ../tmp/ack.pir 11
Ack(3, 11) = 16381

real 1m1.211s
user 1m1.087s
sys 0m0.021s

Something's obviously very goofy there, but I don't know what.

--
Andy Dougherty doug...@lafayette.edu

Leopold Toetsch

unread,

Feb 10, 2006, 10:35:01 PM2/10/06

to Andy Dougherty, Perl 6 Internals

On Feb 10, 2006, at 18:34, Andy Dougherty wrote:

> On Wed, 8 Feb 2006, Leopold Toetsch wrote:
>
>> Parrot runs the ackermann benchmark faster than C.

> This looked like fun, so I tried it on Solaris/SPARC.

Solaris/SPARC doesn't have a working JIT runcore rurrently and I can't
test it - no chance yet.
The mentioned optimizations are currently ok for x86 and ppc.

> I tried it on x86, where Parrot took 80 times as long:

Rafl was also reporting it not to be working, which is really strange,
because he could see (and trace) genereated optimized JIT code. It's
still unclear what's going on here.

Anyway, with the official (and a bit slower shootout code):

$ time ./parrot -Oc -Cj examples/shootout/ack.pir 11
Ack(3, 11) = 16381

real 0m0.748s

$ time ./parrot -Oc -Sj examples/shootout/ack.pir 11
Ack(3, 11) = 16381

real 0m0.800s

The optimized PIR from the email is 0.5 sec, the C code takes 0.9 secs,
all with AMD X2@2000.
Parrot Revision: 11505

leo

Matt Diephouse

unread,

Feb 11, 2006, 5:52:32 AM2/11/06

to Andy Dougherty, Perl 6 Internals

Andy Dougherty <doug...@lafayette.edu> wrote:
> Thinking it might have something to do with the SPARC architecture,
> I tried it on x86, where Parrot took 80 times as long:
>
> C: time ./ack 11
> Ack(3,11): 16381
>
> real 0m0.759s
> user 0m0.758s
> sys 0m0.002s
>
> Parrot: time ./parrot -Oc -C ../tmp/ack.pir 11
> Ack(3, 11) = 16381
>
> real 1m1.211s
> user 1m1.087s
> sys 0m0.021s
>
> Something's obviously very goofy there, but I don't know what.

Leo forgot the -j flag, which enables JIT, in his post. It makes quite
a difference:

Parrot w/o JIT: time ./parrot -C -Oc examples/shootout/ack.pir 11
Ack(3, 11) = 16381

real 7m11.365s
user 7m11.066s
sys 0m0.168s

Parrot w/ JIT: time ./parrot -Cj -Oc examples/shootout/ack.pir 11
Ack(3, 11) = 16381

real 0m3.502s
user 0m3.477s
sys 0m0.021s

GCC 4.0: time ./ack 11
Ack(3,11): 16381

real 0m1.960s
user 0m1.948s
sys 0m0.003s

I didn't use the custom PIR he posted (which is faster), so Parrot
didn't beat the GCC code.

--
matt diephouse
http://matt.diephouse.com