.sub main :main
$N0 = pow 2.0, 5.0
.end
Is always converted to this.
main:
set N0, 32
end
Which can lead to misleading test results for when pow's actually
broken.
On Feb 8, 2006, at 7:07 AM, Leopold Toetsch wrote:
> Parrot runs the ackermann benchmark faster than C.
> leo
> Heureka - from the -Ofun department
>
> Or - running the ackermann function (and possibly other recursive
> functions) really fast.
>
> $ time ./parrot -Oc -C ack.pir 11
> Ack(3, 11) = 16381
>
> real 0m0.567s
> user 0m0.559s
> sys 0m0.008s
>
> $ time ./ack 11
> Ack(3,11): 16381
>
> real 0m0.980s
> user 0m0.978s
> sys 0m0.002s
>
> Parrot is svn head (not optimized BTW), the C version is compiled -O3
> with 3.3.5. Below is the C [1] and the PIR source [2]. The ack
> function is renamed to __pic_test - currently all the checks that
> would enable this optimization are missing, e.g. if only I and N
> registers are used and not too many, if there is no introspection and
> so on.
>
> The created JIT code for the ack function is also shown [3]. This is
> actually only the recursive part of it, there is a small stub that
> moves the arguments into registers and fetches the return result.
>
> ./parrot -Oc -C turns on tail-recursion optimization (recursive tail
> calls are converted to loops). -C is the CGP or direct-threaded
> runcore, which is able to recompile statically known stuff into faster
> versions by e.g. caching lookups and such.
>
> Have fun,
> leo
> Which can lead to misleading test results for when pow's actually broken.
>
Hopefully we have tests with just one constant argument to detect breakage
of this kind? It does bring out an interesting issue in so far as if we
don't promise the same behaviour for pow then you could end up with a hybrid
of the behaviour on the system the PBC was generated on and the system you
run it on, but I think we're aiming for consistent behaviour.
Jonathan
It really depends on the hardare you are running. E.g.
add I0, I1, 20
translates either to something like:
lea %ecx, 20(%edx) # not yet but in case ..
or
ori r11, r31, 20 # r31 is 0
add r3, r4, r11 # a 2nd oris is needed for constants > 0xffff
If x86 were not the leading instruction set, I'd toss the opcode
variants with constants almost all.
> ... But how much improvement with outputting to a pbc first?
If you are gonna running the code multiple times, you save a few cycles
compilation time. JIT code is still created at runtime.
> ... But a
> couple notes, there's no --help-optimize like --help-debug, and as far
> as I know,
perldoc docs/runnning.pod - but these optimizations aren't settled nor
finished. You can ignore them for now.
> there's no way to disable optimizations completely, e.g. this
> pir
>
> .sub main :main
> $N0 = pow 2.0, 5.0
> .end
>
> Is always converted to this.
>
> main:
> set N0, 32
> end
This isn't an optimization. The 'pow' opcode is just run at compiletime.
> Which can lead to misleading test results for when pow's actually broken.
If 'pow' is broken then it's borken. The only case where that would
matter is, when you are compiling a PBC on a machine with a broken 'pow'
and ship that PBC.
leo
>
> ori r11, r31, 20 # r31 is 0
> add r3, r4, r11 # a 2nd oris is needed for constants > 0xffff
Well that's actually a bad example as there exists addi and addis
instructions.
But have a look at src/jit/arm/jit_emit.h emit_load_constant() and
follow the functions it's calling.
leo
> but an option to disable compile time optimizations would help with
> the testing the interpreter instead of the compiler
It's not an optimization and it can't be turned off, as there are no
such opcodes like 'pow_i_ic_ic'. And again - the evaluation of that
constant is using the parrot *runtime* (compilation is runtime, just
earlier ;) . And if a JITted program behaves differently that's still
another case.
See also compilers/imcc/optimizer.c:eval_ins and especially line 655.
leo
> Is there a reason why this can't be "turned off" like this:
> convert
> $N0 = pow 2.0, 5.0
> to
> $N9999 = 2.0
> $N0 = pow N9999, 5.0
That's exactly what is executed. Registers are filled with the value of
the constants and the 'pow' opcode is executed. The little difference is
that it is executed a bit earlier, that's all.
> Markus Laire
leo
Is there a reason why this can't be "turned off" like this:
convert
$N0 = pow 2.0, 5.0
to
$N9999 = 2.0
$N0 = pow N9999, 5.0
where N9999 is an otherwise unused register.
If the only problem is that pow can't be called with 2 constants, this
would trivially solve it.
ps. The syntax might be wrong, as I don't program in parrot. I'm just
following the conversation.
--
Markus Laire
> Parrot runs the ackermann benchmark faster than C.
>
> $ time ./parrot -Oc -C ack.pir 11
> Ack(3, 11) = 16381
>
> real 0m0.567s
> user 0m0.559s
> sys 0m0.008s
>
> $ time ./ack 11
> Ack(3,11): 16381
>
> real 0m0.980s
> user 0m0.978s
> sys 0m0.002s
This looked like fun, so I tried it on Solaris/SPARC. Alas, I didn't
have such great luck. First, though the -C runcore made about a factor
of 2 difference in timings (gcc/SPARC), it's only only available (as
far as I can tell) with gcc. Still, you might as well use it if you
can.
On SPARC, I found Parrot took 12 times as long:
C: time ./ack 11
Ack(3,11): 16381
real 1m7.62s
user 1m7.36s
sys 0m0.05s
Parrot: time ./parrot -Oc -C ack.pir 11
Ack(3, 11) = 16381
real 11m56.24s
user 11m52.12s
sys 0m0.14s
Thinking it might have something to do with the SPARC architecture,
I tried it on x86, where Parrot took 80 times as long:
C: time ./ack 11
Ack(3,11): 16381
real 0m0.759s
user 0m0.758s
sys 0m0.002s
Parrot: time ./parrot -Oc -C ../tmp/ack.pir 11
Ack(3, 11) = 16381
real 1m1.211s
user 1m1.087s
sys 0m0.021s
Something's obviously very goofy there, but I don't know what.
--
Andy Dougherty doug...@lafayette.edu
> On Wed, 8 Feb 2006, Leopold Toetsch wrote:
>
>> Parrot runs the ackermann benchmark faster than C.
> This looked like fun, so I tried it on Solaris/SPARC.
Solaris/SPARC doesn't have a working JIT runcore rurrently and I can't
test it - no chance yet.
The mentioned optimizations are currently ok for x86 and ppc.
> I tried it on x86, where Parrot took 80 times as long:
Rafl was also reporting it not to be working, which is really strange,
because he could see (and trace) genereated optimized JIT code. It's
still unclear what's going on here.
Anyway, with the official (and a bit slower shootout code):
$ time ./parrot -Oc -Cj examples/shootout/ack.pir 11
Ack(3, 11) = 16381
real 0m0.748s
$ time ./parrot -Oc -Sj examples/shootout/ack.pir 11
Ack(3, 11) = 16381
real 0m0.800s
The optimized PIR from the email is 0.5 sec, the C code takes 0.9 secs,
all with AMD X2@2000.
Parrot Revision: 11505
leo
Leo forgot the -j flag, which enables JIT, in his post. It makes quite
a difference:
Parrot w/o JIT: time ./parrot -C -Oc examples/shootout/ack.pir 11
Ack(3, 11) = 16381
real 7m11.365s
user 7m11.066s
sys 0m0.168s
Parrot w/ JIT: time ./parrot -Cj -Oc examples/shootout/ack.pir 11
Ack(3, 11) = 16381
real 0m3.502s
user 0m3.477s
sys 0m0.021s
GCC 4.0: time ./ack 11
Ack(3,11): 16381
real 0m1.960s
user 0m1.948s
sys 0m0.003s
I didn't use the custom PIR he posted (which is faster), so Parrot
didn't beat the GCC code.
--
matt diephouse
http://matt.diephouse.com