Parrot vs NekoVM

Nicolas Cannasse

Feb 28, 2006, 6:09:59 AM

to perl6-i...@perl.org

Hi list,

Yesterday I ran a quick fib(30) benchmark comparing the Parrot Win32 daily
build (using the JIT core) and NekoVM (http://nekovm.org). The results
show that Parrot is 5 times slower than Neko (see my blog post on
this point: http://ncannasse.free.fr/?p=66).

I would like to understand why there is such a big speed difference
when NekoVM does not have a JIT yet. Is there a semantic difference in
integer arithmetic, or is it just a VM implementation issue?

Best,
Nicolas

Leopold Toetsch

Feb 28, 2006, 9:12:14 AM

to Nicolas Cannasse, perl6-i...@perl.org

On Feb 28, 2006, at 12:09, Nicolas Cannasse wrote:

> Yesterday I ran a quick fib(30) benchmark comparing the Parrot Win32 daily
> build (using the JIT core) and NekoVM (http://nekovm.org). The results
> show that Parrot is 5 times slower than Neko (see my blog post on
> this point: http://ncannasse.free.fr/?p=66).

Benchmarks are just damn lies ;)

$ time ./parrot -j fib.pir 30
Fib(30): 1346269

real 0m4.774s

Ok that's slow (AMD X2@2000, unoptimized parrot build), you are right.
But:

$ time ./parrot -Cj fib.pir 30
Fib(30): 1346269

real 0m0.036s

$ time ./parrot -Cj fib.pir 38
Fib(38): 63245986

real 0m0.675s

$ time ./parrot -Cj fibn.pir 38
Fib(38.0): 63245986.0

real 0m1.475s

The rather slow function call performance comes from complex call
frame creation and flexible argument passing. The code involved in
calls and returns isn't optimized yet either.

The '-Cj' runtime option tries to compile simple subs to native
assembler code (and obviously succeeds here ;)

leo

Jonathan Worthington

Feb 28, 2006, 1:37:34 PM

to Nicolas Cannasse, perl6-i...@perl.org

"Nicolas Cannasse" <ncan...@motion-twin.com> wrote:
>
> Yesterday I ran a quick fib(30) benchmark comparing the Parrot Win32 daily
> build (using the JIT core)

I'm guessing that's the build that I'm to blame for, and it's maybe worth
pointing out that it ain't an optimized build. But I think leo supplied the
real reason for the slowness - calling isn't that fast yet (and isn't
optimized, and where it can be optimized that's not enabled by default -
yet).

Jonathan

Nicolas Cannasse

Feb 28, 2006, 1:59:40 PM

to Leopold Toetsch, perl6-i...@perl.org

>
> On Feb 28, 2006, at 12:09, Nicolas Cannasse wrote:
>
>> Yesterday I ran a quick fib(30) benchmark comparing the Parrot Win32 daily
>> build (using the JIT core) and NekoVM (http://nekovm.org). The results
>> show that Parrot is 5 times slower than Neko (see my blog post on
>> this point: http://ncannasse.free.fr/?p=66).
>
>
> Benchmarks are just damn lies ;)

Not exactly lies, but they are quite different from reality ;)

> The '-Cj' runtime option tries to compile simple subs to native
> assembler code (and obviously succeeds here ;)

Yes, I understand that there are different cores for Parrot, but which
flags are appropriate for comparisons? Some flags might do very well in
some cases and quite badly in others. They might also take a tremendous
amount of time to JIT the code, and hence not be usable for larger
applications.

Is there a single config that the Parrot team is focusing on bringing
to 1.0?

Nicolas

Joshua Isom

Feb 28, 2006, 3:10:10 PM

to Nicolas Cannasse, Perl 6 Internals

The main flag sets for speed are -C, -Cj, -S, -Sj, -j, and sometimes
adding -Oc as well. On ppc, -C and -Cj are often the fastest. On x86,
-j is most often the fastest. But here's the caveat: to use JIT, you
of course need someone to have ported it to that arch. With -C, your
compiler has to support some of the source code features of the CGP
core, which gcc does. In general, as far as I know anyway, -C will be
your generic fast runcore. If your application can actually be timed,
as in it doesn't run continuously, you can just test the various
runcores on your platform. But the way your program is coded has an
impact. You can write a program/subroutine to be very optimized for
easy jitting, or write it in a simpler and quicker to develop manner.
And some of the opcodes used just aren't jitted for one reason or
another, which kind of stops the jitability of that sub. There is no
one fastest runcore for all situations (although -t7 is probably the
slowest). It depends on the PIR and the architecture.

On Feb 28, 2006, at 12:59 PM, Nicolas Cannasse wrote:

> [...]

Joshua Juran

Mar 1, 2006, 6:11:48 AM

to Perl 6 Internals

On Feb 28, 2006, at 1:59 PM, Nicolas Cannasse wrote:

>> On Feb 28, 2006, at 12:09, Nicolas Cannasse wrote:
>>
>>> Yesterday I ran a quick fib(30) benchmark comparing the Parrot Win32
>>> daily build (using the JIT core) and NekoVM (http://nekovm.org). The
>>> results show that Parrot is 5 times slower than Neko (see my blog
>>> post on this point: http://ncannasse.free.fr/?p=66).
>>
>> Benchmarks are just damn lies ;)

Worse than that...

> Not exactly lies, but they are quite different from reality ;)

...they're statistics. :-)

(With apologies to Mark Twain.)

Josh

Leopold Toetsch

Mar 4, 2006, 12:02:50 PM

to Nicolas Cannasse, perl6-i...@perl.org

On Feb 28, 2006, at 19:59, Nicolas Cannasse wrote:

> Yes, I understand that there are different cores for Parrot, but which
> flags are appropriate for comparisons? Some flags might do very well in
> some cases and quite badly in others. They might also take a tremendous
> amount of time to JIT the code, and hence not be usable for larger
> applications.

The time needed to JIT some code isn't a big deal, all the more so
because the plan is to recompile only heavily used (or small)
subroutines, not whole applications. The currently existing JIT code
(-j) compiles a whole file. The new (and still experimental) -{S,C}j
options will recompile code per subroutine or even per basic block for
the direct threaded run core.

> Is there a single config that the Parrot team is focusing on bringing
> to 1.0?

A single option can't cover all possible uses of Parrot. But it
basically boils down to two options:
  -Os ... optimize for size (multiple processes of Parrot with shared
          code)
  -Ot ... use more memory to run faster

> Nicolas

leo

Nicolas Cannasse

Mar 4, 2006, 12:05:56 PM

to Leopold Toetsch, perl6-i...@perl.org

> $ time ./parrot -j fib.pir 30
> Fib(30): 1346269
>
> real 0m4.774s
>
> Ok that's slow (AMD X2@2000, unoptimized parrot build), you are right. But:
>
> $ time ./parrot -Cj fib.pir 30
> Fib(30): 1346269

-Cj does not produce different results from -j on the Win32 build of
Parrot. Is -Cj supported on this architecture?

Nicolas

Leopold Toetsch

Mar 5, 2006, 12:20:20 PM

to Nicolas Cannasse, perl6-i...@perl.org

On Mar 4, 2006, at 18:05, Nicolas Cannasse wrote:

> -Cj does not produce different results from -j on the Win32 build of
> Parrot. Is -Cj supported on this architecture?

Yes, it should work. It might depend on how fib is actually written in
PIR. As I said, this option is at a rather early stage. A subroutine is
currently compiled to machine code only if all of its Parrot registers
fit into CPU registers. The fib function below works here on
x86/linux.

Mmm - actually -C needs computed goto, which isn't supported by all C
compilers. You can try:

$ ./parrot -Sj fib.pir 38
Fib(38): 63245986

> Nicolas

leo

.sub main :main
    .param pmc argv
    .local int argc, n
    argc = argv
    n = 1
    unless argc == 2 goto argsok
    $S0 = argv[1]
    n = $S0
  argsok:
    $P0 = getinterp
    $P0.'recursion_limit'(100000)

    .local pmc array
    array = new .FixedFloatArray
    array = 2

    $I1 = FibInt(n)
    array[0] = n
    array[1] = $I1

    $S0 = sprintf <<"END", array
Fib(%d): %d
END
    print $S0
.end

.sub FibInt
    .param int n
    unless n < 2 goto endif
    .return(1)
  endif:
    .local int tmp
    tmp = n - 2
    $I0 = FibInt(tmp)
    tmp = n - 1
    tmp = FibInt(tmp)
    $I0 += tmp
    .return($I0)
.end
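
To get a feel for why call overhead dominates this benchmark, here is a minimal C sketch of the same doubly recursive definition as FibInt above, together with a call counter. The names fib_int and fib_calls are invented for illustration and are not part of Parrot or NekoVM.

```c
#include <assert.h>
#include <stdint.h>

/* Naive doubly recursive Fibonacci, mirroring FibInt: fib(0) = fib(1) = 1. */
int64_t fib_int(int n)
{
    assert(n >= 0);
    if (n < 2)
        return 1;
    return fib_int(n - 2) + fib_int(n - 1);
}

/* Number of calls needed to compute fib_int(n), counting the outermost one. */
int64_t fib_calls(int n)
{
    assert(n >= 0);
    if (n < 2)
        return 1;
    return 1 + fib_calls(n - 2) + fib_calls(n - 1);
}
```

Under this definition fib_int(30) is 1346269, matching the output shown earlier in the thread, and computing it takes 2692537 calls. Each call does only a comparison, two subtractions and an addition, so the benchmark mostly measures the cost of call and return rather than integer arithmetic.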

Jonathan Worthington

Mar 5, 2006, 12:33:27 PM

to Leopold Toetsch, Nicolas Cannasse, perl6-i...@perl.org

"Leopold Toetsch" <l...@toetsch.at> wrote:
> Mmm - actually -C needs computed goto, which isn't supported by all C
> compilers.

Including the one that I use to produce the Win32 builds, which I believe
are what was being tested (MS Visual C++). Shouldn't it give a "we don't
have a computed goto runcore" error, though, when you supply -C on
platforms without it?

Jonathan
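
For readers unfamiliar with computed goto: it is a GCC extension (also accepted by Clang) that lets an interpreter jump straight to the code for the next opcode through a table of label addresses. The following is a rough C sketch of the technique, with a switch-based fallback for compilers that lack it. The three-opcode bytecode and the run function are hypothetical and are not taken from Parrot's actual runcore sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-opcode bytecode, invented for this sketch. */
enum { OP_INC, OP_DEC, OP_HALT };

/* Run `code` starting from accumulator value `acc`; return the final value.
 * The computed-goto path assumes every opcode is valid and the program
 * ends with OP_HALT. */
long run(const unsigned char *code, long acc)
{
    assert(code != NULL);
    const unsigned char *pc = code;

#if defined(__GNUC__)
    /* GNU extension: &&label takes a label's address, goto * jumps to it. */
    static void *const dispatch[] = { &&op_inc, &&op_dec, &&op_halt };
#define NEXT() goto *dispatch[*pc++]
    NEXT();
op_inc:
    acc++;
    NEXT();
op_dec:
    acc--;
    NEXT();
op_halt:
    return acc;
#undef NEXT
#else
    /* Portable fallback for compilers without computed goto. */
    for (;;) {
        switch (*pc++) {
        case OP_INC:  acc++; break;
        case OP_DEC:  acc--; break;
        case OP_HALT: return acc;
        default:      return acc; /* unknown opcode: stop */
        }
    }
#endif
}
```

Running the program { OP_INC, OP_INC, OP_INC, OP_DEC, OP_HALT } with an initial accumulator of 0 returns 2 on either path. The two branches behave the same for valid programs; only the dispatch mechanism differs, which is why a build made with a compiler lacking the extension cannot offer a runcore built on it.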

Leopold Toetsch

Mar 5, 2006, 3:02:13 PM

to Jonathan Worthington, Perl 6 Internals

Definitely yes. What currently happens when running 'parrot -C' or
'parrot -Cj' with msvc?

> Jonathan

leo