Hi,
A long while back someone made a request for people to create parrot
implementations of the shootout tests on Alioth (http://
shootout.alioth.debian.org). I wanted to let you know that I have
updated the shootout build machine with parrot (0.3.1) and can start
running tests if any exist.
Please let me know off-list, or use the project tracker (e.g.,
https://alioth.debian.org/tracker/?
func=add&group_id=30402&atid=411646) to add a tarball with the tests
Let the games begin!
Thanks,
- - -Brent
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (Darwin)
iD8DBQFDl8pqzGDdrzfvUpURAmjHAJ9L16EATSUQiIzhQrCZldzetmQimwCfVeRV
MgciumZq8eyJ13xThOlOri8=
=p+UA
-----END PGP SIGNATURE-----
A few weeks ago I wrote Brent's "Ackermann" benchmark in Amber for
Parrot. Ackermann is a heavily recursive benchmark.
Brent reports benchmarks for Ack(3, 7), Ack(3, 8) and Ack(3, 9).
Unfortunately I could only get to Ack(3, 6) before parrot aborted with
"maximum recursion depth exceeded", at recursion depth 1000.
The recursion limit of 1000 is assigned in inter_create.c and seems
somewhat arbitrary. Is it likely to be raised in the future?
Regards,
Roger Browne
PS: I have appended the source code in case anyone is interested:
-- Computes the value of Ack(3,n), which is 2^(n+3)-3
-- Specify 'n' on the command line (defaults to 1).
-- As used in the "Computer Language Shootout" benchmarks:
--
http://shootout.alioth.debian.org/sandbox/benchmark.php?test=ackermann
-- See also: http://en.wikipedia.org/wiki/Ackermann_function (Wikipedia)
-- Performance results, 2005-11-22 using Amber 0.3.1, Fedora Core 4
Linux
-- on a 3.2GHz Pentium:
-- Ack(3,0): 5 -- in 0.156s
-- Ack(3,1): 13 -- in 0.156s
-- Ack(3,2): 29 -- in 0.168s
-- Ack(3,3): 61 -- in 0.200s
-- Ack(3,4): 125 -- in 0.348s
-- Ack(3,5): 253 -- in 0.876s
-- Ack(3,6): 509 -- in 3.040s
-- Ack(3,7) and above failed: "maximum recursion depth exceeded"
-- Above times include Amber-to-PIR compilation of about 0.1 second.
args := ARGUMENTS
if args.count = 1 then
num := args.item(1) -- Take 'n' from the command line
else
num := 1 -- or use default of 1 for 'n'.
end
print("Ack(3,")
print(num)
print("): ")
print_line(ack(3, num))
private
args -- Command-line arguments
num -- : INTEGER -- Ackermann number
ack(m, n) -- m: INTEGER; n: INTEGER; returns INTEGER
do
result := if m = 0 then
n + 1
elseif n = 0 then
ack(m - 1, 1)
else
ack(m - 1, ack(m, (n - 1)))
end
end
end
> The recursion limit of 1000 is assigned in inter_create.c and seems
> somewhat arbitrary. Is it likely to be raised in the future?
No. But you can raise it in the code ;-)
$P0 = getinterp
$P0."recursion_limit"(10000)
> Regards,
> Roger Browne
>
> PS: I have appended the source code in case anyone is interested:
Optimized build? Which runcore? Currently parrot -C may perform fastest.
leo
As it happens, a recursion limit of 10000 is enough to complete the
Ackermann benchmark.
> Optimized build? Which runcore? Currently parrot -C may perform fastest.
With Parrot 0.4.0, no optimizations, default runcore, Ack(3, 9) takes
233 seconds of CPU time. With "-C" this reduces to 205 seconds. I
haven't tried an optimized Parrot build (so many things to do, so little
time...).
On the same machine, Python 2.4.1 computes Ack(3, 9) in 12 seconds, and
SmartEiffel computes it in something under 0.05 second.
I'm sure that this result reflects poor PIR generation by the Amber
compiler, rather than the performance of the Parrot VM. I'm happy
enough, at this stage, that the program prints the correct result.
Regards,
Roger Browne
Ack(3,6): 509 0.70313 seconds
Ack(3,9): 4093 50.51184 seconds
Still not amazing, but I'm sure there's performance to be squeezed out yet.
> With some adjustments to the compiler I've gotten the times down to:
>
> Ack(3,6): 509 0.70313 seconds
> Ack(3,9): 4093 50.51184 seconds
>
> Still not amazing, but I'm sure there's performance to be squeezed out yet.
Does your compiled code use PMC Integers or native ints? (I'm using
PMCs).
Regards,
Roger Browne
> With Parrot 0.4.0, no optimizations, default runcore, Ack(3, 9) takes
> 233 seconds of CPU time. With "-C" this reduces to 205 seconds. I
> haven't tried an optimized Parrot build (so many things to do, so little
> time...).
>
> On the same machine, Python 2.4.1 computes Ack(3, 9) in 12 seconds, and
> SmartEiffel computes it in something under 0.05 second.
I've now checked in ack.pir and ack.py into examples/benchmarks. And
I've timed Ack(3, 9) with an optimized Parrot build:
Python 13.7
Parrot -j 15.3
Parrot -C 13.8
We are at python speed for function call speed currently.
This is a AMD X2 3800+ running at 2000 Mhz. It's BTW not easy to run
benchmarks on that machine, the first few timings where 23-26 secs,
obviously PowerNow had reduced cpu freq to ~50%.
> Regards,
> Roger Browne
leo
Wow, that's nice tight code for something that's spat out by a compiler.
Well done Shane!
I've appended Amber's generated PIR to this message. No need to tell me
all the ways that it can be improved, because I can already see plenty!
I've shown just the 'ack' method, but the compiler also puts some
library code into the PIR file.
Regards,
Roger Browne
-- ack(m, n)
-- do
-- result := if m = 0 then
-- n + 1
-- elseif n = 0 then
-- ack(m - 1, 1)
-- else
-- ack(m - 1, ack(m, (n - 1)))
-- end
-- end
.sub "ack" :method
.param pmc m
.param pmc n
.check_private()
.local pmc result
$P2 = m
$I0 = find_type 'Amber_INTEGER'
new $P3, $I0, '0'
iseq $I1, $P2, $P3
new $P1, 'Amber_BOOLEAN'
set $P1, $I1
unless $P1, ELSEIF8b14470
$P2 = n
$I0 = find_type 'Amber_INTEGER'
new $P3, $I0, '1'
$P1 = n_add $P2, $P3
goto END8b14470
ELSEIF8b14470:
$P2 = n
$I0 = find_type 'Amber_INTEGER'
new $P3, $I0, '0'
iseq $I1, $P2, $P3
new $P1, 'Amber_BOOLEAN'
set $P1, $I1
unless $P1, ELSE8b14470
$P3 = m
$I0 = find_type 'Amber_INTEGER'
new $P4, $I0, '1'
$P2 = n_sub $P3, $P4
$I0 = find_type 'Amber_INTEGER'
new $P3, $I0, '1'
.set_private()
$P1 = self."ack"($P2, $P3)
.unset_private()
goto END8b14470
ELSE8b14470:
$P3 = m
$I0 = find_type 'Amber_INTEGER'
new $P4, $I0, '1'
$P2 = n_sub $P3, $P4
$P4 = m
$P6 = n
$I0 = find_type 'Amber_INTEGER'
new $P7, $I0, '1'
$P5 = n_sub $P6, $P7
.set_private()
$P3 = self."ack"($P4, $P5)
.unset_private()
.set_private()
$P1 = self."ack"($P2, $P3)
.unset_private()
END8b14470:
result = $P1
.return(result)
.end
>
> Does your compiled code use PMC Integers or native ints? (I'm using
> PMCs).
>
> Regards,
> Roger Browne
My goal is to have the compiled code as simple as possible, so the compiler
uses native ints and strings if it can.
I also upgraded the generated code for 'if' and 'elseif' statements to
optimize to the 'ne', 'eq', 'lt', and 'gt' opcodes if possible (instead of
calling a function to compare two values, storing the result, and testing it
for trueness). I made a similar change where addition/subtraction is done
as well (instead of calling a function to add two values and re-storing the
result, it optimizes to the 'inc' and 'dec' opcodes).
> You can see the generated PIR here: http://theguildforge.com/ack.pir
Indeed nice code already. You can also get rid of "subtract", "concat"
et al by using the n_infix opcodes:
$ cat x.pir
.sub main :main
.local pmc x,y,z,s
x = new .Integer
y = new .Integer
y = -3
z = n_sub x, y # create new lhs
print_item z
s = n_concat x, y # same
print_item s
print_newline
.end
or automatically:
$ cat x.pir
.pragma n_operators 1
.sub main :main
.local pmc x,y,z,s
x = new .Integer
y = new .Integer
y = -3
z = x - y # create new lhs
print_item z
s = x . y # same
print_item s
print_newline
.end
$ ./parrot x.pir
3 0-3
leo
> I've now checked in ack.pir and ack.py into examples/benchmarks. And
> I've timed Ack(3, 9) with an optimized Parrot build:
>
> Python 13.7
> Parrot -j 15.3
> Parrot -C 13.8
With recent parrot (r10436) and mostly due to this patch ...
$ diff ack0.pir ack.pir
39c39,40
< .return ack($I1, 1)
---
> $I0 = 1
> .return ack($I1, $I0)
... i.e. using the same call signature for all ack() calls, I'm now at:
Parrot -C 9.0 secs
Getting a similar optimization into Parrot itself is a bit beyond
current goals (albeit simple), but the numbers are indicating that the
new calling conventions are a lot faster than old.
leo
Anyway, I've attached the file in case anyone else has better luck.
I'm using parrot r10431 at the moment on OS X, and 0.3.1 on FreeBSD.
For smaller numbers, it's usable. With twelve, I get around 30 seconds
to a minute for both versions/archs. From my rough numbers of it, it
should only take about about 15 minutes, but the memory constraint
slows it down for me.
I notice that a lot of the code pertains to lines 42-47, creating an
array and storing values in it. But by far, the biggest part of the
execution is tied in calling subroutines and returning. With an
argument of 8, invokecc(and it's ilk) are called 100701 times. Either
there's a problem with the code that I'm not aware of, or you have a
great new test for debugging the gc...
Actually it took 1.3 GB of RAM to finish. Woot. And Chip is to blame
this time not me :)
PMC_cont_ASSIGN(SELF, new_foo(something));
did evaluate the new_foo twice i.e. each return continuation helper
struct was allocate twice (and freed once).
Now it finishes in 2m44 [1] taking 53 M RAM for N=16. Still too slow
with too much memory used.
I've delete the whole del_tree stuff too, it might just slows things down.
> I notice that a lot of the code pertains to lines 42-47, creating an
> array and storing values in it. But by far, the biggest part of the
> execution is tied in calling subroutines and returning.
Yep. 3 tips:
a) use Parrot -C
b) use {Fixed,Resizable}PMCArray instead of Array
c) always use the same call/return signature
e.g. don't mix ack(x,y) and ack(x, 1)
that is get rid of the constant by using a temp $I1 = 1 for the
2nd case
[1] It takes now 2m11 with c) done too
Attached is my version of biarytree.pir.
leo
> I've timed Ack(3, 9) with an optimized Parrot build:
>
> Python 13.7
> Parrot -j 15.3
> Parrot -C 13.8
Down now (r10445) at:
parrot -C ack.pir 5.7s
parrot -C binarytrees 16 1m14s
That is a little code cleanup was good for 100% speedup ;-)
> This is a AMD X2 3800+ running at 2000 Mhz. It's BTW not easy to run
> benchmarks on that machine, the first few timings where 23-26 secs,
> obviously PowerNow had reduced cpu freq to ~50%.
Well powersave -f helps a lot,
leo
> Leopold Toetsch wrote:
>
>> I've timed Ack(3, 9) with an optimized Parrot build:
>> Python 13.7
>> Parrot -j 15.3
>> Parrot -C 13.8
>
> Down now (r10445) at:
>
> parrot -C ack.pir 5.7s
> parrot -C binarytrees 16 1m14s
./parrot -C ack.pir 4.9s
./parrot -C binarytrees.pir 16 59.1s
I'm stopping the optimization game now, before we are at incredible
speed and/because the low hanging fruit is already consumed.
All tests with Configure.pl --optimize (it doesn't make sense to run
comparable bechmarks w/o -Ox) on linux/x86 with a 2 GHz athlon w 512K
cache.
leo
> ./parrot -C ack.pir 4.9s
> ./parrot -C binarytrees.pir 16 59.1s
And another f'up me: should we collect these shootout benchmarks in a
separate directory, with tests attached (with reduced N aka reduced
runtime)?
Are there plans to submit these tests?
leo
> Unfortunately I could only get to Ack(3, 6) before parrot aborted with
> "maximum recursion depth exceeded", at recursion depth 1000.
Alternative:
#!/usr/bin/perl
use strict;
use warnings;
use Memoize;
{ local ($,, $\) = ("\t", "\n");
sub ack {
return $_[1] +1 if 0 == $_[0];
return ack($_[0] -1, 1) if 0 == $_[1];
return ack($_[0] -1, ack($_[0], $_[1] -1));
}
memoize('ack');
my $base = 3;
print "Ack($base, $_): ", ack($base, $_) for (0..13);
}
--
Affijn, Ruud
"Gewoon is een tijger."
> Roger Browne:
>
>> Unfortunately I could only get to Ack(3, 6) before parrot aborted with
>> "maximum recursion depth exceeded", at recursion depth 1000.
>
>
> Alternative:
> use Memoize;
Sure. And there is a (inline) memoized perl Ack already, which is one
of the fastest of all tests submitted.
The goal of these benchmarks is to run 'as is' with the same algorithm.
leo
From the faq...
Please will you include my favourite language?
Maybe we will when you write 15 of the benchmark programs in your
favourite language, and contribute them to "The Computer Language
Shootout" :-)
From the all benchmarks page...
And remember, languages that implement more benchmarks will move to the
top of the list, and those with many missing benchmarks (No Program,
Error, Timeout) will stay at the bottom!
So why submit until there's a reasonable collection? Oh, attached is
fannkuch, works fairly well. Only about four seconds for me, three
with an optimized build. And given that their machine's faster(and x86
linux as opposed to my ppc OSX), I think it's reasonable to say that
parrot beats smalltalk gst.
I'll probably write more, it's somewhat random which order I go in...
A startup program's easy to do... Case sensitive, etc...
.sub main :main
print "hello world\n"
.end
>>> Unfortunately I could only get to Ack(3, 6) before parrot aborted
>>> with "maximum recursion depth exceeded", at recursion depth 1000.
>>
>> Alternative: use Memoize;
>
> Sure. And there is a (inline) memoized perl Ack already, which is one
> of the fastest of all tests submitted.
>
> The goal of these benchmarks is to run 'as is' with the same
> algorithm.
Define "the same algorithm".
(Some language could memoize when it is not explicitly told so.)
If there would be a "volatile" attribute for sub, to define that the sub
is non-memoizable (or that its runs are not cacheable), than the
'algorithm' would not need the "use Memoize" and "memoize('ack')" lines.
Well, to be honest I've been adding them as you've posted them here
or to SVN. If anyone objects to this, please let me know and I'll
remove them immediately. I assume you are okay with them being
posted to the shootout website under a BSD-style license.
Thanks,
- -Brent
P.S. If you want to see how things look, please visit http://
shootout.alioth.debian.org/sandbox/parrot.php
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (Darwin)
iD8DBQFDnTT9zGDdrzfvUpURAjkxAJ97dnUHNqbDh/JOSWuzqZkO1ZUOnQCeJN5G
qCavfT3O7kQCZFYosiwG/68=
=Cuk3
-----END PGP SIGNATURE-----
> Well, to be honest I've been adding them as you've posted them here or
> to SVN. If anyone objects to this, please let me know and I'll remove
> them immediately. I assume you are okay with them being posted to the
> shootout website under a BSD-style license.
Ah good. But that needs a few tweaks. For most, parrot needs an
appropriate runcore flag:
parrot -C ack.pir 9
-C binarytrees.pir 16
-j fannkuch.pir 9
For some tests it might be desirable to compile to bytecode too (as
e.g. python is doing)
parrot -o fannkuch.pbc fannkuch.pir
parrot -j fannkuch.pbc 9
And: Is that Parrot built optimized?
> Thanks,
>
> - -Brent
>
> P.S. If you want to see how things look, please visit
> http://shootout.alioth.debian.org/sandbox/parrot.php
The fannkuch.pir is here more than 10 times the speed of python (0.3s
vs 3.9s).
leo
> ... Oh, attached is fannkuch, works fairly well. Only about four
> seconds for me, three with an optimized build. And given that their
> machine's faster(and x86 linux as opposed to my ppc OSX), I think it's
> reasonable to say that parrot beats smalltalk gst.
./parrot -j fannkuch.pbc 9 0.29s
python fannkuch.py 9 3.9s
> I'll probably write more, it's somewhat random which order I go in...
Great, thanks.
leo
> Let the games begin!
Another one: examples/shootout/pidigits.pir
leo
Just curious, but why is mono at .38 seconds and 10.00 seconds respectively?
What in the .NET implementation makes recursive calls so fast?
Ed
> Just curious, but why is mono at .38 seconds and 10.00 seconds
> respectively?
> What in the .NET implementation makes recursive calls so fast?
Parrot function call and return sequence isn't really optimized yet.
Currently I'm happy to be faster than fastest_of(perl, python, ruby).
And don't forget that comparing with static languages (that can compile
even function calls to machine code) isn't really fair.
> Ed
leo
> Let the games begin!
Another one has hit the svn repo: examples/shootout/nsieve-bits.pir
Actually there are 2 versions: examples/shootout/nsieve-bits-2.pir
The former cheats a bit by setting bits, the 2nd resets bits as of
specs.
leo
.sub main :main
.local pmc array
array = new .FixedPMCArray
array = 3
.local pmc tmp
tmp = new .String
tmp = "zero"
array[0] = tmp
tmp = new .String
tmp = "one"
array[1] = tmp
tmp = new .String
tmp = "two"
array[2] = tmp
sort sortsub, array
$P0 = array[0] # one
print $P0
print "\n"
$S0 = array[1] # two
print $P0
print "\n"
$S0 = array[2] # zero
print $P0
print "\n"
.end
.sub sortsub
.param pmc a
.param pmc b
cmp $I0, a, b
.return($I0)
.end
Using pmcs would allow things such as sorting arrays of arrays.
Hopefully I can get knucleotide optimized in other ways... But it's
working.
The formatting on this is off. It looks like it's generating the
sequence of digits, but it needs to be split into individual lines
(see the Perl example program): http://shootout.alioth.debian.org/
sandbox/benchmark.php?test=pidigits&lang=perl&id=0
Thanks,
- -Brent
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (Darwin)
iD8DBQFDnlRAzGDdrzfvUpURAiRdAJ9Dn5gVxmeuyb6cA0d2Mm8X3a/dwQCdFGOu
kQn1VNM44oPyNUVsbmsn4x0=
=8yxe
-----END PGP SIGNATURE-----
$ perldoc src/classes/fixedpmcarray.pmc
"METHOD void sort(PMC* cmp_func)"
Sort this array, optionally using the provided cmp_func
The sort is also inherited by ResizableArrayPMC.
$ grep -w sort t/pmc/*array*.t
...
HTH
leo
> On Dec 12, 2005, at 12:23 PM, Leopold Toetsch wrote:
>> Another one: examples/shootout/pidigits.pir
>
> The formatting on this is off. It looks like it's generating the
> sequence of digits, but it needs to be split into individual lines
The program now (r10484) formats the last line too, if N % 10.
> Thanks,
>
> - -Brent
leo