
Fwd: Python running fast on .NET


Joe Wilson

Dec 21, 2003, 10:45:33 PM
to perl6-i...@perl.org
Perhaps some of you may be interested in this
entry from Miguel de Icaza's web log:

Python running fast on .NET
http://primates.ximian.com/~miguel/archive/2003/Dec-09.html

Wasn't there supposed to be a Python/Parrot challenge this month?
If so, who won and where are the results?



Dan Sugalski

Dec 22, 2003, 10:35:02 AM
to Joe Wilson, perl6-i...@perl.org
At 7:45 PM -0800 12/21/03, Joe Wilson wrote:
>Perhaps some of you may be interested in this
>entry from Miguel de Icaza's web log:
>
>Python running fast on .NET
>http://primates.ximian.com/~miguel/archive/2003/Dec-09.html

Yeah, but alas Miguel's mis-informed. A reasonable reimplementation
of core python (without all the grotty bits that arguably throw in
the huge speed hit) should run that benchmark at about 20x python's
base speed, and the parts of Python that will give .NET serious fits
haven't been implemented. Python's method semantics don't match
.NET's semantics in a number of performance-unpleasant ways.

>Wasn't there supposed to be a Python/Parrot challenge this month?

Nope. The python source and bytecode for the challenge is due by the
end of the month, with the actual challenge taking place at OSCON
2004.
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Leopold Toetsch

Dec 22, 2003, 4:12:32 PM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:

> Nope. The python source and bytecode for the challenge is due by the
> end of the month

... which isn't many days off. Do you have the benchmarks already?

leo

Dan Sugalski

Dec 22, 2003, 4:20:38 PM
to l...@toetsch.at, perl6-i...@perl.org

Nope. I'm waiting on Guido.

Joe Wilson

Dec 22, 2003, 5:28:03 PM
to Dan Sugalski, perl6-i...@perl.org
Grotty bits? Can you be more specific?
What Python features or idioms do you believe Parrot will run faster
than the CLR?

--- Dan Sugalski <d...@sidhe.org> wrote:
> Yeah, but alas Miguel's mis-informed. A reasonable reimplementation
> of core python (without all the grotty bits that arguably throw in
> the huge speed hit) should run that benchmark at about 20x python's
> base speed, and the parts of Python that will give .NET serious fits
> haven't been implemented. Python's method semantics don't match
> .NET's semantics in a number of performance-unpleasant ways.

Dan Sugalski

Dec 23, 2003, 9:39:56 AM
to Joe Wilson, perl6-i...@perl.org
At 2:28 PM -0800 12/22/03, Joe Wilson wrote:
>Grotty bits? Can you be more specific?
>What Python features or idioms do you believe Parrot will run faster
>than the CLR?

Amongst other things, python allows for dynamic addition and deletion
of object attributes (or what we're calling attributes--per-object
slot variables--I think .NET calls them properties) and the dynamic
addition, modification, and deletion of methods.

The method changing stuff is the big spot that'll cause trouble. .NET
allows for methods to be added, but once added they can't be changed
or deleted. The python folks say they do some other really odd
low-level dispatching things that'll cause trouble, but they were a
bit vague when this came up a week or two ago on the python-dev list.
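
For concreteness, a minimal Python sketch of the behavior being described
(the class name is invented; the point is that all of this is legal at
runtime, and any reimplementation has to keep it working):

class Thing:
    pass

t = Thing()
t.color = "red"        # add an attribute to one object at runtime
del t.color            # ...and delete it again

def speak(self):
    return "hi"

Thing.speak = speak    # add a method to the class after the fact
print Thing().speak()  # prints "hi"
del Thing.speak        # methods can be replaced or deleted just as freely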

>--- Dan Sugalski <d...@sidhe.org> wrote:
>> Yeah, but alas Miguel's mis-informed. A reasonable reimplementation
>> of core python (without all the grotty bits that arguably throw in
>> the huge speed hit) should run that benchmark at about 20x python's
>> base speed, and the parts of Python that will give .NET serious fits
>> haven't been implemented. Python's method semantics don't match
>> .NET's semantics in a number of performance-unpleasant ways.

Joe Wilson

Dec 24, 2003, 1:28:09 AM
to Dan Sugalski, perl6-i...@perl.org
In order to get the 20x speed gain you seek I assume
that Parrot would have to perform some sort of variable
type inference to distinguish, for example, when a
scalar is really just an integer and use an integer register.
Otherwise, the PMCs in Parrot would perform much the same
as the Python scalars (or whatever Python calls them).

The question is when Parrot would perform this type
inference and subsequent bytecode transformation? At bytecode
load time or at runtime?

Jeff Clites

Dec 24, 2003, 2:52:48 AM
to Joe Wilson, Dan Sugalski, perl6-i...@perl.org
I don't think that Dan meant that Python-on-Parrot would be 20x faster.
He's saying that it's easy to speed up Python if you leave out the slow
parts, and that the Python-on-.NET implementation has left out the slow
parts so far. So when it's complete, Python-on-.NET will end up slower
than "regular" Python. (That's my interpretation of what he's saying.)

JEff

Dan Sugalski

Dec 24, 2003, 8:58:06 AM
to Joe Wilson, perl6-i...@perl.org
At 10:28 PM -0800 12/23/03, Joe Wilson wrote:
>In order to get the 20x speed gain you seek I assume
>that Parrot would have to perform some sort of variable
>type inference to distinguish, for example, when a
>scalar is really just an integer and use an integer register.
>Otherwise, the PMCs in Parrot would perform much the same
>as the Python scalars (or whatever Python calls them).

No, actually. Most of that speedup can come from a better
runloop--python's core loop's rather inefficient. Full-on type
inferencing and whatnot'd likely get you a larger speedup.
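
As a rough illustration of what "runloop" overhead means, here is a toy
dispatch loop in Python (not Parrot's or CPython's actual code; the opcodes
are invented): for cheap opcodes, the per-op lookup and call can cost as much
as the work itself, which is why a tighter runloop buys a lot.

import time

def run(program, dispatch):
    # one dict lookup plus one function call of overhead per opcode
    acc = 0
    for op, arg in program:
        acc = dispatch[op](acc, arg)
    return acc

dispatch = {
    "ADD": lambda acc, arg: acc + arg,
    "SUB": lambda acc, arg: acc - arg,
}
program = [("ADD", 3), ("SUB", 2)] * 100000

t0 = time.time()
run(program, dispatch)
print "dispatched:", time.time() - t0

t0 = time.time()
acc = 0
for i in xrange(100000):
    acc = acc + 3          # the same arithmetic, no per-op dispatch
    acc = acc - 2
print "direct:", time.time() - t0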

>The question is when Parrot would perform this type
>inference and subsequent bytecode transformation? At bytecode
>load time or at runtime?
>
>> >--- Dan Sugalski <d...@sidhe.org> wrote:
>> >> Yeah, but alas Miguel's mis-informed. A reasonable reimplementation
>> >> of core python (without all the grotty bits that arguably throw in
>> >> the huge speed hit) should run that benchmark at about 20x python's
>> >> base speed, and the parts of Python that will give .NET serious fits
>> >> haven't been implemented. Python's method semantics don't match
>> >> .NET's semantics in a number of performance-unpleasant ways.

Joe Wilson

Dec 24, 2003, 2:00:56 PM
to d...@sidhe.org, perl6-i...@perl.org
Even with a zero-overhead runloop, a 20-times speed improvement
in running typical non-trivial Python programs is simply not possible.
It's not like the Python opcodes perform no work at all:

Performance Measurements for Pystone
http://zope.org/Members/jeremy/CurrentAndFutureProjects/pystone

The overhead per Python opcode is around 50% or less.
And for the purpose of this discussion we'll ignore Python's
BINARY_MULTIPLY opcode working on 200 digit numbers
where the runloop overhead is insignificant compared to the
calculation itself.

I can see a 2 or 3 times speed improvement at most
with only a better runloop. This is still very good.
But for better timings Parrot would have to use type inference
analysis to avoid the overhead of PMCs in favor of builtin types
wherever possible.
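
The bound behind that estimate is essentially Amdahl's law: if interpreter
dispatch accounts for at most half the runtime, removing it entirely can at
best double the speed. A one-liner to make that explicit (0.5 is the Pystone
figure cited above):

def max_speedup(overhead_fraction):
    # only the overhead share can be eliminated by a better runloop
    return 1.0 / (1.0 - overhead_fraction)

print max_speedup(0.5)    # 2.0 -- in line with the 2-3x ceiling argued here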

--- Dan Sugalski <d...@sidhe.org> wrote:
> At 10:28 PM -0800 12/23/03, Joe Wilson wrote:
> >In order to get the 20x speed gain you seek I assume
> >that Parrot would have to perform some sort of variable
> >type inference to distinguish, for example, when a
> >scalar is really just an integer and use an integer register.
> >Otherwise, the PMCs in Parrot would perform much the same
> >as the Python scalars (or whatever Python calls them).
>
> No, actually. Most of that speedup can come from a better
> runloop--python's core loop's rather inefficient. Full-on type
> inferencing and whatnot'd likely get you a larger speedup.
>

Dan Sugalski

Dec 24, 2003, 2:29:04 PM
to Joe Wilson, perl6-i...@perl.org
At 11:00 AM -0800 12/24/03, Joe Wilson wrote:
>Even with a zero overhead runloop a 20 times speed improvement
>in running typical non-trivial Python programs is simply not possible.
>It's not like the Python opcodes perform no work at all:
>
> Performance Measurements for Pystone
> http://zope.org/Members/jeremy/CurrentAndFutureProjects/pystone

This'll quickly head down to a "Yes, it can!" "No, it can't!" sort of
argument, so it's probably not worth going much further without
actual benchmarks, but I'll say that I've good reason to not be
surprised at the speedup, and I don't think that it'll be at all
necessary to do type inferencing to get that sort of performance gain
in specialized circumstances, of which pystone definitely is one.

>--- Dan Sugalski <d...@sidhe.org> wrote:
>> At 10:28 PM -0800 12/23/03, Joe Wilson wrote:
>> >In order to get the 20x speed gain you seek I assume
>> >that Parrot would have to perform some sort of variable
>> >type inference to distinguish, for example, when a
>> >scalar is really just an integer and use an integer register.
>> >Otherwise, the PMCs in Parrot would perform much the same
>> >as the Python scalars (or whatever Python calls them).
>>
>> No, actually. Most of that speedup can come from a better
>> runloop--python's core loop's rather inefficient. Full-on type
>> inferencing and whatnot'd likely get you a larger speedup.

--

Joe Wilson

Dec 27, 2003, 3:19:20 PM
to perl6-i...@perl.org
I implemented the same variable argument function "varargs_adder"
in both Perl 5 (addit.pl) and Parrot (f4.pasm).
The variable arguments can be strings, integers or floats
(I wanted to exercise dynamic variable behavior).

I called the function 500000 times in a loop to benchmark it.
The results are not what I expected:

Perl 5.6.1: 2.79 seconds
Parrot (non jit) 5.04 seconds
Parrot (jit) 13.65 seconds

Also note that the jitted version produces the wrong result
(21001094.100000 versus 21001097.970000).

I also tried using invokecc/foldup for the varargs function
(not included) but the results were much slower still.

Parrot was built from latest CVS yesterday on Cygwin/Windows.


$ cat addit.pl
#!/usr/bin/perl
#
# addit.pl
#
use strict;
sub varargs_adder {
    my $sum = 0;
    for (my $a = $#_; $a >= 0; --$a) {
        $sum += $_[$a];
    }
    return $sum;
}
my $result = 0;
for (my $x = 500000; $x >= 0; --$x) {
    $result = varargs_adder(1000, 7.100, 87, "3.87", "21000000");
}
print "$result\n";


$ time perl addit.pl
21001097.97

real 0m2.790s
user 0m2.796s
sys 0m0.015s


$ cat f4.pasm
#
# f4.pasm
#
_main:
    set I9, 500000
AGAIN:
    dec I9
    lt I9, 0, FIN
    new P5, .SArray
    set P5, 5
    push P5, 1000
    push P5, 7.100
    push P5, 87
    push P5, "3.87"
    push P5, "21000000"
    bsr _varargs_adder
    branch AGAIN
FIN:
    print N0
    print "\n"
    end
_varargs_adder:
    new P2, .PerlNum
    assign P2, 0
    set I1, P5
LOOP:
    dec I1
    lt I1, 0, DONE
    set P1, P5[I1]
    add P2, P2, P1
    branch LOOP
DONE:
    set N0, P2
    ret


$ time parrot f4.pasm
21001097.970000

real 0m5.040s
user 0m5.046s
sys 0m0.015s


$ time parrot -j f4.pasm
21001094.100000

real 0m13.652s
user 0m13.655s
sys 0m0.015s

Dan Sugalski

Dec 27, 2003, 4:16:41 PM
to Joe Wilson, perl6-i...@perl.org
At 12:19 PM -0800 12/27/03, Joe Wilson wrote:
>
>I implemented the same variable argument function "varargs_adder"
>in both Perl 5 (addit.pl) and Parrot (f4.pasm).
>The variable arguments can be strings, integers or floats
>(I wanted to exercise dynamic variable behavior)
>
>I called the function 500000 times in a loop to benchmark it.
>The results are not what I expected:
>
>Perl 5.6.1: 2.79 seconds
>Parrot (non jit) 5.04 seconds
>Parrot (jit) 13.65 seconds

Interesting. However... the two programs aren't equivalent. You're
using constant values and putting results onto an existing data
structure in perl, so true 'equivalence' requires a few changes to
the source. With those in place, I get the following:

Daoine:~/parrot dan$ time ./parrot ~/f4.pasm
21001097.970000

real 0m7.760s
user 0m4.710s
sys 0m0.110s
Daoine:~/parrot dan$ time perl ~/f4.pl
21001097.97

real 0m15.924s
user 0m10.460s
sys 0m0.170s

Though I can't say I'm particularly thrilled with Parrot's speed
here. (This is non-JITted, as I'm running it on my iBook and there's
no working JIT for me at the moment for some reason.) Even allocating
a new array every time and using that new array gets:

Daoine:~/parrot dan$ time ./parrot ~/f4.pasm
21001097.970000

real 0m11.875s
user 0m2.960s
sys 0m0.720s

Still faster than perl but, again, I'm not happy at all with the results there.

Also, be aware that for these tests, there are two very important
things to note:

1) Perl's data structures (the equivalent of PMCs) have been heavily
optimized; to date, Parrot's haven't been
2) Parrot's Array and SArray values all accept mixed-type data, which
perl's arrays do *not* do, and as such have some extra speed hits
that perl arrays don't.

Having said that, you've hit on a pretty fundamental problem with
SArray and Array. I think I may tackle point #2 in a bit, as it ought
to be reasonably straightforward to do, and see what falls out of it.

Dan Sugalski

Dec 27, 2003, 4:47:36 PM
to Joe Wilson, perl6-i...@perl.org
At 4:16 PM -0500 12/27/03, Dan Sugalski wrote:
>1) Perl's data structures (the equivalent of PMCs) have been heavily
>optimized; to date, Parrot's haven't been
>2) Parrot's Array and SArray values all accept mixed-type data,
>which perl's arrays do *not* do, and as such have some extra speed
>hits that perl arrays don't.

And on top of that it turns out that the basic array class and
everything that inherits from it uses the sparse list code in list.c,
for even more slowdown. I'm impressed things are running as fast as
they are with all that overhead. All of which is good for the whole
"not dying because someone's done a $foo[time] access" but less good
when looking to move fast.

A more specialized, less flexible base array type is in order here, I think.

Joe Wilson

Dec 27, 2003, 5:18:11 PM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski:

> 2) Parrot's Array and SArray values all accept mixed-type data, which
> perl's arrays do *not* do, and as such have some extra speed hits
> that perl arrays don't.

What do you mean?
Perl's arrays do indeed accept mixed data types (see example below).

$ cat addit2.pl
#!/usr/bin/perl
#
# addit2.pl
#
use strict;
sub varargs_adder {
    my $sum = 0;
    for (my $a = $#_; $a >= 0; --$a) {
        $sum += $_[$a];
    }
    return $sum;
}
my $result = 0;

my @args;
$args[0] = 1000;
$args[1] = 7.100;
$args[2] = 87;
$args[3] = "3.87";
$args[4] = "21000000";

for (my $x = 500000; $x >= 0; --$x) {
    $result = varargs_adder(@args);
}
print "$result\n";

$ time perl addit2.pl
21001097.97

real 0m2.825s
user 0m2.843s
sys 0m0.015s

$ cat f6.pasm
#
# f6.pasm
#
# array element arguments are created before the loop
#
_main:


$ time parrot f6.pasm
21001097.970000

real 0m3.925s
user 0m3.936s
sys 0m0.015s


$ time parrot -j f6.pasm
21001094.100000 (note: wrong result and slower with jit)

real 0m11.999s
user 0m12.015s
sys 0m0.000s

Joe Wilson

Dec 27, 2003, 5:00:14 PM
to perl6-i...@perl.org
Using a recursive version of the fibonacci function (with
the integer 32 as an argument) to test function call overhead
I get these timings for various languages and configurations:

perl 5.6.1 fib.pl 10.93 seconds
python 2.2.2 fib.py 6.76 seconds
parrot f.pasm 2.74 seconds
parrot -j f.pasm 1.53 seconds
parrot fib.imc (*) 22.07 seconds
parrot -j fib.imc (*) 18.04 seconds

Prototyped functions in Parrot have a huge runtime overhead
as compared to normal subroutines. Is this to be expected?

fib.imc is basically like parrot/examples/benchmarks/fib.imc
except for 32 instead of 24 and s/var/pmc/g.
The changes were made because fib.imc in CVS did not compile.

CVS parrot from yesterday running on Cygwin.
Full source code and timings below.


$ cat fib.pl
#!/usr/bin/perl -w
use strict;
sub fib {
    my $n = shift;
    return 1 if ($n < 2);
    return fib($n-1) + fib($n-2);
}
my $N = 32;
print fib($N), "\n";

$ time perl fib.pl
3524578

real 0m10.934s
user 0m10.936s
sys 0m0.015s


$ cat fib.py
import sys

def fib(n):
    if (n < 2):
        return 1
    return fib(n-2) + fib(n-1)

def main():
    N = 32;
    print fib(N)

main()

$ time python fib.py
3524578

real 0m6.765s
user 0m6.749s
sys 0m0.046s


$ cat f.pasm
_main:
    set I1, 32
    bsr _fibo
    print I0
    print "\n"
    end
_fibo:
    ge I1, 2, FIB2
    set I0, 1
    ret
FIB2:
    save I1
    dec I1
    bsr _fibo
    dec I1
    save I0
    bsr _fibo
    restore I3
    add I0, I0, I3
    restore I1
    ret


$ time ../../parrot f.pasm
3524578

real 0m2.743s
user 0m2.749s
sys 0m0.015s


$ time ../../parrot -j f.pasm
3524578

real 0m1.530s
user 0m1.546s
sys 0m0.000s


$ cat fib.imc
.pcc_sub _main prototyped
    .param pmc argv
    .sym int argc
    argc = argv
    .sym int N
    N = 32
    if argc <= 1 goto noarg
    $S0 = argv[1]
    N = $S0
noarg:
    .sym float start
    .sym pmc fib
    fib = newsub _fib
    time start
    .pcc_begin prototyped
    .arg N
    .pcc_call fib
    .sym int r
    .result r
    .pcc_end
    .sym float fin
    time fin
    print "fib("
    print N
    print ") = "
    print r
    print " "
    sub fin, start
    print fin
    print "s\n"
    end
.end

.pcc_sub _fib prototyped
    .param int n
    if n >= 2 goto rec
    n = 1
    .pcc_begin_return
    .return n
    .pcc_end_return
rec:
    .sym int n1
    .sym int n2
    .sym int r1
    .sym int r2
    .sym pmc fib
    fib = P0
    n1 = n - 1
    n2 = n - 2
    .pcc_begin prototyped
    .arg n1
    .pcc_call fib
    .result r1
    .pcc_end
    .pcc_begin prototyped
    .arg n2
    .pcc_call fib
    .result r2
    .pcc_end
    n = r1 + r2
    .pcc_begin_return
    .return n
    .pcc_end_return
.end


$ time ../../parrot fib.imc
fib(32) = 3524578 22.044000s

real 0m22.078s
user 0m22.077s
sys 0m0.030s


$ time ../../parrot -j fib.imc
fib(32) = 3524578 18.017000s

real 0m18.048s
user 0m17.796s
sys 0m0.030s

Luke Palmer

Dec 27, 2003, 5:31:23 PM
to Joe Wilson, Dan Sugalski, perl6-i...@perl.org
Joe Wilson writes:
> Dan Sugalski:
> > 2) Parrot's Array and SArray values all accept mixed-type data, which
> > perl's arrays do *not* do, and as such have some extra speed hits
> > that perl arrays don't.
>
> What do you mean?
> Perl's arrays do indeed accept mixed data types (see example below).

No, they don't. They are arrays of PerlScalars only.

For this:

set P1, P0[35]

The current Array and SArray have to sift through the sparse table to
find the 35th index (that's pretty efficient, but it still needs to
check whether it needs to do that). It then checks whether the 35th
element is indeed a PMC (it could be an int, num, or str), and then sets
P1 to that element.

Perl's array doesn't have sparse handling (the timing of
perl -le '$x[10000000] = 1' should convince you of this), and it doesn't
need to check whether the 35th element is a scalar, because the only
thing it holds are scalars.

Luke

Joe Wilson

Dec 27, 2003, 5:34:24 PM
to Luke Palmer, perl6-i...@perl.org

--- Luke Palmer <fibo...@babylonia.flatirons.org> wrote:
> The overhead you're seeing comes from many things. First, using
> prototyped (or unprototyped) functions from in imcc follows the parrot
> calling conventions. That is, it uses continuation-passing instead of
> bsr, sets a few int registers on the run, and does a savetop/restoretop.
> None of these things were in your pasm file. The largest account of
> this overhead likely comes from the savetop/restoretop.
>
> Not to mention, the imcc version is much less clever as far as getting
> speed out of things.
>
> But nonetheless I feel it's worth putting some serious effort
> into making sub calls very fast on parrot. That might involve some
> minor tweaking of the calling conventions.

I get consistently much better timings when arguments of prototyped functions
(regardless of type, number of arguments, or whether it is a vararg function)
are simply all passed in a single PMC array unconditionally.

This approach has a number of advantages:
it is much simpler, it frees up scratch registers (no need for save, restore,
savetop, restoretop and friends), all arguments are in order (counts do not
convey argument order when different argument types are involved),
no need for counts stored in registers, and it's great for vararg
functions (i.e., no need for the new foldup opcode at all).

Please consider it.

Joe Wilson

Dec 27, 2003, 5:38:17 PM
to Luke Palmer, Dan Sugalski, perl6-i...@perl.org

--- Luke Palmer <fibo...@babylonia.flatirons.org> wrote:
> Joe Wilson writes:
> > Dan Sugalski:
> > > 2) Parrot's Array and SArray values all accept mixed-type data, which
> > > perl's arrays do *not* do, and as such have some extra speed hits
> > > that perl arrays don't.
> >
> > What do you mean?
> > Perl's arrays do indeed accept mixed data types (see example below).
>
> No, they don't. They are arrays of PerlScalars only.

I assumed Dan meant strings, floats and integers were mixed data types
because this is what my benchmark programs demonstrated.

That crazy English language. Leave out a word and it changes the meaning
of a sentence. ;-)

>
> For this:
>
> set P1, P0[35]
>
> The current Array and SArray have to sift through the sparse table to
> find the 35th index (that's pretty efficient, but it still needs to
> check whether it needs to do that). It then checks whether the 35th
> element is indeed a PMC (it could be an int, num, or str), and then sets
> P1 to that element.
>
> Perl's array doesn't have sparse handling (the timing of
> perl -le '$x[10000000] = 1' should convince you of this), and it doesn't
> need to check whether the 35th element is a scalar, because the only
> thing it holds are scalars.
>
> Luke


Luke Palmer

Dec 27, 2003, 5:24:00 PM
to Joe Wilson, perl6-i...@perl.org
Joe Wilson writes:
> Using a recursive version of the fibonacci function (with
> the integer 32 as an argument) to test function call overhead
> I get these timings for various languages and configurations:
>
> perl 5.6.1 fib.pl 10.93 seconds
> python 2.2.2 fib.py 6.76 seconds
> parrot f.pasm 2.74 seconds
> parrot -j f.pasm 1.53 seconds
> parrot fib.imc (*) 22.07 seconds
> parrot -j fib.imc (*) 18.04 seconds
>
> Prototyped functions in Parrot have a huge runtime overhead
> as compared to normal subroutines. Is this to be expected?

The overhead you're seeing comes from many things. First, using
prototyped (or unprototyped) functions from in imcc follows the parrot
calling conventions. That is, it uses continuation-passing instead of
bsr, sets a few int registers on the run, and does a savetop/restoretop.
None of these things were in your pasm file. The largest account of
this overhead likely comes from the savetop/restoretop.

Not to mention, the imcc version is much less clever as far as getting
speed out of things.

But nonetheless I feel it's worth putting some serious effort
into making sub calls very fast on parrot. That might involve some
minor tweaking of the calling conventions.

Luke

Dan Sugalski

Dec 27, 2003, 6:30:56 PM
to Joe Wilson, Luke Palmer, perl6-i...@perl.org
At 2:34 PM -0800 12/27/03, Joe Wilson wrote:
>I get consistantly much better timings when arguments of prototyped functions
>(regardless of type, number of arguments, or whether it is a vararg function)
>are simply all passed in a single PMC array unconditionally.
>
>Please consider it.

Considered, but also discarded. I'm reasonably happy with how things
are behaving now, other than the issues of inefficiencies in the
array classes. Parameter registers make prototyped calling much
faster, and I'm not convinced they slow down the non-prototyped case
enough to make the change. IMCC has some issues of excessive saving
that need addressing at some point, and the code generated to access
@_ in perl 5 can be optimized a lot better than what most folks have
been doing when emulating it by hand.

Also, as has been pointed out, a good bit of the cost for the IMCC
style calls has nothing to do with registers at all, rather with the
cost to set up the continuation for the return, something that we
also may need to address.

Dan Sugalski

Dec 27, 2003, 6:17:34 PM
to Joe Wilson, Luke Palmer, perl6-i...@perl.org
At 2:38 PM -0800 12/27/03, Joe Wilson wrote:
>--- Luke Palmer <fibo...@babylonia.flatirons.org> wrote:
>> Joe Wilson writes:
>> > Dan Sugalski:
>> > > 2) Parrot's Array and SArray values all accept mixed-type data, which
>> > > perl's arrays do *not* do, and as such have some extra speed hits
>> > > that perl arrays don't.
>> >
>> > What do you mean?
>> > Perl's arrays do indeed accept mixed data types (see example below).
>>
>> No, they don't. They are arrays of PerlScalars only.
>
>I assumed Dan meant strings, floats and integers were mixed data types
>because this is what my benchmark programs demonstrated.

I did, but perl's arrays don't do that. Perl 5, internally, has a
single scalar data type, the SV, and that's all that you can put into
arrays or hashes. When you do this in perl:

$foo[12] = 99;

You're actually putting an SV into slot 12 of foo, with an integer
value of 99. It's the equivalent of storing a PerlInt PMC with a
value of 99.

SArray and Array PMCs, in addition to being sparse (which has some
overhead), allow you to really store PMC *, STRING *, INTVAL, and
NUMVAL entries, which means that each slot in an SArray and Array
needs to have a flag on it that says what data type's in each slot.
Most of the code to handle this lives in list.c and, honestly,
there's a lot of overhead. For the problem it solves--having a
mixed-type sparse array--it's just fine, but that's not what we need
in this case, so the overhead is just too darned excessive.
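
To make the per-slot cost concrete, here is a toy Python sketch (all names
invented, nothing here is Parrot's real code) of a mixed-type slot array
versus one that only ever holds PMCs; the extra flag check and boxing on
every fetch is the kind of overhead being described:

INT, NUM, STR, PMC = range(4)

class MixedSlotArray:
    def __init__(self):
        self.slots = []                  # each slot is (type_flag, raw_value)
    def push(self, flag, value):
        self.slots.append((flag, value))
    def fetch_pmc(self, i):
        flag, value = self.slots[i]
        if flag == PMC:                  # branch on the flag for every access
            return value
        return {"boxed": value}          # non-PMC slots must be boxed on fetch

class PMCOnlyArray:
    def __init__(self):
        self.slots = []                  # every slot is already a PMC
    def push(self, pmc):
        self.slots.append(pmc)
    def fetch_pmc(self, i):
        return self.slots[i]             # no flag to test, nothing to box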

What we need to do is get SArray a lot more efficient, so it holds
just PMCs and doesn't do any of the sparse stuff that's done now.
That'll tighten things up a lot, and I think should get us more in
line with how perl's behaving and how we want to be behaving.

Also, you might want to make sure you've built Parrot with
optimizations on. By default we don't enable GCC's -O to do any
optimization, and that does slow things down a bunch. On the other
hand, it makes debugging a whole lot easier. Perl 5 is built with
full optimization, so that'll make quite a difference. (Pass the
--optimize flag to Configure.pl to enable it, and expect the core ops
files to chew massive amounts of RAM and swap while it happens)

Dan Sugalski

Dec 27, 2003, 6:35:50 PM
to Joe Wilson, Luke Palmer, perl6-i...@perl.org
At 6:17 PM -0500 12/27/03, Dan Sugalski wrote:
[reasons and some excuses for slowdowns snipped]

None of this, I should point out, in any way means we don't have a
problem, because we do. Things are *not* as fast as they should be,
and we need to address it. (And I'm glad you've brought it up, too,
even if I am desperately short of time at the moment :)

Nicholas Clark

Dec 27, 2003, 6:42:45 PM
to Dan Sugalski, Joe Wilson, Luke Palmer, perl6-i...@perl.org
On Sat, Dec 27, 2003 at 06:17:34PM -0500, Dan Sugalski wrote:
> Also, you might want to make sure you've built Parrot with
> optimizations on. By default we don't enable GCC's -O to do any
> optimization, and that does slow things down a bunch. On the other
> hand, it makes debugging a whole lot easier. Perl 5 is built with
> full optimization, so that'll make quite a difference. (Pass the
> --optimize flag to Configure.pl to enable it, and expect the core ops
> files to chew massive amounts of RAM and swap while it happens)

For benchmarking with gcc 3.x on x86 I'm tending to use
-O2 -falign-loops=16 -falign-jumps=16 -falign-functions=16 -falign-labels=16 -mpreferred-stack-boundary=4 -minline-all-stringops

to disable the default code placement options, which as far as I can tell
from the documentation choose whether to pad code to better alignments based
on the amount of padding that would be needed. These defaults mean that
changing the size of earlier parts of the object file can affect the
alignment (and hence speed) of loops you didn't change. This is very
confusing.

Nicholas Clark

Joe Wilson

Dec 27, 2003, 9:34:21 PM
to Nicholas Clark, Dan Sugalski, Joe Wilson, Luke Palmer, perl6-i...@perl.org
I used the default ./configure options (no idea what they were).

But more to the point - no one explained why the Parrot JIT ran
the code 3 times slower and arrived at the wrong result.

Joe Wilson

Dec 27, 2003, 9:45:24 PM
to Dan Sugalski, Luke Palmer, perl6-i...@perl.org
That's okay - I've also considered and discarded Parrot. ;-)
Good luck with Parrot guys.
I will check out either Mono or Lua's VM for my purposes.
Thanks.

--- Dan Sugalski <d...@sidhe.org> wrote:

Dan Sugalski

Dec 27, 2003, 9:48:27 PM
to Joe Wilson, Nicholas Clark, Joe Wilson, Luke Palmer, perl6-i...@perl.org
At 6:34 PM -0800 12/27/03, Joe Wilson wrote:
>I used the default ./configure options (no idea what they were).

The default's no optimization. At this point we're in development, so
having a build that can be meaningfully thrown into a debugger's more
important than the ultimate speed. (Besides, if we're faster without
optimization, that's a good thing)

>But more to the point - no one explained why the Parrot JIT ran
>the code 3 times slower and arrived at the wrong result.

Oh, that's easy--it's broken. :) Or possibly the JITted version keeps
more stuff in the x86 floating point registers, so some intermediate
rounding's not going on that would happen without the JIT. I'm not
sure why, but it'll require some poking around to see what's going
on. Unfortunately x86 assembly's not my strong suit, so it'll end up
having to wait until someone's got time to take a look at it.

>--- Nicholas Clark <ni...@ccl4.org> wrote:
>> On Sat, Dec 27, 2003 at 06:17:34PM -0500, Dan Sugalski wrote:
>> > Also, you might want to make sure you've built Parrot with
>> > optimizations on. By default we don't enable GCC's -O to do any
>> > optimization, and that does slow things down a bunch. On the other
>> > hand, it makes debugging a whole lot easier. Perl 5 is built with
>> > full optimization, so that'll make quite a difference. (Pass the
>> > --optimize flag to Configure.pl to enable it, and expect the core ops
>> > files to chew massive amounts of RAM and swap while it happens)
>>
>> For benchmarking with gcc 3.x on x86 I'm tending to use
>> -O2 -falign-loops=16 -falign-jumps=16 -falign-functions=16 -falign-labels=16
>> -mpreferred-stack-boundary=4 -minline-all-stringops
>>
>> to disable the default code placement options, which as far as I can tell
>> from the documentation choose whether to pad code to better alignments based
>> on the amount of padding that would be needed. These defaults mean that
>> changing the size of earlier parts of the object file can affect the
>> alignment (and hence speed) of loops you didn't change. This is very
>> confusing.
>>
>> Nicholas Clark

Leopold Toetsch

Dec 27, 2003, 4:55:51 PM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:
> Interesting. However... the two programs aren't equivalent. You're
> using constant values and putting results onto an existing data
> structure in perl, so true 'equivalence' requires a few changes to
> the source. With those in place, I get the following:

Could you please commit these files to examples/benchmarks/ as
some.pasm and some.pl?

Thanks,
leo

Leopold Toetsch

Dec 28, 2003, 4:36:08 AM
to Joe Wilson, perl6-i...@perl.org
Joe Wilson <deve...@yahoo.com> wrote:

> $ time parrot -j f6.pasm
> 21001094.100000 (note: wrong result and slower with jit)

I don't get a slowdown or a wrong result (i386/linux). On what
platform are you testing?

leo

Leopold Toetsch

Dec 28, 2003, 4:21:03 AM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:

> SArray and Array PMCs, in addition to being sparse (which has some
> overhead), allow you to really store PMC *, STRING *, INTVAL, and
> NUMVAL entries, which means that each slot in an SArray and Array
> needs to have a flag on it that says what data type's in each slot.

SArray can hold arbitrary items and has a type associated with it. It's
fixed-size after initialization and does not handle sparse holes. It's
intended for handling small typed arrays.

Array and PerlArray only store PMCs. If you put a native int into an
Array, a new PerlInteger with that value is stored. Finally, only Array
and PerlArray can be sparse.

See also the warnocked "Q: Array vs SArray" thread from Dec 11th.

> Most of the code to handle this lives in list.c and, honestly,
> there's a lot of overhead. For the problem it solves--having a
> mixed-type sparse array--it's just fine, but that's not what we need
> in this case, so the overhead is just too darned excessive.

The list.c code uses memory chunks and performs fine with big arrays
(try shift/unshift on perl5 and parrot arrays). The current usage pattern
(putting a few PMCs in) isn't optimized yet.

leo

Leopold Toetsch

Dec 28, 2003, 4:42:44 AM
to Joe Wilson, perl6-i...@perl.org
Joe Wilson <deve...@yahoo.com> wrote:
> Perl's arrays do indeed accept mixed data types (see example below).

Perl's arrays take SVs. Please use a PerlArray instead of an SArray.

Parrot (still built unoptimized) is significantly faster than perl5 on this
test.

leo

Leopold Toetsch

Dec 28, 2003, 5:57:58 AM
to Joe Wilson, perl6-i...@perl.org
Joe Wilson <deve...@yahoo.com> wrote:
> I implemented the same variable argument function "varargs_adder"
> in both Perl 5 (addit.pl) and Parrot (f4.pasm).

I've put in the addit benchmark and some variations of it:
- addit.pl ... as posted by Joe Wilson
- addit.pasm ... ditto, but using PerlArray
- addit.imc ... rewritten as it would be generated, using pdd03
- addit2.imc ... optimized - return continuation creation pulled
  out of the loop

Here are some results running this on my Athlon 800, parrot is *not*
optimized:

perl5.00503 addit.pl 5.9 s
perl5.8.0-threaded 5.6
perl5.8.0-long-double 5.3

parrot addit.pasm 4.3
parrot -C addit.pasm 3.5
parrot -j addit.pasm 2.9

parrot -C addit.imc 6.1
parrot -C -Oc addit.imc 5.3
parrot -j -Oc addit.imc 4.9

parrot -C addit2.imc 3.6
parrot -C -Oc addit2.imc 3.0
parrot -j -Oc addit2.imc 2.5

JIT results are correct BTW. Timings are user time rounded up.

leo

Jeff Clites

Dec 28, 2003, 7:10:05 PM
to l...@toetsch.at, P6I Internals

If I take the f4.pasm that Joe Wilson originally posted and just
change SArray to PerlArray, then memory usage grows very quickly.
If I comment out the "set P5, 5", then it doesn't happen.

Based on your 'warnocked "Q: Array vs SArray" from Dec 11th' (though
now that we are talking about it, I suppose it's not warnocked any
more...), I'd expect this "set P5, 5" to fill the PerlArray with 5
nulls, but I don't see right off why I'd get a memory explosion. Do you
have any idea what might be going on?

Thanks,

JEff

Miguel de Icaza

Jan 7, 2004, 6:55:09 PM
Hello,

> >Perhaps some of you may be interested in this
> >entry from Miguel de Icaza's web log:
> >
> >Python running fast on .NET
> >http://primates.ximian.com/~miguel/archive/2003/Dec-09.html


>
> Yeah, but alas Miguel's mis-informed. A reasonable reimplementation
> of core python (without all the grotty bits that arguably throw in
> the huge speed hit) should run that benchmark at about 20x python's
> base speed, and the parts of Python that will give .NET serious fits
> haven't been implemented. Python's method semantics don't match
> .NET's semantics in a number of performance-unpleasant ways.

Dan, it was not me who did that performance test or wrote the Python
compiler; it was Jim Hugunin, who authored Jython. The man deserves some
credibility; after all, he has now done two Python implementations and
probably knows a lot more than most of us about optimizing Python.

My understanding is that IronPython JIT-compiles Python into CIL code.
When I spoke to Jim, he said that he was also interested in implementing
type inferencing to get even more performance out of it (JScript and VB.NET
implement it nowadays).

I apologize for any resulting cognitive dissonance, and you may now return to
listening to Dan.

Miguel.
