OO benches

Leopold Toetsch

Apr 16, 2004, 12:29:02 PM

to P6I

With all current optimizations[1] I now have these timings:

$ ./bench -b=^oo[234f]
Numbers are relative to the first one. (lower is better)
        p-j-Oc  perl-th  perl  python  ruby
oo2       100%     182%  152%     90%  132%
oo3       100%     276%  256%    333%  383%
oo4       100%     137%  128%    171%  292%
oofib     100%     303%  261%    157%  161%
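
A minimal sketch of how "relative to the first one" figures like these can be derived from raw timings. The function name and the sample timings are made up for illustration; they are not the measured values and not the bench script's code.

```python
def relative_percentages(timings):
    """Scale every timing so that the first entry reads 100%."""
    names = list(timings)
    baseline = timings[names[0]]
    return {name: round(100 * timings[name] / baseline) for name in names}

# Hypothetical wall-clock seconds, chosen only to illustrate the arithmetic.
sample = {"p-j-Oc": 2.0, "perl-th": 3.64, "perl": 3.04, "python": 1.8, "ruby": 2.64}
```

Lower is better, so any entry under 100% beat the baseline on that benchmark.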

And
$ time CALL__BUILD=1 parrot -j oo2b.pasm
real 0m2.566s
vs 0m2.630s for oo2.pasm
(without any of these optimizations, oo2b takes 3.9s)

oo2 is basically object instantiation; oo2b calls the method in the
BUILD property, whereas oo2 calls __init directly.
oo3 is attribute get
oo4 is attribute set (where Parrot creates new PMCs, which isn't really
needed :)
oofib tests mostly function/method call speed

[1]
- set_string_native references the string
- constant strings e.g. "BUILD" get a precomputed hash value from c2str.pl
- use of _S("BUILD") and _S("CONSTRUCT") in objects.c

Athlon 800
Parrot -O3 (gcc 2.92.2)
perl-th is threaded 5.8
perl is 5.8 with long double support
python 2.3.3
ruby 1.8.0

Aaron Sherman

Apr 16, 2004, 1:26:20 PM

to Leopold Toetsch, P6I

On Fri, 2004-04-16 at 12:29, Leopold Toetsch wrote:
> With all current optimizations[1] I now have these timings:
>
> $ ./bench -b=^oo[234f]
> Numbers are relative to the first one. (lower is better)
>         p-j-Oc  perl-th  perl  python  ruby
> oo2       100%     182%  152%     90%  132%
> oo3       100%     276%  256%    333%  383%

That looks suspicious... especially Python. It smells like there's some lazy
evaluation going on here, and that the object doesn't get fully instantiated
until oo3. I suspect, in that light, that the numbers aren't quite as
bad for Parrot as they look in oo2, nor as good for Parrot as they look
in oo3 (well, maybe as good as they look, but not as bad... I have to
think about that).

> $ time CALL__BUILD=1 parrot -j oo2b.pasm
> real 0m2.566s
> vs 0m2.630s for oo2.pasm
> (without any of these optimizations, oo2b takes 3.9s)

I would suggest using iterations that go much longer so that you can
detect over-optimizations and such more easily.
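
One way to act on that, sketched under assumptions: run the same instantiation loop at several iteration counts and compare the per-iteration cost. The `Foo` class and both helper names are hypothetical and are not taken from examples/benchmarks.

```python
import time

class Foo:
    """Hypothetical stand-in for the benchmarked class."""
    def __init__(self):
        self.i = 10
        self.j = 20

def time_instantiation(iterations):
    """Return the wall-clock seconds needed to create `iterations` objects."""
    start = time.perf_counter()
    for _ in range(iterations):
        Foo()
    return time.perf_counter() - start

def per_iteration_cost(counts=(10_000, 100_000, 1_000_000)):
    """Map each iteration count to its average seconds per object."""
    return {n: time_instantiation(n) / n for n in counts}
```

If the per-object cost drops sharply as the count grows, startup overhead or an over-eager optimization is probably dominating the short runs.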

Very nice!

--
Aaron Sherman <a...@ajs.com>
Senior Systems Engineer and Toolsmith
"It's the sound of a satellite saying, 'get me down!'" -Shriekback

Leopold Toetsch

Apr 16, 2004, 6:18:37 PM

to Aaron Sherman, perl6-i...@perl.org

Aaron Sherman <a...@ajs.com> wrote:

> That looks suspicious... especially Python.

You have the sources in examples/benchmarks. Maybe we are comparing
apples and oranges. But the code looks good to me.

> I would suggest using iterations that go much longer so that you can
> detect over-optimizations and such more easily.

More benchmarks welcome.

> Very nice!

leo

Aaron Sherman

Apr 16, 2004, 6:29:41 PM

to Leopold Toetsch, Perl6 Internals List

On Fri, 2004-04-16 at 18:18, Leopold Toetsch wrote:
> Aaron Sherman <a...@ajs.com> wrote:
>
> > That looks suspicious... especially Python.
>
> You have the sources in examples/benchmarks. Maybe we are comparing
> apples and oranges. But the code looks good to me.

Sorry, I gave the wrong impression. I meant it looks suspiciously like
Python is doing a lazy construction on those objects, not that there is
anything wrong with the benchmark.

Lazy construction is perhaps something Parrot should think about too,
though I've not looked into what Parrot does now. How often would it be
of any value to construct a stub object which still needed to be fully
constructed before use? Do such objects pop into existence implicitly in
any commonly-used places that would yield a performance win in the
general case?

Just wondering why Python would do that (if, indeed it is doing that).

Jeff Clites

Apr 17, 2004, 2:19:08 AM

to Leopold Toetsch, P6I

On Apr 16, 2004, at 9:29 AM, Leopold Toetsch wrote:

> With all current optimizations[1] I now have these timings:
>
> $ ./bench -b=^oo[234f]
> Numbers are relative to the first one. (lower is better)
>         p-j-Oc  perl-th  perl  python  ruby
> oo2       100%     182%  152%     90%  132%
> oo3       100%     276%  256%    333%  383%
> oo4       100%     137%  128%    171%  292%
> oofib     100%     303%  261%    157%  161%

Looks cool!

BTW, I'm failing a bunch of tests now (Mac OS X); not sure if it's
related:

Failed Test           Stat Wstat Total Fail   Failed  List of Failed
-------------------------------------------------------------------------------
t/op/gc.t                1   256    13    1    7.69%  11
t/pmc/dumper.t          13  3328    13   13  100.00%  1-13
t/pmc/object-meths.t     1   256    19    1    5.26%  9
t/pmc/objects.t          7  1792    37    7   18.92%  23-26 28 35-36

The gc test is failing with:

t/op/gc.................NOK 11# Failed test (t/op/gc.t at line 219)
# got: 'get_pmc_keyed_str() not implemented in class 'RetContinuation''
# expected: 'hello
# hello
# '
# '(cd . && ./parrot -b --gc-debug /tmp/gc_11.pasm)' failed with exit code 2

and all of the dumper ones look like double-frees:

t/pmc/dumper............NOK 7# Failed test (t/pmc/dumper.t at line 359)
# got: '*** malloc[9416]: Deallocation of a pointer not malloced: 0x200ee30; This could be a double free(), or free() called with the middle of an allocated block

I'll poke a bit and see if I can figure out what's going on.

> - constant strings e.g. "BUILD" get a precomputed hash value from
> c2str.pl

This isn't checked in yet, right? (Didn't see c2str.pl anywhere.)

> - use of _S("BUILD") and _S("CONSTRUCT") in objects.c

Mac OS X doesn't like the _S()--it seems it may already be defined to
something. How about something clearer (and less likely to conflict)
instead, like STRING_LITERAL()?

JEff

Leopold Toetsch

Apr 17, 2004, 1:52:02 AM

to Aaron Sherman, perl6-i...@perl.org

Aaron Sherman <a...@ajs.com> wrote:
> On Fri, 2004-04-16 at 18:18, Leopold Toetsch wrote:

> Sorry, I gave the wrong impression. I meant it looks suspiciously like
> Python is doing a lazy construction on those objects, not that there is
> anything wrong with the benchmark.

No, I don't think that this is happening. Parrot's slightly slower
object instantiation is due to register preserving mainly. The "__init"
code is run from inside the "new PObj, IClass" opcode. As it's not known
that a method call is happening here, we can't use register preserving
operations that only save needed registers--we have to save all
registers. These two memcpys are the most heavy part of the operation.
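
A toy model of that difference, in Python rather than Parrot's C. The bank names, the 32-slot size and all four helpers are assumptions for illustration; the point is only that a wholesale copy touches every slot while a selective save touches just the ones the caller names.

```python
NUM_REGS = 32                 # assumed number of registers per bank
BANKS = ("I", "N", "S", "P")  # assumed bank names

def make_registers():
    """Build a toy register file: one list of NUM_REGS slots per bank."""
    return {bank: [None] * NUM_REGS for bank in BANKS}

def save_all(regs):
    """Copy every bank wholesale, the analogue of a memcpy over the frame."""
    return {bank: list(slots) for bank, slots in regs.items()}

def save_needed(regs, needed):
    """Copy only the (bank, index) slots the caller says it still needs."""
    return {(bank, i): regs[bank][i] for bank, i in needed}

def restore_needed(regs, saved):
    """Write the selectively saved slots back into the register file."""
    for (bank, i), value in saved.items():
        regs[bank][i] = value
```

In this model `save_all` always copies 128 slots, whereas `save_needed` copies one slot per requested register.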

> Lazy construction is perhaps something Parrot should think about too,

I can't imagine that lazy construction could be of any value. You have
to construct the object eventually, so the total cost is the sum of the
two parts.
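
A small sketch of what lazy construction would look like, assuming a hypothetical `LazyPoint` class: creation only stores the arguments, and the real setup runs on first attribute access. The setup work is deferred, not avoided.

```python
class LazyPoint:
    """Stub object; the real setup runs on the first attribute access."""
    init_calls = 0  # counts how often the deferred setup actually ran

    def __init__(self, x, y):
        self._args = (x, y)  # cheap part: just remember the arguments

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. before _build() ran.
        if name in ("x", "y"):
            self._build()
            return self.__dict__[name]
        raise AttributeError(name)

    def _build(self):
        """Expensive part, deferred until the object is really used."""
        type(self).init_calls += 1
        self.x, self.y = self._args
```

An object that is created but never touched skips `_build()` entirely, which is the only case where deferring pays off.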

And Python at 90% (or ~100% with gcc 3.3.3 on a Pentium) isn't that bad
for us, all the more since Python AFAIK is constructing a kind of hash
while we have a full-fledged object.

leo

Leopold Toetsch

Apr 17, 2004, 2:43:24 AM

to Jeff Clites, perl6-i...@perl.org

Jeff Clites <jcl...@mac.com> wrote:

> On Apr 16, 2004, at 9:29 AM, Leopold Toetsch wrote:

>> $ ./bench -b=^oo[234f]

> Looks cool!

Yep.

> BTW, I'm failing a bunch of tests now (Mac OS X); not sure if it's
> related:

Strange. valgrind doesn't indicate any problem with these tests.

> I'll poke a bit and see if I can figure out what's going on.

Yes please.

>> - constant strings e.g. "BUILD" get a precomputed hash value from
>> c2str.pl

> This isn't checked in yet, right? (Didn't see c2str.pl anywhere.)

It was attached to yesterday's message "Constant strings - again". But
I'll resend it with my recent changes WRT hashvalue precalculation.

>> - use of _S("BUILD") and _S("CONSTRUCT") in objects.c

> Mac OS X doesn't like the _S()--it seems it may already be defined to
> something. How about something clearer (and less likely to conflict)
> instead, like STRING_LITERAL()?

We can #undef it before using it. STRING_LITERAL is more typing and doesn't
guarantee uniqueness, so rather not. Maybe PSC() - Parrot String Constant.

> JEff

leo

Jeff Clites

Apr 17, 2004, 2:59:32 AM

to Leopold Toetsch, P6I List

On Apr 16, 2004, at 11:19 PM, Jeff Clites wrote:

> BTW, I'm failing a bunch of tests now (Mac OS X); not sure if it's
> related:
>
> Failed Test           Stat Wstat Total Fail   Failed  List of Failed
> -------------------------------------------------------------------------------
> t/op/gc.t                1   256    13    1    7.69%  11
> t/pmc/dumper.t          13  3328    13   13  100.00%  1-13
> t/pmc/object-meths.t     1   256    19    1    5.26%  9
> t/pmc/objects.t          7  1792    37    7   18.92%  23-26 28 35-36

And of those, only these 2 fail if run without --gc-debug, _or_ if
configured with --optimize (seems like an odd correlation):

Failed Test           Stat Wstat Total Fail   Failed  List of Failed
-------------------------------------------------------------------------------
t/op/gc.t                1   256    13    1    7.69%  11
t/pmc/dumper.t           1   256    13    1    7.69%  12

JEff

Leopold Toetsch

Apr 17, 2004, 4:50:22 AM

to Jeff Clites, perl6-i...@perl.org

Jeff Clites <jcl...@mac.com> wrote:

> BTW, I'm failing a bunch of tests now (Mac OS X); not sure if it's
> related:

> t/op/gc.................NOK 11# Failed test (t/op/gc.t at line 219)

> # got: 'get_pmc_keyed_str() not implemented in class
> 'RetContinuation''

I see that now too, after recompiling with ARENA_DOD_FLAGS turned off. The
property hash got freed during DOD. I'm still searching for the cause.

leo

Leopold Toetsch

Apr 17, 2004, 5:10:31 AM

to Jeff Clites, perl6-i...@perl.org

Jeff Clites <jcl...@mac.com> wrote:

> BTW, I'm failing a bunch of tests now (Mac OS X); not sure if it's
> related:

Fixed. It was caused by the faster PMC creation code I put in earlier
in the week, and it shows up when ARENA_DOD_FLAGS is off (e.g. due to
missing memalign).

Thanks for reporting,
leo

Piers Cawley

Apr 17, 2004, 6:19:16 AM

to l...@toetsch.at, Aaron Sherman, perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> writes:

> Aaron Sherman <a...@ajs.com> wrote:
>> On Fri, 2004-04-16 at 18:18, Leopold Toetsch wrote:
>
>> Sorry, I gave the wrong impression. I meant it looks suspiciously like
>> Python is doing a lazy construction on those objects, not that there is
>> anything wrong with the benchmark.
>
> No, I don't think that this is happening. Parrot's slightly slower
> object instantiation is due to register preserving mainly. The "__init"
> code is run from inside the "new PObj, IClass" opcode. As it's not known
> that a method call is happening here, we can't use register preserving
> operations that only save needed registers--we have to save all
> registers. These two memcpys are the most heavy part of the operation.

Maybe we should rethink that then and make allocation and
initialization two different phases. Or dictate that

new PObj, IClass

should be treated as if it were a function call with all the caller
saves implications that go with it.

Leopold Toetsch

Apr 17, 2004, 10:48:01 AM

to Piers Cawley, perl6-i...@perl.org

Piers Cawley wrote:

> Leopold Toetsch <l...@toetsch.at> writes:
>>These two memcpys are the most heavy part of the operation.
>>
>
> Maybe we should rethink that then and make allocation and
> initialization two different phases. Or dictate that
>
> new PObj, IClass
>
> should be treated as if it were a function call with all the caller
> saves implications that go with it.

Well, it's not only object creation. While that case is a bit special and
could have a special syntax, the problem exists for all delegate usage, e.g.
for tying.
If we need some extra speed for object creation, we could define it as

new PObj, IClass, "BUILD" # call sub in BUILD prop
new PObj, IClass, "CONSTRUCT" # call sub in CONSTRUCT prop
new PObj, IClass # no init call at all

and just save the needed registers, since we know whether a Sub is called (or not).
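
The three variants can be modelled roughly as follows. This is a Python sketch of the idea only: `IClass.props`, the `built_by` attribute and the `new()` helper are hypothetical names, not Parrot opcodes or its object API.

```python
class IClass:
    # Hypothetical class properties mapping a name to an init sub.
    props = {
        "BUILD": lambda self: setattr(self, "built_by", "BUILD"),
        "CONSTRUCT": lambda self: setattr(self, "built_by", "CONSTRUCT"),
    }

def new(cls, init_prop=None):
    """Allocate an object and run the init sub named by init_prop, if any."""
    obj = object.__new__(cls)       # allocation only, no init code runs
    if init_prop is not None:
        cls.props[init_prop](obj)   # the caller knows a sub call happens here
    return obj
```

Because the caller states up front whether a sub will run, the call site can decide what to preserve instead of always saving everything.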

But as said, it doesn't help here:

$ time perl ff.pl
010
real 0m3.287s

$ time parrot -j ff.pasm
010
real 0m2.334s

leo :)

[attachments: ff.pl, ff.pasm]

Dan Sugalski

Apr 19, 2004, 9:45:50 AM

to Piers Cawley, l...@toetsch.at, Aaron Sherman, perl6-i...@perl.org

At 11:19 AM +0100 4/17/04, Piers Cawley wrote:
>Leopold Toetsch <l...@toetsch.at> writes:
>
>> Aaron Sherman <a...@ajs.com> wrote:
>>> On Fri, 2004-04-16 at 18:18, Leopold Toetsch wrote:
>>
>>> Sorry, I gave the wrong impression. I meant it looks suspiciously like
>>> Python is doing a lazy construction on those objects, not that there is
>>> anything wrong with the benchmark.
>>
>> No, I don't think that this is happening. Parrot's slightly slower
>> object instantiation is due to register preserving mainly. The "__init"
>> code is run from inside the "new PObj, IClass" opcode. As it's not known
>> that a method call is happening here, we can't use register preserving
>> operations that only save needed registers--we have to save all
>> registers. These two memcpys are the most heavy part of the operation.
>
>Maybe we should rethink that then and make allocation and
>initialization two different phases.

That's the way I'm leaning. I know it's a *bad* idea from a
high-level language point of view, but from the lower levels it's
less of a bad idea.

New, then, would allocate the object and you'd need to then call its
constructor, with the constructor call using full-on parrot calling
conventions and giving the calling code a chance to save the
registers it was interested in. Of course, then we get into the issue
of handling return values from multiple calls into methods as we
automatically redispatch the constructor, but...
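
A rough Python model of that split, to make the return-value question concrete. The class names, the `init` method and the base-first call order are all assumptions; nothing here is Parrot's actual dispatch.

```python
class Base:
    def init(self):
        self.base_ready = True
        return "Base"

class Derived(Base):
    def init(self):
        self.derived_ready = True
        return "Derived"

def allocate(cls):
    """Phase 1: hand back a raw object; no constructor code runs."""
    return object.__new__(cls)

def construct(obj):
    """Phase 2: call each class's own init, base first, as ordinary calls."""
    results = []
    for klass in reversed(type(obj).__mro__):
        init = klass.__dict__.get("init")  # only methods defined on klass
        if init is not None:
            results.append(init(obj))
    return results
```

Each redispatched call yields its own return value, so the caller ends up with a list rather than a single result, which is the issue raised above.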
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
d...@sidhe.org                         have teddy bears and even
                                      teddy bears get drunk