benchmarking - it's now all(-1,0,1,5,6)% faster

Nicholas Clark

Jan 11, 2003, 2:05:22 PM

to perl5-...@perl.org, perl6-i...@perl.org

This is intentionally a crosspost. One of parrot's aims is to go faster than
perl 5. Meanwhile, I've been trying to make perl 5 go faster. To achieve
either goal, we need to measure "faster". I'm having problems measuring
"faster". Well, I'm stuck. And unless we have a good plan for how to
measure representative average speeds of perl 5 and parrot for representative
tasks, I can't see how we can tune parrot to be faster than perl 5, or perl
5.10 to be faster than 5.8.

The story so far:

I'm trying a couple of things, involving inlining one of perl 5's functions,
and applying copy on write to results capture in regexps, to see if it makes
perl go faster. For want of anything better, I'm using the "perlbench"
suite to measure the speed of my patched version. perlbench contains 20
small programs that do representative things, and times each of them. You
give it several versions of perl to benchmark, and it tells you the relative
timings. So far so good.
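
To make "relative timings" concrete, here is a minimal sketch in C of the kind of normalisation involved. It assumes scores are scaled so that the first binary is 100 and that a higher score means faster. The function name and the rounding are assumptions for illustration; this is not perlbench's actual code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert elapsed times into scores relative to the
 * first (baseline) binary, which always scores 100. A binary that takes
 * half the baseline time scores 200; one that takes twice as long scores 50.
 */
void relative_scores(const double *seconds, size_t n, int *scores)
{
    assert(n > 0 && seconds[0] > 0.0);
    for (size_t i = 0; i < n; i++) {
        assert(seconds[i] > 0.0);
        /* Speed is inversely proportional to time; round to nearest. */
        scores[i] = (int)(100.0 * seconds[0] / seconds[i] + 0.5);
    }
}
```

Under this assumed convention, a score of 51 or 52 next to a baseline of 100 corresponds to a test taking roughly twice as long.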

I was getting about 5% speedups on penfold against vanilla development perl.
Penfold is an x86 box (actually a Cyrix chip, which may be important) running
Debian unstable, with gcc 3.2.1 and 256M of RAM.

I tried the same tests on mirth, a ppc box, again Debian unstable, gcc 3.2.1,
but 128M of RAM. This time I saw 1% slowdowns.

So I tried the same tests on colon, PIII, 1G of RAM, but FreeBSD and gcc 2.95.
There I see 0% or 1% speedup.

But I see some very strange lines. For example on colon (reformatted):

                 A    L    B    K    C    J    D    H    E    G    F    I
                ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---
array/sort-num  100  100  106  106   51   51  109  109  104  105   52   52

A,L are vanilla patchlevel 18142, B,K have the same patching, C,J have the
same patching as each other, etc.
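
One way to read a row like this is to compare the gap between two configurations with the disagreement inside each duplicated pair. The sketch below does that in C. It assumes each letter pair is two measurements of the same source configuration; the function names are hypothetical, and the check is only a rough heuristic, not a statistical test.

```c
#include <assert.h>

/* Absolute difference of two scores. */
static double abs_diff(double x, double y)
{
    return x > y ? x - y : y - x;
}

/*
 * Hypothetical check (not part of perlbench): given two scores for
 * configuration "a" and two for configuration "b", return 1 if the gap
 * between the pair means is larger than the worst disagreement inside
 * either pair, and 0 otherwise.
 */
int gap_exceeds_pair_noise(double a1, double a2, double b1, double b2)
{
    assert(a1 >= 0.0 && a2 >= 0.0 && b1 >= 0.0 && b2 >= 0.0);
    double noise_a = abs_diff(a1, a2);
    double noise_b = abs_diff(b1, b2);
    double noise = noise_a > noise_b ? noise_a : noise_b;
    double gap = abs_diff((a1 + a2) / 2.0, (b1 + b2) / 2.0);
    return gap > noise;
}
```

For the row above, E,G (104, 105) against F,I (52, 52) gives a gap of 52.5 against a within-pair disagreement of 1, so the difference is reproducible between paired binaries rather than run-to-run jitter.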

The difference between E,G and F,I is only 1 thing; for E,G I have this macro:

#define RX_MATCH_COPY_FREE(rx) \
    STMT_START {if (RX_MATCH_COPIED(rx)) { \
        Safefree(rx->subbeg); \
        RX_MATCH_COPIED_off(rx); \
    }} STMT_END

whereas for F,I it does slightly more:

#define RX_MATCH_COPY_FREE(rx) \
    STMT_START {if (rx->saved_copy) { \
        SV_CHECK_THINKFIRST_COW_DROP(rx->saved_copy); \
    } \
    if (RX_MATCH_COPIED(rx)) { \
        Safefree(rx->subbeg); \
        RX_MATCH_COPIED_off(rx); \
    }} STMT_END

(C, J have considerably more differences than the above, so it's not
easy to describe what they do. The other binaries also have changes to
code and differing compiler flags, so it's not easy to summarise.)

RX_MATCH_COPY_FREE is used exactly 4 times, where regexps need to free saved
matches, and only inside the regexp engine.

(If you're confused it's a macro I created for the existing 3 lines:
    if (RX_MATCH_COPIED(rx))
        Safefree(rx->subbeg);
    RX_MATCH_COPIED_off(rx);
)

Yet somehow that simple change makes the sort-num test run at half speed.
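
For readers unfamiliar with the STMT_START/STMT_END idiom, here is a self-contained sketch of the simpler macro. Everything in it is a simplified stand-in: it assumes the portable `do { ... } while (0)` definition of STMT_START/STMT_END (perl's headers may choose a different expansion on some compilers), uses a mock struct in place of the real regexp structure, and uses plain `free` in place of Safefree.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed portable definitions; mock stand-ins for perl's own. */
#define STMT_START do
#define STMT_END   while (0)

/* Mock of the two regexp fields the macro touches. */
struct mock_regexp {
    char *subbeg;  /* saved copy of the matched string */
    int   copied;  /* nonzero when subbeg is owned by the regexp */
};

#define RX_MATCH_COPIED(rx)     ((rx)->copied)
#define RX_MATCH_COPIED_off(rx) ((rx)->copied = 0)

#define RX_MATCH_COPY_FREE(rx) \
    STMT_START {if (RX_MATCH_COPIED(rx)) { \
        free((rx)->subbeg); \
        RX_MATCH_COPIED_off(rx); \
    }} STMT_END

/*
 * The macro expands to a single statement, so it can sit in an unbraced
 * if/else without swallowing the else branch.
 */
int release_if(struct mock_regexp *rx, int want)
{
    assert(rx != NULL);
    if (want)
        RX_MATCH_COPY_FREE(rx);
    else
        return 0;
    return 1;
}
```

The point of the wrapper is purely syntactic: it adds no work at run time, which is part of why a change confined to this macro is such a surprising cause of a slowdown in a test that never reaches the regexp engine.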

The entire sort_num.t file is this:

----------------------------------------------------------------------------
#!perl

# Name: Array sorting
# Require: 4
# Desc:
#

require 'benchlib.pl';

@a = (1..200);
srand(10);
push(@b, splice(@a, rand(@a), 1)) while @a; # shuffle

&runtest(0.3, <<'ENDTEST');

@a = sort {$a <=> $b } @b;

ENDTEST
----------------------------------------------------------------------------

The part that actually loops is the one line @a = sort {$a <=> $b } @b;

It goes nowhere near any regexp code or regexp ops. benchlib.pl contains
no regexps, but does load Time::HiRes if it can, which in turn will bring
in Exporter and DynaLoader (which do have regexps). But there are no regexps
in the timing loop.

I see similar wild fluctuations in the index tests, which also have no regexps:

require 'benchlib.pl';

$a = "xx" x 100;
$b = "foobar";
$c = "xxx";

&runtest(15, <<'ENDTEST');

$c = index($a, $b);
$c = index($a, $c);

ENDTEST

So I'm confused. It looks like some bits of perl are incredibly sensitive to
cache alignment, or something similar. And as a consequence, perlbench is
reliably reporting wildly varying timings because of this, and because it
only tries a few, very specific things. Does this mean that it's still useful?
I'm not convinced that the real, average, performance of a perl binary varies
this wildly. And if it doesn't vary this much, but perlbench does vary this
much, what sort of tasks should be used for quantitative benchmarking of
perl and parrot code? Because I fear that if we don't have benchmarks to aim
to improve on, we're shooting in the dark when it comes to improving
performance.

Nicholas Clark

Andreas J. Koenig

Jan 11, 2003, 5:17:57 PM

to Leopold Toetsch, Nicholas Clark, perl5-...@perl.org, perl6-i...@perl.org

>>>>> On Sat, 11 Jan 2003 22:26:39 +0100, Leopold Toetsch <l...@toetsch.at> said:

> Nicholas Clark wrote:
>> So I'm confused. It looks like some bits of perl are incredibly sensitive to
>> cache alignment, or something similar.

> This reminds me of my remarks on JITed mops.pasm, which varied ~50%

And it reminds me of my postings to p5p about glibc being very buggy
up to 2.3 (posted during last October). I came to the conclusion that
perl cannot be benchmarked at all with glibc before v2.3.

--
andreas

Nicholas Clark

Jan 11, 2003, 5:31:42 PM

to Andreas J. Koenig, Leopold Toetsch, perl5-...@perl.org, perl6-i...@perl.org

On Sat, Jan 11, 2003 at 11:17:57PM +0100, Andreas J. Koenig wrote:

> And it reminds me of my postings to p5p about glibc being very buggy
> up to 2.3 (posted during last October). I came to the conclusion that
> perl cannot be benchmarked at all with glibc before v2.3.

I remember your posting, but not the details. Did it relate to glibc's malloc
and how long it took to free things? If so, surely benchmarking using perl's
malloc would work with earlier glibc's?

Anyway, on the two Debian systems I tested:

nick@penfold:~/5.8.0-i-g/t$ ls -l /lib/libc.so.6
lrwxrwxrwx 1 root root 13 Jan 2 08:46 /lib/libc.so.6 -> libc-2.3.1.so
nick@mirth:~$ ls -l /lib/libc.so.6
lrwxrwxrwx 1 root root 13 Jan 7 16:20 /lib/libc.so.6 -> libc-2.3.1.so

And (obviously) the FreeBSD box has BSD's libc.

Thanks for the reminder. It's only good luck that I (well Richard) had 2.3.1
on them.

Nicholas Clark

h...@crypt.org

Jan 11, 2003, 10:44:06 PM

to Nicholas Clark, perl5-...@perl.org, perl6-i...@perl.org

Nicholas Clark <ni...@unfortu.net> wrote:
:So I'm confused. It looks like some bits of perl are incredibly sensitive to
:cache alignment, or something similar. And as a consequence, perlbench is
:reliably reporting wildly varying timings because of this, and because it
:only tries a few, very specific things. Does this mean that it's still useful?

I think I remember seeing a profiler that emulates the x86 instruction set,
and so can give theoretically exact timings. Does this ring a bell for
anyone? I don't know if the emulation extended to details such as RAM
and cache sizes ...

Hugo

Andreas J. Koenig

Jan 12, 2003, 12:30:21 AM

to Nicholas Clark, Andreas J. Koenig, Leopold Toetsch, perl5-...@perl.org, perl6-i...@perl.org

>>>>> On Sat, 11 Jan 2003 22:31:42 +0000, Nicholas Clark <ni...@unfortu.net> said:

> On Sat, Jan 11, 2003 at 11:17:57PM +0100, Andreas J. Koenig wrote:
>> And it reminds me of my postings to p5p about glibc being very buggy
>> up to 2.3 (posted during last October). I came to the conclusion that
>> perl cannot be benchmarked at all with glibc before v2.3.

> I remember your posting, but not the details. Did it relate to glibc's malloc
> and how long it took to free things?

Yes.

> If so, surely benchmarking using perl's malloc would work with
> earlier glibc's?

I saw the erratic speed behaviour with 2.2.3, 2.2.4, and 2.2.5 and
didn't test earlier ones. glibc 2.3 had malloc rewritten from scratch
and with my limited testing it seemed to have this problem fixed.
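
A minimal sketch of how one might time the free phase in isolation is shown below. It is an illustrative micro-benchmark, not the test used for the findings above: the function name, the block counts, and the use of `clock()` are all assumptions, and CPU-time resolution may be too coarse for small counts.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/*
 * Hypothetical micro-benchmark: allocate `count` blocks of `size` bytes,
 * then time only the loop that frees them. Returns CPU seconds spent in
 * the free loop, or -1.0 if an allocation failed.
 */
double time_free_seconds(size_t count, size_t size)
{
    assert(count > 0 && size > 0);

    void **blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL)
        return -1.0;

    size_t made = 0;
    for (; made < count; made++) {
        blocks[made] = malloc(size);
        if (blocks[made] == NULL)
            break;
    }

    /* Time the frees only; any blocks that were allocated get released. */
    clock_t start = clock();
    for (size_t i = 0; i < made; i++)
        free(blocks[i]);
    clock_t stop = clock();

    free(blocks);
    if (made < count)
        return -1.0;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling it several times with the same arguments, for example `time_free_seconds(1000000, 32)`, and comparing the results would give a rough picture of how erratic the allocator's free path is on a given libc.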

> Anyway, on the two Debian systems I tested:

> nick@penfold:~/5.8.0-i-g/t$ ls -l /lib/libc.so.6
> lrwxrwxrwx 1 root root 13 Jan 2 08:46 /lib/libc.so.6 -> libc-2.3.1.so
> nick@mirth:~$ ls -l /lib/libc.so.6
> lrwxrwxrwx 1 root root 13 Jan 7 16:20 /lib/libc.so.6 -> libc-2.3.1.so

> And (obviously) the FreeBSD box has BSD's libc.

> Thanks for the reminder. It's only good luck that I (well Richard) had 2.3.1
> on them.

Well, then my findings don't solve the puzzle.

--
andreas

Michael G Schwern

Jan 11, 2003, 3:38:44 PM

to Nicholas Clark, perl5-...@perl.org, perl6-i...@perl.org

On Sat, Jan 11, 2003 at 07:05:22PM +0000, Nicholas Clark wrote:
> I was getting about 5% speedups on penfold against vanilla development perl.
> Penfold is an x86 box (actually a Cyrix chip, which may be important) running
> Debian unstable, with gcc 3.2.1 and 256M of RAM.
>
> I tried the same tests on mirth, a ppc box, again Debian unstable, gcc 3.2.1,
> but 128M of RAM. This time I saw 1% slowdowns.

FWIW, in the past I've noticed that x86 and PPC do react differently to
optimizations. I've had cases where things ran at the same speed on PPC
yet showed large differences on x86.

--

Michael G. Schwern <sch...@pobox.com> http://www.pobox.com/~schwern/
Perl Quality Assurance <per...@perl.org> Kwalitee Is Job One