Execute in place?

Rhys Weatherley

Oct 25, 2002, 7:15:07 AM

to perl6-i...@perl.org

I was just having a look at the packfile format code, and I
have a suggestion on load-time performance of the code segment.

Currently, you read the file, parse out the various sections,
copy them elsewhere in memory, and byte-swap as necessary.
The overhead of doing this could be quite significant on large
applications/modules.

A "trick" that I've found very useful in the past is to design
the bytecode format so that it can be mmap'ed into a block of
memory, and then executed almost immediately with the minimum
number of fixups. Rather than copying the instructions, you
execute them directly out of the mmap'ed region.
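
A minimal sketch of the idea in C, assuming a POSIX system. The
three opcodes and the names map_code and run are hypothetical,
made up for illustration; they are not Parrot's real opcodes or
loader API. The interpreter reads instruction words straight
from the read-only mapping and never copies them.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical opcodes, for illustration only (not Parrot's real set). */
enum { OP_END = 0, OP_PUSH = 1, OP_ADD = 2 };

/* Map a whole file read-only. Returns NULL on failure. */
static const int32_t *map_code(const char *path, size_t *nwords)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;
    *nwords = (size_t)st.st_size / sizeof(int32_t);
    return (const int32_t *)p;
}

/* Interpret the words in place; nothing is copied out of the mapping.
 * Returns the top of the stack at OP_END, or -1 on a malformed program. */
static int32_t run(const int32_t *code, size_t nwords)
{
    int32_t stack[16];
    size_t sp = 0, pc = 0;

    while (pc < nwords) {
        switch (code[pc++]) {
        case OP_PUSH:               /* operand is the next word */
            if (pc >= nwords || sp >= 16)
                return -1;
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:                /* pop two values, push their sum */
            if (sp < 2)
                return -1;
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_END:
            return sp ? stack[sp - 1] : 0;
        default:
            return -1;
        }
    }
    return sp ? stack[sp - 1] : 0;
}
```

Because the mapping is PROT_READ, any accidental write into the
code segment faults instead of silently un-sharing the page.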

This has several benefits. The obvious one is less copying
during program loading. But less obvious is that it can make
the OS kernel do the hard work for you.

When a region is mmap'ed read-only, the OS kernel can manage
memory better. When memory gets tight, it can simply discard
the page, as it can always go back to the file to get it
again. A malloc'ed page, by comparison, must be copied out
to the swap file, which incurs additional overhead.

Even better, the kernel will perform demand-paging of the
bytecode into memory as the code is executed, giving much
faster startup times. Multiple processes accessing the
same module will share the same page. Copying everything
into a malloc'ed region defeats demand-paging.

Of course, this is only going to work if the packfile matches
the host endianness exactly. A byteswap is still required if
the endianness doesn't match. However, in any given install,
it is 95% likely that the pre-compiled modules will be set
up with the host order.
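
One way to make that check cheap, sketched in C: the producer
writes a known marker word in its own native byte order at the
start of the file, and the loader compares it. The marker value
XIP_MAGIC and the names check_order and swap32 are assumptions
for this example, not the actual packfile header layout.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical marker word, written by the producer in its native order. */
#define XIP_MAGIC 0x0A0B0C0Du

enum xip_order { XIP_NATIVE, XIP_SWAPPED, XIP_INVALID };

/* Reverse the byte order of a 32-bit word. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Inspect the first word of the file image and classify it. */
static enum xip_order check_order(const unsigned char *file_start)
{
    uint32_t word;
    memcpy(&word, file_start, sizeof word);   /* avoids alignment issues */
    if (word == XIP_MAGIC)
        return XIP_NATIVE;      /* safe to execute in place */
    if (swap32(word) == XIP_MAGIC)
        return XIP_SWAPPED;     /* fall back to copy and byte-swap */
    return XIP_INVALID;
}
```

Only the XIP_SWAPPED case pays for a copy; the common native
case can run straight out of the mapping.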

Key to making this work is that fixup information must be
well isolated within the file. For example, references to
external functions are made by static index into a fixup
table, not by applying a relocation directly to the main
code segment.
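
A rough C sketch of that indirection follows. The code segment
is const and only ever stores a small index; the loader resolves
external functions into a separate, writable table. The opcode
names, the ext_fn type and the two sample functions are
hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical external function signature and opcodes. */
typedef int32_t (*ext_fn)(int32_t);
enum { FX_END = 0, FX_LOAD = 1, FX_CALL_EXT = 2 };

/* Two stand-in "external" functions the loader might resolve. */
static int32_t twice(int32_t x)  { return 2 * x; }
static int32_t negate(int32_t x) { return -x; }

/* Run read-only code; every external call goes through the fixup table,
 * so the code words themselves are never patched.
 * Returns the accumulator, or -1 on a malformed program. */
static int32_t run_with_fixups(const int32_t *code, size_t nwords,
                               const ext_fn *table, size_t nfixups)
{
    int32_t acc = 0;
    size_t pc = 0;

    while (pc < nwords) {
        int32_t op = code[pc++];
        if (op == FX_END)
            break;
        if (pc >= nwords)
            return -1;              /* missing operand */
        int32_t arg = code[pc++];
        if (op == FX_LOAD) {
            acc = arg;              /* operand is an immediate value */
        } else if (op == FX_CALL_EXT) {
            if (arg < 0 || (size_t)arg >= nfixups)
                return -1;          /* index outside the fixup table */
            acc = table[arg](acc);  /* operand is a fixup-table index */
        } else {
            return -1;
        }
    }
    return acc;
}
```

The table is the only per-process writable data; the code pages
stay identical to the file and can be shared between processes.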

If you ever wonder why PIC-format ELF binaries are so weird,
it is to harness the mmap system deep inside the kernel to
do most of the hard work.

Just an idea. Apologies if I'm rehashing something that has
already been discussed previously and discarded.

Cheers,

Rhys.

Nicholas Clark

Oct 25, 2002, 8:19:45 AM

to Rhys Weatherley, perl6-i...@perl.org

On Fri, Oct 25, 2002 at 09:15:07PM +1000, Rhys Weatherley wrote:
> A "trick" that I've found very useful in the past is to design
> the bytecode format so that it can be mmap'ed into a block of
> memory, and then executed almost immediately with the minimum
> number of fixups. Rather than copying the instructions, you
> execute them directly out of the mmap'ed region.

[snip advantages of a read only mmap over other options]

> Just an idea. Apologies if I'm rehashing something that has
> already been discussed previously and discarded.

IIRC speed is parrot's number 2 priority (after correctness), so anything
that makes parrot faster is good. I remember that a while back Dan was
suggesting some change or addition to the bytecode format that would have
meant that the file could not have been mapped read only. Two people (I was
one) commented that this was a bad thing, because being able to mmap read
only was very useful, for the reasons you describe.

I wasn't aware that the bytecode format had changed sufficiently to
preclude mapping the whole file in read only (even if the current reader
doesn't do this), but I admit that I've not been following changes closely.

Nicholas Clark
--
Brainfuck better than perl? http://www.perl.org/advocacy/spoofathon/

Dan Sugalski

Oct 25, 2002, 3:59:42 PM

to perl6-i...@perl.org

At 9:15 PM +1000 10/25/02, Rhys Weatherley wrote:
>I was just having a look at the packfile format code, and I
>have a suggestion on load-time performance of the code segment.
>
>Currently, you read the file, parse out the various sections,
>copy them elsewhere in memory, and byte-swap as necessary.
>The overhead of doing this could be quite significant on large
>applications/modules.
>
>A "trick" that I've found very useful in the past is to design
>the bytecode format so that it can be mmap'ed into a block of
>memory, and then executed almost immediately with the minimum
>number of fixups. Rather than copying the instructions, you
>execute them directly out of the mmap'ed region.

Gah. This was how things were originally done, and what was supposed
to be followed through on--if the segment on disk matches the host
size/endianness, it's supposed to be just mmapped in. The constants
section may need fixup (string and PMC constants, certainly) but the
code itself is supposed to be shared. This is good for both speed
reasons (as you noted) and for process size reasons, since we're
definitely facing the potential for a half-zillion Apache mod_parrot
processes mapping in the same bytecode to execute server-side
programs.

If this has been broken, then we need to fix it, as it's a bug.
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
d...@sidhe.org                         have teddy bears and even
                                      teddy bears get drunk