[RFC] multiple code segments and the interpreter

Leopold Toetsch

Jan 29, 2003, 8:12:01 AM

to P6I

The variable layout of interpreter->code (actually the packfile) doesn't
fit very well for multiple code segments. There is only one ->byte_code
pointer, the byte_code_size is in bytes and gets converted umpteen times
into opcode_t's, and so on.

so:

1) rename interpreter->code to interpreter->pf (the packfile)

The packfile owns all PBC segments: byte_code, const_table and so on.
These segments are all organized in the directory segment. Eval'ing code
generates a new code segment and appends constants to the const_table.
So far so good. But how do we load a precompiled PBC module with its own
const_table (with constants numbered from 0 and spread all over the
code)? So:

2) we need multiple constant_tables too.

As one or more code segments may refer to these constants, the
constant_table must be linked to these code segments.

3) one PBC file has exactly one const_table (or none if no constants are
used) and may have multiple code segments

For switching code segments (with the intersegment jump instruction), we
set up this new pointer:

4) interpreter->code ... points to the currently executing byte_code seg
interpreter->code->const_table ... to the current code seg's constants
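
Points 1) to 4) could be laid out roughly as follows. This is only a sketch: the struct names, the field names other than pf, code, byte_code and const_table, and the switch_segment() helper are assumptions made for illustration, not the actual Parrot definitions.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;               /* stand-in for Parrot's opcode_t */

struct const_table {                 /* hypothetical: one per PBC file */
    size_t  count;
    void  **constants;
};

struct code_segment {                /* hypothetical: one per byte_code segment */
    opcode_t           *byte_code;
    size_t              size;        /* assumed to be counted in opcode_t's */
    struct const_table *const_table; /* constants this segment refers to */
};

struct packfile {                    /* owns every segment of a loaded PBC */
    struct code_segment **segments;
    size_t                n_segments;
};

struct interp {
    struct packfile     *pf;         /* what used to be interpreter->code */
    struct code_segment *code;       /* currently executing segment */
};

/* Switch execution to another segment, as an intersegment jump would.
 * The constants follow automatically via interpreter->code->const_table. */
static void switch_segment(struct interp *interpreter, struct code_segment *seg)
{
    assert(seg != NULL);
    interpreter->code = seg;
}
```

Because each segment carries its own const_table pointer, several segments from one PBC file can share a table while a separately loaded module keeps its constants numbered from 0.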

Comments welcome,
leo

Nicholas Clark

Jan 29, 2003, 4:12:46 PM

to Leopold Toetsch, P6I

On Wed, Jan 29, 2003 at 02:12:01PM +0100, Leopold Toetsch wrote:
> The variable layout of interpreter->code (actually the packfile) doesn't
> fit very well for multiple code segments. There is only one ->byte_code
> pointer, the byte_code_size is in bytes and gets converted umpteen times
> into opcode_t's, and so on.

Would it be better to store the value as a length in opcode_t's, and convert
back to bytes where needed?

>
> so:
>
> 1) rename interpreter->code to interpreter->pf (the packfile)
>
> The packfile owns all PBC segments: byte_code, const_table and so on.
> These segments are all organized in the directory segment. Eval'ing code
> generates a new code segment and appends constants to the const_table.
> So far so good. But how do we load a precompiled PBC module with its own
> const_table (with constants numbered from 0 and spread all over the
> code)? So:
>
> 2) we need multiple constant_tables too.
>
> As one or more code segments may refer to these constants, the
> constant_table must be linked to these code segments.
>
> 3) one PBC file has exactly one const_table (or none if no constants are
> used) and may have multiple code segments

If I understand you correctly, every time an eval happens, more code is
created, and that code's associated constants are appended to the constant
table. As is, this feels like an effective leak - in a long running process
(eg mod_parrot) there is the potential for something to cause repeated
evals, where the code is used only once then discarded. The parrot code
block can be released, GCed and recycled, but the constant table will keep
growing.

Maybe code blocks should keep their own constants, so that they can be
released together. Maybe they should even be in the same bit of allocated
memory, so that locality helps caching and VM performance.
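
To make the "same bit of allocated memory" idea concrete, here is a minimal sketch in which the ops and the constants of one evaled block live in a single allocation, so a single free() releases both. The eval_block type and eval_block_new() are hypothetical names, and constants are simplified to plain longs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef long opcode_t;   /* stand-in for Parrot's opcode_t */

struct eval_block {      /* hypothetical: code plus its private constants */
    size_t    n_ops;
    size_t    n_consts;
    long     *consts;    /* points into this same allocation, after the ops */
    opcode_t  ops[];     /* flexible array member (C99) */
};

/* Copy ops and constants into one contiguous allocation.
 * Returns NULL on allocation failure; release the result with free(). */
static struct eval_block *eval_block_new(const opcode_t *ops, size_t n_ops,
                                         const long *consts, size_t n_consts)
{
    struct eval_block *b;

    assert(ops != NULL || n_ops == 0);
    assert(consts != NULL || n_consts == 0);

    b = malloc(sizeof *b + n_ops * sizeof(opcode_t) + n_consts * sizeof(long));
    if (b == NULL)
        return NULL;

    b->n_ops    = n_ops;
    b->n_consts = n_consts;
    b->consts   = (long *)(b->ops + n_ops);   /* constants follow the code */
    if (n_ops > 0)
        memcpy(b->ops, ops, n_ops * sizeof(opcode_t));
    if (n_consts > 0)
        memcpy(b->consts, consts, n_consts * sizeof(long));
    return b;
}
```

When the block is discarded, its constants go with it, so nothing accumulates in a shared table.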

Nicholas Clark

Leopold Toetsch

Jan 30, 2003, 2:42:34 AM

to Nicholas Clark, P6I

Nicholas Clark wrote:

> On Wed, Jan 29, 2003 at 02:12:01PM +0100, Leopold Toetsch wrote:
>
>>The variable layout of interpreter->code (actually the packfile) doesn't
>>fit very well for multiple code segments. There is only one ->byte_code
>>pointer, the byte_code_size is in bytes and gets converted umpteen times
>>into opcode_t's, and so on.

> Would it be better to store the value as a length in opcode_t's, and convert
> back to bytes where needed?

Yep. The length in ops comes for free when reading the
packfile. The length in bytes is not needed at all (AFAIK).
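
A small sketch of what storing the length in ops buys: the end-of-code pointer needs no conversion, and a byte length is derived only on request. Both helper names are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;   /* stand-in for Parrot's opcode_t */

/* Convert a length in opcode_t's to bytes, for the rare caller that needs it. */
static size_t ops_to_bytes(size_t n_ops)
{
    return n_ops * sizeof(opcode_t);
}

/* One past the last op; pointer arithmetic already counts in opcode_t's,
 * so no byte conversion is involved. */
static const opcode_t *code_end(const opcode_t *start, size_t n_ops)
{
    assert(start != NULL);
    return start + n_ops;
}
```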

>>3) one PBC file has exactly one const_table (or none if no constants are
>>used) and may have multiple code segments

> If I understand you correctly, every time an eval happens, more code is
> created, and that code's associated constants are appended to the constant
> table.

... only if they are not yet there.

> ... As is, this feels like an effective leak - in a long running process
> (eg mod_parrot) there is the potential for something to cause repeated
> evals, where the code is used only once then discarded. The parrot code
> block can be released, GCed and recycled, but the constant table will keep
> growing.

Normally constants are folded. I think there are 2 or 3 kinds of code
to be evaled:
- code is static - needs to be compiled only once
- code changes but constants are the same
- all dynamic

The latter would need a constant_table per eval/code_block, which then
gets recycled. But as it doesn't seem easy to detect whether evaled
code in a loop always produces the same constants, it might be
necessary to always regenerate (and GC) the constant table too.

> Maybe code blocks should keep their own constants, so that they can be
> released together. Maybe they should even be in the same bit of allocated
> memory, so that locality helps caching and VM performance.

Would you like to append the constants to the "normal" code block, to the
prederefed one or to the JITed one ;-)

> Nicholas Clark

leo

Nicholas Clark

Feb 4, 2003, 5:36:32 PM

to Leopold Toetsch, P6I

The summary reminded me that I still had a question.

On Thu, Jan 30, 2003 at 08:42:34AM +0100, Leopold Toetsch wrote:
> Nicholas Clark wrote:

> >If I understand you correctly, every time an eval happens, more code is
> >created, and that code's associated constants are appended to the constant
> >table.
>
>
> ... only if they are not yet there.

How do you know if they are there? Presumably the constant table is just
an array. It may start ordered, but if you can add values to it, it becomes
unordered. How is eval going to determine if a constant is already there?
Create a hash table of values present at the time of the first eval, and
use that from then on?

That feels expensive, as does the only alternative I can think of (linear
search). Whereas simply creating new constant tables each time feels
cheaper, albeit with more transient cost in memory, but less long term
if they can be freed before parrot's exit.
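
For reference, the linear-search alternative looks roughly like this. It is a sketch only: constants are reduced to longs, the table has a fixed capacity, and intern_const() is a made-up name. Each lookup is O(count), which is the cost being weighed here against simply building a fresh table per eval.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONSTS 64    /* arbitrary capacity for the sketch */

struct const_table {     /* hypothetical, simplified constant table */
    long   values[MAX_CONSTS];
    size_t count;
};

/* Return the index of value, appending it if it is not there yet.
 * Returns -1 if the table is full. Scans the whole table on every call. */
static long intern_const(struct const_table *t, long value)
{
    size_t i;

    assert(t != NULL);
    for (i = 0; i < t->count; i++)
        if (t->values[i] == value)
            return (long)i;          /* already there: reuse its number */

    if (t->count == MAX_CONSTS)
        return -1;
    t->values[t->count] = value;
    return (long)t->count++;         /* new constant gets the next number */
}
```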

>
> >... As is, this feels like an effective leak - in a long running process
> >(eg mod_parrot) there is the potential for something to cause repeated
> >evals, where the code is used only once then discarded. The parrot code
> >block can be released, GCed and recycled, but the constant table will keep
> >growing.
>
>
> Normally constants are folded. I think there are 2 or 3 kinds of code
> to be evaled:
> - code is static - needs to be compiled only once
> - code changes but constants are the same
> - all dynamic
>
> The latter would need a constant_table per eval/code_block, which then
> gets recycled. But as it doesn't seem easy to detect whether evaled
> code in a loop always produces the same constants, it might be
> necessary to always regenerate (and GC) the constant table too.

I can't see a good way to tell the second two apart. ("hindsight" isn't
a good way, until someone starts making computers that can travel both
directions in time)

What's the relative cost in calculating constants, versus adding them to
a table? I'd've thought that most of the effort would actually go into
working out what the value is, with only a little more for "have we seen
this before". Hence is all this discussion about constant tables (making
lots with lots of GC, or making few but having to keep track of things)
false optimisation?

> >Maybe code blocks should keep their own constants, so that they can be
> >released together. Maybe they should even be in the same bit of allocated
> >memory, so that locality helps caching and VM performance.
>
>
> Would you like to append the constants to the "normal" code block, to the
> prederefed one or to the JITed one ;-)

Normal. The other two types of block always refer back to a normal block,
don't they?

Nicholas Clark

Leopold Toetsch

Feb 5, 2003, 2:01:15 AM

to Nicholas Clark, P6I

Nicholas Clark wrote:

> The summary reminded me I had a question still
>
> On Thu, Jan 30, 2003 at 08:42:34AM +0100, Leopold Toetsch wrote:

[ constant folding for eval ]

> Create a hash table of values present at the time of the first eval, and
> use that from then on?

There are 2 possibilities:
- we are executing a source file, then the lexer in imcc has a symbol
hash with constants anyway. Constant folding in eval is done automatically.
- we are executing a PBC. Then - as laid out by your analysis - we are
out of luck.

So I think: all *known* (from the lexer's POV) constants get their number
from the main constant table. New constants go into a new constant_table[1]
per eval code block.

Argh: Decoffeinitis

[1] That would still mean appending to the main constant table, remembering
the const count of main, and cleaning up only the new constants.
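
The bookkeeping in [1] could be sketched like this: note the count before the eval, append, then cut the table back to the noted count afterwards. The function names and the simplified long-valued table are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONSTS 64    /* arbitrary capacity for the sketch */

struct const_table {     /* hypothetical, simplified main constant table */
    long   values[MAX_CONSTS];
    size_t count;
};

/* Remember the const count of main before an eval starts. */
static size_t const_mark(const struct const_table *t)
{
    assert(t != NULL);
    return t->count;
}

/* Append a constant; returns its number, or -1 if the table is full. */
static long const_append(struct const_table *t, long value)
{
    assert(t != NULL);
    if (t->count == MAX_CONSTS)
        return -1;
    t->values[t->count] = value;
    return (long)t->count++;
}

/* Clean up only the constants added since the mark was taken. */
static void const_release(struct const_table *t, size_t mark)
{
    assert(t != NULL && mark <= t->count);
    t->count = mark;
}
```

This only works if evals finish in stack order; a constant appended by one eval cannot outlive a release by an earlier one.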

Sounds rather complicated. So probably to start with: the eval block has
its own constant_table. Optimization for later: when eval() runs in a
loop, the usage count of constants is kept and when they are the same,
the constants are added to the main constant_table and kept.

> Nicholas Clark

leo