Draft sketch of bytecode generation

Chromatic

Oct 27, 2002, 4:08:06 PM

to perl6-i...@perl.org

On Sun, 27 Oct 2002 08:54:08 -0800, Dan Sugalski wrote:

These two seem highly similar:

> =item Add source code to segment
>
> This adds a line or more of source code to the bytecode segment.

> =item Add line number information
>
> This adds line number info to the bytecode segment, allowing the interpreter
> to find out what line of source a particular piece of bytecode corresponds to.

They might be related to this one, though I'm not completely sure:

> =item Add binary data chunk to segment
>
> Add in some raw binary data to the bytecode segment

Is there an underlying function used to add arbitrary (Unicode text) metadata
to the bytecode?

-- c

Juergen Boemmels

Oct 28, 2002, 7:21:37 AM

to Brent Dax, Dan Sugalski, perl6-i...@perl.org

"Brent Dax" <bren...@cpan.org> writes:

[...]
> Optional?
>
> # =item Add bytecode to segment
> #
> # Add in actual bytecode to the segment.
>
> Not optional. :^)

Really?
There may be reasons to have no bytecode, though such a segment could not be
run. This raises the question: if there are multiple segments, which
bytecode is run? I would say the bytecode in the first segment from
the beginning. Then the bytecode is non-optional only in the first
segment.

[...]

> # =item Add binary data chunk to segment
> #
> # Add in some raw binary data to the bytecode segment
>
> Optional?
>
> Can you have multiple raw chunks? Can they have symbolic names?
> Symbolic type names?

I second that.

> Are the other chunks essentially subclasses of this one?

I have implemented a first draft of this (perl #18056). The difference
is that I called the directory items PackFile_Segment.

> Is there some kind of directory that tells you stuff about these chunks?
> If there is, I can imagine two basic formats:
>
> SEGMENT 1
>   DIRECTORY
>     CHUNK 1
>       OFFSET: 100 (4 bytes)
>       SIZE: 55 (4 bytes)
>       NAME: "src" (7 bytes)
>       TYPE: "Parrot::Source Code" (23 bytes)
>     CHUNK 2
>       OFFSET: 155
>       SIZE: 524
>       NAME: "bc"
>       TYPE: "Parrot::Bytecode"
>     ...
>   CHUNK 1
>     use strict;
>     use warnings;
>     print "Hello World!\n";
>     exit;
>   CHUNK 2
>     101011101...
>
> Or
>
> SEGMENT 1
>   DIRECTORY
>     CHUNK 1 OFFSET: 100
>     CHUNK 2 OFFSET: 189
>   CHUNK 1:
>     SIZE: 55
>     NAME: "src"
>     TYPE: "Parrot::Source Code"
>     DATA: ...
>   CHUNK 2:
>     SIZE: 524
>     NAME: "bc"
>     TYPE: "Parrot::Bytecode"
>     DATA: ...

I like the first one more: The data necessary to unpack a chunk is
localized in the first chunk (the DIRECTORY). Only this part of the
file needs to be read first. In the second case the metadata is
scattered over the whole file.
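To make the first layout concrete: a reader loads the directory once and can then go straight to any chunk by name without touching the others. A minimal C sketch; the entry fields mirror Brent's example (offset, size, name, type), but `DirEntry`, `dir_find` and `chunk_data` are hypothetical names, and the on-disk encoding of the directory is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical directory entry, as in the first layout: everything
   needed to locate a chunk lives in the directory, not next to it. */
typedef struct {
    uint32_t    offset;  /* byte offset of the chunk data in the file */
    uint32_t    size;    /* length of the chunk data in bytes */
    const char *name;    /* e.g. "src", "bc" */
    const char *type;    /* e.g. "Parrot::Source Code" */
} DirEntry;

/* Look a chunk up by name using only the directory.
   Returns NULL if no chunk has that name. */
const DirEntry *dir_find(const DirEntry *dir, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(dir[i].name, name) == 0)
            return &dir[i];
    return NULL;
}

/* Return a pointer to a chunk's bytes inside the loaded file image.
   Assumes the entry was already checked against the file size. */
const unsigned char *chunk_data(const unsigned char *file, const DirEntry *e)
{
    return file + e->offset;
}
```

Only the directory has to be parsed before `dir_find` can answer; the chunk bodies are read lazily.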

I will start modifying my patch to the new format.

bye
b.
--
Juergen Boemmels boem...@physik.uni-kl.de
Fachbereich Physik Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F 23 F6 C7 2F 85 93 DD 47

Dan Sugalski

Oct 28, 2002, 4:02:39 PM

to chromatic, perl6-i...@perl.org

At 1:08 PM -0800 10/27/02, chromatic wrote:
>On Sun, 27 Oct 2002 08:54:08 -0800, Dan Sugalski wrote:
>
>These two seem highly similar:
>
>> =item Add source code to segment
>>
>> This adds a line or more of source code to the bytecode segment.
>
>> =item Add line number information
>>
>> This adds line number info to the bytecode segment, allowing the interpreter
>> to find out what line of source a particular piece of bytecode
>>corresponds to.
>
>They might be related to this one, though I'm not completely sure:
>
> > =item Add binary data chunk to segment

Nope, not really related. The engine may know about the source or
line number information: certainly about the line-number chunk if it
wants to emit meaningful error messages, and about the source if it
needs to reparse it (on the off chance that something changes
significantly enough to warrant recompiling a segment of code).
Debuggers might also make some use of that, though I don't suppose
they have to.

>Is there an underlying function used to add arbitrary (Unicode text) metadata
>to the bytecode?

Arbitrary metadata? Nope, no plans for that. While I can see it as a
useful thing (though it wouldn't be unicode, at least unicode
wouldn't be required) I'm not sure it's worth the time to define,
implement, and maintain.

On the other hand, if someone has a good proposal, clean API, and
generally feels strongly about it I certainly won't have any
objections.
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Dan Sugalski

Oct 28, 2002, 4:07:40 PM

to Brent Dax, perl6-i...@perl.org

At 11:40 PM -0800 10/27/02, Brent Dax wrote:
>Dan Sugalski:
># =item Add source code to segment
>#
># This adds a line or more of source code to the bytecode segment.
>
>Optional?

Yes.

># =item Add AST to segment
>#
># This adds the AST for some of the source code to the bytecode segment.
>
>Optional?

Yes.

># =item Add line number information
>#
># This adds line number info to the bytecode segment, allowing
># the interpreter to find out what line of source a particular
># piece of bytecode corresponds to.
>
>Optional?

Yes.
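The line-number chunk only has to answer one question: which source line produced the op at a given bytecode offset? One plausible encoding, in line with Brent's guess that it would be keyed on bytecode offset, is a table of (offset, line) pairs sorted by offset and searched with a binary search. A sketch under that assumption; `LineEntry` and `line_for_offset` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of a hypothetical line-number table: the op at byte offset
   `offset`, and every op after it up to the next row, came from
   source line `line`. Rows are sorted by ascending offset. */
typedef struct {
    uint32_t offset;
    uint32_t line;
} LineEntry;

/* Return the source line for bytecode offset `pc`, or 0 if `pc`
   precedes the first entry (or the table is empty). */
uint32_t line_for_offset(const LineEntry *tab, size_t n, uint32_t pc)
{
    size_t lo = 0, hi = n;          /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].offset <= pc)
            lo = mid + 1;           /* answer is at mid or later */
        else
            hi = mid;
    }
    /* lo is now the index of the first entry with offset > pc */
    return lo == 0 ? 0 : tab[lo - 1].line;
}
```

Storing one row per source line rather than one per op keeps the table small, at the cost of an O(log n) lookup when an error message is produced.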

># =item Add bytecode to segment
>#
># Add in actual bytecode to the segment.
>
>Not optional. :^)

No. :)

There's no reason this can't be optional, though at that point you
don't have an *executable* bytecode segment. That doesn't mean it
can't otherwise be useful.

># =item Add constants (string, PMC, and float)
>#
># Add one or more constants to the bytecode constant pool
>#
># =item Add symbols to segment
>#
># Add in symbolic information to the bytecode segment,
># including exported variables, classes, and functions
>
>Optional?

Yes.

># =item Add binary data chunk to segment
>#
># Add in some raw binary data to the bytecode segment
>
>Optional?

Yes.

>Can you have multiple raw chunks? Can they have symbolic names?
>Symbolic type names?

Yes, maybe, and maybe, respectively. I hadn't planned on anything
past "raw binary data segment #n, subtype Y" for them.

>Are the other chunks essentially subclasses of this one?

They could be considered as such, yeah, but they probably won't be in practice.

Chunks, I think, need a type and subtype attached to them. The type
for binary data will be "raw hunk 'o binary data" while the subtype
can be something else, which the program using the bytecode would
presumably have some knowledge of.
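A sketch of what "a type and subtype" might look like in a chunk header, and how a program that knows its own subtype would pick out its raw data chunks. The enum values, `ChunkHeader` and `find_raw_chunks` are hypothetical; the thread only fixes the idea that raw binary chunks share one type and are told apart by a subtype ("raw binary data segment #n, subtype Y").

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk types; the engine interprets all but CHUNK_RAW. */
typedef enum {
    CHUNK_BYTECODE = 1,
    CHUNK_CONSTANTS,
    CHUNK_SOURCE,
    CHUNK_LINE_INFO,
    CHUNK_RAW            /* "raw hunk 'o binary data" */
} ChunkType;

typedef struct {
    ChunkType type;
    uint32_t  subtype;   /* meaningful only to the program using the data */
    uint32_t  size;
} ChunkHeader;

/* Count raw chunks whose subtype matches `want`, storing up to `max`
   of their indices in `out`. The engine never looks inside raw data;
   it only hands the matching chunks to whoever asked. */
size_t find_raw_chunks(const ChunkHeader *hdrs, size_t n, uint32_t want,
                       size_t *out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (hdrs[i].type == CHUNK_RAW && hdrs[i].subtype == want) {
            if (found < max)
                out[found] = i;
            found++;
        }
    }
    return found;
}
```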

>Is there some kind of directory that tells you stuff about these chunks?

Yes, we're going to have to have one.

Hrm. Probably the former, though there is definitely an advantage to
keeping the metadata for a chunk with the data itself. Makes
extraction utilities a lot easier to work with, and makes the file a
bit less prone to damage, though arguably once it's damaged you're
pretty hosed. (OTOH, having had files creamed before, I do understand
the advantage to being able to extract *something* meaningful out of
the file, even if you can't get everything)
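Dan's point about damaged files is the case for keeping metadata next to each chunk: a recovery tool can then salvage chunks even when the directory is lost. One common way to do that (an assumption here, not something the thread specifies) is to start every chunk with a fixed magic marker and a length, then scan for them. A minimal sketch; the `"PCHK"` marker, the header layout and `salvage_chunks` are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_MAGIC "PCHK"    /* hypothetical 4-byte marker */

/* Scan a possibly damaged file image for self-describing chunks.
   Each chunk is assumed to be laid out as:
     4 bytes  magic "PCHK"
     4 bytes  payload size, little-endian
     N bytes  payload
   Stores up to `max` payload offsets in `offs` and returns how many
   chunks whose declared size fits inside the image were found. */
size_t salvage_chunks(const unsigned char *img, size_t len,
                      size_t *offs, size_t max)
{
    size_t count = 0, i = 0;
    while (i + 8 <= len) {
        if (memcmp(img + i, CHUNK_MAGIC, 4) != 0) {
            i++;                      /* not a header: keep scanning */
            continue;
        }
        uint32_t size = (uint32_t)img[i + 4]
                      | (uint32_t)img[i + 5] << 8
                      | (uint32_t)img[i + 6] << 16
                      | (uint32_t)img[i + 7] << 24;
        if (size > len - (i + 8)) {   /* truncated or corrupt: skip marker */
            i++;
            continue;
        }
        if (count < max)
            offs[count] = i + 8;
        count++;
        i += 8 + (size_t)size;        /* jump past this chunk's payload */
    }
    return count;
}
```

A real format would probably also want a per-chunk checksum, since after a corrupt header a marker-like byte sequence inside payload data could be mistaken for a chunk.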

Dan Sugalski

Oct 28, 2002, 4:08:33 PM

to Juergen Boemmels, Brent Dax, perl6-i...@perl.org

At 1:21 PM +0100 10/28/02, Juergen Boemmels wrote:
>I like the first one more: The data necessary to unpack a chunk is
>localized in the first chunk (the DIRECTORY). Only this part of the
>file needs to be read first. In the second case the metadata is
>scattered over the whole file.

That's fine.

>I will start modifying my patch to the new format.

Cool, thanks.

Chromatic

Oct 28, 2002, 11:40:00 AM

to Brent Dax, perl6-i...@perl.org

On Sunday 27 October 2002 23:27, Brent Dax wrote:

> Appearances are deceiving--the first adds some (unparsed?) source code,
> the second adds information on file and line numbers, probably based on
> offset into the bytecode.

Similar in terms of implementation, that is. :) In Perl terms, I'd expect to
see:

sub add_source_code_to_segment
{
    ...
    .add_metadata_into_segment();
}

sub add_line_info_to_segment
{
    ...
    .add_metadata_into_segment();
}

Maybe I'm jumping too quickly into implementation details.
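chromatic's suggestion, transposed from Perl to C: both public operations become thin wrappers over one generic routine that stores a tagged record in the segment. Every name below (`add_metadata`, `Record`, `Segment`, the `REC_*` tags) is hypothetical, and fixed-size arrays keep the sketch short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { REC_SOURCE = 1, REC_LINE_INFO = 2 };   /* hypothetical tags */

#define MAX_RECORDS 16
#define MAX_PAYLOAD 64

typedef struct {
    int           tag;
    size_t        len;
    unsigned char data[MAX_PAYLOAD];
} Record;

typedef struct {
    size_t nrecords;
    Record records[MAX_RECORDS];
} Segment;

/* The shared routine: copy `len` bytes into a new record tagged `tag`.
   Returns 0 on success, -1 if the segment or the payload is full. */
int add_metadata(Segment *seg, int tag, const void *data, size_t len)
{
    if (seg->nrecords == MAX_RECORDS || len > MAX_PAYLOAD)
        return -1;
    Record *r = &seg->records[seg->nrecords++];
    r->tag = tag;
    r->len = len;
    memcpy(r->data, data, len);
    return 0;
}

/* Source text is stored as raw bytes, without a terminating NUL. */
int add_source_code(Segment *seg, const char *src)
{
    return add_metadata(seg, REC_SOURCE, src, strlen(src));
}

/* A line-info record pairs a bytecode offset with a source line. */
int add_line_info(Segment *seg, uint32_t offset, uint32_t line)
{
    uint32_t pair[2] = { offset, line };
    return add_metadata(seg, REC_LINE_INFO, pair, sizeof pair);
}
```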

-- c

Dan Sugalski

Oct 29, 2002, 10:36:32 AM

to chromatic, Brent Dax, perl6-i...@perl.org

At 8:40 AM -0800 10/28/02, chromatic wrote:
>On Sunday 27 October 2002 23:27, Brent Dax wrote:
>
>> Appearances are deceiving--the first adds some (unparsed?) source code,
>> the second adds information on file and line numbers, probably based on
>> offset into the bytecode.
>
>Similar in terms of implementation, that is. :)

Underlying implementation, sure--we'll just be jamming bytes into the
appropriate chunks of the segment, potentially in multiple calls. I'm
not sure I want to worry too much about that quite yet, though, as
I'm still trying to nail down the required functionality. :)
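Jamming bytes into a chunk, possibly over several calls, amounts to an append-only buffer per chunk. A minimal C sketch under that reading; `Chunk` and `chunk_append` are hypothetical names, and doubling the allocation is just one common growth strategy.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chunk payload as a growable byte buffer.
   Start from { NULL, 0, 0 }; release with free(c->bytes). */
typedef struct {
    unsigned char *bytes;
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated */
} Chunk;

/* Append `n` bytes, doubling the allocation as needed so that many
   small appends stay cheap. Returns 0 on success, -1 on overflow or
   allocation failure (in which case the chunk is left unchanged). */
int chunk_append(Chunk *c, const void *data, size_t n)
{
    if (n == 0)
        return 0;
    if (n > SIZE_MAX - c->len)
        return -1;                          /* length would overflow */
    size_t need = c->len + n;
    if (need > c->cap) {
        size_t cap = c->cap ? c->cap : 64;
        while (cap < need)
            cap = (cap > SIZE_MAX / 2) ? need : cap * 2;
        unsigned char *p = realloc(c->bytes, cap);
        if (p == NULL)
            return -1;                      /* old buffer is still valid */
        c->bytes = p;
        c->cap = cap;
    }
    memcpy(c->bytes + c->len, data, n);
    c->len += n;
    return 0;
}
```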

Chromatic

Oct 29, 2002, 12:35:09 PM

to Dan Sugalski, perl6-i...@perl.org

On Monday 28 October 2002 13:02, Dan Sugalski wrote:

> At 1:08 PM -0800 10/27/02, chromatic wrote:

> >Is there an underlying function used to add arbitrary (Unicode text)
> > metadata to the bytecode?

> Arbitrary metadata? Nope, no plans for that. While I can see it as a
> useful thing (though it wouldn't be unicode, at least unicode
> wouldn't be required) I'm not sure it's worth the time to define,
> implement, and maintain.

> On the other hand, if someone has a good proposal, clean API, and
> generally feels strongly about it I certainly won't have any
> objections.

I'd really like to be able to save comments from source files as metadata.
This has at least two potential benefits. First, it makes it much easier to
recreate the whole file from bytecode (especially refactored bytecode).
Second, it makes it possible to pull out method documentation in the
Smalltalk or Python sense.

Maybe metadata's not the place for this, but it seems rather natural to me.

-- c

Kv Org

Oct 30, 2002, 5:17:14 PM

to perl6-i...@perl.org

On Tue, 29 Oct 2002 09:55:23 -0800, Chromatic wrote:
>
>I'd really like to be able to save comments from
>source files as metadata. This has at least two
>potential benefits. First, it makes it much easier
>to recreate the whole file from bytecode (especially
>refactored bytecode).
>Second, it makes it possible to pull out method
>documentation in the Smalltalk or Python sense.
>
>Maybe metadata's not the place for this, but it
>seems rather natural to me.

I always thought metadata in bytecode was the place
for storing security/permission/capability related
information about the compiled chunk. If we want Perl6
and Parrot to handle security and limited code
sandboxes better than Perl5's Safe.pm, this is a basic
requirement. I suggest LPC (the object-oriented,
C-like scripting language of LP Muds) as an example of
a scripting language with good security that can
handle multiple users inside a compilation engine
gracefully. (And at the same time, implementing LPC
with Parrot would be a good exercise for a more
competent reader than me.)

This would add a secondary layer to the security
implemented by Safe/Opcode-like opcode filters. The
data in the segment would become part of the called
function's context, so that the function could check
whether the caller has sufficient capabilities to
actually call it, and fail otherwise. Loading of
modules from within limited code sandboxes could be
handled with the same kind of security checks.

Dan Sugalski

Nov 6, 2002, 10:09:50 AM

to perl6-i...@perl.org

At 10:17 PM +0000 10/30/02, Kv Org wrote:
>On Tue, 29 Oct 2002 09:55:23 -0800, Chromatic wrote:
>>
>>I'd really like to be able to save comments from
>>source files as metadata. This has at least two
>>potential benefits. First, it >makes it much easier
>>to recreate the whole file from bytecode (especially
>>refactored bytecode).
>>Second, it makes it possible to pull out method
>>documentation in the Smalltalk or Python sense.
>>
>> Maybe metadata's not the place for this, but it
>>seems rather natural to me.
>
>I always thought metadata in bytecode was the place
>for storing security/permission/capability related
>information about the compiled chunk. If we want Perl6
>and Parrot to handle security and limited code
>sandboxes better than Perl5's Safe.pm, this is a basic
>requirement.

Unfortunately not. (Though I really, *really* wish this was the case)
The bytecode data, all of it, must be considered completely
untrustworthy unless explicitly (and out-of-bandly) marked otherwise.
The code segment that invokes a stronger security context can be
considered out of band in this context, as it is for the code running
in the secure

The interpreter engine is responsible for enforcing security. It
*must*, when running with security turned on, assume that all
bytecode has been written by malicious vermin with too much time on
their hands and the morals (and ethics) of a rabid weasel. It just
can't be trusted, unfortunately. (Parrot bytecode is inherently
unverifiable as well, at least in the general case, which exacerbates
the problem)
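Treating all bytecode data as untrustworthy applies to the loader as well as to running code: every offset and size read from a chunk directory must be checked before it is used. A minimal sketch; `RawEntry` and `entry_in_bounds` are hypothetical, and the point is the overflow-safe comparison, since a naive `offset + size <= file_len` can wrap around and accept a bogus entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical offset/size pair as read from an untrusted directory. */
typedef struct {
    uint64_t offset;
    uint64_t size;
} RawEntry;

/* Returns 1 if the entry describes a range lying entirely inside a
   file of `file_len` bytes, 0 otherwise. No addition is performed, so
   nothing can overflow: size is compared against the space remaining
   after offset. */
int entry_in_bounds(const RawEntry *e, uint64_t file_len)
{
    if (e->offset > file_len)
        return 0;
    return e->size <= file_len - e->offset;
}
```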

Gopal V

Nov 6, 2002, 1:21:11 PM

to perl6-i...@perl.org

If memory serves me right, Dan Sugalski wrote:
> (Parrot bytecode is inherently unverifiable as well, at least in
> the general case, which exacerbates the problem)

Hmm... Why? Loose typing?

Or does it just become an undecidability problem?

Gopal
--
The difference between insanity and genius is measured by success

Dan Sugalski

Nov 6, 2002, 3:44:07 PM

to Gopal V, perl6-i...@perl.org

At 11:51 PM +0530 11/6/02, Gopal V wrote:
>If memory serves me right, Dan Sugalski wrote:
>> (Parrot bytecode is inherently unverifiable as well, at least in
>> the general case, which exacerbates the problem)
>
>Hmm... Why? Loose typing?
>
>Or does it just become an undecidability problem?

Loose typing is one reason, as is the potential for
self-modification, by direct code rewriting, sub definition changes,
and runtime code production. It's one of those halting problem
issues--we can't tell whether the code is safe without running it.

That's why more effort's been put into thinking about runtime
security issues, including resource limits and privilege assignment.
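Because the bytecode can't be proven safe ahead of time, limits have to be enforced as it runs. A toy illustration of one such resource limit: an op budget checked on every dispatch, which stops even a program that never halts, without anyone having to decide in advance whether it would. The three-op instruction set and `run` are invented for this example and have nothing to do with Parrot's real ops.

```c
#include <stddef.h>

/* A toy instruction set, invented for this example.
   OP_JMP takes one operand: the target index. */
enum { OP_END = 0, OP_INC = 1, OP_JMP = 2 };

typedef enum { RUN_OK, RUN_BUDGET_EXCEEDED, RUN_BAD_CODE } RunStatus;

/* Run `code` executing at most `budget` ops. The limit is checked at
   runtime on every dispatch, so an infinite loop is cut off without
   any static analysis. Jump targets are also checked at runtime,
   because the bytecode itself is untrusted. */
RunStatus run(const int *code, size_t len, unsigned long budget, long *acc)
{
    size_t pc = 0;
    while (pc < len) {
        if (budget-- == 0)
            return RUN_BUDGET_EXCEEDED;
        switch (code[pc]) {
        case OP_END:
            return RUN_OK;
        case OP_INC:
            (*acc)++;
            pc++;
            break;
        case OP_JMP:
            if (pc + 1 >= len || code[pc + 1] < 0 || (size_t)code[pc + 1] >= len)
                return RUN_BAD_CODE;
            pc = (size_t)code[pc + 1];
            break;
        default:
            return RUN_BAD_CODE;
        }
    }
    return RUN_BAD_CODE;   /* ran off the end without OP_END */
}
```

For example, `{ OP_INC, OP_JMP, 0 }` loops forever, but with a budget of 1000 it stops after 500 increments and 500 jumps.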