Bytecode metadata

Dan Sugalski

Jan 22, 2003, 1:27:47 PM
to perl6-i...@perl.org
Since it looks like it's time to extend the packfile format and the
in-memory bytecode layout, this would be the time to start discussing
metadata. What sorts of metadata do people think are useful to have
in either the packfile (on disk) or in the bytecode (in memory).

Keep in mind that parrot may be in the position where it has to
ignore or mistrust the metadata, so be really cautious with things
you propose as required.
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

James Michael Dupont

Jan 23, 2003, 1:29:39 AM
to Dan Sugalski, perl6-i...@perl.org, Dave Beckett, introspectors

--- Dan Sugalski <d...@sidhe.org> wrote:
> Since it looks like it's time to extend the packfile format and the
> in-memory bytecode layout, this would be the time to start discussing
>
> metadata. What sorts of metadata do people think are useful to have
> in either the packfile (on disk) or in the bytecode (in memory).
>
> Keep in mind that parrot may be in the position where it has to
> ignore or mistrust the metadata, so be really cautious with things
> you propose as required.

Dear Dan,

I would like to see a powerful meta-data system made possible,
even if it is not implemented immediately. Semantic web researchers
like David Beckett and Tim Berners-Lee have been working on powerful
systems to support meta-data in general. As the parrot meta-data
is just getting started, maybe we can borrow a bit of that?

Take a look at the Diffuse MetaData Interchange list [4] at the
bottom of this mail; it gives an overview of metadata systems.
Even if they are not specific to parrot, the goals are similar in many
cases.

Recently I have been making progress with RDF[1], specifically with
the Redland application framework[2]. With the simple concept of
triples of data, a triple being (subject, predicate, object), we are
able to capture the metadata of the gcc compiler, and I hope other
compilers and systems.

Redland is written in clean C, and supports meta-data storage in
memory and on disk in multiple formats: RDF/XML, RDF/N-Triples (even
in Berkeley DB). It would be possible to create a new storage model to
store it in a packfile as well.

The subjects are the items in the program, the nodes, each getting a
number inside the system. Predicates are important; they represent the
meat of the system. The objects are either literal data or other
subjects.

Via the redland api, you can add in new statements about things, and
find all the statements about a subject, about an object, all that meet
a predicate.
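
A minimal sketch of the triple idea in C may make this concrete. It is not the Redland API: `TripleStore`, `store_add` and `store_find` are hypothetical names invented for illustration, and the fixed-size array stands in for whatever storage a real system would use. A NULL in a query position means "match anything", which covers the three lookups described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory triple store; this is NOT the Redland API. */
enum { MAX_TRIPLES = 64 };

typedef struct {
    const char *subject;    /* node id, e.g. "node:1" */
    const char *predicate;  /* e.g. "filename", "line", "type" */
    const char *object;     /* literal data or another node id */
} Triple;

typedef struct {
    Triple items[MAX_TRIPLES];
    size_t count;
} TripleStore;

/* Add one statement; returns 0 on success, -1 if the store is full. */
static int store_add(TripleStore *s, const char *subj,
                     const char *pred, const char *obj)
{
    assert(s && subj && pred && obj);
    if (s->count == MAX_TRIPLES)
        return -1;
    s->items[s->count++] = (Triple){ subj, pred, obj };
    return 0;
}

/* A NULL pattern matches any value. */
static int field_matches(const char *pattern, const char *value)
{
    return pattern == NULL || strcmp(pattern, value) == 0;
}

/* Copy up to max matching triples into out; returns how many were copied. */
static size_t store_find(const TripleStore *s, const char *subj,
                         const char *pred, const char *obj,
                         Triple *out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < s->count && n < max; i++) {
        const Triple *t = &s->items[i];
        if (field_matches(subj, t->subject) &&
            field_matches(pred, t->predicate) &&
            field_matches(obj, t->object))
            out[n++] = *t;
    }
    return n;
}
```

With this, `store_find(&s, "node:1", NULL, NULL, out, 8)` returns every statement about one node, and `store_find(&s, NULL, "type", NULL, out, 8)` returns every statement using one predicate.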

I tell you this because maybe you want to provide this sort of
flexible meta-data API in parrot. For example, these are the
predicates we extract that you might find interesting:

* Filename of the node

* Line number of the node (the Column Number is not supported yet)

* Internal Type of the node (variable declaration, type, integer
const, etc), as opposed to the type of the

* Name of the node (the identifier)

* Type of the node (if it is a variable, or constant) this is a
pointer to another node

* Unsigned Type of a type, if a type supports itself being unsigned,
here it is.

* Comments are supported, but not used yet, but would be a good idea.


Now we get into more specific types of predicates

* Parameters of an expression
* Variables in a block
* Size of a variable
* Alignment of a variable
* Constant flag
* Volatile flag

then we have
* Fields of a struct
* Parameters of a function
* Return type of a function
* Body block of a function

So, with this idea of meta-data, by adding more predicates,
you can support the capturing and storage of all the source code in an
abstract form, or just the basic function data.

You will probably think that this is overkill for parrot, but I think
that it would give you an extensible system to add in new forms of
meta-data as languages are added. Via OWL[3] the users will be able to
define the meaning and the classes of metadata as well.

mike

[1] RDF http://www.w3.org/RDF/
[2] Redland http://www.redland.opensource.ac.uk/
[3] OWL http://www.w3.org/TR/owl-absyn/
[4] Diffuse MetaData Interchange standards
http://www.diffuse.org/meta.html

=====
James Michael DuPont
http://introspector.sourceforge.net/


Brent Dax

Jan 23, 2003, 3:10:29 AM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski:
# Since it looks like it's time to extend the packfile format and the
# in-memory bytecode layout, this would be the time to start discussing
# metadata. What sorts of metadata do people think are useful to have
# in either the packfile (on disk) or in the bytecode (in memory).

I do think that, whatever "native" (i.e. understood by Parrot) metadata
we support, we *must* allow for extensibility, both for future native
metadata and for third-party tools. Moreover, this must not be
implemented with a special type of metadata block, or by using
sequentially-increasing numbers. (The first means that any metadata we
decide to add in the future will be slower than the metadata we add now;
the second has problems with several third-party tools picking the same
number.)

--Brent Dax <bren...@cpan.org>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

>How do you "test" this 'God' to "prove" it is who it says it is?
"If you're God, you know exactly what it would take to convince me. Do
that."
--Marc Fleury on alt.atheism


Chromatic

Jan 23, 2003, 2:48:34 PM
to perl6-i...@perl.org
On Wed, 22 Jan 2003 13:27:47 +0000, Dan Sugalski wrote:

> Since it looks like it's time to extend the packfile format and the
> in-memory bytecode layout, this would be the time to start discussing
> metadata. What sorts of metadata do people think are useful to have
> in either the packfile (on disk) or in the bytecode (in memory).

Comments, if a disassembler is to be able to reconstruct the original source
sufficiently well[1].

-- c

1) for the various values of "well" that include "semantic equivalence"

Juergen Boemmels

Jan 23, 2003, 3:21:45 PM
to Dan Sugalski, perl6-i...@perl.org
Hello,

after quite a long time away from keyboard and fighting through a huge
backlog of mail I'm (hopefully) back again.

Dan Sugalski <d...@sidhe.org> writes:

> Since it looks like it's time to extend the packfile format and the
> in-memory bytecode layout, this would be the time to start discussing
> metadata. What sorts of metadata do people think are useful to have in
> either the packfile (on disk) or in the bytecode (in memory).

My current idea for the in memory format of the bytecode is this:
One bytecode segment is a PMC consisting of three parts: the actual
bytecode (a flat array of opcode_t), the associated constants, which
don't fit into an opcode_t (floats and strings), and a scratch area
for the JITed code. All other Metadata will be attached as
properties (or maybe as elements of an aggregate). This will be an
easy way for future extension. The invoke call to this pmc would
simply start the bytecode from the first instruction.

To support inter-segment jumps a kind of symboltable is also
necessary. All externally reachable codepoints need some special
markup. This could be a special opcode extlabel_sc or an entry in a
symboltable. Also needed is a fixup of the outgoing calls, either via
modification of the bytecode or via a jumptable. Both have their pros
and cons: the bytecode modification prohibits a read-only mmap of the
data on disk, and the fixup needs to be done at load time, but once this
is done the impact on the runtime speed is minimal, whereas the
jumptable is one extra indirection. But as stated somewhere else, the
typical inter-segment jump will be call/tailcall/callmethod/invoke,
which are at least two indirections.
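
The two strategies can be sketched side by side in C. This is only an illustration under assumptions: `opcode_t` is a stand-in typedef, and `Fixup`, `apply_fixups` and `jump_target` are hypothetical names, not Parrot functions.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;   /* stand-in for Parrot's opcode_t */

/* One outgoing reference: which bytecode slot holds a jump target. */
typedef struct {
    size_t code_offset;  /* index into the bytecode array */
    size_t symbol;       /* index into the table of resolved addresses */
} Fixup;

/* Strategy 1: patch resolved targets into the bytecode at load time.
 * The code must be writable, so a read-only mmap is ruled out, but
 * afterwards a jump reads its target directly. */
static void apply_fixups(opcode_t *code, size_t code_len,
                         const Fixup *fixups, size_t n,
                         const opcode_t *resolved)
{
    for (size_t i = 0; i < n; i++) {
        assert(fixups[i].code_offset < code_len);
        code[fixups[i].code_offset] = resolved[fixups[i].symbol];
    }
}

/* Strategy 2: the bytecode keeps a symbol index and every jump goes
 * through the table. The code stays read-only at the price of one
 * extra indirection per jump. */
static opcode_t jump_target(const opcode_t *code, size_t operand_index,
                            const opcode_t *jumptable)
{
    return jumptable[code[operand_index]];
}
```

Both paths end up at the same target; the difference is whether the cost is paid once at load time (and in lost page sharing) or on every jump.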

The on disk version is a matter of serializing and deserializing this
PMC.

> Keep in mind that parrot may be in the position where it has to ignore
> or mistrust the metadata, so be really cautious with things you
> propose as required.

Ok to summarize:

ByteCodeSegment = {
    bytecode  => required;
    constants => only necessary if string or num constants;
    fixup     => (or jumptable) only necessary if outgoing jumps;
    symbols   => all possible incoming branchpoints, optional;
    JIT       => will be filled when bytecode is invoked;

    source    => surely optional;
    debuginfo => also optional;
    ...
}

bye
boe.
--
Juergen Boemmels boem...@physik.uni-kl.de
Fachbereich Physik Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F 23 F6 C7 2F 85 93 DD 47

Dave Mitchell

Jan 23, 2003, 3:39:21 PM
to Juergen Boemmels, Dan Sugalski, perl6-i...@perl.org
On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote:
> My current idea for the in memory format of the bytecode is this:

I would strongly urge any file-based byte-code format to be arranged
in such a way that it (or most of it) can simply be mmap-ed in (RO),
analogously to executables.

This means that a Perl server that relies on a lot of modules, and which
forks for each connection (imagine a Perl-based web server), doesn't
consume acres of swap space just to have an in-memory image per Perl
process, of all the modules.

This is a real problem that's hitting me hard with Perl 5 in my day job.

Dave.

--
Any [programming] language that doesn't occasionally surprise the
novice will pay for it by continually surprising the expert.
- Larry Wall

James Michael Dupont

Jan 23, 2003, 4:04:57 PM
to chromatic, perl6-i...@perl.org

Yes!
Deparsing. that would be great.

mike

James Michael Dupont

Jan 23, 2003, 4:21:11 PM
to Dave Mitchell, Juergen Boemmels, Dan Sugalski, perl6-i...@perl.org

--- Dave Mitchell <da...@fdgroup.com> wrote:
> On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote:
> > My current idea for the in memory format of the bytecode is this:
>
> I would strongly urge any file-based byte-code format to arranged
> in such a way that it (or most of it) can simply be mmap-ed in (RO),
> analogously to executables.
>
> This means that a Perl server that relies on a lot of modules, and
> which
> forks for each connection (imagine a Perl-based web server), doesn't
> consume acres of swap space just to have an in-memory image per Perl
> process, of all the modules.

Sounds good.

Could that be seen as similar to shared-memory communication with the
compiler, via memory-mapped file interfaces?

mike

> This is a real problem that's hitting me hard with Perl 5 in my day
> job.
>
> Dave.
>
> --
> Any [programming] language that doesn't occasionally surprise the
> novice will pay for it by continually surprising the expert.
> - Larry Wall

Dan Sugalski

Jan 23, 2003, 1:39:03 PM
to James Michael DuPont, perl6-i...@perl.org, Dave Beckett, introspectors
At 10:29 PM -0800 1/22/03, James Michael DuPont wrote:
>You will probably think that this is overkill for parrot,

Why yes, yes I do. On the other hand, when we hand people bazookas to
deal with their fly problems, we often find they start in on the
elephant problems as well.

The proposal in general interests me--it looks like a general
annotation system we can attach to the bytecode. (I admit, I haven't
read the page you pointed at) I will admit, though, that I was
thinking more about metadata that the engine could use itself, or
would provide to programs running on it, but the scheme you've
outlined may be useful for that.

'Swhat I get for asking a too-general question. :)

Dan Sugalski

Jan 23, 2003, 1:29:16 PM
to Brent Dax, perl6-i...@perl.org
At 12:10 AM -0800 1/23/03, Brent Dax wrote:
>Dan Sugalski:
># Since it looks like it's time to extend the packfile format and the
># in-memory bytecode layout, this would be the time to start discussing
># metadata. What sorts of metadata do people think are useful to have
># in either the packfile (on disk) or in the bytecode (in memory).
>
>I do think that, whatever "native" (i.e. understood by Parrot) metadata
>we support, we *must* allow for extensibility, both for future native
>metadata and for third-party tools.

"Must" is an awfully strong word, there. We don't really "must" do
anything, though I do realize the feature is useful, hence my
question.

> Moreover, this must not be
>implemented with a special type of metadata block, or by using
>sequentially-increasing numbers. (The first means that any metadata we
>decide to add in the future will be slower than the metadata we add now;
>the second has problems with several third-party tools picking the same
>number.)

I'm afraid extensible metadata is going to live in its own chunk
unless someone can come up with a way to embed it without penalty.
(And I'm generally considering using separate chunks for the metadata
the engine does understand)

Juergen Boemmels

Jan 23, 2003, 4:31:55 PM
to Dave Mitchell, Dan Sugalski, perl6-i...@perl.org
Dave Mitchell <da...@fdgroup.com> writes:

> On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote:
> > My current idea for the in memory format of the bytecode is this:
>
> I would strongly urge any file-based byte-code format to arranged
> in such a way that it (or most of it) can simply be mmap-ed in (RO),
> analogously to executables.
>
> This means that a Perl server that relies on a lot of modules, and which
> forks for each connection (imagine a Perl-based web server), doesn't
> consume acres of swap space just to have an in-memory image per Perl
> process, of all the modules.

This might be possible if the byteorder, wordsize, defaultencoding
etc. are the same in the file on disk and the host.
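
A loader could test that condition before deciding to use the file in place. The sketch below is an assumption-laden illustration: `FileTraits`, `host_traits` and `usable_in_place` are hypothetical names, `opcode_t` is a stand-in typedef, and only wordsize and byteorder are checked (encoding is left out).

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;   /* stand-in for Parrot's opcode_t */

/* Properties a packfile header would have to record (hypothetical layout). */
typedef struct {
    unsigned char wordsize;   /* sizeof(opcode_t) on the writing machine */
    unsigned char byteorder;  /* 0 = little endian, 1 = big endian */
} FileTraits;

/* Detect the host byte order by looking at the first byte of the value 1. */
static unsigned char host_byteorder(void)
{
    const unsigned int probe = 1;
    return *(const unsigned char *)&probe == 1 ? 0 : 1;
}

static FileTraits host_traits(void)
{
    assert(sizeof(opcode_t) <= 255);
    FileTraits t = { (unsigned char)sizeof(opcode_t), host_byteorder() };
    return t;
}

/* Nonzero if the file can be used (for example mmapped) without conversion. */
static int usable_in_place(FileTraits file)
{
    FileTraits host = host_traits();
    return file.wordsize == host.wordsize &&
           file.byteorder == host.byteorder;
}
```

When `usable_in_place` returns 0, the loader would have to read and convert the data instead of sharing the mapped pages.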

bye
boe

Dan Sugalski

Jan 23, 2003, 5:05:54 PM
to Juergen Boemmels, Dave Mitchell, perl6-i...@perl.org
At 10:31 PM +0100 1/23/03, Juergen Boemmels wrote:
>Dave Mitchell <da...@fdgroup.com> writes:
>
>> On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote:
>> > My current idea for the in memory format of the bytecode is this:
>>
>> I would strongly urge any file-based byte-code format to arranged
>> in such a way that it (or most of it) can simply be mmap-ed in (RO),
>> analogously to executables.
>>
>> This means that a Perl server that relies on a lot of modules, and which
>> forks for each connection (imagine a Perl-based web server), doesn't
>> consume acres of swap space just to have an in-memory image per Perl
>> process, of all the modules.
>
>This might be possible if the byteorder, wordsize, defaultencoding
>etc. are the same in the file on disk and the host.

Which will generally be the case, I expect. Tell a sysadmin that they
can reduce the memory footprint of mod_parrot by 50% by running a
utility (that we provide in the parrot kit) over the library and I
expect you'll see smoke from the keyboard as he/she whips off the
command at supersonic speeds... :)

Juergen Boemmels

Jan 23, 2003, 5:31:01 PM
to Dan Sugalski, Perl6 Internals
Dan Sugalski <d...@sidhe.org> writes:

> >This might be possible if the byteorder, wordsize, defaultencoding
> >etc. are the same in the file on disk and the host.
>
> Which will generally be the case, I expect. Tell a sysadmin that they
> can reduce the memory footprint of mod_parrot by 50% by running a
> utility (that we provide in the parrot kit) over the library and I
> expect you'll see smoke from the keyboard as he/she whips off the
> command at supersonic speeds... :)

It might even be possible to dump the jitted code. This would speed up
startup. Then strip the bytecode to reduce the size of the file
and TADA: Yet another new binary format.

I'm really not sure if I'm serious here

Dan Sugalski

Jan 23, 2003, 5:36:55 PM
to Dave Mitchell, Juergen Boemmels, perl6-i...@perl.org
At 8:39 PM +0000 1/23/03, Dave Mitchell wrote:
>On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote:
>> My current idea for the in memory format of the bytecode is this:
>
>I would strongly urge any file-based byte-code format to arranged
>in such a way that it (or most of it) can simply be mmap-ed in (RO),
>analogously to executables.

This is the way the bytecode currently works, and we will *not*
switch to any bytecode format that doesn't at least allow the
executable code to be mmapped in.

Brent Dax

Jan 23, 2003, 5:48:38 PM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski:
# At 12:10 AM -0800 1/23/03, Brent Dax wrote:
# >Dan Sugalski:
# ># Since it looks like it's time to extend the packfile format and the
# ># in-memory bytecode layout, this would be the time to start discussing
# ># metadata. What sorts of metadata do people think are useful to have
# ># in either the packfile (on disk) or in the bytecode (in memory).
# >
# >I do think that, whatever "native" (i.e. understood by Parrot) metadata
# >we support, we *must* allow for extensibility, both for future native
# >metadata and for third-party tools.
#
# "Must" is an awfully strong word, there. We don't really "must" do
# anything, though I do realize the feature is useful, hence my
# question.

A strong word for a strong opinion. :^) Besides, I did qualify it with
an "I do think", which is another way to say IMO.

# > Moreover, this must not be
# >implemented with a special type of metadata block, or by using
# >sequentially-increasing numbers. (The first means that any metadata we
# >decide to add in the future will be slower than the metadata we add now;
# >the second has problems with several third-party tools picking the same
# >number.)
#
# I'm afraid extensible metadata is going to live in its own chunk
# unless someone can come up with a way to embed it without penalty.
# (And I'm generally considering using separate chunks for the metadata
# the engine does understand)

Are you expecting to have chunk type determined by order? If so, what
will you do if a future restructuring means you either don't need chunk
type X or you need a new, highly incompatible version? Will you leave
in an "empty" ghost chunk?

I would suggest (roughly) the following format for a chunk:

TYPE: One 32-bit number
VERSION: One 32-bit number; suggested usage is as four eight-bit
components
SIZE: One 32-bit number of bytes (or maybe 64-bit)
DATA: arbitrary length

For C-heads, think of it like this:

struct Chunk {
    opcode_t type;
    opcode_t version;
    opcode_t size;
    unsigned char data[];   /* flexible array member; C has no arrays of void */
};

Type IDs less than 256 would be reserved to Parrot (so we have plenty of
room for future expansion); all third-party tools would use some sort of
cryptographic checksum of the tool's name and the data structure's name,
making sure (of course) that their type ID was greater than 255.

If there's a directory of some sort, it should record the type ID and
the offset to the beginning of the chunk. This should allow for a
fairly quick lookup by type. If you think that there might be a demand
for multiple instances of the same type of metadata, you may want to add
a chunk ID of some sort.
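
As a rough sketch of the directory lookup and the third-party type IDs in C: `DirEntry`, `third_party_type_id` and `dir_find` are hypothetical names, and FNV-1a is used purely as a simple stand-in. It is not a cryptographic checksum, so a real tool would substitute one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { RESERVED_TYPE_IDS = 256 };   /* IDs 0..255 are reserved to Parrot */

typedef struct {
    uint32_t type;     /* chunk type ID */
    uint32_t offset;   /* offset of the chunk from the start of the file */
} DirEntry;

/* FNV-1a: a simple NON-cryptographic hash, used here only as a stand-in. */
static uint32_t fnv1a(uint32_t h, const char *s)
{
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Derive a type ID from tool name and structure name, kept out of the
 * reserved range. */
static uint32_t third_party_type_id(const char *tool, const char *structure)
{
    uint32_t h = fnv1a(2166136261u, tool);
    h = fnv1a(h, "/");               /* separator: ("ab","c") vs ("a","bc") */
    h = fnv1a(h, structure);
    if (h < RESERVED_TYPE_IDS)
        h += RESERVED_TYPE_IDS;
    return h;
}

/* Linear lookup by type; returns NULL if no chunk of that type exists. */
static const DirEntry *dir_find(const DirEntry *dir, size_t n, uint32_t type)
{
    assert(dir != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (dir[i].type == type)
            return &dir[i];
    return NULL;
}
```

Two tools can still collide on a 32-bit ID, just far less often than with hand-picked small numbers; a sorted directory would allow binary search if lookups ever mattered.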

Dan Sugalski

Jan 23, 2003, 5:35:51 PM
to chromatic, perl6-i...@perl.org
At 11:48 AM -0800 1/23/03, chromatic wrote:
>On Wed, 22 Jan 2003 13:27:47 +0000, Dan Sugalski wrote:
>
>> Since it looks like it's time to extend the packfile format and the
>> in-memory bytecode layout, this would be the time to start discussing
>> metadata. What sorts of metadata do people think are useful to have
>> in either the packfile (on disk) or in the bytecode (in memory).
>
>Comments, if a disassembler is to be able to reconstruct the original source
>sufficiently well[1].

Noted. I can see problems with multiline comments across multiline
code, but that's probably rare enough to not really care much about.

James Michael Dupont

Jan 23, 2003, 6:22:12 PM
to Juergen Boemmels, Dan Sugalski, perl6-i...@perl.org


I LIKE IT.
Bytecodes have a type? each bytecode has meta-data?
Here are the metadata I have collected from the parrot source code so
far. It should be a set of predicates to define all the other meta-data
needed.

First, this is the core meta-data for storing perl code :
in order of simplicity
identifier_node
Name of things

boolean_type,integer_type,real_type
types of things that are simple

all *_decls have a type that is a type_*
all *_decls have a name that is a type_decl or identifier_node

const_decl
Constant values
var_decl
variable values


The rest of the more complex types need a tree_list
tree_list

function_decl,
parm_decl # list of

array_type
integer_cst, # list of

enumeral_type
integer_cst # list of

record_type,union_type,
field_decl # list of

# a void is very special
void_type

The following are derived types :
pointer_type,reference_type

# function types allow for linkage
function_type,
type_* # we have a list of

# here the user defines its own
type_decl

# this is a commonly defined user type
complex_type,

Dan Sugalski

Jan 23, 2003, 6:37:26 PM
to Brent Dax, perl6-i...@perl.org
At 2:48 PM -0800 1/23/03, Brent Dax wrote:
>Dan Sugalski:
># At 12:10 AM -0800 1/23/03, Brent Dax wrote:
># >Dan Sugalski:
># ># Since it looks like it's time to extend the packfile format and the
># ># in-memory bytecode layout, this would be the time to start discussing
># ># metadata. What sorts of metadata do people think are useful to have
># ># in either the packfile (on disk) or in the bytecode (in memory).
># >
># >I do think that, whatever "native" (i.e. understood by Parrot) metadata
># >we support, we *must* allow for extensibility, both for future native
># >metadata and for third-party tools.
>#
># "Must" is an awfully strong word, there. We don't really "must" do
># anything, though I do realize the feature is useful, hence my
># question.
>
>A strong word for a strong opinion. :^) Besides, I did qualify it with
>an "I do think", which is another way to say IMO.

Heh. I try and avoid the absolute statements. This is all
engineering, and engineering is applied economics--you juggle
features and make compromises to get the thing that meets your needs
as best as possible at a cost you can manage. Allowing extensibility
is Really Keen, but has its associated cost that has to be balanced
against everything else.

Having said that, I think we can do this, but I want a better feel
for what we need, what we want, and what it'll cost before we make a
decision.

>Are you expecting to have chunk type determined by order?

Yes and no. Yes in that I want the first few chunks, the ones that
are required, to be at fixed offsets. Following that will be a
directory, and from there we can index off to wherever we need to.

James Michael Dupont

Jan 23, 2003, 7:26:35 PM
to Brent Dax, Dan Sugalski, perl6-i...@perl.org

Cool!
that means we can use opcodes to store the introspector data!

We need to have the meta data paired with the opcodes.

basically this means storing the source code in some ast form in the
meta-data for full reflection and introspection on the expression
level.


mike

Leopold Toetsch

Jan 24, 2003, 1:38:26 AM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski wrote:

> Since it looks like it's time to extend the packfile format and the
> in-memory bytecode layout, this would be the time to start discussing
> metadata. What sorts of metadata do people think are useful to have in
> either the packfile (on disk) or in the bytecode (in memory).

I'm currently simplifying the whole packfile routines. It still does
read the old format, but the compat code is centralized now in one place.

The main change is now this structure:
struct PackFile_funcs {
    PackFile_Segment_new_func_t         new_seg;
    PackFile_Segment_destroy_func_t     destroy;
    PackFile_Segment_packed_size_func_t packed_size;
    PackFile_Segment_pack_func_t        pack;
    PackFile_Segment_unpack_func_t      unpack;
    PackFile_Segment_dump_func_t        dump;
};

All registered types define these functions to make pack/unpack/dump work
for their type.
Registered types are consecutively numbered, unknown types still get
unpacked or dumped:

typedef enum {
    PF_DIR_SEG,
    PF_UNKNOWN_SEG,
    PF_FIXUP_SEG,
    PF_CONST_SEG,
    PF_BYTEC_SEG,
    PF_DEBUG_SEG,

    PF_MAX_SEG
} pack_file_flags;

All packfiles sizes/offsets are in opcode_t not bytes for simplicity -
though this might need a conversion (but we don't seem to handle
wordsize transforms now anyway).
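
To illustrate the idea of a per-type function table with a fallback for unknown types, here is a simplified C sketch. It is not the patch itself: `SegType`, `SegFuncs`, `register_seg` and `funcs_for` are hypothetical names, only one of the six callbacks is modelled, and the extra length word in `bytecode_packed_size` is an invented detail. Sizes are counted in `opcode_t` units, as described above.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;   /* stand-in for Parrot's opcode_t */

/* Hypothetical segment types, mirroring the shape of the enum above. */
typedef enum {
    SEG_DIR, SEG_UNKNOWN, SEG_FIXUP, SEG_CONST, SEG_BYTECODE, SEG_DEBUG,
    SEG_MAX
} SegType;

/* One callback per operation; only packed_size is shown here. */
typedef struct {
    size_t (*packed_size)(const opcode_t *data, size_t n); /* in opcode_t units */
} SegFuncs;

/* Default for unknown types: the payload is stored as-is. */
static size_t raw_packed_size(const opcode_t *data, size_t n)
{
    (void)data;
    return n;
}

/* Invented detail for the example: bytecode gets one extra length word. */
static size_t bytecode_packed_size(const opcode_t *data, size_t n)
{
    (void)data;
    return n + 1;
}

static SegFuncs registry[SEG_MAX];

static void register_seg(SegType t, SegFuncs f)
{
    assert((int)t >= 0 && t < SEG_MAX);
    registry[t] = f;
}

/* Unregistered or out-of-range types fall back to the unknown handler. */
static const SegFuncs *funcs_for(int type)
{
    if (type < 0 || type >= SEG_MAX || registry[type].packed_size == NULL)
        return &registry[SEG_UNKNOWN];
    return &registry[type];
}
```

The fallback is what lets a file containing segment types from a newer or third-party tool still be sized, copied and dumped by an older reader.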

leo

Leopold Toetsch

Jan 24, 2003, 1:23:04 AM
to Dave Mitchell, Juergen Boemmels, Dan Sugalski, perl6-i...@perl.org
Dave Mitchell wrote:

> On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote:
>
>>My current idea for the in memory format of the bytecode is this:
>>
>
> I would strongly urge any file-based byte-code format to arranged
> in such a way that it (or most of it) can simply be mmap-ed in (RO),
> analogously to executables.


How many mmap's can $arch have for one program and for all?
Could we hit some limits here, if every module loaded gets (and stays)
mmap()ed.


> Dave.

leo


Dan Sugalski

Jan 24, 2003, 2:21:29 AM
to Leopold Toetsch, Dave Mitchell, Juergen Boemmels, perl6-i...@perl.org

We certainly could, which I suppose would argue for building in
sufficient smarts to the bytecode loader to switch to file reading if
an mmap fails. It'll be slower, but working is generally a good thing.
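
A minimal sketch of such a fallback, assuming a POSIX system: `LoadedFile`, `load_file` and `unload_file` are hypothetical names rather than Parrot's loader, and the `allow_mmap` switch exists only so the read path can be exercised.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct {
    void  *data;
    size_t size;
    int    mapped;   /* 1 if data came from mmap, 0 if from malloc + read */
} LoadedFile;

/* Load a file read-only: try mmap first, fall back to plain reading.
 * Returns 0 on success, -1 on failure. */
static int load_file(const char *path, int allow_mmap, LoadedFile *out)
{
    struct stat st;
    int fd;

    assert(path && out);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    out->size = (size_t)st.st_size;
    out->mapped = 0;
    out->data = MAP_FAILED;

    if (allow_mmap)
        out->data = mmap(NULL, out->size, PROT_READ, MAP_PRIVATE, fd, 0);

    if (out->data != MAP_FAILED) {
        out->mapped = 1;                 /* pages are shared and read-only */
    } else {
        /* Slower path: private copy read into heap memory. */
        size_t got = 0;
        out->data = malloc(out->size);
        while (out->data && got < out->size) {
            ssize_t n = read(fd, (char *)out->data + got, out->size - got);
            if (n <= 0) {
                free(out->data);
                out->data = NULL;
            } else {
                got += (size_t)n;
            }
        }
    }
    close(fd);
    return out->data ? 0 : -1;
}

static void unload_file(LoadedFile *f)
{
    if (f->mapped)
        munmap(f->data, f->size);
    else
        free(f->data);
    f->data = NULL;
}
```

The caller sees the same `data`/`size` pair either way; only the memory cost differs, since the heap copy is private to each process.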

Leopold Toetsch

Jan 24, 2003, 1:59:13 AM
to Juergen Boemmels, Dan Sugalski, Perl6 Internals
Juergen Boemmels wrote:

> Dan Sugalski <d...@sidhe.org> writes:


> It might be even possible to dump the jitted code. This would increase
> the startup. Then strip the bytecode to reduce the size of the file
> and TADA: Yet another new binary format.


When you are then able to get the same memory layout for a newly
created interpreter, it might even run ;-)


> I'm really not sure if I'm serious here
> boe


leo


Leopold Toetsch

Jan 24, 2003, 1:56:13 AM
to Dan Sugalski, Dave Mitchell, Juergen Boemmels, perl6-i...@perl.org
Dan Sugalski wrote:

> At 8:39 PM +0000 1/23/03, Dave Mitchell wrote:

>> in such a way that it (or most of it) can simply be mmap-ed in (RO),
>> analogously to executables.
>
>
> This is the way the bytecode currently works, and we will *not* switch
> to any bytecode format that doesn't at least allow the executable code
> to be mmapped in.


s/works/should work/

The file gets mmap()ed if possible, then the bytecode gets memcpy'd and
the map is munmap'd.

leo

Leopold Toetsch

Jan 24, 2003, 12:05:22 PM
to Leopold Toetsch, Dan Sugalski, perl6-i...@perl.org
Leopold Toetsch wrote:


> I'm currently simplifying the whole packfile routines. It still does
> read the old format, but the compat code is centralized now in one place.

> Registered types are consecutively numbered, unknown types still get
> unpacked or dumped:
>
> typedef enum {
> PF_DIR_SEG,
> PF_UNKNOWN_SEG,
> PF_FIXUP_SEG,
> PF_CONST_SEG,
> PF_BYTEC_SEG,
> PF_DEBUG_SEG,
>
> PF_MAX_SEG
> } pack_file_flags;


Here is a sample dump of a packfile with file/line info generated by
$ imcc -d -o eval.pbc eval.pasm
$ pdump eval.pbc
DIRECTORY => { # 3 segments
type 3 name CONSTANT offs 0x1c length 35
type 4 name BYTECODE offs 0x40 length 14
type 5 name BYTECODE_DB offs 0x4f length 17
}
CONST => [
### snipped (as old) ###
],

BYTECODE => [ # 14 ops at ofs 0x40
0041: 00000349 00000001 00000003 00000057 00000001 00000002 00000347
0048: 00000000 00000001 00000000 00000345 0000001a 00000001 00000000
]
BYTECODE_DB => [ # 17 ops at ofs 0x4f
0050: 6c617665 7361702e 0000006d 00000001 00000002 00000003 00000004 00000005
0058: 00000006 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0060: 00000000
]


(the line array is currently too big (per opcode not per ins ;-))


Anyway, packing/unpacking and dumping above packfile data is working now.

Does anybody want to have a look at the patch?
Should I check in - or send it to the list?

$ diffstat packf.diff
TODO | 5
debug.c | 2
include/parrot/packfile.h | 129 +++---
languages/imcc/TestCompiler.pm | 6
languages/imcc/imclexer.c | 2
languages/imcc/main.c | 2
languages/imcc/pbc.c | 6
languages/imcc/t/harness | 15
languages/imcc/t/syn/eval.t | 61 ++
packdump.c | 4
packfile.c | 848 ++++++++++++++++++++++-------------------
packout.c | 156 +++----
pdump.c | 23 -
13 files changed, 715 insertions(+), 544 deletions(-)


leo


Dan Sugalski

Jan 24, 2003, 12:45:31 PM
to Dave Mitchell, Leopold Toetsch, Juergen Boemmels, perl6-i...@perl.org
At 5:32 PM +0000 1/24/03, Dave Mitchell wrote:

>On Fri, Jan 24, 2003 at 07:23:04AM +0100, Leopold Toetsch wrote:
>> How many mmap's can $arch have for one program and for all?
>> Could we hit some limits here, if every module loaded gets (and stays)
>> mmap()ed.
>
>I just wrote a quick C program that successfully mmap-ed in all 1639
>files in my Linux box's /usr/share/man/man1 directory.

Linux is not the universe, though. And what it'll do depends on the
version. We have to worry about Windows and a half-zillion other
flavors of Unix, at the very least. IIRC, some versions of BSD
weren't too thrilled about a lot of mmaps.

>Note that in Perl5 we already (indirectly) rely on the OS's ability to
>mmap in the library code for any XS-based modules.

No, we use dlopen, which isn't the same thing at all. It can be, but
doesn't have to be.

Dave Mitchell

Jan 24, 2003, 12:32:42 PM
to Leopold Toetsch, Juergen Boemmels, Dan Sugalski, perl6-i...@perl.org
On Fri, Jan 24, 2003 at 07:23:04AM +0100, Leopold Toetsch wrote:
> How many mmap's can $arch have for one program and for all?
> Could we hit some limits here, if every module loaded gets (and stays)
> mmap()ed.

I just wrote a quick C program that successfully mmap-ed in all 1639
files in my Linux box's /usr/share/man/man1 directory.

Note that in Perl5 we already (indirectly) rely on the OS's ability to
mmap in the library code for any XS-based modules.

--
"But Sidley Park is already a picture, and a most amiable picture too.
The slopes are green and gentle. The trees are companionably grouped at
intervals that show them to advantage. The rill is a serpentine ribbon
unwound from the lake peaceably contained by meadows on which the right
amount of sheep are tastefully arranged." Lady Croom - Arcadia

Nicholas Clark

Jan 24, 2003, 6:32:16 PM
to Brent Dax, Dan Sugalski, Leopold Toetsch, perl6-i...@perl.org, James Michael DuPont, Dave Beckett, introspectors, Juergen Boemmels, Dave Mitchell
On Thu, Jan 23, 2003 at 02:48:38PM -0800, Brent Dax wrote:

> Are you expecting to have chunk type determined by order? If so, what
> will you do if a future restructuring means you either don't need chunk
> type X or you need a new, highly incompatible version? Will you leave
> in an "empty" ghost chunk?
>
> I would suggest (roughly) the following format for a chunk:
>
> TYPE: One 32-bit number
> VERSION: One 32-bit number; suggested usage is as four eight-bit
> components
> SIZE: One 32-bit number of bytes (or maybe 64-bit)
> DATA: arbitrary length
>
> For C-heads, think of it like this:
>
> struct Chunk {
> opcode_t type;
> opcode_t version;
> opcode_t size;
> void data[];
> };

I agree with the "roughly" bit, but I'd suggest ensuring that you put
in enough bits to get data[] 64 bit aligned. Mainly because at least 1
architecture exists that has no 32 bit types (Crays I know about; others
may exist. I can't remember if perl 5.8 passes 100% of tests on Crays.
We certainly tried)
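
One way to get that alignment, sketched in C under assumptions: `ChunkHeader`, `align8` and `next_chunk_offset` are hypothetical names, and a fourth `reserved` word is added so the header is 16 bytes. The fixed-width `uint32_t` is used for brevity; it is an optional type and would not exist on a machine without 32-bit types, where a real packfile would use `opcode_t` instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header padded to 16 bytes so the payload that follows an 8-byte
 * aligned header is itself 8-byte aligned. */
typedef struct {
    uint32_t type;
    uint32_t version;
    uint32_t size;       /* payload length in bytes */
    uint32_t reserved;   /* padding; could later hold flags or a chunk ID */
} ChunkHeader;

/* Round n up to the next multiple of 8. */
static size_t align8(size_t n)
{
    return (n + 7u) & ~(size_t)7u;
}

/* Offset of the chunk that follows one starting at chunk_offset. */
static size_t next_chunk_offset(size_t chunk_offset, uint32_t payload_size)
{
    assert(chunk_offset % 8 == 0);   /* chunks always start aligned */
    return align8(chunk_offset + sizeof(ChunkHeader) + payload_size);
}
```

A 5-byte payload in a chunk at offset 0 therefore puts the next chunk at offset 24 rather than 21, at the cost of three bytes of padding.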

> If there's a directory of some sort, it should record the type ID and
> the offset to the beginning of the chunk. This should allow for a
> fairly quick lookup by type. If you think that there might be a demand
> for multiple instances of the same type of metadata, you may want to add
> a chunk ID of some sort.

It might be useful for making "portable" fat bytecode.

On Thu, Jan 23, 2003 at 01:39:03PM -0500, Dan Sugalski wrote:
> At 10:29 PM -0800 1/22/03, James Michael DuPont wrote:
> >You will probably think that this is overkill for parrot,
>
> Why yes, yes I do. On the other hand, when we hand people bazookas to
> deal with their fly problems, we often find they start in on the
> elephant problems as well.

No wonder the rolls of sticky elephant paper never sold.

> The proposal in general interests me--it looks like a general
> annotation system we can attach to the bytecode. (I admit, I haven't
> read the page you pointed at) I will admit, though, that I was
> thinking more about metadata that the engine could use itself, or
> would provide to programs running on it, but the scheme you've
> outlined may be useful for that.

I'm thinking that register usage information from imcc could be of use
to the JIT, as that would save it having to work out things again. So that
probably needs a segment.

Also some way of storing a cryptographic signature in the file, so that you
could compile a parrot that automatically refuses to load code that isn't
signed by you.

On Thu, Jan 23, 2003 at 05:05:54PM -0500, Dan Sugalski wrote:

> Which will generally be the case, I expect. Tell a sysadmin that they
> can reduce the memory footprint of mod_parrot by 50% by running a
> utility (that we provide in the parrot kit) over the library and I
> expect you'll see smoke from the keyboard as he/she whips off the
> command at supersonic speeds... :)

Followed by writs for claims for supersonic RSI addressed to p6i

On Fri, Jan 24, 2003 at 07:59:13AM +0100, Leopold Toetsch wrote:
> Juergen Boemmels wrote:
>
> >Dan Sugalski <d...@sidhe.org> writes:
>
>
> >It might be even possible to dump the jitted code. This would increase
> >the startup. Then strip the bytecode to reduce the size of the file
> >and TADA: Yet another new binary format.
>
>
> When you then are able to get the same memory layout for a newly
> created interpreter, it might even run ;-)

So the JITted code contains lots of hard references to address in running
interpreter? It's not just dependent on that particular binary's layout?
I guess in future once the normal JIT works, and we've got the pigs flying
nicely then it would be possible to write a Not Just In Time compiler that
saves out assembly code and relocation instructions.

Bah. That's "parrot -o foo.o foo.pmc" isn't it?

Nicholas Clark

Leopold Toetsch

Jan 25, 2003, 4:26:22 AM
to Nicholas Clark, Brent Dax, Dan Sugalski, perl6-i...@perl.org, James Michael DuPont, Dave Beckett, introspectors, Juergen Boemmels, Dave Mitchell
Nicholas Clark wrote:

> On Thu, Jan 23, 2003 at 02:48:38PM -0800, Brent Dax wrote:
>> struct Chunk {
>> opcode_t type;
>> opcode_t version;
>> opcode_t size;
>> void data[];
>> };
>>
>
> I agree with the "roughly" bit, but I'd suggest ensuring that you put
> in enough bits to get data[] 64 bit aligned.

>>If there's a directory of some sort, it should record the type ID and
>>the offset to the beginning of the chunk.


Putting this together, and inserting an "Id" field above, would give
alignment on a 64 bit boundary for data in PBC - assuming the strings,
data, ... are also N*64 bit wide.
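
As a rough sketch of the layout being discussed: with an id field added, four 32-bit header words put the payload on a 64-bit boundary. Fixed 32-bit fields are an assumption made here for illustration (the proposal in the thread uses opcode_t), and the names chunk_header, align_up and next_chunk_offset are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk chunk header: four 32-bit words = 16 bytes, so a
 * payload that follows an 8-byte-aligned header is itself 8-byte aligned. */
struct chunk_header {
    uint32_t type;     /* what kind of chunk this is */
    uint32_t id;       /* distinguishes several chunks of one type */
    uint32_t version;  /* e.g. four 8-bit components */
    uint32_t size;     /* payload length in bytes */
};

/* Round an offset up to the next multiple of `alignment`
 * (alignment must be a power of two). */
static size_t align_up(size_t offset, size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* File offset of the header that follows a chunk whose header starts at
 * `header_offset`, padding so that every header stays 64-bit aligned. */
static size_t next_chunk_offset(size_t header_offset, uint32_t payload_size)
{
    size_t end = header_offset + sizeof(struct chunk_header) + payload_size;
    return align_up(end, 8);
}
```

A writer would emit zero padding between the end of one payload and next_chunk_offset(); a reader can use the same function to step from chunk to chunk.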


> It might be useful for making "portable" fat bytecode.


As I stated, I changed all sizes/offsets to be opcode_t. Of course this
breaks reading 32 bit PBC on machines with 64 bit opcode_t - but this
was already broken before, e.g.:

header->magic = PackFile_fetch_op(self, cursor++);

If we want this portable, it probably should look like

header->magic = PackFile_fetch_op(self, &cursor);

where the _fetch_xx has to advance the cursor by the PBC defined wordsize.

A _fetch_cstring and a _fetch_n_opcodes would also be handy. And for the
latter, if the packfile is mmap()ed, it shouldn't fetch anything, but
just set up the code pointer, advance the cursor, and remember that the
code_segment->code field had better not be freed at destroy time.
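
A minimal sketch of such fetch routines, assuming the word size is read from the PBC header and the file is in host byte order. The struct pbc_reader and the function names are hypothetical stand-ins, not the actual PackFile_fetch_op interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reader state: the word size comes from the PBC header,
 * not from the host's sizeof(opcode_t). */
struct pbc_reader {
    const unsigned char *cursor;  /* next unread byte */
    size_t wordsize;              /* 4 or 8, as declared by the file */
};

/* Fetch one opcode-sized word and advance the cursor by the file's
 * word size. Assumes host byte order; byte swapping is omitted. */
static int64_t fetch_op(struct pbc_reader *r)
{
    int64_t value;
    if (r->wordsize == 4) {
        int32_t w;
        memcpy(&w, r->cursor, sizeof w);
        value = w;
    } else {
        memcpy(&value, r->cursor, sizeof value);
    }
    r->cursor += r->wordsize;
    return value;
}

/* Hand back a pointer to n opcodes in place (no copy) and skip over
 * them. With an mmap()ed file the caller must not free() this pointer. */
static const unsigned char *fetch_n_opcodes(struct pbc_reader *r, size_t n)
{
    const unsigned char *code = r->cursor;
    r->cursor += n * r->wordsize;
    return code;
}
```

Passing the reader (and so the cursor) by pointer is what lets the routine, rather than the caller's cursor++, decide how far to advance.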


> I'm thinking that register usage information from imcc could be of use
> to the JIT, as that would save it having to work out things again. So that
> probably needs a segment.


Yep. imcc does the whole CFG and life analysis, which JIT is doing
again. At least basic blocks and register usage could be passed. Though
register life range in JIT is different and depends on $arch. Calling
(JIT) external functions ends a register's life, so it must be saved
before calling and restored after.


> Also some way of storing a cryptographic signature in the file, so that you
> could compile a parrot that automatically refuses to load code that isn't
> signed by you.


The palladium parrot :)


>>Juergen Boemmels wrote:


>>>It might be even possible to dump the jitted code.

>>When you then are able to get the same memory layout for a newly
>>created interpreter, it might even run ;-)

> So the JITted code contains lots of hard references to address in running
> interpreter? It's not just dependent on that particular binary's layout?


JIT/i386 does call parrot functions directly, e.g. pmc_new_noinit or
string_make, so these would need relocation or, probably slightly
slower but simpler to handle, a jump table. We (all? JIT $arch) have at
least one register pointing to parrot data. Including a jump table there
for the parrot functions in use would do it.
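
The jump-table idea can be pictured roughly as follows. This is only an illustration in C of what generated code would do; the demo functions are stand-ins with invented signatures (the real pmc_new_noinit and string_make look different), and interp_data is a hypothetical structure.

```c
/* Stand-ins for interpreter entry points; the real functions have
 * different signatures. */
static int demo_pmc_new(int type)    { return type + 100; }
static int demo_string_make(int len) { return len * 2; }

typedef int (*vm_func)(int);

/* Slot numbers are fixed, so saved native code can refer to a slot
 * instead of an absolute function address. */
enum { SLOT_PMC_NEW = 0, SLOT_STRING_MAKE = 1, SLOT_COUNT };

struct interp_data {
    vm_func jump_table[SLOT_COUNT];  /* filled in at interpreter start */
};

static void init_jump_table(struct interp_data *d)
{
    d->jump_table[SLOT_PMC_NEW]     = demo_pmc_new;
    d->jump_table[SLOT_STRING_MAKE] = demo_string_make;
}

/* What generated code would do: one indirect call through the table,
 * reached via the register that already points at interpreter data. */
static int call_slot(const struct interp_data *d, int slot, int arg)
{
    return d->jump_table[slot](arg);
}
```

The cost is one extra memory load per call; the gain is that dumped code contains slot numbers only, so nothing needs patching when the functions land at different addresses.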


> I guess in future once the normal JIT works, and we've got the pigs flying
> nicely then it would be possible to write a Not Just In Time compiler that
> saves out assembly code and relocation instructions.
>
> Bah. That's "parrot -o foo.o foo.pmc" isn't it?


*g*


> Nicholas Clark


leo

Nicholas Clark

Jan 25, 2003, 6:52:14 AM
to Leopold Toetsch, Brent Dax, Dan Sugalski, perl6-i...@perl.org, James Michael DuPont, Dave Beckett, introspectors, Juergen Boemmels, Dave Mitchell
On Sat, Jan 25, 2003 at 10:26:22AM +0100, Leopold Toetsch wrote:
> Nicholas Clark wrote:

> >Also some way of storing a cryptographic signature in the file, so that you
> >could compile a parrot that automatically refuses to load code that isn't
> >signed by you.
>
>
> The palladium parrot :)

naa. I said "signed by you", not "signed by the RIAA^WMPAA^WMicrosoft"

Nicholas Clark

Leopold Toetsch

Jan 25, 2003, 8:49:35 AM
to Dan Sugalski, Dave Mitchell, Juergen Boemmels, perl6-i...@perl.org
Dan Sugalski wrote:

> At 5:32 PM +0000 1/24/03, Dave Mitchell wrote:
>
>> I just wrote a quick C program that successfully mmap-ed in all 1639
>> files in my Linux box's /usr/share/man/man1 directory.
>
>
> Linux is not the universe, though.


I have changed it to mmap() the bytecode (other segments, which have a
similar layout (i.e. size and opcode_t[size]), will be mmaped too).

If mmap'ing the packfile fails, there is a fallback to reading it via IO.
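
For illustration, a map-or-read loader along these lines might look like the sketch below. It assumes a POSIX system; struct loaded_file, load_file and release_file are hypothetical names, not the functions in the Parrot source.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle: records how the bytes were obtained so that
 * they can be released the right way. */
struct loaded_file {
    void  *data;
    size_t size;
    int    is_mapped;  /* 1: munmap() on release, 0: free() */
};

/* Returns 0 on success, -1 on failure (empty files are rejected). */
static int load_file(const char *path, struct loaded_file *out)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    out->size = (size_t)st.st_size;

    out->data = mmap(NULL, out->size, PROT_READ, MAP_SHARED, fd, 0);
    if (out->data != MAP_FAILED) {
        out->is_mapped = 1;
        close(fd);  /* the mapping outlives the descriptor */
        return 0;
    }

    /* Fallback: plain read into a heap buffer. */
    out->is_mapped = 0;
    out->data = malloc(out->size);
    if (out->data != NULL) {
        size_t done = 0;
        while (done < out->size) {
            ssize_t n = read(fd, (char *)out->data + done, out->size - done);
            if (n <= 0)
                break;
            done += (size_t)n;
        }
        if (done == out->size) {
            close(fd);
            return 0;
        }
        free(out->data);
    }
    close(fd);
    return -1;
}

static void release_file(struct loaded_file *f)
{
    if (f->is_mapped)
        munmap(f->data, f->size);
    else
        free(f->data);
}
```

Keeping the is_mapped flag next to the pointer is one way to make sure mapped memory is never passed to free() at destroy time.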

leo

Leopold Toetsch

Jan 25, 2003, 9:13:28 AM
to Nicholas Clark, Brent Dax, Dan Sugalski, perl6-i...@perl.org, James Michael DuPont, Dave Beckett, introspectors, Juergen Boemmels, Dave Mitchell
Nicholas Clark wrote:


Yes, of course. I would do this with a personalized version of
fingerprint.c and generate a separate executable.


> Nicholas Clark


leo


Sean O'Rourke

Jan 25, 2003, 9:18:47 AM
to Leopold Toetsch, Dan Sugalski, Dave Mitchell, Juergen Boemmels, perl6-i...@perl.org
On Sat, 25 Jan 2003, Leopold Toetsch wrote:
> Dan Sugalski wrote:
>
> > At 5:32 PM +0000 1/24/03, Dave Mitchell wrote:
> >
> >> I just wrote a quick C program that successfully mmap-ed in all 1639
> >> files in my Linux box's /usr/share/man/man1 directory.
> >
> >
> > Linux is not the universe, though.

How true. On Solaris, for example, mmap's are aligned on 64k boundaries,
which leads to horrible virtual address space consumption when you map
lots of small things. If we're mmap()ing things, we want to be sure
they're fairly large.

/s

Jason Gloudon

Jan 25, 2003, 10:04:37 AM
to Dave Mitchell, perl6-i...@perl.org
On Thu, Jan 23, 2003 at 08:39:21PM +0000, Dave Mitchell wrote:

> This means that a Perl server that relies on a lot of modules, and which
> forks for each connection (imagine a Perl-based web server), doesn't
> consume acres of swap space just to have an in-memory image per Perl
> process, of all the modules.

Are you sure the swap space allocation isn't mostly attributable to the poor
locality in the Perl process's data structures ?

--
Jason

Dave Mitchell

Jan 25, 2003, 6:43:40 PM
to Sean O'Rourke, Leopold Toetsch, Dan Sugalski, Juergen Boemmels, perl6-i...@perl.org

Okay, I just ran a program on a Solaris machine that mmaps in each
of 571 man files 20 times (a total of 11420 mmaps). The process size
was 181Mb, but the total system swap available only decreased by 1.2Mb
(since files mmapped in RO effectively don't consume swap).

I think Solaris and Linux can both cut this. If other OSes can't, then
we fall back to reading in the file when necessary.

--
Lady Nancy Astor: If you were my husband, I would flavour your coffee
with poison.
Churchill: Madam - if I were your husband, I would drink it.

Dave Mitchell

Jan 25, 2003, 6:46:04 PM
to Jason Gloudon, perl6-i...@perl.org

I was using swap space as a loose term to mean virtual memory consumption
- ie that resource which necessitates buying more RAM and/or swap disks.
The locality wasn't a problem.

--
A walk of a thousand miles begins with a single step...
then continues for another 1,999,999 or so.

Nicholas Clark

Jan 25, 2003, 7:40:19 PM
to Sean O'Rourke, Leopold Toetsch, Dan Sugalski, Juergen Boemmels, perl6-i...@perl.org
On Sat, Jan 25, 2003 at 11:43:40PM +0000, Dave Mitchell wrote:
> On Sat, Jan 25, 2003 at 06:18:47AM -0800, Sean O'Rourke wrote:
> > On Sat, 25 Jan 2003, Leopold Toetsch wrote:
> > > Dan Sugalski wrote:
> > >
> > > > At 5:32 PM +0000 1/24/03, Dave Mitchell wrote:
> > > >
> > > >> I just wrote a quick C program that successfully mmap-ed in all 1639
> > > >> files in my Linux box's /usr/share/man/man1 directory.
> > > >
> > > >
> > > > Linux is not the universe, though.

There's always NetBSD if Linux won't run on your hardware :-)
<ducks>

> > How true. On Solaris, for example, mmap's are aligned on 64k boundaries,
> > which leads to horrible virtual address space consumption when you map
> > lots of small things. If we're mmap()ing things, we want to be sure
> > they're fairly large.
>
> Okay, I just ran a program on a Solaris machine that mmaps in each
> of 571 man files 20 times (a total of 11420 mmaps). The process size
> was 181Mb, but the total system swap available only decreased by 1.2Mb
> (since files mmapped in RO effectively don't consume swap).

11420 simultaneous mmaps in the same process? (just checking that I
understand you)

> I think Solaris and Linux can both cut this. If other OSes can't, then
> we fall back to reading in the file when necessary.

Maybe I'm paranoid (or even plain wrong) but we (parrot) can handle it
if an mmap fails - we just automatically fall back to plain file loading.
Can dlopen() cope if an mmap fails? Or on a platform which can only
do a limited number of mmaps do we run the danger of exhausting them early
with all our bytecode segments, and then the first time someone attempts
a require POSIX; it fails because the perl6 DynaLoader can't dlopen
POSIX.so? (And by then we've done our could-have-been-plain-loaded
mmaps, so it's too late to adapt)

Nicholas Clark

Dave Mitchell

Jan 25, 2003, 7:45:33 PM
to Nicholas Clark, Sean O'Rourke, Leopold Toetsch, Dan Sugalski, Juergen Boemmels, perl6-i...@perl.org
On Sun, Jan 26, 2003 at 12:40:19AM +0000, Nicholas Clark wrote:
> On Sat, Jan 25, 2003 at 11:43:40PM +0000, Dave Mitchell wrote:
> > Okay, I just ran a program on a Solaris machine that mmaps in each
> > of 571 man files 20 times (a total of 11420 mmaps). The process size
> > was 181Mb, but the total system swap available only decreased by 1.2Mb
> > (since files mmapped in RO effectively don't consume swap).
>
> 11420 simultaneous mmaps in the same process? (just checking that I
> understand you)

yep, exactly that. Src code included below.

> Maybe I'm paranoid (or even plain wrong) but we (parrot) can handle it
> if an mmap fails - we just automatically fall back to plain file loading.
> Can dlopen() cope if an mmap fails? Or on a platform which can only
> do a limited number of mmaps do we run the danger of exhausting them early
> with all our bytecode segments, and then the first time someone attempts
> a require POSIX; it fails because the perl6 DynaLoader can't dlopen
> POSIX.so? (And by then we've done our could-have-been-plain-loaded
> mmaps, so it's too late to adapt)

If there's such a platform, then presumably we don't bother mmap at all
for that platform.


to run: cd to a man directory, then C</tmp/foo *>


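#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>   /* exit() */

int main(int argc, char *argv[])
{
    int i, j;
    int fd;
    off_t size;
    void *p;
    struct stat st;

    /* Map every file named on the command line, 20 times over. */
    for (j = 0; j < 20; j++) {
        for (i = 1; i < argc; i++) {
            fd = open(argv[i], O_RDONLY);
            if (fd == -1) {
                perror("open"); exit(1);
            }
            if (fstat(fd, &st) == -1) {
                perror("fstat"); exit(1);
            }
            size = st.st_size;
            /* printf("%d %5ld %s\n", i, (long)size, argv[i]); */

            p = mmap(0, size, PROT_READ, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED) {  /* failure is MAP_FAILED, not a negative pointer */
                perror("mmap"); exit(1);
            }

            /* The mapping stays in place after the descriptor is closed. */
            close(fd);
        }
        printf("done loop %d\n", j);
    }
    /* Keep the process alive so that its size can be inspected. */
    sleep(1000);
    return 0;
}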

Sean O'Rourke

Jan 25, 2003, 8:38:08 PM
to Dave Mitchell, Sean O'Rourke, Leopold Toetsch, Dan Sugalski, Juergen Boemmels, perl6-i...@perl.org
On Sat, 25 Jan 2003, Dave Mitchell wrote:
> On Sat, Jan 25, 2003 at 06:18:47AM -0800, Sean O'Rourke wrote:
> > On Sat, 25 Jan 2003, Leopold Toetsch wrote:
> > > Dan Sugalski wrote:
> > >
> > > > At 5:32 PM +0000 1/24/03, Dave Mitchell wrote:
> > > >
> > > >> I just wrote a quick C program that successfully mmap-ed in all 1639
> > > >> files in my Linux box's /usr/share/man/man1 directory.
> > > >
> > > >
> > > > Linux is not the universe, though.
> >
> > How true. On Solaris, for example, mmap's are aligned on 64k boundaries,
> > which leads to horrible virtual address space consumption when you map
> > lots of small things. If we're mmap()ing things, we want to be sure
> > they're fairly large.
>
> Okay, I just ran a program on a Solaris machine that mmaps in each
> of 571 man files 20 times (a total of 11420 mmaps). The process size
> was 181Mb, but the total system swap available only decreased by 1.2Mb
> (since files mmapped in RO effectively don't consume swap).

The problem's actually _virtual_ memory use/fragmentation, not physical
memory or swap. Say you map in 10k small files -- that's 640M virtual
memory, just over a fourth of what's available. Now let's say you're also
using mmap() in your webserver to send large (10M) files quickly over the
network. The small files, if they're long-lived get scattered all over
VA-space, so there's a non-trivial chance that the OS won't be able to
find a 10MB chunk of free addresses at some point.
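
The arithmetic behind the 640M figure can be written out as a small sketch. It takes the 64k mapping granularity from the message at face value; the function names are made up for the example.

```c
/* Address space taken by one mapping when the OS rounds every mapping
 * up to a multiple of `granularity` bytes (granularity > 0). */
static unsigned long long mapped_span(unsigned long long file_size,
                                      unsigned long long granularity)
{
    unsigned long long units = (file_size + granularity - 1) / granularity;
    return units * granularity;
}

/* Address space taken by `count` mappings of equally sized files. */
static unsigned long long total_span(unsigned long long count,
                                     unsigned long long file_size,
                                     unsigned long long granularity)
{
    return count * mapped_span(file_size, granularity);
}
```

For 10,000 files of 20 bytes each and a granularity of 65536, total_span() gives 655,360,000 bytes, which is 640,000 KB of address space for 200,000 bytes of actual data.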

To see it, you might try changing your program to map and unmap a large
file periodically while mapping the man pages. Then take a look at the
process's address space with /usr/proc/bin/pmap to see what the OS is
doing with the maps.

Weird, I know, but that's why it stuck in my mind. You have to map quite
a few files to get this to happen, but it's a real possibility with a
32-bit address space and a long-running process that does many small
mmap()s and some large ones.

Anyways...

/s

Leopold Toetsch

Jan 25, 2003, 12:42:14 PM
to Sean O'Rourke, Dan Sugalski, Dave Mitchell, Juergen Boemmels, perl6-i...@perl.org
Sean O'Rourke wrote:

> How true. On Solaris, for example, mmap's are aligned on 64k boundaries,
> which leads to horrible virtual address space consumption when you map
> lots of small things. If we're mmap()ing things, we want to be sure
> they're fairly large.


Is one PBC file a small thing? Or in other words, should we have a low
limit where we start again to malloc and copy PBC files?
Configure option? Commandline switch?
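
One way to picture such a low limit is sketched below. The function name and the idea of passing the limit as a parameter are illustrative assumptions, not an existing Parrot option; the value could equally come from Configure or from the command line.

```c
#include <stddef.h>

enum load_strategy { LOAD_COPY, LOAD_MMAP };

/* Files below the limit are read into malloc()ed memory;
 * larger ones are mapped. A limit of 0 means "always mmap". */
static enum load_strategy choose_strategy(size_t file_size,
                                          size_t mmap_min_size)
{
    return file_size >= mmap_min_size ? LOAD_MMAP : LOAD_COPY;
}
```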


> /s


leo


Dave Mitchell

Jan 26, 2003, 8:15:37 AM
to Sean O'Rourke, Leopold Toetsch, Dan Sugalski, Juergen Boemmels, perl6-i...@perl.org
On Sat, Jan 25, 2003 at 05:38:08PM -0800, Sean O'Rourke wrote:
> The problem's actually _virtual_ memory use/fragmentation, not physical
> memory or swap. Say you map in 10k small files -- that's 640M virtual
> memory, just over a fourth of what's available. Now let's say you're also
> using mmap() in your webserver to send large (10M) files quickly over the
> network. The small files, if they're long-lived get scattered all over
> VA-space, so there's a non-trivial chance that the OS won't be able to
> find a 10MB chunk of free addresses at some point.

Yeah, but in practice most, if not all, of the small files will be mapped in at
startup. It's no different from the situation at the moment on Solaris,
where XS modules require the .so object to be mmapped in.

> Weird, I know, but that's why it stuck in my mind. You have to map quite
> a few files to get this to happen, but it's a real possibility with a
> 32-bit address space and a long-running process that does many small
> mmap()s and some large ones.

But we'll all be using 64-bit processors by the time parrot's released :-)

--
This email is confidential, and now that you have read it you are legally
obliged to shoot yourself. Or shoot a lawyer, if you prefer. If you have
received this email in error, place it in its original wrapping and return
for a full refund. By opening this email, you accept that Elvis lives.

Sean O'Rourke

Jan 26, 2003, 2:18:51 PM
to Leopold Toetsch, Sean O'Rourke, Dan Sugalski, Dave Mitchell, Juergen Boemmels, perl6-i...@perl.org
On Sat, 25 Jan 2003, Leopold Toetsch wrote:
> Is one PBC file a small thing? Or in other words, should we have a low
> limit where we start again to malloc and copy PBC files?
> Configure option? Commandline switch?

Maybe a config option? The app I'm thinking of was pathological, in that
it mapped in thousands of 20-byte files. Now that I think about it,
unless someone implements something very strangely (or has absolutely
enormous numbers of threads) this shouldn't be an issue.

/s

James Mastros

Jan 26, 2003, 4:06:38 PM
to perl6-i...@perl.org, Leopold Toetsch, Nicholas Clark, Brent Dax, Dan Sugalski, perl6-i...@perl.org, James Michael DuPont, Dave Beckett, introspectors, Juergen Boemmels, Dave Mitchell
On 01/25/2003 4:26 AM, Leopold Toetsch wrote:
> Nicholas Clark wrote:
>> Also some way of storing a cryptographic signature in the file, so
>> that you
>> could compile a parrot that automatically refuses to load code that isn't
>> signed by you.
> The palladium parrot :)
Just because it's possible to use a technology for evil doesn't mean you
shouldn't create it. I think it would be quite useful to define a
standard for signed PBC. It doesn't need to be complex; just define a
new packfile section, SIGNATURE, that is defined to be a cryptographic
signature of all sections previous to it in the file. (We'd have to
exclude certain parts of the header, or otherwise work around
chicken-and-egg problems with the signed header changing in the act of
attaching the signature, but those are long-since-solved problems.)
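
A common way around the chicken-and-egg problem is to treat the bytes of the signature field as zero while computing the digest. The sketch below shows only that step; FNV-1a is used purely as a stand-in for a real cryptographic hash, and the function name and field layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Digest of a buffer with one field skipped. FNV-1a is NOT a
 * cryptographic hash; it only stands in for one in this sketch. */
static uint64_t digest_skipping_field(const unsigned char *buf, size_t len,
                                      size_t field_off, size_t field_len)
{
    uint64_t h = 14695981039346656037ULL;  /* FNV-1a 64-bit offset basis */
    size_t i;
    for (i = 0; i < len; i++) {
        /* Bytes of the signature field are hashed as zero, so storing
         * the signature later does not change the digest. */
        unsigned char b = buf[i];
        if (i >= field_off && i - field_off < field_len)
            b = 0;
        h ^= b;
        h *= 1099511628211ULL;              /* FNV-1a 64-bit prime */
    }
    return h;
}
```

The signer computes the digest, signs it, and writes the result into the field; the verifier recomputes the digest the same way and checks it against the stored signature.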

In particular, it would be nice to be able to trust code written by
myself and people I personally trust, run CPAN code in checked mode, run
code submitted by users without access to create IO PMCs, and not run
Microsoft code at all.

A code signing standard would enable that. It's defining a trust model
that doesn't let the user know what's actually going on that we have to
be wary of. (Even authenticating the host is potentially useful...
though I can't think of a good use.)

-=- James Mastros

James Mastros

Jan 26, 2003, 4:14:23 PM
to perl6-i...@perl.org, Sean O'Rourke, Leopold Toetsch, Dan Sugalski, Dave Mitchell, Juergen Boemmels, perl6-i...@perl.org
On 01/26/2003 2:18 PM, Sean O'Rourke wrote:
> Maybe a config option? The app I'm thinking of was pathological, in that
> it mapped in thousands of 20-byte files. Now that I think about it,
> unless someone implements something very strangely (or has absolutely
> enormous numbers of threads) this shouldn't be an issue.
Might I suggest that we make sure we can deal sanely with either mmapping
or reading PBC files, and then worry about this later, like when
somebody actually finds it being a problem in real use?

-=- James Mastros

Juergen Boemmels

Jan 27, 2003, 2:50:56 PM
to Nicholas Clark, Brent Dax, Dan Sugalski, Leopold Toetsch, perl6-i...@perl.org, James Michael DuPont, Dave Beckett, introspectors, Dave Mitchell
Nicholas Clark <ni...@unfortu.net> writes:

[...]

> > struct Chunk {
> > opcode_t type;
> > opcode_t version;
> > opcode_t size;
> > void data[];
> > };

Will this ever compile?
void data[] is not allowed, and even char data[] is an incomplete
type, so it's not allowed in a structure definition. A void * data
pointer seems more appropriate. This way it's possible to have one
TableOfContent for the whole bytecode file and every chunk of the file
can be accessed in constant time (no need to scan over the complete
file to reach the last chunk).
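
A table of contents of this kind could be sketched as follows. The entry layout, the field widths and the name find_chunk are assumptions made for the example, not the actual packfile format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical directory entry; field names and widths are illustrative. */
struct toc_entry {
    uint32_t type;    /* chunk type code */
    uint32_t id;      /* distinguishes chunks of the same type */
    uint32_t offset;  /* byte offset of the chunk data from file start */
    uint32_t size;    /* chunk data length in bytes */
};

/* Find a chunk by (type, id). Only the small directory is searched;
 * the chunk itself is then reached with one pointer addition.
 * Returns NULL if there is no such entry. */
static const unsigned char *find_chunk(const unsigned char *file_image,
                                       const struct toc_entry *toc,
                                       size_t toc_len,
                                       uint32_t type, uint32_t id,
                                       uint32_t *size_out)
{
    size_t i;
    for (i = 0; i < toc_len; i++) {
        if (toc[i].type == type && toc[i].id == id) {
            if (size_out != NULL)
                *size_out = toc[i].size;
            return file_image + toc[i].offset;
        }
    }
    return NULL;
}
```

Storing offsets rather than pointers keeps the directory valid wherever the file ends up in memory, whether it was mmap()ed or read into a buffer.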

> I agree with the "roughly" bit, but I'd suggest ensuring that you put
> in enough bits to get data[] 64 bit aligned. Mainly because at least 1
> architecture exists that has no 32 bit types (Crays I know about; others
> may exist. I can't remember if perl 5.8 passes 100% of tests on Crays.
> We certainly tried)

opcode_t will be 64 bit on these architectures.

[...]

> I'm thinking that register usage information from imcc could be of use
> to the JIT, as that would save it having to work out things again. So that
> probably needs a segment.
>
> Also some way of storing a cryptographic signature in the file, so that you
> could compile a parrot that automatically refuses to load code that isn't
> signed by you.

These ideas show clearly one thing:
The typecode must be extendible.

[...]


> > >It might be even possible to dump the jitted code. This would increase
> > >the startup. Then strip the bytecode to reduce the size of the file
> > >and TADA: Yet another new binary format.

> > When you then are able to get the same memory layout for a newly
> > created interpreter, it might even run ;-)
>
> So the JITted code contains lots of hard references to address in running
> interpreter? It's not just dependent on that particular binary's
> layout?

And if there are two interpreters in the same process (isn't that the
supposed way of multiple threads) each one has to compile the same
code again?

> I guess in future once the normal JIT works, and we've got the pigs flying
> nicely then it would be possible to write a Not Just In Time compiler that
> saves out assembly code and relocation instructions.
>
> Bah. That's "parrot -o foo.o foo.pmc" isn't it?

And if we make C a parrot supported language we can even build parrot
with parrot?

bye
boe
--
Juergen Boemmels boem...@physik.uni-kl.de
Fachbereich Physik Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F 23 F6 C7 2F 85 93 DD 47

Andy Dougherty

Jan 27, 2003, 3:22:37 PM
to Juergen Boemmels, Nicholas Clark, Perl6 Internals
On 27 Jan 2003, Juergen Boemmels wrote:

> Nicholas Clark <ni...@unfortu.net> writes:

> > > struct Chunk {
> > > opcode_t type;
> > > opcode_t version;
> > > opcode_t size;
> > > void data[];
> > > };

> > I agree with the "roughly" bit, but I'd suggest ensuring that you put
> > in enough bits to get data[] 64 bit aligned. Mainly because at least 1
> > architecture exists that has no 32 bit types (Crays I know about; others
> > may exist. I can't remember if perl 5.8 passes 100% of tests on Crays.
> > We certainly tried)
>
> opcode_t will be 64 bit on these architectures.

For native bytecode, yes. However, consider a bytecode file generated on
a machine with a 32-bit opcode_t that is now being read on a machine with
a 64-bit opcode_t. In that case, it would be helpful if the data were
aligned on a 64-bit boundary.

--
Andy Dougherty doug...@lafayette.edu

Leopold Toetsch

Jan 27, 2003, 3:53:23 PM
to Juergen Boemmels, Nicholas Clark, Dan Sugalski, perl6-i...@perl.org
Juergen Boemmels wrote:

> Nicholas Clark <ni...@unfortu.net> writes:


>>> struct Chunk {
>>> opcode_t type;
>>> opcode_t version;
>>> opcode_t size;
>>> void data[];
>>> };
> will this ever compile?


It's similar to "opcode_t *data". If size == 0, no data follow in byte
stream. byte_code_{un,}pack is implemented like this now.
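
For readers wondering how the struct could be spelled so that it compiles: void data[] is not valid C, but C99 allows a typed flexible array member as the last member of a struct. The sketch below assumes a C99 compiler and uses long as a stand-in for opcode_t; chunk_new is a hypothetical helper.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef long opcode_t;  /* stand-in; Parrot configures the real type */

struct chunk {
    opcode_t type;
    opcode_t version;
    opcode_t size;     /* number of opcode_t words in data[] */
    opcode_t data[];   /* C99 flexible array member; may be empty */
};

/* Allocate a chunk with room for n words and copy them in.
 * With n == 0 no data follows the header. Returns NULL on failure. */
static struct chunk *chunk_new(opcode_t type, opcode_t version,
                               const opcode_t *words, size_t n)
{
    struct chunk *c = malloc(sizeof *c + n * sizeof(opcode_t));
    if (c == NULL)
        return NULL;
    c->type = type;
    c->version = version;
    c->size = (opcode_t)n;
    if (n > 0)
        memcpy(c->data, words, n * sizeof(opcode_t));
    return c;
}
```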


>>I agree with the "roughly" bit, but I'd suggest ensuring that you put
>>in enough bits to get data[] 64 bit aligned.

> opcode_t will be 64 bit on this architectures.


PBC segments and the above data are aligned on a 4*sizeof(opcode_t) boundary.


> The typecode must be extendible.


Not if it follows the above conventions; only a unique name would be
necessary. But yes, in the long run.


> And if there are two interpreters in the same process (isn't that the
> supposed way of multiple threads) each one has to compile the same
> code again?


No: interpreter->code of both points to the same data and JIT code
already lives in the packfile now.


> And if we make C a parrot supported language we can even build parrot
> with parrot?


And if it runs, yes.

> bye
> boe


leo

James Michael Dupont

Jan 27, 2003, 10:39:03 PM
to Juergen Boemmels, Nicholas Clark, Brent Dax, Dan Sugalski, Leopold Toetsch, perl6-i...@perl.org, James Michael DuPont, Dave Beckett, introspectors, Dave Mitchell

--- Juergen Boemmels <boem...@physik.uni-kl.de> wrote:
> Nicholas Clark <ni...@unfortu.net> writes:

> > I guess in future once the normal JIT works, and we've got the pigs
> flying
> > nicely then it would be possible to write a Not Just In Time
> compiler that
> > saves out assembly code and relocation instructions.
> >
> > Bah. That's "parrot -o foo.o foo.pmc" isn't it?
>
> And if we make C a parrot supported language we can even build parrot
> with parrot?

I was just thinking that myself. There are two issues here :

1. The gcc : I have 99% of the information about the function bodies of
parrot C source code in rdf/xml. That could be fed to parrot.

2. The pnet/C : there has been work done by Rhys to make a managed C
compiler for pnet. Gopal has been working on a parrot bytecode emitter
for Pnet.

mike

=====
James Michael DuPont
http://introspector.sourceforge.net/


mar...@kurahaupo.gen.nz

Jan 27, 2003, 6:26:18 PM
to James Mastros
On Sun, 26 Jan 2003, James Mastros wrote:
> just define a new packfile section, SIGNATURE, that is defined to be a
> cryptographic signature of all sections previous to it in the file.

I'm battling with this in another file format at the moment; if possible can
we please *not* have it sensitive to its own location in the file?

For example, an auto-dearchive zip-file has its index at the end of the
file, so that the code can go at the front. It would be nice if the whole
archive could be signed, rather than just the dearchiving code.

My suggestion: make the signature define which other parts of the file it
applies to, say with a list of region boundaries as byte addresses in the
file; that way signature manipulation remains fairly simple, and it's not
too hard to check that a given section is spanned by a signature. And you
could have multiple signatures applying to different parts of the file (one
to the zip archive, another to the unarchiver).
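
The coverage check suggested here is simple to sketch. The struct and function names below are hypothetical, and the check is deliberately simplified: a section counts as covered only if a single signed region spans it.

```c
#include <stddef.h>

/* A byte range [start, start + length) of the file that a signature
 * declares itself to cover. */
struct signed_region {
    size_t start;
    size_t length;
};

/* Returns 1 if the section lies entirely inside one signed region.
 * Simplification: a section spanned only by two adjacent regions
 * taken together is reported as not covered. */
static int section_is_covered(const struct signed_region *regions,
                              size_t region_count,
                              size_t section_start, size_t section_length)
{
    size_t i;
    size_t section_end = section_start + section_length;
    for (i = 0; i < region_count; i++) {
        size_t region_end = regions[i].start + regions[i].length;
        if (section_start >= regions[i].start && section_end <= region_end)
            return 1;
    }
    return 0;
}
```

A loader could run this check per section after verifying the signature itself, and treat any section that is not covered as tainted.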

And how is this going to interact with "-T" or whatever we're going to use?
Under my suggested scheme, the data would be untainted if it's covered by a
verified signature, and tainted if not.

-Martin


Gopal V

Jan 28, 2003, 8:53:07 AM
to James Michael DuPont, Perl 6 Internals Mailing List
If memory serves me right, James Michael DuPont wrote:
> > > Bah. That's "parrot -o foo.o foo.pmc" isn't it?
> >
> > And if we make C a parrot supported language we can even build parrot
> > with parrot?

Hmmm... bootstrapping ....

> 1. The gcc : I have 99% of the information about the function bodies of
> parrot C source code in rdf/xml. That could be fed to parrot.

That would only be part of the issue ... generating stuff out of
RDF ASTs is only half of our trouble ... In fact, I think it would
be much easier if someone managed to convert RTL into Parrot (a gcc
backend) ...

It won't be a "hack" like egcs-jvm, since Parrot is a register
machine and already has pointer instructions ...

> 2. The pnet/C : there has been work done by Rhys to make a managed C
> compiler for pnet. Gopal has been working on a parrot bytecode emitter
> for Pnet.

Well ... I haven't been "working" on parrot bytecode emitter ...
It's just that we have a "pm_codegen.tc" treecc handler inside parrot
which is stubbed up ... until I can do some kind of class generation
for parrot, the C# AST cannot be used for anything useful.

Also Dan was not so hot about having a C compiler for Parrot when he
met Rhys on IRC...

So I'm sticking to compiling C# to Parrot as an aim, and *maybe* Java
as well ...

Gopal
--
The difference between insanity and genius is measured by success

James Michael Dupont

Feb 4, 2003, 4:06:18 AM
to James Mastros, perl6-i...@perl.org, Leopold Toetsch, Nicholas Clark, Brent Dax, Dan Sugalski, perl6-i...@perl.org, James Michael DuPont, Dave Beckett, introspectors, Juergen Boemmels, Dave Mitchell
Dear All,
I just wanted to ask about a conclusion on the bytecode metadata.

Here are the things I would like to know about a given bytecode :
what line (maybe column) it comes from
Possible comments about it.

If it is a method call, what is the method name, signature, location
If it is a variable or constant, what is the variable name, type, size
If it is an expression, what is the type of it, the size
For a given type, the name, size would be great to store.

Is it going to be possible to store this data in the meta-data,
it does not have to be all there at once, but will the framework handle
it?
Hopefully you have answered this already, and you can just say, rtfm.
Thanks for your patience, I am a bit slow today.

Leopold Toetsch

Feb 4, 2003, 5:36:47 AM
to James Michael DuPont, perl6-i...@perl.org
James Michael DuPont wrote:

> Dear All,
> I just wanted to ask about a conclusion on the bytecode metadata.
>
> Here are the things I would like to know about a given bytecode :
> what line (maybe column) it comes from


File/line information is already there (imcc -d -o...) and working.


> If it is a method call, what is the method name, signature, location
> If it is a variable or constant, what is the variable name, type, size
> If it is an expression, what is the type of it, the size
> For a given type, the name, size would be great to store.
>
> Is it going to be possible to store this data in the meta-data,
> it does not have to be all there at once, but will the framework handle
> it?


Yep. The framework can now handle all kinds of information in the PBC,
though the details have to be determined. For actually doing something
useful with this data we probably need a PBC PMC class, which can work
with it at PASM level, and if possible routines like those in
jit_debug.c which hand such information over to the debugger.


> Hopefully you have answered this already, and you can just say, rtfm.


A few minutes ago I checked in a major update of docs/parrotbyte.pod.


> mike


leo


gre...@focusresearch.com

Feb 4, 2003, 6:53:59 AM
to James Michael DuPont, Juergen Boemmels, Brent Dax, Dan Sugalski, Dave Mitchell, Dave Beckett, introspectors, James Mastros, Leopold Toetsch, James Michael DuPont, Nicholas Clark, perl6-i...@perl.org
Mike --

That's a lot of metadata. Sounds like maybe the metadata is primary
and the bytecode is secondary, in which case perhaps what you
really want is a (metadata) tree decorated with bytecode rather than
a (bytecode) array decorated with metadata.

Of course, the most natural candidate for the metadata would be the
annotated (file & line, etc.) parse tree, or some approximation to it
after compilation-related transformations.

I can imagine a process that loads the tree, and linearizes the
bytecode with the metadata consisting of backpointers to nodes of
the tree, either in band as escaped noop-equivalent bytecode or
out of band in an offset-pointer table.

With a suitable amount of forethought on the tree representation,
you should be able to have good flexibility while still having enough
standardization on how tree-emitting compilers represent typical
debug-related metadata (file, line, etc.) that debuggers and other
tools could be generic.


Regards,

-- Gregor


Juergen Boemmels

Feb 4, 2003, 8:15:29 AM
to gre...@focusresearch.com, Perl6 Internals
gre...@focusresearch.com writes:

> Mike --
>
> That's a lot of metadata. Sounds like maybe the metadata is primary
> and the bytecode is secondary, in which case perhaps what you
> really want is a (metadata) tree decorated with bytecode rather than
> a (bytecode) array decorated with metadata.

The bytecode is primary. This is what gets executed; this is what
needs to be fast (both in startup time and runtime). Some kind of
data is necessary for the bytecode, such as the string
constants. These also need to be accessed fast (I don't know if this is
called metadata; it is more like plain data). The metadata is only needed
in rare cases, e.g. debugging, so it doesn't need to be as fast (but even
here speed is nice).

> Of course, the most natural candidate for the metadata would be the
> annotated (file & line, etc.) parse tree, or some approximation to it
> after compilation-related transformations.
>
> I can imagine a process that loads the tree, and linearizes the
> bytecode with the metadata consisting of backpointers to nodes of
> the tree, either in band as escaped noop-equivalent bytecode or
> out of band in an offset-pointer table.

Bytecode reading must be fast. Ideally it is mmap and start.
Tree walking for bytecode generation should be done by the compiler.

> With a suitable amount of forethought on the tree representation,
> you should be able to have good flexibility while still having enough
> standardization on how tree-emitting compilers represent typical
> debug-related metadata (file, line, etc.) that debuggers and other
> tools could be generic.

The tree metadata can certainly be some kind of intermediate output of the
compiler (the output of the compiler front end), but normally this
should be fed into a backend which generates fast-running bytecode or
even native code.

bye
b.

gre...@focusresearch.com

Feb 4, 2003, 8:42:20 AM
to Juergen Boemmels, Perl6 Internals
b. --

I agree that under normal circumstances the bytecode is primary.
I was observing that as more and more metadata is considered,
eventually its quantity (measured, say, in bytes) could approach
or even exceed that of the raw bytecode. In cases where one
would feel such a quantity of metadata is needed, it may not
always be necessary to get greased-weasel speed-of-loading
(but, see below).

I understand the mmap-and-go idea, although it doesn't
always work out even when mmap is available (for example,
prederef requires a side pointer-array to store its prederef
results). Sometimes it's mmap-mumble-go (but, see below).


Certainly, there is nothing to prevent one from having
the linearized bytecode pregenerated in the PBC file even
when a metadata tree is also present (the tree could reference
contiguous chunks of that bytecode by offset-size pairs). If
you don't care about the tree, you don't process it. If you do
process it, you probably produce an index data structure mapping
byte code offsets to tree nodes for the debugger. I believe
we retain high speed with this approach.
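
A minimal sketch of the offset-size idea in the paragraph above, written in Python for brevity. The `Node` class, its fields, and the sample tree are assumptions made up for illustration; nothing here reflects an actual Parrot packfile layout. It flattens the leaf nodes into a sorted index and looks up the node that covers a given bytecode offset.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical metadata tree node covering one contiguous bytecode chunk."""
    label: str
    offset: int   # start of the chunk in the bytecode segment
    size: int     # length of the chunk
    children: list = field(default_factory=list)


def build_index(root):
    """Flatten the tree into (start, end, node) entries for leaves, sorted by start."""
    entries = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(node.children)
        else:
            entries.append((node.offset, node.offset + node.size, node))
    entries.sort(key=lambda e: e[0])
    return entries


def node_at(index, pc):
    """Return the leaf node whose chunk contains bytecode offset pc, or None."""
    starts = [e[0] for e in index]
    i = bisect_right(starts, pc) - 1
    if i >= 0 and index[i][0] <= pc < index[i][1]:
        return index[i][2]
    return None


tree = Node("sub main", 0, 12, [
    Node("line 1: set", 0, 3),
    Node("line 2: add", 3, 4),
    Node("line 3: print", 7, 5),
])
index = build_index(tree)
print(node_at(index, 5).label)  # line 2: add
```

A consumer that does not care about the tree never calls `build_index`, so the bytecode path pays nothing for the metadata being present.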


We do need to consider how the metadata makes it from the
compiler *through* IMCC to land in the PBC file. The compiler
needs to be able to produce annotated input to IMCC, and IMCC
needs to be able to retain the annotations while it makes its
code improvements and rendering (register mapping, etc.).
I'm thinking that, too, could possibly be a tree. IMCC can pick out
the chunks of IMC, generate bytecode, and further annotate the
tree with the offset and size of the generated PBC chunk. The
tree can be retained as the metadata segment in the PBC file.


Regards,

-- Gregor


James Michael Dupont

Feb 4, 2003, 9:22:46 AM
to gre...@focusresearch.com, Juergen Boemmels, Perl6 Internals
Juergen,

I completely agree with you. For my needs, the meta-data does not have
to be loaded at the same time at all. It can be in a different file for
all I care. I just want to know how and where we can put it. The Microsoft
IL has a whole section on meta-data, and one wonders what Parrot might be
doing to address the same issues. Excuse my ignorance; I am sure you have
addressed this, and no, I have not read the new pods yet.

> Bytecode reading must be fast. Ideally it is mmap and start.
> Treewalking for bytecodegeneration should be done by the compiler.

Yes, I agree. I just want to be able to reconstruct the tree for
debugging or reverse engineering (if the compiler that produced the
bytecode wants to produce this).

I would like to prototype some meta-data storage of my gcc

> The tree metadata can sure be some kind of intermediate output of the
> compiler (the output of the compiler front end), but normaly this
> should be fed into a backend which generates fast running bytecode or
> even native code.

That sounds great.

Normally you don't need this information; I just want to know how I can
store it if I *do* need it.

The metadata from the C++ that I am extracting even exceeds the size of
the source code itself.

--- gre...@focusresearch.com wrote:
> b. --
>
> I agree that under normal circumstances the bytecode is primary.
> I was observing that as more and more metadata is considered,
> eventually its quantity (measured, say, in bytes) could approach
> or even exceed that of the raw bytecode. In cases where one
> would feel such a quantity of metadata is needed, it may not
> always be necessary to get greased-weasel speed-of-loading
> (but, see below).
>
> I understand the mmap-and-go idea, although it doesn't
> always work out even when mmap is available (for example,
> prederef requires a side pointer-array to store its prederef
> results). Sometimes it's mmap-mumble-go (but, see below).
>
>
> Certainly, there is nothing to prevent one from having
> the linearized bytecode pregenerated in the PBC file even
> when a metadata tree is also present (the tree could reference
> contiguous chunks of that bytecode by offset-size pairs). If
> you don't care about the tree, you don't process it. If you do
> process it, you probably produce an index data structure mapping
> byte code offsets to tree nodes for the debugger. I believe
> we retain high speed with this approach.

Yeah, that is the idea. Reflection and the introspector require the
meta-data, which can be read by special reflection operations.


>
>
> We do need to consider how the metadata makes it from the
> compiler *through* IMCC to land in the PBC file. The compiler
> needs to be able to produce annotated input to IMCC, and IMCC
> needs to be able to retain the annotations while it makes its
> code improvements and rendering (register mapping, etc.).
> I'm thinking that, too, could possibly be a tree. IMCC can pick out
> the chunks of IMC, generate bytecode, and further annotate the
> tree with the offset and size of the generated PBC chunk. The
> tree can be retained as the metadata segment in the PBC file.

Sounds good to me. For me, it could also be a graph in triple format
(subject, predicate, object), and not a tree. This is what I wanted to
know: what is defined, and what needs to be defined.
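
To make the triple idea concrete, here is a small sketch in Python. The `TripleStore` class and every subject and predicate name in it (`pbc:offset/3`, `source:line`, `calls`, and so on) are invented for illustration; they are not an existing vocabulary and not Redland's API.

```python
from collections import defaultdict


class TripleStore:
    """Tiny in-memory store of (subject, predicate, object) statements."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].append((predicate, obj))

    def objects(self, subject, predicate):
        """All objects recorded for a subject/predicate pair, in insertion order."""
        return [o for p, o in self._by_subject.get(subject, []) if p == predicate]


store = TripleStore()
# Hypothetical statements about the bytecode at offset 3.
store.add("pbc:offset/3", "source:file", "life.fubar")
store.add("pbc:offset/3", "source:line", 42)
store.add("pbc:offset/3", "calls", "method:foo")
# The object of one statement can be the subject of another, which is
# what makes this a graph rather than a tree.
store.add("method:foo", "signature", "(int) -> int")

print(store.objects("pbc:offset/3", "source:line"))  # [42]
```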

Regards,

Gopal V

Feb 4, 2003, 11:56:53 AM
to James Michael DuPont, Perl6 Internals
If memory serves me right, James Michael DuPont wrote:
> I just want to know how where we can put it. The Microsoft IL
> has a whole section on meta-data,

AFAIK, that just holds the offset, line number and filename. IIRC the
JVM had a LineNumberTable and VarNameTable for debugging which were
declared as ``attributes'' to each method in the .class tree.

I suppose VarNameTable is totally irrelevant for Parrot ...

> yes I agree, I just want to be able to reconstruct the tree for
> debugging or reverse engineering (if the compiler that produced the
> bytecode wants to produce this).

Optimisations ? ... (bang, there goes the line numbers ;)

> Normally you don't need this information; I just want to know how I can
> store it if I *do* need it.
>
> The metadata from the C++ that I am extracting even exceeds the size of
> the source code itself.
>

> yeah, that is the idea. Reflection and introspector require the
> meta-data, that can be read by special reflection operations.

I think Parrot is going to *need* reflection operations :) ...
You might be able to extract information like you do with C# ,
with reflection looping over the methods.

Btw, your RDF stuff wouldn't be what I call "metadata" :) .. it's
data itself in a pre-processed format.

> > IMCC can pick out
> > the chunks of IMC, generate bytecode,

.line 42 "life.fubar"

?

Gopal
PS: don't look at me like that , I don't know anything about debugging
eval()...

James michael dupont

Feb 4, 2003, 5:30:30 PM
to gre...@focusresearch.com, Juergen Boemmels, Brent Dax, Dan Sugalski, Dave Mitchell, Dave Beckett, introspectors, James Mastros, Leopold Toetsch, James Michael DuPont, Nicholas Clark, perl6-i...@perl.org

--- gre...@focusresearch.com wrote:
> Mike --
>
> That's a lot of metadata. Sounds like maybe the metadata is primary
> and the bytecode is secondary, in which case perhaps what you
> really want is a (metadata) tree decorated with bytecode rather than
> a (bytecode) array decorated with metadata.

Fair enough. Good point!


> Of course, the most natural candidate for the metadata would be the
> annotated (file & line, etc.) parse tree, or some approximation to it
> after compilation-related transformations.

OK, that sounds fine. My current problem with the graphs of meta-data
is the speed of loading. I would like to use something like what you
are talking about with the mmap. Also, .NET IL has tons of
meta-data, a very great deal of it.

>
> I can imagine a process that loads the tree, and linearizes the
> bytecode with the metadata consisting of backpointers to nodes of
> the tree, either in band as escaped noop-equivalent bytecode or
> out of band in an offset-pointer table.

Sure, a zipper (Reißverschluss ;) concept.

>
> With a suitable amount of forethought on the tree representation,
> you should be able to have good flexibility while still having enough
> standardization on how tree-emitting compilers represent typical
> debug-related metadata (file, line, etc.) that debuggers and other
> tools could be generic.

OK. Well, the current RDF format that I have is OK, so that brings me
back to the idea of using RDF.... Redland supports a BDB store, which also
supports fast loading, but is not platform independent.

James michael dupont

Feb 4, 2003, 5:40:39 PM
to Gopal V, Perl6 Internals
Hey Gopal,
Nice to meet you here ;)

--- Gopal V <gopa...@symonds.net> wrote:
> If memory serves me right, James Michael DuPont wrote:
> > I just want to know how where we can put it. The Microsoft IL
> > has a whole section on meta-data,
>
> AFAIK, that just holds the offset, line number and filename. IIRC the
> JVM had a LineNumberTable and VarNameTable for debugging which were
> declared as ``attributes'' to each method in the .class tree.
>
> I suppose VarNameTable is totally irrelevant for Parrot ...

I don't know that one; what is it? A variable name table? If so, I think it
might be good for debugging.

>
> > yes I agree, I just want to be able to reconstruct the tree for
> > debugging or reverse engineering (if the compiler that produced the
> > bytecode wants to produce this).
>
> Optimisations ? ... (bang, there goes the line numbers ;)

If you want to debug, you don't want optimizations. When you run the
debugger with gcc, it produces a DWARF file; that is the type of
meta-data I am talking about.

>
> > Normally you don't need this information; I just want to know how I can
> > store it if I *do* need it.
> >
> > The metadata from the C++ that I am extracting even exceeds the size of
> > the source code itself.
> >
> > yeah, that is the idea. Reflection and introspector require the
> > meta-data, that can be read by special reflection operations.
>
> I think Parrot is going to *need* reflection operations :) ...
> You might be able to extract information like you do with C# ,
> with reflection looping over the methods.

You might want to run C# in parrot, then you need it.

>
> Btw, your RDF stuff wouldn't be what I call "metadata" :) .. it's
> data itself in a pre-processed format.

Well, read my first answer to the original meta-data post, with all the
links.
I think it is meta-data: all information about the bytecode that might
be interesting to a person, to understand what a given bytecode is and
means, its context, meaning, usage, and all that. It is all meta-data.
The bytecode is what is needed to run.

>
> > > IMCC can pick out
> > > the chunks of IMC, generate bytecode,
>
> .line 42 "life.fubar"

Again, nice to meet you again Mr. Victory,

Mike

------------------------------------------------------------------------

The first ten million years were the worst, said Marvin, and the second
ten million years were the worst too. The third ten million I didn't
enjoy at all. After that I went into a bit of a decline......
The best conversation I had was over forty million years ago, continued
Marvin.....And that was with a coffee machine.
- Marvin complaining about being left alone for years.

Leopold Toetsch

Feb 5, 2003, 2:07:37 AM
to James Michael DuPont, introspectors, perl6-i...@perl.org
James Michael DuPont wrote:

> --- gre...@focusresearch.com wrote:
>
>>Mike --
>>
>>That's a lot of metadata.
>>
>

> OK that sounds fine. My current problems with the graphs of meta-data
> are the speed of loading.


When you arrange the meta-data as a single opcode stream, you have ~zero
load time for the mmap()ed case.
This means you delay parsing of this stream until the time you are
actually using it.
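
A rough sketch of that lazy approach, in Python for brevity. The file layout (an 8-byte header with the code and metadata sizes, then fixed-size offset/line records) is entirely made up for the demo and is not the PBC format; `LazyPackfile` is a hypothetical name. The point is only that mapping the file is cheap and the metadata is decoded on first access.

```python
import mmap
import os
import struct
import tempfile

HEADER = struct.Struct("<II")   # hypothetical header: code size, metadata size (bytes)
RECORD = struct.Struct("<II")   # hypothetical record: bytecode offset, source line


class LazyPackfile:
    """Maps a file once; decodes the metadata segment only on first use."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self.code_size, self.meta_size = HEADER.unpack_from(self._map, 0)
        self._lines = None  # metadata stays unparsed until someone asks for it

    @property
    def code(self):
        start = HEADER.size
        return self._map[start:start + self.code_size]  # slicing copies; fine for a demo

    @property
    def lines(self):
        if self._lines is None:
            start = HEADER.size + self.code_size
            raw = self._map[start:start + self.meta_size]
            self._lines = dict(RECORD.iter_unpack(raw))
        return self._lines

    def close(self):
        self._map.close()
        self._file.close()


# Build a throwaway file in the made-up layout, then read it back.
code = bytes([1, 2, 3, 4, 5, 6])
meta = RECORD.pack(0, 10) + RECORD.pack(3, 11)
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(HEADER.pack(len(code), len(meta)) + code + meta)
    path = f.name

pbc = LazyPackfile(path)
loaded_code = pbc.code
parsed_at_load = pbc._lines is not None   # False: nothing decoded yet
line_table = pbc.lines                    # first access triggers the parse
pbc.close()
os.remove(path)
print(parsed_at_load, line_table)  # False {0: 10, 3: 11}
```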


> mike

leo

James Michael Dupont

Feb 5, 2003, 3:55:19 AM
to Leopold Toetsch, introspectors, perl6-i...@perl.org

Great. I will review the code and see how this can be done. That would
be great!!!

mike

Gopal V

Feb 5, 2003, 9:17:55 AM
to James Michael DuPont, Perl6 Internals
If memory serves me right, James Michael DuPont wrote:
> > JVM had a LineNumberTable and VarNameTable for debugging which were
> > declared as ``attributes'' to each method in the .class tree.
> >
> > I suppose VarNameTable is totally irrelevant for Parrot ...
>
> > I don't know that one; what is it? A variable name table? If so, I think it
> > might be good for debugging.

Mapping between local vars and variable names ... because the JVM has unlimited
(well, virtually unlimited) local vars, this works for the JVM. But since
Parrot has only 32 registers, they get re-used for local vars.

I think using IMCC will in general mess around with the register numbers for
the temporaries. So it doesn't make sense for Parrot to have a VarNameTable.

> > I think Parrot is going to *need* reflection operations :) ...
> > You might be able to extract information like you do with C# ,
> > with reflection looping over the methods.
>
> You might want to run C# in parrot, then you need it.

Not really for C# support only ... Dynamic invocation will need it...

<?php
function foobar() { echo "foobar called\n"; } // defined here so the example runs
$a="foobar";
$a(); // get method name "foobar" from global scope, run
?>

Something of this sort will need to occur .. C# is much easier as you
already know all the types/methods there might be and there's no dynamic
member lookup :).
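
The same lookup-by-name pattern as a small Python sketch. The `SUBS` table and `call_by_name` are hypothetical stand-ins for whatever name-to-entry-point metadata the engine would keep at runtime; this is not how Parrot resolves subs.

```python
def foobar():
    return "foobar called"


# Hypothetical runtime metadata: sub name -> callable entry point.
SUBS = {"foobar": foobar}


def call_by_name(name):
    """Resolve a sub by its string name at call time and invoke it."""
    try:
        sub = SUBS[name]
    except KeyError:
        raise LookupError(f"no sub named {name!r}") from None
    return sub()


a = "foobar"
print(call_by_name(a))  # foobar called
```

The name table is the part the engine itself cannot do without, which is why it falls under basic reflection data rather than optional debug information.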

> I think it is meta-data, all information about the bytecode that might
> be interesting to a person, to understand what a given bytecode is and
> means, its context, meaning, usage, and all that. It is all meta-data.

You could argue about that, but IMHO except for the basic Reflection stuff
and debug information, all the other itty-bitty details are useless for
the engine. Better to keep it in a separate file? (And build a cool
bytecode analysis tool as well :)

> > .line 42 "life.fubar"
>
> Again, nice to meet you again Mr. Victory,

That's quoted off my fubar-to-IL compiler ... (that's what we call
the f'uped beyond recognition pascal clone we ``implement'' in our lab).

Cheerio,
Gopal
