Parrot 0.0.9

Steve Fink

Oct 23, 2002, 4:27:22 PM

to perl6-i...@perl.org

I suppose I ought to try to wrap up a release one of these days. I've
been thinking about the possibilities, but I'm not sure about the
current state of a couple of things. And what I'd most like to see
right now is some stabilization. So I'll list my current thinking:

Prerequisites for 0.0.9 release
-------------------------------
* Reclaim the tinderbox!
  - requires the multiarray.pmc memory allocation problem to be fixed
    (Josef is looking into this)
  - requires sprintf* to work on PPC. (Brent -- what's the status?)
* Warnings reduction
  - I doubt we can make it to zero warnings for all platforms, but we
    really need to at least get things like gcc3 on the sparc down to
    a reasonable state
* JIT trace/restart() failure
  - Leo's analysis of the situation is correct.
  - If nobody beats me to it, I'll probably work on this the next time
    I get a chance to do actual coding.
* Patch backlog
  - Artificial goal: I want the list of pending patches to be smaller
    than one screenful before I release. Fortunately, I have a large
    screen.

Possible (feature/architectural) goals for 0.0.9
------------------------------------------------
* PMC cleanup
  - Leo did a huge amount of work on this, but there are a few things left:
    - array.pmc still autocreates something called "PerlUndef"
    - the various unions should probably be coalesced into one
    - I think the variable/value distinction Dan was talking about may
      require some changes, but I haven't been paying attention to that
      discussion (as in, I read it, but not closely enough to understand
      what anybody's talking about yet)

* PMC method invocations (written in C)
  - This is just something I think we really really need in order to
    build on top of, but I don't know the current state here either.
  - I say "(written in C)" because eventually we want to be able to
    write PMC methods in Parrot (and have them JITted etc.), but
    that's further off

* Keyed access
  - Another discussion that's gone over my head. Leo has a scheme to
    dramatically reduce the number of instructions, at the cost of
    requiring a couple of opcodes for keyed accesses; Dan says that
    lots of instructions are no big deal and pushing forward with the
    status quo is better.
  - Either way, the current keyed support isn't complete.

* Bytecode format
  - We need lots of bits of metadata:
    - Source filenames/line numbers
    - Debugging information
    - Stratospheric rehydrocalibration amplifiers for the .NET people
      (er... or something; I can't remember what they needed)
    - A couple of other things I had written down in a notebook I left
      at work
  - Juergen Boemmels, the guy who gets his name constantly mangled by
    us 7-bit ASCII throwbacks, has a good start on the underlying
    support for this. Leo liked it, so it must be good.
  - I still don't understand why we can't write our own
    serializers/deserializers for ELF or some other standard format,
    and will periodically keep whining about it until someone explains
    it to me in a way I can understand. (I'm still quite happy to have
    the current packfile format extended in the meantime, though it
    would be very nice if the assembler could follow IMCC's lead and
    use the packfile.c API.)

* Exceptions
  - I haven't been paying much attention to developments on this,
    although I know Brent went through and cleaned up a bunch of stuff
    so that at least exceptions will be thrown when they should be.
  - Dan was also working on some of that design stuff he does. Is this
    close enough to reality to squeeze into this release?

That's all I can think of for now. Please, if you know more about the
state of any of these things, can you reply with an update? I plan to
make a release as soon as we can finish any one of these five feature
goal ideas. (Unless something else is close enough to completion, in
which case I'll hold off for that.)

Leopold Toetsch

Oct 23, 2002, 5:41:37 PM

to Steve Fink, perl6-i...@perl.org

Steve Fink wrote:

> I suppose I ought to try to wrap up a release one of these days.

> - Artificial goal: I want the list of pending patches to be smaller
> than one screenful before I release. Fortunately, I have a large
> screen.

I did set 2 of them to "Applied". I'll wade through my contributions and
set status accordingly.

> Possible (feature/architectural) goals for 0.0.9
> ------------------------------------------------
> * PMC cleanup
> - Leo did a huge amount of work on this, but there are a few things left:
> - array.pmc still autocreates something called "PerlUndef"

This is the way it worked earlier too. The problem is (and was) that
extending an array just makes room for new PMCs. Allocating all PMCs
during extension is not an option (IMHO): it would defeat the point of
sparse arrays and would be expensive if these PMCs remain unused. Now,
setting the value of an array cell is done by a vtable method like
set_integer_native, which requires a PMC to be there. That PMC happens
to be a PerlUndef, which immediately changes its vtable to the desired type.
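
A minimal sketch of the behaviour described above, for readers who have not seen the code. It is not the actual array.pmc: the types `PMC`, `VTable` and `Array` and the helpers `array_resize` and `array_set_integer` are hypothetical, simplified stand-ins. It only illustrates the idea that extending leaves slots empty, and that the first store creates an undef placeholder which then swaps its own vtable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not Parrot's real structures. */
typedef struct PMC PMC;

typedef struct VTable {
    const char *name;
    void (*set_integer_native)(PMC *self, long value);
} VTable;

struct PMC {
    const VTable *vtable;
    long int_val;
};

static void int_set_integer_native(PMC *self, long value);
static void undef_set_integer_native(PMC *self, long value);

static const VTable int_vtable   = { "Int",   int_set_integer_native };
static const VTable undef_vtable = { "Undef", undef_set_integer_native };

static void int_set_integer_native(PMC *self, long value)
{
    self->int_val = value;
}

/* Storing an integer into an undef first morphs it into an Int. */
static void undef_set_integer_native(PMC *self, long value)
{
    self->vtable = &int_vtable;
    self->vtable->set_integer_native(self, value);
}

typedef struct Array {
    PMC **slots;
    size_t size;
} Array;

/* Extending only makes room: new slots stay NULL. Returns 0 on success. */
static int array_resize(Array *a, size_t new_size)
{
    PMC **p;
    size_t i;

    if (new_size <= a->size)
        return 0;
    p = realloc(a->slots, new_size * sizeof *p);
    if (p == NULL)
        return -1;
    for (i = a->size; i < new_size; i++)
        p[i] = NULL;
    a->slots = p;
    a->size = new_size;
    return 0;
}

/* The vtable call needs a PMC in the slot, so create an undef on demand. */
static int array_set_integer(Array *a, size_t idx, long value)
{
    assert(a != NULL);
    if (idx >= a->size && array_resize(a, idx + 1) != 0)
        return -1;
    if (a->slots[idx] == NULL) {
        PMC *pmc = malloc(sizeof *pmc);
        if (pmc == NULL)
            return -1;
        pmc->vtable = &undef_vtable;
        pmc->int_val = 0;
        a->slots[idx] = pmc;
    }
    a->slots[idx]->vtable->set_integer_native(a->slots[idx], value);
    return 0;
}
```

Storing into index 5 of an empty array in this sketch allocates exactly one PMC; slots 0 through 4 stay NULL, which is the cost argument made above.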

> - the various unions should probably be coalesced into one

I'll send a patch tomorrow or so, regarding hash/array data types,
unifying the enum_type values.

> - I think the variable/value distinction Dan was talking about may
> require some changes, but I haven't been paying attention to that
> discussion (as in, I read it, but not closely enough to understand
> what anybody's talking about yet)

I was talking about it, but for a patch, my understanding seems to
resemble yours ;-)

> * Keyed access

> - Either way, the current keyed support isn't complete.

Adding a couple of lines to assemble.pl that do the same as my imcc
patch would make these multi_keyed operations available to HL. Then
we could look at usage patterns and finally decide what to do.
(Who could extend the assembler?)

> * Bytecode format

> - I still don't understand why we can't write our own
> serializers/deserializers for ELF or some other standard format,

I don't see the point of using ELF. The new proposed packfile format
should give us all we need.

leo

Tom Hughes

Oct 23, 2002, 5:51:07 PM

to perl6-i...@perl.org

In message <2002102316...@foxglove.digital-integrity.com>
Steve Fink <st...@fink.com> wrote:

> * Keyed access
> - Another discussion that's gone over my head. Leo has a scheme to
> dramatically reduce the number of instructions, at the cost of
> requiring a couple of opcodes for keyed accesses; Dan says that
> lots of instructions are no big deal and pushing forward with the
> status quo is better.
> - Either way, the current keyed support isn't complete.

I've got a more or less complete patch for dynamic key construction
lying around here somewhere... I'll try to dig it out and send it
in a while...

Tom

--
Tom Hughes (t...@compton.nu)
http://www.compton.nu/

Andy Dougherty

Oct 23, 2002, 6:46:09 PM

to Perl6 Internals

On Wed, 23 Oct 2002, Steve Fink wrote:

> I suppose I ought to try to wrap up a release one of these days. I've
> been thinking about the possibilities, but I'm not sure about the
> current state of a couple of things. And what I'd most like to see
> right now is some stabilization. So I'll list my current thinking:
>
> Prerequisites for 0.0.9 release
> -------------------------------
> * Reclaim the tinderbox!
> - requires the multiarray.pmc memory allocation problem to be fixed
> (Josef is looking into this)
> - requires sprintf* to work on PPC. (Brent -- what's the status?)
> * Warnings reduction
> - I doubt we can make it to zero warnings for all platforms, but we
> really need to at least get things like gcc3 on the sparc down to
> a reasonable state

For what it's worth, with the following config on Solaris 8, gcc,
INTVAL='long long', I get the warnings appended below. (This
configuration is what I get with perl Configure.pl --defaults, since my
default perl on this machine was built with -Duse64bitint.)

There are also lots of test failures. Those are appended below too.

Summary of my parrot 0.0.8 configuration:
  configdate='Wed Oct 23 12:49:51 2002'
  Platform:
    osname=solaris, archname=sun4-solaris-64int
    jitcapable=0, jitarchname=nojit,
    jitosname=nojit, jitcpuarch=i386
    perl=perl64
  Compiler:
    cc='gcc', ccflags='-I/usr/local/include -I/opt/gnu/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -g',
  Linker and Libraries:
    ld='gcc', ldflags=' -L/usr/local/lib -L/opt/gnu/lib ',
    cc_ldflags='',
    libs='-lsocket -lnsl -ldl -lm'
  Dynamic Linking:
    so='.so', ld_shared='-G -L/usr/local/lib -L/opt/gnu/lib',
    ld_shared_flags=''
  Types:
    iv=long long, intvalsize=8, intsize=4, opcode_t=long long, opcode_t_size=8,
    ptrsize=4, ptr_alignment=4 byteorder=87654321,
    nv=double, numvalsize=8, doublesize=8

continuation.pmc: In function `Parrot_Continuation_set_integer_native':
continuation.pmc:32: warning: cast to pointer from integer of different size
coroutine.pmc: In function `Parrot_Coroutine_init':
coroutine.pmc:31: warning: cast to pointer from integer of different size
coroutine.pmc: In function `Parrot_Coroutine_set_integer':
coroutine.pmc:50: warning: cast to pointer from integer of different size
coroutine.pmc: In function `Parrot_Coroutine_set_integer_native':
coroutine.pmc:54: warning: cast to pointer from integer of different size
intqueue.pmc: In function `Parrot_IntQueue_init':
intqueue.pmc:92: warning: cast increases required alignment of target type
pointer.pmc: In function `Parrot_Pointer_get_integer':
pointer.pmc:53: warning: cast from pointer to integer of different size
pointer.pmc: In function `Parrot_Pointer_get_number':
pointer.pmc:57: warning: cast from pointer to integer of different size
sub.pmc: In function `Parrot_Sub_init':
sub.pmc:31: warning: cast to pointer from integer of different size
sub.pmc: In function `Parrot_Sub_set_integer':
sub.pmc:40: warning: cast to pointer from integer of different size
sub.pmc: In function `Parrot_Sub_set_integer_native':
sub.pmc:44: warning: cast to pointer from integer of different size
debug.ops: In function `Parrot_debug_break':
debug.ops:96: warning: cast increases required alignment of target type
debug.ops:106: warning: cast increases required alignment of target type
string.c: In function `string_append':
string.c:148: warning: cast to pointer from integer of different size
string.c: In function `string_make':
string.c:209: warning: cast discards `const' from pointer target type
string.c: In function `string_concat':
string.c:455: warning: cast to pointer from integer of different size
string.c: In function `string_repeat':
string.c:500: warning: cast to pointer from integer of different size
runops_cores.c: In function `runops_slow_core':
runops_cores.c:83: warning: implicit declaration of function `Parrot_init'
resources.c: In function `compact_pool':
resources.c:287: warning: cast increases required alignment of target type
embed.c: In function `Parrot_readbc':
embed.c:218: warning: cast increases required alignment of target type
headers.c: In function `add_extra_buffer_header':
headers.c:422: warning: cast increases required alignment of target type
dod.c: In function `free_unused_buffers':
dod.c:430: warning: cast increases required alignment of target type
spf_vtable.c: In function `getptr_pmc':
spf_vtable.c:301: warning: cast to pointer from integer of different size

t/src/basic.........ok
t/src/intlist.......# Failed test (t/src/intlist.t at line 9)
# got: 'The answer is 0.
# '
# expected: 'The answer is 42.
# '
# Failed test (t/src/intlist.t at line 35)
# got: 'Failed: build-up first pop
# '
# expected: 'I need a shower.
# '
# Failed test (t/src/intlist.t at line 112)
# got: 'Step 1: 0
# Failed: build-up first pop
# '
# expected: 'Step 1: 0
# Step 2: 1
# Step 3: 2
# Step 4: 255
# Step 5: 256
# Step 6: 257
# Done.
# '
# Failed test (t/src/intlist.t at line 241)
# got: 'Out get failed on i=1
# Out shift failed on i=1
# Out get failed on i=2
# Out shift failed on i=2
[ ... lots more ... ]
# Out get failed on i=987
# Out shift failed on i=987
# Out get failed on '
# expected: 'Done.
# '
# Looks like you failed 4 tests of 4.
dubious
Test returned status 4 (wstat 1024, 0x400)
DIED. FAILED tests 1-4
Failed 4/4 tests, 0.00% okay
t/src/list..........ok
t/src/manifest......# Looks like you planned 4 tests but only ran 3.
dubious
Test returned status 1 (wstat 256, 0x100)
DIED. FAILED test 4
Failed 1/4 tests, 75.00% okay (less 1 skipped test: 2 okay, 50.00%)
t/src/sprintf.......# Failed test (t/src/sprintf.t at line 9)
# got: 'C
# Hello, %Parrot!%
# PerlHash[0x100]
# PerlHash[0x100]
# Hello, Pa!
# Hello, Hello, Pa!
# 1 == 1
# -255 == -255
# 256 == 256
# 0.500000 == 0.500000
# 0.500 == 0.500
# 0.001 == 0.001
# 1e+06 == 1e+06
# 0.5 == 0.5
# 0x20 == 0x0
# That's all, folks!
# '
# expected: 'C
# Hello, %Parrot!%
# PerlHash[0x100]
# PerlHash[0x100]
# Hello, Pa!
# Hello, Hello, Pa!
# 1 == 1
# -255 == -255
# 256 == 256
# 0.500000 == 0.500000
# 0.500 == 0.500
# 0.001 == 0.001
# 1e+06 == 1e+06
# 0.5 == 0.5
# 0x20 == 0x20
# That's all, folks!
# '
# Looks like you failed 1 tests of 1.
dubious
Test returned status 1 (wstat 256, 0x100)
DIED. FAILED test 1
Failed 1/1 tests, 0.00% okay
t/op/basic..........ok
t/op/bitwise........ok
t/op/debuginfo......ok
t/op/gc.............ok
t/op/globals........ok
t/op/hacks..........ok
t/op/ifunless.......ok
t/op/info...........ok
t/op/integer........ok
t/op/interp.........ok
t/op/lexicals.......ok
t/op/macro..........ok
1/15 skipped: various reasons
t/op/number.........ok
t/op/rx.............ok
1/23 skipped: various reasons
t/op/stacks.........ok
1/35 skipped: various reasons
t/op/string.........# Failed test (t/op/string.t at line 1326)
# got: 'resources.c:332: failed assertion `new_block->size >= (size_t)new_block->top - (size_t)new_block->start'
# '
# expected: '[foo bar quux ]
# '
# Looks like you failed 1 tests of 97.
dubious
Test returned status 1 (wstat 256, 0x100)
DIED. FAILED test 93
Failed 1/97 tests, 98.97% okay
t/op/time...........ok
t/op/trans..........ok
t/pmc/array.........ok
t/pmc/boolean.......ok
t/pmc/intlist.......# Failed test (t/pmc/intlist.t at line 19)
# got: 'FAILED: first pop
# Found: 8589934592
# Wanted: 2
# '
# expected: 'I need a shower.
# '
# Failed test (t/pmc/intlist.t at line 152)
# got: 'err: wanted 0 got 1
# '
# expected: 'ok
# '
# Failed test (t/pmc/intlist.t at line 192)
# got: 'err: wanted 99999 got 429492434732702
# '
# expected: 'ok 1
# ok 2
# '
# Failed test (t/pmc/intlist.t at line 236)
# got: ''
# expected: 'ok
# '
# Failed test (t/pmc/intlist.t at line 290)
# got: ''
# expected: 'ok 1
# ok 2
# '
# Failed test (t/pmc/intlist.t at line 359)
# got: 'nok val 100000 1100 4724464025600 100100 100101'
# expected: 'ok
# '
# Failed test (t/pmc/intlist.t at line 424)
# got: 'nok 1 nok 2 nok 4 nok 5 ok
# '
# expected: 'ok
# '
# Looks like you failed 7 tests of 8.
dubious
Test returned status 7 (wstat 1792, 0x700)
DIED. FAILED tests 2-8
Failed 7/8 tests, 12.50% okay
t/pmc/perlarray.....ok
t/pmc/perlhash......ok
t/pmc/perlint.......ok
1/4 skipped: various reasons
t/pmc/perlstring....ok
1/8 skipped: various reasons
t/pmc/pmc...........# Failed test (t/pmc/pmc.t at line 68)
# got: 'Illegal PMC enum (0) in new
# '
# expected: 'Illegal PMC enum (18) in new
# '
# Failed test (t/pmc/pmc.t at line 82)
# got: 'Illegal PMC enum (0) in new
# '
# expected: 'Illegal PMC enum (18) in new
# '
# Failed test (t/pmc/pmc.t at line 1624)
# got: ''
# expected: 'All names and ids ok.
# '
# Looks like you failed 3 tests of 83.
dubious
Test returned status 3 (wstat 768, 0x300)
DIED. FAILED tests 3, 5, 76
Failed 3/83 tests, 96.39% okay (less 1 skipped test: 79 okay, 95.18%)
t/pmc/sub...........ok
Failed 6/32 test scripts, 81.25% okay. 17/489 subtests failed, 96.52% okay.
Failed Test       Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/op/string.t        1   256    97    1   1.03%  93
t/pmc/intlist.t      7  1792     8    7  87.50%  2-8
t/pmc/pmc.t          3   768    83    3   3.61%  3 5 76
t/src/intlist.t      4  1024     4    4 100.00%  1-4
t/src/manifest.t     1   256     4    1  25.00%  4
t/src/sprintf.t      1   256     1    1 100.00%  1
7 subtests skipped.

--
Andy Dougherty doug...@lafayette.edu
Dept. of Physics
Lafayette College, Easton PA 18042

Dan Sugalski

Oct 23, 2002, 7:36:48 PM

to Leopold Toetsch, Steve Fink, perl6-i...@perl.org

At 7:41 PM +0200 10/23/02, Leopold Toetsch wrote:
>>Possible (feature/architectural) goals for 0.0.9
>>------------------------------------------------
>>* PMC cleanup
>> - Leo did a huge amount of work on this, but there are a few things left:
>> - array.pmc still autocreates something called "PerlUndef"
>
>
>This is the way it worked earlier too. The problem is (and was) that
>extending an array just makes room for new PMCs. Allocating all PMCs
>during extension is not an option (IMHO): it would defeat the point
>of sparse arrays and would be expensive if these PMCs remain
>unused. Now, setting the value of an array cell is done by a vtable
>method like set_integer_native, which requires a PMC to be there.
>That PMC happens to be a PerlUndef, which immediately changes its
>vtable to the desired type.

It's OK for the array.pmc class to not care about sparseness. I'm
fine with us later on adding a sparsearray.pmc.

It'd probably be a good idea for us to have a generic undef.pmc for
undefined usage. Dunno if there'll ever be any reason for it to
behave differently than perlundef, but it might. For now, perlundef
can just subclass undef. (Which means, I suppose, that we just rename
perlundef to undef and make a few changes in places that need it)
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Rhys Weatherley

Oct 23, 2002, 9:43:34 PM

to perl6-i...@perl.org

Steve Fink wrote:

> - Stratospheric rehydrocalibration amplifiers for the .NET people
> (er... or something; I can't remember what they needed)

The ability to embed arbitrary data in a pbc file under a
named section. This data needs to be readable by the program
when it runs, but is otherwise ignored by the rest of Parrot.

My goal was to store a compact class representation that could
be unpacked to create the necessary Parrot structures at runtime
in some kind of "InitCsharpClass" library function. This usage
may be moot once the Parrot metadata system is figured out.

C# also embeds string and other UI resources directly into the
program binary as a separate section. If I can create a section
called "csharp.resources" or something, then I can plonk the
necessary data in place and everything should work.
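
For illustration only, a lookup of a named section could look roughly like the sketch below. The packfile layout was still being designed at this point, so `SectionEntry`, `SectionDir` and `section_find` are hypothetical names and not part of packfile.c; the sketch assumes the whole file image is already in memory and that the directory stores an offset and a size per name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical directory entry; not the real packfile layout. */
typedef struct SectionEntry {
    const char *name;   /* e.g. "csharp.resources" */
    size_t offset;      /* byte offset of the data within the file image */
    size_t size;        /* length of the data in bytes */
} SectionEntry;

typedef struct SectionDir {
    const SectionEntry *entries;
    size_t count;
    const unsigned char *image;  /* whole file image in memory */
    size_t image_size;
} SectionDir;

/*
 * Return a read-only pointer to the named section's data, or NULL if the
 * name is absent or the entry points outside the image. The caller, not
 * the runtime, decides what the bytes mean.
 */
static const unsigned char *
section_find(const SectionDir *dir, const char *name, size_t *size_out)
{
    size_t i;

    assert(dir != NULL && name != NULL);
    for (i = 0; i < dir->count; i++) {
        const SectionEntry *e = &dir->entries[i];

        if (strcmp(e->name, name) != 0)
            continue;
        if (e->offset > dir->image_size ||
            e->size > dir->image_size - e->offset)
            return NULL;
        if (size_out != NULL)
            *size_out = e->size;
        return dir->image + e->offset;
    }
    return NULL;
}
```

A library routine such as the "InitCsharpClass" function mentioned above would call something like `section_find(&dir, "csharp.resources", &n)` and unpack the returned bytes itself.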

Cheers,

Rhys.

Dan Sugalski

Oct 23, 2002, 11:05:10 PM

to perl6-i...@perl.org

At 7:43 AM +1000 10/24/02, Rhys Weatherley wrote:
>Steve Fink wrote:
>
>> - Stratospheric rehydrocalibration amplifiers for the .NET people
>> (er... or something; I can't remember what they needed)
>
>The ability to embed arbitrary data in a pbc file under a
>named section. This data needs to be readable by the program
>when it runs, but is otherwise ignored by the rest of Parrot.

Right, good call. This'll make perl's named embedded filehandles
(__DATA__ and suchlike things--I'm pretty sure Larry and Damian have
Evil Things in mind for this at some point in perl 6) a lot easier as
well.

A binary data chunk section with named directory for it (per bytecode
segment, I think) would work pretty well for this. I don't think
we'll need it writable, though. Hopefully not, though there is the
potential for interesting things if it is.

Brent Dax

Oct 24, 2002, 1:57:37 AM

to Steve Fink, perl6-i...@perl.org

Steve Fink:
....
# - requires sprintf* to work on PPC. (Brent -- what's the status?)

Dan said that he would give me an account on a PPC machine so I could
debug this, but that hasn't happened yet.

....
# * Exceptions
# - I haven't been paying much attention to developments on this,
# although I know Brent went through and cleaned up a bunch of stuff
# so that at least exceptions will be thrown when they should be.

I wrote this, but another part of the patch was deemed unacceptable, so
it was never committed.

--Brent Dax <bren...@cpan.org>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

Wire telegraph is a kind of a very, very long cat. You pull his tail in
New York and his head is meowing in Los Angeles. And radio operates
exactly the same way. The only difference is that there is no cat.
--Albert Einstein (explaining radio)

Steve Fink

Oct 24, 2002, 6:25:45 AM

to Dan Sugalski, Leopold Toetsch, perl6-i...@perl.org

On Oct-23, Dan Sugalski wrote:
> At 7:41 PM +0200 10/23/02, Leopold Toetsch wrote:
> >>Possible (feature/architectural) goals for 0.0.9
> >>------------------------------------------------
> >>* PMC cleanup
> >> - Leo did a huge amount of work on this, but there are a few things
> >> left:
> >> - array.pmc still autocreates something called "PerlUndef"
> >
> >
> >This is the way it worked earlier too. The problem is (and was) that
> >extending an array just makes room for new PMCs. Allocating all PMCs
> >during extension is not an option (IMHO): it would defeat the point
> >of sparse arrays and would be expensive if these PMCs remain
> >unused. Now, setting the value of an array cell is done by a vtable
> >method like set_integer_native, which requires a PMC to be there.
> >That PMC happens to be a PerlUndef, which immediately changes its
> >vtable to the desired type.
>
> It's OK for the array.pmc class to not care about sparseness. I'm
> fine with us later on adding a sparsearray.pmc.
>
> It'd probably be a good idea for us to have a generic undef.pmc for
> undefined usage. Dunno if there'll ever be any reason for it to
> behave differently than perlundef, but it might. For now, perlundef
> can just subclass undef. (Which means, I suppose, that we just rename
> perlundef to undef and make a few changes in places that need it)
> --

Yes, that's what I was saying. Sorry the comment was vague -- all I
meant was that general Parrot PMCs should not be creating
Perl-specific PMCs. I agree completely with Dan's solution. Is there
anything Perl-specific about the current PerlUndef? If not, then just
renaming it to Undef seems best.

Leopold Toetsch

Oct 24, 2002, 7:10:08 AM

to Andy Dougherty, Perl6 Internals

Andy Dougherty wrote:

> Types:
> iv=long long, intvalsize=8, intsize=4, opcode_t=long long, opcode_t_size=8,
> ptrsize=4, ptr_alignment=4 byteorder=87654321,

The INTVAL2PTR and PTR2INTVAL macros should take care of such a
configuration, though I'm not too sure that we can get rid of all the
warnings. But the case above seems to be missing from the macros.

leo

Leopold Toetsch

Oct 24, 2002, 9:37:02 AM

to Steve Fink, perl6-i...@perl.org

Steve Fink wrote:

> - the various unions should probably be coalesced into one

I did check in my datatypes patch.
- all? native and other data types are summarized in datatypes.h
- hash and list use the same enums now
- datatype.c has currently 2 functions to retrieve types per name/enum
(conversion functions could go here later)
- core.ops is adjusted to retrieve data types or to check whether a type
enum is valid
- test included

leo

Leopold Toetsch

Oct 24, 2002, 9:31:05 AM

to Steve Fink, Dan Sugalski, perl6-i...@perl.org

Steve Fink wrote:

> On Oct-23, Dan Sugalski wrote:

>>It'd probably be a good idea for us to have a generic undef.pmc for
>>undefined usage.

> Yes, that's what I was saying. Sorry the comment was vague -- all I

> meant was that general Parrot PMCs should not be creating
> Perl-specific PMCs. I agree completely with Dan's solution. Is there
> anything Perl-specific about the current PerlUndef? If not, then just
> renaming it to Undef seems best.

I'll rename PerlUndef.pmc to undef.pmc and create a PerlUndef.pmc
derived from it.

leo

Leopold Toetsch

Oct 24, 2002, 9:28:31 AM

to Steve Fink, perl6-i...@perl.org

Steve Fink wrote:

> Prerequisites for 0.0.9 release
> -------------------------------
> * Reclaim the tinderbox!

On one machine I suddenly have additionally:

Failed Test       Status Wstat Total Fail  Failed  List of failed
-------------------------------------------------------------------------------
t/op/stacks.t                     35    3   8,57%  4, 7, 34

Running w/o --gc_debug is ok. This emerged after some code changes WRT
datatypes. A second machine doesn't show these failures.

Strange.

leo

Andy Dougherty

Oct 24, 2002, 12:22:12 PM

to Leopold Toetsch, Perl6 Internals

On Thu, 24 Oct 2002, Leopold Toetsch wrote:

> Andy Dougherty wrote:
>
> > Types:
> > iv=long long, intvalsize=8, intsize=4, opcode_t=long long, opcode_t_size=8,
> > ptrsize=4, ptr_alignment=4 byteorder=87654321,

> The INTVAL2PTR and PTR2INTVAL macros should take care of such a
> configuration.

Yes, I know. I'm the one who put those macros in parrot.h.

> Though I'm not too sure that we can get rid of all the
> warnings. But the case above seems to be missing from the macros.

However, casting isn't always the correct solution. Sometimes the issue
is that the underlying types ought to be changed. For example, I think
that a number of variables currently of type UINTVAL really ought to be of
type size_t. I have posted about this at length before but haven't fixed
it myself due to both time constraints and laziness.
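
As a rough illustration of the cast problem behind warnings like "cast to pointer from integer of different size", the sketch below converts through an integer of pointer width. It is not the definition in parrot.h: the `SKETCH_` macros and `INTVAL_SKETCH` are hypothetical names, and it assumes a C99 compiler that provides `uintptr_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Stands in for an 8-byte INTVAL such as 'long long'. */
typedef long long INTVAL_SKETCH;

/*
 * Go through an integer type of pointer width, so that the cast to or
 * from a pointer is always between same-size types, whatever the size
 * of the INTVAL type is.
 */
#define SKETCH_INTVAL2PTR(type, i) ((type)(uintptr_t)(i))
#define SKETCH_PTR2INTVAL(p)       ((INTVAL_SKETCH)(uintptr_t)(p))

/* Round-trip a pointer through the integer type and compare. */
static int roundtrip_ok(void *p)
{
    INTVAL_SKETCH as_int = SKETCH_PTR2INTVAL(p);
    void *back = SKETCH_INTVAL2PTR(void *, as_int);

    assert(sizeof(uintptr_t) == sizeof(void *));
    return back == p;
}
```

This only quiets the size-mismatch casts. Where a variable actually holds an object size or a count, changing its type to size_t, as argued above, removes the need for a cast in the first place.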

--
Andy Dougherty doug...@lafayette.edu

Juergen Boemmels

Oct 24, 2002, 3:28:21 PM

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> writes:

> >The ability to embed arbitrary data in a pbc file under a
> >named section. This data needs to be readable by the program
> >when it runs, but is otherwise ignored by the rest of Parrot.
>
> Right, good call. This'll make perl's named embedded filehandles
> (__DATA__ and suchlike things--I'm pretty sure Larry and Damian have
> Evil Things in mind for this at some point in perl 6) a lot easier as
> well.

My proposed extension of the packfile format is going in this
direction. But at the moment I'm not sure about string
encodings in the segment directory. I was thinking about limiting names
to ASCII because they are internal. Allowing different encodings opens a
can of worms. UTF-8 only may also be a possibility. Furthermore, a part
of the namespace should be reserved for internal use only. ATM I use
all-caps names, but I'm also thinking about a dot prefix.

> A binary data chunk section with named directory for it (per bytecode
> segment, I think) would work pretty well for this.

I'm not sure if I understand you correctly: You talk about more than
one bytecode segment in a packfile, each of which has its own
associated directory with an independent namespace. Tricky. But it should
be possible, with a root directory segment and subdirectories.

> I don't think we'll
> need it writable, though. Hopefully not, though there is the potential
> for interesting things if it is.

The main problem with writing to a pbc is concurrent access. You
need some kind of locking. But read-only access should not require any
write, so you will never know whether some other process is reading the
file you want to change, and the reading process has no way to be
sure that the file will not change.

But it would be nice if you could write a new packfile. This would be
very handy for writing compilers.

bye
b.

Leopold Toetsch

Oct 24, 2002, 5:21:58 PM

to Steve Fink, Dan Sugalski, perl6-i...@perl.org

Steve Fink wrote:

>... If not, then just

> renaming it to Undef seems best.

I had a closer look at it. Just renaming doesn't work: PerlUndef is derived
from PerlInt, which provides major functionality for it.

If this syllable "Perl" is really a problem, I will reorganize them
again in a more hierarchical way, with all perl classes on top of basic classes.

But, as this is again a major patch, I'd prefer to do it after 0.0.9,
along with PMC/Buffer unification and variable/value separation, as both
steps will change the whole classes subdir drastically again.

I have already a working example for PMC/Buffer unification and
variable/value separation WRT tied/non tied scalars. I'll send a test
program later.

leo

Steve Fink

Oct 25, 2002, 6:48:03 AM

to Leopold Toetsch, perl6-i...@perl.org

On Oct-23, Leopold Toetsch wrote:
> Steve Fink wrote:
>
> >I suppose I ought to try to wrap up a release one of these days.
>
>
> > - Artificial goal: I want the list of pending patches to be smaller
> > than one screenful before I release. Fortunately, I have a large
> > screen.
>
> I did set 2 of them to "Applied". I'll wade through my contributions and
> set status accordingly.

Thanks.

> >* Keyed access
> > - Either way, the current keyed support isn't complete.
>
> Adding a couple of lines to assemble.pl that do the same as my imcc
> patch would make these multi_keyed operations available to HL. Then
> we could look at usage patterns and finally decide what to do.
> (Who could extend the assembler?)

Sounds good to me. But it does suggest a question -- are there any
compelling reasons to preserve the separate assembler? Given that imcc
appears to be a strict superset of the assembler these days, I'm
tempted to standardize on imcc. Anyone want to argue otherwise?

Architecturally, I suppose it would be nice to have a separate library
for only processing PASM code, but I don't see that as hugely
important. And perhaps the correct method of obtaining that would be
by carving out a pasm component of imcc and having the main imcc
delegate unrecognized lines to it.

But the assembler seems to be a somewhat religious issue, so I'll not
jump to conclusions.

> >* Bytecode format
>
> > - I still don't understand why we can't write our own
> > serializers/deserializers for ELF or some other standard format,
>
> I don't see the point of using ELF. The new proposed packfile format
> should give us all we need.

It's my knee-jerk "standards are good" reaction. While the proposed
format provides everything we're thinking of at the moment, it seems
like there are a lot of other things we might want to be able to do with
packfiles in the future.

Our problem set feels pretty similar to the ELF problem set. The major
counterargument I can envision is that it's too complicated and
provides way more functionality than we need -- but although I really
don't know that much about it, I'm under the impression that ELF is a
pretty simple format. It mostly just sounds scary because it happens
to be used for complex purposes.

To my naive thinking, using a standard could provide some useful
advantages. Mostly, I like the general idea of using a standard
because a bunch of other presumably intelligent and motivated people
have already dealt with all the niggling little issues of naming,
referencing, total and partial symbol table stripping, etc. that we
may not have thought of yet. ELF, and other formats like it, provide
support for an arbitrary number of sections, a place to put indexes,
segments, section attributes, etc.

Going further out on a limb, it might even be possible to use ELF for
some of the things it is normally used for, rather than just a hollow
packaging that we stuff our own things into. gdb could make some sense
of it automatically. We could store read-only bytecode wads in shared
memory. PBC files could be executed as native binaries on ELF-based
systems. We could use existing ELF tools to, at the very least,
provide test result verification.

ELF is certainly not the only possibility. In the past we've mentioned
COFF and IFF. Java's .class file format comes to mind too. Even PNG
could serve as a general-purpose container. They all pretty much boil
down to a variant of a magic-number, some amount of versioning
information, maybe an endianness specifier, and an index pointing to
the offsets and sizes of a bunch of named sections.
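
The common shape described above can be sketched in a few lines. This is not ELF, COFF, or the Parrot packfile: the magic "PKF0", the 12-byte layout, and the names `ContainerHeader`, `read_u32` and `parse_header` are all made up for illustration. The point is only that a reader checks the magic, reads the version and byte-order marker, and then decodes the remaining fields in the declared byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAGIC "PKF0"   /* hypothetical 4-byte magic, not a real format */

typedef struct ContainerHeader {
    uint8_t  version;
    uint8_t  big_endian;      /* 1 = big-endian fields, 0 = little-endian */
    uint32_t section_count;   /* number of entries in the section index */
} ContainerHeader;

/* Read a 32-bit unsigned field using the byte order declared in the header. */
static uint32_t read_u32(const unsigned char *p, int big_endian)
{
    if (big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/*
 * Assumed layout: magic[4], version[1], byte order[1], padding[2],
 * section_count[4]. Returns 0 on success, -1 on a malformed header.
 */
static int parse_header(const unsigned char *buf, size_t len,
                        ContainerHeader *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 12 || memcmp(buf, SKETCH_MAGIC, 4) != 0)
        return -1;
    if (buf[5] > 1)
        return -1;
    out->version = buf[4];
    out->big_endian = buf[5];
    out->section_count = read_u32(buf + 8, out->big_endian);
    return 0;
}
```

The index of named sections would follow the header, with each entry giving a name, an offset, and a size; decoding it uses the same `read_u32` helper.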

On the other hand, talk is cheap and I personally don't plan on trying
to implement any of this, so it doesn't really matter what I think.
:-)

Steve Fink

Oct 25, 2002, 7:25:32 AM10/25/02

to Leopold Toetsch, perl6-i...@perl.org

Thanks! It's a little scary how fast you are.

Steve Fink

Oct 25, 2002, 7:37:02 AM10/25/02

to Leopold Toetsch, Dan Sugalski, perl6-i...@perl.org

On Oct-24, Leopold Toetsch wrote:

> Steve Fink wrote:
>
>
> >... If not, then just
> >renaming it to Undef seems best.
>
> I had a closer look at it. Just renaming doesn't work: PerlUndef is derived
> from PerlInt, which provides major functionality for it.
>
> If this syllable "Perl" is really a problem, I will reorganize them
> again in a more hierarchical way, all perl classes on top of basic classes.

Well, it definitely bothers me, but maybe I'm just being anal
retentive. Maybe this is the right time to ask another question I've
been wondering about: is there anything perl-specific about PerlInt?
PerlNum?

Although if we're going to change PerlInt to Int (or just make a new
Int base class that PerlInt would inherit from), then we should
probably handle the question of how many bits these integers should
have, and possibly create a couple of PMCs -- Int32, Int64,
IntAtLeast32, NativeInt, UnboundedInt, IntAsBigAsYourHead, etc.

Dan, do you have any design guidance to kick in here? What Parrot
Int/Num PMCs do we need, and how should PerlInt relate to them?

> But, as this is again a major patch I'd prefer to do it after 0.0.9,
> along with PMC/Buffer unification and variable/value separation, as both
> steps will change the whole classes subdir drastically again.

Fair enough. Although it looks like this release is still going to
take some time to stabilize; there are still an uncomfortable number
of warnings and GC bugs.

Leopold Toetsch

Oct 25, 2002, 8:34:32 AM10/25/02

to Steve Fink, perl6-i...@perl.org

Steve Fink wrote:

> On Oct-23, Leopold Toetsch wrote:

>>we could look at usage patterns and finally decide, what to do.
>>(Who could extend the assembler?)

> Sounds good to me. But it does suggest a question -- are there any
> compelling reasons to preserve the separate assembler?

Macros, currently. When we have a macro preprocessor, we could toss
assemble.pl.

> ... Given that imcc

> appears to be a strict superset of the assembler these days, I'm
> tempted to standardize on imcc. Anyone want to argue otherwise?

imcc has a slightly stricter syntax WRT subroutines, though this is not
final. And there are some keyword clashes, e.g. imcc »if« vs pasm »if«.

> Architecturally, I suppose it would be nice to have a separate library
> for only processing PASM code, but I don't see that as hugely
> important. And perhaps the correct method of obtaining that would be
> by carving out a pasm component of imcc and having the main imcc
> delegate unrecognized lines to it.

This is current behaviour. There are 2 possible ways to switch to pasm:

  .emit
  pasm code is here
  ..
  .eom

and unrecognized keywords are looked up as pasm opnames in all lines.

> But the assembler seems to be a somewhat religious issue, so I'll not
> jump to conclusions.

The assembler is slow, but has this nice macro feature, which is heavily
used in some tests.

>>>* Bytecode format

> ... We could use existing ELF tools to, at the very least,
> provide test result verification.

This is an argument. If we get e.g. bsr fixup at load time done by the
elf loader, it would be nice.

OTOH fixup is not complicated (imcc does it), but when we have e.g. native
dynamic libraries mixed with PBC, and ELF does the right thing, it would
be an advantage. Using gdb is another nice feature - but what about different
platforms not having all these tools?

leo

Leopold Toetsch

Oct 25, 2002, 9:05:11 AM10/25/02

to Steve Fink, Dan Sugalski, perl6-i...@perl.org

Steve Fink wrote:

> On Oct-24, Leopold Toetsch wrote:

>... is there anything perl-specific about PerlInt?
> PerlNum?

This depends. The PerlScalars change their types on demand.

add PerlInt, PerlInt, PerlNum

changes the type of the LHS to a PerlNum. Other languages might prefer
to round the result to an int and keep the type -- I dunno.
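
The two policies can be put side by side in a small sketch. The Scalar class and both function names are hypothetical stand-ins, not Parrot's PMC or vtable API; only the behaviour described above is modelled.

```python
class Scalar:
    """Toy stand-in for a PMC: a type tag plus a value."""

    def __init__(self, type_name, value):
        self.type_name = type_name
        self.value = value

def add_morphing(dest, a, b):
    """Perl-style: the destination takes on whatever type the result needs."""
    result = a.value + b.value
    dest.type_name = "PerlNum" if isinstance(result, float) else "PerlInt"
    dest.value = result

def add_keep_type(dest, a, b):
    """Alternative policy: an integer destination stays an integer."""
    result = a.value + b.value
    if dest.type_name == "Int":
        result = int(round(result))
    dest.value = result

lhs = Scalar("PerlInt", 0)
add_morphing(lhs, Scalar("PerlInt", 2), Scalar("PerlNum", 1.5))
print(lhs.type_name, lhs.value)   # the LHS has morphed into a PerlNum
```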

> Although if we're going to change PerlInt to Int (or just make a new
> Int base class that PerlInt would inherit from), then we should
> probably handle the question of how many bits these integers should
> have, and possibly create a couple of PMCs -- Int32, Int64,
> IntAtLeast32, NativeInt, UnboundedInt, IntAsBigAsYourHead, etc.

Ints with size <= sizeof(INTVAL) will be handled natively (or by
Int.pmc) + some bit adjustment/sign promotion ops. Integers bigger than
INTVAL need their own class. Fixed sized ints could be mapped at
configure time to an appropriate type.
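
As a rough sketch of what "bit adjustment/sign promotion" could mean for a fixed-width integer stored in a wider native one: mask the value to the requested width, then sign-extend. The function names and the "native"/"bigint" labels are made up for this illustration.

```python
def wrap_signed(value, bits):
    """Reduce value to a signed two's-complement integer of the given width."""
    mask = (1 << bits) - 1
    value &= mask                     # bit adjustment: drop the high bits
    if value & (1 << (bits - 1)):     # sign promotion: top bit set means negative
        value -= 1 << bits
    return value

def pick_backing(bits, intval_bits):
    """Configure-time choice: use the native INTVAL if the width fits in it,
    otherwise fall back to a separate big-integer class."""
    return "native" if bits <= intval_bits else "bigint"
```

With a 32-bit INTVAL, an Int64 request would map to the separate class under this scheme, while Int8, Int16 and Int32 would all share the native path plus wrap_signed.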

> Dan, do you have any design guidance to kick in here? What Parrot
> Int/Num PMCs do we need, and how should PerlInt relate to them?

IMHO we should currently concentrate on the PerlScalars - as we don't
have other major HLs now. But as soon as they start using parrot, we
will know, how these scalars should behave.

Along with the variable/value split, we will get some more modular
VTABLE. Eventually we will have a scalar constructor like:

new Int, [ size 32, tieable yes, morph_type perl, taint_check yes ]

and put appropriate VTABLE pieces together, to achieve the desired
behaviour.

[ past 0.0.9 ]

> Fair enough. Although it looks like this release is still going to
> take some time to stabilize; there are still an uncomfortable number
> of warnings and GC bugs.

These warnings and GC bugs should definitely be weeded out, yes.

leo

Leopold Toetsch

Oct 25, 2002, 8:41:05 AM10/25/02

to Steve Fink, perl6-i...@perl.org

Steve Fink wrote:

> On Oct-24, Leopold Toetsch wrote:

> Thanks! It's a little scary how fast you are.

This depends on the RL work I have waiting to be done. The more is in the
queue (especially putting invoices together for the revenue office), the
more I'll code parrot stuff ;-)

leo

Juergen Boemmels

Oct 25, 2002, 10:23:06 AM10/25/02

to Leopold Toetsch, Steve Fink, perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> writes:

[imcc...]

> >>>* Bytecode format
>
>
> > ... We could use existing ELF tools to, at the very least,
> > provide test result verification.
>
>
> This is an argument. If we get e.g. bsr fixup at load time done by the
> elf loader, it would be nice.
>
>
> OTOH fixup is not complicated (imcc does it), but when we have
> e.g. native dynamic libraries mixed with PBC, and ELF does the right
> thing, it would be an advantage. Using gdb is another nice feature -
> but what with different platforms not having all these tools?

For our own bytecode format there are also platforms not having these
tools: all of them. So it's a bit less discriminating, but not necessarily
better.

Dynamic libraries are not simple, and if we get them for free on one
platform this is a good thing. Using them on other platforms is a matter
of porting ld.so (which is far from simple).

bye
b.
--
Juergen Boemmels boem...@physik.uni-kl.de
Fachbereich Physik Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F 23 F6 C7 2F 85 93 DD 47

Dan Sugalski

Oct 25, 2002, 6:40:22 PM10/25/02

to Juergen Boemmels, perl6-i...@perl.org

At 5:28 PM +0200 10/24/02, Juergen Boemmels wrote:
>Dan Sugalski <d...@sidhe.org> writes:
>
>> >The ability to embed arbitrary data in a pbc file under a
>> >named section. This data needs to be readable by the program
>> >when it runs, but is otherwise ignored by the rest of Parrot.
>>
>> Right, good call. This'll make perl's named embedded filehandles
>> (__DATA__ and suchlike things--I'm pretty sure Larry and Damian have
>> Evil Things in mind for this at some point in perl 6) a lot easier as
>> well.
>
>My proposed extension of the packfile format is going in this
>direction. But at the moment I'm not sure about string
>encodings in the segment directory. I was thinking about limiting to
>ASCII because it's an internal. Allowing different encodings opens a can
>of worms. UTF-8 only may also be a possibility. Furthermore a part of
>the namespace should be reserved for internal use only. ATM I use
>all-caps names, but think also about dot-prefix.

I'm thinking something else, actually. Names made perfect sense
except for encoding info and duplication. We can put limits on the
name encoding if we want, but... really, who cares? It's only useful
for introspection purposes and while that's certainly important, I'm
not sure it's worth much hassle.

Instead, let's just give an entry number. We can have arbitrary data
chunk #1, #2, #3, and so on. I'm not sure it'll buy us much having
names attached.

>> A binary data chunk section with named directory for it (per bytecode
>> segment, I think) would work pretty well for this.
>
>I'm not sure if I understand you correctly: You talk about more than
>one bytecode segment in a packfile and each of them has its own
>associated directory with independent namespace. Tricky. But it should
>be possible. Having a root directory segment and sub directories.

Yep, but only a little.

>
>But it would be nice if you could write a new packfile. This would be
>very handy for writing compilers.

Writing a new packfile is definitely a different beast than altering
an existing one.

Brent Dax

Oct 25, 2002, 7:30:54 PM10/25/02

to Dan Sugalski, Juergen Boemmels, perl6-i...@perl.org

Dan Sugalski:
# I'm thinking something else, actually. Names made perfect sense
# except for encoding info and duplication. We can put limits on the
# name encoding if we want, but... really, who cares? It's only useful
# for introspection purposes and while that's certainly important, I'm
# not sure it's worth much hassle.
#
# Instead, lets just give an entry number. We can have arbitrary data
# chunk #1, #2, #3, and so on. I'm not sure it'll buy us much having
# names attached.

What happens if two tools (say, a custom debugger and the Perl compiler)
both use the same segment number for something? Names make collisions
less likely.

Dan Sugalski

Oct 25, 2002, 7:33:01 PM10/25/02

to Brent Dax, Juergen Boemmels, perl6-i...@perl.org

At 12:30 PM -0700 10/25/02, Brent Dax wrote:
>Dan Sugalski:
># I'm thinking something else, actually. Names made perfect sense
># except for encoding info and duplication. We can put limits on the
># name encoding if we want, but... really, who cares? It's only useful
># for introspection purposes and while that's certainly important, I'm
># not sure it's worth much hassle.
>#
># Instead, lets just give an entry number. We can have arbitrary data
># chunk #1, #2, #3, and so on. I'm not sure it'll buy us much having
># names attached.
>
>What happens if two tools (say, a custom debugger and the Perl compiler)
>both use the same segment number for something? Names make collisions
>less likely.

Whoever's writing the bytecode file needs to deal with
that--hopefully there's only one writer. I'm in the middle of getting
the API down on electrons, so we should have something to savage
reasonably soon.

Dan Sugalski

Oct 25, 2002, 9:26:14 PM10/25/02

to Juergen Boemmels, Leopold Toetsch, Steve Fink, perl6-i...@perl.org

At 12:23 PM +0200 10/25/02, Juergen Boemmels wrote:
>Leopold Toetsch <l...@toetsch.at> writes:
>
>[imcc...]
>
>> >>>* Bytecode format
>>
>>
>> > ... We could use existing ELF tools to, at the very least,
>> > provide test result verification.
>>
>>
>> This is an argument. If we get e.g. bsr fixup at load time done by the
>> elf loader, it would be nice.
>>
>>
>> OTOH fixup is not complicated (imcc does it), but when we have
>> e.g. native dynamic libraries mixed with PBC, and ELF does the right
>> thing, it would be an advantage. Using gdb is another nice feature -
>> but what with different platforms not having all these tools?
>
>For our own bytecode format there are also platforms not having these
>tools: all of them. So it's a bit less discriminating, but not necessarily
>better.
>
>Dynamic libraries are not simple, and if we get them for free on one
>platform this is a good thing. Using them on other platforms is a matter
>of porting ld.so (which is far from simple).

Dynamic libraries aren't really a player here, as we're not going to
be dynamically generating platform-native shared libraries on disk.
Bytecode yes, but that's definitely not the same thing.

FWIW, I really don't have any vested interest in any bytecode format
as long as we:

*) Standardize on one before release
*) Find or build one that can properly version and tag itself so we
can handle backwards compatibility
*) Get one that meets our needs

Rhys Weatherley

Oct 25, 2002, 11:13:15 PM10/25/02

to perl6-i...@perl.org

Dan Sugalski wrote:

> ># Instead, lets just give an entry number. We can have arbitrary data
> ># chunk #1, #2, #3, and so on. I'm not sure it'll buy us much having
> ># names attached.
> >
> >What happens if two tools (say, a custom debugger and the Perl compiler)
> >both use the same segment number for something? Names make collisions
> >less likely.
>
> Whoever's writing the bytecode file needs to deal with
> that--hopefully there's only one writer. I'm in the middle of getting
> the API down on electrons, so we should have something to savage
> reasonably soon.

I don't think you can guarantee that. Sooner or later someone
will download the packfile spec and write a stand-alone compiler
that generates bytecode directly, using none of the Parrot tools.
If such a compiler needs an extension section, what number do
they give it? (I'll be using imcc for C#, but others might want
to do things manually).

Numbers need to be centrally managed to prevent conflicts, because
it is impossible for an independent person to "make up a number"
and guarantee no conflicts. Names are easier to make unique, as
the name of the language/project/author/DNS name will normally
be unique enough to act as a prefix. No central management
required.

e.g. compare the MIME type system with SNMP's ASN.1 based object
identifiers. Picking a new MIME type out of thin air is easy.
Adding a new field identifier in SNMP requires massive co-ordination,
and sacrificing of large numbers of rubber chickens to the IETF gods.

Names are also easier to remember. Quick now: what is the MIME type
for HTML? What is the SNMP object identifier for the IP default TTL?

Cheers,

Rhys.

Dan Sugalski

Oct 26, 2002, 4:58:49 AM10/26/02

to perl6-i...@perl.org

At 9:13 AM +1000 10/26/02, Rhys Weatherley wrote:
>Dan Sugalski wrote:
>
>> ># Instead, lets just give an entry number. We can have arbitrary data
>> ># chunk #1, #2, #3, and so on. I'm not sure it'll buy us much having
>> ># names attached.
>> >
>> >What happens if two tools (say, a custom debugger and the Perl compiler)
>> >both use the same segment number for something? Names make collisions
>> >less likely.
>>
>> Whoever's writing the bytecode file needs to deal with
>> that--hopefully there's only one writer. I'm in the middle of getting
>> the API down on electrons, so we should have something to savage
>> reasonably soon.
>
>I don't think you can guarantee that. Sooner or later someone
>will download the packfile spec and write a stand-alone compiler
>that generates bytecode directly, using none of the Parrot tools.
>If such a compiler needs an extension section, what number do
>they give it? (I'll be using imcc for C#, but others might want
>to do things manually).

Huh? No, you misunderstand. Each chunk of the bytecode has a separate
TOC for stuff like this. The full identifier would be
file/chunk/entry, which should be reasonably guaranteed to be unique.
When the compiler's emitting code to reference a piece of binary data
(which is essentially a big binary string constant, but I realize
that having it in separate segments is terribly useful) it can turn
any human-readable identifier into the internal identifier the engine
needs to look up the actual data.
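
A small sketch of that idea, under the assumption that each chunk's TOC simply numbers its entries in the order they are added. The ChunkTOC class and its methods are hypothetical; no such API exists in Parrot at this point.

```python
class ChunkTOC:
    """Toy table of contents for one bytecode chunk (names are hypothetical)."""

    def __init__(self, file_name, chunk_no):
        self.file_name = file_name
        self.chunk_no = chunk_no
        self._numbers = {}   # human-readable label -> entry number
        self.blobs = []      # blobs[n - 1] holds the data for entry n

    def add(self, label, data):
        """Store a blob under a compile-time label; return its entry number."""
        if label in self._numbers:
            raise KeyError(f"label {label!r} already used in this chunk")
        self.blobs.append(data)
        self._numbers[label] = len(self.blobs)
        return self._numbers[label]

    def identifier(self, label):
        """Full file/chunk/entry identifier the emitted code would carry."""
        return (self.file_name, self.chunk_no, self._numbers[label])

    def fetch(self, entry_no):
        """Run-time lookup needs only the number, never the label."""
        return self.blobs[entry_no - 1]
```

The labels exist only while the compiler is running; what ends up in the file is the entry number, and two chunks can both have an entry #1 without clashing because the chunk number is part of the identifier.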

Brent Dax

Oct 26, 2002, 5:44:20 AM10/26/02

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski:
# Huh? No, you misunderstand. Each chunk of the bytecode has a separate
# TOC for stuff like this. The full identifier would be
# file/chunk/entry, which should be reasonably guaranteed to be unique.
# When the compiler's emitting code to reference a piece of binary data
# (which is essentially a big binary string constant, but I realize
# that having it in separate segments is terribly useful) it can turn
# any human-readable identifier into the internal identifier the engine
# needs to look up the actual data.

DIRECTORY:
    SEG 1 OFFSET: 324
    SEG 2 OFFSET: 2496
    SEG 3 OFFSET: 32482
    ...

SEG 1:
    TYPE: Line Locations
    LENGTH: 2070
    DATA: 101011101001...

I was thinking in terms of what TYPE: stores; it seems you were thinking
about how you identify a particular segment. Yeah, you can probably get
away with just numbering the segments, although that might slow things
down a bit when you're looking for a particular type of segment. (In
foo.pbc, the line location segment might be 1, but in bar.pbc, it's 2.)

BTW, my father (a programmer too, although most of his work is with
database-driven programs) suggested a solution that's half-way between
string and number: hash the string and use the hash as the number. With
a good hashing function (say, MD5 with the four chunks XORed together)
you'll probably be able to avoid collisions but still have unique
identifiers.
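
A minimal sketch of that hashing scheme, using Python's standard hashlib. Reading the digest as four little-endian 32-bit words is an assumption; the suggestion above does not say how the chunks are split.

```python
import hashlib
import struct

def segment_id(name):
    """Fold the 128-bit MD5 of a segment name into one 32-bit number."""
    digest = hashlib.md5(name.encode("utf-8")).digest()   # 16 bytes
    a, b, c, d = struct.unpack("<4I", digest)             # four 32-bit chunks
    return a ^ b ^ c ^ d

print(hex(segment_id("LINE_INFO")))
```

Note that a 32-bit result makes collisions unlikely for a handful of segment names, not impossible, so a writer would still want to check for clashes among the names it actually emits.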

Gopal V

Oct 26, 2002, 5:17:28 AM10/26/02

to Dan Sugalski, perl6-i...@perl.org

If memory serves me right, Dan Sugalski wrote:
> Huh? No, you misunderstand. Each chunk of the bytecode has a separate
> TOC for stuff like this. The full identifier would be
> file/chunk/entry, which should be reasonably guaranteed to be unique.
> When the compiler's emitting code to reference a piece of binary data
> (which is essentially a big binary string constant, but I realize
> that having it in separate segments is terribly useful) it can turn
> any human-readable identifier into the internal identifier the engine
> needs to look up the actual data.

Are you suggesting something like the JVM .class constant pool? ...
viz. a constant pool, with indexes stored in segments (Attributes for the
JVM) used to get the internal name? ...

This model however is not very favourable for fast loading if you're going
in for this for everything... If it's just for custom segments, great !!!!.

Gopal

PS: I can't seem to post to perl6-internals :-(
--
The difference between insanity and genius is measured by success

Juergen Boemmels

Oct 28, 2002, 11:07:25 AM10/28/02

to Brent Dax, Dan Sugalski, perl6-i...@perl.org

"Brent Dax" <bren...@cpan.org> writes:

Thinking a little about it, the type field is the correct way. This would
allow different __DATA__ segments with the same type. The name of the
segment is a totally different concept.

> BTW, my father (a programmer too, although most of his work is with
> database-driven programs) suggested a solution that's half-way between
> string and number: hash the string and use the hash as the number. With
> a good hashing function (say, MD5 with the four chunks XORed together)
> you'll probably be able to avoid collisions but still have unique
> identifiers.

The storage size is not really an issue. We use 32-bit opcodes
(sometimes even 64-bit). So we could store the name/type fields as
text in the file. The hash will be generated at load time (or the
first time a lookup_by_name is done). This could be sped up if we
dump the hash to disk, but this would make the hash function part of
the packfile definition.

As the bytecode itself knows at what index the segment lies, it won't
normally lookup_by_name but rather lookup_by_index. If the directory is
arranged cleverly this can be done without reading the names.

DIRECTORY:
    number_of_items
    SEG1:
        size
        offset
        flags
        varlen_pos
    SEG2:
        size
        offset
        flags
        varlen_pos
    ...
    SEG1_varlen:
        name
        type
    SEG2_varlen:
        name
        type
    ...
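
Here is a rough sketch of why this arrangement lets lookup_by_index avoid the names entirely. The field widths (four 32-bit little-endian words per entry) and the NUL-terminated strings are assumptions made for the example; the layout above leaves them open.

```python
import struct

ENTRY = struct.Struct("<4I")   # size, offset, flags, varlen_pos (widths assumed)

def pack_directory(segments):
    """segments: list of (name, type, size, offset, flags) tuples."""
    fixed_len = 4 + ENTRY.size * len(segments)
    fixed, varlen = struct.pack("<I", len(segments)), b""
    for name, seg_type, size, offset, flags in segments:
        # varlen_pos points at this segment's strings, after all fixed entries
        fixed += ENTRY.pack(size, offset, flags, fixed_len + len(varlen))
        varlen += name.encode("ascii") + b"\0" + seg_type.encode("ascii") + b"\0"
    return fixed + varlen

def lookup_by_index(directory, index):
    """Read one fixed-size entry; the name/type strings are never touched."""
    size, offset, flags, _ = ENTRY.unpack_from(directory, 4 + ENTRY.size * index)
    return size, offset, flags

def lookup_by_name(directory, wanted):
    """Slow path: follow each varlen_pos and compare the stored name."""
    (count,) = struct.unpack_from("<I", directory, 0)
    for index in range(count):
        pos = ENTRY.unpack_from(directory, 4 + ENTRY.size * index)[3]
        end = directory.index(b"\0", pos)
        if directory[pos:end].decode("ascii") == wanted:
            return index
    return None
```

Because every fixed entry has the same size, entry N sits at a computable offset, and only lookup_by_name ever has to walk the variable-length area.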

Dan Sugalski

Oct 28, 2002, 10:04:33 PM10/28/02

to Brent Dax, perl6-i...@perl.org

At 10:44 PM -0700 10/25/02, Brent Dax wrote:
>I was thinking in terms of what TYPE: stores; it seems you were thinking
>about how you identify a particular segment. Yeah, you can probably get
>away with just numbering the segments, although that might slow things
>down a bit when you're looking for a particular type of segment. (In
>foo.pbc, the line location segment might be 1, but in bar.pbc, it's 2.)

Well, on thinking a bit about this, there's no reason that we have to
worry--it's perfectly OK for us to declare, unconditionally, that
segment 0 is always bytecode, 1 line number info, and so on, with
everything after position X (for some value of X) left up in the air.
A bit dodgy, true, as it means that any new known segment types we
add in will be floating, but I don't think we're going to end up with
too many performance-critical pieces in the bytecode. (Arguably it's
just the bytecode itself, the symbols, and the constants, as the rest
are looked at under exceptional circumstances or on (rare) demand)

Juergen Boemmels

Oct 29, 2002, 10:46:11 AM10/29/02

to Perl6 Internals

Dan Sugalski <d...@sidhe.org> writes:

> Well, on thinking a bit about this, there's no reason that we have to
> worry--it's perfectly OK for us to declare, unconditionally, that
> segment 0 is always bytecode, 1 line number info, and so on, with
> everything after position X (for some value of X) left up in the
> air. A bit dodgy, true, as it means that any new known segment types
> we add in will be floating, but I don't think we're going to end up
> with too many performance-critical pieces in the bytecode. (Arguably
> it's just the bytecode itself, the symbols, and the constants, as the
> rest are looked at under exceptional circumstances or on (rare) demand)

*No*

This really kills extensibility, or at least makes it very ugly. It
needs to preallocate a certain number of segments. Each of these has
fixed semantics. Extending means consuming one of the preallocated
fields, or using some segment beyond the preallocated area, but then it
needs a type field. In fact the preallocated segments also have a
type field: the position in the packfile.

I'm fine with a numeric type for the core, and some extension
type with named types if it is a speed issue, but I really don't like
this positional approach. BTW: COFF and ELF use named sections.

Brent Dax

Oct 29, 2002, 4:48:32 PM10/29/02

to boem...@physik.uni-kl.de, Perl6 Internals

boem...@physik.uni-kl.de:
# > Well, on thinking a bit about this, there's no reason that we have to
# > worry--it's perfectly OK for us to declare, unconditionally, that
# > segment 0 is always bytecode, 1 line number info, and so on, with
# > everything after position X (for some value of X) left up in the air.
# > A bit dodgy, true, as it means that any new known segment types we add
# > in will be floating, but I don't think we're going to end up with too
# > many performance-critical pieces in the bytecode. (Arguably it's just
# > the bytecode itself, the symbols, and the constants, as the rest are
# > looked at under exceptional circumstances or on (rare) demand)
#
# *No*
#
# This really kills extensibility, or at least makes it very
# ugly. It needs to preallocate a certain number of segments. Each
# of these has fixed semantics. Extending means consuming one of
# the preallocated fields, or using some segment beyond the
# preallocated area, but then it needs a type field. In fact the
# preallocated segments also have a type field: the position in
# the packfile.

How about this structure:

HEADER
SEGMENT 0
    CHUNK 0 (DIRECTORY)
        SIZE:
        DATA:
            CHUNK 0 ENTRY
                TYPE: DIRECTORY (type 0)
                OFFSET:
            CHUNK 1 ENTRY
                TYPE: e.g. BYTECODE (type 3)
                OFFSET:
            CHUNK 2 ENTRY
                TYPE: e.g. CONSTTABLE (type 1)
                OFFSET:
            CHUNK 3 ENTRY
                TYPE: e.g. FIXUP (type 2)
                OFFSET:
            CHUNK 4 ENTRY
                TYPE: e.g. LINETABLE (type 4)
                OFFSET:
    CHUNK 1
        SIZE:
        DATA:
    CHUNK 2
        SIZE:
        DATA:
    CHUNK 3
        SIZE:
        DATA:
    CHUNK 4
        SIZE:
        DATA:
SEGMENT 1
SEGMENT 2
SEGMENT 3
SEGMENT 4

Each chunk just holds its size and its data--the type is stored in the
directory. Chunk 0 is the only chunk with fixed meaning--it's always
the directory. There should only be one chunk per segment of the given
type. We'll reserve some of the types--say, up to 127 to be safe--and
let any outside tools use chunk numbers above that.
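
One way to read the rules in that paragraph, as a sketch: the directory is a list indexed by chunk number, entry 0 describes the directory itself, and each type may appear at most once per segment. The type numbers are the ones from the structure above; the function names are invented for the example.

```python
RESERVED_MAX = 127   # types up to 127 reserved, per the proposal above
DIRECTORY, CONSTTABLE, FIXUP, BYTECODE, LINETABLE = 0, 1, 2, 3, 4

def make_directory(chunks):
    """chunks: list of (type, offset) for chunks 1..n of one segment.
    Entry 0 is added automatically and always describes the directory."""
    entries = [(DIRECTORY, 0)] + list(chunks)
    seen = set()
    for chunk_type, _ in entries:
        if chunk_type in seen:
            raise ValueError(f"duplicate chunk type {chunk_type} in segment")
        seen.add(chunk_type)
    return entries

def find_chunk(entries, chunk_type):
    """Return (chunk_id, offset) for the single chunk of this type, or None."""
    for chunk_id, (entry_type, offset) in enumerate(entries):
        if entry_type == chunk_type:
            return chunk_id, offset
    return None

def is_reserved(chunk_type):
    return chunk_type <= RESERVED_MAX
```

Since entries[chunk_id] is a plain index, looking up a known chunk number costs nothing extra; only a search by type has to scan the list.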

Dan Sugalski

Oct 29, 2002, 4:55:01 PM10/29/02

to Juergen Boemmels, Perl6 Internals

At 11:46 AM +0100 10/29/02, Juergen Boemmels wrote:
>Dan Sugalski <d...@sidhe.org> writes:
>
>> Well, on thinking a bit about this, there's no reason that we have to
>> worry--it's perfectly OK for us to declare, unconditionally, that
>> segment 0 is always bytecode, 1 line number info, and so on, with
>> everything after position X (for some value of X) left up in the
>> air. A bit dodgy, true, as it means that any new known segment types
>> we add in will be floating, but I don't think we're going to end up
>> with too many performance-critical pieces in the bytecode. (Arguably
>> it's just the bytecode itself, the symbols, and the constants, as the
>> rest are looked at under exceptional circumstances or on (rare) demand)
>
>*No*
>
>This really kills extensibility, or at least makes it very ugly. It
>needs to preallocate a certain number of segments.

No, it doesn't. It needs to preallocate a few entries in the TOC at
the start of the chunk, but that's it. Not that much waste, even if
some of the metadata's in the TOC.

The point is to have a file format that does what we need it
to--present executable bytecode data to the interpreter--as fast as
possible. Everything else is secondary to that. The rest of the
metadata's needed, so it's there, but access to it doesn't need to be
as fast.

Yes, it'll only be a few microseconds, but it's a few microseconds
versus a few bytes in the file. Disk space is less dear than cycles.

> Each of these has
>fixed semantics. Extending means consuming one of the preallocated
>fields, or using some segment beyond the preallocated area, but then it
>needs a type field. In fact the preallocated segments also have a
>type field: the position in the packfile.
>
>I'm fine with a numeric type for the core, and some extension
>type with named types if it is a speed issue, but I really don't like
>this positional approach. BTW: COFF and ELF use named sections.

Sure, but bell-bottoms are in, people drink absinthe, and there are
folks that do street luge in San Francisco. Just because it's done by
other systems doesn't make it the right answer.

Brent Dax

Oct 29, 2002, 5:41:55 PM10/29/02

to brian wheeler, boem...@physik.uni-kl.de, Perl6 Internals

brian wheeler:
# Is this really necessary? Seems like a chicken-and-egg
# thing: to know which chunk the directory is in, you need to
# read the directory.
# However, since you've defined that the first chunk (0) is
# always the directory, there's really no need to have it in
# the directory since you know it has to be the first chunk.

It's essentially there as padding, so that you can do
directory->entries[chunkid] to get the entry for chunkid.

# Out of curiosity, would I need separate segments if I was
# going to have multiple versions of the program (say, one
# debugging and one optimized) in the same file? It looks that
# way. Will the segment/chunk IDs be consistent between builds,
# and how do I know what they will be in advance (for dynamically
# loading the 'debugging' version on demand)?

*shrugs* Dan's the one who thinks multiple segments are important, not
me. :^)

Brian Wheeler

Oct 29, 2002, 5:04:11 PM10/29/02

to Brent Dax, boem...@physik.uni-kl.de, Perl6 Internals

Is this really necessary? Seems like a chicken-and-egg thing: to know
which chunk the directory is in, you need to read the directory.
However, since you've defined that the first chunk (0) is always the
directory, there's really no need to have it in the directory since you
know it has to be the first chunk.

Out of curiosity, would I need separate segments if I was going to
have multiple versions of the program (say, one debugging and one
optimized) in the same file? It looks that way. Will the segment/chunk
IDs be consistent between builds, and how do I know what they will be in
advance (for dynamically loading the 'debugging' version on demand)?

Brian

Juergen Boemmels

Nov 4, 2002, 5:16:01 PM11/4/02

to Dan Sugalski, Perl6 Internals

Dan Sugalski <d...@sidhe.org> writes:

[...]

> No, it doesn't. It needs to preallocate a few entries in the TOC at
> the start of the chunk, but that's it. Not that much waste, even if
> some of the metadata's in the TOC.
>
>
> The point is to have a file format that does what we need it
> to--present executable bytecode data to the interpreter--as fast as
> possible. Everything else is secondary to that. The rest of the
> metadata's needed, so it's there, but access to it doesn't need to be
> as fast.
>
>
> Yes, it'll only be a few microseconds, but it's a few microseconds
> versus a few bytes in the file. Disk space is less dear than cycles.

[...]

Last weekend I tried to implement something like this. It took several
attempts to get the segmented bytecode, the preallocated TOC entries
and the need for speed together.

The first idea was to have a global list of segments, each with its own
TOC. But this has the problem that the fixup between the different
segments needs an extra indirection, and it limits the extensibility to
adding chunks to a segment. A global table with a fixed number of
chunks per segment violates the rule of no arbitrary limits. Another
problem is that unpacking should happen as seldom as possible before
the first bytecode is executed; other segments should only be loaded on
demand.

My proposed solution is this:
There exists only one global TOC. This consists of a list of fixed-size
items, one for each chunk of data. If no word-size transformation or
endianization is necessary this can be memory-mapped and each item can be
accessed by array lookup.

Each chunk is a binary blob of data in the packfile identified by an
(offset, size) pair in the TOC. Chunks don't overlap. They also take a
type field which identifies the way to unpack the data.

Chunks can be grouped together. Therefore each directory item has a
skip field. The items that follow it, within the skip distance, belong
to its group. The group head is responsible for handling its children.
The bytecode can have its children at fixed positions, e.g. CONSTANT 1,
FIXUP 2, LINE_INFO 3, ...

Here is an example

items: 7
TOC:
    item1:
        offset: chunk1
        size:
        type: BYTECODE
        skip: 4
    item2:
        offset: chunk2
        size:
        type: CONSTANT
        skip: 1
    item3:
        offset: chunk3
        size:
        type: FIXUP
        skip: 1
    item4:
        offset: chunk4
        size:
        type: LINE_INFO
        skip: 1
    item5:
        offset: chunk5
        size:
        type: BYTECODE
        skip: 2
    item6:
        offset: chunk6
        size:
        type: CONSTANT
        skip: 1
    item7:
        offset: chunk7
        size:
        type: foobar
        skip: 1

chunk1:
    ...
chunk2:
    ...
....

The first bytecode chunk can be found very fast. The TOC is memory-mapped,
and the chunks are searched for the first top-level bytecode
segment. In a typical packfile this will be the first chunk.
If it turns out that we need _many_ segments, and they need to be
accessed in random order, it is possible to add a sibling table for
fast access to the individual bytecode segments. This is possible
without changing the general bytecode format, and without decreasing
performance.
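
The search over top-level items can be sketched as follows, using the (type, skip) values from the example TOC. Offsets and sizes are left out, indices are 0-based here (item1 is index 0), and the function names are invented for the illustration.

```python
# (type, skip) for item1..item7 of the example TOC.
TOC = [
    ("BYTECODE", 4), ("CONSTANT", 1), ("FIXUP", 1), ("LINE_INFO", 1),
    ("BYTECODE", 2), ("CONSTANT", 1),
    ("foobar", 1),
]

def top_level(toc):
    """Yield (index, type) for each group head, hopping over children via skip."""
    i = 0
    while i < len(toc):
        item_type, skip = toc[i]
        if skip < 1:
            raise ValueError("skip must be at least 1")
        yield i, item_type
        i += skip

def children(toc, head):
    """Items that belong to the group whose head is at index `head`."""
    return toc[head + 1 : head + toc[head][1]]

def first_bytecode(toc):
    """Index of the first top-level BYTECODE item, or None."""
    for i, item_type in top_level(toc):
        if item_type == "BYTECODE":
            return i
    return None

print(list(top_level(TOC)))
```

With this reading, a skip of 1 marks a leaf, and a scan for top-level segments touches three items of the example instead of seven.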

One side note: It might be nice to have individual version numbers for
each chunk of data. A change of core.ops does not change the packfile
format, only the data in the bytecode section is invalid. But this
might go too far.

Comments?