I've added in the infrastructure needed to implement fixed opcode numbers.
There's now a file ops.num that holds the opcode name/number pairs for all
ops with a fixed number. Right now it's only got two entries (one of which
really ought to be redone) but I'd like to get all the extant standard ops
in there.
ops2c.pl has been adjusted to number appropriately, and all tests pass.
Note that I have *not* adjusted the JIT's numbering of the opcodes, and
the JIT currently fails with the ops.num file that's checked in. Once it
works right (don't touch ops.num until then!), and volunteers are welcome,
we can fill the file in right and be done with it.
Dan
> I've added in the infrastructure needed to implement fixed opcode numbers.
> There's now a file ops.num that holds the opcode name/number pairs for all
> ops with a fixed number.
I don't know yet, what are the goals of this patch. There is not any
sign in the list, that ops should be numbered like that and so on ...
WTF
Second, you for sure did ignore all comments in core.ops and my
summaries, how various things are *working* now.
This patch breaks all predereferenced cores as well as dynamic opcode
libraries at first sight.
If you don't have really *very* strong arguments for this patch then
please just undo it - now - thanks.
> Dan
leo
The goals are to assign permanent numbers to the opcodes. What the
numbers are is generally irrelevant, but they must be constant across
all systems for the lifetime of parrot.
>Second, you for sure did ignore all comments in core.ops and my
>summaries, how various things are *working* now.
No, I didn't. All I did was run through the generated output and
strip out values, making sure that end was 0. There really shouldn't
be any other magic numbers for opcodes, though I see there are some.
I can understand the noop code, though I don't see much reason to
make it a magic number. The rest I'm not seeing any need for magic
numbers--certainly not for the event checking ops, nor for the
wrapper op.
If they're needed now, then this is the time to make them not needed..
>This patch breaks all predereferenced cores as well as dynamic opcode
>libraries at first sight.
And the JIT, yes. (Though I was unaware the prederef cores broke)
That's fine. Fixing them is simple enough to do.
>If you don't have really *very* strong arguments for this patch then
>please just undo it - now - thanks.
I do. It needs doing, and it needs doing now, before we make more
changes to the runloops, to make sure things aren't more difficult to
fix later. If it's just a matter of changing a few op numbers in the
ops.num file, then that's fine--this is the time to change them.
--
Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
d...@sidhe.org                         have teddy bears and even
                                      teddy bears get drunk
At first glance, gut reaction, that seems "really hard"(tm) and
"really limiting"(tm)...
since there's going to be a combination of dynamic oplibs that are
going to have to be dynamically numbered anyway. (right?)
So why not treat the core oplib as dynamic? (With the exception of
the handful that need to be fixed, like end.)
-R
> The goals are to assign permanent numbers to the opcodes. What the
> numbers are is generally irrelevant, but they must be constant across
> all systems for the lifetime of parrot.
That's fine, basically. But there are some points, that we might
consider:
* sort op numbers by usage, so that we have better code locality
* sort ops by category. This could be useful to disable one block
of ops (e.g. IO) for the safe core
* leave them as they are *but* consider the numbers fixed - we
add new ops only to one opsfile (e.g. new.ops), that get appended
to core.pm, so that all other op numbers are fixed.
* Do this change later, when we have complete opcode support - a lot
of e.g. string ops related to PMCs are just missing now.
There is currently no urgent need to have fixed opcode numbers - we
don't have Parrot 1.00 (or even 0.1.0).
*If* we renumber opcodes, then lib/Parrot/OpLib/core.pm should be sorted.
This file is the base (and holds the numbering) of all utilities using
ops. When this file is ordered to our needs, *no* other changes (and
code duplication) in other utils is needed. For this reason, I reverted
your patches to ops2c and jit2 - they are just in the wrong files.
>>Second, you for sure did ignore all comments in core.ops and my
>>summaries, how various things are *working* now.
> No, I didn't. All I did was run through the generated output and
> strip out values, making sure that end was 0. There really shouldn't
> be any other magic numbers for opcodes, though I see there are some.
> I can understand the noop code, though I don't see much reason to
> make it a magic number. The rest I'm not seeing any need for magic
> numbers--certainly not for the event checking ops, nor for the
> wrapper op.
We have exactly one magic opcode that is "end". The other opcodes are
*constant*. The "noop" is considered constant for "hysterical reasons"
and, as outlined in my summary on dynamic oplibs, can be used for
optimizations (replace some code with op_func_table[CORE_OPS_noop]).
The C<wrapper__>, C<prederef__>, and C<check_events__> opcodes are
constant too, because they are inserted for dynamic opcodes,
predereferencing, and for event checking (which is not rediffed yet).
I don't care what opcode numbers these have, they are defined in an
enum in oplib.h - but for debugging it's by far simpler to see an
opcode number 6 (prederef__) than to have 1227.
> If they're needed now, then this is the time to make them not needed..
They are needed and will be needed and they are constant - not magic.
>>This patch breaks all predereferenced cores as well as dynamic opcode
>>libraries at first sight.
> And the JIT, yes. (Though I was unaware the prederef cores broke)
> That's fine. Fixing them is simple enough to do.
Yes, see above: generate core.pm in the desired order.
leo
Why not this way:
Have a small number of _real_ core.ops which have fixed assigned
numbers below, say, 256. These ops never change during the lifetime of
parrot. All other libs are inited (not necessarily loaded) at bytecode
load time. The bytecode has a list of needed oplibs with the accompanying
base offset. The ops of that oplib are added to core starting at the base
offset.
This scheme is extensible: oplibs can append new ops to the end. If a
bytecode doesn't know anything about these new ops, it will overwrite
them with ops from another oplib. But that's not a problem, because
the bytecode does not need the new ops. Problems might arise with two
independent bytecodes using different versions of the oplibs. Then you
either have to renumber the bytecode or use two independent
opcode tables.
We can for sure define a base-lib which has everything that's now in core,
and which is loaded by default starting at 256. A typical bytecode which
just uses the base-lib has no extra load cost, because the lib is
already there.
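
A minimal sketch of the base-offset scheme described above, written in Python for brevity rather than in Parrot's C. The function name `build_op_table`, the oplib names and the op names are all hypothetical; only the "below 256" boundary and the idea of a per-bytecode list of (oplib, base offset) pairs come from the proposal.

```python
CORE_LIMIT = 256  # the "say 256" boundary from the proposal


def build_op_table(core_ops, installed_oplibs, needed):
    """Return a dict mapping op number -> op name.

    core_ops: names of the fixed core ops, numbered from 0.
    installed_oplibs: oplib name -> list of op names in that library.
    needed: (oplib name, base offset) pairs as recorded in the bytecode.
    """
    if len(core_ops) > CORE_LIMIT:
        raise ValueError("too many core ops")
    table = dict(enumerate(core_ops))
    for lib_name, base in needed:
        if base < CORE_LIMIT:
            raise ValueError(f"{lib_name}: base {base} collides with core ops")
        for offset, op_name in enumerate(installed_oplibs[lib_name]):
            # A later oplib may overwrite slots of an earlier one.
            table[base + offset] = op_name
    return table


# Hypothetical installed libraries; "mul" was appended in a newer "base".
installed = {
    "base": ["add", "sub", "mul"],
    "io": ["open", "close"],
}
# This bytecode was compiled when "base" had only two ops, so it placed
# "io" directly behind them at 258.
table = build_op_table(["end", "noop"], installed, [("base", 256), ("io", 258)])
```

Here slot 258 ends up holding `open` rather than the newer `mul`, which is harmless for this bytecode because it never refers to `mul`.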
Comments
boe
--
Juergen Boemmels boem...@physik.uni-kl.de
Fachbereich Physik Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F 23 F6 C7 2F 85 93 DD 47
Why is it hard? If we have a native interface and a well designed set of
basic opcodes, what else do we need?
My opinion on this has not changed in 2 years of Parrot.
I see no benefit to writing a VM if we are going to allow
all these "added" opcodes.
Only if the opcode libs are implemented in pure Parrot, where
they can be packaged along with the bytecode file, do I see
a real win for us, but even then, I can't understand why they
should be ops rather than subs or such. I'm afraid we
will just end up with the Perl5 situation where so many modules
require C to build and install, and we are also making the VM
more and more complex.
I've asked this before: Please, someone give me an example
where a dynamic opcode lib gives us something
that a well designed set of core ops and an extension
interface does not.
Please also include in your example the semantics of packaging
a portable bytecode.
-Melvin
> > The goals are to assign permanent numbers to the opcodes. What the
> > numbers are is generally irrelevant, but they must be constant across
> > all systems for the lifetime of parrot.
>
> At first glance, gut reaction, that seems "really hard"(tm) and
> "really limiting"(tm)...
Luckily, on second glance, it turns out to be neither. :)
While new ops get added, old ones rarely, if ever, get removed. Perl 5's
set has been stable, as has Python's, the JVM, the x86, Alpha, SPARC, and
POWER's...
> since there's going to be a combination of dynamic oplibs that are
> going to have to be dynamically numbered anyway. (right?)
Yes and no. Dynamic ops will go into slots, put in by name, with their
number fixed at compile-time. We'd prefer to do this as little as possible
for startup speed reasons, as it does slow down instantiating an
interpreter.
> So why not treat the core oplib as dynamic? (With the exception of
> the handful that need to be fixed, like end.)
Speed and complexity mainly. There's no win from the flexibility.
Dan
> Dan Sugalski <d...@sidhe.org> wrote:
> > At 1:07 AM +0200 10/17/03, Leopold Toetsch wrote:
> >>
> >>I don't know yet, what are the goals of this patch. There is not any
> >>sign in the list, that ops should be numbered like that and so on ...
>
> > The goals are to assign permanent numbers to the opcodes. What the
> > numbers are is generally irrelevant, but they must be constant across
> > all systems for the lifetime of parrot.
>
> That's fine, basically. But there are some points, that we might
> consider:
None of the numbers (well, OK, besides end) are fixed. I threw them in the
file in alphabetic order and renumbered specifically to make sure the JIT
broke good and proper. (Any renumbering broke the JIT, so I wanted to
make sure that they didn't get overlooked) I don't much care what
order they're in the final
> * Do this change later, when we have complete opcode support - a lot
> of e.g. string ops related to PMCs are just missing now.
> There is currently no urgent need to have fixed opcode numbers - we
> don't have Parrot 1.00 (or even 0.1.0).
Missing ops aren't a big deal--any op that isn't in the list gets a number
assigned to it, so there shouldn't be anything that needs to be done for
new ops besides occasionally going through and sticking them in the ops
file.
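
A minimal sketch of that numbering rule, in Python for brevity (the real tools are Perl scripts such as ops2c.pl). The `name number` line format for ops.num, the policy of handing unlisted ops the lowest free number, and the op names below are assumptions made for illustration, not a description of the checked-in file.

```python
def parse_ops_num(text):
    """Parse 'name number' pairs; blank lines and '#' comments are skipped."""
    fixed = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, number = line.split()
        fixed[name] = int(number)
    return fixed


def assign_numbers(op_names, fixed):
    """Listed ops keep their fixed number; the rest get the lowest free ones."""
    used = set(fixed.values())
    numbers = {}
    next_free = 0
    for name in op_names:
        if name in fixed:
            numbers[name] = fixed[name]
            continue
        while next_free in used:
            next_free += 1
        numbers[name] = next_free
        used.add(next_free)
    return numbers


# Hypothetical contents, not the real ops.num.
OPS_NUM = """
# name  number
end   0
noop  1
"""

numbers = assign_numbers(
    ["end", "noop", "add_i_i_i", "print_s"], parse_ops_num(OPS_NUM)
)
```

Moving an op from "assigned" to "fixed" is then just a matter of copying its current number into the file.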
> *If* we renumber opcodes, then lib/Parrot/OpLib/core.pm should be sorted.
> This file is the base (and holds the numbering) of all utilities using
> ops. When this file is ordered to our needs, *no* other changes (and
> code duplication) in other utils is needed. For this reason, I reverted
> your patches to ops2c and jit2 - they are just in the wrong files.
Put them *back*, please, unless you're going to fix core.pm. This is one
of those cases where, like it or not, you get to deal with what I've done.
If the numbering should go into core.pm, fine, but until core.pm
returns explicit numbers (and nothing should be counting on the
ordering from it) leave it where it is.
> We have exactly one magic opcode that is "end". The other opcodes are
> *constant*. The "noop" is considered constant for "hysterical reasons"
> and, as outlined in my summary on dynamic oplibs, can be used for
> optimizations (replace some code with op_func_table[CORE_OPS_noop]).
> The C<wrapper__>, C<prederef__>, and C<check_events__> opcodes are
> constant too, because they are inserted for dynamic opcodes,
> predereferencing, and for event checking (which is not rediffed yet).
There's no need for wrapper__--the proper op numbers should be inserted
when the code is generated. I'll go dig, but I don't understand the reason
for prederef__ since anything the deref core needs should be done at
runtime. check_events__ is just another op and, once again, should just
get thrown in as need be.
Dan
> Why not this way:
>
> Have a small number of _real_ core.ops which have fixed assigned
> numbers below, say, 256. These ops never change during the lifetime of
> parrot. All other libs are inited (not necessarily loaded) at bytecode
> load time. The bytecode has a list of needed oplibs with the accompanying
> base offset. The ops of that oplib are added to core starting at the base
> offset.
Yep, that's the plan. (And has been for years, though it has been quite a
while since the discussion came up last)
Dan
I know we've had these discussions before but refresh my memory:
I am confused between:
a) Dynamic oplibs that are "on-demand" but always included in Parrot
b) Dynamic oplibs that are add-ons and not included in Parrot
I think (a) is a big win, especially for memory sensitive embedded platforms.
(b) is a big lose. It invites people to spend time writing custom ops rather
than reimplement their language in pure Parrot.
(b) also muddies the water when you try to explain to someone why
Parrot bytecodes are "sorta portable."
-Melvin
> At 12:53 PM 10/17/2003 +0200, Juergen Boemmels wrote:
> >Robert Spier <rsp...@pobox.com> writes:
> >
> > > > The goals are to assign permanent numbers to the opcodes. What the
> > > > numbers are is generally irrelevant, but they must be constant across
> > > > all systems for the lifetime of parrot.
> > >
> > > At first glance, gut reaction, that seems "really hard"(tm) and
> > > "really limiting"(tm)...
>
> Why is it hard? If we have a native interface and a well designed set of
> basic opcodes, what else do we need?
>
> My opinion on this has not changed in 2 years of Parrot.
> I see no benefit to writing a VM if we are going to allow
> all these "added" opcodes.
There are three reasons, two technical and one political.
The first, and perhaps least important (though we'll see with that) is to
allow very low-overhead subs. The current calling conventions are fine,
but there are going to be some time-critical cases that can't be rolled
into a single sub invocation. Making it an op function cuts out a lot of
the overhead.
The second reason is the political one -- it gives language designers an
out to modify the machine if they really need to do so.
The third is a variant of sorts on the second--there are languages that
are radically different from our core set that will likely be best served
by having a different set of ops handy. Prolog and Haskell spring to mind
and, while I don't know that they'll need a new set I can certainly see
the possibility.
Yes, it does mean that if library code written in ML (or whatever) uses
the ML ops that you'll need the ML op pack, but that's fine. It's a bit
more extreme than requiring an installed bytecode library, since hopefully
we'll be able to avoid that for standalone executables, but it's no more
extreme than requiring a C library of one sort or another to be installed
for a program to run. (While Parrot is going to reduce the need for C it
won't eliminate it, and I don't see much difference between needing the
PDL C extension and the ML op library for a program to run)
Dan
> At 08:55 AM 10/17/2003 -0400, Dan Sugalski wrote:
> >On Fri, 17 Oct 2003, Juergen Boemmels wrote:
> >
> > > Why not this way:
> > >
> > > Have a small number of _real_ core.ops which have fixed assigned
> > > numbers below, say, 256. These ops never change during the lifetime of
> > > parrot. All other libs are inited (not necessarily loaded) at bytecode
> > > load time. The bytecode has a list of needed oplibs with the accompanying
> > > base offset. The ops of that oplib are added to core starting at the base
> > > offset.
> >
> >Yep, that's the plan. (And has been for years, though it has been quite a
> >while since the discussion came up last)
>
> I know we've had these discussions before but refresh my memory:
>
> I am confused between:
>
> a) Dynamic oplibs that are "on-demand" but always included in Parrot
> b) Dynamic oplibs that are add-ons and not included in Parrot
The only difference is that opcodes in the a) set are in libraries on-disk
that we build when we build parrot, and the ops in the b) set are in
libraries on disk that we installed potentially after the fact.
Which op libs are in which set depends on the distribution you're running.
It's definitely possible that we'll ship an all-in-one distrib with a big
set of optional ops, just as it's possible we'll ship a small one and
you'll need to install one or more language-specific sets of ops after the
fact.
Dan
> Juergen Boemmels <boem...@physik.uni-kl.de> wrote:
>
> > Have a small number of _real_ core.ops which have fixed assigned
> > numbers below say 256.
>
> The problem with this approach is the JIT/EXEC subsystem. Dynamically
> loaded oplibs and JIT don't play together. To make this work, it would
> need probably a total rewrite of JIT code.
> The switched core has some overhead too, you can't expand a switch
> statement like the other table based opcode dispatches.
The switched core's not that big a deal for this -- the default: case just
does a lookup in the table and dispatches that way.
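
A minimal sketch of that fallback, in Python for brevity. The if/elif chain stands in for the C switch statement and the final `else` for its `default:` case. The op numbers, the accumulator machine and the name `run_switched` are hypothetical, not Parrot's actual switched core.

```python
def run_switched(code, op_func_table):
    """Run (opnum, arg) pairs; core ops are inlined, the rest use the table."""
    acc = 0
    for opnum, arg in code:
        if opnum == 0:        # end
            break
        elif opnum == 1:      # noop
            pass
        elif opnum == 2:      # hypothetical core op: add a constant
            acc += arg
        else:                 # the default: case, look the op up and call it
            acc = op_func_table[opnum](acc, arg)
    return acc


# Op 300 is a hypothetical dynamically loaded op that multiplies.
dynamic_ops = {300: lambda acc, arg: acc * arg}
result = run_switched([(2, 3), (300, 4), (1, 0), (0, 0), (2, 100)], dynamic_ops)
```

Only ops outside the inlined cases pay for the extra table lookup and function call.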
Dan
> I've asked this before: Please, someone give me an example
> where a dynamic opcode lib gives us something
> that a well designed set of core ops and an extension
> interface does not.
Can you explain the difference? Dynamic opcode libraries are extensions
to our core opcode set. It's debatable what should be an opcode and what is
better as a callable function. But opcodes are the primary execution parts of
the VM. When speed matters, a piece of code is better as a separate opcode
than as a function bound to some PMC.
> -Melvin
leo
> Have a small number of _real_ core.ops which have fixed assigned
> numbers below say 256.
The problem with this approach is the JIT/EXEC subsystem. Dynamically
loaded oplibs and JIT don't play together. To make this work, it would
need probably a total rewrite of JIT code.
The switched core has some overhead too, you can't expand a switch
statement like the other table based opcode dispatches.
> Comments
> boe
leo
>> *If* we renumber opcodes, then lib/Parrot/OpLib/core.pm should be sorted.
>> This file is the base (and holds the numbering) of all utilities using
>> ops. When this file is ordered to our needs, *no* other changes (and
>> code duplication) in other utils is needed. For this reason, I reverted
>> your patches to ops2c and jit2 - they are just in the wrong files.
> Put them *back*, please, unless you're going to fix core.pm.
I'll reorder core.pm (generation). Then jit2h (and other utils) aren't
broken. This puts the renumbering in one place. Your approach was just
wrong IMHO.
> If the numbering should go into core.pm, fine, but until core.pm
> returns explicit numbers (and nothing should be counting on the
> ordering from it) leave it where it is.
core.pm is generated by ops2pm and is used by all utilities that deal
with opcode libraries. So generating core.pm with the fixed numbering of
ops.num seems to be by far the simplest approach and doesn't break all
possible run cores.
>> We have exactly one magic opcode that is "end". The other opcodes are
>> *constant*.
> There's no need for wrapper__--the proper op numbers should be inserted
> when the code is generated.
When you run some code e.g. with the CGP core and you load an oplib,
that doesn't have a CGP flavor - only functions, then the wrapper__
opcode inside the CGP core calls the dynamic opcode function of the
dynamic oplib.
Or when running -j: JIT emits a call to the wrapper__ op, that then
executes the real function. The wrapper__ op might go away *if* we
demand, that all oplibs are generated in all run core flavors, and *if*
the oplib PMC is in the metadata, which depends on the freeze/thaw
interface. For now the wrapper__ is needed and has to be a constant
(well known) number.
> ... I'll go dig, but I don't understand the reason
> for prederef__ since anything the deref core needs should be done at
> runtime.
The prederefed copy of the bytecode is filled with the prederef__
opcode. When executed, it calls do_prederef, which fills in the real
function/opcode label/opcode number. Before this change, predereferencing
was done in advance for the CGP and switch cores. This isn't possible any
more because of dynamic oplibs. Predereferencing must be done just in
time.
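
A minimal sketch of just-in-time predereferencing as described above, in Python for brevity. Every slot of the predereferenced copy starts out holding a stand-in for prederef__, which patches its own slot with the real op function on first execution and then runs it. The (state, pc) calling convention, the three toy ops and the name `run_prederef` are assumptions for illustration, not Parrot's real interfaces.

```python
def run_prederef(bytecode, op_func_table):
    """Run bytecode through a lazily predereferenced copy.

    Each op function takes (state, pc) and returns the next pc, or None
    to stop. Returns (state, number of slots that had to be resolved).
    """
    state = []
    resolved = 0

    def prederef_op(state, pc):
        # Stand-in for prederef__: patch this slot, then run the real op.
        nonlocal resolved
        real_op = op_func_table[bytecode[pc]]
        slots[pc] = real_op
        resolved += 1
        return real_op(state, pc)

    slots = [prederef_op] * len(bytecode)
    pc = 0 if bytecode else None
    while pc is not None:
        pc = slots[pc](state, pc)
    return state, resolved


def op_end(state, pc):
    return None


def op_push(state, pc):
    state.append(1)
    return pc + 1


def op_loop(state, pc):
    # Jump back to the start until three items have been pushed.
    return 0 if len(state) < 3 else pc + 1


ops = {0: op_end, 1: op_push, 2: op_loop}
state, resolved = run_prederef([1, 2, 0], ops)
```

The loop body runs three times, but each slot is resolved only once; later passes call the real op functions directly.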
> ... check_events__ is just another op and, once again, should just
> get thrown in as need be.
All these are plain ops, nothing special, except that their number is
defined in oplib.h. If ops renumbering patches/generates this enum with
matching opcode numbers all will continue to work.
But, as outlined, for debugging it's just simpler to have some small
numbers than 1227, so I'd prefer to have these ops (0-8) just in that
order.
> Dan
leo
(Note: I'd prefer to stay away from AST discussion here. I'm aware that we
eventually wish to pass AST directly to IMCC, but I'd like to shelve
that for a different thread.)
1. .class, .field and .method directive support
These will have to change the packfile format because we need to create the
class layout and symbol table at assembly time. We can probably do this
without designing the Class PMC yet; we'll just have to fake it with a Hash PMC.
I'd rather have code like this:
    .class Foo
        .field f1
        .field f2
        .method Bar proto
            #body
            ...
        .endmethod
        # No body
        .nativemethod Baz noproto
    .endclass
Than this:
    .sub _init
        $P0 = new Class
        $P0[":class"] = "TestClass"
        store_global "TestClass", $P0
        # field
        $P0["i"] = 0
        # field
        $P0["s"] = 0
        $I0 = addr _TestClass__SampleMethod
        $P1 = new Sub
        $P1 = $I0
        # method
        $P0["SampleMethod"] = $P1
        end
    .end
    #method Main
    .sub _TestClass__SampleMethod
        ...
    .end
Now you tell me which you'd rather debug. :)
Implicitly this creates all symbols for the class and at load time the correct
PMCs will be created (Method PMCs for the methods, plain types or PMCs for
the fields) and correct method addresses setup.
This abstracts how we implement classes and objects into the IMCC
compiler so we can change it without too much breakage to the
high level languages.
2. Abstraction of subroutine call and return mechanisms based on
compiler directives. I haven't completely figured out how this will look,
but the general idea is, given the correct directive to .sub or .method,
we should be able to hide all the details of the whole .pcc_call thing.
This is less important than (1) but I think it is warranted. Leo and I
already discussed this and we both think it is a good idea.
Probably rather than .sub and .pcc_sub we combine both into
.sub and provide directives like (proto, nonproto, parrotcall, fastcall)
where parrotcall might be the standard continuation calling convention
and the default for any .sub declared with no convention.
I will most likely start on (1) very soon as I'm to the point with Cola
that I need it and I don't like the hash code I'm currently emitting
directly from the code generator to make classes and objects.
My goal is to abstract enough so that PIR could be retargetted with
little pain.
-Melvin
I think we both know the difference. The ops are inlined, fast, cacheable
and jitted. Extensions may not be. If an extension is pure Parrot, then
it can also be jitted. The biggest overhead I see is when we have to
fetch the routine through a PMC lookup, that is too slow.
Here is the crux: This isn't a VM designed for raw speed. It is dynamic and
symbolic, and optimized in the right places, but that does not mean we have
to cut corners and make it messy. I can only see justification for extending
the opcode set in the case of some weird language that has something
Parrot doesn't support. In that case, I still say the proper thing to do is
patch Parrot and include those ops in the standard distro.
Maybe what we are really debating is: What do we include in the standard Parrot
distribution. The fact that ops sit in separate libs doesn't bug me so much
as the idea of lots of little oplibs sitting on CPAN.
-Melvin
> Juergen Boemmels <boem...@physik.uni-kl.de> wrote:
>
> > Have a small number of _real_ core.ops which have fixed assigned
> > numbers below say 256.
>
> The problem with this approach is the JIT/EXEC subsystem. Dynamically
> loaded oplibs and JIT don't play together. To make this work, it would
> need probably a total rewrite of JIT code.
I'm really no expert in the JIT system. But from my last step-thru
session in the debugger I remember that build_asm calls out to a
function based on the opcode. Ah, here:
    (op_func[cur_opcode_byte].fn) (jit_info, interpreter);
At build_asm time the oplib must already be loaded. The oplib must
then have functions for JIT-emitting the code, otherwise it should just
emit calls to the slow core functions.
This means that the author of an oplib must also include a
JIT-emitting table (for all supported platforms). This raises the
level for writing oplibs, but the worst case is that the new oplib
under jit is as slow as the function-core (All ops call
functions). Ideally the JIT-emitting functions could be mechanically
created out of the *.ops files.
> The switched core has some overhead too, you can't expand a switch
> statement like the other table based opcode dispatches.
That's definitely a problem.
But this can be solved by first finding out to which oplib a code
belongs (This can be done really fast by using bisection or something
similar). Then call the right switch function and stay in this
function as long as you are in this oplib.
Pseudocode:
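    while (next_op) {
        /* leave this switch function once op falls outside the
           oplib's range [base, base + num_ops) */
        if (op < base || op >= base + num_ops) return;
        switch (op - base) {
            ...
        }
    }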
As oplib-switches are slow it might be a good idea to include the core
ops to the switch function, so the core is always dispatched fast.
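
A minimal sketch of the lookup step, in Python for brevity. It finds the owning oplib by bisection over the sorted base offsets and only repeats the search when an op falls outside the current library's range. The (name, base, ops) tuples, the library contents and the function names are hypothetical.

```python
import bisect


def find_oplib(libs, bases, op):
    """Return the (name, base, ops) tuple whose number range contains op."""
    i = bisect.bisect_right(bases, op) - 1
    if i < 0:
        raise LookupError(f"no oplib for op {op}")
    name, base, ops = libs[i]
    if op >= base + len(ops):
        raise LookupError(f"no oplib for op {op}")
    return libs[i]


def run(code, oplibs):
    """Dispatch each op number; count how often the current oplib changes."""
    libs = sorted(oplibs, key=lambda lib: lib[1])
    bases = [lib[1] for lib in libs]
    trace, switches, current = [], 0, None
    for op in code:
        if current is None or not (current[1] <= op < current[1] + len(current[2])):
            current = find_oplib(libs, bases, op)   # the slow path
            switches += 1
        trace.append(current[2][op - current[1]])   # the per-oplib "switch"
    return trace, switches


# Hypothetical oplibs with their base offsets.
oplibs = [
    ("core", 0, ["end", "noop"]),
    ("base", 256, ["add", "sub", "mul"]),
    ("ml", 512, ["cons"]),
]
trace, switches = run([256, 257, 258, 512, 1, 0], oplibs)
```

Runs of ops from the same library stay on the fast path; only a change of library costs a bisection.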
> > Comments
> > boe
>
> leo
> Put them *back*, please, unless you're going to fix core.pm.
Done. ops2pm.pl now has the renumbering. It's simpler and cleaner.
Now that 1237 opcodes have fixed numbers, shall we remove the
fingerprinting? Currently we just don't read PBCs when the
fingerprint (generated from core.pm) doesn't match.
> Dan
leo
We should keep the fingerprinting, since we can still have an issue where
new bytecode can't run on old interpreters, but it's probably time to take
another look at how we set the fingerprint.
Dan
The fingerprint is an ugly hack:
It builds an MD5 hash over the signatures and stores the first 10
bytes in the reserved slots of the parrot header. It was introduced
because the bytecode numbers were floating, and each
change led to hard-to-debug errors. Now that these opcodes are fixed,
the fingerprint should go.
If we want to keep the fingerprint it should be generated from the
ops.num file, and it should be moved out of the header. The only
reason I put it there was that at that time we used the non-extensible
version 0 bytecode format and had two independent ways of creating
assembly (which is now dead).
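
A minimal sketch of such a fingerprint, in Python for brevity. It hashes a list of op signatures with MD5 and keeps the first 10 bytes, as described above. The exact signature strings and the newline separator are assumptions; the real generator may feed the hash differently.

```python
import hashlib

FINGERPRINT_SIZE = 10  # bytes kept from the 16-byte MD5 digest


def fingerprint(op_signatures):
    """Return the first FINGERPRINT_SIZE bytes of an MD5 over the signatures."""
    md5 = hashlib.md5()
    for signature in op_signatures:
        md5.update(signature.encode("ascii"))
        md5.update(b"\n")  # separator so adjacent names cannot run together
    return md5.digest()[:FINGERPRINT_SIZE]


# Hypothetical signatures, in opcode-number order.
old = fingerprint(["end", "noop", "add_i_i_i"])
same = fingerprint(["end", "noop", "add_i_i_i"])
renumbered = fingerprint(["end", "add_i_i_i", "noop"])
```

Because the order of the signatures feeds into the hash, any renumbering changes the fingerprint, which is what made it useful while the numbers were floating.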
bye
boe
>> We should keep the fingerprinting, since we can still have an issue where
>> new bytecode can't run on old interpreters, but it's probably time to take
>> another look at how we set the fingerprint.
> If we want to keep the fingerprint it should be generated from the
> ops.num file,
Appending opcodes to ops.num doesn't break existing byte code. I'm
thinking of this: We have a new file named COMPATIBLE or such, and do a
MD5 sum over this file. Whenever we make an incompatible change that
breaks existing byte code, the change is documented in that file, which
automatically invalidates existing PBCs.
> ... and it should be moved out of the header.
Yes. I'm thinking of a Version PMC, living in the metadata.
VTABLE_is_same() could check whether Parrot's and the PBC's versions
conflict. VTABLE_cmp() could check whether a loaded extension or module
is compatible. The Version PMC could hold a STRING or an Array of
STRINGs that represent the version or version range(s).
Proposals for such version checking are very welcome.
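
One possible shape for such a check, sketched in Python for brevity. The `X.Y.Z` and `X.Y.Z-A.B.C` string formats, the function names and the version numbers are assumptions for illustration; nothing here reflects an existing Version PMC.

```python
def parse_version(text):
    """Turn 'X.Y.Z' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_compatible(interp_version, accepted):
    """accepted: strings that are either 'X.Y.Z' or a range 'X.Y.Z-A.B.C'."""
    version = parse_version(interp_version)
    for entry in accepted:
        low, _, high = entry.partition("-")
        high = high or low  # a single version is a range of one
        if parse_version(low) <= version <= parse_version(high):
            return True
    return False


# Hypothetical ranges stored with a PBC.
pbc_accepts = ["0.0.11-0.0.13"]
ok = is_compatible("0.0.12", pbc_accepts)
too_new = is_compatible("0.1.0", pbc_accepts)
```

Storing several entries lets a PBC declare itself valid for disjoint version ranges.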
> bye
> boe
leo