Hi all, I have a newbie question. If the answer exists in a doc, just point the way (I browsed the docs directory). What is the design rationale for so many opcodes in parrot? What are the criteria for adding/deleting them?
Matt Greenwood wrote: > I have a newbie question. If the answer exists in a doc, just > point the way (I browsed the docs directory). What is the design > rationale for so many opcodes in parrot?
Let me try as another newbie... ;-)
Since the opcodes of parrot are not directly supported by any existing hardware, at least not now ;-), they have to be mapped to native code during execution. This costs something per parrot-operation. So if there are many different opcodes in parrot with powerful functionality behind them, this overhead does not hurt so much, because a parrot instruction gets a lot of stuff done. At least I heard this kind of explanation for Perl5, which uses something slightly like parrot internally as well.
Maybe this reduces the answer by the real experts to a yes/no? ;-)
Matt Greenwood <Matt.Greenw...@twosigma.com> wrote: > Hi all, > I have a newbie question. If the answer exists in a doc, just > point the way (I browsed the docs directory). What is the design > rationale for so many opcodes in parrot?
We have four different register types. They have to be covered by opcode, which leads to a lot of opcode permutations:
$ grep -w add docs/ops/math.pod
=item B<add>(inout INT, in INT)
=item B<add>(inout NUM, in INT)
=item B<add>(inout NUM, in NUM)
=item B<add>(in PMC, in INT)
=item B<add>(in PMC, in NUM)
=item B<add>(in PMC, in PMC)
=item B<add>(out INT, in INT, in INT)
=item B<add>(out NUM, in NUM, in INT)
=item B<add>(out NUM, in NUM, in NUM)
=item B<add>(in PMC, in PMC, in INT)
=item B<add>(in PMC, in PMC, in NUM)
=item B<add>(in PMC, in PMC, in PMC)
We could of course provide only the very last one, but that would prohibit any optimizations. Opcodes with native types running in the JIT code are many times faster than their PMC counterparts.
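A rough sketch of where the permutations come from, using the twelve signatures listed in the grep output above. The suffix naming (add_i_i_i and so on) is an assumption made for illustration, not something taken from the docs quoted here:

```python
# Sketch: each add signature becomes its own specialized opcode name.
# The signatures are copied from the grep output above; the name-mangling
# scheme (add_i_i_i etc.) is an assumption made for illustration.

ADD_SIGNATURES = [
    ("INT", "INT"),
    ("NUM", "INT"),
    ("NUM", "NUM"),
    ("PMC", "INT"),
    ("PMC", "NUM"),
    ("PMC", "PMC"),
    ("INT", "INT", "INT"),
    ("NUM", "NUM", "INT"),
    ("NUM", "NUM", "NUM"),
    ("PMC", "PMC", "INT"),
    ("PMC", "PMC", "NUM"),
    ("PMC", "PMC", "PMC"),
]

# One letter per register type that appears in the add signatures.
SUFFIX = {"INT": "i", "NUM": "n", "PMC": "p"}


def specialized_name(op, signature):
    """Build a per-signature name such as 'add_i_i_i'."""
    return "_".join([op] + [SUFFIX[t] for t in signature])


ADD_VARIANTS = [specialized_name("add", sig) for sig in ADD_SIGNATURES]

if __name__ == "__main__":
    for name in ADD_VARIANTS:
        print(name)
```

Someone writing assembly would still type just "add"; only the assembled form would carry the distinct names.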
> ... What are the criteria for > adding/deleting them?
[Matt == Matt.Greenw...@twosigma.com on Thu, 11 Mar 2004 18:06:56 -0500]
Matt> What is the design rationale for so many opcodes in parrot?
Completeness and performance. Many of the opcodes are type-specific variants of other multi-type opcodes.
Given that 99+% of parrot code will be automatically generated from language compilers, the performance benefits of additional specialized opcodes outweigh the inability to keep all the opcodes in a human's head at once.
Matt> What are the criteria for adding/deleting them?
Consensus among parrot developers. To be an opcode, a particular function should really need to be implemented in C to work properly.
I completely agree that you would have multiple variants *of the same* opcode for the different types. I guess the question I was (too delicately) asking is why you have opcodes for things that are usually in standard libraries, and even some that aren't. For example: fact, exsec... And why have both concat and add...?
> -----Original Message----- > From: Leopold Toetsch [mailto:l...@toetsch.at] > Sent: Friday, March 12, 2004 2:07 AM > To: Matt Greenwood > Cc: perl6-intern...@perl.org > Subject: Re: newbie question....
>Hi all, > I have a newbie question. If the answer exists in a doc, just >point the way (I browsed the docs directory). What is the design >rationale for so many opcodes in parrot? What are the criteria for >adding/deleting them?
Whether we have a lot or not actually depends on how you count. (Last time I checked the x86 still beat us, but that was a while back) In absolute, unique op numbers we have more than pretty much any other processor, but that is in part because we have *no* runtime op variance.
For example, if you look you'll see we have 28 binary "add" ops. .NET, on the other hand, only has one, and most hardware CPUs have a few, two or three. However... for us each of those add ops has a very specific, fixed, and invariant parameter list. The .NET version, on the other hand, is specified to be fully general, and has to take the two parameters off the stack and do whatever the right thing is, regardless of whether they're platform ints, floats, objects, or a mix of these. With most hardware CPUs you'll find that several bits in each parameter are dedicated to identifying the type of the parameter (int constant, register number, indirect offset from a register). In both cases (.NET and hardware) the engine needs to figure out *at runtime* what kind of parameters it's been given. Parrot, on the other hand, figures it out *at compile time*.
Now, for hardware this isn't a huge deal--it's a well-known problem, they've a lot of transistors (and massive parallelism) to throw at it, and it only takes a single pipeline stage to go from the raw to the decoded form. .NET does essentially the same thing, decoding the parameter types and getting specific, when it JITs the code. (And runs pretty darned slowly when running without a JIT, though .NET was designed to have a JIT always available)
Parrot doesn't have massive parallelism, nor are we counting on having a JIT everywhere or in all circumstances. We could waste a bunch of bits encoding type information in the parameters and figure it all out at runtime, but... why bother? Since we *know* with certainty at compile (or assemble) time what the parameter types are, there's no reason to not take advantage of it. So we do.
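To make the contrast concrete, here is a minimal sketch of the two strategies. Every name in it (add_runtime, assemble_add, the register layout) is hypothetical and only illustrates the idea; it is not how parrot's assembler or runloop is actually written:

```python
# Sketch of the two strategies. All names here are hypothetical.

def add_runtime(a, b):
    """One general 'add': inspects operand types on every execution."""
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return float(a) + float(b)
    raise TypeError("unsupported operand types for add")


# Specialized variants: the op bodies contain no type checks at all.
def add_i_i_i(regs, dst, x, y):
    regs["I"][dst] = regs["I"][x] + regs["I"][y]


def add_n_n_n(regs, dst, x, y):
    regs["N"][dst] = regs["N"][x] + regs["N"][y]


VARIANTS = {
    ("I", "I", "I"): add_i_i_i,
    ("N", "N", "N"): add_n_n_n,
}


def assemble_add(dst, x, y):
    """Pick the variant once, from register names such as 'I0' or 'N2'."""
    kinds = tuple(r[0] for r in (dst, x, y))
    op = VARIANTS[kinds]  # decided at assemble time, not at run time
    return op, int(dst[1:]), int(x[1:]), int(y[1:])


def run(program, regs):
    """Execute pre-resolved ops; the loop never looks at operand types."""
    for op, dst, x, y in program:
        op(regs, dst, x, y)


if __name__ == "__main__":
    regs = {"I": [0, 2, 3], "N": [0.0, 1.5, 2.25]}
    run([assemble_add("I0", "I1", "I2"), assemble_add("N0", "N1", "N2")], regs)
    print(regs)
```

The type decision in assemble_add happens once per instruction in the source, while add_runtime repeats it on every execution.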
It's also important to note that there's no less code involved (or, for the hardware, complexity) doing it our way or the decode-at-runtime way--all the code is still there in every case, since we all have to do the same things (add a mix of ints, floats, and objects, with a variety of ways of finding them) so there's no real penalty to doing it our way. It actually simplifies the JIT some (no need to puzzle out the parameter types), so in that we get a win over other platforms since JIT expenses are paid by the user every run, while our form of decoding's only paid when you compile.
Finally, there's the big "does it matter, and to whom?" question. As someone actually writing parrot assembly, it looks like parrot only has one "add" op--when emitting pasm or pir you use the "add" mnemonic. That it gets qualified and assembles down to one variant or another based on the (fixed at assemble time) parameters is just an implementation detail. For those of us writing op bodies, it just looks like we've got an engine with full signature-based dispatching (which, really, we do--it's just a static variant), so rather than having to have a big switch statement or chain of ifs at the beginning of the add op we just write the specific variants identified by function prototype and leave it to the engine to choose the right variant.
Heck, we could, if we chose, switch over to a system with a single add op with tagged parameter types and do runtime decoding without changing the source for the ops at all--the op preprocessor could glom them all together and autogenerate the big switch/if ladder at the head of the function. (We're not going to, of course, but we could. Heck, it might be worth doing if someone wanted to translate parrot's interpreter engine to hardware, though it'd have bytecode that wasn't compatible with the software engine)
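A small sketch of that idea: the per-signature op bodies stay as they are, and a stand-in for the op preprocessor combines them into one op that decodes tagged parameters at runtime. The function name glom and the tag strings are made up for this example:

```python
# Sketch: op bodies written once per signature, then combined by a
# (hypothetical) preprocessor step into a single runtime-decoding op.

def add_int_int(a, b):
    return a + b


def add_num_num(a, b):
    return a + b


def add_num_int(a, b):
    return a + float(b)


OP_BODIES = {
    ("INT", "INT"): add_int_int,
    ("NUM", "NUM"): add_num_num,
    ("NUM", "INT"): add_num_int,
}


def glom(bodies):
    """Build one op that takes tagged parameters and dispatches at runtime."""
    def generic_op(tagged_a, tagged_b):
        (tag_a, a), (tag_b, b) = tagged_a, tagged_b
        try:
            # This lookup stands in for the generated switch/if ladder.
            body = bodies[(tag_a, tag_b)]
        except KeyError:
            raise TypeError(f"no add variant for ({tag_a}, {tag_b})") from None
        return body(a, b)
    return generic_op


add = glom(OP_BODIES)

if __name__ == "__main__":
    print(add(("INT", 2), ("INT", 3)))
    print(add(("NUM", 1.5), ("INT", 2)))
```

The source for the individual bodies is untouched; only the dispatch moves from assemble time to run time.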
As for what the rationale is... well, it's a combination of whim and necessity for adding them, and brutal reality for deleting them.
Our ops fall into two basic categories. The first, like add, are just basic operations that any engine has to perform. The second, like time, are low-level library functions. (Where the object ops fall is a matter of some opinion, though I'd put most of them in the "basic operation" category)
For something like hardware, splitting standard library from the CPU makes sense--often the library requires resources that the hardware doesn't have handy. (I wouldn't, for example, want to contemplate implementing time functions with cross-timezone and leap-second calculations with a mass 'o transistors. The System/360 architecture has a data-formatting instruction that I figure had to tie up a good 10-15% of the total CPU transistors when it was first introduced) Hardware is also often bit-limited--opcodes need to fit in 8 or 9 bits.
For things like the JVM or .NET, opcodes are also bit-limited (though there's much less of a real reason to do so) since they only allocate a byte for their opcode number. Whether that's a good idea or not depends on the assumptions underlying the design of their engines--a lot of very good people at Sun and Microsoft were involved in the design and I fully expect the engines met their design goals.
Parrot, on the other hand, *isn't* bit-limited, since our ops are 32 bits. (A more efficient design on RISC systems where byte-access is expensive) That opens things up a bunch.
If you think about it, the core opcode functions and the core low-level libraries are *always* available. Always. The library functions also have a very fixed parameter list. Fixed parameter list, guaranteed availability... looks like an opcode function to me. So they are. We could make them library functions instead, but all that'd mean would be that they'd be more expensive to call (our sub/method call is a bit heavyweight) and that you'd have to do more work to find and call the functions. Seemed silly.
Or, I suppose, you could think of it as if we had *no* opcodes at all other than end and loadoplib. Heck, we've a loadable opcode system--it'd not be too much of a stretch to consider all the opcode functions other than those two as just functions with a fast-path calling system. The fact that a whole bunch of 'em are available when you start up's just a convenience for you.
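That view can be sketched as a toy engine whose only built-in ops are end and loadoplib, with everything else registered as ordinary functions in an op table. The class TinyVM, the library names "math" and "sys", and the op bodies are all invented for illustration:

```python
# Sketch: an engine whose only built-in ops are "end" and "loadoplib".
# Everything else is a plain function registered in the op table.
# Library names and op bodies are made up for illustration.

import time


def op_add(vm, dst, a, b):
    vm.regs[dst] = vm.regs[a] + vm.regs[b]


def op_time(vm, dst):
    vm.regs[dst] = int(time.time())


OP_LIBRARIES = {
    "math": {"add": op_add},
    "sys": {"time": op_time},
}


class TinyVM:
    def __init__(self, nregs=4):
        self.regs = [0] * nregs
        self.running = False
        # Only two ops exist before any library is loaded.
        self.ops = {"end": TinyVM.op_end, "loadoplib": TinyVM.op_loadoplib}

    def op_end(self):
        self.running = False

    def op_loadoplib(self, name):
        self.ops.update(OP_LIBRARIES[name])

    def run(self, program):
        self.running = True
        for opname, *args in program:
            # Fast path: a table lookup and a direct call.
            self.ops[opname](self, *args)
            if not self.running:
                break


if __name__ == "__main__":
    vm = TinyVM()
    vm.regs[1], vm.regs[2] = 2, 3
    vm.run([("loadoplib", "math"), ("add", 0, 1, 2), ("end",)])
    print(vm.regs)
```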
So, there ya go. We've either got two, a reasonable number the same as pretty much everyone else, an insane number of them, or the question itself is meaningless. Take your pick, they're all true. :) -- Dan
--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
d...@sidhe.org                         have teddy bears and even
                                      teddy bears get drunk
>I completely agree that you would have multiple *of the same* opcode for >the different types. I guess the question I was (too delicately) asking, >is why you have opcodes that are usually in standard libraries, and even >some that aren't. For example; fact, exsec...,
I answered this in some detail, but the short answer is "There's no reason not to"
>why have both concat and >add...?
Erm... because they do completely different things?
>> > ... What are the criteria for >> > adding/deleting them?
>> On demand :)
>> > Thanks, >> > Matt
>> leo
-- Dan
How, exactly, is taking two strings, making a third string that's big enough to contain both, and copying the contents of those two strings into the third one like taking two numbers, doing a binary OR with carry, and storing the result in a third number?
Some languages overload addition to do both. Other languages don't; in fact, a Perl add and a Perl concat (to take one example) behave very differently from one another.
Generally speaking, it's better for compilers to do a bit of extra work to figure out the argument types involved than it is for them to throw away information they already have. (Besides, it's not that big a deal with PMCs--a PythonString can put the same code in its concat_*() and add_*() vtable entries.)
-- Brent "Dax" Royal-Gordon <br...@brentdax.com> Perl and Parrot hacker
> How, exactly, is taking two strings, making a third string that's big > enough to contain both, and copying the contents of those two strings > into the third one like taking two numbers, doing a binary OR with > carry, and storing the result in a third number?
Firstly, you have made an assumption that the addition here is equivalent to OR and carry, which may be correct for certain representations of integral datatypes, but certainly isn't for any kind of floating point arithmetic that I know of.
Secondly, you missed the point that I was making. The current add opcodes defined in parrot are the following:
add(in PMC, in PMC, in PMC)
add(in PMC, in INT)
add(in PMC, in NUM)
add(in PMC, in PMC)
add(in PMC, in PMC, in INT)
add(in PMC, in PMC, in NUM)
add(inout INT, in INT)
add(inout NUM, in INT)
add(inout NUM, in NUM)
add(out INT, in INT, in INT)
add(out NUM, in NUM, in INT)
add(out NUM, in NUM, in NUM)
I was simply asking why there wasn't an
add(out STR, in STR, in STR)
which seems reasonable. This is not a question of operator overloading, but rather semantics - that's all.
> Some languages overload addition to do both. Other languages don't; in > fact, a Perl add and a Perl concat (to take one example) behave very > differently from one another.
Ahh yes, but this includes implicit type conversion, which is not what you want to do in Parrot (if I am to understand Dan correctly)
DanS> Right now it's flat-out disallowed in parrot, and I'm also DanS> comfortable with that. (Plan on keeping it that way, honestly)
> Generally speaking, it's better for compilers to do a bit of extra work > to figure out the argument types involved than it is for them to throw > away information they already have. (Besides, it's not that big a deal > with PMCs--a PythonString can put the same code in its concat_*() and > add_*() vtable entries.)
Agreed, though in this case it's the opposite. The compiler doesn't need to do any extra work because it knows exactly what argument types it has.
> Firstly, you have made an assumption that the addition here is > equivalent to OR and carry, which may be correct for certain > representations of integral datatypes, but certainly isn't for any > kind of floating point arithmetic that I know of.
True enough, but I think I got my point across--concatenation is a fundamentally different operation from addition.
> Secondly, you missed the point that I was making. The current add > opcodes defined in parrot are the following: > (various combinations of PMC, INT, and NUM) > > I was simply asking why there wasn't an > > add(out STR, in STR, in STR) > > which seems reasonable. This is not a question of operator > overloading, but rather semantics - that's all.
I suppose that depends on what you want it to do. If you want it to convert $2 and $3 to integers, add them, convert the result to a string, and put it in $1, then the answer is "that's not a common enough operation to warrant adding the extra opcodes"--especially since the I/S/N registers aren't supposed to be used for anything but optimizations.
If you want it to concatenate $2 and $3 and insert the result into $1, and remove the "concat" opcode altogether...well, the answer stems from the existence of add(in PMC, in PMC, in PMC). What should that do--integer addition, or string concatenation? Remember, some of our languages don't overload add for strings. We need a separate concat(in PMC, in PMC, in PMC), so we might as well have concat(out STR, in STR, in STR) too.
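A minimal sketch of why the two entries stay separate. The classes below are hypothetical stand-ins for PMC types (they are not real parrot PMCs), and the string-to-number conversion is deliberately simplistic:

```python
# Sketch: add and concat as separate vtable entries, so each language's
# string type decides what each one means. Class names are hypothetical
# stand-ins for PMC types; the numeric conversion is deliberately simplistic.

class PerlishString:
    def __init__(self, value):
        self.value = value

    def concat(self, other):
        return PerlishString(self.value + other.value)

    def add(self, other):
        # Numeric addition: both operands are treated as numbers.
        return PerlishString(str(int(self.value) + int(other.value)))


class PythonishString:
    def __init__(self, value):
        self.value = value

    def concat(self, other):
        return PythonishString(self.value + other.value)

    # Same code behind both entries, since this language overloads "+".
    add = concat


def op_add(dst, a, b):
    """Stand-in for add(in PMC, in PMC, in PMC): defer to the vtable."""
    dst.value = a.add(b).value


def op_concat(dst, a, b):
    """Stand-in for concat(in PMC, in PMC, in PMC)."""
    dst.value = a.concat(b).value


if __name__ == "__main__":
    out = PerlishString("")
    op_add(out, PerlishString("1"), PerlishString("2"))
    print(out.value)
    op_concat(out, PerlishString("1"), PerlishString("2"))
    print(out.value)
```

With a single merged op, the Perl-like type would have no way to tell which of its two behaviours the compiler meant.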
-- Brent "Dax" Royal-Gordon <br...@brentdax.com> Perl and Parrot hacker
> For example, if you look you'll see we have 28 binary "add" ops. > .NET, on the other hand, only has one, and most hardware CPUs have a
Actually, there are three opcodes: add, add.ovf, add.ovf.un (the last two throw an exception on overflow with signed or unsigned addition: does parrot have any way to detect overflow?).
> few, two or three. However... for us each of those add ops has a very > specific, fixed, and invariant parameter list. The .NET version, on > the other hand, is specified to be fully general, and has to take the > two parameters off the stack and do whatever the right thing is, > regardless of whether they're platform ints, floats, objects, or a > mix of these. With most hardware CPUs you'll find that several bits
Well, not really: add is specified for fp numbers, 32-bit ints, 64-bit ints and pointer-sized ints. Addition of objects or structs is handled by the compiler (by calling the op_Addition static method if it exists, otherwise the operation is not defined for the types). Also, no mixing is allowed, except between 32-bit ints and pointer-sized ints; conversions, if needed, need to be inserted by the compiler.
> in each parameter are dedicated to identifying the type of the > parameter (int constant, register number, indirect offset from a > register). In both cases (.NET and hardware) the engine needs to > figure out *at runtime* what kind of parameters it's been given.
Well, on hardware the opcodes are really different, even if it may look like they have a major opcode and a sub-opcode specifying the type.
> the decoded form. .NET does essentially the same thing, decoding the > parameter types and getting specific, when it JITs the code. (And > runs pretty darned slowly when running without a JIT, though .NET was > designed to have a JIT always available)
Yes, so it doesn't matter:-) It's like saying that x86 code runs slow if you run it in an emulator:-) It's true, but almost nobody cares (especially since IL code can now be run with a jit on x86, ppc, sparc and itanium - s390, arm, amd64 are in the works).
> Parrot doesn't have massive parallelism, nor are we counting on > having a JIT everywhere or in all circumstances. We could waste a > bunch of bits encoding type information in the parameters and figure > it all out at runtime, but... why bother? Since we *know* with > certainty at compile (or assemble) time what the parameter types are, > there's no reason to not take advantage of it. So we do.
Sure, doing things as java does, with different opcodes for different types is entirely reasonable if you design a VM for interpretation (though arguably there should be a limit to the combinatorial explosion of different type arguments). There is only a marginal issue with generics code that the IL way of doing opcodes allows and the java style does not, but it doesn't matter much.
> real penalty to doing it our way. It actually simplifies the JIT some > (no need to puzzle out the parameter types), so in that we get a win > over other platforms since JIT expenses are paid by the user every > run, while our form of decoding's only paid when you compile.
This overhead is negligible (and is completely avoided by using the ahead of time compilation feature of mono).
> Finally, there's the big "does it matter, and to whom?" question. As > someone actually writing parrot assembly, it looks like parrot only > has one "add" op--when emitting pasm or pir you use the "add" > mnemonic. That it gets qualified and assembles down to one variant or
Well, as you mention, someone has to do it and parrot needs to do it anyway for runtime-generated parrot asm (if parrot doesn't do it already I guess it will need to do it anyway to support features like eval etc.). Anyway, if you're going to JIT it doesn't matter if you use one opcode for add or one opcode for each different kind of addition. If you're going to interpret the bytecode, having specific opcodes makes sense.
> For things like the JVM or .NET, opcodes are also bit-limited (though > there's much less of a real reason to do so) since they only allocate > a byte for their opcode number. Whether that's a good idea or not
Don't know about the JVM, but the CLR doesn't have a single-byte limit for opcodes: two-byte opcodes are already specified (and if you consider prefix opcodes you could say there are 3- and 4-byte opcodes already: unaligned.volatile.cpblk is such an opcode). Also, the design allows for any number of bytes per opcode, though I don't think that will ever be needed: the CLR is designed to provide a fast implementation of the low-level opcodes and to provide fast method calls: combining the two you can implement rich semantics in a fast way without needing to change the VM. There are still a few rough areas that could use a speedup with specialized opcodes, but there are very few of them and 2-3 additional opcodes will fix them.
> Parrot, on the other hand, *isn't* bit-limited, since our ops are 32 > bits. (A more efficient design on RISC systems where byte-access is > expensive) That opens things up a bunch.
Note that it also uses much more data cache (and disk space): this may become relevant especially if parrot is to target embedded systems. Has anyone done measurements on real-life code to see how much disk space is used (data cache effects could be measured with cpu counters, but it's much more difficult)? For example, adding two regs and storing the result in a third requires 16 bytes of bytecode in parrot. The same expression takes 4 bytes in IL code in the best case, 7 in more complex but probably more common methods. The maximum is 13 bytes (in the CLR operations happen on the eval stack, so a single byte is enough, but I added the opcodes needed to load two local vars and to store the result: you can consider the CLR a mixed stack and register machine, but, unlike parrot, there can be as many as 65535 registers, each with its own type). Anyway, please consider this issue: I'd suggest at least to use a single opcode_t to store the indexes to the argument and result registers for an opcode. This would cut down the space required to 8 bytes, still bigger than IL code, but much more comparable (unless, of course, opcode_t is changed to be 8 bytes on some platforms...).
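The 16-byte and 8-byte figures can be checked with a short sketch. It assumes a 4-byte opcode_t, and the packed layout (8 bits per register index inside one word) is just one hypothetical way to do the suggested packing; the op number 7 is arbitrary:

```python
import struct


def encode_wide(opnum, dst, a, b):
    """One 4-byte word for the op number and one per register index."""
    return struct.pack("<4I", opnum, dst, a, b)


def encode_packed(opnum, dst, a, b):
    """Op number in one word, three register indexes packed into a second.

    The 8-bits-per-index layout is an assumption made for this sketch.
    """
    for r in (dst, a, b):
        if not 0 <= r < 256:
            raise ValueError("register index does not fit in 8 bits")
    packed = dst | (a << 8) | (b << 16)
    return struct.pack("<2I", opnum, packed)


def decode_packed(data):
    """Recover (opnum, dst, a, b) from the packed form."""
    opnum, packed = struct.unpack("<2I", data)
    return opnum, packed & 0xFF, (packed >> 8) & 0xFF, (packed >> 16) & 0xFF


if __name__ == "__main__":
    print(len(encode_wide(7, 0, 1, 2)))    # op word + 3 operand words
    print(len(encode_packed(7, 0, 1, 2)))  # op word + 1 packed operand word
```

The cost of the packed form is the extra shift-and-mask work in decode_packed on every operand fetch.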
> functions also have a very fixed parameter list. Fixed parameter > list, guaranteed availability... looks like an opcode function to me. > So they are. We could make them library functions instead, but all > that'd mean would be that they'd be more expensive to call (our > sub/method call is a bit heavyweight) and that you'd have to do more > work to find and call the functions. Seemed silly.
Well, a different solution is to speed up function calls: I imagine nobody would be against that:-)
> So, there ya go. We've either got two, a reasonable number the same > as pretty much everyone else, an insane number of them, or the > question itself is meaningless. Take your pick, they're all true. :)
An issue I think you should consider as well with the current parrot design is this: the last time I built parrot there were 180 vtable slots (in vtable.dump: not sure this is the actual number, but it seems reasonable). 4 of them are because of add, for example. This means that for each type, on a 32 bit system, at least 180*4 bytes are spent on the vtable. How likely is it that the vtable will grow when parrot starts getting some real use with compilers starting to target it? For a moderately complex app that uses 500 different types that amounts to more than 350 KB of memory already just for the vtables. Or are you going to discourage the definition of new PMC types and to do vtable dispatching in a different language-specific way?
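The arithmetic behind that estimate, as a quick sketch. The 180 slots, 4-byte pointers and 500 types are the figures from the paragraph above; the function name is made up:

```python
def vtable_bytes(slots, pointer_size, types):
    """Memory spent on vtables if every type carries a full table."""
    return slots * pointer_size * types


per_type = vtable_bytes(slots=180, pointer_size=4, types=1)
total = vtable_bytes(slots=180, pointer_size=4, types=500)

if __name__ == "__main__":
    print(per_type, "bytes per type")
    print(total, "bytes for 500 types")
    print(round(total / 1024, 1), "KiB")
```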
Thanks. lupus
--
-----------------------------------------------------------------
lu...@debian.org                                     debian/rules
lu...@ximian.com                             Monkeys do it better
On Fri, Mar 12, 2004 at 10:03:19AM -0500, Dan Sugalski wrote: > At 6:06 PM -0500 3/11/04, Matt Greenwood wrote: > >Hi all, > > I have a newbie question. If the answer exists in a doc, just > >point the way (I browsed the docs directory). What is the design > >rationale for so many opcodes in parrot? What are the criteria for > >adding/deleting them?
> Whether we have a lot or not actually depends on how you count. (Last
Is someone tracking the mailing list and adding questions and (good) answers into the FAQ?