we've started a project (will have a web-page soon) that aims to port the php-language (www.php.net) to run on top of parrot. we've written some initial code and i'm kinda stuck while writing the codegen (i target imc)
my problem is that php is typeless. i am writing code to to some basic type-recognition in my ast to be able to leverage the much speedier I S and N types in parrot, but for obvious reasons i have to make the php basic type an PMC. problem now is that using pmc for everything will make execution much much slower that it could be.
.pcc_sub _main prototyped .sym PerlUndef TEMP0 TEMP0 = new PerlUndef .sym PerlUndef TEMP1 TEMP1 = new PerlUndef .sym PerlUndef TEMP2 TEMP2 = new PerlUndef .sym PerlUndef TEMP3 TEMP3 = new PerlUndef .sym PerlUndef TEMP4 TEMP4 = new PerlUndef .sym PerlUndef i i = new PerlUndef i = 0 LBL0: TEMP1 = 100 I0 = 0 unless i < TEMP1 goto LBL3 I0 = 1 LBL3: unless I0 goto LBL2 TEMP4 = "hallo" print TEMP4 TEMP2 = 5.5 TEMP3 = i + TEMP2 i = clone TEMP3 goto LBL0 LBL2: .end
(i know it can be made better - working on it) - but the basic dilemma is obvious: $i starts as an INT and becomes a FLOAT at runtime. so my compiler made it a PMC. (i know that i'll at one point have to write my own PMC for the PHP base-type, but for now PerlUndef works for me).
because pmc as so much slower that native types i loose most of parrots brilliant speed due to the fact that i don't know the types at compile-time.
i believe you'll face the same problem once perl6 becomes a reality on parrot so here's my question (i've only been on perl6-internals for some week, so sorry if this has been discussed in depth before):
would't it make sense to introduce one additional new type into parrot that can "morph" between the normal types (I, S, N, P)?
typedef struct { ValType type; union { INTVAL i; NUMBER n; STRING s; ... PMC p; } u; } VAL;
and add support for this in the executor in the most efficient way.
if you think this is unneeded could you explain how you are planning to make untyped languages run really fast on parrot and what direction i have to take to use that technique? (PMC will always carry the vtable jump overhead, so they will be slower than what perl or php use currently in the inner execution loop)
i'd also like to add that eg PHP has different semantics when you do binary operations on strings that perl has so this VAL type would have to be as flexible as a PMC. (PHP would have to use a different implementation than Perl6 or Python)
At 08:54 PM 11/5/2003 +0100, Thies C. Arntzen wrote:
> we've started a project (will have a web-page soon) that aims to > port the > php-language (www.php.net) to run on top of parrot. we've written > some
First of all, welcome! :)
I'll read your mail in depth and send a followup reply, but right off I have a suggestion.
In loops, your compiler should be able to analyse the full loop construct and detect what type to make the iterator. In the case you provided you don't need to make $i a PMC.
a) You could do the simple approach which wouldn't require much code analysis and just start $i as an INT, then morph it yourself with another temporary to a FLOAT.
b) The compiler can analyze the for() loop AST and the local basic blocks and detect that $i can simply be defined as a FLOAT register to start with to skip the eventual morph instruction.
Our eventual goal is to add optimization into IMCC that tries to use non-PMC registers where possible. For starters, your high level compiler should be able to analyze any assignment statement or expression to determine the type from the operands. You should be able to use non-PMCs anytime the source operands can be typed (literal or temporary).
In the case that you cannot determine it, then use a PMC. You'll have to do dataflow analysis if you want any more than that because once the data is morphed to a PMC, you'll have trouble typing it again to get it back without inserting some fancy conditional code.
>Actually that was pretty good for an early version. You could help IMCC >out by not creating those PerlUndefs until you're going to assign to >them.
Also, anytime you use a temporary to assing a constant literal, you should be able to use a I/S/N reg. Parrot's ops are polymorphic so you should have a lot of flexibility even with the basic types.
>Vtable jump isn't really that much overhead in a language like C. >Parrot's already running pretty fast. We're making things fast by >keeping it in mind while implementing it, including a rich opcode >and vtable so the number of said slow dispatches are small, among other >various things that I don't know. Maybe Dan will chime in here :-)
Thies, have you actually benchmarked your sample with JIT against PHP?
> we've started a project (will have a web-page soon) that aims to > port the php-language (www.php.net) to run on top of parrot.
Cool!
> we've written some initial code and i'm kinda stuck while > writing the codegen (i target imc)
> my problem is that php is typeless. i am writing code to to some > basic type-recognition in my ast to be able to leverage the much > speedier I S and N types in parrot, but for obvious reasons i > have to make the php basic type an PMC. problem now is that > using pmc for everything will make execution much much slower > that it could be.
Could be? How? The only way it could be faster from the codegen level is if you have a really good data flow analyzer that can tell that something is *only* going to be doing a certain kind of operation. That is a hard, and in general impossible, problem.
Actually that was pretty good for an early version. You could help IMCC out by not creating those PerlUndefs until you're going to assign to them.
> - but the basic dilemma is obvious: $i starts as an INT and becomes a > FLOAT at runtime. so my compiler made it a PMC. (i know that i'll at > one point have to write my own PMC for the PHP base-type, but for now > PerlUndef works for me).
> because pmc as so much slower that native types i loose most of > parrots brilliant speed due to the fact that i don't know the > types at compile-time.
Yep, you do. That is the caveat of dynamic languages.
> i believe you'll face the same problem once perl6 becomes a reality on > parrot
We know about this, too. And that's one of the reasons Perl 6 is introducing an optional type system: so the programmer can tell the compiler when it's allowed to use the [ISN] types instead of PMCs, when they need to be fast (tight loops and such). Without any user guidance, everything would have to be a PMC (save a few special cases).
> so here's my question (i've only been on perl6-internals for some > week, so sorry if this has been discussed in depth before):
> would't it make sense to introduce one additional new type into > parrot that can "morph" between the normal types (I, S, N, P)?
Yeah, it would! We call those PMCs.
Most of the reason [ISN]s are so fast is because of the Just-In-Time compiler. I don't know whether there are plans to make the JIT dynamic, in that it could detect that we have a JITtable PMC on our hands and JIT the following section of code with respect to that. My guess is that there are no such plans, or that it's impossible to do well.
Your morphing type doesn't make any sense, when you look at how things are compiled into byte code. You're asking it to be compiled differently based on the run time type, which is somewhat of a temporal paradox.
> and add support for this in the executor in the most efficient way.
Which would be about as efficient as PMCs currently are.
> if you think this is unneeded could you explain how you are > planning to make untyped languages run really fast on parrot and > what direction i have to take to use that technique? (PMC will > always carry the vtable jump overhead, so they will be slower > than what perl or php use currently in the inner execution loop)
Vtable jump isn't really that much overhead in a language like C. Parrot's already running pretty fast. We're making things fast by keeping it in mind while implementing it, including a rich opcode and vtable so the number of said slow dispatches are small, among other various things that I don't know. Maybe Dan will chime in here :-)
> i'd also like to add that eg PHP has different semantics when you do binary > operations on strings that perl has so this VAL type would have to be > as flexible as a PMC. (PHP would have to use a different implementation > than Perl6 or Python)
[ most already answered, just some additional bits ]
> because pmc as so much slower that native types i loose most of parrots > brilliant speed due to the fact that i don't know the types at > compile-time.
When you get rid of all the temps and the clone[1], the loop with PMCs runs at about 3 times the speed of perl5. So I'd not use the term "slow". It would be faster with native types of course, but its hard to impossible to decide at compile time, that native types do the same thing. I don't know PHP but in perl the + operator could be overloaded and calculate the phase of the moon instead of adding 5.5 ...
> union {
Yeah, that's a PMC.
> ... could you explain how you are planning to > make untyped languages run really fast on parrot and what direction i have > to take to use that technique?
AFAIK you have to work hard on your code generator. We will try to optimize a lot, but at the AST level you have much more information and can safely do it better.
> thies
leo
[1] # for ($i = 0; $i < 10_000_000; $i = $i + 5.5) { } .sub _main .sym var i i = new PerlUndef i = 0 $P0 = new PerlUndef $P0 = 10000000 lp: if i >= $P0 goto ex i = i + 5.5 goto lp ex: end .end
> we've started a project (will have a web-page soon) that aims to port the > php-language (www.php.net) to run on top of parrot. we've written some > initial code and i'm kinda stuck while writing the codegen (i target imc) > my problem is that php is typeless.
As is perl, python, and ruby, FWIW. Luckily that's what Parrot's ultimately designed to run :)
> i am writing code to to some basic > type-recognition in my ast to be able to leverage the much speedier I S and > N types in parrot, but for obvious reasons i have to make the php basic > type an PMC. problem now is that using pmc for everything will make execution > much much slower that it could be.
Well... while that's true, in its own way, it's also misleading, though that's partially our fault. PMCs are Parrot's base type, and what most languages operating on it should use. I, N, and S registers are in for local optimizations--PMCs are the main data element.
> my compiler currently translates:
[Snippage]
There are a number of spots in there that can be improved--we can work that out with you later if you like.
> (i know it can be made better - working on it) - but the basic dilemma is > obvious: $i starts as an INT and becomes a FLOAT at runtime. so my compiler > made it a PMC. (i know that i'll at one point have to write my own PMC for > the PHP base-type, but for now PerlUndef works for me).
It's possible one of the provided types will work out OK as well--if PHP's base variable type has the same semantics as perl's scalar, it seems to make sense to use that, just to skip out on having to rewrite the same code.
> because pmc as so much slower that native types i loose most of parrots > brilliant speed due to the fact that i don't know the types at > compile-time.
Nah, you still get our brilliant speed. You're just blinded by the fact that we're even faster with the primitive types. :)
> would't it make sense to introduce one additional new type into parrot that > can "morph" between the normal types (I, S, N, P)?
That would be a PMC. :)
> if you think this is unneeded could you explain how you are planning to > make untyped languages run really fast on parrot and what direction i have > to take to use that technique?
Use PMCs.
> (PMC will always carry the vtable jump > overhead, so they will be slower than what perl or php use currently in the > inner execution loop)
That, as they say, turns out not to be the case. PMCs are faster than the SVs that perl currently uses, and I'd bet they're quite a bit faster than what PHP uses.
It's counter-intuitive, but the vtable approach has less overhead than the current integrated "check flags and figure out what to do" scheme that perl uses. (I can't speak for PHP, as I don't know the source, but I'd expect that things are the same there)
A function call does, definitely, blow the processor's prefetch pipeline. You make a call and since the destination's likely unknown to the processor's prefetch decoding system the pipeline needs to be flushed and refilled from the beginning. That blows between 3 and 11 cycles depending on the processor. Which, interestingly, is the same amount of time you blow for a mispredicted branch. That means that the vtable dispatch time and an if jump take about the same time, assuming the destination for both is in L1 cache.
At that point, then, you need to factor in the other costs to see what's ultimately more expensive. The vtable dispatch has an indirect pointer fetch, function arg setup, function preamble and function postamble. Luckily function preamble and postamble are generally very low-overhead, and are often nonexistant. (Unless stack space is allocated, but then it's a block entry cost rather than a functione entry cost, and a constant cost for both vtable and flag check schemes, as we can assume that the variables would be created in the body of the flag-check-handling code as well)
The flag check requires the flag word be fetched out of the struct, the flag bit extracted out, and tested. This is, for a single test, slightly faster than the vtable dispatch overhead--it's more than the cost of fetching the vtable function pointer, but less than the cost to fetch the pointer and set up the args for dispatch by a cycle or three, depending on your processor.
The win, then, comes in when you have to do multiple checks at the start of a function. Perl, for example, does a *lot* of checking at the start of each op function (much, but not all, of it hidden behind a few macros) and spends much more time in each op function than Parrot does, because parrot can roll almost all of those flag checks into the single vtable dispatch. The trick is to make the vtable attached to a PMC as specific to the variable type as you can, so those flag checks don't have to be done.
All this essentially ignores multimethod dispatch, which has sufficient overhead in the general case to overwhelm the issues here, but that's a separate problem. :)
> i'd also like to add that eg PHP has different semantics when you do binary > operations on strings that perl has so this VAL type would have to be > as flexible as a PMC. (PHP would have to use a different implementation > than Perl6 or Python)
Ah, OK. That just means a few changed vtable methods for PHP scalars. No biggie, and simple to do with the current system. (Though there may be some fun issues when it comes time to do some interoperation with perl and python code. OTOH, I'm facing that now for other reasons, so I expect we'll have things nicely worked out, or you can help me work 'em out :)
Dan
--------------------------------------"it's like this"------------------- Dan Sugalski even samurai d...@sidhe.org have teddy bears and even teddy bears get drunk
> At 04:10 PM 11/5/2003 -0700, Luke Palmer wrote: > >Actually that was pretty good for an early version. You could help IMCC > >out by not creating those PerlUndefs until you're going to assign to > >them.
> Also, anytime you use a temporary to assing a constant literal, you > should be able to use a I/S/N reg. Parrot's ops are polymorphic > so you should have a lot of flexibility even with the basic types.
> >Vtable jump isn't really that much overhead in a language like C. > >Parrot's already running pretty fast. We're making things fast by > >keeping it in mind while implementing it, including a rich opcode > >and vtable so the number of said slow dispatches are small, among other > >various things that I don't know. Maybe Dan will chime in here :-)
> Thies, have you actually benchmarked your sample > with JIT against PHP?
A bench with a mandelbrot set of depth 500 (script attached)::
php mandel2.php 1.93s user 0.01s system 76% cpu 2.552 total ../../parrot/parrot mandel.imc 1.08s user 0.02s system 41% cpu 2.643 total ../../parrot/parrot -j mandel.imc 0.77s user 0.01s system 57% cpu 1.368 total
Is the result.
> >It's great that you're doing PHP for Parrot.
> >Luke
> Darn right!
Btw, incase people on the list are interested there is a presentation about PHP and Parrot we did at http://www.edwardbear.org/pap.pdf
> > because pmc as so much slower that native types i loose most of > > parrots > > brilliant speed due to the fact that i don't know the types at > > compile-time.
> When you get rid of all the temps and the clone[1], the loop with PMCs > runs at about 3 times the speed of perl5. So I'd not use the term > "slow". It would be faster with native types of course, but its hard to > impossible to decide at compile time, that native types do the same > thing. I don't know PHP but in perl the + operator could be overloaded > and calculate the phase of the moon instead of adding 5.5 ...
Yes, this is a very important point. Current dynamic languages can get a speedup in several ways: *) reducing opcode dispatch overhead (the various mechanisms parrot uses, like cgoto etc. and by jitting the code) *) better VM implementation (for example the use of the vtable instead of multiple checks trick that Dan mentions) *) code specialization
The first two you get for free using parrot or another advanced VM with a JIT (well, the second item may require to properly design the language specific PMCs or classes, but that's not so difficult after so many years with the perl/python/php engines). The latter is what potentially gives the bigger benefit, but it requires lot of work in the compiler frontend (using typed variables make it easier, but to preserve a better feel something like pysco is needed). For example, in the sample code, the compiler could see that the constants are integers and doubles and that the '+' operator hasn't been overloaded, so it could do away with using PMCs (but, again, this is a lot of work).
Parrot already helps with the first two items a lot and it provides most of the needed features for the last item: specialized registers and a JIT. What I think it's missing is a way to perform function/method calls very fast (maybe also allowing inlining). This is needed when mixing different languages that install their own vtable implementations and also inside a dynamic language that uses vtables to multiplex (like it would happen with perl's IV, NV etc variables). But this is a small concern: much more work needs to go into the compiler frontends for php/python/perl etc to take advantage of it.
> # for ($i = 0; $i < 10_000_000; $i = $i + 5.5) { } > .sub _main > .sym var i > i = new PerlUndef > i = 0 > $P0 = new PerlUndef > $P0 = 10000000 > lp: > if i >= $P0 goto ex > i = i + 5.5 > goto lp > ex: > end > .end
I couldn't resist writing an equivalent program that runs on the Mono VM:-) The Perl* classes implement just a few methods to support the sample code, but adding more is mostly an issue of cut&paste. I believe the code accurately simulates what would happen with perl or php: a scalar starts out as an integers and becomes a float mid-way. There are three different loops that simulate how smart a compiler frontend could be: one puts just $i in a 'PMC' (PerlSV in the C# code), the second puts the loop upper bound, too, to match with the above imcc code and the last puts the 5.5 constant in a PerlSV, too.
The run times on my machine are:
mono smart-compiler: 250 ms mono imcc-equiv: 340 ms mono little-dumber-compiler: 460 ms parrot -j -O 4: 580 ms parrot -j: 600 ms parrot: 635 ms perl: 1130 ms php4: 2150 ms
The parrot build was configured with --optimize. Parrot has a very good startup time (~10 ms), while the startup time for mono (running the same program with the loop limit set to 10) is about 100 ms. We obviously need to improve this, but this also means that the actual time to execute the loops above is about 100 ms lower for mono and 10 ms lower for parrot (times were measured with time (1)). My *guess* is that mono executes the same code faster because of better register allocation and faster function calls, I expect parrot to improve on both counts as it matures, at least as long as the vtable functions are implemented in C: anyone with performance numbers for vtable functions implemented in parrot asm? Hope this helps in the discussion (I included the C# code below so that people can check if I missed something of the semantics).
lupus
-- ----------------------------------------------------------------- lu...@debian.org debian/rules lu...@ximian.com Monkeys do it better
using System;
abstract class PerlBase { public abstract PerlBase Add (double v); public abstract PerlBase Add (int v); public abstract PerlBase Add (PerlBase v); public abstract bool LessThan (int v); public abstract bool LessThan (PerlBase v); public abstract PerlIV IsIV (); public abstract PerlNV IsNV ();
}
sealed class PerlIV: PerlBase { internal int value;
public PerlIV (int v) { value = v; }
public override PerlBase Add (double v) { PerlNV res = new PerlNV (v + (double)value); return res; } public override PerlBase Add (int v) { value += v; return this; } public override PerlBase Add (PerlBase v) { PerlIV iv = v.IsIV (); if (iv != null) return Add (iv.value); PerlNV nv = v.IsNV (); if (nv != null) return Add (nv.value); // warn: numeric add with non-numeric val return this; } public override bool LessThan (int v) { return value < v; } public override bool LessThan (PerlBase v) { PerlIV iv = v.IsIV (); if (iv != null) return value < iv.value; PerlNV nv = v.IsNV (); if (nv != null) return value < nv.value; // warn: numeric compare with non-numeric val return false; } public override PerlIV IsIV () { return this; } public override PerlNV IsNV () { return null; }
}
sealed class PerlNV: PerlBase { internal double value;
public PerlNV (double v) { value = v; }
public override PerlBase Add (int v) { value += v; // double + int gives a double return this; } public override PerlBase Add (double v) { value += v; return this; } public override PerlBase Add (PerlBase v) { PerlNV nv = v.IsNV (); if (nv != null) return Add (nv.value); PerlIV iv = v.IsIV (); if (iv != null) return Add (iv.value); // warn: numeric add with non-numeric val return this; } public override bool LessThan (int v) { return value < v; } public override bool LessThan (PerlBase v) { PerlNV nv = v.IsNV (); if (nv != null) return value < nv.value; PerlIV iv = v.IsIV (); if (iv != null) return value < iv.value; // warn: numeric compare with non-numeric val return false; } public override PerlIV IsIV () { return null; } public override PerlNV IsNV () { return this; }
}
class PerlSV: PerlBase { internal PerlBase value;
public PerlSV (int v) { value = new PerlIV (v); }
public PerlSV (double v) { value = new PerlNV (v); }
public override PerlBase Add (double v) { value = value.Add (v); return this; }
public override PerlBase Add (int v) { value = value.Add (v); return this; } public override PerlBase Add (PerlBase v) { value = value.Add (v); return this; } public override bool LessThan (int v) { return value.LessThan (v); } public override bool LessThan (PerlBase v) { return value.LessThan (v); } public override PerlIV IsIV () { return value.IsIV (); } public override PerlNV IsNV () { return value.IsNV (); }
}
class Test { static void Main () { /* * C# code representing: * for ($i = 0; $i < 10_000_000; $i = $i + 5.5) { } * Note: $i starts as an int and becomes a double midway */ PerlBase sv = new PerlSV (0); PerlBase limit = new PerlSV (10000000); // alternative small value to check startup cost // PerlBase limit = new PerlSV (10); // the upper bound is a constant in the source, so the compiler // could easily produce this code: // for (; sv.LessThan (10000000); sv = sv.Add (5.5)) { } // // but we're going with the slower version below, since // the imcc code uses the same 'pattern' and we don't want // to cheat:-) for (; sv.LessThan (limit); sv = sv.Add (5.5)) { } // // even slower version: all operands are objects (but we're faster // than parrot anyway) // PerlBase increment = new PerlSV (5.5); // for (; sv.LessThan (limit); sv = sv.Add (increment)) { } }
> I couldn't resist writing an equivalent program that runs on the Mono > VM:-)
Very interesting. While we don't compete with typed languages, its nice to see, that parrot execution speed is in the region of mono - for such small tight loops.
> My *guess* is that mono executes the same code faster because of better > register allocation and faster function calls,
Register allocation AFAIK. PMCs are not in registers yet (only I- and N-Regs). I dunno on which architecture you ran the test, but FYI JIT/i386 calls vtable functions directly for simple opcodes like add. If we have finally PMCs in registers too, we could safe more cycles...
> ... I expect parrot to > improve on both counts as it matures, at least as long as the vtable > functions are implemented in C: anyone with performance numbers for > vtable functions implemented in parrot asm?
We don't have methods yet.
> Hope this helps in the discussion (I included the C# code below so that > people can check if I missed something of the semantics).
A little thing is missing in your code: if num+num gives an equivalent int, it should get morphed back to a PerlIV.
> Very interesting. While we don't compete with typed languages, its nice
Oh, but mono _will_ also compete with dynamically typed languages! :-)
> to see, that parrot execution speed is in the region of mono - for such > small tight loops.
Well, that was mono executing the same code a dynamic language like php, python or perl would generate for running on the CLR. I only implemented it in C# because I don't have the time to write a perl/php/python compiler:-) If the loop was coded properly in C# it would have been much faster (as it would have been much faster on parrot as well if it used only integer and float registers).
> Register allocation AFAIK. PMCs are not in registers yet (only I- and > N-Regs). I dunno on which architecture you ran the test, but FYI
Sorry, forgot to mention I have a 1.1 Pentium III (we currently have the jit running only on x86, itanium and ppc).
> JIT/i386 calls vtable functions directly for simple opcodes like add. If > we have finally PMCs in registers too, we could safe more cycles...
Yes, I noted that the unoptimized parrot is only slightly slower than the optimized build: this means that most of the code is jitted, good.
> A little thing is missing in your code: if num+num gives an equivalent > int, it should get morphed back to a PerlIV.
Oh, is that a requirement of the Parrot spec or PHP? AFAIK perl doesn't behave that way (though in some configuration it tries to do operations with integer arithmetric, the type is kept as a NV). The quick code I put together isn't designed for such frequent type flip-flops, so I fixed it with a suboptimal solution, but it's still very fast: 400 ms (it was 340 before, for reference parrot takes 580 ms). For those interested, the Add method in PerlNV became:
public override PerlBase Add (double v) { value += v; int c = (int)value; if (value == c) { if (ivcache == null) { ivcache = new PerlIV (c); ivcache.nvcache = this; } else { ivcache.value = c; } return ivcache; } return this; } A better solution would have been to store the values directly in PerlSV and use the value field in it just as a fast dispatch for the different vtables in PerlIV, PerlNV etc.
Thanks!
lupus
-- ----------------------------------------------------------------- lu...@debian.org debian/rules lu...@ximian.com Monkeys do it better