
This week's Perl 6 Summary


Piers Cawley

Feb 25, 2003, 4:46:25 AM2/25/03
to perl6-a...@perl.org, perl6-...@perl.org, perl6-i...@perl.org
The Perl 6 summary for the week ending 20030223
Another week, another Perl 6 Summary, in which you'll find gratuitous
mentions of Leon Brocard, awed descriptions of what Leopold Tötsch got
up to and maybe even a summary of what's been happening in Perl 6 design
and development.

Kicking off with perl6-internals as usual.

Strings and header reuse
Dan responded to prompting from Leo Tötsch about the use and reuse of
string headers. The problem is that most of the string functions that
produce modified strings return them in newly allocated headers; there's
no way to reuse an existing header. This can end up generating loads of
garbage. Dan's going through the various string handling ops and PMC
interfaces working out what needs to do what, and documenting them, as
well as adding in versions of the ops that take their destination string
headers as an argument. Dan hopes that 'we can make the changes quickly
and get this out of the way once and for all', leading Robert Spier to
mutter something about 'famous last words'.

<http://makeashorterlink.com/?Z13621793>

PXS help
Tupshin Harper has been trying to use an XML parser from within Parrot
and started off by looking at the PXS example (in examples/pxs) but had
problems following the instructions given there as his compiler spat out
errors by the bucket load. Leo Tötsch thought that PXS was probably
deprecated and the native call interface (NCI) was the thing to use.
Being Leo, he provided a port of the PXS Qt example to NCI. Although PXS
appears to be out of sync with the parrot core, nobody was entirely sure
whether it should be removed.

<http://makeashorterlink.com/?V14632793>

Bit rot in parrot/language/*
Tupshin Harper found that some of the language examples no longer work
properly with the most recent versions of Parrot. Leo Tötsch
addressed most of the issues he raised, but there are definitely issues
with the interpreter and the languages getting out of sync.

<http://makeashorterlink.com/?O55624793>

<http://makeashorterlink.com/?S26652793>

Macros in IMCC (part 2)
Jürgen Bömmels extended the macro support in IMCC, implementing
".constant" and adding some docs. The patch was promptly applied.

<http://makeashorterlink.com/?G17656793>

[RFD] IMCC calling conventions
Leo Tötsch posted an RFD covering his understanding of the various
calling conventions that IMCC would have to deal with, which sparked some
discussion. I'm now confused as to whether function calls in Parrot will
be caller saves, callee saves, or some unholy mixture of the two.

<http://makeashorterlink.com/?F68641793>

Parrot performance vs the good, the bad and the ugly
Tupshin Harper decided to port primes.pasm to C and Perl 5 to compare
results. Parrot came out very quick indeed (close to C). For bonus
points he then took a Python primes algorithm that had been ported to C
and ported that to Parrot as well. In full on, all stops pulled out,
knobs turned to 11 mode, Parrot came in at about 50% slower than C and
around 14 times faster than Python. There was some muttering about the
demo being rigged. However, Jim Meyer redid the Perl and Python
implementations to use a loop that duplicated the algorithm used in
primes.pasm and, whilst it improved their performance somewhat, Parrot
was still substantially faster.

This probably won't be the test that's run when Dan and Guido face the
possibility of custard pies at 10 paces or whatever the performance
challenge stake stands at now.

<http://makeashorterlink.com/?I29621793>

Mmm... spaceships...
Leon Brocard patched examples/assembly/life.pasm to use a small
spaceship as its starting pattern. Apparently because it 'can provide
hours of enjoyment at conferences if projected onto a big screen while
answering parrot questions.' Nobody thought to ask him how a spaceship
was going to answer Parrot questions, but Leo Tötsch applied the patch.

<http://makeashorterlink.com/?X2A613793>

Using IMCC as JIT optimizer
Apparently, Leo Tötsch finds it unbearable that 'optimized compiled C is
still faster than parrot -j' so he's been experimenting with adding
smarts to IMCC, making it add hardware register allocation hints to its
emitted bytecode. Sean O'Rourke liked the basic idea, but reckoned that
the information generated by IMCC should really be platform-independent,
suggesting that it'd be okay to pass a control flow graph to the JIT,
but that hardware register allocation for a specific number of registers
would be iffy. He suggested that another option would be for IMCC to 'just
rank the Parrot registers in order of decreasing spill cost', then the
JIT could just move the most important Parrot registers into
architectural registers.

Dan thought the idea was interesting too, but worried that the JIT might
spend more time optimizing code than it could possibly gain from the
optimization. The discussion got more than a little technical after
this. I'm afraid I'm a bear of little brain when it comes to this sort
of magic, so you'll have to read the whole thread if you're interested
in the details.

The response was pretty positive throughout the discussion, so Leo went
ahead and implemented it. The new, improved version shaved slightly more
than a tenth of a second from the primes.pasm runtime (not a great
percentage win, but the total runtime includes the compilation time).

<http://makeashorterlink.com/?U2B622793>

<http://makeashorterlink.com/?P2C651793>

"invoke"
Steve Fink is bothered by the "invoke" op because it operates implicitly
on P0. He wants to replace it with a new version that takes a PMC
argument. Leo Tötsch is less bothered by the implicit behaviour of the
op, but would like to see an additional "invoke_p" op, which would take
a single PMC argument. So Steve implemented it, but hit a couple of
snags. I'm not quite sure whether these have been overcome yet.

<http://makeashorterlink.com/?U1D642793>

Problems with "Configure.pl --cgoto=0"
Nicholas Clark was unable to build Parrot with computed gotos turned
off. Simon Glover offered a simple patch that fixed Nick's problem. Leo
Tötsch talked about the different reasons for not building a computed
goto core, which depend on both the compiler's capabilities and the
user's choices. This led to a discussion of which cores were now
obsolete. Leo believes that the simple Computed Goto and Prederef cores
have been obsoleted by the CGP core (Computed Goto Prederefed). Nick
thought we should continue to ship code for all the cores, and ensure
that the config system is flexible enough to let anyone build any
arbitrary combination of cores, which convinced Leo. Dan suggested that,
once we'd done this, we should revamp the Tinderbox system to run tests
on all the core types.

<http://makeashorterlink.com/?R1E621793>

Non-inline text in parrot assembly
Tupshin Harper wondered if there were any plans for Parrot to support a
distinct ".string" asm section. Leo Tötsch pointed to ".constant" (in
PASM) and ".const" (in IMCC) as ways of keeping strings visually
together. Tupshin realised he'd missed a chunk of documentation
("perldoc assemble.pl" for anyone else who hasn't read it) which he
thinks should probably be moved into docs/parrot_assembly.pod or
somewhere similar.

<http://makeashorterlink.com/?P2F621793>

Access to partial registers?
Tupshin Harper wondered if it was possible and/or meaningful to read
from and write to part of a register (e.g. a single word) in PASM.
Answer: not at the moment; what do you want? We can always add an
intreg.ops set.

<http://makeashorterlink.com/?W20742793>

Meanwhile, over in perl6-language
Things were, once more, quiet (all of 16 messages). I think we're all
awaiting the coming of the Prophet Zarquon (or possibly the next
Apocalypse, whichever comes sooner).

Arrays, lists, referencing
David Storrs suggested some possible semantics for "(4, 1, 2) + 7",
noting that he doesn't like Perl 5's current behaviour (it evaluates to
9, because the comma operator in scalar context yields its last operand,
2, and 2 + 7 is 9). Michael Lazzaro thinks it should evaluate to 10 (the
length of the list + 7) or possibly throw a syntax error. This then morphed into a
discussion of pass by value, pass by reference and pass by constant
reference, which Allison Randal told us would be addressed in the
upcoming Apocalypse 6.

<http://makeashorterlink.com/?Y41731793>

Err...
That's it
Acknowledgements, Announcements and Apologies
Mmm... the comfy chair... it's the only place to write a summary in.
Especially when you're plied with lashings of Earl Grey tea and
entertained by the antics of a couple of adolescent cats. Things could
be lots worse.

I'd like to apologize to Danny O'Brien for last week's mention of 'a
Brainf*ck compiler, in Brainf*ck, for the Brainf*ck interpreter supplied
with Parrot, a virtual machine named after a joke, written for a
language that doesn't yet exist.' Apparently this sprained Danny's head.
And I extend similar apologies to anyone else whose head was sprained by
that concept.

The American Odyssey web page is still more of a beautiful idea than an
actual link you can click. Hopefully it'll spring into existence before
we set off, but I wouldn't bet on it.

Aspell is, once more, the weapon of proofreading choice.

If you appreciated this summary, please consider one or more of the
following options:

* Send money to the Perl Foundation at
<http://donate.perl-foundation.org/> and help support the ongoing
development of Perl.

* Get involved in the Perl 6 process. The mailing lists are open to
all. <http://dev.perl.org/perl6/> and <http://www.parrotcode.org/>
are good starting points with links to the appropriate mailing
lists.

* Send feedback, flames, money, job offers or his and hers Mini
Coopers to p6summ...@bofh.org.uk

This week's summary was again sponsored by Darren Duncan. Thanks Darren.
If you'd like to become a summary sponsor, drop me a line at
p6summ...@bofh.org.uk.


--
Piers

Sean O'Rourke

Feb 26, 2003, 12:31:39 PM2/26/03
to perl6-i...@perl.org
First off, thanks to our relentless..., er, tireless summarizer for
continuing to digest and clarify our wandering discussion.

On Tue, 25 Feb 2003, Piers Cawley wrote:
> Using IMCC as JIT optimizer
> Apparently, Leo Tötsch finds it unbearable that 'optimized compiled C is
> still faster than parrot -j' so he's been experimenting with adding
> smarts to IMCC, making it add hardware register allocation hints to its
> emitted bytecode. Sean O'Rourke liked the basic idea, but reckoned that
> the information generated by IMCC should really be platform-independent,
> suggesting that it'd be okay to pass a control flow graph to the JIT,

This isn't really my idea, but is instead an area of active research. A
good jumping-off point is http://citeseer.nj.nec.com/krintz01using.html.

> Dan thought the idea was interesting too, but worried that the JIT might
> spend more time optimizing code than it could possibly gain from the
> optimization.

Dan -- you might be interested in
http://www.usenix.org/events/javavm02/chen_m.html (if you have a USENIX
subscription or a nearby university library). They stuff a full data-flow
compiler into a JVM and, by carefully minimizing the number of passes,
make it end up faster than a lightweight JIT on a number of programs.
Granted, (IIRC) the real wins are on longer-running programs, so the
result isn't as relevant to Parrot, but it _does_ show that there's room
to put a fair amount of optimization into a JIT.

/s

Jason Gloudon

Feb 26, 2003, 12:41:21 PM2/26/03
to Sean O'Rourke, perl6-i...@perl.org
On Wed, Feb 26, 2003 at 09:31:39AM -0800, Sean O'Rourke wrote:

> Dan -- you might be interested in
> http://www.usenix.org/events/javavm02/chen_m.html (if you have a USENIX

Research wants to be free:

http://www-hydra.stanford.edu/publications/JVM02.pdf

--
Jason

Dan Sugalski

Feb 26, 2003, 1:11:48 PM2/26/03
to Jason Gloudon, Sean O'Rourke, perl6-i...@perl.org

And wants to be mine. Snagged, thanks.

I do realize that a good optimizing JIT is a win in some cases, and
I'd love to have one. The problem is engineering time--I'm not
willing to presume on Leo, Daniel, and everyone else who's done JIT
work to get two JITs, one for quick programs and one for longer ones.
And with that limitation, I'd rather have a lower-overhead JIT with a
win for the shorter programs than a high-overhead one with a win for
long-running programs.
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Leopold Toetsch

Feb 28, 2003, 2:54:06 AM2/28/03
to Dan Sugalski, Sean O'Rourke, perl6-i...@perl.org
Dan Sugalski wrote:

> ... And with that limitation, I'd rather have a lower-overhead JIT with
> a win for the shorter programs than a high-overhead one with a win for
> long-running programs.

I see that limitation. But currently we have a high overhead JIT. The
problem is not so much program run time, but load time.

One example: t/op/stacks_33.pasm (8242 lines) expands, because of
macros, to 38955 lines, giving 4102 basic blocks and 6150 edges
connecting them.

compile/run options and timings (first 4 include running):

  plain              1.07
  -P                 1.09
  -j                 2.4
  -Oj                2.3
  -ox.pbc / -j       1.07 + 1.3
  -ox.pbc -Oj / -j   2.1 + 0.2

So writing out a minimal CFG (blocks & branch targets) plus register
usage gives 6 times the startup speed for this -Oj-compiled PBC file.
Program run time is ~0.

BTW, running the -Oj-compiled PBC with a normal core does succeed
(including correct output), albeit with a lot of out-of-bounds register
accesses (which go to high integer regs):

PC=12; OP=82 (set_n_ic); ARGS=(N-2=0, 0)
PC=15; OP=82 (set_n_ic); ARGS=(N-4=0, 1024)
PC=18; OP=79 (set_n_n); ARGS=(N3=0, N-4=1024)
PC=21; OP=79 (set_n_n); ARGS=(N2=0, N-3=0)
PC=24; OP=79 (set_n_n); ARGS=(N1=0, N-2=0)
PC=27; OP=79 (set_n_n); ARGS=(N0=0, N-1=0)
PC=30; OP=678 (pushn)

I think that the -b option should have a check for this.

(timings from a PIII/600, imcc -O3 compiled)

leo


Dan Sugalski

Feb 28, 2003, 12:18:05 PM2/28/03
to Leopold Toetsch, Sean O'Rourke, perl6-i...@perl.org
At 8:54 AM +0100 2/28/03, Leopold Toetsch wrote:
>Dan Sugalski wrote:
>
>>... And with that limitation, I'd rather have a lower-overhead JIT
>>with a win for the shorter programs than a high-overhead one with a
>>win for long-running programs.
>
>I see that limitation. But currently we have a high overhead JIT.
>The problem is not so much program run time, but load time.

Damn. Okay, what sort of metadata would be appropriate to aid in
this? If it means having the assembler, IMCC, or some external
utility write a chunk that identifies the basic blocks and edges,
then I'm all for it.

Leopold Toetsch

Feb 28, 2003, 3:28:55 PM2/28/03
to Dan Sugalski, Sean O'Rourke, perl6-i...@perl.org
Dan Sugalski wrote:

> At 8:54 AM +0100 2/28/03, Leopold Toetsch wrote:

>> I see that limitation. But currently we have a high overhead JIT. The
>> problem is not so much program run time, but load time.

> Damn. Okay, what sort of metadata would be appropriate to aid in this?
> If it means having the assembler, IMCC, or some external utility write a
> chunk that identifies the basic blocks and edges, then I'm all for it.


gprof does indicate that the branch target calculation is the main culprit.
My -Oj hack currently writes out 6 opcode_t per BB:
- bb->begin (with highbit set for a branch target)
- bb->end
- 4 * registers_used

(end is somewhat redundant, and the latter 4 could be packed into one op).


The plain jit optimizer would need:
- bb->begin (implying bb->end)
- (bb->end)
- bb->end->branch_target (where the ->end branches to)
- flags (branch source or target per block boundary), could also be
coded into offsets

From this info, the JIT optimizer could build its internal sections
(parts of blocks that are JITed or not). A BB is at least one section,
but could be split into more. The register usage scan and allocation
stay the same (two linear scans over all ops), plus another pass for the
actual code generation.
There is also room for improvement here, e.g. register usage could just
as well be passed by imcc (top N by first usage per block - albeit this
differs from the current usage calculation per section). Sean already
proposed this variant. It could save one scan through all ops.

Timing estimates WRT *big* programs are all rather vague; we just
don't have such programs yet. We badly need an at least partially
implemented HL *with* some real-life test cases for this. The Java spec
suite implemented in a supported HL would be nice to compare against :) Ook.

leo
