Initial feedback on PAST-pm, or Partridge

Allison Randal

Nov 26, 2006, 11:30:32 PM
to Perl 6 Internals
Overall, the POST implementation is usable and I really like the new HLL
compiler module. I've got Punie working with the new toolchain to the
point that it's generating valid PIR code for many low-level constructs,
but some of the high-level constructs that worked under the previous
toolchain I still don't have working. I've done everything I can do with
straightforward translations of the existing code, and am now at the
point where I'll have to do major conceptual refactors to fit with the
new toolchain.

I've already accumulated a good quantity of feedback for Patrick, so I
figured I'd go ahead and send it out now. (Especially since some of my
comments may result in changes that will make it much easier to finish
porting Punie.)

I had to poke into the guts of HLLCompiler, the new PAST, and the new
POST a fair bit in the process of getting Punie to work with them, so my
comments here are a mixture of user experience and implementation
details. I've grouped my comments into general categories.

------

Available node types:

- There's no PAST::Stmt node type? I only see PAST::Stmts and PAST::Op.
But statements are composed of multiple ops. So, everything is an op? I
was using PAST::Stmt and PAST::Exp for a similar purpose to what
POST::Ops performs. I've hacked it to use PAST::Stmts for this purpose,
but it doesn't quite work.

- There's no PAST::Label node type? How do you represent labels in the
HLL source?

- Is there no way to indicate what type of variable a PAST::Var is?
Scalar/Array/Hash? (high-level types, not low-level types)

---

Meaningful naming: (Be kind to your compiler writers.)

- In the PAST nodes, I grok 'name' as the operator/function name of a
PAST::Op and as the HLL variable name of a PAST::Var, but making it the
value of a PAST::Val is going too far. It was 'value' in the old PAST,
which makes more sense. You're passing named parameters into 'init', so
I can't see a reason not to use a more meaningful name for the attribute.

- In PAST nodes, the attribute 'ctype' isn't actually storing a C
language type. Better name?

- The attribute 'vtype' is both variable type in POST::Var and value
type in POST::Val. Handy generalization, but it's not clear from the
name that 'vtype' is either of those things.

- The values for both 'ctype' and 'vtype' are obscure. Better to
establish a general system for representing types, than to include raw
Parrot types or 1-letter codes in the AST.

- In PAST nodes, consider the audience when choosing attribute names
like 'ismy' (PAST::Var). Something like 'islexical' or 'isdeclaration'
(I'm not sure which you mean), is friendlier to non-Perl users, and
actually clearer even for Perl users.

- In PAST nodes again, I'm not clear on what 'pirop' (PAST::Op)
represents. Is it the literal name of a PIR opcode, or a generic
representation of standard low-level operations? I'm more in favor of
the latter. Better still, give compiler-writers a standard format lookup
table they can write to allow the PAST to POST transformation to select
the right PIR operation from the HLL op name. (See the comments on
boundaries of abstraction.)

- In PAST nodes, the 'clone' method is now 'init'. 'clone' was a terrible
name, I agree, but 'init' isn't quite right either.

- In PAST nodes, the 'add_child' method is now 'push'. I liked
'add_child' better, but, maybe what we really want is not a method at
all, but a :vtable entry for an array push? Seems likely, since there's
really not any other array-like behavior the syntax-tree nodes need to have.

- On module naming, I quickly regretted the naming of past2post.tg and
past2post_gen.pir (and all the related names) and changed them to
POSTGrammar.tg, POSTGrammar.pir, etc. in Punie. The .tg files are
modules, they're just modules written in a different language, so we
should standardize on module-style naming. Consider names like
POST/Grammar.tg and POST/Grammar.pir, or Partridge/Compiler/AST.tg and
Partridge/Compiler/AST.pir (looking at it from the perspective of the
compilation source rather than the compilation result).

---

Clear boundaries between components: (Fuzzy boundaries of abstraction
make it difficult to allow for other implementations of the AST/OST or
customization of the compiler object.)

- The 'compile' method doesn't belong in the PAST object, it belongs in
HLLCompiler.

- The 'compile' method also doesn't belong in the main compiler
executable, it belongs in HLLCompiler.

- Merge them into one 'compile' method in HLLCompiler.

- Provide an 'init' method for HLLCompiler that lets the compiler writer
set which modules HLLCompiler will use for each stage of compilation.
This will cover the majority of compilers without requiring each
compiler writer to define their own 'compile' routine.

- Customization of HLLCompiler should be handled by creating a subclass
of HLLCompiler. (The current 'register' strategy is somewhat fragile.)

- It would be easier to maintain (and create) the list of HLL to PIR
operator associations in something like a YAML file than embedded in the
parser grammar file. The associations aren't needed until the
PAST-to-POST transformation stage, leaving us with an AST free of
Parrot-isms so, a) introspection into the AST keeps the user purely on
the HLL level, rather than immediately plunging them into Parrot
internals, and b) the AST could be passed along to some other compiler
backend (like Pugs).

At the very least, the 'pirop' property on parser rules could be handled
by the PAST-to-POST transformation, so the compiler writer doesn't have
to manually pull those values out of the parser grammar's optable when
creating the AST. (If the parser grammar module was specified in
HLLCompiler's 'init', then the compiler object would know where to look
for the optable.)
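
A minimal sketch of the lookup-table idea, written in Python rather than PIR so it can be run on its own. Everything here is hypothetical: `OP_TABLE`, the `Op` class, and `select_pirop` are illustrative names, not part of PAST-pm. A YAML file would load into the same kind of dictionary.

```python
from dataclasses import dataclass, field

# Hypothetical table a compiler writer would maintain (e.g. loaded from YAML).
# Keys are HLL operator names; values are generic low-level operation names.
OP_TABLE = {
    "infix:+": "add",
    "infix:-": "sub",
    "infix:.": "concat",
}


@dataclass
class Op:
    """Stand-in for an AST op node: it stores only HLL-level information."""
    name: str
    children: list = field(default_factory=list)


def select_pirop(node, table=OP_TABLE):
    """Resolve the low-level operation at AST-to-OST time, not at parse time."""
    try:
        return table[node.name]
    except KeyError:
        raise LookupError(f"no low-level operation registered for {node.name!r}")


node = Op("infix:+", ["$b", 2])
print(select_pirop(node))
```

Because the node carries only the HLL operator name, the same tree could in principle be handed to a different backend with a different table.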

---

Refactor into smaller units: (Easier to test, easier to maintain, easier
to debug, and easier to create subclasses.)

- In HLLCompiler, instead of a monolithic 'command_line' method, split
the components into independent methods all called from the
'command_line' method. Good candidates are the file-reading and
source-preparation code, the interactive interpretation code, and the
code to compile a source file. (Comment: avoid 'bsr' and 'ret' anywhere
and everywhere you can avoid them.)

- In HLLCompiler, split the 'compile' method out into independent
methods for each compilation stage ('compile_ast', 'compile_ost',
'compile_pir', etc.), all called from 'compile'.

---

Useful error detection: (Be kind to your compiler writers.)

- Returning the source code string from the 'compile' method in
HLLCompiler when no compiler is registered isn't helpful. Throw an
exception or give an error. Give the compiler writer a clue to what's wrong.

- Provide distinct errors/exceptions for failures at each stage of
compilation to make it easy to figure out which stage is failing.
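
The two suggestions above (one method per stage, and a distinct error per stage) can be sketched together. This is Python, not PIR, and every class and method name is an assumption made for illustration; the toy stage bodies stand in for real parsing and tree transformation.

```python
class StageError(Exception):
    """Base class; `stage` records which compilation stage failed."""
    stage = "unknown"


class ParseError(StageError):
    stage = "parse"


class ASTError(StageError):
    stage = "ast"


class OSTError(StageError):
    stage = "ost"


class Compiler:
    """Hypothetical driver: one small method per stage, all called from compile()."""

    def parse(self, source):
        if not source.strip():
            raise ValueError("empty source")
        return source.split()

    def compile_ast(self, tokens):
        return {"type": "stmts", "children": tokens}

    def compile_ost(self, ast):
        return [f"say {child!r}" for child in ast["children"]]

    def compile(self, source):
        stages = [
            (self.parse, ParseError),
            (self.compile_ast, ASTError),
            (self.compile_ost, OSTError),
        ]
        result = source
        for run, error_class in stages:
            try:
                result = run(result)
            except Exception as exc:
                # Re-raise as a stage-specific error so the failing stage is obvious.
                raise error_class(f"{error_class.stage} stage failed: {exc}") from exc
        return result


print(Compiler().compile("a b"))
```

Each stage method can be tested or overridden in isolation, and a caller can catch `StageError` and inspect `.stage` to see how far compilation got.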

---

Side comments:

- In PGE grammars, what is the "{ ... }" at the end of every proto
declaration supposed to do? It seems like a dead weight. Is that where
the Perl 6 code defining the operator is supposed to go? Can we allow
'proto' declarations to end in a semi-colon when there is no Perl 6 code?

------

Allison

Patrick R. Michaud

Nov 27, 2006, 2:21:44 AM
to Allison Randal, Perl 6 Internals
On Sun, Nov 26, 2006 at 08:30:32PM -0800, Allison Randal wrote:
> I had to poke into the guts of HLLCompiler, the new PAST, and the new
> POST a fair bit in the process of getting Punie to work with them, so my
> comments here are a mixture of user experience and implementation
> details. I've grouped my comments into general categories.

Excellent. Just as a general overall response -- PAST-pm is by
no means "finished", so many of the items that seem to be missing
are simply cases of "I haven't gotten to them yet so they aren't
implemented yet."

> Available node types:
>
> - There's no PAST::Stmt node type? I only see PAST::Stmts and PAST::Op.
> But statements are composed of multiple ops. So, everything is an op?

At present there's no PAST::Stmt node type, but one can be easily
added. I thought about putting one in based on Punie's use of
PAST::Stmt, but I hadn't quite figured out exactly _why_ it's
important so I thought I'd leave it out until I actually needed it
somewhere. In many ways ops are already composed of multiple ops,
so a statement can be considered just another op. (But I do see
why someone would want a PAST::Stmt abstraction -- on the other
hand, I didn't see how it changed the resulting POST/PIR output.)

> - There's no PAST::Label node type? How do you represent labels in the
> HLL source?

I just haven't gotten to this part yet.

> - Is there no way to indicate what type of variable a PAST::Var is?
> Scalar/Array/Hash? (high-level types, not low-level types)

Sure, that's what 'vtype' is -- it indicates the type of value
that the variable ought to hold.

My plan has been to follow the Perl6 concept of "implementation types"
and "value types" within PAST. Thus far I've only put in the support
for the value types, as the "vtype" attribute (and vtype can be any
high-level type the language happens to support). I'm expecting
to add an "itype" attribute at some point when we're a bit farther
along; I'm still working out the details.

> ---
> Meaningful naming: (Be kind to your compiler writers.)

I totally agree, and I'm not yet wedded to any particular naming
scheme.

> - In the PAST nodes, I grok 'name' as the operator/function name of a
> PAST::Op and as the HLL variable name of a PAST::Var, but making it the
> value of a PAST::Val is going too far. It was 'value' in the old PAST,
> which makes more sense. You're passing named parameters into 'init', so
> I can't see a reason not to use a more meaningful name for the attribute.

I don't have a problem with switching it to 'value', I went with
'name' primarily because every PAST::Node has a name and so it just
made sense to use it there. But let me make another weak argument
in favor of 'name'. If a HLL programmer writes

$a = 1.23456789E1;

then the rhs becomes a PAST::Val node. How should we represent the
value? The parse-to-past translation could evaluate the contents of
"1.23456789E1" and store the result in 'value' as (.Float) 12.3456789,
but unfortunately when we convert that .Float back into a string for
use as PIR code it comes out as "12.3457" -- i.e., the code looks
like:

$P0 = new .Float
$P0 = 12.3457
set_global '$a', $P0

I decided that in a number of cases like this, what we really want
to retain in PAST::Val is a precise string representation of the
value that goes in the resulting output, and not a native
representation that may lose precision in translation through
POST/PIR. So, what we're really storing is the value's "name"
and not its "value". (I did say it was a weak argument.)

Anyway, we can switch to 'value' if that's ultimately better;
I was just thinking that 'name' might be equally appropriate.
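
The precision argument can be reproduced outside Parrot. The sketch below uses Python's `%g` format, which defaults to six significant digits; it is only an analogy for the .Float stringification described above, not a claim about how Parrot formats numbers. The dictionary node is a hypothetical stand-in for a PAST::Val.

```python
exact_text = "1.23456789E1"  # the literal exactly as written in the HLL source

# Round-tripping through a native float and a six-significant-digit format
# drops digits, much like the .Float example in this message.
lossy = "%g" % float(exact_text)

# Keeping the original text in the node preserves every digit.
node_with_text = {"type": "val", "name": exact_text}

print(lossy)
print(node_with_text["name"])
```

The first line printed is the truncated form and the second is the untouched source text, which is the case for storing a string rather than a native value.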

> - In PAST nodes, the attribute 'ctype' isn't actually storing a C
> language type. Better name?

It really stands for "constant type", and is one of 'i', 'n', or
's' depending on whether it can be treated as an int, num, or
string when being handled as a constant in PIR.

> - The attribute 'vtype' is both variable type in POST::Var and value
> type in POST::Val. Handy generalization, but it's not clear from the
> name that 'vtype' is either of those things.

I think you meant PAST::Var/PAST::Val here, as there isn't a POST::Var
or POST::Val. But 'vtype' really stands for "value type" in both
cases -- it's the type of value returned by either a PAST::Var
or PAST::Val node.

> - The values for both 'ctype' and 'vtype' are obscure. Better to
> establish a general system for representing types, than to include raw
> Parrot types or 1-letter codes in the AST.

Ultimately I expect that the types that appear in 'vtype' will
be the types defined by the HLL itself. For example, in perl6
one would see 'vtype'=>'Str' to indicate a Perl 6 string constant.
Unfortunately it's been difficult to illustrate this in real code
because of the HLL classname conflicts that I've been reporting
in other contexts.

I agree the values and name for 'ctype' are a bit obscure, and
will gladly accept any suggestions for improving it. The 'ctype'
attribute is really just code optimization in the final output,
and it does assume some knowledge of the target. If no ctype
is specified, past-pm assumes that the constant value must
first be placed into a PMC in order to be useful. With
a ctype present, then past-pm can match up the (PIR) opcode
contexts in which the value can be directly used as an
int/num/string in an operation. It's the difference between

    # $b + 2                      # $b + 2
    get_global $P0, '$b'          get_global $P0, '$b'
    new $P2, .Undef               new $P1, .Integer
    add $P2, $P0, 2               assign $P1, 2
                                  new $P2, .Undef
                                  add $P2, $P0, $P1

or

    # say 3, 4, 5                 # say 3, 4, 5
    "say"(3, 4, 5)                new $P1, .Integer
                                  assign $P1, 3
                                  new $P2, .Integer
                                  assign $P2, 4
                                  new $P3, .Integer
                                  assign $P3, 5
                                  "say"($P1, $P2, $P3)
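
The choice a code generator makes with and without a 'ctype' hint can be sketched as follows. This is Python that emits PIR-like text modelled on the `$b + 2` listing above; `emit_add` is a hypothetical helper, the register names are fixed for brevity, and boxing into `.Integer` is an assumption that only suits integer constants.

```python
def emit_add(left_reg, const_text, ctype=None):
    """Return PIR-like lines that add a constant to the PMC in left_reg."""
    if ctype in ("i", "n"):
        # Hint present: the constant can be used directly as an int/num operand.
        return [
            "new $P2, .Undef",
            f"add $P2, {left_reg}, {const_text}",
        ]
    # No hint: box the constant in a PMC first (assumed .Integer here).
    return [
        "new $P1, .Integer",
        f"assign $P1, {const_text}",
        "new $P2, .Undef",
        f"add $P2, {left_reg}, $P1",
    ]


print("\n".join(emit_add("$P0", "2", ctype="i")))
print("\n".join(emit_add("$P0", "2")))
```

Both paths compute the same result; the hint only removes the boxing step, which is why it can be treated as an optimization rather than as part of the program's meaning.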


> - In PAST nodes, consider the audience when choosing attribute names
> like 'ismy' (PAST::Var). Something like 'islexical' or 'isdeclaration'
> (I'm not sure which you mean), is friendlier to non-Perl users, and
> actually clearer even for Perl users.

"Ismy" means "isdeclaration" here, and I can go ahead and change it.

> - In PAST nodes again, I'm not clear on what 'pirop' (PAST::Op)
> represents. Is it the literal name of a PIR opcode, or a generic
> representation of standard low-level operations? I'm more in favor of
> the latter. Better still, give compiler-writers a standard format lookup
> table they can write to allow the PAST to POST transformation to select
> the right PIR operation from the HLL op name. (See the comments on
> boundaries of abstraction.)

I think past-pm already has exactly what you want here, but it
may not be entirely clear. First, 'pirop' does exactly what you
request in 'Better still, ...' -- it provides a way for the compiler
writer to identify the right PIR operation from the HLL op name.
In particular, in the operator-precedence specification a
compiler writer writes:

proto infix:+ is pirop('add') { ... }
proto infix:- is pirop('sub') { ... }

and this provides an easy way for the parse-to-past transformation
to associate the correct PIR operation from the HLL op name.
Essentially, the transformation looks for a 'pirop' trait on
the operator, and if found it puts it in the 'pirop' attribute
of the corresponding PAST::Op node.

The values of 'pirop' are really generic representations of
standard low-level operators. Unfortunately, PIR is not as
regular as we might like it to be -- some PIR operations will
work only with pmc operands, some will work with a variety of
int/num/string/pmc operands, and still others won't work with
pmcs at all. So, POST.pir has a lookup table (%pirtable)
that takes the generic name given by 'pirop' and does any
necessary coercions to get the operands to match. So far
this table is incomplete -- I've been adding entries only
as I need them.

The idea is that a compiler writer can use 'pirop' to
specify the mapping of HLL operators into PIR opcodes
directly in the grammar files where the HLL operators are
being defined. Furthermore, the compiler writer doesn't have
to keep track of the low-level details for each PIR opcode;
i.e., when specifying 'pirop'=>'concat' the past-pm code
generation knows that concat needs string operands. (However,
if a compiler writer needs a specific PIR opcode, then they
could specify it with something like 'pirop'=>'concat_p_sc'.)
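
A rough sketch of how a table like %pirtable might drive operand coercion. The table format, the operand-kind codes, and `emit` are all invented for illustration and do not reflect the actual contents of POST.pir; the output is PIR-like text that has not been checked against a real Parrot.

```python
import itertools

# Hypothetical table: generic operation name -> [result kind, operand kinds...].
# 'P' = PMC, 's' = string, '*' = any kind accepted as-is.
PIRTABLE = {
    "add": ["P", "*", "*"],
    "concat": ["P", "s", "s"],
}


def emit(pirop, dest, operands):
    """operands is a list of (text, kind) pairs; dest is the result register."""
    wants = PIRTABLE[pirop][1:]
    temps = itertools.count()  # numbers the temporary string registers
    lines, args = [], []
    for (text, kind), want in zip(operands, wants):
        if want != "*" and kind != want:
            if want != "s":
                raise ValueError(f"cannot coerce {kind!r} to {want!r}")
            # Copy the operand into a string register before using it.
            reg = f"$S{next(temps)}"
            lines.append(f"{reg} = {text}")
            text = reg
        args.append(text)
    lines.append(f"{pirop} {dest}, {', '.join(args)}")
    return lines


print("\n".join(emit("concat", "$P2", [("$P0", "P"), ('"x"', "s")])))
print("\n".join(emit("add", "$P2", [("$P0", "P"), ("2", "i")])))
```

The compiler writer names only the generic operation; the table decides whether operands need converting first.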

> - In PAST nodes, the 'clone' method is now 'init'. 'clone' was a terrible
> name, I agree, but 'init' isn't quite right either.

Currently Parrot uses '__init' as the method for initializing
new objects, thus I think 'init' is at least consistent with Parrot.

> - In PAST nodes, the 'add_child' method is now 'push'. I liked
> 'add_child' better, but, maybe what we really want is not a method at
> all, but a :vtable entry for an array push? Seems likely, since there's
> really not any other array-like behavior the syntax-tree nodes need to have.

I went with 'push' because I know I'm also going to want an 'unshift'
operation, and those seemed more descriptive than the generic 'add_child'.
They also correspond well to the Parrot ops.

I've also thought about doing 'push' as a :vtable entry, and we can
still easily do that, but there are at least two items in favor of
keeping a method-based approach: (1) :vtable in subclassed items
still has some issues to be addressed (e.g., RT #40626), and
(2) when we get a high-level transformation language into TGE, it's
very likely that the operations on nodes will be method-based
and not opcode-based.

> ---
>
> Clear boundaries between components: (Fuzzy boundaries of abstraction
> make it difficult to allow for other implementations of the AST/OST or
> customization of the compiler object.)
>
> - The 'compile' method doesn't belong in the PAST object, it belongs in
> HLLCompiler.

> ...

After a lot of thought and false starts, I ended up taking a
different approach to compilation than the "HLLCompiler specifies
the complete sequence of transformations". Essentially I've taken
the approach that a "compiler" is simply something that transforms
a source data structure into a target data structure, and so
what we really have is a sequence of "compilers". To this end,
I really wanted to call my compiler base class 'Compiler'
and not 'HLLCompiler', but unfortunately that classname is already
used by Parrot for something else and so 'HLLCompiler' is what I
chose until that could be resolved. The 'HLL' probably implies
more than I intended to imply.

So, the 'Abc' compiler really is just something that converts the
'bc' language into a PAST structure; after doing that it simply
hands the result off to the 'PAST-pm' compiler. Similarly,
the 'PAST' compiler translates into POST and hands the result
off to the POST compiler, and POST simply does its thing and
returns a PIR or executable result.
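
The "sequence of compilers" view can be sketched like this. It is Python with a hypothetical `Stage` class and toy transforms, not the HLLCompiler code: each stage converts its input and hands the result to its default next stage unless the caller asks it to stop.

```python
class Stage:
    """A 'compiler' in the sense above: source structure in, target structure out."""

    def __init__(self, output, transform, next_stage=None):
        self.output = output          # name of the structure this stage produces
        self.transform = transform
        self.next_stage = next_stage

    def compile(self, source, target=None):
        result = self.transform(source)
        # Stop if this is the last stage or the caller asked for this output.
        if self.next_stage is None or target == self.output:
            return result
        return self.next_stage.compile(result, target)


# Toy transforms standing in for POST -> PIR, PAST -> POST, and source -> PAST.
post_compiler = Stage("pir", lambda ost: "\n".join(ost))
past_compiler = Stage("post", lambda ast: [f"say {v}" for v in ast],
                      next_stage=post_compiler)
abc_compiler = Stage("past", lambda text: text.split(),
                     next_stage=past_compiler)

print(abc_compiler.compile("1 2"))
print(abc_compiler.compile("1 2", target="past"))
```

In this arrangement no single object knows the whole pipeline; each stage knows only its successor.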

> - The 'compile' method also doesn't belong in the main compiler
> executable, it belongs in HLLCompiler.
> - Merge them into one 'compile' method in HLLCompiler.

> - Customization of HLLCompiler should be handled by creating a subclass
> of HLLCompiler. (The current 'register' strategy is somewhat fragile.)

I don't have any problem with having each language subclass
HLLCompiler and override the 'compile' method in each, I'll
work on that soon. Of course, the method still ends up one way
or another in the main compiler executable, it may simply change
the namespace.


> - Provide an 'init' method for HLLCompiler that lets the compiler writer
> set which modules HLLCompiler will use for each stage of compilation.
> This will cover the majority of compilers without requiring each
> compiler writer to define their own 'compile' routine.

Because of the multi-stage approach I've taken, the compile
routines are already fairly short, and to me they're not at all
onerous for a compiler writer to create. For each of languages/abc/,
languages/APL/, and languages/perl6/ the 'compile' method is
less than 30 lines of PIR. (And it will only require a couple
of lines of code to abstract the existing call to 'compile' methods
of PAST/POST to instead use PAST/POST compilers.)

I also think that many compilers may end up with compiler-specific
option flags or other items that need to be taken care of, and it
seems to me that this is more easily handled by a method definition
than a module specification.

> - It would be easier to maintain (and create) the list of HLL to PIR
> operator associations in something like a YAML file than embedded in the
> parser grammar file. [...]

Hmm. My feeling was that it was easier to put the operator
associations in the parser grammar file, but I can see the value
of placing them somewhere else, and I definitely would like to
keep Parrot-isms out of the AST as much as possible.

OTOH, there are many times when for optimization reasons or
other items it's useful to be able to drop some Parrot hints
directly into the AST (e.g., the 'ctype' attribute above), and
so I think that as long as full program semantics are captured
in the AST without any Parrot-specific items, it's okay to have
Parrot-specific items available in the AST as compiler hints
simply because it's sometimes easier to place them there than
elsewhere.

> At the very least, the 'pirop' property on parser rules could be handled
> by the PAST-to-POST transformation, so the compiler writer doesn't have
> to manually pull those values out of the parser grammar's optable when
> creating the AST.

Agreed -- I'll work on this.

> (If the parser grammar module was specified in
> HLLCompiler's 'init', then the compiler object would know where to look
> for the optable.)

I'm thinking this is really a parameter to the AST compiler, along
with some useful support routines to make it easy to grab this
information from a YAML file or other source. (In fact, it would
be easy to supply a utility routine to construct the
HLL operator -> pirop translation from the optable, so that
compiler writers that wanted to specify the pirop in the
parser grammar could easily do so and still not have to manually
pull the values across.)


> ---
> Refactor into smaller units: (Easier to test, easier to maintain, easier
> to debug, and easier to create subclasses.)
>
> - In HLLCompiler, instead of a monolithic 'command_line' method, split
> the components into independent methods all called from the
> 'command_line' method.

Agreed; HLLCompiler was thrown together quickly, and it was done
before it was even clear that we would have a standard compiler
object.

> - In HLLCompiler, split the 'compile' method out into independent
> methods for each compilation stage ('compile_ast', 'compile_ost',
> 'compile_pir', etc.), all called from 'compile'.

Again, I tend to think of this as being all separate compilers,
each of which automatically calls its default next stage until
compiler options tell it to do otherwise.


> ---
> Useful error detection: (Be kind to your compiler writers.)
>
> - Returning the source code string from the 'compile' method in
> HLLCompiler when no compiler is registered isn't helpful. Throw an
> exception or give an error. Give the compiler writer a clue to what's wrong.

I just threw in a quick default, and yes, throwing an exception
would be a better default. Sorry.

> - Provide distinct errors/exceptions for failures at each stage of
> compilation to make it easy to figure out which stage is failing.

Agreed -- however, exception handling in Parrot still needs
implementation and better fleshing out (this is what prompted
my question about the status of exception handling implementation
in last week's #parrotsketch, and my comment that I'm likely to
need them fairly soon.)

> ---
> Side comments:
>
> - In PGE grammars, what is the "{ ... }" at the end of every proto
> declaration supposed to do? It seems like a dead weight. Is that where
> the Perl 6 code defining the operator is supposed to go? Can we allow
> 'proto' declarations to end in a semi-colon when there is no Perl 6 code?

PGE grammars are trying to closely follow the Perl 6 syntax,
so the "{ ... }" is the Perl6 "yada-yada-yada" body that gets
used in function prototypes. And yes, I could see some compilers
choosing to specify the code directly in the declaration
(which is why 'proto' can also be 'sub' or the other Perl 6
subroutine keywords).

But in the end, I didn't allow simple semicolon terminators
simply because it wasn't valid Perl 6 syntax, and in many cases
I think that having subtle differences isn't ideal as people
may get confused about what is allowed where. But I don't have
a large objection to modifying the PGE::Grammar compiler to
represent empty declarations with semicolons as well as
yada-yada-yada blocks.

Pm

Patrick R. Michaud

Nov 27, 2006, 2:22:38 AM
to Allison Randal, Perl 6 Internals
On Sun, Nov 26, 2006 at 08:30:32PM -0800, Allison Randal wrote:
> Overall, the POST implementation is usable and I really like the new HLL
> compiler module. I've got Punie working with the new toolchain to the
> point that it's generating valid PIR code for many low-level constructs,
> but some of the high-level constructs that worked under the previous
> toolchain I still don't have working.

Also, out of curiosity, which high-level constructs in punie aren't
working?

Pm

Allison Randal

Nov 27, 2006, 4:13:52 AM
to Patrick R. Michaud, Perl 6 Internals
I'll split my replies into separate threads to make it easier to wrap
our brains around individual chunks.

Patrick R. Michaud wrote:
>>
>> Clear boundaries between components: (Fuzzy boundaries of abstraction
>> make it difficult to allow for other implementations of the AST/OST or
>> customization of the compiler object.)
>>
>> - The 'compile' method doesn't belong in the PAST object, it belongs in
>> HLLCompiler.
>> ...
>
> After a lot of thought and false starts, I ended up taking a
> different approach to compilation than the "HLLCompiler specifies
> the complete sequence of transformations". Essentially I've taken
> the approach that a "compiler" is simply something that transforms
> a source data structure into a target data structure, and so
> what we really have is a sequence of "compilers". To this end,
> I really wanted to call my compiler base class 'Compiler'
> and not 'HLLCompiler', but unfortunately that classname is already
> used by Parrot for something else and so 'HLLCompiler' is what I
> chose until that could be resolved. The 'HLL' probably implies
> more than I intended to imply.
>
> So, the 'Abc' compiler really is just something that converts the
> 'bc' language into a PAST structure, after doing that it simply
> hands the result off to the 'PAST-pm' compiler. Similarly,
> the 'PAST' compiler translates into POST and hands the result
> off to the POST compiler, and POST simply does its thing and
> returns a PIR or executable result.

Let's take a couple steps back. The compiler module is really like
Test::Builder. It's the infrastructure code that provides standard
functionality to all compiler writers. Standardization is good, it means
we don't have 500 incompatible implementations of 'ok'. (Actually, we
still have non-standard implementations of 'ok' floating around, and
they're a major headache. All the more reason to standardize the
compiler tools early on.)

With tests, each test file does one thing (tests a chunk of code, says
'ok' or 'not ok' multiple times). The individual tests don't need to
each duplicate the infrastructure code. Test::Harness provides the
infrastructure, progresses through all the tests, maintains
meta-information as it goes, and summarizes at the end.

With compiler modules, the individual PGE and TGE modules each do one
thing, take in the "source code" in one form and output it in another
form. There's no need to re-write the infrastructure code into the
syntax tree modules for every stage of compilation. Let
Compiler::Builder (or Compiler::Harness, or whatever we call it) handle
the infrastructure.

>> - The 'compile' method also doesn't belong in the main compiler
>> executable, it belongs in HLLCompiler.
>> - Merge them into one 'compile' method in HLLCompiler.
>> - Customization of HLLCompiler should be handled by creating a subclass
>> of HLLCompiler. (The current 'register' strategy is somewhat fragile.)
>
> I don't have any problem with having each language subclass
> HLLCompiler and override the 'compile' method in each, I'll
> work on that soon. Of course, the method still ends up one way
> or another in the main compiler executable, it may simply change
> the namespace.

The point is that 99% of compiler writers shouldn't need to write any
code for the 'compile' method at all.

>> - Provide an 'init' method for HLLCompiler that lets the compiler writer
>> set which modules HLLCompiler will use for each stage of compilation.
>> This will cover the majority of compilers without requiring each
>> compiler writer to define their own 'compile' routine.
>
> Because of the multi-stage approach I've taken, the compile
> routines are already fairly short, and to me they're not at all
> onerous for a compiler writer to create. For each of languages/abc/,
> languages/APL/, and languages/perl6/ the 'compile' method is
> less than 30 lines of PIR. (And it will only require a couple
> of lines of code to abstract the existing call to 'compile' methods
> of PAST/POST to instead use PAST/POST compilers.)

a) Most compilers will simply cut-n-paste an existing 'compile' routine
from an existing compiler. Cut-n-paste programming is a "code smell" and
a maintenance headache.

b) Why require the compiler writer to write 30 lines of code when they
could write one? The entire core executable for a compiler could consist
of nothing but:

    .sub '__onload' :load :init
        # load your modules
        $P1 = new [ 'HLLCompiler' ]
        $P1.'init'('language'=>'punie', 'parse_grammar'=>'Punie::Parser',
                   'ast_grammar'=>'Punie::AST::Grammar')
    .end

    .sub 'main' :main
        .param pmc args
        $P0 = compreg 'punie'
        $P1 = $P0.'command_line'(args)
        .return ($P1)
    .end

That's a great selling point to new compiler writers. (And I'd be even
happier if we could export the 'main' routine from HLLCompiler instead
of cut-n-pasting it.)

> I also think that many compilers may end up with compiler-specific
> option flags or other items that need to be taken care of, and it
> seems to me that this is more easily handled by a method definition
> than a module specification.

Some will, but subclassing Compiler::Builder is a familiar and
straightforward process, and will give them all the flexibility they
need to customize its behavior, not just the 'compile' routine. Optimize
for the common case, be flexible enough for the complex case.

>> (If the parser grammar module was specified in
>> HLLCompiler's 'init', then the compiler object would know where to look
>> for the optable.)
>
> I'm thinking this is really a parameter to the AST compiler...

It's infrastructure code. Any stage of compilation may need access to
the optable, so the information on where to find it belongs in the
meta-object that is governing all the compilation stages. (Generating
the optable I'll leave for a different thread.)

>> - In HLLCompiler, split the 'compile' method out into independent
>> methods for each compilation stage ('compile_ast', 'compile_ost',
>> 'compile_pir', etc.), all called from 'compile'.
>
> Again, I tend to think of this as being all separate compilers,
> each of which automatically calls its default next stage until
> compiler options tell it to do otherwise.

Standardized infrastructure code good. Make Ogg-itect happy. :)


Once we have a standardized infrastructure, it opens up lots of
possibilities. Like, how about a subclass of Compiler::Builder that
accumulates statistics about the time spent on each stage of compilation
and reports it at the end of the compile? Or language smoke-testing
reports on the website broken down by compile stage? ("This test was
successful through the POST stage, but this one never made it through
the parse.")
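
A sketch of the statistics idea, in Python rather than PIR. `Builder` is a hypothetical stand-in for the proposed Compiler::Builder, configured with an ordered list of stages, and `TimingBuilder` is the kind of subclass described above; the toy stages are placeholders.

```python
import time


class Builder:
    """Hypothetical stand-in for the proposed Compiler::Builder."""

    def __init__(self, stages):
        # stages: ordered list of (name, callable) pairs
        self.stages = list(stages)

    def run_stage(self, name, func, data):
        return func(data)

    def compile(self, source):
        data = source
        for name, func in self.stages:
            data = self.run_stage(name, func, data)
        return data


class TimingBuilder(Builder):
    """Subclass that records wall-clock time spent in each stage."""

    def __init__(self, stages):
        super().__init__(stages)
        self.timings = {}

    def run_stage(self, name, func, data):
        start = time.perf_counter()
        try:
            return super().run_stage(name, func, data)
        finally:
            self.timings[name] = time.perf_counter() - start


builder = TimingBuilder([
    ("parse", str.split),
    ("ast", lambda tokens: {"children": tokens}),
    ("ost", lambda ast: [f"say {c}" for c in ast["children"]]),
])
print(builder.compile("1 2"))
for name, seconds in builder.timings.items():
    print(f"{name}: {seconds:.6f}s")
```

Only `run_stage` is overridden; the stage list and the `compile` loop are inherited unchanged, which is the kind of reuse a shared driver makes possible.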

Allison

Allison Randal

Nov 27, 2006, 1:52:13 PM
to Patrick R. Michaud, Perl 6 Internals
Patrick R. Michaud wrote:
>
> Also, out of curiosity, which high-level constructs in punie aren't
> working?

What I've found so far are:

- The top-level AST structure is off: my temporary hack to replace
PAST::Stmt and PAST::Exp with PAST::Stmts is producing extra temporary
variables in the PIR output. I need to refactor the top few tiers of
transformation rules, and maybe refactor the Punie parser grammar.

- Conditionals are handled completely differently in the new PAST, so
Punie needs some replumbing in the AST transformation for those.

- Comma lists are also handled completely differently.

So, it's not a matter of missing features (aside from PAST::Label), it's
just a matter of adapting the code to a different way of thinking. I'll
work through these in the next few days and let you know what I find as
I go.

Allison

Patrick R. Michaud

Nov 27, 2006, 2:22:38 PM
to Allison Randal, Perl 6 Internals
On Mon, Nov 27, 2006 at 10:52:13AM -0800, Allison Randal wrote:
> Patrick R. Michaud wrote:
> >
> >Also, out of curiosity, which high-level constructs in punie aren't
> >working?
>
> What I've found so far are:
>
> - The top-level AST structure is off: my temporary hack to replace
> PAST::Stmt and PAST::Exp with PAST::Stmts is producing extra temporary
> variables in the PIR output. I need to refactor the top few tiers of
> transformation rules, and maybe refactor the Punie parser grammar.

I'll gladly add PAST::Stmt and PAST::Exp nodes if that's at all
useful. Just because they're there doesn't mean a compiler has to
use them. :-)

> - Comma lists are also handled completely differently.

PAST itself doesn't know anything about comma lists -- it just
thinks of comma as being an operator like any other operator.
In perl6 the infix:, operator has 'list' associativity, so that
it ends up with a variable arity. However, I recognize that some
languages might need to keep the notion that commas are left-associative
with arity 2, so perhaps we need some form of 'list' pasttype
that would combine the operands together somehow?
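
One way such a 'list' combination could work, sketched in Python with a hypothetical Op node (not the PAST::Op API): nested binary comma nodes produced by a left-associative parse are collapsed into a single variable-arity node.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    children: list = field(default_factory=list)


def flatten_list_op(node, name="infix:,"):
    """Collapse nested `name` ops into one n-ary node, recursively."""
    if not isinstance(node, Op):
        return node  # leaf value
    children = [flatten_list_op(child, name) for child in node.children]
    if node.name != name:
        return Op(node.name, children)
    flat = []
    for child in children:
        if isinstance(child, Op) and child.name == name:
            flat.extend(child.children)  # splice the nested operands in place
        else:
            flat.append(child)
    return Op(name, flat)


# "1, 2, 3" parsed left-associatively with arity 2: ((1, 2), 3)
nested = Op("infix:,", [Op("infix:,", [1, 2]), 3])
flat = flatten_list_op(nested)
```

This sketch flattens every nested comma node it sees; a language where parenthesized sub-lists must stay grouped would need to mark those nodes and skip them.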

> So, it's not a matter of missing features (aside from PAST::Label), it's
> just a matter of adapting the code to a different way of thinking. I'll
> work through these in the next few days and let you know what I find as
> I go.

That'd be great. I'm working on some refactors of HLLCompiler and
PAST right now, I don't think any of these will break existing code.

Pm

Patrick R. Michaud

Nov 27, 2006, 5:43:21 PM
to Allison Randal, Perl 6 Internals
On Mon, Nov 27, 2006 at 01:13:52AM -0800, Allison Randal wrote:
> .sub '__onload' :load :init
> # load your modules
> $P1 = new [ 'HLLCompiler' ]
> $P1.'init'('language'=>'punie', 'parse_grammar'=>'Punie::Parser',
> 'ast_grammar'=>'Punie::AST::Grammar')
> .end
> .sub 'main' :main
> .param pmc args
> $P0 = compreg 'punie'
> $P1 = $P0.'command_line'(args)
> .return ($P1)
> .end
>
> [...]

>
> Standardized infrastructure code good. Make Ogg-itect happy. :)

We definitely want Ogg-itect to remain happy. :-)

Now implemented in r15882 as shown above, sans the helper 'init'
method (which I'll add later tonight). Examples are in
languages/perl6/ and languages/abc/ .

Time permitting tonight I will also refactor the monolithic
'command_line' method of HLLCompiler into separate shorter methods.

Pm

Allison Randal

Nov 27, 2006, 7:50:23 PM
to Patrick R. Michaud, Perl 6 Internals
This fragment of response is about types, layers of abstraction and
tracking information as the stages of compilation progress.

And, I probably haven't said it enough yet, but the work you've done
here is absolutely wonderful, Patrick. There's nothing like a solid
chunk of working code to push the design to the next stage of evolution. :)

Patrick R. Michaud wrote:
> On Sun, Nov 26, 2006 at 08:30:32PM -0800, Allison Randal wrote:
>
>> - Is there no way to indicate what type of variable a PAST::Var is?
>> Scalar/Array/Hash? (high-level types, not low-level types)
>
> Sure, that's what 'vtype' is -- it indicates the type of value
> that the variable ought to hold.
>
> My plan has been to follow the Perl6 concept of "implementation types"
> and "value types" within PAST. Thus far I've only put in the support
> for the value types, as the "vtype" attribute (and vtype can be any
> high-level type the language happens to support). I'm expecting
> to add an "itype" attribute at some point when we're a bit farther
> along; I'm still working out the details.

Hrm... you've really got two HLL types: the container type
(scalar/array/hash) and the value type (Str, Int, Foo::Bar, Array, Hash,
Matrix, Custom::Hash, etc).

You've also essentially got two PIR types: the container type
(int/num/str/pmc) and the value type (int, num, str, or some pmc type).

By "implementation type" do you mean the PIR value type?


A YAML config file to map HLL value types to PIR value types for a
particular compiler would be another nice addition. PAST doesn't need to
know anything about PIR types.


>> - In PAST nodes, the attribute 'ctype' isn't actually storing a C
>> language type. Better name?
>
> It really stands for "constant type", and is one of 'i', 'n', or
> 's' depending on whether it can be treated as an int, num, or
> string when being handled as a constant in PIR.

Okay, 'const_type' is a better name.

>> - The attribute 'vtype' is both variable type in POST::Var and value
>> type in POST::Val. Handy generalization, but it's not clear from the
>> name that 'vtype' is either of those things.
>
> I think you meant PAST::Var/PAST::Val here, as there isn't a POST::Var
> or POST::Val.

Indeed I did. Though, why isn't there a POST::Var or POST::Val? POST has
both variables and values.

> But 'vtype' really stands for "value type" in both
> cases -- it's the type of value returned by either a PAST::Var
> or PAST::Val node.

Hmm... If a PAST::Var is, say, an integer constant, will it have the
same 'value_type' as an integer PAST::Val?

(Definitely go with the longer name instead of 'vtype'.)

>> - The values for both 'ctype' and 'vtype' are obscure. Better to
>> establish a general system for representing types, than to include raw
>> Parrot types or 1-letter codes in the AST.
>
> Ultimately I expect that the types that appear in 'vtype' will
> be the types defined by the HLL itself. For example, in perl6
> one would see 'vtype'=>'Str' to indicate a Perl 6 string constant.
> Unfortunately it's been difficult to illustrate this in real code
> because of the HLL classname conflicts that I've been reporting
> in other contexts.

What bug # is that? It's hard to imagine how an HLL type name that's
only stored in an AST would conflict with a Parrot class name. Or, are
you assuming that the HLL type names have to be the same as the Parrot
class names? Shouldn't need to be the same, you just need a config file
mapping between the two.

> I agree the values and name for 'ctype' are a bit obscure, and
> will gladly accept any suggestions for improving it. The 'ctype'
> attribute is really just code optimization in the final output,
> and it does assume some knowledge of the target. If no ctype
> is specified, past-pm assumes that the constant value must
> first be placed into a PMC in order to be useful. With
> a ctype present, then past-pm can match up the (PIR) opcode
> contexts in which the value can be directly used as an
> int/num/string in an operation. It's the difference between
>
>     # $b + 2                      # $b + 2
>     get_global $P0, '$b'          get_global $P0, '$b'
>     new $P2, .Undef               new $P1, .Integer
>     add $P2, $P0, 2               assign $P1, 2
>                                   new $P2, .Undef
>                                   add $P2, $P0, $P1
>
> or
>
>     # say 3, 4, 5                 # say 3, 4, 5
>     "say"(3, 4, 5)                new $P1, .Integer
>                                   assign $P1, 3
>                                   new $P2, .Integer
>                                   assign $P2, 4
>                                   new $P3, .Integer
>                                   assign $P3, 5
>                                   "say"($P1, $P2, $P3)

Okay, if ctype is an optimization hint, then you don't actually need to
list the specific types (i/n/s) in the PAST nodes. All you need is the
name of the HLL value type, and a small bit of config info for that type
name. Whether a particular HLL type can be used directly as an int, num,
or string, and which it can be used as, is always consistent for that
type. Int can be used as a low-level integer, and Matrix can never be
used as a low-level constant.

So, PAST provides the HLL type name, a configuration file provides
details about that type, and the PAST-to-POST transformation decides
when to use direct values (for the HLL types that allow it).
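
As a rough illustration of that split, here is a Python sketch. The TYPE_CONFIG table, its keys, and the operand_for helper are all assumptions made for the example; in practice the table would be loaded from a per-language config file, and the emitted strings only imitate the PIR quoted above.

```python
# Hypothetical per-language type table (would normally come from a config file).
# const_type is None when the type can never be used as a low-level constant.
TYPE_CONFIG = {
    "Int":    {"pir_type": "Integer", "const_type": "i"},
    "Num":    {"pir_type": "Float",   "const_type": "n"},
    "Str":    {"pir_type": "String",  "const_type": "s"},
    "Matrix": {"pir_type": "Matrix",  "const_type": None},
}


def operand_for(hll_type, literal, accepted, reg):
    """Decide how a constant of `hll_type` reaches an opcode.

    `accepted` is the set of low-level kinds ('i', 'n', 's') the target
    opcode takes directly. Returns (setup_lines, operand_text).
    """
    info = TYPE_CONFIG[hll_type]
    if info["const_type"] in accepted:
        return [], literal  # use the constant directly
    # Otherwise box the value in a PMC first.
    setup = [f"new {reg}, .{info['pir_type']}", f"assign {reg}, {literal}"]
    return setup, reg


direct = operand_for("Int", "2", {"i", "n"}, "$P1")
boxed = operand_for("Int", "2", set(), "$P1")
```

The AST node only carries the name "Int"; everything about how an Int may be lowered lives in the table and in the transformation.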


>> - In PAST nodes again, I'm not clear on what 'pirop' (PAST::Op)
>> represents. Is it the literal name of a PIR opcode, or a generic
>> representation of standard low-level operations? I'm more in favor of
>> the latter. Better still, give compiler-writers a standard format lookup
>> table they can write to allow the PAST to POST transformation to select
>> the right PIR operation from the HLL op name. (See the comments on
>> boundaries of abstraction.)
>
> I think past-pm already has exactly what you want here, but it
> may not be entirely clear. First, 'pirop' does exactly what you
> request in 'Better still, ...' -- it provides a way for the compiler
> writer to identify the right PIR operation from the HLL op name.
> In particular, in the operator-precedence specification a
> compiler writer writes:
>
> proto infix:+ is pirop('add') { ... }
> proto infix:- is pirop('sub') { ... }
>
> and this provides an easy way for the parse-to-past transformation
> to associate the correct PIR operation from the HLL op name.
> Essentially, the transformation looks for a 'pirop' trait on
> the operator, and if found it puts it in the 'pirop' attribute
> of the corresponding PAST::Op node.

Aye, that's how I have it working now. (Actually 'n_add' instead of
'add', because 'add' didn't work, so I cargo-culted from the perl6
implementation.)

> The values of 'pirop' are really generic representations of
> standard low-level operators. Unfortunately, PIR is not as
> regular as we might like it to be -- some PIR operations will
> work only with pmc operands, some will work with a variety of
> int/num/string/pmc operands, and still others won't work with
> pmcs at all. So, POST.pir has a lookup table (%pirtable)
> that takes the generic name given by 'pirop' and does any
> necessary coercions to get the operands to match. So far
> this table is incomplete -- I've been adding entries only
> as I need them.

Okay, good. This is a nice abstraction layer. And, I note it can work
equally well whether the optable is generated from the parser grammar or
from a separate config file. Also good.
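
To make the lookup-plus-coercion idea concrete, a small Python sketch follows. The table layout and the coerce_operands helper are hypothetical and do not reflect the real %pirtable in POST.pir; the temporary register numbering is arbitrary.

```python
# Hypothetical table: generic op name -> operand kinds allowed in each slot.
# 'P' = PMC, 'I' = int, 'N' = num, 'S' = string.
PIRTABLE = {
    "add":    ["PIN", "PIN"],
    "concat": ["S", "S"],
}


def coerce_operands(pirop, operands):
    """operands is a list of (kind, text) pairs.

    Returns (setup_lines, operand_texts), adding a temporary and a 'set'
    line for every operand whose kind the slot does not accept.
    """
    setup, texts = [], []
    for index, ((kind, text), allowed) in enumerate(zip(operands, PIRTABLE[pirop])):
        if kind in allowed:
            texts.append(text)
            continue
        target = allowed[0]                 # first allowed kind is the preferred one
        temp = f"${target}{100 + index}"    # e.g. $S100
        setup.append(f"set {temp}, {text}")
        texts.append(temp)
    return setup, texts


concat_setup, concat_args = coerce_operands("concat", [("P", "$P0"), ("S", "'x'")])
add_setup, add_args = coerce_operands("add", [("P", "$P0"), ("I", "2")])
```

The compiler writer only names 'concat' or 'add'; the operand bookkeeping stays in one place.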

> The idea is that a compiler writer can use 'pirop' to
> specify the mapping of HLL operators into PIR opcodes
> directly in the grammar files where the HLL operators are
> being defined. Furthermore, the compiler writer doesn't have
> to keep track of the low-level details for each PIR opcode;
> i.e., when specifying 'pirop'=>'concat' the past-pm code
> generation knows that concat needs string operands. (However,
> if a compiler writer needs a specific PIR opcode, then they
> could specify it with something like 'pirop'=>'concat_p_sc'.)

Reasonable. The association to PIR opcode names has to be declared
somewhere. We can probably come up with better syntax than Parrot's
cryptic internal 'concat_p_sc', but it's good enough to start.

>> - It would be easier to maintain (and create) the list of HLL to PIR
>> operator associations in something like a YAML file than embedded in the
>> parser grammar file. [...]
>
> Hmm. My feeling was that it was easier to put the operator
> associations in the parser grammar file, but I can see the value
> of placing them somewhere else, and I definitely would like to
> keep Parrot-isms out of the AST as much as possible.
>
> OTOH, there are many times when for optimization reasons or
> other items it's useful to be able to drop some Parrot hints
> directly into the AST (e.g., the 'ctype' attribute above), and
> so I think that as long as full program semantics are captured
> in the AST without any Parrot-specific items, it's okay to have
> Parrot-specific items available in the AST as compiler hints
> simply because it's sometimes easier to place them there than
> elsewhere.

Sounds like we're in philosophical agreement. I'm okay with having a
limited amount of Parrot-specific information in the AST, if it's
extraneous to representing the semantics of the source code. At the same
time, if the compiler hints are stored in an optable that's accessible
from all stages of compilation, I don't see the advantage of annotating
them in the AST. It just spends additional processor time and storage to
create an unused copy of the information. So, "allowed but rare" would
be my rule of thumb.

Still, that question is completely separate from the question of where
the compiler writer declares the optable information. For now, let's
take both options on that one: keep the traits on the operator
precedence parser rules, but provide a config file format to generate
optables independently. (We probably need to provide the second option
anyway, since not every compiler writer will use PAST, or even PGE.)

>> At the very least, the 'pirop' property on parser rules could be handled
>> by the PAST-to-POST transformation, so the compiler writer doesn't have
>> to manually pull those values out of the parser grammar's optable when
>> creating the AST.
>
> Agreed -- I'll work on this.

Excellent.

Allison

Allison Randal

Nov 27, 2006, 8:28:59 PM
to Patrick R. Michaud, Perl 6 Internals
Patrick R. Michaud wrote:
>
> I'll gladly add PAST::Stmt and PAST::Exp nodes if that's at all
> useful. Just because they're there doesn't mean a compiler has to
> use them. :-)

Well, I came to the conclusion that PAST::Exp was useless a while ago.
(Its entire point of existence was as a dummy node to be factored out at
the PAST-to-POST stage.) I do think PAST::Stmt is useful, but I want to
take a stab at refactoring it out first.

Oh, I should have mentioned that the patch I sent in to remove the dummy
'root' rule from the POST::Grammar was part of what was making Punie
work (because Punie's top-level node isn't a PAST::Block, it's a
PAST::Stmts). I can refactor that out, but in this case it seemed to
make more sense to refactor the compiler tool (since the other languages
still worked with the change).

> I'm working on some refactors of HLLCompiler and
> PAST right now, I don't think any of these will break existing code.

Break away. I'm fine with the implementation shifting under the Punie
port, since it means progress.

Allison

Patrick R. Michaud

Nov 27, 2006, 11:20:34 PM
to Allison Randal, Perl 6 Internals
On Mon, Nov 27, 2006 at 05:28:59PM -0800, Allison Randal wrote:
> Patrick R. Michaud wrote:
> >
> >I'll gladly add PAST::Stmt and PAST::Exp nodes if that's at all
> >useful. Just because they're there doesn't mean a compiler has to
> >use them. :-)
>
> Well, I came to the conclusion that PAST::Exp was useless a while ago.
> (Its entire point of existence was as a dummy node to be factored out at
> the PAST-to-POST stage.) I do think PAST::Stmt is useful, but I want to
> take a stab at refactoring it out first.

Excellent. Let me know when/if you want PAST::Stmt added in, and any
attributes you want it to have.

> Oh, I should have mentioned that the patch I sent in to remove the dummy
> 'root' rule from the POST::Grammar was part of what was making Punie
> work (because Punie's top-level node isn't a PAST::Block, it's a
> PAST::Stmts). I can refactor that out, but in this case it seemed to
> make more sense to refactor the compiler tool (since the other languages
> still worked with the change).

POST really needs to have a POST::Sub at the top of the tree,
so the purpose of the 'root' rule in POST::Grammar is (going to be)
to create a POST::Sub for the tree if the lower transformations
don't happen to return one. I'll add that code shortly, and then
things should work properly even if the top-level node in PAST
isn't a PAST::Block.
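
In outline, the 'root' rule described here amounts to something like the following Python sketch. Node and ensure_sub_root are hypothetical stand-ins, not the POST classes or the TGE rule itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    children: list = field(default_factory=list)


def ensure_sub_root(tree):
    """Return the tree unchanged if its root is a Sub, else wrap it in one."""
    if tree.kind == "Sub":
        return tree
    return Node("Sub", [tree])


from_stmts = ensure_sub_root(Node("Ops", [Node("Op")]))
from_sub = ensure_sub_root(Node("Sub", [Node("Ops")]))
```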

Pm

Allison Randal

Nov 28, 2006, 12:20:08 AM
to Patrick R. Michaud, Perl 6 Internals
Patrick R. Michaud wrote:
>
> Now implemented in r15882 as shown above, sans the helper 'init'
> method (which I'll add later tonight). Examples are in
> languages/perl6/ and languages/abc/ .

Definitely an improvement. Hmm... okay, I see what you're going for.
Creating a subclass of HLLCompiler for every stage of compilation is
heavyweight, but it's definitely nice to be able to say "give me a
compiler for this tree".

So, with a thumbs up on that modification, I've attached a patch that
does two things: a) keeps strict functionality boundaries so the
controller object does the controlling, and the compiler objects for
PAST and POST do only compiling; and b) makes it possible to override
the grammar used for the PAST-to-POST transformation. ABC passes all its
tests, and Perl6 doesn't fail any more tests than it was failing before.
(I made it a patch because it's a refactor that's easy to show but
convoluted to explain.)

chromatic's suggestion is to replace the series of manual calls in
HLLCompiler's 'compile' method with an iterator over an array of
compiler tasks. Then, a compiler-writer can insert another task (perhaps
a tree-based optimizer between the PAST and POST stages), by calling a
method to specify that the new task is 'before' or 'after' another task
(much like the precedence levels of PGE rules). His idea is a good next
step, but I wanted to keep the change set small, so didn't implement it
here.
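
For reference, the shape of that suggestion can be sketched in a few lines of Python. The Pipeline class and its add method are hypothetical names chosen for the example, not anything in HLLCompiler.

```python
class Pipeline:
    """Ordered list of named compiler tasks with before/after insertion."""

    def __init__(self):
        self._stages = []  # list of (name, callable)

    def _index(self, name):
        for i, (stage_name, _) in enumerate(self._stages):
            if stage_name == name:
                return i
        raise KeyError(f"no stage named {name!r}")

    def add(self, name, func, before=None, after=None):
        if before is not None and after is not None:
            raise ValueError("give at most one of 'before' or 'after'")
        if before is not None:
            self._stages.insert(self._index(before), (name, func))
        elif after is not None:
            self._stages.insert(self._index(after) + 1, (name, func))
        else:
            self._stages.append((name, func))

    def names(self):
        return [name for name, _ in self._stages]

    def compile(self, code):
        # The whole 'compile' method is just an iteration over the tasks.
        for _, func in self._stages:
            code = func(code)
        return code


pipeline = Pipeline()
for stage in ("parse", "past", "post"):
    # Each toy task appends its own name so the order is visible.
    pipeline.add(stage, lambda trace, stage=stage: trace + [stage])

# A compiler writer slots an optimizer in between PAST and POST.
pipeline.add("optimize", lambda trace: trace + ["optimize"], after="past")
order = pipeline.compile([])
```

No task knows which task follows it, which is what makes the insertion possible.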

Allison

refactor_partridge_boundaries.patch

Allison Randal

Nov 28, 2006, 1:13:21 AM
to Patrick R. Michaud, Perl 6 Internals
This fragment of a reply is the random bits that didn't make it into
other topic-centered replies.

Patrick R. Michaud wrote:
> On Sun, Nov 26, 2006 at 08:30:32PM -0800, Allison Randal wrote:
>
> Excellent. Just as a general overall response -- PAST-pm is by
> no means "finished", so many of the items that seem to be missing
> are simply cases of "I haven't gotten to them yet so they aren't
> implemented yet."

Understood, it's a work in progress. Which makes this the perfect time
to influence its future. :)

>> - There's no PAST::Label node type? How do you represent labels in the
>> HLL source?
>
> I just haven't gotten to this part yet.

Okay, I'll add it when I need it, if you haven't already added it by then.


> I don't have a problem with switching it to 'value', I went with
> 'name' primarily because every PAST::Node has a name and so it just
> made sense to use it there. But let me make another weak argument
> in favor of 'name'. If a HLL programmer writes
>
> $a = 1.23456789E1;
>
> then the rhs becomes a PAST::Val node. How should we represent the
> value? The parse-to-past translation could evaluate the contents of
> "1.23456789E1" and store the result in 'value' as (.Float) 12.3456789,
> but unfortunately when we convert that .Float back into a string for
> use as PIR code it comes out as "12.3457" -- i.e., the code looks
> like:
>
> $P0 = new .Float
> $P0 = 12.3457
> set_global '$a', $P0
>
> I decided that in a number of cases like this, what we really want
> to retain in PAST::Val is a precise string representation of the
> value that goes in the resulting output, and not a native
> representation that may lose precision in translation through
> POST/PIR. So, what we're really storing is the value's "name"
> and not its "value". (I did say it was a weak argument.)

Fair. And agreed that PAST::Val should store the raw parsed constant,
not an evaluated form.
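
The precision loss is easy to reproduce outside Parrot. The Python sketch below is only an analogy: it uses "%g", which formats with six significant digits, to stand in for the .Float stringification, and emit_assignment is a hypothetical helper.

```python
literal = "1.23456789E1"        # text exactly as it appeared in the source

evaluated = float(literal)      # native value, 12.3456789
short_form = "%g" % evaluated   # six significant digits


def emit_assignment(register, text):
    """Build one line of output code from a register name and a value string."""
    return f"{register} = {text}"


lossy = emit_assignment("$P0", short_form)  # built from the evaluated value
exact = emit_assignment("$P0", literal)     # built from the stored source text
```

Here lossy carries 12.3457 while exact still carries the full literal, which is the case for keeping the raw string in the node.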

> Anyway, we can switch to 'value' if that's ultimately better;
> I was just thinking that 'name' might be equally appropriate.

Yeah, let's go with 'value'. The only case I can think of that might
need to use a value as a name is Ruby, where you can call a method on a
literal:

2.class

But in that case, I think you'd end up representing '2' as a constant
Var named '2' anyway (perhaps with a PIR value type of RubyLiteralInt).

> "Ismy" means "isdeclaration" here, and I can go ahead and change it.

Excellent!

> Currently Parrot uses '__init' as the method for initializing
> new objects, thus I think 'init' is at least consistent with Parrot.

Where it's inconsistent is in the arguments each takes, so you can't use
the current 'init' methods as :vtable('init') methods. I'm half-way
inclined to see that as a limitation in Parrot that needs to be fixed
rather than a problem with these classes.

> I've also thought about doing 'push' as a :vtable entry, and we can
> still easily do that, but there are at least two items in favor of
> keeping a method-based approach: (1) :vtable in subclassed items
> still has some issues to be addressed (e.g., RT #40626), and

Yes, hold off on this fix until :vtable works, but put it into the draft
PDD.

> (2) when we get a high-level transformation language into TGE, it's
> very likely that the operations on nodes will be method-based
> and not opcode-based.

Well, the operations will be in a middle-level-language syntax. Whether
the MLL uses a methody syntax or a procedural syntax doesn't matter,
since either can be translated to either syntax in PIR.

Besides, using :vtable we can get both a method and a :vtable entry for
the price of one method definition.

>> Clear boundaries between components: (Fuzzy boundaries of abstraction
>> make it difficult to allow for other implementations of the AST/OST or
>> customization of the compiler object.)

- One more comment in this department: move PIR generation out of the
POST node objects. A tree-grammar that outputs PIR code strings isn't a
final solution, but it's a more maintainable intermediate step than
mingled syntax tree representation and code generation (remember P6C?).
A clear boundary between the OST and PIR generation will also push us
closer to the final solution.


>> - Provide distinct errors/exceptions for failures at each stage of
>> compilation to make it easy to figure out which stage is failing.
>
> Agreed -- however, exception handling in Parrot still needs
> implementation and better fleshing out (this is what prompted
> my question about the status of exception handling implementation
> in last week's #parrotsketch, and my comment that I'm likely to
> need them fairly soon.)

Yes, exceptions need work, and soon.
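
Independent of how Parrot's exceptions end up working, the per-stage reporting could look roughly like this Python sketch. CompileError and run_stages are hypothetical names, and the failing stage is contrived for the example.

```python
class CompileError(Exception):
    """Wraps any failure with the name of the stage that raised it."""

    def __init__(self, stage, cause):
        super().__init__(f"{stage} stage failed: {cause}")
        self.stage = stage
        self.cause = cause


def run_stages(stages, source):
    code = source
    for name, func in stages:
        try:
            code = func(code)
        except Exception as exc:
            # Re-raise with the stage name attached, keeping the original cause.
            raise CompileError(name, exc) from exc
    return code


def broken_past(tokens):
    raise ValueError("unknown node type")


stages = [("parse", str.split), ("past", broken_past), ("post", list)]

try:
    run_stages(stages, "print 1")
    failed_stage = None
except CompileError as err:
    failed_stage = err.stage
    message = str(err)
```

A smoke-test report broken down by stage only needs the stage attribute of the caught error.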

>> - In PGE grammars, what is the "{ ... }" at the end of every proto
>> declaration supposed to do?

[...]


> But in the end, I didn't allow simple semicolon terminators
> simply because it wasn't valid Perl 6 syntax, and in many cases
> I think that having subtle differences isn't ideal as people
> may get confused about what is allowed where. But I don't have
> a large objection to modifying the PGE::Grammar compiler to
> represent empty declarations with semicolons as well as
> yada-yada-yada blocks.

Excellent.

Allison

Patrick R. Michaud

Nov 28, 2006, 2:17:25 AM
to Allison Randal, Perl 6 Internals
On Mon, Nov 27, 2006 at 09:20:08PM -0800, Allison Randal wrote:
> Patrick R. Michaud wrote:
> >
> >Now implemented in r15882 as shown above, sans the helper 'init'
> >method (which I'll add later tonight). Examples are in
> >languages/perl6/ and languages/abc/ .
>
> So, with a thumbs up on that modification, I've attached a patch that
> does two things: a) keeps strict functionality boundaries so the
> controller object does the controlling, and the compiler objects for
> PAST and POST do only compiling; and b) makes it possible to override
> the grammar used for the PAST-to-POST transformation. ABC passes all its
> tests, and Perl6 doesn't fail any more tests than it was failing before.
> (I made it a patch because it's a refactor that's easy to show but
> convoluted to explain.)
>
> chromatic's suggestion is to replace the series of manual calls in
> HLLCompiler's 'compile' method with an iterator over an array of
> compiler tasks.

I very much agree with chromatic -- indeed, this is mainly why I didn't
go with putting "ostgrammar" methods into the HLLCompiler object
before. Having HLLCompiler effectively hardcode a sequence
of parser-astgrammar-ostgrammar feels a bit heavy-handed to me,
almost saying that "we really expect you to always have exactly
the sequence source->parse->ast->ost->pir->bytecode, and you're
definitely using TGE for the intermediate steps".

I guess if we expect a lot of compilers to be making language-specific
derivations or replacements of the ast->ost stage then putting the
ost specifications into HLLCompiler makes some sense, but I
totally agree with chromatic that a more generic approach is
needed here. And what I had been aiming for in terms of "array
of compiler tasks" was something like "array of compiler stages",
where each compiler stage is itself a "compiler" (in the compreg
and HLL compiler sense) that does the transformation to the
next item in the list. And each compiler stage knows the
details of how it performs its transformation, whether that's using
TGE or some other method. Putting transformation details like
the ostbuilder and apply steps into HLLCompiler still feels wrong
to me somehow, although I did come around to agreeing with the
idea that the commonly repeated details for source->parse and
parse->ast belong in the default 'compile' method for compiler
objects.

Part of me really wishes that each compiler task would end
up being a standardized 'apply' or 'compile' subroutine
or method of each stage. In other words, to have compilation
effectively become a sequence like:

.local pmc code
# source to parse tree
$P0 = get_hll_global ['Perl6::Grammar'], 'apply'
code = $P0(code, adverbs :flat :named)

# parse tree to ast
$P0 = get_hll_global ['Perl6::PAST::Grammar'], 'apply'
code = $P0(code, adverbs :flat :named)

# ast to ost
$P0 = get_hll_global ['POST::Grammar'], 'apply'
code = $P0(code, adverbs :flat :named)

# ost to result
$P0 = get_hll_global ['POST::Compiler'], 'apply'
code = $P0(code, adverbs :flat :named)

Here the 'apply' functions in Perl6::PAST::Grammar and
POST::Grammar are simply imported from TGE and do the steps
of creating the builder object and then applying the grammar.
The 'apply' function in Perl6::Grammar would just be a
standardized start rule for the parser grammar (and can
be directly specified as such in the .pg file).

If we could standardize at this level, then a compiler simply
specifies the sequence of things to be applied, and the above
instructions could be implemented with a simple iterator over
the sequence. This is _really_ what I was attempting to get at
by having separate compiler objects for PAST, POST, and friends,
except that instead of calling the standard function 'apply'
I was using 'compile'. Part of me thinks that 'apply' and
'compile' are pretty much the same thing, in the sense that
both refer to using some sort of transformer "thing" to
change from a source representation into an equivalent target.

-----

At any rate, even if we go with the approach outlined in the
patch, I have to say that I'm not at all keen on the method
names 'astcompile', 'ostcompile', etc. in the patch.
When I read 'astcompile' it sounds to me like it's a method
to compile an ast into something else, when in fact the method
in the patch is compiling some source into an ast. (By analogy,
we speak of "Perl 6 compiler" and "PIR compiler" as being
things that consume Perl 6 and PIR, not the things that
produce Perl 6 or PIR.)

So at the very least I'd prefer to have those methods called
'get_ast' or 'make_ast' or something much less likely to
cause confusion. Indeed, the reason why I went with simple
'parse' and 'ast' method names in the original is because the
method name tells me what it is that I'm getting back, much like
an accessor.

Pm

Patrick R. Michaud

Nov 28, 2006, 2:51:57 AM
to Allison Randal, Perl 6 Internals
On Mon, Nov 27, 2006 at 10:13:21PM -0800, Allison Randal wrote:
> This fragment of a reply is the random bits that didn't make it into
> other topic-centered replies.

...and some quick responses before turning in for the night...

> >Currently Parrot uses '__init' as the method for initializing
> >new objects, thus I think 'init' is at least consistent with Parrot.
>
> Where it's inconsistent is in the arguments each takes, so you can't use
> the current 'init' methods as :vtable('init') methods. I'm half-way
> inclined to see that as a limitation in Parrot that needs to be fixed
> rather than a problem with these classes.

Having dealt with this in both PGE and at least two PAST
implementations, I certainly see it as a Parrot limitation.
Ultimately I want to have a method that can accept variable
arguments so that I can initialize a newly created object.
I chose 'init' because it seemed like the natural/obvious
name for such a method, but if there's a better name I'll
gladly switch. I haven't found the Parrot :vtable('init')
to be all that useful, since there's not a parameterized
version of it beyond passing a single PMC. And getting
arguments into a single PMC isn't all that fun or useful.

But come to think of it, if we had something like Capture PMCs
available as a standard type (and an easy way to generate
them in PIR), then the existing :vtable('init') would be
quite sufficient. To steal from Perl 6's C<< \(...) >>
capture syntax:

$P0 = new 'Foo::Bar', \(param1, param2, 'abc'=>param3)

.sub 'init' :vtable
.param pmc args
# initialize self based on array/hash components of args pmc
# ...
.end
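
The same idea in a runnable form, as a Python sketch with hypothetical Capture and Node classes (neither is a Parrot PMC): all positional and named arguments travel in one object, so an initializer that accepts a single argument is enough.

```python
class Capture:
    """Bundles positional and named arguments into one object."""

    def __init__(self, *positional, **named):
        self.positional = list(positional)
        self.named = dict(named)


class Node:
    def __init__(self, capture=None):
        # Mirrors a single-argument init: everything arrives in one object.
        if capture is None:
            capture = Capture()
        self.children = list(capture.positional)
        self.attrs = dict(capture.named)


node = Node(Capture("param1", "param2", abc="param3"))
empty = Node()
```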


> >I've also thought about doing 'push' as a :vtable entry, and we can
> >still easily do that, but there are at least two items in favor of
> >keeping a method-based approach:

> >(2) when we get a high-level transformation language into TGE, it's
> >very likely that the operations on nodes will be method-based
> >and not opcode-based.
>
> Well, the operations will be in a middle-level-language syntax. Whether
> the MLL uses a methody syntax or a procedural syntax doesn't matter,
> since either can be translated to either syntax in PIR.

My point is simply that it's far easier to go from a MLL
(whatever syntax) to PIR method calls than to generate specific
Parrot opcodes, because method calls have a very regular
syntax that Parrot opcodes don't.

> - One more comment in this department: move PIR generation out of the
> POST node objects. A tree-grammar that outputs PIR code strings isn't a
> final solution, but it's a more maintainable intermediate step than
> mingled syntax tree representation and code generation (remember P6C?).

I never really dealt with P6C. :-) Still, I can see about
moving the code generation out of the POST node objects; I may
do it as a lower priority though, since I don't think that
aspect is driving many design or implementation decisions for
us at this point.

> >>- In PGE grammars, what is the "{ ... }" at the end of every proto
> >>declaration supposed to do?
> [...]
> >But in the end, I didn't allow simple semicolon terminators
> >simply because it wasn't valid Perl 6 syntax, and in many cases
> >I think that having subtle differences isn't ideal as people
> >may get confused about what is allowed where. But I don't have
> >a large objection to modifying the PGE::Grammar compiler to
> >represent empty declarations with semicolons as well as
> >yada-yada-yada blocks.
>
> Excellent.

"Excellent" as in ...?
[ ] "Go ahead and allow semicolons, since you don't have
a large objection."
[ ] "Your explanation is excellent, stick with the yadas
to avoid the subtle contrasts to Perl 6."

Pm

Leopold Toetsch

Nov 28, 2006, 2:12:57 PM
to perl6-i...@perl.org, Patrick R. Michaud, Allison Randal
On Tuesday, 28 November 2006 08:51, Patrick R. Michaud wrote:
> > I'm half-way
> > inclined to see that as a limitation in Parrot that needs to be fixed
> > rather than a problem with these classes.
>
> Having dealt with this in both PGE and at least two PAST
> implementations, I certainly see it as a Parrot limitation.

This has already been discussed more than once. The last time was, IMHO:

http://groups.google.at/group/perl.perl6.internals/browse_frm/thread/e68dc0a0a96585b7/b536997757a3043b?lnk=gst&q=instantiate+toetsch+new&rnum=2#b536997757a3043b

leo

Leopold Toetsch

Nov 28, 2006, 4:27:10 PM
to perl6-i...@perl.org, Patrick R. Michaud, Allison Randal
On Tuesday, 28 November 2006 08:51, Patrick R. Michaud wrote:
> But come to think of it, if we had something like Capture PMCs
> available as a standard type (and an easy way to generate
> them in PIR), then the existing :vtable('init') would be
> quite sufficient.

Another note. Yes, a core Capture PMC would help *in combination* with
re-coding the calling conventions' internals. These internals are a bit
suboptimal currently, as they use two 'arrays' of information: the
variable-sized opcode part (holding the involved registers and constants)
and the signature PMC (with other call signature details). Unifying the
two, and improving the latter into a Capture, would speed up the
argument-passing code and simplify such Capture-based new/init vtables.
Please blame me for the current implementation ;)

leo

Allison Randal

Nov 29, 2006, 6:24:07 PM
to Patrick R. Michaud, Perl 6 Internals
Patrick R. Michaud wrote:
> On Mon, Nov 27, 2006 at 09:20:08PM -0800, Allison Randal wrote:
>>
>> chromatic's suggestion is to replace the series of manual calls in
>> HLLCompiler's 'compile' method with an iterator over an array of
>> compiler tasks.
>
> I very much agree with chromatic -- indeed, this is mainly why I didn't
> go with putting "ostgrammar" methods into the HLLCompiler object
> before. Having HLLCompiler effectively hardcode a sequence
> of parser-astgrammar-ostgrammar feels a bit heavy-handed to me,
> almost saying that "we really expect you to always have exactly
> the sequence source->parse->ast->ost->pir->bytecode, and you're
> definitely using TGE for the intermediate steps".

The patch I sent is the first step toward making chromatic's suggestion
work. The problem with the current implementation is that each stage
decides what the next stage will be. If the PAST-to-POST transformation
calls the POST-to-PIR transformation before returning, then you can't
easily insert an additional stage between the two.

> I guess if we expect a lot of compilers to be making language-specific
> derivations or replacements of the ast->ost stage then putting the
> ost specifications into HLLCompiler makes some sense, but I
> totally agree with chromatic that a more generic approach is
> needed here. And what I had been aiming for in terms of "array
> of compiler tasks" was something like "array of compiler stages",
> where each compiler stage is itself a "compiler" (in the compreg
> and HLL compiler sense) that does the transformation to the
> next item in the list. And each compiler stage knows the
> details of how it performs its transformation, whether that's using
> TGE or some other method.

I completely agree on the idea of giving each stage its own compiler,
and making that compiler aware of everything it needs to know to perform
its own stage of compilation. I also completely agree on putting as
little code as possible for performing the compilation into the
HLLCompiler module.

Where we diverge is that I don't want the compiler for one stage to know
anything about the next stage. Each stage should operate independently,
and only the HLLCompiler should control the order of stages.

Hm.... actually, I like this a lot better than registering a compiler
for POST and retrieving it by 'compreg'. I would push it one step
farther, though. Instead of setting 'astgrammar' in HLLCompiler's 'init'
method, set 'astcompiler'.
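
To make the separation concrete, here is a rough sketch of the idea in Python rather than PIR. It is not HLLCompiler code, and every class name in it is hypothetical. Each stage is a small compiler with a single 'compile' method that knows nothing about its neighbours, and only the driver holds the ordered list of stages, so an extra stage can be slotted in without touching the others.

```python
class ParseStage:
    """Source text -> list of tokens (toy stand-in for a parser)."""
    def compile(self, source):
        return source.split()

class TreeStage:
    """Tokens -> (op, left, right) tuple (toy stand-in for an AST)."""
    def compile(self, tokens):
        left, op, right = tokens
        return (op, int(left), int(right))

class EmitStage:
    """Tuple -> one line of PIR-like text."""
    OPCODES = {"+": "add", "-": "sub"}

    def compile(self, tree):
        op, left, right = tree
        return f"$I0 = {self.OPCODES[op]} {left}, {right}"

class TraceStage:
    """Pass-through stage that records whatever it is handed."""
    def __init__(self):
        self.seen = []

    def compile(self, tree):
        self.seen.append(tree)
        return tree

class Driver:
    """Plays the HLLCompiler role: the only object that knows stage order."""
    def __init__(self, stages):
        self.stages = list(stages)

    def compile(self, source):
        result = source
        for stage in self.stages:
            result = stage.compile(result)
        return result

driver = Driver([ParseStage(), TreeStage(), EmitStage()])
print(driver.compile("1 + 2"))

# Insert a new stage between the tree and emit stages; no stage changes.
trace = TraceStage()
driver.stages.insert(2, trace)
print(driver.compile("3 - 1"))
print(trace.seen)
```

If a stage called the next stage itself, the insertion at the end would require editing that stage's code instead of one list in the driver.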

The revised method for a stage (using the parse-tree-to-AST as an
example) would be as follows, where the method only performs error
checks to make sure that it got a valid class name, creates a compiler
object for that stage, and calls 'compile'. (Here I'm using the naming
scheme from below.)

.sub 'compile_parse_tree' :method
    .param pmc source
    .param pmc adverbs :slurpy :named
    .local string ptcompiler_name
    .local pmc ptcompiler
    # look up the class name registered for this stage
    ptcompiler_name = self.'ptcompiler'()
    unless ptcompiler_name goto err_no_ptcompiler
    # instantiate the stage's compiler and hand the source to it
    $I0 = find_type ptcompiler_name
    ptcompiler = new $I0
    .return ptcompiler.'compile'(source)

  err_no_ptcompiler:
    $P0 = new .Exception
    $P0['_message'] = 'Missing ptcompiler in compiler'
    throw $P0
.end

For now, we create a separate compiler object for each tree grammar, but
ultimately TGE could generate the appropriate 'compile' method in each
generated tree grammar class.

> Part of me thinks that 'apply' and
> 'compile' are pretty much the same thing, in the sense that
> both refer to using some sort of transformer "thing" to
> change from a source representation into an equivalent target.

Yeah, both good, but neither seems quite right: 'apply' is so generic
that it's nearly meaningless, and 'compile' is perfect when the grammar
is being used as a stage in the compiler tools, but seems odd when it's
being used to transform other kinds of trees. chromatic suggests
'transform' which I like best of all.

> At any rate, even if we go with the approach outlined in the
> patch, I have to say that I'm not at all keen on the method
> names 'astcompile', 'ostcompile', etc. in the patch.
> When I read 'astcompile' it sounds to me like it's a method
> to compile an ast into something else, when in fact the method
> in the patch is compiling some source into an ast. (By analogy,
> we speak of "Perl 6 compiler" and "PIR compiler" as being
> things that consume Perl 6 and PIR, not the things that
> produce Perl 6 or PIR.)
>
> So at the very least I'd prefer to have those methods called
> 'get_ast' or 'make_ast' or something much less likely to
> cause confusion. Indeed, the reason why I went with simple
> 'parse' and 'ast' method names in the original is because the
> method name tells me what it is that I'm getting back, much like
> an accessor.

Yeah, I had the same problem. The reason I changed the method name from
'ast' is that I initially thought it was transforming the AST, when it
was actually generating the AST (and even after I knew what it was doing
the name confused me a couple times). We have the same problem in the
modules too. POST::Grammar is the grammar that creates a POST tree, but
POST::Compiler is the compiler that transforms a POST tree to something
else.

So, when we name a particular stage, are we naming it by what it
produces, or naming it by the input it takes? When we visualize the
compiler stages, it's all about completed constructs: the parse tree,
the AST, the OST, the PIR source, but the code is all about the
transitions between the stages. How about we standardize around your
concept of naming the stage by what it consumes (i.e. "Perl 6
compiler"). That would give us:

Parsing stage: method named 'parse', grammar is (for example)
'Perl6::Grammar', output is a parse tree.

Parse tree stage: method named 'compile_parse_tree', compiler is
'ParseTree::Compiler', grammar is 'ParseTree::Grammar', output is an AST.

AST stage: method named 'compile_ast', compiler is 'AST::Compiler',
grammar is 'AST::Grammar', output is OST.

OST stage: method named 'compile_ost', compiler is 'OST::Compiler',
grammar is 'OST::Grammar', output is PIR.

PIR stage: simple method named 'run_pir' that compiles and runs PIR
code. (We could call the method 'compile_pir'; it's more standard, but
less clear.)
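
The convention above can be written down as a small table. The following is only an illustrative sketch in Python, not part of HLLCompiler; the list layout and helper names are made up for the example. Each stage is named for what it consumes, and the check confirms that every stage's output is exactly what the next stage takes as input.

```python
# (consumes, produces) for each stage, in pipeline order.
STAGES = [
    ("source", "parse_tree"),
    ("parse_tree", "ast"),
    ("ast", "ost"),
    ("ost", "pir"),
    ("pir", None),  # final stage runs the code instead of producing a tree
]

# The two stages whose method names do not follow 'compile_<input>'.
SPECIAL_NAMES = {"source": "parse", "pir": "run_pir"}

def method_name(consumes):
    """Name a stage's method after the input it consumes."""
    return SPECIAL_NAMES.get(consumes, f"compile_{consumes}")

def chain_is_consistent(stages):
    """True if each stage produces what the following stage consumes."""
    return all(prev[1] == nxt[0] for prev, nxt in zip(stages, stages[1:]))

for consumes, produces in STAGES:
    print(f"{method_name(consumes):20} {consumes} -> {produces}")
print(chain_is_consistent(STAGES))
```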

---

For Punie, I'm thinking of standardizing on:
Punie::Grammar
Punie::Compiler::ParseTree
Punie::Compiler::AST
Punie::Compiler::OST

(After adding the 'compile' or 'transform' method to TGE's generator so
we only need one class for each stage, instead of separate 'Compiler'
and 'Grammar' classes.) Or maybe 'Punie::Grammar' should be
'Punie::Compiler::Parser' instead. 'Punie::Compiler' would be a subclass
of HLLCompiler if Punie needed one, but it doesn't need one at this point.

Allison

Allison Randal

Dec 7, 2006, 1:33:45 AM
to Patrick R. Michaud, Perl 6 Internals
Patrick R. Michaud wrote:
>
> But come to think of it, if we had something like Capture PMCs
> available as a standard type (and an easy way to generate them in
> PIR), then the existing :vtable('init') would be quite sufficient.
> To steal from Perl 6's C<< \(...) >> capture syntax:
>
> $P0 = new 'Foo::Bar', \(param1, param2, 'abc'=>param3)
>
> .sub 'init' :vtable
>     .param pmc args
>     # initialize self based on array/hash components of args pmc
>     # ...

It's a reasonable solution. Have to think a bit more about the syntax
for creating them. We have talked about giving PIR some short-cut syntax
for creating data structures, as syntactic sugar for the basic 'push'
and keyed set operations. It hasn't come to anything yet, but this could
be tied into it. Largely, it's the fundamental question of "Is PIR an
assembly language, or an MLL for humans?" The answer is probably "Both."

We can avoid modifying PIR's fundamental syntax by requiring the
initializer argument to be created separately:

$P0 = new 'SigHash'
$P0.set(param1, param2, 'abc'=>param3)
$P1 = new 'Foo::Bar', $P0

But, that's one more step than what you're doing now:

$P0 = new 'Foo::Bar'
$P0.init(param1, param2, 'abc'=>param3)

An improvement might be through changes to the OO model:

$P0 = find_type 'Foo::Bar' # returns a class object
$P1 = $P0.new(param1, param2, 'abc'=>param3) # new is a class method
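
For comparison, here is a rough Python analogue of these variants. It is not PIR and says nothing about Parrot's actual object model; 'Capture', 'FooBar', and 'new_with_capture' are hypothetical names used only for illustration. The initializer takes a single argument object bundling positional and named parameters, and the class-method form simply builds that object on the caller's behalf.

```python
class Capture:
    """Hypothetical bundle of positional and named arguments."""
    def __init__(self, *positional, **named):
        self.positional = list(positional)
        self.named = dict(named)

class FooBar:
    def init(self, args):
        # Single-argument initializer: unpack the array/hash parts.
        self.first, self.second = args.positional
        self.abc = args.named["abc"]

    @classmethod
    def new(cls, *positional, **named):
        # Class-method form: callers never build the Capture themselves.
        return new_with_capture(cls, Capture(*positional, **named))

def new_with_capture(cls, args):
    """Separate-initializer form: create the object, then init it."""
    obj = cls()
    obj.init(args)
    return obj

a = new_with_capture(FooBar, Capture(1, 2, abc=3))
b = FooBar.new(1, 2, abc=3)
print((a.first, a.second, a.abc) == (b.first, b.second, b.abc))
```

Both routes end in the same one-argument 'init', which is the property that keeps the existing vtable entry sufficient.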

> My point is simply that it's far easier to go from a MLL (whatever
> syntax) to PIR method calls than to generate specific Parrot opcodes,
> because method calls have a very regular syntax that Parrot opcodes
> don't.

I would have disagreed a couple months ago, as opcodes were simpler to
generate in the old PAST/POST. But with the new implementation I agree.

>> - One more comment in this department: move PIR generation out of
>> the POST node objects. A tree-grammar that outputs PIR code strings
>> isn't a final solution, but it's a more maintainable intermediate
>> step than mingled syntax tree representation and code generation
>> (remember P6C?).
>
> I never really dealt with P6C. :-)

Lucky you. :) It was great in the early days, and allowed for rapid
prototyping, but it grew...um...organically.

> Still, I can see about moving the code generation out of the POST
> node objects; I may do it as a lower priority though, since I don't
> think that aspect is driving many design or implementation decisions
> for us at this point.

Yes, a lower priority is fine. I suspect that Pheme will drive the
development of POST, since the Pheme compiler will be working with
it directly, rather than treating it as an invisible background step.

>>>> - In PGE grammars, what is the "{ ... }" at the end of every
>>>> proto declaration supposed to do?
>> [...]
>>> But in the end, I didn't allow simple semicolon terminators
>>> simply because it wasn't valid Perl 6 syntax, and in many cases I
>>> think that having subtle differences isn't ideal as people may
>>> get confused about what is allowed where. But I don't have a
>>> large objection to modifying the PGE::Grammar compiler to
>>> represent empty declarations with semicolons as well as
>>> yada-yada-yada blocks.
>> Excellent.
>
> "Excellent" as in ...?
> [ ] "Go ahead and allow semicolons, since you don't have
> a large objection."
> [ ] "Your explanation is excellent, stick with the yadas
> to avoid the subtle contrasts to Perl 6."

I prefer option (A), allowing semicolons. The tricky thing is that we're
adopting syntax from one use case into another use case. The yadas make
perfect sense in the context of a Perl 6 program (where the yada means
that the code body will later be filled in), but they make no sense as
part of a Parrot parser (where the yada can't be filled in, and is just
an artifact).

Not an immediate priority, though. And, maybe Perl 6 will change and
solve the problem for us before we get there. ;)

Allison

Patrick R. Michaud

Dec 7, 2006, 9:14:23 AM
to Allison Randal, Perl 6 Internals
On Wed, Dec 06, 2006 at 10:33:45PM -0800, Allison Randal wrote:
> >>>>- In PGE grammars, what is the "{ ... }" at the end of every
> >>>>proto declaration supposed to do?
> >>[...]
> >>>But in the end, I didn't allow simple semicolon terminators
> >>>simply because it wasn't valid Perl 6 syntax, and in many cases I
> >>>think that having subtle differences isn't ideal as people may
> >>>get confused about what is allowed where. But I don't have a
> >>>large objection to modifying the PGE::Grammar compiler to
> >>>represent empty declarations with semicolons as well as
> >>>yada-yada-yada blocks.
> >>Excellent.
> >
> >"Excellent" as in ...?
> > [ ] "Go ahead and allow semicolons, since you don't have
> > a large objection."
> > [ ] "Your explanation is excellent, stick with the yadas
> > to avoid the subtle contrasts to Perl 6."
>
> I prefer option (A), allowing semicolons. The tricky thing is that we're
> adopting syntax from one use case into another use case. The yadas make
> perfect sense in the context of a Perl 6 program (where the yada means
> that the code body will later be filled in), but they make no sense as
> part of a Parrot parser (where the yada can't be filled in, and is just
> an artifact).

IIUC, PGE's use of yada is actually the same use case as Perl 6.
The yadas in Perl 6 can be stubs to be filled in later, but S03
and S06 indicate that yadas are also used as the body in
function prototypes, i.e., where the function is actually to be
defined somewhere else. To me that feels exactly like what we have
here -- the grammar file is prototyping operator functions
that are defined somewhere else. (And, for several of the existing
compilers, they really *are* function prototypes, in that the function
body comes from a PIR function.)

> Not an immediate priority, though. And, maybe Perl 6 will change and
> solve the problem for us before we get there. ;)

Sounds good to me. It's an easy switch to allow the semicolons
when/if we decide to do that.

Pm
