{ => } autocomposition

Autrijus Tang

Apr 20, 2005, 10:38:34 AM

to perl6-l...@perl.org

In Pugs's t/pugsbugs/map_function_return_values.t, iblech added this test:

%ret = map { $_ => uc $_ }, split "", $text;

This fails because it is parsed, under the {=>} autocomposition rule, into:

# Fails because arg1 is not Code
%ret = map(hash($_ => uc $_), split("", $text));

Instead of the intended:

# Works correctly
%ret = map(sub ($_) { $_ => uc $_ }, split("", $text));

Is this a known issue?

Thanks,
/Autrijus/

Ingo Blechschmidt

Apr 20, 2005, 11:08:53 AM

to perl6-l...@perl.org

Hi,

Autrijus Tang wrote:
> %ret = map { $_ => uc $_ }, split "", $text;

[...]

I suppose my test is wrong.

When I clicked on reply a moment ago, I wanted to propose to change the
hash/code disambiguation rule, so that {...} is always parsed as Code
if the body contains "$_" or "$^...".

But as this change would break the following code, I think we should
leave it as it is now.
  my @AoH = map {
    my $hashref = { $^a => ... };
    do_something_with $hashref;
    $hashref;
  };

--Ingo

--
Linux, the choice of a GNU | Mr. Cole's Axiom: The sum of the
generation on a dual AMD | intelligence on the planet is a constant;
Athlon! | the population is growing.

Larry Wall

Apr 20, 2005, 11:51:24 AM

to perl6-l...@perl.org

On Wed, Apr 20, 2005 at 05:08:53PM +0200, Ingo Blechschmidt wrote:
: Hi,
:
: Autrijus Tang wrote:
: > %ret = map { $_ => uc $_ }, split "", $text;
: [...]
:
: I suppose my test is wrong.
:
: When I clicked on reply a moment ago, I wanted to propose to change the
: hash/code disambiguation rule, so that {...} is always parsed as Code
: if the body contains "$_" or "$^...".
:
: But as this change would break the following code, I think we should
: leave it as it is now.
:   my @AoH = map {
:     my $hashref = { $^a => ... };
:     do_something_with $hashref;
:     $hashref;
:   };

Hmm, what we have said in the past is that the proper way to disambiguate
that is to require an explicit "return" if you mean the closure, but that
doesn't actually work unless there's an explicit "sub". I suppose the
rightest answer at this point is:

%ret = map { ($_ => uc $_) }, split "", $text;

since the => is no longer the top-level operator, but the parens are, if
you believe in parens as an operator and not just a grouper. Or

%ret = map { $_ => uc $_; }, split "", $text;

should presumably also work. That may look like an arbitrary amount
of lookahead, but I tried to define the hash/closure rule in terms of
a semantic analysis rule rather than a syntax rule, such that it's
always parsed as a closure, but at some point in semantic analysis
you can look at the AST and see that the top-level operator is =>,
and throw an implicit "new Hash: do" on the front of the closure,
or whatever operator it is that evaluates a closure for its pairs
and builds a hash from it.
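
A minimal sketch of that semantic-analysis rule, in Python rather than Perl 6 or Pugs's Haskell. The node classes and the names `analyze` and `HashCompose` are hypothetical, made up for illustration; `HashCompose` merely stands in for the implicit "new Hash: do" wrapper, whatever it ends up being called:

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str


@dataclass
class Call:
    func: str
    args: list


@dataclass
class Pair:
    """The `=>` operator."""
    key: object
    value: object


@dataclass
class Block:
    """Every {...} is parsed as a closure first."""
    body: object


@dataclass
class HashCompose:
    """Hypothetical wrapper: run the closure and build a hash from its pairs."""
    block: Block


def analyze(block):
    """Semantic pass: only the top-level operator of the body is inspected."""
    if isinstance(block.body, Pair):
        return HashCompose(block)
    return block


# { $_ => uc $_ }  has `=>` at the top level, so it becomes a hash composer
pair_body = Pair(Var("$_"), Call("uc", [Var("$_")]))
print(type(analyze(Block(pair_body))).__name__)

# { uc $_ }  has a function call at the top level, so it stays a closure
print(type(analyze(Block(Call("uc", [Var("$_")])))).__name__)
```

Because the check happens after parsing, no lookahead is involved: the parser always builds a `Block`, and the decision is a single inspection of its root node.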

I'd really like to save the overloading of {...} if we can DWTM in most
cases and at least produce decent errors in the other cases. We can
certainly produce a decent "Maybe you meant..." diagnostic on the
map thingy, even if it defaults counterintuitively in that case.

Larry

Larry Wall

Apr 20, 2005, 12:38:28 PM

to perl6-l...@perl.org

On Thu, Apr 21, 2005 at 12:09:18AM +0800, Autrijus Tang wrote:
: Adding a special form Parens that takes one Exp and simply returns
: it is possible, but unless it serves to disambiguate other cases,
: that approach seems more heavy-handed to me.

As someone who is currently trying to write a perfect p5-to-p5 [sic]
translator, you have to somehow remember the parens (and whitespace
(and comments (and constant-folded subtrees))) to have a complete AST
representation of the original text. Or maybe it should be called
a CST instead of an AST in that case...

Though remembering doesn't have to be the default. But I'd like
to point out that adding such annotations to the AST after you've
optimized parts of the tree away is much more difficult than reserving
a spot for them in the first place. I speak from sad experience... :-)

Larry

Autrijus Tang

Apr 20, 2005, 12:09:18 PM

to perl6-l...@perl.org

On Wed, Apr 20, 2005 at 08:51:24AM -0700, Larry Wall wrote:
> That may look like an arbitrary amount of lookahead, but I tried to
> define the hash/closure rule in terms of a semantic analysis rule
> rather than a syntax rule, such that it's always parsed as a closure,
> but at some point in semantic analysis you can look at the AST and see
> that the top-level operator is =>, and throw an implicit "new Hash:
> do" on the front of the closure, or whatever operator it is that
> evaluates a closure for its pairs and builds a hash from it.

Currently in Pugs, the analysis is done when the parser is done parsing a
block, i.e. when there is an Exp already formed:

-- Try to analyze Exp if the block is bare and without formal
-- arguments; extractHash merely looks at the toplevel OP, to
-- see if it matches (pair | =>) or ("," [(pair | =>), ...])
retBlock SubBlock Nothing exp
    | Just hashExp <- extractHash exp = return $ Syn "\\{}" [hashExp]

> %ret = map { ($_ => uc $_) }, split "", $text;

> %ret = map { $_ => uc $_; }, split "", $text;

Sadly, both the grouping parens and the trailing semicolon are currently
ignored as semantically insignificant.

I think the latter suggestion works better, as we already make the
distinction between semicolon-separated Stmts (which contain multiple
Exps) and a simple Exp.
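
A rough sketch of how the trailing semicolon could act as the disambiguator, written in Python rather than Haskell. It assumes the parser keeps a `Stmts` node even around a single statement; `extract_hash` loosely mirrors the comment on extractHash above, and every class and function name here is hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Pair:
    """The `=>` operator."""
    key: str
    value: str


@dataclass
class Comma:
    """The `,` list operator."""
    items: list


@dataclass
class Stmts:
    """Semicolon-separated statements, kept even when there is only one."""
    exps: list


def extract_hash(exp):
    """Return exp if its top-level op is `=>`, or `,` whose first item is `=>`."""
    if isinstance(exp, Pair):
        return exp
    if isinstance(exp, Comma) and exp.items and isinstance(exp.items[0], Pair):
        return exp
    return None  # Stmts and everything else fall through


def classify(block_body):
    """Decide how a bare block without formal arguments is treated."""
    return "hash" if extract_hash(block_body) is not None else "code"


pair = Pair("$_", "uc $_")
print(classify(pair))           # models { $_ => uc $_ }
print(classify(Stmts([pair])))  # models { $_ => uc $_; }
```

Under this assumption the semicolon needs no special form of its own: it changes the top-level node from a `Pair` to a `Stmts`, and the existing check then leaves the block as code.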

Adding a special form Parens that takes one Exp and simply returns
it is possible, but unless it serves to disambiguate other cases,
that approach seems more heavy-handed to me.

Thanks,
/Autrijus/

Autrijus Tang

Apr 20, 2005, 12:50:56 PM

to perl6-l...@perl.org

On Wed, Apr 20, 2005 at 09:38:28AM -0700, Larry Wall wrote:
> As someone who is currently trying to write a perfect p5-to-p5 [sic]
> translator, you have to somehow remember the parens (and whitespace
> (and comments (and constant-folded subtrees))) to have a complete AST
> representation of the original text. Or maybe it should be called
> a CST instead of an AST in that case...

I see. Do you think preserving the /span/ (i.e. the character offset ranges)
of each AST element is enough to do that? That effectively means each
node points to a substring inside the original source string.

That way, p6-to-p6 translators in the future can, at worst, trigger a
reparse under the appropriate subrules level, and obtain the information
we optimized away.
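
To make the idea concrete, here is a small Python sketch in which each node stores only a half-open character range into the original source. The `Node` class and `text_of` helper are hypothetical names, not anything in Pugs:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    start: int  # offset of the node's first character in the source
    end: int    # offset one past the node's last character
    children: list = field(default_factory=list)


def text_of(node, source):
    """Recover the exact source text a node was parsed from."""
    return source[node.start:node.end]


source = "{ ($_ => uc $_) }"

# The grouping parens produce no node of their own, but they still lie
# inside the enclosing block's span.
pair = Node("pair", source.index("$_"), source.rindex("$_") + 2)
block = Node("block", 0, len(source), [pair])

print(text_of(pair, source))
print(text_of(block, source))
```

The pair node yields only its own text, while the block's span still covers the parentheses, so a later pass could reparse that substring to rediscover them.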

Thanks,
/Autrijus/

Larry Wall

Apr 20, 2005, 1:24:27 PM

to perl6-l...@perl.org

On Wed, Apr 20, 2005 at 10:21:32AM -0700, Larry Wall wrote:
: Except that you've probably thrown away the definition of "appropriate"
: by then as well. :-)

Well, maybe not, since you presumably need to keep track of the current
parser for eval.

Larry

Larry Wall

Apr 20, 2005, 1:21:32 PM

to perl6-l...@perl.org

On Thu, Apr 21, 2005 at 12:50:56AM +0800, Autrijus Tang wrote:
: I see. Do you think preserving the /span/ (i.e. the character offset ranges)
: of each AST element is enough to do that? That effectively means each
: node points to a substring inside the original source string.

Yes, though making a copy isn't bad if it's not the default behavior.
It mostly just needs a "spot".

: That way, p6-to-p6 translators in the future can, at worst, trigger a
: reparse under the appropriate subrules level, and obtain the information
: we optimized away.

Except that you've probably thrown away the definition of "appropriate"
by then as well. :-)

p5-to-p5 actually intercepts the op_free() calls and has a place
for all the optimized trees. But that's not the default behavior.
By default it only costs one empty "madprops" pointer per op node.
(MAD stands for miscellaneous attribute decorations.) Only if we
set the "madskills" option do we generate a linked list of madprops.
Then you don't have to worry about the overhead in the general case,
and you also don't have to worry about your two parses getting out
of sync in the specific case. You just have to know in advance when
you want to remember the details, which I think is not too much of
a constraint.
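
The pay-only-when-asked layout can be sketched as follows. This is a Python illustration, not the actual Perl 5 C code: only the names madprops and madskills come from the description above, and `Op`, `MadProp`, `add_madprop` and `madprops_of` are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MadProp:
    key: str
    value: str
    next: Optional["MadProp"] = None  # singly linked list


@dataclass
class Op:
    name: str
    madprops: Optional[MadProp] = None  # the one reserved slot, empty by default


def add_madprop(op, key, value, madskills):
    """Record a decoration only when the madskills option is switched on."""
    if not madskills:
        return
    op.madprops = MadProp(key, value, op.madprops)  # prepend to the list


def madprops_of(op):
    """Walk the linked list and return its (key, value) entries."""
    out = []
    prop = op.madprops
    while prop is not None:
        out.append((prop.key, prop.value))
        prop = prop.next
    return out


fast = Op("pair")
add_madprop(fast, "open_paren", "(", madskills=False)
print(fast.madprops)

full = Op("pair")
add_madprop(full, "open_paren", "(", madskills=True)
add_madprop(full, "close_paren", ")", madskills=True)
print(madprops_of(full))
```

With the option off, each node carries nothing beyond the empty slot; with it on, the details accumulate during the one and only parse, so there is no second parse to fall out of sync.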

If you don't want to worry about this in the Haskell version of the
compiler, that's okay, as long as we think about it more when we
bootstrap the Perl 6 version. Certainly if you're remembering
parens in some optional data structure, you can't depend on it
for ordinary disambiguation.

Larry

Autrijus Tang

Apr 20, 2005, 1:51:53 PM

to perl6-l...@perl.org

On Wed, Apr 20, 2005 at 10:21:32AM -0700, Larry Wall wrote:
> On Thu, Apr 21, 2005 at 12:50:56AM +0800, Autrijus Tang wrote:
> : I see. Do you think preserving the /span/ (i.e. the character offset ranges)
> : of each AST element is enough to do that? That effectively means each
> : node points to a substring inside the original source string.
>
> Yes, though making a copy isn't bad if it's not the default behavior.
> It mostly just needs a "spot".

Okay. I'll make it happen then. Is this something we want to make
available at the runtime language level, in addition to the compiler
error/warnings level?

> p5-to-p5 actually intercepts the op_free() calls and has a place
> for all the optimized trees. But that's not the default behavior.
> By default it only costs one empty "madprops" pointer per op node.
> (MAD stands for miscellaneous attribute decorations.) Only if we
> set the "madskills" option do we generate a linked list of madprops.

Wow, nice trick. Mad props to you. :)

> Then you don't have to worry about the overhead in the general case,
> and you also don't have to worry about your two parses getting out
> of sync in the specific case. You just have to know in advance when
> you want to remember the details, which I think is not too much of
> a constraint.

As you noted in the followup mail, I think the compiler needs to
remember the active subrule and source code range anyway, so that
performance penalty is already there.

> If you don't want to worry about this in the Haskell version of the
> compiler, that's okay, as long as we think about it more when we
> bootstrap the Perl 6 version.

I do intend to bootstrap the Perl 6 version by simply compiling
the Pugs source code from Haskell into Perl6 AST, then decompile
it back into Perl 6 code. :-)

That is, unless someone crazy enough comes around and recodes
everything in Perl 6 by hand...

Thanks,
/Autrijus/

Darren Duncan

Apr 20, 2005, 2:40:32 PM

to perl6-l...@perl.org

A clear way to disambiguate a block from a hash-ref when using
map/grep/sort etc is to use a colon before the leading brace for a
block rather than a space, like this:

map:{ $_ => uc $_ }

I read that in the synopsis documents a month back, though I'm having
a bit of trouble finding the reference now. Maybe it has something
to do with adverbs? But I do explicitly remember it being used to
disambiguate.

In any event, I used that form exclusively with my Perl 6 ports to date.

So in that case, the test is wrong.

-- Darren Duncan

Juerd

Apr 20, 2005, 2:43:14 PM

to Darren Duncan, perl6-l...@perl.org

Darren Duncan skribis 2005-04-20 11:40 (-0700):
> A clear way to disambiguate a block from a hash-ref when using
> map/grep/sort etc is to use a colon before the leading brace for a
> block rather than a space, like this:
> map:{ $_ => uc $_ }

I think the best disambiguators for hash/sub interpretation are "hash"
and "sub", even though sub is a little longer than a colon.

Juerd
--
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html
http://convolution.nl/gajigu_juerd_n.html

Darren Duncan

Apr 20, 2005, 2:52:17 PM

to perl6-l...@perl.org

At 8:43 PM +0200 4/20/05, Juerd wrote:
>Darren Duncan skribis 2005-04-20 11:40 (-0700):
>> A clear way to disambiguate a block from a hash-ref when using
>> map/grep/sort etc is to use a colon before the leading brace for a
>> block rather than a space, like this:
>> map:{ $_ => uc $_ }
>
>I think the best disambiguators for hash/sub interpretation are "hash"
>and "sub", even though sub is a little longer than a colon.
>Juerd

In this case, I seem to recall reading that the problem was with the
space, and the colon replaces the space.

The problem is that, with 'sort' for example, the block is optional,
and I think it is like an adverb.

For example, this leaves it out:

@bar = sort @foo;

In this case, it is kept:

@bar = sort:{ $a <=> $b } @foo;

In both cases, @foo is the argument, and not the block. If you used
'sub' or the like, that would make this look like a code argument, which
it isn't.

If you did this:

@bar = sort { ... } @foo;

Then the {} would look like a first argument which is a hash ref.

The general principle of using the ':' was meant to work with any
kind of list operator usage, to say how it does its thing, not with
what.

-- Darren Duncan