Re: Quoting interpolation in Perl5 vs. Perl6


Juerd

Apr 16, 2005, 11:32:38 AM
to Roie Marianer, perl6-l...@perl.org
Roie Marianer skribis 2005-04-16 18:28 (+0300):
> My suggestion is to check for delimiters only when it's ambiguous: Inside a
> variable name (qq x$varxy -> "$var"y), and at the beginning of every
> subscript of a scalar, and every subscript after the first one of an array,
> hash or sub (because in these cases the first subscript is mandatory).
> [...]
> both mean 'a'. If the Perl 5 behaviour is maintained, then there is no chance
> at all of ever mistaking a subscript for a closing delimiter, which makes the
> only special case qq x$varxy.

I wouldn't mind at all if alphanumeric delimiters went away. I have
never seen them used in serious programming, and if they present a
problem with natural parsing, then why keep them around?

Obfuscation is nice, but let's not design the language around that.


Juerd
--
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html
http://convolution.nl/gajigu_juerd_n.html

Roie Marianer

Apr 16, 2005, 11:28:37 AM
to perl6-l...@perl.org
Hi all.

I'm trying to get quoting interpolation to work, which means I first have to
understand it a little better.

In Perl 5, as far as I can see, the delimiter of quoting constructs (whether
it's "", '' or qq <delim>) is searched for before the string is parsed. This
means that, for example,
"%hash{"string value"}"
parses as
"%hash{"
which is a syntax error.

Current behaviour in Pugs is to read anything that interpolates until its
logical conclusion, so that for example
"{"a"}"
is 'a', and not a syntax error. I think this behaviour is more useful than the
old behaviour when it's not ambiguous. (By the way, this behaviour is my
fault, so it's not authoritative in any way)
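
To make the contrast concrete, here is a rough Python sketch of the two strategies described above. It is not Pugs or perl source: the function names are made up for illustration, and the backslash handling and the treatment of every `{` as the start of a block are simplifying assumptions.

```python
def scan_delimiter_first(src: str, start: int = 0) -> str:
    """Perl 5 style: the string ends at the first unescaped closing quote."""
    assert src[start] == '"'
    i = start + 1
    while i < len(src):
        if src[i] == "\\":      # skip an escaped character (assumed rule)
            i += 2
            continue
        if src[i] == '"':
            return src[start:i + 1]
        i += 1
    raise SyntaxError("unterminated string")


def scan_nesting_aware(src: str, start: int = 0) -> str:
    """Pugs style: a {...} block is read to its own end, nested strings included."""
    assert src[start] == '"'
    i = start + 1
    depth = 0                   # how many {...} blocks we are inside
    while i < len(src):
        c = src[i]
        if c == "\\":
            i += 2
            continue
        if c == "{":
            depth += 1
        elif c == "}" and depth > 0:
            depth -= 1
        elif c == '"':
            if depth == 0:
                return src[start:i + 1]
            # A quote inside a block opens a nested string: skip past it.
            i += len(scan_nesting_aware(src, i))
            continue
        i += 1
    raise SyntaxError("unterminated string")
```

With these definitions, `scan_delimiter_first` cuts `"%hash{"string value"}"` off after `"%hash{"`, while `scan_nesting_aware` returns the whole thing.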

My suggestion is to check for delimiters only when it's ambiguous: Inside a
variable name (qq x$varxy -> "$var"y), and at the beginning of every
subscript of a scalar, and every subscript after the first one of an array,
hash or sub (because in these cases the first subscript is mandatory).

By the way, maybe it's not even an issue: Is it possible for the closing
delimiter of a string to be an opening bracket or brace? In Perl 5, if a
quote is opened by a closing brace it must be closed by the same closing
brace, while in current Pugs an opening closing brace is closed by a closing
opening brace:
Perl5: q]a] Pugs: q]a[
both mean 'a'. If the Perl 5 behaviour is maintained, then there is no chance
at all of ever mistaking a subscript for a closing delimiter, which makes the
only special case qq x$varxy.

Sorry for being long-winded; does this make any sense at all?
--
-Roie
v2sw6+7CPhw5ln5pr4/6$ck2ma8+9u7/8LSw2l6Fi2e2+8t4TNDSb8/4Aen4+7g5Za22p7/8
[ http://www.hackerkey.com ]

Larry Wall

Apr 16, 2005, 2:30:49 PM
to perl6-l...@perl.org
On Sat, Apr 16, 2005 at 06:28:37PM +0300, Roie Marianer wrote:
: Hi all.
:
: I'm trying to get quoting interpolation to work, which means I first have to
: understand it a little better.
:
: In Perl 5, as far as I can see, the delimiter of quoting constructs (whether
: it's "", '' or qq <delim>) is searched for before the string is parsed. This
: means that, for example,
: "%hash{"string value"}"
: parses as
: "%hash{"
: which is a syntax error.
:
: Current behaviour in Pugs is to read anything that interpolates until its
: logical conclusion, so that for example
: "{"a"}"
: is 'a', and not a syntax error. I think this behaviour is more useful than the
: old behaviour when it's not ambiguous. (By the way, this behaviour is my
: fault, so it's not authoritative in any way)

Please rest assured that that behavior is, in fact, mandated.

: My suggestion is to check for delimiters only when it's ambiguous: Inside a
: variable name (qq x$varxy -> "$var"y), and at the beginning of every
: subscript of a scalar, and every subscript after the first one of an array,
: hash or sub (because in these cases the first subscript is mandatory).

The basic rule of thumb is that we pretend we're a top-down parser
even if we aren't, and we only look for the trailing delimiter when
we're not trying to parse something embedded that would naturally
slurp up the trailing delimiter as part of the internal construct.
Certainly any kind of bracketing structure hides anything inside it
from the delimiter scanner, but so do tokens like identifiers.

: By the way, maybe it's not even an issue: Is it possible for the closing
: delimiter of a string to be an opening bracket or brace?

Nope, if you open with an opener it only looks for the closer.

: In Perl 5, if a
: quote is opened by a closing brace it must be closed by the same closing
: brace, while in current Pugs an opening closing brace is closed by a closing
: opening brace:
: Perl5: q]a] Pugs: q]a[
: both mean 'a'. If the Perl 5 behaviour is maintained, then there is no chance
: at all of ever mistaking a subscript for a closing delimiter, which makes the
: only special case qq x$varxy.

I think I would prefer the Perl 5 behavior here, or maybe we should
simply disallow closers as openers. As Juerd points out, it really
makes little sense to allow alphanumerics either, especially now that
we «allow» 「any」 『Unicode』 〔brackets〕 【to】 《be》 〖used〗.

And inside-out brackets are just going to drive highlighters absolutely
bonkers, though arguably the same could be said for q]a], only more so.
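
For illustration only, the delimiter pairing discussed here could be modelled as below. This is a Python sketch, not code from Pugs; `BRACKET_PAIRS` lists only a handful of brackets, and the `allow_closer_as_opener` flag is a hypothetical switch between the two options that are still open above (the Perl 5 behaviour, or disallowing closers as openers).

```python
# A small, deliberately incomplete table of bracketing pairs.
BRACKET_PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">", "«": "»", "「": "」"}
CLOSERS = set(BRACKET_PAIRS.values())


def closing_delimiter(opener: str, allow_closer_as_opener: bool = True) -> str:
    """Return the delimiter that ends a quote started with `opener`."""
    if opener in BRACKET_PAIRS:
        # An opener only ever looks for its own closer.
        return BRACKET_PAIRS[opener]
    if opener in CLOSERS and not allow_closer_as_opener:
        raise SyntaxError(f"closing bracket {opener!r} cannot open a quote")
    # Perl 5 behaviour: anything else, including a closer, closes with itself.
    return opener
```

Under the Perl 5 option `closing_delimiter("]")` is `"]"`, so `q]a]` works and `q]a[` does not; with `allow_closer_as_opener=False` the same call is a parse error.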

: Sorry for being long-winded; does this make any sense at all?

Yep, it makes any sense at all.

Larry

Larry Wall

Apr 16, 2005, 3:10:48 PM
to perl6-l...@perl.org
On Sat, Apr 16, 2005 at 11:30:49AM -0700, Larry Wall wrote:
: The basic rule of thumb is that we pretend we're a top-down parser
: even if we aren't, and we only look for the trailing delimiter when
: we're not trying to parse something embedded that would naturally
: slurp up the trailing delimiter as part of the internal construct.
: Certainly any kind of bracketing structure hides anything inside it
: from the delimiter scanner, but so do tokens like identifiers.

I think I have to clarify what I mean by that last phrase. Trailing
delimiters are hidden inside any token that has already been started,
but not at the start of a token (where token is taken to be fairly
restrictive). Therefore these are errors:

qq. $foo.bar() .
qq: @foo::bar[] :

However

qq/ &foobar( $a / $b ) /

is just fine, since (...) is looking for its own termination.
Basically we don't have to keep track of sets of terminators (unless
we want to use that info after a syntax error to make hypotheses and
explore alternate realities in the service of better error messages).
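
A minimal Python sketch of that idea, purely as an illustration: each bracketed construct looks only for its own closer, so the outer scanner never needs a set of terminators. The function names are invented, and the sketch assumes (unlike a real parser) that every bracket in the string body belongs to an interpolated expression.

```python
BRACKETS = {"(": ")", "[": "]", "{": "}"}


def skip_bracketed(src: str, i: int) -> int:
    """src[i] is an opener; return the index just past its matching closer."""
    closer = BRACKETS[src[i]]
    i += 1
    while i < len(src):
        if src[i] in BRACKETS:
            i = skip_bracketed(src, i)   # nested construct finds its own end
        elif src[i] == closer:
            return i + 1
        else:
            i += 1
    raise SyntaxError("unterminated bracket")


def find_trailing_delimiter(body: str, delim: str) -> int:
    """Index of the first `delim` that is not hidden inside brackets."""
    i = 0
    while i < len(body):
        if body[i] in BRACKETS:
            i = skip_bracketed(body, i)  # anything inside is invisible here
        elif body[i] == delim:
            return i
        else:
            i += 1
    raise SyntaxError("runaway string")
```

For the body ` &foobar( $a / $b ) /` and delimiter `/`, the scanner skips the division inside `(...)` and stops at the final slash.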

Given our plan of a hybrid parser with a bottom-up operator precedence
parser sandwiched between top-down parsers, and assuming that "."
is the tightest operator that the bottom-up expression parser treats as
an operator, it more or less comes down to the fact that anything the
expression parser pulls in as a single term is going to be treated
as a construct that ignores any outer delimiters because it's calling
out to a lower-level top-down parser at that point to parse the term
in question.

Hmm, I guess there's still a little ambiguity in there in the case of
lookahead. And the fact is, a construct like

qq. $foo.bar() .

either has to do some lookahead or some backtracking to determine that
the entire interpolated expression ends with a bracketed construct,
since we've said that

" $foo.bar() "

interpolates $foo.bar(), while

" $foo.bar "

interpolates only $foo. (With similar constraints on array and hash
interpolation.) So it's possible that

qq. $foo.bar() .

could parse okay if we treat the () as a terminator that some grammatical
construct is looking ahead for. But given that $foo is the one interpolator
that doesn't require trailing brackets, it seems like it's terribly
ambiguous in this case. However, only dot has that problem, and with

qq: @foo::bar[] :

you know it requires the [] to interpolate at all. So I guess this is one
of those we can argue both ways. The chance of someone writing

qq:@foo::bar[]

when they mean

qq:@foo: :bar[]

seems fairly remote. So my best guess at this point is that we should
let the interpolative lookahead hide the trailing delimiter also, and
that is probably what the user expects in any event, since when they
were writing the expression, the nearby context is the preceding
term, but the distant context is the delimiter, which they've probably
just forgotten is potentially ambiguous. So let's just resolve it
that way without telling them.

I guess this is the one place we're requiring arbitrarily long lookahead
to figure things out, since we interpolate

" @foo::bar::baz::fee::fie::foe[] "

but not

" @foo::bar::baz::fee::fie::foe "

under the current rules. I think the lookahead doesn't have to parse
past the [ (or other opener), though. All it has to decide is whether
the next : (or dot) is to be treated as part of the interpolation. So
this is a syntax error (of the runaway "" variety, presumably):

" @foo::bar::baz::fee::fie::foe[ "
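
The lookahead described here can be sketched in Python as follows. This is only an illustration under assumptions: the regular expressions are a simplified notion of Perl 6 names, and `array_name_end` and `scalar_name_end` are hypothetical helpers. Both stop at the opener and never parse past it.

```python
import re

_ARRAY_NAME = re.compile(r"@[A-Za-z_]\w*(?:::[A-Za-z_]\w*)*")
_SCALAR_NAME = re.compile(r"\$[A-Za-z_]\w*")
_METHOD_CHAIN = re.compile(r"(?:\.[A-Za-z_]\w*)+")


def array_name_end(text: str, pos: int):
    """End index of an interpolated array name at `pos`, or None.

    The array interpolates only if the (arbitrarily long) name is
    directly followed by "[".
    """
    m = _ARRAY_NAME.match(text, pos)
    if m is None or text[m.end():m.end() + 1] != "[":
        return None
    return m.end()


def scalar_name_end(text: str, pos: int):
    """End index of the interpolated scalar expression's name part, or None.

    A .method chain is included only when it is followed by "(";
    otherwise just the bare variable interpolates.
    """
    m = _SCALAR_NAME.match(text, pos)
    if m is None:
        return None
    chain = _METHOD_CHAIN.match(text, m.end())
    if chain and text[chain.end():chain.end() + 1] == "(":
        return chain.end()
    return m.end()
```

Applied to `qq:@foo::bar[] :`, the name check at the `@` consumes `@foo::bar`, so those colons are never offered to the delimiter scanner; applied to `@foo: :bar[]`, it finds no bracket after `@foo` and declines to interpolate.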

Larry

Roie Marianer

Apr 16, 2005, 3:16:43 PM
to perl6-l...@perl.org
On Saturday 16 April 2005 10:10 pm, Larry Wall wrote:
> So
> this is a syntax error (of the runaway "" variety, presumably):
>
> " @foo::bar::baz::fee::fie::foe[ "
I was with you until that. What about
" @foo::bar::baz::fee::fie::foe[ "1" ] "
Isn't that a valid index into the array? Or is that just true with hashes?

Larry Wall

Apr 16, 2005, 3:54:43 PM
to perl6-l...@perl.org
On Sat, Apr 16, 2005 at 10:16:43PM +0300, Roie Marianer wrote:

: On Saturday 16 April 2005 10:10 pm, Larry Wall wrote:
: > So
: > this is a syntax error (of the runaway "" variety, presumably):
: >
: > " @foo::bar::baz::fee::fie::foe[ "
: I was with you until that. What about
: " @foo::bar::baz::fee::fie::foe[ "1" ] "
: Isn't that a valid index into the array? Or is that just true with hashes?

No, you're right--I was just talking about the case where the user
actually writes an unmatched bracket.

Larry

Autrijus Tang

Apr 18, 2005, 1:31:12 PM
to perl6-l...@perl.org
On Sat, Apr 16, 2005 at 12:10:48PM -0700, Larry Wall wrote:
> I think I have to clarify what I mean by that last phrase. Trailing
> delimiters are hidden inside any token that has already been started,
> but not at the start of a token (where token is taken to be fairly
> restrictive). Therefore these are errors:
>
> qq. $foo.bar() .
> qq: @foo::bar[] :
>
> However
>
> qq/ &foobar( $a / $b ) /
>
> is just fine, since (...) is looking for its own termination.

Consider this:

rx/abc$/
qq/abc$/

After roie's refactoring, both now break, whilst in Perl 5, only
the latter breaks -- qr/abc$/ is just fine. Is it something we need
to special-case for rx?

Thanks,
/Autrijus/

Larry Wall

Apr 18, 2005, 3:08:41 PM
to perl6-l...@perl.org
On Tue, Apr 19, 2005 at 01:31:12AM +0800, Autrijus Tang wrote:

Certainly. rx// does not do any kind of interpolation any more. It is
a language of its own, and $ is just a token in that language. On the
other hand, like Perl 5, that language does have variables in addition
to $, so we still have to distinguish whether the character after the $
indicates a variable. Perl 5's rule was that $ meant "end-of-whatever"
if it was followed by ), | or the end of the interpolated string.
(Or by # or whitespace in /x mode.) Otherwise it's a variable.
Since we're parsing left-to-right, we can't do exactly the same, but
I suspect we can check after the $ for ), ], |, #, whitespace, or the
terminator, which rules out direct use of $/ inside /.../. That's not
a great hardship, since we have the $1 and $<foo> shortcuts for
backrefs, and anything fancier probably wants to be in {...} anyway.
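
The left-to-right check suggested above might look like this in Python. It is a sketch of the proposal as worded in this message, not of any implementation; `dollar_is_anchor` is an invented name, and `src` is assumed to contain the pattern text together with its closing terminator.

```python
# Characters after which "$" is read as an end anchor, per the list above.
ANCHOR_FOLLOWERS = {")", "]", "|", "#"}


def dollar_is_anchor(src: str, i: int, terminator: str) -> bool:
    """Return True if the "$" at src[i] is an anchor rather than a variable."""
    if src[i] != "$":
        raise ValueError("src[i] must be '$'")
    nxt = src[i + 1:i + 2]   # empty string if "$" is the last character
    return nxt == terminator or nxt in ANCHOR_FOLLOWERS or nxt.isspace()
```

So the `$` in `abc$/` is an anchor when the terminator is `/` (which is why `$/` cannot be used directly there), while the `$` in `$1` or `$<foo>` still introduces a variable.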

As for qq/abc$/, I think it's okay for that to notice that you're trying
to interpolate a variable that has the same name as the delimiter, and
blow up immediately. While we could let people interpolate such
variables by default, it's probably better to stop and make people
clarify what they mean in that case. There aren't that many punctuational
variables any more. Certainly we don't have $' and $" anymore, which
would be the usual ambiguous cases for normal quotes.
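
As a hedged illustration of "notice and blow up", here is a Python sketch of a qq body scanner that refuses a `$` immediately followed by the delimiter. The exception class, the function name and the backslash rule are all assumptions made for the example.

```python
class AmbiguousInterpolation(SyntaxError):
    """Raised when "$" is directly followed by the quote delimiter."""


def scan_qq_body(src: str, delim: str) -> str:
    """src starts just after the opening delimiter; return the string body."""
    i = 0
    while i < len(src):
        c = src[i]
        if c == "\\":            # an escaped character is never special
            i += 2
            continue
        if c == "$" and src[i + 1:i + 2] == delim:
            raise AmbiguousInterpolation(
                f"'${delim}' looks like a variable named after the delimiter; "
                "escape the '$' or pick another delimiter"
            )
        if c == delim:
            return src[:i]
        i += 1
    raise SyntaxError("runaway string")
```

Scanning `abc/` with delimiter `/` yields `abc`; scanning `abc$/` stops with a parse failure instead of guessing.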

And yes, this is pretty much the opposite of what I said about

" @foo.bar.baz[] "

But in either case we're just trying to figure out what the user expects.
Or doesn't expect, in the case of qq/abc$/.

Larry

Roie Marianer

Apr 18, 2005, 5:02:45 PM
to perl6-l...@perl.org
LW = Larry Wall
AT = Autrijus Tang
LW> I think I have to clarify what I mean by that last phrase. Trailing
LW> delimiters are hidden inside any token that has already been started,
LW> but not at the start of a token (where token is taken to be fairly
LW> restrictive).

AT> Consider this:
AT>
AT> rx/abc$/
AT> qq/abc$/

AT> After roie's refactoring, both now break, whilst in Perl 5, only

Actually, it wasn't due to my refactoring. It was because I tried to implement
the rule above, which meant getting rid of the special case (which was
present in qqInterpolatorVar) of a variable whose name _ended_ with the
delimiter.

AT> the latter breaks -- qr/abc$/ is just fine. Is it something we need
AT> to special-case for rx?

LW> Certainly. rx// does not do any kind of interpolation any more. It is
LW> a language of its own, and $ is just a token in that language.
But rx:P5// should act like qr//, shouldn't it?

LW> I suspect we can check after the $ for ), ], |, #, whitespace, or the
LW> terminator, which rules out direct use of $/ inside /.../.
I'll add a flag for that in rx:P5. In any case, I suspect that the code to
parse rx which is not :P5 will be completely different from what we have now
(rx:P5 is basically a not-so-glorified qq:b(0))

LW> As for qq/abc$/, I think it's okay for that to notice that you're trying
LW> to interpolate a variable that has the same name as the delimiter, and
LW> blow up immediately.
Makes sense, but what exactly do you mean by "blow up"?

Larry Wall

Apr 18, 2005, 7:08:13 PM
to perl6-l...@perl.org
On Tue, Apr 19, 2005 at 12:02:45AM +0300, Roie Marianer wrote:
: But rx:P5// should act like qr//, shouldn't it?

Yes.

: LW> I suspect we can check after the $ for ), ], |, #, whitespace, or the
: LW> terminator, which rules out direct use of $/ inside /.../.
: I'll add a flag for that in rx:P5. In any case, I suspect that the code to
: parse rx which is not :P5 will be completely different from what we have now
: (rx:P5 is basically a not-so-glorified qq:b(0))

More or less. For a number of the backslashes it doesn't matter whether
they get interpolated in the first pass or the second, though you need
to be really careful with things that regexen treat differently, and
especially you mustn't lose track of your backwhacked backwhacks. It
gets pretty messy, which is one of the reasons Perl 6 does it differently.

: LW> As for qq/abc$/, I think it's okay for that to notice that you're trying
: LW> to interpolate a variable that has the same name as the delimiter, and
: LW> blow up immediately.
: Makes sense, but what exactly do you mean by "blow up"?

I suppose that depends on whether you're programming a cruise missile
or not. For most purposes, an appropriate parse failure message
would suffice.

Larry
