Mutating methods

Juerd

Mar 10, 2004, 11:39:33 AM
to perl6-l...@perl.org
Perlists,

In Perl 5, lc, lcfirst, quotemeta, uc and ucfirst don't mutate.
chomp and chop do mutate.

I imagine these will all be methods in Perl 6:

$foo.lc
$foo.quotemeta
$foo.chomp

I'd like a mutating version of lc, and a non-mutating version of chomp.
With some nice syntax, if possible.

If there isn't already such a thing in the making, I hereby suggest to
re-introduce C<.=>, to mean more or less the same as in Perl 5, but with
Perl 6's version of the C<.> operator.

In other words: C<$foo.lc> would not mutate and C<$foo.=lc> would.

$foo += 5 ===> $foo = $foo + 5
$foo.=lc ===> $foo = $foo.lc

Makes sense to me.

Especially for C<sort> it would be nice for something like this:

@foo.sort # returns sorted copy

versus

@foo.=sort # sorts inline

I think this syntax reads better than Ruby's exclamation point
(foo.method!), because of the analogy with other mutating operators.
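
For what it's worth, the copy versus in-place split being asked for can be seen in Python today, though Python marks it with different names rather than with an operator. The snippet below is only an analogy and says nothing about how Perl 6 would implement C<.=>; the variable names are invented for illustration.

```python
# Analogy only: Python spells the copy/in-place split with different names.
foo = [3, 1, 2]

sorted_copy = sorted(foo)   # like @foo.sort  : returns a sorted copy
assert foo == [3, 1, 2]     # the original list is untouched

foo.sort()                  # like @foo.=sort : sorts in place
assert foo == [1, 2, 3]

# Python strings are immutable, so the "mutating" form is the same
# rebinding that $foo.=lc is proposed to expand to.
name = "HeLLo"
name = name.lower()         # like $foo = $foo.lc
assert name == "hello"
```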

Please excuse me if this or something like this has already been taken
care of - I've searched for messages about it, but haven't found
anything.


Regards,

Juerd

Luke Palmer

Mar 10, 2004, 2:01:59 PM
to Juerd, perl6-l...@perl.org
Juerd writes:
> Perlists,
>
> In Perl 5, lc, lcfirst, quotemeta, uc and ucfirst don't mutate.
> chomp and chop do mutate.
>
> I imagine these will all be methods in Perl 6:
>
> $foo.lc
> $foo.quotemeta
> $foo.chomp
>
> I'd like a mutating version of lc, and a non-mutating version of chomp.
> With some nice syntax, if possible.
>
> If there isn't already such a thing in the making, I hereby suggest to
> re-introduce C<.=>, to mean more or less the same as in Perl 5, but with
> Perl 6's version of the C<.> operator.

I believe this has been discussed before, and that people generally
liked it. Maybe Larry even did (I seem to recall him saying something
positive about it -- but don't think he did just because I said so :-).
It seems likely that this will go in, though.

I'm in the mood for an exercise, so, here's how you implement it.

Let's say we have the Perl grammar:

grammar Perl
{
    # ...
    has %.assignment_ops is protected;
    rule assignment_expression {
        <assignment_lhs>
        $op := (%.assignment_ops.keys())
        $value := <%.assignment_ops{$op}{rule}>
        {
            $0 := %.assignment_ops{$op}{transform}($0)
                if %.assignment_ops{$op}{transform}
        }
    }

    rule method_call {
        <term> <'.'> <method_name>
    }
}

Or something. I'm just pulling that out of my ear.

Then we'll derive our own grammar with the C<.=> operator in, and plunk
it into Perl's parser.

grammar DotEqualsPerl is Perl
{
    submethod BUILD() {
        .assignment_ops{'.='} = {
            rule => /<method_name>/,
            transform => {

                Perl::assignment_expression.new(
                    lhs => .{assignment_lhs},
                    rhs => Perl::method_call.new(
                        term => .{assignment_lhs},
                        method => .{value},
                    )
                )

            },
        };
    }
}

Finally, hooking it into Perl.

use grammar DotEqualsPerl;

Again, this is quite presumptuous about the workings of the Perl::*
classes. The Perl grammar will have to be extremely well documented.

The reason we couldn't just declare it with C<infix:.=> is because its
right hand side is not a usual expression. That is:

$foo + bar;

Won't parse unless C<bar> is a declared sub, whereas:

$foo.bar;

Will always parse.
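
The rewrite itself is small once the parse is in hand. Here is a rough sketch of the same idea in Python (chosen only because it runs today): a table of assignment operators, each with a transform that builds the tree. The class names, the ASSIGNMENT_OPS table and parse_assignment are all hypothetical stand-ins; they are not the real Perl::* classes, which don't exist yet.

```python
from dataclasses import dataclass


@dataclass
class MethodCall:
    term: str      # the invocant expression, e.g. "$foo"
    method: str    # the method name, e.g. "lc"


@dataclass
class Assignment:
    lhs: str
    rhs: object


def plain(lhs, value):
    """Ordinary '=': keep the parsed right-hand side as it is."""
    return Assignment(lhs=lhs, rhs=value)


def dot_equals(lhs, value):
    """'.=': the right-hand side is a method name applied to the lhs."""
    return Assignment(lhs=lhs, rhs=MethodCall(term=lhs, method=value))


# Hypothetical stand-in for %.assignment_ops: operator -> transform.
ASSIGNMENT_OPS = {"=": plain, ".=": dot_equals}


def parse_assignment(lhs, op, value):
    """Look up the operator and let its transform build the tree."""
    return ASSIGNMENT_OPS[op](lhs, value)


# $foo .= lc  becomes the tree for  $foo = $foo.lc
tree = parse_assignment("$foo", ".=", "lc")
```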

Luke

Brent "Dax" Royal-Gordon

Mar 10, 2004, 2:19:52 PM
to Luke Palmer, perl6-l...@perl.org
Luke Palmer wrote:
> The reason we couldn't just declare it with C<infix:.=> is because its
> right hand side is not a usual expression.

Isn't that what macros are for?

macro infix:.= ($lhs, $rhs) is parsed(/<method_name>/) {
    return Perl::assignment_expression.new(
        lhs => $lhs,
        rhs => Perl::method_call.new(
            term => $lhs,
            method => $rhs,
        )
    );
}

TMTOWTDI, I suppose...

--
Brent "Dax" Royal-Gordon <br...@brentdax.com>
Perl and Parrot hacker

Oceania has always been at war with Eastasia.

Larry Wall

Mar 10, 2004, 2:22:05 PM
to perl6-l...@perl.org
On Wed, Mar 10, 2004 at 05:39:33PM +0100, Juerd wrote:
: Perlists,
:
: In Perl 5, lc, lcfirst, quotemeta, uc and ucfirst don't mutate.
: chomp and chop do mutate.
:
: I imagine these will all be methods in Perl 6:
:
: $foo.lc
: $foo.quotemeta
: $foo.chomp
:
: I'd like a mutating version of lc, and a non-mutating version of chomp.
: With some nice syntax, if possible.
:
: If there isn't already such a thing in the making, I hereby suggest to
: re-introduce C<.=>, to mean more or less the same as in Perl 5, but with
: Perl 6's version of the C<.> operator.

Except that C<.> isn't really a binary operator...

On the other hand, draft -1 of A12 has a conjectural

my Dog $dog .= new()

in it, and that's even further out there, since the .new on the right
would in fact be called on a $dog that is undefined!

: In other words: C<$foo.lc> would not mutate and C<$foo.=lc> would.
:
: $foo += 5 ===> $foo = $foo + 5
: $foo.=lc ===> $foo = $foo.lc
:
: Makes sense to me.

Yes, but the fact that you had to change the spacing bothers me.

: Especially for C<sort> it would be nice for something like this:
:
: @foo.sort # returns sorted copy
:
: versus
:
: @foo.=sort # sorts inline
:
: I think this syntax reads better than Ruby's exclamation point
: (foo.method!), because of the analogy with other mutating operators.

Well, I'd like to reserve postfix:! for factorial in any event. :-)

The basic problem with .= shows up when you do put the spaces in:

@foo .= sort()

That makes it look as though sort is a subroutine, and it's not.
That's a direct result of the fact that C<.> is not really a binary
operator. Rather, it's a kind of "operator sigil" that introduces
a unary postfix operator. Method calls are really unary postfix
operators that happen to be able to take extra arguments.

And because the op= syntax is really built for binary operators, it
doesn't totally work for unary operators. Take another unary postfix
operator, for instance, an array subscript:

@array[$x]

you can't just up and say

@array[=$x]

to mean

@array = @array[$x]

to turn it into a mutating operator, because the [$x] wants to
function as a unit, and the = breaks that up. Similarly, ".sort"
wants to function as a unit, but the = breaks that up, visually and
semantically.

However, having said all that, it turns out that A12 will also introduce
other "dot" variants:

$obj.?method # call method if exists, or return undef (0 or 1)
$obj.*method # call all base class methods of that name (0 or more)
$obj.+method # call all base class methods of that name (1 or more)
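
To make the three variants concrete, here is a rough Python approximation. It is a sketch under assumptions, not the A12 design: the helper names are made up, "all base class methods" is read as "every class in the method resolution order that defines the name itself", and the most derived class is called first.

```python
def call_maybe(obj, name, *args):
    """Like $obj.?method: call it if it exists, otherwise return None."""
    method = getattr(obj, name, None)
    return method(*args) if callable(method) else None


def call_all(obj, name, *args):
    """Like $obj.*method: call every class's own definition (0 or more)."""
    results = []
    for cls in type(obj).__mro__:
        func = cls.__dict__.get(name)   # only names defined on this class
        if callable(func):
            results.append(func(obj, *args))
    return results


def call_some(obj, name, *args):
    """Like $obj.+method: as call_all, but at least one must exist."""
    results = call_all(obj, name, *args)
    if not results:
        raise AttributeError(f"no method named {name!r}")
    return results


class Base:
    def initialize(self):
        return "Base"


class Derived(Base):
    def initialize(self):
        return "Derived"


obj = Derived()
```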

So a .=method syntax is not so farfetched. It's analogous to +=, but it's
really still just a prefix to a unary postfix operator, like the other
dot variants. The interesting question with all of these "dots" is
where spaces are allowed or disallowed. One could make a case that
people will want to write

my Car $obj .= new()

rather than being forced to write

my Car $obj .=new()

It could even be argued that, in the case of this particular operator,
the = is functioning as part of the name, as the ! does in Ruby. So
if (hypothetically) we allow a space after the ordinary dot

@array . sort

then we could also allow things like:

@array . =sort
$obj . *initialize
$obj . ?maybe

But I dislike

$variable . meth()

for the same reason I dislike

$variable .= meth()

because it makes meth look like a subroutine call when it isn't. Regardless
of how fancy they get, method calls are still postfix operators. So I'm
inclined to say that the space is only optional before the dot, and you have
to say

@array .sort
@array .=sort
$obj .*initialize
$obj .?maybe

But that still makes

my Cat $tom .=new()

pretty ugly. Unfortunately we can't just use topicalization to say

my Cat $tom = .new()

because most people won't expect simple assignment to break their
current topic.

So another option is to replace = with something that I<does> set the
topic for the right side. If we used .= for that, then you'd have
to write

@array .= .sort
my Cat $tom .= .new()

Doubtless the first would get shortened to

@array.=.sort

That does admit to constructs like

$foo .= .*bar

which would assign $foo a list of all the return values of $foo.*bar, or

$foo .= .?maybe

which would presumably replace $foo with an undefined value if
it couldn't find $foo.maybe. Those don't seem terribly useful as
mutators though. They'd be much clearer written out long.

Another approach would be to have some kind of "microtopic" that
represented the left side of an ordinary assignment. Suppose for
the sake of argument that the microtopic is ^. Then you could write

@array = ^.sort;

and a constructor would be

my Kanga $roo = ^.new()

But that introduces a new concept that doesn't really generalize well.
So forget that.

Yet another approach is to *replace* dot with something that mutates:

@array!sort
@array?sort

Either of those would work syntactically in that case, since neither !
nor ? is expected as a binary operator. However, the unary cases
don't work:

!sort means "not sort"
?sort means "did sort work?"

We could prefix *those* with a dot, but .? is already taken. That leaves

@array.!sort

which is sort of inside-out Ruby. But then the constructor doesn't
read so well:

my Dino $dinah is fossilized .!new()

Constructors really, really want = for visual reasons...

We could do something *really* crazy and say that if the assignment
operator is immediately followed by a dot, that dot topicalizes to
the left side of the equals rather than the current topic. That
gives us

@array=.sort

and

my Dino $dinah is fossilized = .new()

That would mean that if you really wanted the current topic, you'd have
to say things like

@array = $_.sort
my Dino $dinah is fossilized = (.new())

I think that would be a bad thing to do to the = operator.

Or we could introduce an =. operator. In which case the mutators
look like

@array=.sort
my Dino $dinah is fossilized =.new()

and the regular $_ topicalized ones look like

@array = .sort
my Dino $dinah is fossilized = .new()

But the problem is that those look far too much alike.

So I think we're left with .= as an analog of .* and .?:

@array .=sort
my Dino $dinah is fossilized .=new()

Perhaps it's not too bad to allow spaces after the compound dot operators:

@array . sort
@array .= sort
$obj .? maybe
$obj .* initialize
$obj .+ initialize

After all, the computer won't get confused that a methodname is required
next. Except that we also have to figure out what these mean, if anything:

$obj .$x # indirect method name like Perl 5?
$obj .=$x # indirect mutating operator
$obj .?$x # indirect optional method
$obj .*$x # indirect all method
$obj .+$x # indirect one or more method

&obj .($x) # sub call on code reference
&obj .=($x) # ???
&obj .?($x) # ???
&obj .*($x) # ???
&obj .+($x) # ???

@obj .[$x] # subscript on array reference
@obj .=[$x] # replace array with slice of array
@obj .?[$x] # ???
@obj .*[$x] # ???
@obj .+[$x] # ???

%obj .{$x} # subscript on hash reference
%obj .={$x} # replace hash with slice of hash
%obj .?{$x} # ???
%obj .*{$x} # ???
%obj .+{$x} # ???

and whether a space is allowed after those dotty operators. The whole
reason for .(), .[] and .{} in the first place was to make them look
like postfix ops rather than terms. Allowing space after the dot works
against that. I suspect all the ones marked ??? are simply disallowed
in any event. So we could probably get away with allowing space after .=
as a special case. But maybe we want to discourage that too.

: Please excuse me if this or something like this has already been taken
: care of - I've searched for messages about it, but haven't found
: anything.

It was discussed a long time ago, but nothing substantial came of it.
It certainly needs to be nailed down for A12 though.

Larry

Larry Wall

Mar 10, 2004, 2:35:47 PM
to perl6-l...@perl.org
On Wed, Mar 10, 2004 at 11:19:52AM -0800, Brent Dax Royal-Gordon wrote:
: Luke Palmer wrote:
: >The reason we couldn't just declare it with C<infix:.=> is because its
: >right hand side is not a usual expression.
:
: Isn't that what macros are for?
:
: macro infix:.= ($lhs, $rhs) is parsed(/<method_name>/) {

Methods are really postfix operators, so that would probably be
something more like:

macro postfix:.= ($lhs, $parsetree)
is parsed(/<ws>? <?method_name> <?method_args>/) {

That's presuming we allow whitespace after the . and .= ops.

(Also, these days you have to say <?foo> to collect the results into $0.)

Larry

Luke Palmer

Mar 10, 2004, 2:42:00 PM
to perl6-l...@perl.org

Hooray! That was something I had been worried about.

But C<?> doesn't seem to fit visually. What's "questionable" about
that?

I can think of a couple that I like better:

<^foo>
<*foo>

<^foo> is my favorite at the moment (even though <*foo> is more
visually pleasing), because it looks like it's transferring the
information ^up^ in the parse tree.

Luke

Damian Conway

Mar 10, 2004, 2:57:30 PM
to perl6-l...@perl.org
Luke Palmer wrote:

> Hooray! That was something I had been worried about.
>
> But C<?> doesn't seem to fit visually. What's "questionable" about
> that?

Nothing questionable, but
everything hypothetical:

<?foo> captures to the
$?foo hypothetical variable


Damian

Larry Wall

Mar 10, 2004, 2:57:35 PM
to perl6-l...@perl.org
On Wed, Mar 10, 2004 at 12:42:00PM -0700, Luke Palmer wrote:
: > (Also, these days you have to say <?foo> to collect the results into $0.)
:
: Hooray! That was something I had been worried about.
:
: But C<?> doesn't seem to fit visually. What's "questionable" about
: that?

It's questionable insofar as it's hypothetical. It maps to $?foo,
which is the name of the (current value of the) capture within any
interior closure:

/<?foo> { say "Guessing $?foo for the moment..." } <bar> /

: I can think of a couple that I like better:
:
: <^foo>
: <*foo>
:
: <^foo> is my favorite at the moment (even though <*foo> is more
: visually pleasing), because it looks like it's transferring the
: information ^up^ in the parse tree.

But $^foo and $*foo mean very different things from hypotheticals.

And in a real sense $?foo is passing guessed information *down*
the match. The guesses only turn out "right" if you get all the way
to the bottom successfully. (That's from the point of view that
you recurse deeper to check anything to the right in a regex, even
when syntactically it's shallower.)

Larry

Luke Palmer

Mar 10, 2004, 3:32:30 PM
to perl6-l...@perl.org

Hmm... that makes sense.

It doesn't feel right, though. After all, we don't say:

($minutes, $seconds) = m/ (? \d\d ) : (? \d\d ) /;

Even though they only stay matched if they get to the end without
backtracking. Capturing (this is really just a clever notation for
captures) is usually about communicating information I<outside> of the
match: to the parent rule, to the calling scope.

As you showed in your reply about C<.=>:

macro postfix:.= ($lhs, $parsetree)
is parsed(/<ws>? <?method_name> <?method_args>/)

{ ... }

There's nothing about C<?> that makes me think that these are being
stored.

I understand the association with C<$?foo>. But most of the time, when
I'm writing a grammar, I'm catching these rules in order to stick them
in the parse tree, not to do tests on them later on in the rule. The
very essence of rules is hypotheticality, where nothing is permanent
until it gets to the end. I don't think we need a special marker that
says "these do that, too."

Luke

Damian Conway

Mar 10, 2004, 3:51:05 PM
to perl6-l...@perl.org
Luke Palmer wrote:

> I understand the association with C<$?foo>. But most of the time, when
> I'm writing a grammar, I'm catching these rules in order to stick them
> in the parse tree, not to do tests on them later on in the rule. The
> very essence of rules is hypotheticality, where nothing is permanent
> until it gets to the end. I don't think we need a special marker that
> says "these do that, too."

We need the marker to distinguish between hypothetical captures to internal
variables:

/ $?foo:=(abc) $?bar:=(def) /

and non-hypothetical captures to external variable:

/ $foo:=(abc) $bar:=(def) /

And since subrules that capture always capture to hypotheticals, we need the
same marker there.

Damian

Brent "Dax" Royal-Gordon

Mar 10, 2004, 4:44:18 PM
to Damian Conway, perl6-l...@perl.org
Damian Conway wrote:
> / $foo:=(abc) $bar:=(def) /

Am I misreading, or are you suggesting that $foo may contain 'abc' after
running this example, even if the match wasn't successful?

Damian Conway

Mar 10, 2004, 9:48:03 PM
to perl6-l...@perl.org
Brent "Dax" Royal-Gordon wrote:

>> / $foo:=(abc) $bar:=(def) /
>
> Am I misreading, or are you suggesting that $foo may contain 'abc' after
> running this example, even if the match wasn't successful?

No. I re-checked with Larry this morning and he confirmed that all bindings in
rules only "stick" if the rule as a whole succeeds.

What I was trying (obviously rather ineptly ;-) to point out is that we have
to be able to differentiate between the match object's own internal
hypothetical variables ($?foo, $?bar, @?baz) and any
external-but-temporarily-hypothesized variables ($foo, $bar, @baz).

The syntax we've chosen to do that requires the use of "?" as a secondary
sigil on internal variables. So, since named subrules that capture always
capture to internal variables, it's natural and consistent to use "?" to
indicate capturing subrules as well.
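
The "bindings only stick if the rule as a whole succeeds" part can be illustrated with a small Python sketch. It models only the commit-on-success behaviour, using Python's own re module and a plain dict as a stand-in for the external variables; it does not attempt to model hypothetical variables during the match, and match_and_bind is a made-up helper name.

```python
import re


def match_and_bind(pattern, text, variables):
    """Bind named captures into `variables` only if the whole match succeeds."""
    m = re.fullmatch(pattern, text)
    if m is None:
        return False                 # nothing "sticks" on failure
    variables.update(m.groupdict())  # commit all bindings together
    return True


env = {"foo": "old", "bar": "old"}
pattern = r"(?P<foo>abc)(?P<bar>def)"

# "abc" matches the first group, but the match as a whole fails,
# so neither variable is touched.
failed = match_and_bind(pattern, "abcxyz", env)
after_failure = dict(env)

# The whole pattern matches, so both bindings are committed.
succeeded = match_and_bind(pattern, "abcdef", env)
```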

Damian

Matt

Mar 10, 2004, 10:46:05 PM
to perl6-l...@perl.org
I was thinking along the lines of...

String $foo = "hello";
$foo.scramble!
print "$foo\n";
$foo = "hello";
print $foo.scramble ~ "\n";
print $foo;

OUTPUT (or close):
elhlo
hloel
hello

Also, along these same things.. is there a way to apply a method to all
variables/objects of a certain type (e.g. String, Num, etc)? Taking the
above example.. being able to write a method called "Scramble" that can be
called as a method from any String type.

Larry Wall

Mar 11, 2004, 12:29:02 AM
to perl6-l...@perl.org
On Wed, Mar 10, 2004 at 10:46:05PM -0500, matt wrote:
: I was thinking along the lines of...
:
: String $foo = "hello";
: $foo.scramble!

That would be $foo.=scramble in the current scheme of things.

: print "$foo\n";
: $foo = "hello";
: print $foo.scramble ~ "\n";
: print $foo;
:
: OUTPUT (or close):
: elhlo
: hloel
: hello
:
: Also, along these same things.. is there a way to apply a method to all
: variables/objects of a certain type (e.g. String, Num, etc)? Taking the
: above example.. being able to write a method called "Scramble" that can be
: called as a method from any String type.

Two ways, actually. You can "reopen" the String class and add the method:

class String is extended {
    method scramble () returns String {...}
}

or if you consider that underhanded, you can define a multi-sub:

multi sub *scramble (String $s) returns String {...}

If you call that as a method, and there is no ordinary scramble method,
it will "fail soft" to looking for a scramble multimethod, and end up
calling your definition. Or you can just call it directly as a function:

scramble("hello")
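
For a feel of the multi-sub route, Python's functools.singledispatch is a loose analogue: a free function whose implementation is chosen by the type of its first argument. This is only a sketch of the dispatch idea. Python has no "fail soft" from method call to function, and the body of scramble here is just one guess at what such a routine might do.

```python
import random
from functools import singledispatch


@singledispatch
def scramble(value):
    """Fallback when no implementation is registered for the type."""
    raise TypeError(f"cannot scramble {type(value).__name__}")


@scramble.register
def _(value: str) -> str:
    """Return a shuffled copy; the original string is left unchanged."""
    letters = list(value)
    random.shuffle(letters)
    return "".join(letters)


word = "hello"
shuffled = scramble(word)   # called directly as a function
```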

Larry

Austin Hastings

Mar 11, 2004, 1:09:59 AM
to Damian Conway, perl6-l...@perl.org

Isn't this backwards?

That is, from the above I get the impression that $?foo is TRANSIENT, while
capturing to $foo will (eventually) be PERMANENT.

So <?foo> is just a shorthand way of saying

$?foo := <foo>

right?

Is hypo-space a flat entity, or do hypothetical scopes nest? If so, do we
have to use repeated ?'s, or will just one suffice?

That is:

rule bar {...}
rule baz {...}
rule foo {...bar...baz...}

if / <?foo> ... <?baz> ... { $?foo.?baz ... $?baz } .../
OR
if / <?foo> ... <?baz> ... { $?foo.baz ... $?baz } .../
OR
if / <?foo> ... <?baz> ... { $?baz ... $?otherbaz } .../


=Austin

Larry Wall

Mar 11, 2004, 1:55:05 AM
to perl6-l...@perl.org
On Thu, Mar 11, 2004 at 01:09:59AM -0500, Austin Hastings wrote:
:
:
: > -----Original Message-----
: > From: Damian Conway [mailto:dam...@conway.org]
: > Sent: Wednesday, 10 March, 2004 09:48 PM
: > To: perl6-l...@perl.org
: > Subject: Re: Mutating methods
: >
: >
: > Brent "Dax" Royal-Gordon wrote:
: >
: > >> / $foo:=(abc) $bar:=(def) /
: > >
: > > Am I misreading, or are you suggesting that $foo may contain
: > 'abc' after
: > > running this example, even if the match wasn't successful?
: >
: > No. I re-checked with Larry this morning and he confirmed that
: > all bindings in
: > rules only "stick" if the rule as a whole succeeds.
: >
: > What I was trying (obviously rather ineptly ;-) to point out is
: > that we have
: > to be able to differentiate between the match object's own internal
: > hypothetical variables ($?foo, $?bar, @?baz) and any
: > external-but-temporarily-hypothesized variables ($foo, $bar, @baz).
: >
: > The syntax we've chosen to do that requires the use of "?" as a secondary
: > sigil on internal variables. So, since named subrules that capture always
: > capture to internal variables, it's natural and consistent to use "?" to
: > indicate capturing subrules as well.
:
: Isn't this backwards?
:
: That is, from the above I get the impression that $?foo is TRANSIENT, while
: capturing to $foo will (eventually) be PERMANENT.

$?foo is exactly as transient as the $0 in which it resides. So it
really depends on how long $0 lives outside the regex. In the case
of a returned parse tree it could live a very long time.

: So <?foo> is just a shorthand way of saying
:
: $?foo := <foo>
:
: right?

Yes. The ? is actually serving as a scope marker telling Perl not
to scan outside of the current regex for a variable of that name.
If you consider each rule to be its own package, it's kind of an "our"
declaration within the rule.

: Is hypo-space a flat entity, or do hypothetical scopes nest?

Um, the namespace inside a particular rule is flat, just as the
namespace inside a package is flat. That doesn't mean that your code
won't visit those variables in whatever order it jolly well pleases.
Dynamically speaking, every assertion in a regex is recursively matched "inside"
the results of previous successful assertions, regardless of the
lexical structure of the rule. You're often in situations where dynamically
you're going down recursively, while in terms of where you are in the
match, you're going out of brackets or parens. It has to be that way,
or you could never backtrack into a set of brackets or parens.

But once a subrule is matched, all its ? names are bundled up into
a hash in the single "$0"-ish object that becomes aliased (at least
temporarily) to the $?foo in the outer rule. The keys of that hash are
flat for all the names in the particular rule, though of course some
of the values may be nested $0 results from subrules. So effectively
you end up with a hash of hash of hash of hash.... representing the
entire syntax tree. But any given rule can't produce more than one
level of hash (without doing something freaky like rewriting your
hash entries inside a closure).
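
As a picture of that shape, here is the hash-of-hashes written out as nested Python dicts. The rule names foo, bar and baz come from the question; the "text" key and the sample values are invented for illustration and imply nothing about what a real $0 would hold.

```python
# Each matched rule contributes one flat dict of its own captures;
# nesting appears only because a value can itself be a subrule's result.
bar_result = {"text": "y"}
inner_baz_result = {"text": "inner z"}
foo_result = {"bar": bar_result, "baz": inner_baz_result}

outer_baz_result = {"text": "z"}

# Plays the role of the outer rule's $0: flat keys, nested values.
match = {"foo": foo_result, "baz": outer_baz_result}

inner = match["foo"]["baz"]["text"]   # the baz matched inside foo
outer = match["baz"]["text"]          # the baz matched by the outer rule
```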

: If so, do we
: have to use repeated ?'s, or will just one suffice?
:
: That is:
:
: rule bar {...}
: rule baz {...}
: rule foo {...bar...baz...}
:
: if / <?foo> ... <?baz> ... { $?foo.?baz ... $?baz } .../
: OR
: if / <?foo> ... <?baz> ... { $?foo.baz ... $?baz } .../
: OR
: if / <?foo> ... <?baz> ... { $?baz ... $?otherbaz } .../

Well, you don't need "?" to go down the syntax tree, since each $0
can behave as a hash. You don't subscript hashes using "." either.
You subscript hashes with {...} historically, or these days, «...»,
when you want constant subscripts. So what you're looking for is
something like:

if / <?foo> ... <?baz> ... { $?foo{'baz'} ... $?baz } .../

or

if / <?foo> ... <?baz> ... { $?foo«baz» ... $?baz } .../

or even:

if / <?foo> ... <?baz> ... { $0«foo»«baz» ... $0«baz» } .../

Oh, and since the current $0 is actually the topic of any closure,
you can also probably say

if / <?foo> ... <?baz> ... { .«foo»«baz» ... .«baz» } .../

as an analog to

if / <?foo> ... <?baz> ... { .{'foo'}{'baz'} ... .{'baz'} } .../

That's presuming we keep the rule that scalars don't have to include
the sigils. For an array you'd still have to say:

if / @?things:=[ (<ident>) ,? ]+ { ... $0«@?things» ... } /

or

if / @?things:=[ (<ident>) ,? ]+ { ... .«@?things» ... } /

But then it's usually easier just to say

if / @?things:=[ (<ident>) ,? ]+ { ... @?things ... } /

which means exactly the same thing.

Larry

Andy Wardley

Mar 11, 2004, 6:38:11 AM
to Larry Wall, perl6-l...@perl.org
Larry Wall wrote:
> multi sub *scramble (String $s) returns String {...}
[...]

> Or you can just call it directly as a function:
> scramble("hello")

Can you also call scramble as a class method?

class String is extended {
    method scramble { ..etc... }
}

String.scramble("hello")

A

Andy Wardley

Mar 11, 2004, 7:32:47 AM
to Larry Wall, perl6-l...@perl.org
Larry Wall wrote:
> Yet another approach is to *replace* dot with something that mutates:
>
> @array!sort
> @array?sort
>
> Either of those would work syntactically in that case, since neither !
> nor ? is expected as a binary operator.

What about ? as a ternary operator:

@foo?bar:baz;

Or am I missing.something?

A

Uri Guttman

Mar 11, 2004, 9:00:42 AM
to Andy Wardley, Larry Wall, perl6-l...@perl.org
>>>>> "AW" == Andy Wardley <a...@andywardley.com> writes:

AW> What about ? as a ternary operator:

AW> @foo?bar:baz;

IIRC, that was changed to ?? :: because larry wanted the single ? for
more important uses. also doubling the ? made it more like &&, || which
are related logical ops.

and ?? as the oneshot regex match is totally out.

uri

--
Uri Guttman ------ u...@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Gregor N. Purdy

Mar 11, 2004, 9:49:44 AM
to Larry Wall, perl6-l...@perl.org
Larry --

So, will "mutatingness" be a context we'll be able to inquire on
in the implementation of a called routine? Or, could we provide
a specialized distinct implementation for mutating that would get
called if .=X() is used? If we are performing some operation on
large data, and we know the end result is going to clobber the
current object, we could avoid making an extra copy.

I suppose there is some danger here. What if I write a class
that I intend to have value semantics? That is, once an instance's
value is set at construction time, it never changes, although you
can get new instances by invoking its methods. BigInt would
work this way. I can imagine a Point class working this way - you
don't (necessarily) want two objects hanging on to a point, and one
of them to mutate it into a different value out from under the other
one. You wouldn't expect that behavior from other value objects such
as built-in strings.

This points at mutatingness being aimed at the reference (variable)
not the referent (value), unless it can be different in the case
of value-objects and container-objects...

So, if we had a BigDataContainer class for which it *was* reasonable
to mutate it in place, and we wanted that behavior to trigger on .=
to do an in-place modification:

$bigData .=applyBlockCipher($cipher, $key);

would there be a way to do that without the extra copy implied in:

$bigData = $bigData.applyBlockCipher($cipher, $key);

while leaving

$foo .=someOtherMethod();

equivalent to

$foo = $foo.someOtherMethod();

when $foo's class or someOtherMethod() implementation doesn't do
anything special?


Regards,

-- Gregor

--
Gregor Purdy gre...@focusresearch.com
Focus Research, Inc. http://www.focusresearch.com/

Larry Wall

Mar 11, 2004, 12:33:28 PM
to perl6-l...@perl.org, Andy Wardley

Not unless you write a class method that takes an extra argument.
Otherwise you're passing a class where it expects a string, and a
string where it expects nothing. However, much like in Perl 5 you
can always force which class's method to call with

"hello".String::scramble();

Larry

Larry Wall

Mar 11, 2004, 2:11:54 PM
to perl6-l...@perl.org
On Thu, Mar 11, 2004 at 06:49:44AM -0800, Gregor N. Purdy wrote:
: So, will "mutatingness" be a context we'll be able to inquire on
: in the implementation of a called routine?

Probably not, but it's vaguely possible you could somehow get a
reference to what is being assigned to, if available, and check to see
if $dest =:= $src (where =:= tests to see if two refs point to the
same object). But in general I think most "want" testing is just a
way of making code run slow, because it forces tests to be done at run
time that should be done at compile time or dispatch time. It's better
for the optimizer if you can give it enough type hints and signature
hints to decide things earlier than the body of the sub or method.

: Or, could we provide a specialized distinct implementation
: for mutating that would get called if .=X() is used?

That is much more likely. In general if you don't define both an <op>
and an <op>= then Perl can autogenerate or emulate the missing one for you.
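
Python does one half of this already, which makes for a handy illustration: if a class defines only the binary operator, `+=` is emulated as "compute, then rebind", while a class that also defines the in-place form gets to skip the copy. The class names below are made up, and Python only emulates in this one direction (it will not derive `+` from `+=`), so treat it as a sketch of the idea rather than of Perl's eventual rules.

```python
class Cloning:
    """Defines only the copying operator."""

    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        return Cloning(self.items + other.items)


class InPlace(Cloning):
    """Also defines the mutating form, which avoids the extra copy."""

    def __iadd__(self, other):
        self.items.extend(other.items)
        return self          # a mutator hands back the object it changed


a = Cloning([1])
a_before = a
a += Cloning([2])            # no __iadd__: emulated as a = a + ..., a new object

b = InPlace([1])
b_before = b
b += InPlace([2])            # dispatches to __iadd__: same object, modified
```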

Now in the specific case of . and .= we don't exactly have a normal
binary operator, because the right side is not an expression. So we
may have to provide a way of marking a normal method as a mutator.
Possibly we end up with

method =sort (Array @ary) returns Array {...} # inplace
method sort (Array @ary) returns Array {...} # cloning

That works nicely with the .= vs . distinction, visually speaking.

On the other hand, you might want to do the same with multi subs:

multi sub =sort (Array @ary) returns Array {...} # inplace
multi sub sort (Array @ary) returns Array {...} # cloning

and then it gets a little more problematic syntactically because
multis are called like subroutines:

=sort(@array);

We would have to allow an initial = at the beginning of a term. So far
I've resisted doing that because I don't want

@obj.meth=foo();

to become ambiguous, in case I decide to make the parentheses optional
on method calls with arguments. If I did decide that, and we have
terms beginning with =, it would not be clear whether the above meant

@obj.meth(=foo());

or

@obj.meth=(foo());

The = prefix notation also doesn't work very well for talking about the
name of a routine:

&=sort

That looks an awful lot like a junctive assignment operator...

From a C++-ish perspective, the right thing to do is to differentiate
not by the name but by the declared mutability of the invocant:

multi sub sort (Array @ary is rw) returns Array {...} # inplace
multi sub sort (Array @ary) returns Array {...} # cloning

Or I suppose a case could be made for something that specifically
declares you're returning one of the arguments:

multi sub sort (Array @ary is rw) returns @ary {...} # inplace

After all, it's possible to write a method that mutates its invocant
but *doesn't* return it like a well-behaved mutator should. You don't
always call a mutator in a void context--sometimes you want
to be able to stack mutators:

@array.=sort.=uniq;

So you have to be able to return the mutant as well as mutate it in place.
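
In Python terms, stacking only works when each mutator returns its invocant, roughly as in the sketch below. The Items class and its method names are hypothetical; they stand in for @array.=sort.=uniq and are not a proposal for how Perl would spell it.

```python
class Items:
    def __init__(self, values):
        self.values = list(values)

    def sort_inplace(self):
        self.values.sort()
        return self                  # returning self allows stacking

    def uniq_inplace(self):
        # dict.fromkeys keeps the first occurrence of each value, in order;
        # slice assignment replaces the contents of the existing list.
        self.values[:] = list(dict.fromkeys(self.values))
        return self


array = Items([3, 1, 3, 2, 1])
result = array.sort_inplace().uniq_inplace()
```

If either method returned None instead, as Python's own list.sort() does, the second call in the chain would fail.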

On the other hand, I'm deeply suspicious of a return signature that
mentions a specific variable. What if the body says to return something
else? Is that just ignored? Do we check it to see if it's the same
item?

So my guess is that it's probably better to have something more specific
for the mutator "template". I think, actually, that I've convinced myself
that a mutator should be marked in its name, and that it should generally
be defined as a standard method rather than a multi sub:

method =sort (Array @ary is rw) {...} # inplace

This would automatically arrange to return the invocant.
It would be illegal to use C<return> in such a routine. And I guess,
since it's an ordinary method, we can leave out the invocant:

method =sort () {...} # inplace

with the assumption that the default invocant on a mutator would
automatically be assumed "rw".

If you do happen to want to define a multi sub mutator, then the
syntax for calling it could be

&«=sort»(@array)

However, we really don't have to special case the = prefix syntax if
we make it something like:

method postfix:.=sort () {...} # inplace
multi sub postfix:.=sort () {...} # inplace

That's getting way up there on the ugliness factor. Might be worth
a new operator category:

method mutate:sort () {...} # inplace
multi sub mutate:sort () {...} # inplace

or

method inplace:sort () {...} # inplace
multi sub inplace:sort () {...} # inplace

or

method rw:sort () {...} # inplace
multi sub rw:sort () {...} # inplace

or

method self:sort () {...} # inplace
multi sub self:sort () {...} # inplace

On the final hand, if people fall in love with both self:sort and =sort, we
could have =sort be a shorthand for self:sort where it's unambiguous.

On the (n+1)st hand, that says we could write it either as

@array.=sort.=uniq

or

@array.self:sort.self:uniq

Perhaps that's okay under TMTOWTDI. I actually find the shorter one
more readable. But then calling it as a sub would always just be

self:sort(@array);

And then,

.self:sort

might or might not be preferred over

.=sort

: If we are performing some operation on
: large data, and we know the end result is going to clobber the
: current object, we could avoid making an extra copy.

Yes, computer performance is desirable, but I think the biggest goal
of the mutating operators is mental performance. The fact is that

$a += 1;

is much easier to understand than

$a = $a + 1;

: I suppose there is some danger here. What if I write a class
: that I intend to have value semantics? That is, once an instance's
: value is set at construction time, it never changes, although you
: can get new instances by invoking its methods. BigInt would
: work this way. I can imagine a Point class working this way - you
: don't (necessarily) want two objects hanging on to a point, and one
: of them to mutate it into a different value out from under the other
: one. You wouldn't expect that behavior from other value objects such
: as built-in strings.
:
: This points at mutatingness being aimed at the reference (variable)
: not the referent (value), unless it can be different in the case
: of value-objects and container-objects...
:
: So, if we had a BigDataContainer class for which it *was* reasonable
: to mutate it in place, and we wanted that behavior to trigger on .=
: to do an in-place modification:
:
: $bigData .=applyBlockCipher($cipher, $key);
:
: would there be a way to do that without the extra copy implied in:
:
: $bigData = $bigData.applyBlockCipher($cipher, $key);
:
: while leaving
:
: $foo .=someOtherMethod();
:
: equivalent to
:
: $foo = $foo.someOtherMethod();
:
: when $foo's class or someOtherMethod() implementation doesn't do
: anything special?

We can autogenerate routines however we like as long as the metadata
is there to decide how to do it. In this case any autogenerated <op>=
is going to call the (presumably cloning) <op> and then assign the
resulting reference back to the source. So the default would do what
you want, I think.

Going the other way, if you only define <op>=, then an autogenerated
<op> would presumably .clone the object before doing the <op>= on it.

It also may be that these can work a little differently if you know
the underlying datatype is copy-on-write.

Larry
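
The autogeneration described above (derive the cloning form from an in-place one by cloning first, and have mutators return their invocant so they can be stacked) can be sketched in Python as an analogy. This is only an illustration under stated assumptions: `Bag`, `sort_inplace`, `uniq_inplace` and `cloning_from_inplace` are hypothetical names, not part of any Perl 6 proposal in this thread.

```python
import copy


class Bag:
    """Hypothetical container, used only to illustrate the idea."""

    def __init__(self, items):
        self.items = list(items)

    # In-place ("self:sort"-style) version: mutates and returns the
    # invocant, so that mutators can be stacked.
    def sort_inplace(self):
        self.items.sort()
        return self

    def uniq_inplace(self):
        seen = set()
        kept = []
        for item in self.items:
            if item not in seen:
                seen.add(item)
                kept.append(item)
        self.items = kept
        return self


def cloning_from_inplace(mutator):
    """Derive a cloning method: copy the object, then mutate the copy."""
    def cloning(obj, *args, **kwargs):
        clone = copy.deepcopy(obj)
        mutator(clone, *args, **kwargs)
        return clone
    return cloning


# "Autogenerated" cloning counterparts of the two mutators.
Bag.sort = cloning_from_inplace(Bag.sort_inplace)
Bag.uniq = cloning_from_inplace(Bag.uniq_inplace)
```

With these definitions, `bag.sort()` leaves `bag` untouched and returns a sorted copy, while `bag.sort_inplace().uniq_inplace()` plays the role of `@array.=sort.=uniq`. The opposite direction (deriving `<op>=` from a cloning `<op>` by assigning the result back) needs a handle on the variable itself, which plain Python method calls do not provide.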

Jonathan Scott Duff

Mar 11, 2004, 3:05:55 PM
to perl6-l...@perl.org
On Thu, Mar 11, 2004 at 11:11:54AM -0800, Larry Wall wrote:
> On the final hand, if people fall in love with both self:sort and =sort, we
> could have =sort be a shorthand for self:sort where it's unambiguous.

Wouldn't =sort potentially muck with POD?

-Scott
--
Jonathan Scott Duff
du...@pobox.com

Larry Wall

Mar 11, 2004, 3:43:22 PM
to perl6-l...@perl.org
On Thu, Mar 11, 2004 at 02:05:55PM -0600, Jonathan Scott Duff wrote:

: On Thu, Mar 11, 2004 at 11:11:54AM -0800, Larry Wall wrote:
: > On the final hand, if people fall in love with both self:sort and =sort, we
: > could have =sort be a shorthand for self:sort where it's unambiguous.
:
: Wouldn't =sort potentially muck with POD?

Could. Historically pod only pays attention to = on the left margin though.
So you generally wouldn't have any problem unless you were in the habit
of declaring your methods in the C-ish idiom of:

int method
=rotate (int $a is rw) {...}

On the other hand, I suspect most people will end up declaring it

int method
self:rotate (int $a is rw) {...}

in any event, and reserve the =rotate for .=rotate, which can never put
the = on the left margin, even if we let ourselves have whitespace
before POD directives. So maybe we just require self: for the declaration,
and forget about = there. It interacts badly with global names anyway.
Is it "*=sort" or "=*sort"? With "*self:sort" it's more obvious.

Another interesting question, if the "postfix:.=foo" mess is defined
as self:foo, should infix:+= be defined as self:+ instead?
In other words, should the <op>= syntax really be a metasyntax like
hyperoperators, where you never actually have to define a C<»+«>
operator, but the hyperoperator is always autogenerated from ordinary
C<+>? So basically any infix:<op>= gets remapped to self:<op>.

In that case, C<»+=«> is a double-meta operator that ends up generating
a hyper self:+.

I kinda like this approach because it means you can always get all of

$a !! $b
$a !!= $b
@a »!!« @b
@a »!!=« @b

merely by defining infix:!!. On the other hand, it also means that
someone can say silly things like:

$a cmp= $b
$a ~~= $b

I suppose we could simply disallow meta-= on non-associating operators.
Can anyone come up with a non-associating binary operator that *should*
have an assignment operator? The basic definition of non-associating
seems to be that the type of the arguments is incompatible with the
type produced by the operator. Which is precisely the problem with
something like

$a cmp= $b

insofar as $a is being treated as a string at one moment and as a boolean
at the next.

Larry
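
The metasyntax idea above (define only the plain infix, and get `<op>=`, the hyper form, and the hyper-assign form derived from it) can be sketched in Python. Everything here is a hypothetical illustration: `Box` stands in for a variable, and `assign_meta`, `hyper_meta` and `three_way` are made-up helper names.

```python
import operator


class Box:
    """Hypothetical stand-in for a variable (a container holding a value)."""

    def __init__(self, value):
        self.value = value


def assign_meta(op):
    """Derive `<op>=` from `<op>`: compute, then store back into the box."""
    def op_assign(box, rhs):
        box.value = op(box.value, rhs)
        return box
    return op_assign


def hyper_meta(op):
    """Derive an elementwise version of a binary operation."""
    def hyper(left, right):
        if len(left) != len(right):
            raise ValueError("operands must have the same length")
        return [op(a, b) for a, b in zip(left, right)]
    return hyper


def three_way(a, b):
    """A cmp-like comparison returning -1, 0 or 1."""
    return (a > b) - (a < b)


add_assign = assign_meta(operator.add)      # plays the role of +=
hyper_add = hyper_meta(operator.add)        # plays the role of the hyper +
hyper_add_assign = assign_meta(hyper_add)   # the "double-meta" hyper +=
cmp_assign = assign_meta(three_way)         # the "silly" cmp=
```

Note that `cmp_assign` behaves exactly as the objection predicts: a box holding a string ends up holding an integer, because the result type of the operator differs from the type of its arguments.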

Larry Wall

Mar 11, 2004, 4:04:42 PM
to perl6-l...@perl.org
On Thu, Mar 11, 2004 at 12:43:22PM -0800, Larry Wall wrote:
: Which is precisely the problem with something like
:
: $a cmp= $b
:
: insofar as $a is being treated as a string at one moment and as a boolean
: at the next.

Well, okay, not a boolean. More like a troolean.

Larry

John Siracusa

Mar 11, 2004, 4:14:44 PM
to Perl 6 Language

Back in myyyyy daaayyyy, we used to call that a "scalar." And we liked it,
because it was all we had! ;)

-John

Chromatic

Mar 11, 2004, 4:18:52 PM
to Larry Wall, p6l
On Thu, 2004-03-11 at 13:04, Larry Wall wrote:

> Well, okay, not a boolean. More like a troolean.

Unless it's a falselean.

-- c

Larry Wall

Mar 11, 2004, 4:38:58 PM
to p6l
On Thu, Mar 11, 2004 at 01:18:52PM -0800, chromatic wrote:

: On Thu, 2004-03-11 at 13:04, Larry Wall wrote:
:
: > Well, okay, not a boolean. More like a troolean.
:
: Unless it's a falselean.

It's more truelean than falselean by a 2/3rds majority. And it's
much more if you include 2, -2, 3, -3,... in the data type. And it's
*very* much more if you include the reals....

Larry

Matthew Walton

Mar 11, 2004, 6:40:36 PM
to p6l

So that's a (numeric) scalar then...

I'm new to this list, although I've been keeping an eye on Perl 6 for
quite a while now as it's looking like it's going to be an extremely
pleasant language to work with. Seems I joined at the right time as
well, for these mutators are an interesting thing. Please excuse my no
doubt numerous abuses of conventional formatting used here as I don't
know it yet, and I've got a very shaky knowledge of some parts of the
Perl 6 grammar that everyone posting seems to know.

However, it strikes me that notation like

int method =foo(String $bar) {...}

is at risk of causing serious confusion to people coming from other
languages. This may not be a concern, of course (and isn't really one of
mine despite being a C++/Perl 5/Haskell kind of person at the moment).
It seems that

int method self:foo(String $bar) {...}

is clearer and easier to read, but I did actually prefer

int method mutate:foo(String $bar) {...}

or

int method inplace:foo(String $bar) {...}

which seem to have been dismissed in favour of the form using C<self>,
although I can see that it does have a valid interpretation. Perhaps I'm
just too stuck in writing member subs of objects in Perl 5 by saying

sub foo {
    my $self = shift;
    # something useful here
}

so I always see 'self' as reading something like 'this' does in C++ or
Java (or as 'self' does in Python, if I'm remembering that correctly).
There is undeniable logic in using it to define mutators though, as they
do most certainly act upon 'self' or 'this' or whatever it's called.

One is led to wonder if the most appropriate definition might not be

int method mutator:foo(String $bar) { ... }

but that's getting very silly, so maybe just ignore everything I said
just now and cheer the introduction of C<self> as the most practical and
least prone to the introduction of finger trouble.

And having said all that, I like .= as invocation syntax for it, even if
I keep thinking it means 'append string'.

Anyway, thank you for listening, I shall return now to watching in awe.

Matthew

Damian Conway

Mar 11, 2004, 6:51:28 PM
to perl6-l...@perl.org
Larry wrote:

> On the other hand, I suspect most people will end up declaring it
>
> int method
> self:rotate (int $a is rw) {...}
>
> in any event, and reserve the =rotate for .=rotate, which can never put
> the = on the left margin, even if we let ourselves have whitespace
> before POD directives. So maybe we just require self: for the declaration,
> and forget about = there.

Yes please!


> It interacts badly with global names anyway.
> Is it "*=sort" or "=*sort"? With "*self:sort" it's more obvious.

Agreed. I'd *very* much prefer to see "reflexive" methods like this declared
C<self:methodname>. From a readability stand-point, if for no other reason.


> Another interesting question, if the "postfix:.=foo" mess is defined
> as self:foo, should infix:+= be defined as self:+ instead?
> In other words, should the <op>= syntax really be a metasyntax like
> hyperoperators, where you never actually have to define a C<»+«>
> operator, but the hyperoperator is always autogenerated from ordinary
> C<+>? So basically any infix:<op>= gets remapped to self:<op>.

I think that would be cleaner.


> On the other hand, it also means that
> someone can say silly things like:
>
> $a cmp= $b
> $a ~~= $b
>
> I suppose we could simply disallow meta-= on non-associating operators.
> Can anyone come up with a non-associating binary operator that *should*
> have an assignment operator? The basic definition of non-associating
> seems to be that the type of the arguments is incompatible with the
> type produced by the operator. Which is precisely the problem with
> something like
>
> $a cmp= $b
>
> insofar as $a is being treated as a string at one moment and as a boolean
> at the next.

I think it's "merely" a philosophical problem.

After all, we don't complain when people write:

$a = $a cmp $b;

So should we complain when people write exactly the same thing, only as:

$a cmp= $b;

Stylistically, they're equally as abhorrent, but Perl users aren't expecting
the Stylish Inquisition.

The real question is whether the two forms are equally likely to indicate a
logic error. One could argue that anyone who writes the first is more likely
just being (small-l) lazy, whereas writing the second probably indicates a
"thinko". But then one could also argue that it's (small-l) lazier to write
the second than the first, so the second is actually *more* likely to be
(small-l) laziness than error.

There are also cases where something like:

$a ||= $b;

or:

$a += $b;

changes the type of value in $a. Should we flag those too? Currently we do
warn on the second one if $a can't be cleanly coerced to numeric. Would that
be enough for C<cmp=> too, perhaps?


Damian


Austin Hastings

Mar 12, 2004, 3:47:57 AM
to perl6-l...@perl.org

But it would work as a "class multi", right?

class String is extended {
    multi scramble(String $s) {...}
}

"hello".scramble();
String::scramble("hello"); # Way overspecified for a multi...

=Austin

Austin Hastings

Mar 12, 2004, 3:47:22 AM
to Larry Wall, perl6-l...@perl.org

> -----Original Message-----
> From: Larry Wall [mailto:la...@wall.org]

> On Thu, Mar 11, 2004 at 06:49:44AM -0800, Gregor N. Purdy wrote:
> : So, will "mutatingness" be a context we'll be able to inquire on
> : in the implementation of a called routine?
>
> Probably not, but it's vaguely possible you could somehow get a
> reference to what is being assigned to, if available, and check to see
> if $dest =:= $src (where =:= tests to see if two refs point to the
> same object). But in general I think most "want" testing is just a
> way of making code run slow, because it forces tests to be done at run
> time that should be done at compile time or dispatch time. It's better
> for the optimizer if you can give it enough type hints and signature
> hints to decide things earlier than the body of the sub or method.
>
> : Or, could we provide a specialized distinct implementation
> : for mutating that would get called if .=X() is used?
>
> That is much more likely. In general if you don't define both an <op>
> and an <op>= then Perl can autogenerate or emulate the missing
> one for you.
>
> Now in the specific case of . and .= we don't exactly have a normal
> binary operator, because the right side is not an expression.

$tis.=««sad pity pity sad sad pity true»;

$s .= ($useMbcs ? wlength : length);

(Side note: although that expression isn't valid, since the wlength
and length methods aren't qualified, it *should* be, since a human could
infer it rather easily. Can we make that DWIM? (One way would be
for the parser to convert that into if-else form if it appeared
ambiguous.))

> So we may have to provide a way of marking a normal method as a
> mutator. Possibly we end up with
>
> method =sort (Array @ary) returns Array {...} # inplace
> method sort (Array @ary) returns Array {...} # cloning
>
> That works nicely with the .= vs . distinction, visually speaking.

Why not just put a property on the calling context, and allow either:

# Run-time handling
method sort(Array @a) { if ($CALLER.mutating) {...} ...}
or
# Properties should be after names
method sort:mutating(Array @a) {...}
or
# But this is consistent with operators
method mutating:sort(Array @a) {...}

>
> On the other hand, you might want to do the same with multi subs:
>
> multi sub =sort (Array @ary) returns Array {...} # inplace
> multi sub sort (Array @ary) returns Array {...} # cloning
>
> and then it gets a little more problematic syntactically because
> multis are called like subroutines:
>
> =sort(@array);
>
> We would have to allow an initial = at the beginning of a term. So far
> I've resisted doing that because I don't want
>
> @obj.meth=foo();
>
> to become ambiguous, in case I decide to make the parentheses optional
> on method calls with arguments. If I did decide that, and we have
> terms beginning with =, it would not be clear whether the above meant
>
> @obj.meth(=foo());
>
> or
>
> @obj.meth=(foo());

Or @obj.meth = foo();

(As much as I despise those who don't use spaces around the assignment
operator, I'm willing to defend their "right" to the practice...)

> The = prefix notation also doesn't work very well for talking about the
> name of a routine:
>
> &=sort
>
> That looks an awful lot like a junctive assignment operator...

But it would be obvious from context that it was/n't:

$foo = &=sort;
bar(&=sort);
$baz &=sort;

> From a C++-ish perspective, the right thing to do is to differentiate
> not by the name but by the declared mutability of the invocant:
>
> multi sub sort (Array @ary is rw) returns Array {...} # inplace
> multi sub sort (Array @ary) returns Array {...} # cloning
>
> Or I suppose a case could be made for something that specifically
> declares you're returning one of the arguments:
>
> multi sub sort (Array @ary is rw) returns @ary {...} # inplace
>
> After all, it's possible to write a method that mutates its invocant
> but *doesn't* return it like a well-behaved mutator should. You don't
> always call a mutator in a void context--sometimes you want
> to be able to stack mutators:
>
> @array.=sort.=uniq;
>
> So you have to be able to return the mutant as well as mutate it in place.

In the case of mutators, won't the return always be the first argument?

So couldn't we just say:

multi sub sort(Array @ary is rw is mutated) returns Array {...}
multi sub sort(Array @ary) returns Array {...}

(and can't we infer the "returns Array" when "is mutated" is provided?)

> On the other hand, I'm deeply suspicious of a return signature that
> mentions a specific variable. What if the body says to return something
> else? Is that just ignored? Do we check it to see if it's the same
> item?

No. You might well say:

$string.=length;

And convert from one subtype to another. I think the mutation indicator
is a hint to the optimizer, and a crutch for the implementor, in cases
where it's possible to squeeze more performance out of skipping the
assignment phase. (In particular, where an inefficient assignment operator
exists.)

Question: Can all this noise be eliminated by paying more attention to
the construction of the assignment operator?

That is, do we have an example where $a .= meth is going to perform poorly,
and that performance is NOT because of the $a = $a.meth assignment? (And
that cannot be fixed by declaring the invocant 'is rw'.)

> So my guess is that it's probably better to have something more specific
> for the mutator "template". I think, actually, that I've convinced myself
> that a mutator should be marked in its name, and that it should generally
> be defined as a standard method rather than a multi sub:
>
> method =sort (Array @ary is rw) {...} # inplace
>
> This would automatically arrange to return the invocant.
> It would be illegal to use C<return> in such a routine.

So what? C<goto END;> ?

This isn't making it onto the "good ideas list"...

> And I guess, since it's an ordinary method, we can leave out the invocant:
>
> method =sort () {...} # inplace
>
> with the assumption that the default invocant on a mutator would
> automatically be assumed "rw".
>
> If you do happen to want to define a multi sub mutator, then the
> syntax for calling it could be
>
> &«=sort»(@array)
>

What about defining a multimethod mutator:

multi =foo(Array @a, Object $b) {...}
multi =foo(Array @a, Scalar $s) {...}

I want to be able to say

@a.=foo($x);

and have it work, right?

I'd still like to put a space between = and sort.

@a .= sort

may *imply* using inplace:sort instead of just sort, but I like that
space for ergonomic reasons.


> : If we are performing some operation on
> : large data, and we know the end result is going to clobber the
> : current object, we could avoid making an extra copy.
>
> Yes, computer performance is desirable, but I think the biggest goal
> of the mutating operators is mental performance. The fact is that
>
> $a += 1;
>
> is much easier to understand than
>
> $a = $a + 1;

Even you like the space. Vive le space!

=Austin

Deborah Pickett

Mar 11, 2004, 8:29:36 PM
to perl6-l...@perl.org
On Fri, 12 Mar 2004 10.51, Damian Conway wrote:
> There are also cases where something like:
>
> $a ||= $b;
>
> or:
>
> $a += $b;
>
> changes the type of value in $a. Should we flag those too? Currently we do
> warn on the second one if $a can't be cleanly coerced to numeric. Would
> that be enough for C<cmp=> too, perhaps?

That triggered a thought about unary operators. What about:

$a !=; # i.e., $a = ! $a;

Obviously that particular example is a syntax error, but perhaps a more
verbose "self:" version of same would not be.

--
Debbie Pickett
http://www.csse.monash.edu.au/~debbiep
deb...@csse.monash.edu.au

Austin Hastings

Mar 12, 2004, 5:32:33 AM
to Damian Conway, perl6-l...@perl.org
> -----Original Message-----
> From: Damian Conway [mailto:dam...@conway.org]
>
> Larry wrote:
>
> > On the other hand, I suspect most people will end up declaring it
> >
> > int method
> > self:rotate (int $a is rw) {...}
> >
> > in any event, and reserve the =rotate for .=rotate, which can never put
> > the = on the left margin, even if we let ourselves have whitespace
> > before POD directives. So maybe we just require self: for the
> > declaration, and forget about = there.
>
> Yes please!
>
> > It interacts badly with global names anyway.
> > Is it "*=sort" or "=*sort"? With "*self:sort" it's more obvious.
>
> Agreed. I'd *very* much prefer to see "reflexive" methods like this
> declared C<self:methodname>. From a readability stand-point, if for no
> other reason.
>

Boo, hiss.

Two things:

1- I'd rather use "inplace" than self.

2- I'd rather it be AFTER, than BEFORE, the name, because

method sort
method sort:inplace

reads, and more importantly SORTS, better than

method inplace:sort
method sort

To wit:

method :infix:<=>(Array, Array) returns Scalar
method :infix:==(Array, Array) returns Boolean
method :infix:!=(Array, Array) returns Boolean
method :infix:===(Array, Array) returns Boolean
method :infix:!==(Array, Array) returns Boolean
method :infix:x(Array) returns Array
method :infix:x:inplace(Array is rw)

### Note: How to handle [undef]? return-undef, or PHP-like push?
method :postfix:[](Array is rw, ?Scalar) returns Scalar

### Inplace-only?
method clear(Array is rw) returns Boolean

method compact(Array) returns Array
method compact:inplace(Array is rw)
### Inplace-only?
method delete(Array is rw, Int) returns WHAT, exactly?

method difference(Array, Array) returns Array #A-B
method differences(Array, Array) returns Array #(A-B) + (B-A)
method exists(Array, Scalar) returns Boolean
method flatten(Array) returns Array
method flatten:inplace(Array is rw) returns Array
method grep(Array, Code) returns Array
method includes(Array, Scalar) returns Boolean
method index(Array, Scalar) returns Int
method intersect(Array, Array)
method is_empty(Array) returns Boolean
method join(Array, String)
method length(Array)
method map(Array, Code) returns Array
method pack(Array, String) returns String
method reverse(Array) returns Array
method reverse:inplace(Array is rw)
method rindex(Array) returns Int

### Boy are these likely to be wrong!
method sort(Array, ?Code) returns Array
method sort:inplace(Array is rw, ?Code)

### Inplace-only?
method splice(Array is rw, ?Int, ?Int, ?Array)

method union(Array, Array) returns Array
method unique(Array) returns Array
method unique:inplace(Array is rw)

### Inplace-only?
multi method fill(Array is rw, Scalar, Int, Int)
multi method fill(Array is rw, Scalar, Int)
multi method fill(Array is rw, Scalar, Array)
### Inplace-only?
multi method pop(Array is rw, ?Int) returns Array
multi method pop(Array is rw) returns Scalar
### Inplace-only?
multi method unshift(Array is rw, Scalar) returns Array
multi method unshift(Array is rw, Array) returns Array
### Inplace-only?
multi method push(Array is rw, Array) returns Array
multi method push(Array is rw, Scalar)
### Inplace-only?
multi method shift(Array is rw, Int) returns Array
multi method shift(Array is rw) returns Scalar

multi sub each(Array) returns Iterator # HOW does this work?

(Note also that :...fix sorts better than in-, post-, and pre-. I'd like to
suggest changing them, since it costs nothing and results in a mild
improvement in automation behavior.)

> > Another interesting question, if the "postfix:.=foo" mess is defined
> > as self:foo, should infix:+= be defined as self:+ instead?
> > In other words, should the <op>= syntax really be a metasyntax like
> > hyperoperators, where you never actually have to define a C<»+«>
> > operator, but the hyperoperator is always autogenerated from ordinary
> > C<+>? So basically any infix:<op>= gets remapped to self:<op>.
>
> I think that would be cleaner.

Alternatively, is there a valid reason to *need* to define your own
hyperoperator?

That is, can you code C< @a +« $x > better than
C< @a.map {return $_ + $x} >? I suspect that it's possible to do so,
particularly for such simple cases as assignment. (Hint: Persistent objects,
database, one SQL statement per assignment.)

So perhaps I should ask for an :infix:=« operator override?

> > On the other hand, it also means that
> > someone can say silly things like:
> >
> > $a cmp= $b
> > $a ~~= $b
> >
> > I suppose we could simply disallow meta-= on non-associating operators.
> > Can anyone come up with a non-associating binary operator that *should*
> > have an assignment operator? The basic definition of non-associating
> > seems to be that the type of the arguments is incompatible with the
> > type produced by the operator. Which is precisely the problem with
> > something like
> >
> > $a cmp= $b
> >
> > insofar as $a is being treated as a string at one moment and as a
> > boolean at the next.
>
> I think it's "merely" a philosophical problem.
>
> After all, we don't complain when people write:
>
> $a = $a cmp $b;
>
> So should we complain when people write exactly the same thing, only as:
>
> $a cmp= $b;
>
> Stylistically, they're equally as abhorrent, but Perl users aren't
> expecting the Stylish Inquisition.
>

So long as you allow modifiers in there, as:

$a cmp:i:n=8= $b;

I'm okay with it. Good luck with <=>=

> The real question is whether the two forms are equally likely to indicate
> a logic error. One could argue that anyone who writes the first is more
> likely just being (small-l) lazy, whereas writing the second probably
> indicates a "thinko". But then one could also argue that it's (small-l)
> lazier to write the second than the first, so the second is actually
> *more* likely to be (small-l) laziness than error.
>
> There are also cases where something like:
>
> $a ||= $b;
>
> or:
>
> $a += $b;
>
> changes the type of value in $a. Should we flag those too?
> Currently we do warn on the second one if $a can't be cleanly coerced to
> numeric. Would that be enough for C<cmp=> too, perhaps?

Data type conversion, especially for Scalar subtypes, should very definitely
be controllable by pragma.

> Damian

=Austin

Luke Palmer

Mar 12, 2004, 12:03:52 PM
to Austin Hastings, perl6-l...@perl.org

Well, for a slightly more complex expression, a human would have some
trouble. This is very likely to be laziness, and we can do without it.
There is certainly a way to do this if it is absolutely necessary:

my $method = ($useMbcs ?? 'wlength' :: 'length');
$s.=$method;

> > On the other hand, I'm deeply suspicious of a return signature that
> > mentions a specific variable. What if the body says to return something
> > else? Is that just ignored? Do we check it to see if it's the same
> > item?
>
> No. You might well say:
>
> $string.=length;
>
> And convert from one subtype to another. I think the mutation indicator
> is a hint to the optimizer, and a crutch for the implementor, in cases
> where it's possible to squeeze more performance out of skipping the
> assignment phase. (In particular, where an inefficient assignment operator
> exists.)

The last thing we need is another idiom that gets destroyed for
"efficiency" reasons. Once people hear that that is "fast", they'll
start writing:

$string.=length;

Instead of what they would usually write, the much cleaner:

my $len = $string.length;

Even though the latter is only 0.05% slower.

Speed has corrupted many programmers.

>
> Question: Can all this noise be eliminated by paying more attention to
> the construction of the assignment operator?
>
> That is, do we have an example where $a .= meth is going to perform poorly,
> and that performance is NOT because of the $a = $a.meth assignment? (And
> that cannot be fixed by declaring the invocant 'is rw'.)

The performance issue is never because of the assignment. Assignment is
basically free: it's just copying a pointer.

It's usually because of the construction. Constructing a 10,000 element
array's going to be expensive, so you'd rather sort in place.

Luke
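
The distinction drawn above (assignment is cheap, constructing the new array is not) can be seen in Python, used here purely as an analogy: `sorted()` builds a whole new list and the assignment merely rebinds a name, while `list.sort()` reorders the existing list. The variable names are illustrative only.

```python
data = [5, 3, 9, 1]
alias = data  # second reference to the same list object

# Cloning form: sorted() constructs a new list; "data = ..." only rebinds
# the name, which is the cheap pointer copy.
data = sorted(data)
rebinding_made_new_object = data is not alias

# In-place form: list.sort() reorders the existing list; no new list is
# constructed and other references see the change.
buffer = [5, 3, 9, 1]
same = buffer
buffer.sort()
inplace_kept_object = buffer is same
```

After this runs, `alias` still holds the unsorted elements because the cloning form never touched the original object, whereas `same` and `buffer` are one and the same sorted list.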

Dave Whipp

Mar 12, 2004, 12:19:46 PM
to perl6-l...@perl.org

"Larry Wall" <la...@wall.org> wrote in message
news:20040310192...@wall.org...

> Unfortunately we can't just use topicalization to say
>
> my Cat $tom = .new()
>
> because most people won't expect simple assignment to break their
> current topic.
>
> So another option is to replace = with something that I<does> set the
> topic for the right side. If we used .= for that, then you'd have
> to write
>
[...]
>
> Another approach would be to have some kind of "microtopic" that
> represented the left side of an ordinary assignment. Suppose for
> the sake of argument that the microtopic is ^. Then you could write
>
> @array = ^.sort;
>
> and a constructor would be
>
> my Kanga $roo = ^.new()
>
> But that introduces a new concept that doesn't really generalize well.
> So forget that.

Why are we mixing the concepts of assignment and topicalization --
especially in a way that doesn't generalize? Why can't we invent a
"topicalization" operator, analogous to the old binding operator, that
simply sets its LHS as the topic of its RHS, and then have an assigning
version of that operator?

For example, let's use the "section" Unicode symbol: "§" to locally set the
current topic within an expression. Now we could say:

$x = ( $foo § .a + .b + .c )

to mean

$x = $foo.a + $foo.b + $foo.c

The assigning version of the operator could be

$x §= .foo;
my Dog $dog §= .new;


Dave.
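
The proposed "§" operator, which evaluates its right side with the left side as the topic, resembles passing a value to a one-argument function. A rough Python sketch, assuming a hypothetical helper `with_topic` and a made-up `Dog` class (neither appears in the thread):

```python
from types import SimpleNamespace


def with_topic(topic, body):
    """Evaluate `body` with `topic` as its single argument (the 'topic')."""
    return body(topic)


class Dog:
    """Hypothetical class, used only for the constructor example."""


foo = SimpleNamespace(a=1, b=2, c=3)

# Roughly: $x = ( $foo § .a + .b + .c )
x = with_topic(foo, lambda t: t.a + t.b + t.c)

# Roughly: my Dog $dog §= .new, with the declared type acting as the topic
dog = with_topic(Dog, lambda cls: cls())
```

The sketch treats the topic as an explicit parameter `t`; the proposal's point is that the operator would make that parameter implicit.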


Jonathan Scott Duff

Mar 12, 2004, 12:28:09 PM
to Austin Hastings, Larry Wall, perl6-l...@perl.org
On Fri, Mar 12, 2004 at 03:47:22AM -0500, Austin Hastings wrote:
> > -----Original Message-----
> > From: Larry Wall [mailto:la...@wall.org]
> >
> > Now in the specific case of . and .= we don't exactly have a normal
> > binary operator, because the right side is not an expression.
>
> $tis.=««sad pity pity sad sad pity true»;
>
> $s .= ($useMbcs ? wlength : length);
>
> (Side note: although that expression isn't valid, since the wlength
> and length methods aren't qualified, it *should* be, since a human could
> infer it rather easily. Can we make that DWIM? (One way would be
> for the parser to convert that into if-else form if it appeared
> ambiguous.))

So ... how smart will perl6 be?

$o .= (foo,bar,baz);
$o .= (expr_returning_method);

Since human expectations vary I don't think I want these.

Larry Wall

Mar 12, 2004, 12:43:59 PM
to perl6-l...@perl.org
On Fri, Mar 12, 2004 at 12:29:36PM +1100, Deborah Pickett wrote:
: That triggered a thought about unary operators. What about:
:
: $a !=; # i.e., $a = ! $a;

Well, an argument could be made that the corresponding syntax is really:

!= $a;

But you have to read the

A <op>= B ==> A = A <op> B

transformation differently. Something more like

A <op>= B ==> <mysterylocation> = A <op> B

where <mysterylocation> turns out to be the first actual term, which
when A is specified is A, and otherwise B. In other words, dropping
the A out gives you both the binary and unary forms:

A <op>= B ==> <mysterylocation> = A <op> B
<op>= B ==> <mysterylocation> = <op> B

That could actually be pretty handy for

+= $a # coerce yourself to numeric
~= $a # coerce yourself to string
?= $a # coerce yourself to boolean

On the other hand, if we did that generally, we'd also get operators like:

\= $a # turn $a into a reference to itself

Yow.

: Obviously that particular example is a syntax error, but perhaps a more
: verbose "self:" version of same would not be.

Well, it's only a syntax error because we say it's a syntax error. But
I do think prefix unarys tend to be more readable than postfix.

But, yes, method calls are essentially unary postfix operators.
In the OO worldview it's perfectly valid to tell an object to do
something to itself. In the functional worldview, of course, keeping
any kind of state is viewed with deep suspicion.

So really, it's just a matter of whether there's a standard syntax for
"negate yourself". I don't think there is a large call for it, since
most objects don't think of themselves as booleans. But if an object
did want to support that behavior, saying

$a.self:!();

would certainly be "self" evident, as it were. It might even come
for free with the Boolean role. But then there's the good question
of whether to allow the unary op:

!= $a;

Some good questions only have bad answers. This might be one of them.

Larry
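
The rewrite rule in the message above can be modelled roughly in Python. This is only a sketch of the transformation, not Perl 6 semantics: the helper name assign_op, the env dictionary standing in for variables, and the choice of coercions for the unary forms are all assumptions made for illustration.

```python
import operator

# Hypothetical operator tables; they model the rewrite rule, not Perl 6 itself.
BINARY = {"+": operator.add, "~": lambda a, b: str(a) + str(b)}
UNARY = {"+": float, "~": str, "?": bool, "!": operator.not_}


def assign_op(env, op, b, a=None):
    """Apply `A <op>= B` (a given) or `<op>= B` (a omitted) to names in env.

    The result is stored in the first actual term: A if present, otherwise B.
    Returns the name that was assigned to (the "mystery location").
    """
    if a is not None:
        env[a] = BINARY[op](env[a], env[b])  # A <op>= B  ==>  A = A <op> B
        return a
    env[b] = UNARY[op](env[b])               # <op>= B    ==>  B = <op> B
    return b


env = {"a": 5, "b": 3, "s": "42", "flag": True}
assign_op(env, "+", "b", a="a")  # $a += $b
assign_op(env, "+", "s")         # += $s   (coerce yourself to numeric)
assign_op(env, "!", "flag")      # != $flag (negate yourself)
print(env)
```

In this model the binary and unary forms differ only in which name receives the result, which is the point of the <mysterylocation> reading.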

John Siracusa

Mar 12, 2004, 1:30:50 PM
to Perl 6 Language
On 3/12/04 12:43 PM, Larry Wall wrote:
> Some good questions only have bad answers. This might be one of them.

I have been watching this thread with increasing unease, asking myself
exactly what the potential benefit is of this proposed feature and syntax.
I'm all for saving some typing, but yeesh. The only case that seems even
remotely onerous is this one:

my My::Big::Class::Name $obj = My::Big::Class::Name.new();

vs.

my My::Big::Class::Name $obj .= new()

but I'm willing to eat that if I never have to see the other crazy "="-based
syntax proposals this thread has spawned :)

-John

Larry Wall

Mar 12, 2004, 3:01:10 PM
to perl6-l...@perl.org
On Fri, Mar 12, 2004 at 09:19:46AM -0800, Dave Whipp wrote:
: Why are we mixing the concepts of assignment and topicalization --
: especially in a way that doesn't generalize. Why can't we invent a
: "topicalization" operator, analogous to the old binding operator, that
: simply sets its LHS as the topic of its RHS: and then have an assigning
: version of that operator.

Hmm, yes, other than the usual arguments against inventing *any* new
operator...

: For example, let's use the "section" Unicode symbol: "§" to locally set the
: current topic within an expression. Now we could say:
:
: $x = ( $foo § .a + .b + .c )
:
: to mean
:
: $x = $foo.a + $foo.b + $foo.c

I expect the first thing a Nihongo-jin would do is to change it from
the section sign § to the は "wa" particle that already means "the
preceding is the topic for the following".

$x = ( $foo は .a + .b + .c )

So maybe the ASCII workaround is just "wa". :-) * .5

$x = ( $foo wa .a + .b + .c )

(Yes, Simon, I know that Japanese only allows wa in the top-level
sentence position, not embedded in subexpressions. ^_^ Perhaps we
should use ga instead...)

Some will argue that since English doesn't have a grammatical
postfix topicalizer like Japanese, we should stick with something
more English-like:

$x = (.a + .b + .c given $foo)

But that doesn't give us the corresponding assignment op. Interestingly,
modifier "if" and "unless" do in fact have corresponding assignment ops,
namely &&= and ||=.

Or we could use a syntax that is already allowed:

$x = {.a + .b + .c}($foo)

But that doesn't give us a simple topic-in-front form either...

: The assigning version of the operator could be
:
: $x §= .foo;
: my Dog $dog §= .new;

Though I confess I find it hard to imagine persuading people to write:

my Dog $dog wa= .new;

Now, if we had a unary = that assigned to the current topic, we could do
it with the existing topicalizer as

given my Dog $dog { = .new }

But I'm not recommending that approach, because I dislike unary =, and
because I don't want every declaration to have to say "given".

For the sake of argument, let's call our theoretical new operator "wa".
Taking a clue from Japanese, it's perhaps a mistake to think of "wa"
as a binary operator. In Japanese, it's a postpositional particle,
which is to say, a postfix operator. You can, in fact, just say:

Neko wa.

which means something like "consider the cat". On the other hand,
as a natural language Japanese doesn't have to specify the scope of
topicalization, while Perl has to. So it's not clear whether we
should define wa such that

my Dog $dog wa= .new;

is really an assignment operator, or whether it actually
parses more like a postpositional:

(my Dog $dog).wa = .new;

In the latter case, you could use the wa postposition anywhere you
could use the expression before it, because it would set up the
expectation that the next thing is an operator rather than a term.
So you could usefully say something like

$modthingie wa %= .modulus;

or tell a subscript to default to the array as its topic:

@array wa .[.min .. .max]

That would of course read better with a single character postfix:

@array§[.min .. .max]

It's really a pity that question mark is already so overloaded with
boolean connotations, because

$dog? .bark

would really be the best postfix operator in ASCII for this.
People would probably end up writing

my Dog $spot ?= .new;

as an idiom. And

@array?[.min .. .max]

would be the way to get a topicalized subscript. If "wa" were a binary
operator as originally proposed, then you'd have to write that:

@array ? $_[.min .. .max]

or

@array ? .[.min .. .max]

since it's setting up expectation for a term rather than an operator.

On the other hand, if it's a binary operator of a particular precedence,
then the scoping of "wa" is clear. It's not so clear if we mark the term
with a postfix. Unless the rule is simply that it governs for the
rest of the entire statement. In which case we'd have people writing

$foo? (
...
)

when they should probably be saying

given $foo {
...
}

But maybe the rule is that it governs the rest of the surrounding
paren scope, much like a "my" declaration does in a block, or like
a list operator eats all the commas to its right.

Not sure I want to explain that though...I certainly haven't explained
it well here. It is certainly a fact that most Perl and C programmers
are more comfortable with infix operators than postfix.

But this is gonna be a weird binary operator, if operator it is.
Its purpose is not just to evaluate the left argument but to bind
it as a temporary topic and then call the right side with that
temporary topic.

Arguably, we already have exactly this operator. It's called "dot".

Ignore for the moment that we've said that .() calls a sub. What
if binary . were our mini-topicalizer?

$topic . (.a + .b + .c)

Then we're back to

my Dog $spot .= .new;

and then the dotted subscript form:

@array .[.min .. .max]

would automatically topicalize the subscript. Except that people would
be confused by the difference between that and

@array . [.min .. .max]

which would presumably just produce an anonymous list of subscripts without
actually doing the subscripting.

Hmm. The basic problem is that we're using the same notation to both
create and use the topic. So I think "dot" isn't the operator we want.
Forget that.

Despite the severe overloading problems, it's really gonna be hard
to do much better than

$topic ? (.a + .b + .c)
my Dog $spot ?= .new;
@array?.[.min .. .max]

And I do think people would rebel at using Latin-1 for that one.
I get enough grief for «...». :-)

Larry
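
Whatever the operator ends up being called, the behaviour under discussion (evaluate the left side, bind it as a temporary topic, then evaluate the right side against that topic) can be sketched in Python as a tiny higher-order function. The name with_topic and the use of a lambda parameter in place of the implicit topic are assumptions for illustration; nothing here settles the precedence or scoping questions raised above.

```python
from types import SimpleNamespace


def with_topic(topic, body):
    """Evaluate body with topic bound as its single argument; return the result."""
    return body(topic)


foo = SimpleNamespace(a=1, b=2, c=3)

# $x = ( $foo § .a + .b + .c )   modelled as an explicit parameter t:
x = with_topic(foo, lambda t: t.a + t.b + t.c)

# The spelled-out equivalent: $x = $foo.a + $foo.b + $foo.c
y = foo.a + foo.b + foo.c

print(x, y)
```

The lambda makes the scope of the topic explicit, which is exactly the part a postfix marker would leave open.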

Brent "Dax" Royal-Gordon

Mar 12, 2004, 6:33:10 PM
to Larry Wall, perl6-l...@perl.org
Larry Wall wrote:
> Now, if we had a unary = that assigned to the current topic, we could do
> it with the existing topicalizer as
>
> given my Dog $dog { = .new }
>
> But I'm not recommending that approach, because I dislike unary =, and
> because I don't want every declaration to have to say "given".

my Dog $dog given= .new;

Where 'given' is 'wa'.

Unfortunately, it's backwards compared to the statement modifiers Perl
already has. That suggests

=.new given my Dog $dog;

but that requires the unary equals you apparently don't like *and* puts
the less important bit on the LHS.

Bah. Just use 'wa' and make the world learn Japanese. :^P

--
Brent "Dax" Royal-Gordon <br...@brentdax.com>
Perl and Parrot hacker

Oceania has always been at war with Eastasia.

Simon Cozens

Mar 12, 2004, 8:04:12 PM
to perl6-l...@perl.org

"Oh, it's got lots of Japanese in it, I'd better read it..." :)

la...@wall.org (Larry Wall) writes:
> Some will argue that since English doesn't have a grammatical
> postfix topicalizer like Japanese, we should stick with something
> more English-like:
>
> $x = (.a + .b + .c given $foo)

I think I'm missing something here. We have "given" as a perfectly good
topicaliser. Now, I remember harping on a while ago about generalizing the
idea of some control structures returning values, such as $x = if $a { $b }
else { $c };

Now if we do generalise that, we get

$x = given $foo { .a + .b + .c };

which gives us the topic-in-front form, the "given" which is the standard way
of declaring the topic, and it's all nice.

> my Dog $dog wa= .new;

Urgh. This reads like you're topicalising a $dog, assigning to it and
acting on it all at the same time. Too many particles!

my Dog $inu wa ga o new desu; # ? :)

> So you could usefully say something like
>
> $modthingie wa %= .modulus;

Hrm.

given($modthingie) %= .modulus;

might work, but it relies on a few pieces of underlying magic, none of which I
believe to be over-the-top in themselves but taken together may leave a bad
taste:

control structures return a value, as above
given takes an optional block, purely setting the topic if no block
the topic persists throughout a statement

> if operator it is.

I don't think it's an operator so much as a function. It sets the
topic and, depending on how things turn out, returns either void or
the topic again.

--
teco < /dev/audio
- Ignatios Souvatzis

Matt

Mar 13, 2004, 1:51:18 AM
to perl6-l...@perl.org
Please bear with me, I do follow this list, but sporadically.

What it all boils down to, obviously, is that we, as lazy programmers, want
to type less, but still have the code make sense when read. So to
me, that should automatically throw out stuff such as C<$x = ( $foo § .a +
.b + .c )>. Even if you know exactly what it means, you still can't help but
think "huh?" when you read it.

So, do we want to be able to topicalize only the current expression (all
code up to a ;, }, etc)? Or should multiple topicalizations be allowed? If
someone did want to do multiple topicalization, should they maybe consider
that they are stuffing too much code into one line?

So, without further ado, here's a long list of proposals (replace topic with
another more appropriate word maybe).

1. Expression descriptions (sort of like, options for expressions.. maybe
leaves room for other neat tricks too). Although, I don't know if this
would be possible in perl6.
topic $foo: $x = .a + .b + .c

2. An operator with lowest precedence? All other operators have higher
precedence.
$foo topic $x = .a + .b + .c

3. A wildcard morph.. "match me to all dots missing a topic"
$x = $foo* .a + .b + .c

------

This brings me to another "idea" I have.. although I have a feeling you guys
have already thought of it.

Instead of ...
$x = $a + $b + $c + $d;
How about ...
$x = +«$a $b $c $d»

Karl Brodowsky

Mar 13, 2004, 3:02:50 AM
to perl6-l...@perl.org
> And I do think people would rebel at using Latin-1 for that one.
> I get enough grief for «...». :-)

I can imagine that these cause some trouble with people using a charset
other than ISO-8859-1 (Latin-1) that works well with 8 bit, like Greek,
Arabic, Cyrillic and Hebrew.

For these guys Unicode is not so attractive, because it kind of doubles the size
of their files, so I would assume that they tend to do a lot of stuff with their
koi-8 or with some ISO-8859-x not containing the desired character. For «» it
might not be such a problem, because <<>> would work instead.

Maybe this issue could (will?) be addressed by declaring the charset in the
source and using something like (or better than) \u00AB for stuff that this
charset does not have, using a charset-conversion to unicode while parsing
the source. This looks somewhat cleaner to me than just pretending a source
file written in ISO-8859-7 (Greek) were ISO-8859-1 (Latin-1), relying on the
assumption that the two characters we use above 0x80 happen to be in
the same positions 0xab and 0xbb.

Sorry if that is an old story...

Best regards,

Karl

Arcadi Shehter

Mar 12, 2004, 5:03:35 PM
to Larry Wall, perl6-l...@perl.org

Date: Fri, 12 Mar 2004 12:01:10 -0800 Larry Wall wrote:
>It's really a pity that question mark is already so overloaded with
>boolean connotations, because
>
> $dog? .bark
>
>would really be the best postfix operator in ASCII for this.
>People would probably end up writing
>
> my Dog $spot ?= .new;
>
>as an idiom. And
>
> @array?[.min .. .max]
>
>would be the way to get a topicalized subscript.

Some time in the past there was talk about the ... ?? ... :: ... operator being
a combination of two binary operators: ?? and :: . But I don't know the ruling.
If one factors the trinary ??:: into two binary operators,
?? could act as a postfix topicalizer while :: becomes a binary
operator:
$a :: $b evaluates to its left or right argument according to the true/false property of the _current topic_,
something like infix:::($a,$b){ given CALLER::_ { when .true return $a ; return $b } } but it evaluates $b only
if necessary.


Arcadi

Deborah Pickett

Mar 14, 2004, 10:44:40 PM
to Perl 6 Language
On Sat, 13 Mar 2004 05.30, John Siracusa wrote:
> The only case that seems even
> remotely onerous is this one:
>
> my My::Big::Class::Name $obj = My::Big::Class::Name.new();
> vs.
> my My::Big::Class::Name $obj .= new()

There's also the related issue of in-place operations on some
difficult-to-write lvalue:

@{$coderef.("argument").{hashelem}} =
sort @{$coderef.("argument").{hashelem}};

Did I get the text the same both times? What about maintaining that code?
What about side effects on the subroutine I called there?

Someone Damian-shaped will probably come in and point out how to prettify that
using "given", but it still wouldn't be as short as last week's

$coderef.("argument").{hashelem}.self:sort();
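
The side-effect concern is easy to see in a small Python model. The stand-in function coderef, the counter, and the store dictionary are invented for illustration; the point is only that the spelled-out assignment evaluates the lvalue expression twice while a mutating call evaluates it once.

```python
counter = {"calls": 0}
store = {"hashelem": [3, 1, 2]}


def coderef(argument):
    """Stand-in for $coderef.("argument"); records how often it is invoked."""
    counter["calls"] += 1
    return store


# Spelled-out form: the lvalue expression is written, and evaluated, twice.
coderef("argument")["hashelem"] = sorted(coderef("argument")["hashelem"])
calls_spelled_out = counter["calls"]

# Mutating form: the container is located once and sorted in place.
counter["calls"] = 0
store["hashelem"] = [3, 1, 2]
coderef("argument")["hashelem"].sort()
calls_in_place = counter["calls"]

print(calls_spelled_out, calls_in_place, store["hashelem"])
```

If the subroutine had a visible side effect, the two forms would no longer be interchangeable, which is one argument for having a single-evaluation mutator syntax.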

Mark J. Reed

Mar 15, 2004, 8:10:36 AM
to perl6-l...@perl.org
On 2004-03-13 at 09:02:50, Karl Brodowsky wrote:
> For these guys Unicode is not so attractive, because it kind of doubles the
> size of their files,

Unicode per se doesn't do anything to file sizes; it's all in how you
encode it. The UTF-8 encoding is not so attractive in locales that make
heavy use of characters which require several bytes to encode therein, or
relatively little use of characters in the ASCII range; but that's why
there are other encoding schemes like SCSU which get you Unicode
compatibility while not taking up much more space than the locale's native
charset.

--
Mark REED | CNN Internet Technology
1 CNN Center Rm SW0831G | mark...@cnn.com
Atlanta, GA 30348 USA | +1 404 827 4754

Luke Palmer

Mar 15, 2004, 12:00:28 PM
to Deborah Pickett, Perl 6 Language
Deborah Pickett writes:
> Someone Damian-shaped will probably come in and point out how to prettify that
> using "given", but it still wouldn't be as short as last week's
>
> $coderef.("argument").{hashelem}.self:sort();

But that still has problems. What's the important thing in this
"sentence"? The way it's written, you'd think C<$coderef>, while the
correct answer (as I see it) is C<sort>.

Assuming we go with =sort as the sort mutator, this could be nicely
written:

=sort $coderef.("argument").{hashelem};

Using indirect-object syntax. Saying C<self:sort> (or C<mutate:sort>,
which I prefer) doesn't obscure it much.

Luke

John Williams

Mar 15, 2004, 1:56:26 PM
to Larry Wall, perl6-l...@perl.org
On Wed, 10 Mar 2004, Larry Wall wrote:
> You subscript hashes with {...} historically, or these days, «...»,
> when you want constant subscripts. So what you're looking for is
> something like:
>
> if / <?foo> ... <?baz> ... { $?foo{'baz'} ... $?baz } .../
> or
> if / <?foo> ... <?baz> ... { $?foo«baz» ... $?baz } .../

I'm probably a bit behind on current thinking, but did %hash{bareword}
lose the ability to assume the bareword is a constant string?

And why «»? Last I heard, that was the unicode version of qw(), which
returns an array. Using an array constructor as a hash subscriptor is
not a "least surprise" to me.

~ John Williams


John Williams

Mar 15, 2004, 2:10:40 PM
to matt, perl6-l...@perl.org
> This brings me to another "idea" I have.. although I have a feeling you guys
> have already thought of it.
>
> Instead of ...
> $x = $a + $b + $c + $d;
> How about ...
> $x = +«$a $b $c $d»

The closest way to what you have written is this:

$x = 0;
$x »+=« ($a, $b, $c, $d);

Or the slightly less attractive (IMHO) syntax invented recently:

$x +=« ($a, $b, $c, $d);

Of course, perl6 will have a built-in reduce function as well (RFC76):

$x = reduce {$^a + $^b} $a, $b, $c, $d;

~ John Williams
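
For readers who have not met reduce before, the fold in the last example behaves like the following Python, shown only as an analogy (Python's functools.reduce is a different function from the proposed Perl 6 built-in, and the variable values are made up):

```python
from functools import reduce
from operator import add

a, b, c, d = 1, 2, 3, 4

# $x = reduce {$^a + $^b} $a, $b, $c, $d;
# The two-argument block is applied left to right: ((a + b) + c) + d
x = reduce(lambda acc, item: acc + item, [a, b, c, d])

# The same fold with the operator passed directly instead of a block.
y = reduce(add, [a, b, c, d])

print(x, y)
```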


Larry Wall

Mar 15, 2004, 3:00:03 PM
to perl6-l...@perl.org
On Mon, Mar 15, 2004 at 12:10:40PM -0700, John Williams wrote:
: Or the slightly less attractive (IMHO) syntax invented recently:
:
: $x +=« ($a, $b, $c, $d);

The latest guess is that we're not using lopsided ones for binary ops, but
only for unary ops.

Larry

Larry Wall

Mar 15, 2004, 3:05:01 PM
to perl6-l...@perl.org
On Sat, Mar 13, 2004 at 12:03:35AM +0200, arcadi shehter wrote:
: Some time in the past there was talk about the ... ?? ... :: ... operator being
: a combination of two binary operators: ?? and :: . But I don't know the ruling.
: If one factors the trinary ??:: into two binary operators,
: ?? could act as a postfix topicalizer while :: becomes a binary
: operator:
: $a :: $b evaluates to its left or right argument according to the true/false property of the _current topic_,
: something like infix:::($a,$b){ given CALLER::_ { when .true return $a ; return $b } } but it evaluates $b only
: if necessary.

By that argument, ordinary "if" should topicalize, and I don't think
we should be confusing boolean evaluation with topicalization.
Conditionals are so often "smaller" than the current topic.
We shouldn't force topicalizing on what is essentially a bit. Usually
it's some minor aspect of the current topic that we're testing,
and narrowing the scope of the topic is not what the user will want
or expect.

Larry

Larry Wall

Mar 15, 2004, 3:22:50 PM
to perl6-l...@perl.org
On Fri, Mar 12, 2004 at 03:47:57AM -0500, Austin Hastings wrote:
: > -----Original Message-----
: }

You probably have to decide whether that's a multi sub or a multi method,
at least syntactically. Though it may not matter which you pick because...

: "hello".scramble();
: String::scramble("hello"); # Way overspecified for a multi...

For a single argument function, those probably come out to the same thing
in the end. With more arguments you have to start distinguishing single
dispatch from multiple dispatch.

Larry

Larry Wall

Mar 15, 2004, 4:10:28 PM
to perl6-l...@perl.org
On Fri, Mar 12, 2004 at 05:32:33AM -0500, Austin Hastings wrote:
: Boo, hiss.
:
: Two things:
:
: 1- I'd rather use "inplace" than self.

What is this "place" thing? I want the object to do something to itself
reflexively, which may or may not involve places...

: 2- I'd rather it be AFTER, than BEFORE, the name, because
:
: method sort
: method sort:inplace
:
: reads, and more importantly SORTS, better than
:
: method inplace:sort
: method sort

Well, that's a point that has some weight, but...

: To wit:

But you can put the declarations in whatever order you want regardless
of the default collation order, so this is kind of a weak argument unless
you're writing a tool, in which case you could certainly ignore whatever
prefixes you like when you sort.

: (Note also that :...fix sorts better than in-, post-, and pre-. I'd like to
: suggest changing them, since it costs nothing and results in a mild
: improvement in automation behavior.)

The main reason those are out front is to reduce grammatical ambiguity
when you use an operator sub name where a term is expected. That is,
it's harder to parse:

$x = +:foo();

correctly than

$x = foo:+();

Even with sort:inplace, how does the parser know that the name is
"sort:inplace", or whether the user is trying to pass a pair :inplace
to the "sort" routine? The foo:-style prefixes are "reserved" in
general, but the :foo pair is not, and it would be a mistake to start
introducing reserved pairs, I think, much like the Perl 5 problem
that ARGV is truly global while ARGH is not.

: > > Another interesting question, if the "postfix:.=foo" mess is defined
: > > with as self:foo, should infix:+= be defined as self:+ instead?
: > > In other words, should the <op>= syntax really be a metasyntax like
: > > hyperoperators, where you never actually have to define a C<»+«>
: > > operator, but the hyperoperator is always autogenerated from ordinary
: > > C<+>? So basically any infix:<op>= gets remapped to self:<op>.
: >
: > I think that would be cleaner.
:
: Alternatively, is there a valid reason to *need* to define your own
: hyperoperator?
:
: That is, can you code C< @a +« $x > better than C< @a.map {return $_ + $x}
: >? I suspect that it's possible to do so, particularly for such simple cases
: as assignment. (Hint: Persistent objects, database, one SQL statement per
: assignment.)
:
: So perhaps I should ask for an :infix:=« operator override?

I'm extremely leery of giving people the right to break the "hyper"
contract. Which they will do left and right if given opportunity.
The hyper contract says

@a »operator« @b

always applies the operator in parallel to corresponding elements.
I don't want people using »« for other higher-order mathematical
operators that just happen to work on arrays of multiple elements.
They can find their own operators for that...

But if there were a way of keeping the contract while giving the
flexibility to the implementor, that would be okay.

: > > On the other hand, it also means that
: > > someone can say silly things like:
: > >
: > > $a cmp= $b
: > > $a ~~= $b
: > >
: > > I suppose we could simply disallow meta-= on non-associating operators.
: > > Can anyone come up with a non-associating binary operator that *should*
: > > have an assignment operator? The basic definition of non-associating
: > > seems to be that the type of the arguments is incompatible with the
: > > type produced by the operator. Which is precisely the problem with
: > > something like
: > >
: > > $a cmp= $b
: > >
: > > insofar as $a is being treated as a string at one moment and as a boolean
: > > at the next.
: >
: > I think it's "merely" a philosophical problem.
: >
: > After all, we don't complain when people write:
: >
: > $a = $a cmp $b;
: >
: > So should we complain when people write exactly the same thing, only as:
: >
: > $a cmp= $b;
: >
: > Stylistically, they're equally as abhorrent, but Perl users aren't expecting
: > the Stylish Inquisition.
: >
:
: So long as you allow modifiers in there, as:
:
: $a cmp:i:n=8= $b;
:
: I'm okay with it. Good luck with <=>=

We're thinking that adverbs on infix ops have to go after the right arg:

$a cmp= $b :i:n(8);

That is, op modifiers are only allowed where an op is expected. Otherwise
we can't use a general :foo() syntax for pairs when terms are expected.

Larry
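
The "hyper contract" stated earlier in this message (the operator is applied in parallel to corresponding elements, with no other meaning attached) can be sketched in Python. The function name hyper is hypothetical, and rejecting operands of different lengths is an assumption of this sketch, since the thread does not say what should happen in that case.

```python
from operator import add, mul


def hyper(op, left, right):
    """Apply op to corresponding elements of left and right.

    Assumption of this sketch: operands must have equal length.
    """
    if len(left) != len(right):
        raise ValueError("hyper: operands differ in length")
    return [op(x, y) for x, y in zip(left, right)]


a = [1, 2, 3]
b = [10, 20, 30]

sums = hyper(add, a, b)      # @a »+« @b
products = hyper(mul, a, b)  # @a »*« @b

print(sums, products)
```

Because hyper is generated from the ordinary operator, there is nothing for a class author to override, which is the property the contract is meant to protect.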

Karl Brodowsky

Mar 15, 2004, 6:28:32 PM
to perl6-l...@perl.org
Mark J. Reed wrote:

> Unicode per se doesn't do anything to file sizes; it's all in how you
> encode it.

Yes. And basically there are common ways to encode this: utf-8 and utf-16
(or similar variants requiring >= 2 bytes per character)

> The UTF-8 encoding is not so attractive in locales that make
> heavy use of characters which require several bytes to encode therein, or
> relatively little use of characters in the ASCII range;

utf-8 is fine for languages like German, Polish, Norwegian, Spanish, French,...
which have >= 90% of the text with ASCII-7-bit-characters.

> but that's why
> there are other encoding schemes like SCSU which get you Unicode
> compatibility while not taking up much more space than the locale's native
> charset.

These make sense for languages like Japanese, Korean, Chinese etc, where you need
more than one byte per character anyway.

But Russian, Greek, Hebrew, Arabic, Armenian and Georgian would work fine with one
byte per character. But the kinds of encoding that I can think of both make
this two bytes per character. So for these I see file sizes doubled. Or am I
missing something? Anyway, it will be necessary to specify the encoding of unicode in
some way, which could possibly allow even to specify even some non-unicode-charsets.

IMHO the OS should provide a standard way to specify such a charset as a file attribute,
but usually it does not and it won't in the future, unless the file comes through the
network and has a Mime-Header.

Best regards,

Karl
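
The size difference being argued about can be checked directly. The following Python sketch encodes a short all-Greek string and a mostly-ASCII German string; the sample strings are made up for illustration, and SCSU is not shown because Python's standard library has no codec for it.

```python
greek = "αβγδεζηθ"  # eight Greek letters, no ASCII characters

sizes = {
    "iso8859_7": len(greek.encode("iso8859_7")),  # one byte per letter
    "utf-8": len(greek.encode("utf-8")),          # two bytes per letter
    "utf-16-le": len(greek.encode("utf-16-le")),  # two bytes per letter
}
print(sizes)

# Mostly-ASCII text ("Grüße aus Köln", written with escapes to avoid
# any ambiguity about composed characters) grows only slightly in UTF-8.
german = "Gr\u00fc\u00dfe aus K\u00f6ln"
german_chars = len(german)
german_utf8 = len(german.encode("utf-8"))
print(german_chars, german_utf8)
```

So for text written entirely in a non-Latin alphabet that fits an 8-bit charset, both UTF-8 and UTF-16 do roughly double the byte count, while mostly-ASCII text pays only for its few non-ASCII characters.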

Dan Sugalski

Mar 15, 2004, 6:21:17 PM
to perl6-l...@perl.org
At 12:28 AM +0100 3/16/04, Karl Brodowsky wrote:
>Anyway, it will be necessary to specify the encoding of unicode in
>some way, which could possibly allow even to specify even some
>non-unicode-charsets.

While I'll skip diving deeper into the swamp that is character sets
and encoding (I'm already up to my neck in it, thanks, and I don't
have any long straws handy :) I'll point out that the above statement
is meaningless--there *are* no Unicode non-unicode charsets.

It is possible to use the UTF encodings on non-unicode charsets--you
could reasonably use UTF-8 to encode, say, Shift-JIS characters.
(where Shift-JIS is both an encoding and a character set, and it can
be separated into pieces)

It's not unwise (and, in practice, at least in implementation quite
sensible) to separate the encoding from the character set, but you
need to be careful to keep the separation clear, though many of the
sets and encodings don't go out of their way to help with that.
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Dan Sugalski

Mar 15, 2004, 6:42:02 PM
to mark.a...@comcast.net, perl6-l...@perl.org
At 11:36 PM +0000 3/15/04, mark.a...@comcast.net wrote:
>Another possibility is to use a UTF-8 extended system where you use
>values over 0x10FFFF to encode temporary code block swaps in the
>encoding. I.e.,
>some magic value means the one byte UTF-8 codes now mean the Greek block
>instead of the ASCII block.

You could do that, but then I'd be forced to do something well and
truly horrible to you, and we'd rather not have that. :)

Character set and encoding are metadata, and ought be stored
out-of-band, at least once the data makes it into your program.
Twiddling the internal representation of the bytes is a fairly
sub-optimal way to do that, so I'd as soon not mandate that we have
to. (I do dislike publicly breaking mandates like that. Terribly
inconvenient)

Joe Gottman

Mar 15, 2004, 8:36:23 PM
to Perl 6 Language

----- Original Message -----
From: "Deborah Pickett" <deb...@csse.monash.edu.au>
To: "Perl 6 Language" <perl6-l...@perl.org>
Sent: Sunday, March 14, 2004 10:44 PM
Subject: Re: Mutating methods


> On Sat, 13 Mar 2004 05.30, John Siracusa wrote:
> > The only case that seems even
> > remotely onerous is this one:
> >
> > my My::Big::Class::Name $obj = My::Big::Class::Name.new();
> > vs.
> > my My::Big::Class::Name $obj .= new()
>
> There's also the related issue of in-place operations on some
> difficult-to-write lvalue:
>
> @{$coderef.("argument").{hashelem}} =
> sort @{$coderef.("argument").{hashelem}};
>
> Did I get the text the same both times? What about maintaining that code?
> What about side effects on the subroutine I called there?
>
> Someone Damian-shaped will probably come in and point out how to prettify that
> using "given", but it still wouldn't be as short as last week's
>
> $coderef.("argument").{hashelem}.self:sort();


Why not just do
@{$_} = sort @{$_} given $coderef.("argument").{hashelem};

Joe Gottman


Larry Wall

Mar 15, 2004, 9:36:53 PM
to perl6-l...@perl.org
On Mon, Mar 15, 2004 at 11:56:26AM -0700, John Williams wrote:

: On Wed, 10 Mar 2004, Larry Wall wrote:
: > You subscript hashes with {...} historically, or these days, «...»,
: > when you want constant subscripts. So what you're looking for is
: > something like:
: >
: > if / <?foo> ... <?baz> ... { $?foo{'baz'} ... $?baz } .../
: > or
: > if / <?foo> ... <?baz> ... { $?foo«baz» ... $?baz } .../
:
: I'm probably a bit behind on current thinking, but did %hash{bareword}
: lose the ability to assume the bareword is a constant string?

It's thinking hard about doing that. :-)

: And why «»? Last I heard that was the unicode version of qw(), which
: returns an array. Using an array constructor as a hash subscriptor is
: not a "least surprise" to me.

We'd be trading that surprise for the surprise that %hash{shift} doesn't
call C<shift>. Plus we get literal hash slices out of it for free.
Plus it also works on pair syntax :foo«some literal words». And probably
trait and property syntax as well.

And basically because I decided :foo('bar') is too ugly for something
that will get used as often as switches are on the unix command line.
The %hash syntax is just a fallout of trying to be consistent with
the pair notation. Once people start seeing :foo«bar» all over,
they won't find %hash«bar» surprising at all, and will appreciate the
self-documenting literalness of argument.

And unfortunately it's an unavoidable part of my job description to
decide how people should be surprised. :-)

Larry

Larry Wall

Mar 15, 2004, 9:49:31 PM
to Perl 6 Language
On Mon, Mar 15, 2004 at 08:36:23PM -0500, Joe Gottman wrote:
:
: ----- Original Message -----

Er, let me guess. Because it still wouldn't be as short as last week's

$coderef.("argument").{hashelem}.self:sort();

maybe? :-)

My other guesses are: the end-weight problem, the forward reference
on the multiple pronouns, the fact that you still need to recognize
that the argument is the same as the result, or the general grottiness
of @{$_} as a deref syntax compared to dot. Did I get it right?

Larry

Luke Palmer

Mar 15, 2004, 9:54:09 PM
to perl6-l...@perl.org
Larry Wall writes:
> On Mon, Mar 15, 2004 at 11:56:26AM -0700, John Williams wrote:
> : On Wed, 10 Mar 2004, Larry Wall wrote:
> : > You subscript hashes with {...} historically, or these days, «...»,
> : > when you want constant subscripts. So what you're looking for is
> : > something like:
> : >
> : > if / <?foo> ... <?baz> ... { $?foo{'baz'} ... $?baz } .../
> : > or
> : > if / <?foo> ... <?baz> ... { $?foo«baz» ... $?baz } .../
> :
> : I'm probably a bit behind on current thinking, but did %hash{bareword}
> : lose the ability to assume the bareword is a constant string?
>
> It's thinking hard about doing that. :-)
>
> : And why «»? Last I heard that was the unicode version of qw(), which
> : returns an array. Using an array constructor as a hash subscriptor is
> : not a "least surprise" to me.
>
> We'd be trading that surprise for the surprise that %hash{shift} doesn't
> call C<shift>.

And how often is that, compared with the frequency of doing
%hash{optname}, %hash{root}, etc? %hash{~shift} isn't so bad, and it's
one of those quirks that people will just have to learn. Perl has
plenty of those, and people don't mind once they know them. I'm
grateful for most of them, now that I'm more experienced.

It's one of those things you'll have to mention when you talk about hash
subscripting, that's all. After all, why isn't a hash subscripted with
[]? And why isn't the inside of a subscript a closure? These are as
"inconsistent" as quoting a single {word}.

> Plus we get literal hash slices out of it for free. Plus it also
> works on pair syntax :foo«some literal words». And probably trait and
> property syntax as well.
>
> And basically because I decided :foo('bar') is too ugly for something
> that will get used as often as switches are on the unix command line.
> The %hash syntax is just a fallout of trying to be consistent with
> the pair notation. Once people start seeing :foo«bar» all over,
> they won't find %hash«bar» surprising at all, and will appreciate the
> self-documenting literalness of argument.

Except the six extra keystrokes involved in typing them. Yeah, I know,
I could reduce that, but Perl's already used up the rest of the
keyboard! :-)

Luke

Larry Wall

Mar 15, 2004, 11:05:19 PM3/15/04
to perl6-l...@perl.org
On Mon, Mar 15, 2004 at 07:54:09PM -0700, Luke Palmer wrote:
: Larry Wall writes:
: > On Mon, Mar 15, 2004 at 11:56:26AM -0700, John Williams wrote:
: > : On Wed, 10 Mar 2004, Larry Wall wrote:
: > : > You subscript hashes with {...} historically, or these days, «...»,
: > : > when you want constant subscripts. So what you're looking for is
: > : > something like:
: > : >
: > : > if / <?foo> ... <?baz> ... { $?foo{'baz'} ... $?baz } .../
: > : > or
: > : > if / <?foo> ... <?baz> ... { $?foo«baz» ... $?baz } .../
: > :
: > : I'm probably a bit behind on current thinking, but did %hash{bareword}
: > : lose the ability to assume the bareword is a constant string?
: >
: > It's thinking hard about doing that. :-)
: >
: > : And why «»? Last I heard that was the unicode version of qw(), which
: > : returns an array. Using an array constructor as a hash subscriptor is
: > : not a "least surprise" to me.
: >
: > We'd be trading that surprise for the surprise that %hash{shift} doesn't
: > call C<shift>.
:
: And how often is that, compared with the frequency of doing
: %hash{optname}, %hash{root}, etc? %hash{~shift} isn't so bad, and it's
: one of those quirks that people will just have to learn. Perl has
: plenty of those, and people don't mind once they know them. I'm
: grateful for most of them, now that I'm more experienced.

Yeah, and guess who had to trust his gut-level feelings for that? :-)

: It's one of those things you'll have to mention when you talk about hash
: subscripting, that's all. After all, why isn't a hash subscripted with
: []? And why isn't the inside of a subscript a closure? These are as
: "inconsistent" as quoting a single {word}.

Well, all context dependencies are created equal, but some are more
equal than others...

: > Plus we get literal hash slices out of it for free. Plus it also
: > works on pair syntax :foo«some literal words». And probably trait and
: > property syntax as well.
: >
: > And basically because I decided :foo('bar') is too ugly for something
: > that will get used as often as switches are on the unix command line.
: > The %hash syntax is just a fallout of trying to be consistent with
: > the pair notation. Once people start seeing :foo«bar» all over,
: > they won't find %hash«bar» surprising at all, and will appreciate the
: > self-documenting literalness of argument.
:
: Except the six extra keystrokes involved in typing them. Yeah, I know,
: I could reduce that, but Perl's already used up the rest of the
: keyboard! :-)

It's really the visual disambiguation that convinces me. It's extra
keystrokes for me too, y'know, and my finger joints complain to me
every day. I mostly just ignore 'em, but they do make a fearful
crackling most of the time... But my eyes are bad too, so I have
to cater to them too...

But really, it's my brain that's my weakest organ, and I can't fix that,
so I figure I'll just have to feature it.

Larry

Mark A Biggar

Mar 15, 2004, 6:36:08 PM3/15/04
to Dan Sugalski, perl6-l...@perl.org
Another possibility is to use a UTF-8 extended system where you use values
over 0x10FFFF to encode temporary code block swaps in the encoding. I.e.,
some magic value means the one byte UTF-8 codes now mean the Greek block
instead of the ASCII block. But you would need broad agreement for that to
work. As Dan said, this really needs a separation between encoding and
character set.

--
Mark Biggar
mark.a...@comcast.net

Mark J. Reed

Mar 16, 2004, 9:57:06 AM3/16/04
to perl6-l...@perl.org
On 2004-03-16 at 00:28:32, Karl Brodowsky wrote:
> Mark J. Reed wrote:
>
> >Unicode per se doesn't do anything to file sizes; it's all in how you
> >encode it.
>
> Yes. And basically there are common ways to encode this: utf-8 and utf-16
> (or similar variants requiring >= 2 bytes per character)

There are many ways to encode it. UCS-4/UTF-32 (4 bytes per character),
UCS-2/UTF-16 (2 bytes for 80% of all currently-defined characters, 4 bytes
for the rarely-used 20% that lie outside the Basic Multilingual Plane),
UTF-8 (1 byte for ASCII, 2 bytes for code points U+0080 through U+07FF,
3 bytes for code points U+0800 through U+FFFF, 4 bytes outside the BMP)
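
As a rough illustration of the byte counts listed above, here is a small Python sketch (Python is used only for convenience; the helper name encoded_sizes and the sample characters are choices made for this example, not anything from the thread). It encodes one character from each range without a byte-order mark and reports the sizes:

```python
# Byte counts for single characters under UTF-8, UTF-16 and UTF-32.
# The "-be" codec variants are used so that no byte-order mark is added.

def encoded_sizes(ch):
    """Return (utf8, utf16, utf32) byte counts for one character."""
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-be", "utf-32-be"))

samples = [
    ("\u0041", "U+0041, ASCII"),
    ("\u00e9", "U+00E9, within U+0080..U+07FF"),
    ("\u20ac", "U+20AC, within U+0800..U+FFFF"),
    ("\U0001d11e", "U+1D11E, outside the BMP"),
]

for ch, note in samples:
    print(note, encoded_sizes(ch))
```

UTF-32 stays at 4 bytes throughout, UTF-16 only grows to 4 bytes outside the BMP, and UTF-8 steps through 1, 2, 3 and 4 bytes.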

> >there are other encoding schemes like SCSU which get you Unicode
> >compatibility while not taking up much more space than the locale's native
> >charset.
>
> These make sense for languages like Japanese, Korean, Chinese etc, where
> you need more than one byte per character anyway.

No. You have mischaracterized or misunderstood the situation. UTF-8 is
*not* the only encoding that requires as little as one byte per
character. That is why I specifically mentioned SCSU - it provides a
sliding "window" accessible via single byte offsets. In SCSU, *any*
128-character portion of the Unicode range, not just the part corresponding
to US-ASCII, may be represented by a series of single bytes. It adds a
small amount of overhead for code-switching, but in general file sizes
are very close to what you get with the corresponding national character
set, while still allowing the ability to escape out of that range and
include any Unicode character.

> Anyway, it will be necessary to specify the encoding of
> unicode in some way, which could possibly allow even to specify even some
> non-unicode-charsets.

There are no non-Unicode charsets from the Unicode standpoint. National
charsets are just encodings of Unicode - incomplete encodings, since
only a subset of code points is representable, but encodings
nevertheless. Making this possible is the reason Unicode has characters
that are redundant with sequences using combining forms: every character
which exists as a unique character in some established character set
also exists as a unique character in Unicode.
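
A minimal Python sketch of that point, assuming Latin-1 and ISO-8859-7 as the example national charsets (the helper name representable is made up for the illustration): the precomposed e-acute round-trips through Latin-1 as a single code point, it is canonically equivalent to the combining sequence, and Latin-1 is "incomplete" in that it cannot represent a Greek letter.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
combining = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT, two code points

def representable(text, charset):
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# Latin-1 has a single character for e-acute and so does Unicode,
# so the byte 0xE9 maps to and from exactly one code point.
latin1_bytes = precomposed.encode("latin-1")
round_trip = latin1_bytes.decode("latin-1")

# Normalization shows that the two spellings are canonically equivalent.
composed = unicodedata.normalize("NFC", combining)
decomposed = unicodedata.normalize("NFD", precomposed)

# Latin-1 only covers a subset of code points; ISO-8859-7 covers Greek.
print(representable("\u03b1", "latin-1"), representable("\u03b1", "iso-8859-7"))
```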

James Mastros

Mar 16, 2004, 10:56:46 AM3/16/04
to perl6-l...@perl.org, Karl Brodowsky
Karl Brodowsky wrote:
> Mark J. Reed wrote:
>> The UTF-8 encoding is not so attractive in locales that make
>> heavy use of characters which require several bytes to encode therein, or
>> relatively little use of characters in the ASCII range;
>
> utf-8 is fine for languages like German, Polish, Norwegian, Spanish,
> French,...
> which have >= 90% of the text with ASCII-7-bit-characters.

Add perl to that list, by the way. I rather strongly suspect that most
perl code will consist mostly of 7-bit characters. (Even perl code
written by traditional-Chinese-speakers (and I pick on traditional
Chinese only because it has a very large character repertoire -- one of
the reasons there's a "simplified" variant).)

>> but that's why
>> there are other encoding schemes like SCSU which get you Unicode
>> compatibility while not taking up much more space than the locale's
>> native charset.
>
> These make sense for languages like Japanese, Korean, Chinese etc, where
> you need more than one byte per character anyway.
>
> But Russian, Greek, Hebrew, Arabic, Armenian and Georgian would work
> fine with one byte per character. But the kinds of encoding that I can
> think of both make this two bytes per character. So for these I see
> file sizes doubled. Or do I miss something?

Yes. You're missing the fact that SCSU is a very good encoding of
Unicode. http://www.unicode.org/reports/tr6/#Examples

In general, SCSU is one byte per character, except when switching
between half-blocks (that is, 0x80 contiguous characters), which take
one additional byte -- except switching between a single half-block and
ASCII. Thus, most of your second list of languages take one byte per
character for most code, and two bytes for encoding « and ». Hebrew,
Greek and Arabic take one additional byte (for the whole file) to encode
what half-block that the non-ASCII characters fall into. (Arabic and
Cyrillic are in default blocks.)

The first list of languages is hard to predict -- it changes depending
on how often you change between the different Japanese alphabets (and
pseudoalphabet), for example. Their example Japanese input compresses
to about 1.5 bytes per character.

(Note that SCSU is really an encoding, whether it claims to be or not.)

> Anyway, it will be
> necessary to specify the encoding of unicode in some way, which could possibly
> allow even to specify even some non-unicode-charsets.

By the way, there is (should be) nothing that is encodable in a
non-Unicode character set that is not encodable in (any encoding of)
Unicode. That's where the "uni" bit comes from. If there is, it
means that Unicode is not fulfilling its design goals.

> IMHO the OS should provide a standard way to specify such a charset as a
> file attribute,
> but usually it does not and it won't in the future, unless the file
> comes through the
> network and has a Mime-Header.

I think the answer is multi-fold.

0) Auto-detect the encoding in the compiler, if a U+FEFF signature, or a
#! signature, is found at the beginning of the input. (If there is a
FEFF signature, it should get thrown away after it is recognized. It
may be possible to recognize on "package" or "module" as well, and
possibly even on "#".)
1) Believe what the underlying FS/OS/transport tells us. (This is likely
to be a constant for many OSes, possibly selectable at the compiler's
compile-time. It's the encoding on the end of the content-type for HTTP
and other MIME-based transports.)
2) Support a "use encoding 'foo'" similar to that in recent perl5s: It
states the encoding that the file it appears in is written in.

(the higher-numbered sources of encoding information override the former
ones.)
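
Step 0 might look something like the following Python sketch. It only handles the byte-order-mark signature, not the "#!" or "package" heuristics; the function name sniff and the utf-8 fallback are assumptions made for the illustration, and the BOM constants come from Python's standard codecs module.

```python
import codecs

# (signature bytes, encoding name). The UTF-32 marks are listed first so
# that the UTF-32 LE mark (FF FE 00 00) is not mistaken for UTF-16 LE (FF FE).
SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

def sniff(data, default="utf-8"):
    """Return (encoding, payload), discarding any leading signature.

    The default used when no signature is present is an assumption
    of this sketch.
    """
    for bom, name in SIGNATURES:
        if data.startswith(bom):
            return name, data[len(bom):]
    return default, data

encoding, payload = sniff(codecs.BOM_UTF8 + b"#!/usr/bin/perl\n")
print(encoding, payload)
```

The signature is thrown away once recognized, as suggested in 0), so the rest of the compiler never sees it.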

John Williams

Mar 16, 2004, 3:08:16 PM3/16/04
to Larry Wall, perl6-l...@perl.org

And I suppose it's my job to ask silly questions and give muddled
feedback, so you know which bits confuse us users. :)


According to E7 :foo«bar» is exactly the same as foo=>'bar', and I should
even be able to say

%hash = ( :id0(3) :id1«foo» :id2{ :sub :ack } );

if I wanted to abuse the syntax that much. I'm ok with that, but it
doesn't seem like the sort of thing we should actually encourage users to
do.

But the syntax is really :key«val» vs %hash«key» so the key placement
is an inconsistency.

We also have :key(expr) vs %hash{expr}

:key{expr} (ala E7) is really something completely different, and perhaps
even unexpected when one gets used to the «» similarity.

One could nitpick more key-vs-value inconsistencies, but I think
:key{expr} will turn out to be the worst one.


No doubt you have already considered all of this, but it won't hurt for me
to try to "think like Larry" once in a while.

Something "like a unix command line option" implies :key=val might be
desirable. So :key=val could do both-side quoting, and :key=>expr could do
left-side quoting.

However the lack of commas between options rather rules out the above
because it strongly implies that :foo => val is allowed, when it really
isn't because the option ends at the whitespace after :foo . So the
values really need to go into some sort of subscript-like operator which
is expected to be attached directly to :foo(). And it should also be
quote-like if we want it to autoquote the right-side as well.

Possible quote-like operators are

:foo'val' :foo"val" :foo«val»

«» is the most subscript-like there, so I can see how you got there.


And I suppose that is when «» started being considered as a subscript
operator in other places.

Getting free slicing and dicing with %hash«cut two 3» instead of
%hash{«cut 2 three»} may be nice, but what about %hash{'one big key'}?
or :key('one big value') for that matter?

Does %hash«key» = @x; put @x in list context or scalar context?
Because it could be a single hash value or a slice with one member.

I can think of a few places where something is definitely a hash-key:

key => 'value' # autoquote hash key (if simple identifier)
:key # autoquote hash key (if simple identifier)
%hash{key} # perl5 autoquote hash key if simple identifier
%hash«key» # perl6 quote hash key if non-whitespace

So, we would be replacing the quoting rule in the 3rd case (which works
the same as the first two[*]) with a different (stronger quoting, but
different) style of quoting the 4th case.
[* well, overlooking that :(expr) isn't allowed]

But it also creates a correlation between the hash-key in %hash«key» and
the hash-value in :key«value», which will probably result in a lot of
explaining why :key{expr} is different from %hash{expr}.

It's your call of course, but I'm not yet convinced of the need to change
the semantics of %hash{key}. Which usually just means that I lack
enlightenment on the subject. :)

~ John Williams


Karl Brodowsky

Mar 16, 2004, 4:17:57 PM3/16/04
to James Mastros, perl6-l...@perl.org
Dear All,

from what has been written by others, there are enough useful encodings other
than utf-8, utf-16/UCS-2 and UCS-4 that support efficient storage even
for unicode-files whose contents are Greek, Cyrillic, etc.. Sorry for the confusion
caused by the fact that I was not aware of these.

>> utf-8 is fine for languages like German, Polish, Norwegian, Spanish,
>> French,...
>> which have >= 90% of the text with ASCII-7-bit-characters.

> Add perl to that list, by the way. I rather strongly suspect that most
> perl code will consist mostly of 7-bit characters. (Even perl code
> written by traditional-Chinese-speakers (and I pick on traditional
> Chinese only because it has a very large character repertoire -- one of
> the reasons there's a "simplified" variant).)

My experience would be that Perl-programs do contain local language and thus
local characters which might be outside of ISO-646-IRV (7-bit-ASCII) for
String-literals and for comments.

> By the way, there is (should be) nothing that is encodable in a
> non-Unicode character set that is not encodable in (any encoding of)
> Unicode. That's where the "uni" bit comes from. If there is, it
> means that Unicode is not fulfilling its design goals.

Yes, we can consider any file to be unicode with some encoding. That is
how the Java-guys do it, with the restriction that they don't easily let
you choose anything other than latin-1 + \ucafe-stuff for non-latin-1
characters (or maybe I didn't bother, because latin-1/ISO-8859-1 works
fine for me).

>> IMHO the OS should provide a standard way to specify such a charset as
>> a file attribute,
>> but usually it does not and it won't in the future, unless the file
>> comes through the
>> network and has a Mime-Header.

> I think the answer is multi-fold.

> 0) Auto-detect the encoding in the compiler, if a U+FEFF signature, or a
> #! signature, is found at the beginning of the input. (If there is a
> FEFF signature, it should get thrown away after it is recognized. It
> may be possible to recognize on "package" or "module" as well, and
> possibly even on "#".)

With FFFE and FEFF this seems obvious. In case of #! it would not be clear
to me if this defaults to ISO-8859-1 (latin-1) or to utf-8. See HTML
vs. XHTML as an example where the default has been changed.

> 1) Believe what the underlying FS/OS/transport tells us. (This is likely
> to be a constant for many OSes, possibly selectable at the compiler's
> compile-time. It's the encoding on the end of the content-type for HTTP
> and other MIME-based transports.)

I understand that the FS/OS do not really tell us, at least neither for
Unix/Linux nor for NT/Windows. Relying on environment variables or locale
settings looks dangerous to me, because it breaks programs that worked fine
in environment A, when you run them elsewhere or it imposes restrictions
how to setup these environment variables. It could be ok for one-liners
run from the command line like this
ls *.JPG|perl -p -e 's/(.*\.)JPG$/mv $1JPG $1jpg/;' |grep mv |sh
stuff. This would work fine even for shell scripts, because they would have
to set the appropriate environment variables for themselves, thus disregarding
any user settings. Probably something additional like PERL_DEFAULT_ENCODING,
because otherwise we might get clashes with (other) regular use of locale-settings.

In cases where the OS or FS really has a capability to provide encoding on a
per file basis as a file attribute or in cases where the file comes from the
network with a mime-header, your suggestion should be perfect.

> 2) Support a "use encoding 'foo'" similar to that in recent perl5s: It
> states the encoding that the file it appears in is written in.

Yes, that looks like the right way to do it. And it eliminates part of the
concerns for 1), if it is assumed that this line use encoding is kind of required
in every non-trivial perl-source. Btw. this is the encoding of the perl-source-code
itself, files that are processed by perl I/O could of course have any encoding.

> (the higher-numbered sources of encoding information override the former
> ones.)

Yes, of course. 0) and 2) are obvious, but 1) might need to be dealt with carefully.

Best regards,

Karl

Larry Wall

Mar 16, 2004, 5:15:51 PM3/16/04
to perl6-l...@perl.org
On Tue, Mar 16, 2004 at 10:17:57PM +0100, Karl Brodowsky wrote:
: With FFFE and FEFF this seems obvious. In case of #! it would not be clear
: to me if this defaults to ISO-8859-1 (latin-1) or to utf-8. See HTML
: vs. XHTML as an example where the default has been changed.

Perl 6 would certainly try to default to utf-8 rather than latin-1.

: >1) Believe what the underlying FS/OS/transport tells us. (This is likely
: >to be a constant for many OSes, possibly selectable at the compiler's
: >compile-time. It's the encoding on the end of the content-type for HTTP
: >and other MIME-based transports.)
:
: I understand that the FS/OS do not really tell us, at least neither for
: Unix/Linux nor for NT/Windows. Relying on environment variables or locale
: settings looks dangerous to me, because it breaks programs that worked fine
: in environment A, when you run them elsewhere or it imposes restrictions
: how to setup these environment variables. It could be ok for one-liners
: run from the command line like this
: ls *.JPG|perl -p -e 's/(.*\.)JPG$/mv $1JPG $1jpg/;' |grep mv |sh
: stuff. This would work fine even for shell scripts, because they would have
: to set the appropriate environment variables for themselves, thus
: disregarding
: any user settings. Probably something additional like
: PERL_DEFAULT_ENCODING,
: because otherwise we might get clashes with (other) regular use of
: locale-settings.
:
: In cases where the OS or FS really has a capability to provide encoding on a
: per file basis as a file attribute or in cases where the file comes from the
: network with a mime-header, your suggestion should be perfect.

If the metadata can be trusted, then we'll trust the metadata.
Otherwise, Perl 6 will attempt heuristics only if it recognizes a file
with high bits set that can't possibly be any common Unicode encoding
(where that could include SCSU, I suppose, if it starts with 0e fe ff).

: >2) Support a "use encoding 'foo'" similar to that in recent perl5s: It
: >states the encoding that the file it appears in is written in.
:
: Yes, that looks like the right way to do it. And it eliminates part of the
: concerns for 1), if it is assumed that this line use encoding is kind of
: required
: in every non-trivial perl-source. Btw. this is the encoding of the
: perl-source-code
: itself, files that are processed by perl I/O could of course have any
: encoding.

Yes, and you change those from the defaults with other pragmas or options.

: >(the higher-numbered sources of encoding information override the former
: >ones.)
:
: Yes, of course. 0) and 2) are obvious, but 1) might need to be dealt with
: carefully.

Yes, 1) needs to be split into 1) "metadata" and -1) "heuristics".
1) is only reliable metadata, which specifically does NOT include
environment variables. If the metadata is only guessing, it's better
to assume 0) and only if that fails regress to any -1) heuristics.
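
One way to read that ordering, as a Python sketch under stated assumptions: the function name choose_encoding, its parameters, and the latin-1 fallback guess are all hypothetical, and "assume 0)" is modelled as "try utf-8 first". Environment variables are deliberately not an input.

```python
def choose_encoding(data, pragma=None, metadata=None, signature=None,
                    guess=lambda data: "latin-1"):
    """Pick a source encoding; higher-numbered sources override lower ones.

    pragma    -- 2) an explicit "use encoding" declaration, if any
    metadata  -- 1) reliable metadata only (e.g. a MIME charset)
    signature -- 0) an encoding detected from a signature at the start
    guess     -- -1) heuristic of last resort (hypothetical placeholder)
    """
    for candidate in (pragma, metadata, signature):
        if candidate is not None:
            return candidate
    # No explicit information: assume the default and fall back to the
    # heuristic only if the bytes cannot possibly be valid utf-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return guess(data)

print(choose_encoding(b"my $x = 1;"))                       # plain ASCII
print(choose_encoding(b"\xe9t\xe9"))                        # high bits, not utf-8
print(choose_encoding(b"\xe9t\xe9", metadata="iso-8859-1"))
```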

Larry

Larry Wall

Mar 16, 2004, 4:50:54 PM3/16/04
to perl6-l...@perl.org
On Tue, Mar 16, 2004 at 08:40:50PM +0200, arcadi shehter wrote:
: How about <- which is not overloaded by boolean connotations
: and is sort of ? turned by 90 degrees .

Don't think so. It's too ambiguous with current meanings.

: $topic<- (.a + .b + .c)

That asks if $topic is numerically less than the negated return value
of the sum of ($_.a + $_.b + $_.c).

: my dog $spot<- = .new

That's currently a syntax error.

: @array<- .[.min .. .max]

That asks if the length of @array is less than the negated value of
$_[$_.min .. $_.max].
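
The same tokenization can be observed in an unrelated language. Purely as an analogy (this is Python, not Perl 6, and the helper name describe is made up), Python's own parser also reads "<-" as less-than followed by unary minus:

```python
import ast

def describe(expr):
    """Return the node type names Python's parser assigns to a comparison."""
    node = ast.parse(expr, mode="eval").body
    right = node.comparators[0]
    return (type(node).__name__,        # the whole expression
            type(node.ops[0]).__name__, # the comparison operator
            type(right).__name__,       # the right-hand operand
            type(right.op).__name__)    # the operator applied to it

print(describe("topic<- (a + b + c)"))
# ('Compare', 'Lt', 'UnaryOp', 'USub')
```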

Larry

Arcadi Shehter

Mar 16, 2004, 1:40:50 PM3/16/04
to Larry Wall, perl6-l...@perl.org
Larry Wall writes:
>
> Despite the severe overloading problems, it's really gonna be hard
> to do much better than
>
> $topic ? (.a + .b + .c)
> my dog $spot ?= .new;
> @array?.[.min .. .max]
>
> And I do think people would rebel at using Latin-1 for that one.
> I get enough grief for «...». :-)
>
> Larry

How about <- which is not overloaded by boolean connotations
and is sort of ? turned by 90 degrees .

$topic<- (.a + .b + .c)


my dog $spot<- = .new

@array<- .[.min .. .max]

arcadi

Gordon Henriksen

Apr 10, 2004, 10:57:11 AM4/10/04
to Larry Wall, perl6-l...@perl.org

I'm just catching up, and really rather late to the party, but I just
thought it perhaps worth pointing out that a very simple solution to the
{call()} vs. {bareword} ambiguity, the {"string literal"}, is indeed
fewer keystrokes and less surprise (at least for a Perl 5 programmer)
and less context dependence than «»-is-a-subscript-now-too.

Ba-a-ah,

Gordon Henriksen
mali...@mac.com
