Transliteration and combining strings and array references

Peter Makholm

Oct 14, 2005, 2:38:55 AM

to perl6-l...@perl.org

Yesterday I spent some hours getting pugs to understand
transliterations with multiple ranges in each pair. E.g.

"foobar".trans( "a-z" => "n-za-m" );

By accident I tested something like:

"foobar".trans( ['a' .. 'z'] => "n-za-m" );

and it didn't work.

The problem is that ['a' .. 'z'] gets stringified to 'a b c d ...'
which gets 'b' translated to the third letter in the right hand side.
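
The mis-mapping can be modelled in a few lines of Python. This is only a sketch, not Pugs code: build_table, trans, as_list and as_string are hypothetical names, and joining with spaces stands in for the array stringification described above.

```python
import string

def build_table(keys, values):
    """Pair the i-th key with the i-th value."""
    return dict(zip(keys, values))

def trans(text, table):
    """Replace each character found in the table; keep the others."""
    return "".join(table.get(ch, ch) for ch in text)

letters = list(string.ascii_lowercase)        # models ['a' .. 'z']
rot13 = letters[13:] + letters[:13]           # models the expanded "n-za-m"

as_list = build_table(letters, rot13)         # keys stay a list of letters

stringified = " ".join(letters)               # models 'a b c d ...'
as_string = build_table(stringified, rot13)   # keys are now single characters,
                                              # spaces included

print(as_list["b"])     # 'b' is the second key here
print(as_string["b"])   # 'b' is the third key here, after 'a' and ' '
print(trans("foobar", as_list))
print(trans("foobar", as_string))
```

In the stringified case every second key is a space, so only 'a' through 'm' receive a mapping at all and each lands on the wrong replacement.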

Is this supposed to work and if so how should the code differ between

"foobar".trans( ['a' .. 'b'] => '12'); # a=>1, b=>2
"foobar".trans( "a b" => "123" ) # a=>1, ' '=>2, b=>3

The same problem occurs if the left hand side is a string and the right
hand side is an array reference, but in this case the code implementing
trans can see it.

--
Peter Makholm | Why does the entertainment industry wants us to
pe...@makholm.net | believe that a society base on full surveillance
http://hacking.dk | is bad?
| Do they have something to hide?

Larry Wall

Oct 14, 2005, 1:43:59 PM

to perl6-l...@perl.org

On Fri, Oct 14, 2005 at 08:38:55AM +0200, Peter Makholm wrote:
: Yesterday I spent some hours getting pugs to understand
: transliterations with multiple ranges in each pair. E.g.
:
: "foobar".trans( "a-z" => "n-za-m" );
:
: By accident I tested something like:
:
: "foobar".trans( ['a' .. 'z'] => "n-za-m" );
:
: and it didn't work.
:
: The problem is that ['a' .. 'z'] gets stringified to 'a b c d ...'
: which gets 'b' translated to the third letter in the right hand side.

Hmm, why is stringification getting involved at all? We're intending
transliteration to work with multi-codepoint sequences of various
sorts, so the canonical representation of the data structure can't
be simple strings.

Actually, it looks like the bug is probably that => is forcing stringification
on its left argument too aggressively. It should only do that for an identifier.

One other quibble is that we're switching ranges in character classes to
use ".." instead of "-", so trans should use the same convention.

: Is this supposed to work and if so how should the code differ between
:
: "foobar".trans( ['a' .. 'b'] => '12'); # a=>1, b=>2
: "foobar".trans( "a b" => "123" ) # a=>1, ' '=>2, b=>3

Actually, the [...] is somewhat gratuitous. Should work with parens too.
In fact, it should work with a bare range object on the left:

"foobar".trans( 'a' .. 'b' => '12'); # a=>1, b=>2

: The same problem occurs if the left hand side is a string and the right
: hand side is an array reference, but in this case the code implementing
: trans can see it.

Overzealous => stringification, I think. .trans can use a string as
a list by splitting it, but the underlying structures must be lists.
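
A rough Python sketch of that idea: both sides are normalised to lists first, so a string is merely shorthand for its characters and a list may carry multi-character items. The names as_items and trans are hypothetical, and trying the longest key first is an assumption made for the sketch, not something settled in this thread.

```python
def as_items(side):
    """A string becomes a list of its characters; a list passes through."""
    return list(side)

def trans(text, left, right):
    """Transliterate text using one left/right pair of item lists."""
    table = dict(zip(as_items(left), as_items(right)))
    longest = max((len(key) for key in table), default=0)
    out, i = [], 0
    while i < len(text):
        # Assumption: prefer the longest key that matches at this position.
        for size in range(longest, 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])     # no key matches: copy the character
            i += 1
    return "".join(out)

print(trans("foobar", "ab", "12"))                  # string sides, split per character
print(trans("foobar", ["oo", "b"], ["0", "B"]))     # list sides, multi-character key
```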

Thanks for working on this! Do you know any more people like you? :-)

Larry

Juerd

Oct 14, 2005, 7:27:58 PM

to perl6-l...@perl.org

Larry Wall skribis 2005-10-14 10:43 (-0700):

> Actually, it looks like the bug is probably that => is forcing
> stringification on its left argument too aggressively. It should only
> do that for an identifier.

Would it work to call this process autoquoting, instead of
stringification? I'm assuming other means of stringification do not
involve interpreting barewords.

> One other quibble is that we're switching ranges in character classes to
> use ".." instead of "-", so trans should use the same convention.

Wasn't there going to be a feature to trans entire strings? i.e. 'foo'
=> 'bar', where foo is a single thing replaced by bar. Does this not
exclude any possibility of specifying ranges in strings?

Juerd
--
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html
http://convolution.nl/gajigu_juerd_n.html

Peter Makholm

Oct 14, 2005, 2:49:50 PM

to perl6-l...@perl.org

la...@wall.org (Larry Wall) writes:

> On Fri, Oct 14, 2005 at 08:38:55AM +0200, Peter Makholm wrote:
> : Yesterday I spent some hours getting pugs to understand
> : transliterations with multiple ranges in each pair. E.g.
> :
> : "foobar".trans( "a-z" => "n-za-m" );
> :
> : By accident I tested something like:
> :
> : "foobar".trans( ['a' .. 'z'] => "n-za-m" );
> :
> : and it didn't work.
> :
> : The problem is that ['a' .. 'z'] gets stringified to 'a b c d ...'
> : which gets 'b' translated to the third letter in the right hand side.
>
> Hmm, why is stringification getting involved at all? We're intending
> transliteration to work with multi-codepoint sequences of various
> sorts, so the canonical representation of the data structure can't
> be simple strings.
>
> Actually, it looks like the bug is probably that => is forcing stringification
> on its left argument too aggressively. It should only do that for an identifier.

The code I'm looking at is in pugs/src/perl6/Prelude.pm around line 380:

method trans (Str $self: *%intable) is primitive is safe {

    my sub expand (Str $string is copy) {
        ...
    }

    my sub expand_arrayref ( $arr is copy ) {
        ...
    }

    my %transtable;
    for %intable.kv -> $k, $v {
        # $k is stringified by the => operator.
        my @ks = expand($k);
        my @vs = $v.isa(Str) ?? expand($v) !! expand_arrayref($v);
        %transtable{@ks} = @vs;
    }

    [~] map { %transtable{$_} // $_ } $self.split('');
}

> One other quibble is that we're switching ranges in character classes to
> use ".." instead of "-", so trans should use the same convention.

Ok.

> : Is this supposed to work and if so how should the code differ between
> :
> : "foobar".trans( ['a' .. 'b'] => '12'); # a=>1, b=>2
> : "foobar".trans( "a b" => "123" ) # a=>1, ' '=>2, b=>3
>
> Actually, the [...] is somewhat gratuitous. Should work with parens too.
> In fact, it should work with a bare range object on the left:
>
> "foobar".trans( 'a' .. 'b' => '12'); # a=>1, b=>2

Works too.

> Thanks for working on this! Do you know any more people like you? :-)

No, after seeing what happened to Lintilla I've kept clear of cloning
companies.

--
Peter Makholm | What if:
pe...@makholm.net | IBM bought Xenix from Microsoft instead of buying
http://hacking.dk | DOS?

Larry Wall

Oct 14, 2005, 7:52:47 PM

to perl6-l...@perl.org

On Fri, Oct 14, 2005 at 08:49:50PM +0200, Peter Makholm wrote:
: The code I'm looking at is in pugs/src/perl6/Prelude.pm around line 380:
:
: method trans (Str $self: *%intable) is primitive is safe {
:
:     my sub expand (Str $string is copy) {
:         ...
:     }
:
:     my sub expand_arrayref ( $arr is copy ) {
:         ...
:     }
:
:     my %transtable;
:     for %intable.kv -> $k, $v {
:         # $k is stringified by the => operator.

Interesting comment. I wonder if it's true. The key shouldn't
be stringified by => unless it's an identifier, but even if => is
behaving itself, shoving it into a hash key will stringify for
a normal hash.

:         my @ks = expand($k);
:         my @vs = $v.isa(Str) ?? expand($v) !! expand_arrayref($v);

Nit: that probably wants to be an MMD dispatch eventually so we can handle
things that aren't quite Str or Array.
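
As an illustration of dispatching on the argument type instead of testing isa(Str), here is a small Python sketch. It uses functools.singledispatch, which is single dispatch standing in for MMD, and the expand function with its three cases is a hypothetical model rather than the Prelude code. Range syntax inside strings is left out on purpose.

```python
from functools import singledispatch

@singledispatch
def expand(side):
    """Fallback for types that no case has been registered for."""
    raise TypeError(f"cannot use {type(side).__name__} in a transliteration pair")

@expand.register(str)
def _(side):
    return list(side)                        # a string: one item per character

@expand.register(list)
def _(side):
    return [str(item) for item in side]      # a list: items are taken as given

@expand.register(range)
def _(side):
    return [chr(code) for code in side]      # a range of code points

print(expand("ab"))
print(expand(["h", "e"]))
print(expand(range(ord("a"), ord("c") + 1)))
```

Supporting something that is not quite a string or a list then means registering one more case, without touching the caller.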

:         %transtable{@ks} = @vs;
:     }
:
:     [~] map { %transtable{$_} // $_ } $self.split('');
: }

I think the sig is abusing slurpy hashes, which are really intended
only to sop up unbound named arguments, not a list of object pairs
like this. Filtering through an unshaped hash is going to force
string context on the keys. So either we need to declare a shape
on the hash that allows for non-Str keys (which I'm not sure Pugs
implements yet), or we need to protect these pairs from being processed
as named parameters.
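
The difference between the two options can be modelled roughly in Python. Both helpers are hypothetical: slurpy_hash mimics an unshaped hash by forcing every key into string context (an array key is joined with spaces, as reported earlier in the thread), while pair_list keeps the pairs as they were passed.

```python
def slurpy_hash(*pairs):
    """Model an unshaped hash: every key ends up as a string."""
    table = {}
    for key, value in pairs:
        if isinstance(key, list):
            key = " ".join(map(str, key))    # array in string context
        table[str(key)] = value
    return table

def pair_list(*pairs):
    """Model a plain list of pairs: keys keep their structure."""
    return list(pairs)

as_hash = slurpy_hash((["h", "e"], "0"))
as_pairs = pair_list((["h", "e"], "0"))

print(list(as_hash))      # the key is now the single string 'h e'
print(as_pairs[0][0])     # the key is still the list ['h', 'e']
```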

Also, if we go with the syntactic definition of named args we've
been discussing lately on p6l, we'll need to put an extra set of
parens around the pair list, or prefix with "<==" to force it into
the list zone, or pass inside [...]. (And for syntactic named args,
a => probably *should* be enforcing string context on the key.)

Larry

Larry Wall

Oct 14, 2005, 8:17:48 PM

to perl6-l...@perl.org

On Sat, Oct 15, 2005 at 01:27:58AM +0200, Juerd wrote:
: Larry Wall skribis 2005-10-14 10:43 (-0700):
: > Actually, it looks like the bug is probably that => is forcing
: > stringification on its left argument too aggressively. It should only
: > do that for an identifier.
:
: Would it work to call this process autoquoting, instead of
: stringification?

Yes, autoquoting is what => is supposed to do to its left argument,
if it's an identifier.

: I'm assuming other means of stringification do not
: involve interpreting barewords.

Depends on whether you count the stringification done by a slurpy
hash on its keys as "other means of stringification". That seems
to be what's going on here, even though the left side of => isn't
an identifier.

: > One other quibble is that we're switching ranges in character classes to
: > use ".." instead of "-", so trans should use the same convention.
:
: Wasn't there going to be a feature to trans entire strings? i.e. 'foo'
: => 'bar', where foo is a single thing replaced by bar. Does this not
: exclude any possibility of specifying ranges in strings?

Doesn't seem like a big problem. Presumably if you ever really want
to translate to or from "..", you can make it an endpoint of its own
pair in the pair list. On the other hand, I could see ".." happening
accidentally in a string occasionally. And one can presumably
construct ranges with a real ".." operator. So maybe the non-quote
form of tr/// should always use lists, with a helper function to
translate "a..z" to a list and also carp about the fact that it will
break under Unicode. :-)

Or maybe the argument is always a list of pairs of (list or string),
in which case we know the string at that level can be interpreted for
"..", but a string within a sublist can't.
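
That rule could be sketched in Python as follows. The helper names expand_ranges and expand_side are hypothetical, and the range expansion simply walks code points, which is exactly the part that would need care under Unicode.

```python
def expand_ranges(spec):
    """Expand every 'x..y' in a string into the characters from x to y."""
    items, i = [], 0
    while i < len(spec):
        if spec[i + 1:i + 3] == ".." and i + 3 < len(spec):
            first, last = ord(spec[i]), ord(spec[i + 3])
            # Code point order; an empty result if first > last.
            items.extend(chr(code) for code in range(first, last + 1))
            i += 4
        else:
            items.append(spec[i])
            i += 1
    return items

def expand_side(side):
    """Interpret '..' only in a top-level string, never inside a sublist."""
    if isinstance(side, str):
        return expand_ranges(side)
    return [str(item) for item in side]

print(expand_side("a..e"))        # top-level string: treated as a range
print(expand_side(["a..e"]))      # string inside a list: one literal item
print("".join(expand_side("n..za..m")))
```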

If someone wants to work over the interface for consistency and
flexibility, that'd be fine.

Larry

Peter Makholm

Oct 15, 2005, 1:08:42 PM

to perl6-l...@perl.org

la...@wall.org (Larry Wall) writes:

> : my %transtable;
> : for %intable.kv -> $k, $v {
> : # $k is stringified by the => operator.
>
> Interesting comment. I wonder if it's true.

That was my attempt to explain what I observed. Clearly I put
the blame in the wrong place.

> : my @ks = expand($k);
> : my @vs = $v.isa(Str) ?? expand($v) !! expand_arrayref($v);
>
> Nit: that probably wants to be an MMD dispatch eventually so we can handle
> things that aren't quite Str or Array.

Right, I'm going to fix that.

>
> : %transtable{@ks} = @vs;
> : }
> :
> : [~] map { %transtable{$_} // $_ } $self.split('');
> : }

[...]

> Also, if we go with the syntactic definition of named args we've
> been discussing lately on p6l, we'll need to put an extra set of
> parens around the pair list, or prefix with "<==" to force it into
> the list zone, or pass inside [...]. (And for syntactic named args,
> a => probably *should* be enforcing string context on the key.)

Ok, r7622 changed something about how the method gets called and that
broke most of the examples in S05. I'll probably turn my attention
somewhere else until I have a more stable understanding of what
happens.

--
Peter Makholm | I laugh in the face of danger. Then I hide until
pe...@makholm.net | it goes away
http://hacking.dk | -- Xander

Nicholas Clark

Oct 15, 2005, 4:32:35 PM

to perl6-l...@perl.org

On Fri, Oct 14, 2005 at 05:17:48PM -0700, Larry Wall wrote:

> form of tr/// should always use lists, with a helper function to
> translate "a..z" to a list and also carp about the fact that it will
> break under Unicode. :-)

And EBCDIC.

The dinosaurs are not extinct yet. I guess that they are trying to out-live
Perl 4.

Nicholas Clark

David Formosa

Oct 15, 2005, 11:46:25 PM

to perl6-l...@perl.org

On Fri, 14 Oct 2005 08:38:55 +0200, Peter Makholm <pe...@makholm.net> wrote:
> Yesterday I spent some hours getting pugs to understand
> transliterations with multiple ranges in each pair. E.g.
>
> "foobar".trans( "a-z" => "n-za-m" );
>
> By accident I tested something like:
>
> "foobar".trans( ['a' .. 'z'] => "n-za-m" );
>
> and it didn't work.

It's a bug. When Pugs gets anyhashes that bug will go away. Can you
add it to the errors?

--
Please excuse my spelling as I suffer from agraphia. See
http://dformosa.zeta.org.au/~dformosa/Spelling.html to find out more.
Free the Memes.

Eric

Oct 17, 2005, 6:26:52 PM

to dfor...@dformosa.zeta.org.au, perl6-l...@perl.org

On 16 Oct 2005 03:46:25 -0000, David Formosa (aka ? the Platypus) <

Actually it's been fixed already. Of course I think the whole thing was then
broken again; I was planning on checking it out sometime tonight after
someone else figures out the current $?SELF being undeclared bug. ;) BTW it
doesn't need any hash, I just needed pairs, which it had for about an hour
before things changed again. lol. Such is development on pugs I guess: here
today, gone tomorrow, back again another day.

--
--
__________
Eric Hodges

Peter Makholm

Oct 18, 2005, 5:54:47 AM

to perl6-l...@perl.org

eri...@gmail.com (Eric) writes:

>> On Fri, 14 Oct 2005 08:38:55 +0200, Peter Makholm <pe...@makholm.net>
>> wrote:
>> > Yesterday I spent some hours getting pugs to understand
>> > transliterations with multiple ranges in each pair. E.g.

> Actually it's been fixed already. Of course I think the whole thing was then
> broken again; I was planning on checking it out sometime tonight after
> someone else figures out the current $?SELF being undeclared bug.

On Saturday I think everything but the test suite was working in some
way. It needed some changes in the syntax, like explicit parens around
pairs, and the test suite wasn't corrected for this.

Would the proper thing be to update the tests for the syntax that is
working and maybe add an extra test for the syntax as specced?

I planned to look at it but found something to read instead.

> ;) BTW it doesn't need any hash i just needed pairs which it had for
> about an hour before things changed again. lol. so is development on
> pugs I guess, here today, gone tomorrow, back again another day.

Yeah, a very fun way to confuse a newbie into perl6-hacking, but thanks
for all your help.

--
Peter Makholm | Yes, you can fight it, but in the end the ultimate
pe...@makholm.net | goal of life is to have fun
http://hacking.dk | -- Linus Torvalds

Eric

Oct 18, 2005, 12:00:21 PM

to Peter Makholm, perl6-l...@perl.org

I have a suggestion/proposal/whatever.

I am just starting to get a grasp of uses for pairs and where they are
handy. Working on string.trans showed that it would be useful to have
the function accept a list of pairs. That was working until the fix for
magical pairs went through, and now the pairs in the call are treated as
named arguments. After some discussion with iblech and looking at the
Synopsis, it looks like *%hash will slurp up named args into a hash and
*@array will slurp up extra parameters. *%hash could work for trans, except it
stringifies the left side (or magic quotes it, or whatever the term is) and loses
order. Both of those are significant to trans and possibly to other uses for
lists of pairs. So I was wondering if we could have a signature that meant
"slurp up named args and put them in a list as pairs". For now I suggest
**@array, because it has the flattening connotation already and we are sort
of flattening a list of named args into a list of pairs.

The biggest issue I see with this is that I don't know how the key value of
named args is handled, and whether it is handled too soon to then be useful in a
pair.

Currently we (can|will be able to) do

"string".trans( (['h','e'] => "0") );
"string".trans( <== ['h','e'] => "0");

Those are fine and I can live with that, but it seems that if we made the
signature of trans

method trans(Str $self: **@intable) {};

Then we could just do plain old:
"string".trans(['h','e'] => "0");

Which to me seems a lot easier to read. I would propose that it only affects
named params so that there is no concern about pairs in values and how to
handle them.

Luke Palmer

Oct 18, 2005, 1:57:09 PM

to Eric, Peter Makholm, perl6-l...@perl.org

On 10/18/05, Eric <eri...@gmail.com> wrote:
> Currently we (can|will be able to) do
>
> "string".trans( (['h','e'] => "0") );
> "string".trans( <== ['h','e'] => "0");
>
> Those are fine and I can live with that, but it seems that if we made the
> signature of trans
>
> method trans(Str $self: **@intable) {};
>
> Then we could just do plain old:
> "string".trans(['h','e'] => "0");
>
> Which to me seems a lot easier to read. I would propose that it only affects
> named params so that there is no concern about pairs in values and how to
> handle them.

Uh, no. Certainly not for a method. For a bare sub that has been
predeclared it may be possible. But we don't want to remagicalize
pairs after we just argued the heck out of it to make pairs *always*
be named parameters.

The way I'd do the interface is like this:

"string".trans([
    <h e> => "0",
]);

It looks nicer if you use the indirect object form:

trans "string": [
    <h e> => "0",
];
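
For comparison, a Python model of a trans that takes one explicit list of pairs might look like the sketch below. The function is hypothetical. Two behaviours in it are assumptions and not anything decided in this thread: a short right side is padded by repeating its last item (borrowed from Perl 5's tr///), and an earlier pair wins over a later one for the same key.

```python
def trans(text, pairs):
    """Transliterate text using an ordered list of (keys, values) pairs."""
    table = {}
    for keys, values in pairs:
        keys, values = list(keys), list(values)
        # Assumption: repeat the last value when the right side is shorter.
        if values and len(values) < len(keys):
            values += [values[-1]] * (len(keys) - len(values))
        for key, value in zip(keys, values):
            table.setdefault(key, value)     # assumption: first pair wins
    return "".join(table.get(ch, ch) for ch in text)

# Models "string".trans([ <h e> => "0" ]) applied to a different string.
print(trans("hello", [(["h", "e"], "0")]))
```

Because the pairs arrive as an ordinary list, the keys keep their structure and the order of the pairs is available to the implementation.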

Luke

Juerd

Oct 18, 2005, 2:02:46 PM

to perl6-l...@perl.org

Luke Palmer skribis 2005-10-18 11:57 (-0600):

> It looks nicer if you use the indirect object form:
> trans "string": [
> <h e> => "0",
> ];

It'd also look very nice with optional parens:

"string".trans [ <h e> => "0" ];

Or is it not yet time to resuggest that? :)

Eric

Oct 18, 2005, 2:41:55 PM

to Luke Palmer, Peter Makholm, perl6-l...@perl.org

On 10/18/05, Luke Palmer <lrpa...@gmail.com> wrote:
>
> Uh, no. Certainly not for a method. For a bare sub that has been
> predeclared it may be possible. But we don't want to remagicalize
> pairs after we just argued the heck out of it to make pairs *always*
> be named parameters.

My thought was that it wouldn't be much different than *%hash as a signature
except you wouldn't lose order and the keys wouldn't be mashed. Is what I'm
suggesting more magical in some way? I freely admit it might be a bad idea, I
just wasn't sure why and thought I might bring it up since this seems
different than the magicalness of pairs before.

TSa

Oct 19, 2005, 7:48:08 AM

to perl6-l...@perl.org

HaloO,

Luke Palmer wrote:
> It looks nicer if you use the indirect object form:
>
> trans "string": [
> <h e> => "0",
> ];

Given the right interpretation this just looks like
a typed label selection in a multi method.

multi trans
{
    Str $x: ...; return;

    Int $x: ...; return;

    ...; return;
}

Is this definitional form supported? To me it nicely
unifies the indirect object syntax---and postfix colon
in general---with labels.
--

TSa

Oct 19, 2005, 9:02:14 AM

to perl6-l...@perl.org

HaloO,

Juerd wrote:
> Luke Palmer skribis 2005-10-18 11:57 (-0600):
>
>>It looks nicer if you use the indirect object form:
>> trans "string": [
>> <h e> => "0",
>> ];
>
>
> It'd also look very nice with optional parens:
>
> "string".trans [ <h e> => "0" ];
>
> Or is it not yet time to resuggest that? :)

I like it. Given enough Meta Information---namely the structural
arrow type---the .trans could be parsed as postfix op that returns
a prefix op. Otherwise you get a 'two terms in a row' *syntax* error!

(($ &) $)

The left item is actually calculated at compile time from string
interpolation. The $ on the right is an itemized pair. Further
expanded we get

((&.($) & :$)

or perhaps

(&.($).&.(:$)

BTW, let's assume the non-invocant param of .trans were called $foo.
Would in the above case +($foo.key) == 2? And I guess the parens
could be dropped because .key binds tighter than prefix:<+>, right?
I mean the type of the key in the pair is an array of compile time
strings. Or is that not preserved?
--

Larry Wall

Oct 19, 2005, 6:21:05 PM

to perl6-l...@perl.org

On Wed, Oct 19, 2005 at 01:48:08PM +0200, TSa wrote:
: HaloO,

We would certainly not define a new syntax just for that. But it could
easily fall out of something resembling Luke's tuple proposal in
conjunction with a switch, assuming we add signature matching:

multi trans (*$_) {
    when :(Str $x) { ... }
    when :(Int $x) { ... }
    default { ... }
}

give or take a bit of signature notation. Maybe something with pointies:

multi trans (*$_) {
    when -> Str $x { ... }
    when -> Int $x { ... }
    default { ... }
}

but we'd have to tell "when" not to look for a second block after
evaluating its condition, and the pointy would have to be smart enough
to realize it was being passed a tuple/arg list and not just try to
bind it to the first parameter. Personally, I think the situation
will arise seldom enough that special syntax is not warranted, and a
general sig match via the smart operator is sufficient. Though that
approach also mandates a bit of kludginess to make sure the signature's
bindings persist across the block and no further. Though maybe that
falls out naturally if we include optional arrow on case values and
assume an implicit *$_ as the when target:

multi trans (*$_) {
    when *$_ -> Str $x { ... }
    when *$_ -> Int $x { ... }
    default { ... }
}

In that case, the only oddity is that the selection of the case includes
the success of binding it to the argument. And also the extra implied *
on the case argument to flatten the tuple.

This is also all subject to the notion that we might have types that
are essentially named type tuples, probably as cases in a union type.
In such cases you can match the type name against the entire tuple
and then bind the args without typing them:

multi trans (Tree *$_) {
    when Leaf -> $x { ... }
    when Node -> $x,$y,$z { ... }
    default { !!! }
}

But I'm still working my way through oodles of possible syntaxes
for declarative tree types and parametric types, so don't quote
me on this. And in general people will want to declare these as
separate subs unless they have some particular reason for wanting to
force evaluation order, which is somewhat suspect in the first place
if you're using declared tree types.
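
To make the shape of that last example concrete, here is a loose Python analogue. It is only a sketch: Leaf, Node and total are hypothetical names, the cases are tried in written order, and the fields are bound by hand once the type name has matched.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    value: int

@dataclass
class Node:
    left: object
    value: int
    right: object

def total(tree):
    """Sum all values in a tree, choosing the case by the type name."""
    if isinstance(tree, Leaf):
        x = tree.value                                  # like: when Leaf -> $x
        return x
    if isinstance(tree, Node):
        x, y, z = tree.left, tree.value, tree.right     # like: when Node -> $x,$y,$z
        return total(x) + y + total(z)
    raise TypeError(f"not a tree: {tree!r}")            # like: default { !!! }

tree = Node(Leaf(1), 2, Node(Leaf(3), 4, Leaf(5)))
print(total(tree))
```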

Larry

Larry Wall

Oct 19, 2005, 6:24:26 PM

to perl6-l...@perl.org

On Wed, Oct 19, 2005 at 03:02:14PM +0200, TSa wrote:
: HaloO,
:
: Juerd wrote:
: >Luke Palmer skribis 2005-10-18 11:57 (-0600):
: >
: >>It looks nicer if you use the indirect object form:
: >> trans "string": [
: >> <h e> => "0",
: >> ];
: >
: >
: >It'd also look very nice with optional parens:
: >
: > "string".trans [ <h e> => "0" ];
: >
: >Or is it not yet time to resuggest that? :)
:
: I like it. Given enough Meta Information---namely the structural
: arrow type---the .trans could be parsed as postfix op that returns
: a prefix op. Otherwise you get a 'two terms in a row' *syntax* error!
:
: (($ &) $)

I think it'd be somewhat unfortunate if we started trying to guess which
.trans was eventually going to get called and change the parse based
on that.

: The left item is actually calculated at compile time from string
: interpolation. The $ on the right is an itemized pair. Further
: expanded we get
:
: ((&.($) & :$)
:
: or perhaps
:
: (&.($).&.(:$)

This notation is unfamiliar to me.

: BTW, let's assume the non-invocant param of .trans were called $foo.
: Would in the above case +($foo.key) == 2? And I guess the parens
: could be dropped because .key binds tighter than prefix:<+>, right?
: I mean the type of the key in the pair is an array of compile time
: strings. Or is that not preserved?

Yes, that should be preserved, seems to me.

Larry