Some questions about multislices and other things


Ryan Richter

Mar 2, 2007, 1:33:50 PM
to perl6-l...@perl.org
As a little Perl 6 exercise I translated the Perl 5 Markov chain /
dissociated-press script from _The Practice of Programming_:

http://cm.bell-labs.com/cm/cs/tpop/markov.pl

Here's my Perl 6 attempt, with support for different prefix lengths.

my $n = 2; # prefix length
my $max = 10000; # max output words
my %state;
my $o = 0;
my @w = [;] "\n" xx $n # initial state
does { method adv(@w: $i){ @w <== $i; @w.shift; } };

for <>.slurp.split { # read each word of input
%state{@@w}.push: $_;
@w.adv: $_; # advance chain
}
%state{@@w}.push: "\n"; # add tail

@w »=» "\n"; # reset initial state
while ($_ = %state{@@w}.pick).say {
last if /\n/ or $o++ >= $max;
@w.adv: $_; # advance chain
}

Since pugs doesn't seem to have full multislice support yet, can anyone
comment on whether I have the syntax and semantics right here?

In particular:

- Does shift do what I'm expecting it to do with a multidim array?

- Am I using the @ and @@ sigils correctly? I think I could just use @@
everywhere (right?), but I tried to use it only where needed. Do I
need an @@ on the method invocant or anywhere else?

- Does @w »=» "\n" leave the multidimensionality intact?

- As I understand it, the way I have it written the anonymous mixin is
applied to the value in @w and not to the @w container. If I did
something like @w = 1,2,3 that would clobber the mixin, right? Does
the hyperop-assignment avoid that?

- Is there some concise syntax for attaching the mixin to the container,
like

my @w is (Array does {...}) = ...;

Is this the correct non-golf way to do it?

Role R {...}
class A is Array { does R }
my @w is A = ...;

- Can I call methods on <> or do I have to say $*ARGS?

- Can for <>.slurp.split {...} be trusted to not use a huge amount of
memory for large inputs?

Any commentary or suggestions for improvement are welcomed.

-ryan (rhr on #perl6)

Larry Wall

Mar 12, 2007, 8:41:27 PM
to perl6-l...@perl.org
On Fri, Mar 02, 2007 at 01:33:50PM -0500, Ryan Richter wrote:
: As a little Perl 6 exercise I translated the Perl 5 Markov chain /
: dissociated-press script from _The Practice of Programming_:
:
: http://cm.bell-labs.com/cm/cs/tpop/markov.pl
:
: Here's my Perl 6 attempt, with support for different prefix lengths.
:
: my $n = 2; # prefix length
: my $max = 10000; # max output words
: my %state;
: my $o = 0;
: my @w = [;] "\n" xx $n # initial sate
: does { method adv(@w: $i){ @w <== $i; @w.shift; } };

Note, as of today, append feed is now <<== rather than <==.

: for <>.slurp.split { # read each word of input
: %state{@@w}.push: $_;
: @w.adv: $_; # advance chain
: }
: %state{@@w}.push: "\n"; # add tail
:
: @w »=» "\n"; # reset initial state
: while ($_ = %state{@@w}.pick).say {

You maybe don't want each word on a separate line...

: last if /\n/ or $o++ >= $max;
: @w.adv: $_; # advance chain
: }
:
: Since pugs doesn't seem to have full multislice support yet, can anyone
: comment on whether I have the syntax and semantics right here?
:
: In particular:
:
: - Does shift do what I'm expecting it to do with a multidim array?

I have my doubts. Overloading that word to mean shifting the entire
dimension seems like it could cause conceptual collisions down the
road. There's a good chance it should just shift the value out of the
first Capture without removing the Capture from the list of Captures.

And a shaped array might be different yet, shifting off the entire slice
corresponding to the first element of the first dimension.

But in either case I wouldn't expect it to remove a dimension. Now,
it's possible that @@w.shift would remove the dimension, since it
treats the slice Capture as an individual element.

Rather than trying to maintain @w as a multidimensional array where
each slice ever only has one value, why not just use ordinary shift
and push on an ordinary array, and then %state{[;]@w} instead? Should
still have the theoretical advantage of storing the keys in a tree over
duplicating all the prefix keys. (For that reason I wouldn't go as far
as to recommend a hash keyed on tuples, which presumably would duplicate
prefixes.)
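
The design suggested above (a plain array advanced with shift and push, with the prefix words used as successive keys into a tree) can be sketched outside Perl 6. The following is a minimal Python illustration of that idea, not the Perl 6 multislice semantics under discussion. The names `build_chain`, `generate`, `NONWORD` and the `SUFFIXES` sentinel are hypothetical, chosen only for this sketch.

```python
import random
from collections import deque

NONWORD = "\n"        # sentinel marking start and end, as in the script above
SUFFIXES = object()   # hypothetical key under which a leaf stores its suffixes

def build_chain(words, n=2):
    """Build a nested-dict tree with one level per prefix word, so shared
    leading prefix words are stored once instead of copied into every key."""
    tree = {}
    prefix = deque([NONWORD] * n, maxlen=n)   # appending drops the oldest word
    for word in list(words) + [NONWORD]:      # trailing NONWORD is the "tail"
        node = tree
        for key in prefix:
            node = node.setdefault(key, {})
        node.setdefault(SUFFIXES, []).append(word)
        prefix.append(word)                   # advance the chain
    return tree

def generate(tree, n=2, max_words=10000, rng=random):
    """Walk the tree from the all-NONWORD prefix, picking random suffixes."""
    prefix = deque([NONWORD] * n, maxlen=n)
    out = []
    for _ in range(max_words):
        node = tree
        for key in prefix:
            node = node[key]
        word = rng.choice(node[SUFFIXES])
        if word == NONWORD:
            break
        out.append(word)
        prefix.append(word)
    return out
```

A tuple-keyed dictionary would be shorter to write, but it repeats the leading words in every key, which is the duplication the parenthetical remark above is concerned with.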

: - Am I using the @ and @@ sigils correctly? I think I could just use @@
: everywhere (right?), but I tried to use it only where needed. Do I
: need an @@ on the method invocant or anywhere else?

Should only need it where you want to distinguish chunky from smooth,
at least according to the current definition. But I'm not entirely
sure we aren't confusing things here.

: - Does @w »=» "\n" leave the multidimensionality intact?

Absolutely, that's part of why we distinguished the dwimmy variant from
the non-dwimmy.

: - As I understand it, the way I have it written the anonymous mixin is
: applied to the value in @w and not to the @w container. If I did
: something like @w = 1,2,3 that would clobber the mixin, right? Does
: the hyperop-assignment avoid that?

I don't think you want the mixin on the value, since @w.adv will be called
on the container, not on the value. An array container is not required
to be able to treat its value as a single object.

In any case, by precedence you're applying the mixin to the result of
the xx, not the result of the [;], which makes @w.adv even less
likely to work.
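
As a loose analogy only (Python has no container/value split of the kind discussed here), the difference between updating an object in place and replacing it outright can be sketched as follows. `Window` is a hypothetical class standing in for an array with an extra `adv` method; this says nothing definitive about how Perl 6 mixins behave.

```python
class Window(list):
    """A list with one extra method, standing in for an array with a mixin."""

    def adv(self, item):
        # Append the new word, then drop and return the oldest one.
        self.append(item)
        return self.pop(0)

w = Window(["\n", "\n"])
w.adv("the")                      # w now holds ["\n", "the"]

# In-place update: the same object, and its extra method, survives.
w[:] = ["\n"] * len(w)
still_has_adv = hasattr(w, "adv")

# Rebinding to a fresh plain list: the extra method is gone.
w = ["\n", "\n"]
lost_adv = not hasattr(w, "adv")
```

The in-place slice assignment plays the role of the element-wise reset, while the plain rebinding plays the role of an assignment that replaces the whole value.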

: - Is there some concise syntax for attaching the mixin to the container,
: like
:
: my @w is (Array does {...}) = ...;

Would probably work without the parens if we got the parsing right, though
you might have to be explicit about the anonymous role:

my @w is Array does role {...} = ...;

And since the @ implies "is Array" by default,

my @w does role {...} = ...;

might be made to work too.

Also, it's not clear how much we prefer "but" over "does" when attempting
to modify values, but that doesn't apply here.

: Is this the correct non-golf way to do it?
:
: Role R {...}
: class A is Array { does R }
: my @w is A = ...;

Should work if you don't capitalize role.

: - Can I call methods on <> or do I have to say $*ARGS?

I don't know if an empty list knows how to slurp. I think the <> is special
to just the prefix:<=> operator.

: - Can for <>.slurp.split {...} be trusted to not use a huge amount of
: memory for large inputs?

No, any .slurp is pretty much guaranteed to use memory. Also, split no
longer has a default. You probably want

for $*ARGS.comb {...}
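
The memory point is not specific to Perl 6: reading the whole input and then splitting it holds everything at once, whereas scanning for words incrementally needs only the current chunk. Here is a minimal Python sketch of the incremental approach, assuming whitespace-separated words; `iter_words` is a hypothetical helper name.

```python
import io

def iter_words(stream):
    """Yield whitespace-separated words one line at a time, so only the
    current line (not the whole input) is held in memory."""
    for line in stream:
        yield from line.split()

# Usage with an in-memory stream; a real run might pass sys.stdin instead.
sample = io.StringIO("the quick brown\nfox jumps\n")
words = list(iter_words(sample))
```

This only bounds memory by the longest line, so an input consisting of one enormous line would still be read in full.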

Larry

Ryan Richter

Mar 12, 2007, 10:48:41 PM
to perl6-l...@perl.org
On Mon, 12 Mar 2007 17:41:27 -0700, Larry Wall wrote:
> On Fri, Mar 02, 2007 at 01:33:50PM -0500, Ryan Richter wrote:
> : while ($_ = %state{@@w}.pick).say {
>
> You maybe don't want each word on a separate line...

They did that in the original for some reason. In the unlikely event
that I want to sit down and read the output I can use |fmt :)

> : - Does shift do what I'm expecting it to do with a multidim array?
>
> I have my doubts. Overloading that word to mean shifting the entire
> dimension seems like it could cause conceptual collisions down the
> road. There a good chance it should just shift the value out of the
> first Capture without removing the Capture from the list of Captures.
>
> And a shaped array might be different yet, shifting off the entire slice
> corresponding to first element of the first dimension.
>
> But in either case I wouldn't expect it to remove a dimension. Now,
> it's possible that @@w.shift would remove the dimension, since it
> treats the slice Capture as an individual element.

OK, that sounds good.

> Rather than trying to maintain @w as a multidimensional array where
> each slice ever only has one value, why not just use ordinary shift
> and push on an ordinary array, and then %state{[;]@w} instead? Should
> still have the theoretical advantage of storing the keys in a tree over
> duplicating all the prefix keys. (For that reason I wouldn't go as far
> as to recommend a hash keyed on tuples, which presumably would duplicate
> prefixes.)

Yeah, I wrote it that way to begin with. The version I posted was sort
of designed to maximize the number of language questions.

> : - Am I using the @ and @@ sigils correctly? I think I could just use @@
> : everywhere (right?), but I tried to use it only where needed. Do I
> : need an @@ on the method invocant or anywhere else?
>
> Should only need it where you want to distinguish chunky from smooth,
> at least according to the current definition. But I'm not entirely
> sure we aren't confusing things here.

What do you mean by that?

> : - As I understand it, the way I have it written the anonymous mixin is
> : applied to the value in @w and not to the @w container. If I did
> : something like @w = 1,2,3 that would clobber the mixin, right? Does
> : the hyperop-assignment avoid that?
>
> I don't think you want the mixin on the value, since @w.adv will be called
> on the container, not on the value. An array container is not required
> to be able to treat its value as a single object.
>
> In any case, by precedence you're applying the mixin to result of
> the xx, not the result of the [;], which makes @w.adv even less
> likely to work.

Yeah, that was a really weird idea... I have a lot of those in case you
can't tell :)

Should we just assume that any of those things replaces the immutable
list and therefore clobbers any mixins? I.e. a mixin on a value is part
of the immutableness?

> : - Can for <>.slurp.split {...} be trusted to not use a huge amount of
> : memory for large inputs?
>
> No, any .slurp is pretty much guaranteed to use memory. Also, split no

Right, I expected an eager slurp but a lazy split (comb). I envisioned
an eager split/comb consuming much more memory than an eager slurp, but
I guess that doesn't make much sense.

> longer has a default. You probably want
>
> for $*ARGS.comb {...}

Perfect, thanks.

-ryan
