Re: Pipeline Performance

Aaron Sherman

unread,

Sep 1, 2004, 10:29:49 AM9/1/04

to Sean O'Rourke, Perl6 Language List

On Tue, 2004-08-31 at 14:11, Sean O'Rourke wrote:
> At Tue, 31 Aug 2004 13:23:04 -0400,
> a...@ajs.com (Aaron Sherman) wrote:
> > I would think you actually want to be able to define grep, map, et al.
> > in terms of the mechanism for unraveling, and just let the optimizer
> > collapse the entire pipeline down to a single map.
>
> Even for map and grep this is a bit trickier, since map can produce
> zero or more values for each input value, and calls its body in list
> context, whereas grep produces zero or one value, and gets called in
> scalar context.

You're confusing two stages of grep's operation. grep's body is called
in boolean context, but that has NOTHING to do with the return value,
which is a list of zero or one elements like so:

sub grep (&code, *@list) {
map { if code($_) { $_ } else { () } } @list;
}

That said, I skipped over a MAJOR point in my original reply, and it
must be said that in the case of map, you need iterators in order to do
the job correctly. map actually needs to return an object which, while
it can be treated as a list, is actually just an iterator object which
remembers the input list and transform closure.

Once you do that, you get cheap pipelining, as the major cost isn't the
construction of the temporary lists in Perl 5, it's POPULATING the
temporary lists. Constructing an iterator in Perl 6 means no population.

--
☎ 781-324-3772
✉ a...@ajs.com
☷ http://www.ajs.com/~ajs

Rod Adams

unread,

Aug 31, 2004, 10:36:25 PM8/31/04

to Aaron Sherman, Perl6 Language List

Aaron Sherman wrote:

>On Mon, 2004-08-30 at 16:34, Rod Adams wrote:
>
>
>
>>@x = @y ==> map lc ==> grep length == 4;

>>
>>
>
>I would think you actually want to be able to define grep, map, et al.
>in terms of the mechanism for unraveling, and just let the optimizer
>collapse the entire pipeline down to a single map.
>

>To propose one way of doing it (and really just a simple example off the
>top of my head which may not be the best idea...):
>
> macro grep(&cond, *@list) is mapper {
> if cond($_) { ($_) } else { () }
> }
> macro map(&cond, *@list) is mapper {
> cond($_)
> }
>
>Which would do two things:
>
>1. Define a subroutine of the same name that does the full map:
>
> sub grep (&cond, *@list) is mapper {
> my @result;
> for @list -> $_ { push @result, $_ if cond($_) }
> return @result;
> }
>
>2. Populates an optimizer table with just the macro form
>
>When you see C<grep {...} map {...} grep {...}> now, you can just
>consult that table and determine how much of the pipeline can be
>transformed into a single map.
>
>There, you're done. No more pipeline overhead from list passing.
>
>Now, of course you still have things like sort where you cannot operate
>on single elements, but those cases are more difficult to correctly
>optimize, and lazy lists won't help those either.
>
>
>
So if I was to write my own function, and wanted it to be in the middle
of a pipeline, but I wanted it to have a lazy evaluation feature, how
would I write that?

For reference, let's use the following problem:

Take as input a list of numbers. For every five numbers, return the
median element of that grouping. If the total list length is not
divisible by five, discard the rest. (Not the most useful, but this is
just example code).

Basic sub to do this:

sub MediansBy5 (*@list) {
my @result;
while @list.length >= 5 {
push @result, (sort @list.splice(0,5))[2];
}
return @result;
}

I would suspect that this would not be evaluated in a lazy manner,
especially because I'm pooling the results in @result.

One solution I see to this would be to have a "lazy return" of some
kind, where you can send out what results you have so far, but not
commit that your execution is over and still allow further results to be
posted. For lack of better word coming to mind, I'll call a "lazy
return" C<emit>.

Example above becomes:

sub MediansBy5 (*@list) {
while @list.length >= 5 {
emit (sort @list.splice(0,5))[2];
}
}

And then you could have the relatively simple:

sub grep (&cond, *@list) {
for @list -> $_ { emit $_ if cond($_); }
}

sub map (&func, *@list) {
for @list -> $_ { emit func($_); }
}

Thoughts?

-- Rod Adams

Sean O'Rourke

unread,

Aug 31, 2004, 2:10:35 PM8/31/04

to Perl6 Language List

At Tue, 31 Aug 2004 13:23:04 -0400,
a...@ajs.com (Aaron Sherman) wrote:

> I would think you actually want to be able to define grep, map, et al.
> in terms of the mechanism for unraveling, and just let the optimizer
> collapse the entire pipeline down to a single map.

Even for map and grep this is a bit trickier, since map can produce

zero or more values for each input value, and calls its body in list
context, whereas grep produces zero or one value, and gets called in

scalar context. So you'd need something like a full call and return
prototype for each mapping function, e.g.:

Function Return context Argument context
-------- -------------- ----------------
-> $a { $a + 2 } ($y) ($x)
grep(&block) ($y is optional) ($x)
map(&block) (*@result) ($arg)

Then your loop merging macro could deconstruct these into the
appropriate kind of loop (using foreach and pushing single items only
to make intention clear):

@a ==> map &b ==> @c
==>
foreach $a (@a) { foreach $b (map_item(&b, $a)) { push @c, $b } }

@a ==> (&b = -> $a { $a + 2 }) ==> @c
==>
foreach $a (@a) { push @c, b($a) }

@a ==> grep \&b ==> @c
==>
foreach $a (@a) { foreach $b (grep_item(&b, $a)) { push @c, $b } }

where "map_item" and "grep_item" are the single-element mapper
functions defining map and grep. I think that both the context and
the number of items consumed/produced could be gathered from
prototypes, so the only restrictions for mapping functions would be
(1) having a prototype available at definition time, and (2) being
side-effect-free.

/s

Larry Wall

unread,

Aug 30, 2004, 6:20:00 PM8/30/04

to perl6-l...@perl.org

On Mon, Aug 30, 2004 at 03:34:20PM -0500, Rod Adams wrote:
: My question is, is there anything that can be done within Perl 6 to help
: alleviate this issue.

All lists function lazily if they can in Perl 6.

Larry

Jonadab the Unsightly One

unread,

Sep 18, 2004, 6:40:33 PM9/18/04

to perl6-l...@perl.org

Rod Adams <r...@rodadams.net> writes:

> One solution I see to this would be to have a "lazy return" of some
> kind, where you can send out what results you have so far, but not
> commit that your execution is over and still allow further results to
> be posted. For lack of better word coming to mind, I'll call a "lazy
> return" C<emit>.
>
> Example above becomes:
>
> sub MediansBy5 (*@list) {
> while @list.length >= 5 {
> emit (sort @list.splice(0,5))[2];
> }}

That has a certain elegance.

It is worth noting that emit is shorter than return, so people are
going to be tempted to use it when they don't really need the
laziness. Obviously it won't be a full substitute for return because
by definition it doesn't have the (often desirable) side-effect of
terminating the current function. But I can see people writing things
like this...

sub foo (some arguments) {
do_stuff();
emit some value(s);
}

At the right brace there there is (if I understand correctly what
hasn't changed from Perl5) an implicit return, which returns the value
of the last evaluated thing in the block, which in this case would be
the emit. So, a couple of questions:

1. What if anything do you propose is the evaluation value of emit?
2. Should a subsequent implicit return behave differently than usual
if some values have already been emitted?

If the answer to question 1 is a more-or-less nonexistant value, then
question 2 might lose its importance.

One other question: Would emit behave differently if the sub is
called in a non-list context, such as void context or a scalar
context?

--
$;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}}
split//,"ten.thgirb\@badanoj$/ --";$\=$ ;-> ();print$/

Luke Palmer

unread,

Sep 18, 2004, 6:56:34 PM9/18/04

to Jonadab the Unsightly One, perl6-l...@perl.org

Jonadab the Unsightly One writes:
> Rod Adams <r...@rodadams.net> writes:
>
> > One solution I see to this would be to have a "lazy return" of some
> > kind, where you can send out what results you have so far, but not
> > commit that your execution is over and still allow further results to
> > be posted. For lack of better word coming to mind, I'll call a "lazy
> > return" C<emit>.
> >
> > Example above becomes:
> >
> > sub MediansBy5 (*@list) {
> > while @list.length >= 5 {
> > emit (sort @list.splice(0,5))[2];
> > }}

That's actually a very good idea. That's why Perl 6 has it :-)

sub MediansBy5 (*@list) {
gather {
while @list >= 5 { # there's no .length; it's .elems
take (sort @list.splice(0,5))[2];
}
}

C<gather> returns a list of everything that was C<take>n inside of it.
It does this by building a coroutine out of its argument, so it works
lazily.

Luke

Rod Adams

unread,

Sep 18, 2004, 9:05:55 PM9/18/04

to perl6-l...@perl.org

Luke Palmer wrote:

Better documentation on gather/take is merited.

Before starting this thread, I did a search on gather/take, thinking it
might be what I needed, but all I could find was in A12 :

"We snuck in an example the new |gather|/|take| construct. It is still
somewhat conjectural."

This was not very informative.
Not to mention, it should be in S4/S9, not A12.

I would question the need for C<gather>, however. Could not a lone
C<take>/C<emit> force the return value of the enclosing routine/closure
to be a lazy list, and here's a few values to get things started?

To address Jonadab's questions:

> 1. What if anything do you propose is the evaluation value of emit?

The lazy list being constructed. Maybe a reference to it instead.

> 2. Should a subsequent implicit return behave differently than usual
> if some values have already been emitted?

This is a bit harder, but I would think it should emit it's parameters,
then exit the routine.
The downside would be it forces list context where it might not have
existed before, but it's the most consistent thought I have.

> One other question: Would emit behave differently if the sub is called
> in a non-list context, such as void context or a scalar context?

My thought is that it returns a lazy list, and P6 rules for converting
such a thing into a void or scalar would apply.

-- Rod Adams.

Luke Palmer

unread,

Sep 18, 2004, 11:40:02 PM9/18/04

to Rod Adams, perl6-l...@perl.org

Rod Adams writes:
> Better documentation on gather/take is merited.

Without a doubt.

> I would question the need for C<gather>, however. Could not a lone
> C<take>/C<emit> force the return value of the enclosing routine/closure
> to be a lazy list, and here's a few values to get things started?

C<gather> is necessary. Coroutines are most useful in recursion, so you
need to know to which gather the emit is going. "From the current
function" isn't good enough, because of this case.

Luke

> >2. Should a subsequent implicit return behave differently than usual
> >if some values have already been emitted?
>
> This is a bit harder, but I would think it should emit it's parameters,
> then exit the routine.
> The downside would be it forces list context where it might not have
> existed before, but it's the most consistent thought I have.

Also gather/take addresses this problem, as there's an explicit return:
the result of the gather. You can also return something else if you see
fit.

I just think it's a much more versatile and controlled construct than
the more implicit "emit". I like to have control over my programs.

Luke

Jonadab the Unsightly One

unread,

Sep 19, 2004, 4:38:54 PM9/19/04

to perl6-l...@perl.org

Luke Palmer <lu...@luqui.org> writes:

> That's actually a very good idea. That's why Perl 6 has it :-)
>
> sub MediansBy5 (*@list) {
> gather {
> while @list >= 5 { # there's no .length; it's .elems
> take (sort @list.splice(0,5))[2];
> }
> }
>
> C<gather> returns a list of everything that was C<take>n inside of it.
> It does this by building a coroutine out of its argument, so it works
> lazily.

Okay. I have to go back and reread what has been said about coroutines now.

Michele Dondi

unread,

Sep 21, 2004, 6:21:32 AM9/21/04

to perl6-l...@perl.org

On Sat, 18 Sep 2004, Luke Palmer wrote:

>>> Example above becomes:
>>>
>>> sub MediansBy5 (*@list) {
>>> while @list.length >= 5 {
>>> emit (sort @list.splice(0,5))[2];
>>> }}
>
> That's actually a very good idea. That's why Perl 6 has it :-)
>
> sub MediansBy5 (*@list) {
> gather {
> while @list >= 5 { # there's no .length; it's .elems
> take (sort @list.splice(0,5))[2];
> }
> }
>
> C<gather> returns a list of everything that was C<take>n inside of it.
> It does this by building a coroutine out of its argument, so it works
> lazily.

I suppose that this construct may be used in a wider range of circumstance
and will allow some other extra features too (C<untake>?), OTOH the
simplicity ogf C<emit> was indeed striking. The cmt about its short length
in comparison with C<return>'s one is worth taking into account though...

Michele
--
l'Italia e' una penisola bagnata da tre mari e prosciugata da tremonti
- "chronos12" su it.hobby.umorismo

Michele Dondi

unread,

Sep 21, 2004, 6:28:08 AM9/21/04

to perl6-l...@perl.org

On Sat, 18 Sep 2004, Rod Adams wrote:

>> 2. Should a subsequent implicit return behave differently than usual if
>> some values have already been emitted?

It seems clear to me that behind the scenes there should be a "stack" into
which C<emit>ted stuff is pushed and that this is returned upon either
explicit or implicit C<return> at the end of the sub. Now a possible
issue, and maybe a more subtle one is that some care should be used when
one doesn't want to return anything but what has been C<emit>ted: and
explicit C<return> with no arguments should be needed in that case,
wouldn't it?

Michele
--
No one can ever predict all of the possible error conditions, of course;
as soon as we write idiot-proof code, along comes a better idiot.
But it's still worth making the attempt.
- Sherm Pendley in clpmisc (edited)

Rich Morin

unread,

Sep 26, 2004, 6:05:30 PM9/26/04

to

It looks as if "emit" could be implemented as a special case of the
more general technique. If no "reasonable" behavior can be defined for
use with co-routines, disallow that combination.

As a long-time user of the dbm_open family, I am in favor of having
simple interfaces for the most common cases. TMTOWTDI, as it were.

Touching on what I initially thought this thread might be about, I'd
like to know what Perl 6 brings to the table in terms of taking
advantage of CPU enhancements. That is, how will it allow me (nay,
encourage) programmers to take advantage of parallel processors,
pipelined (e.g., vector) processing, etc?

Main CPUs appear to be hitting a wall, when compared to their highly
parallel brethren. As a result, chip and system vendors are building
in a lot of CPU enhancements. For some applications, most of the
available processor power will come from outside of the CPU. So, it
would be nice to have the interpreter automagically take advantage of
available parallel hardware.

I assume that Damian's Quantum Computing ideas will be optimized in
this regard. I'd also like to see things like loop unrolling and
perhaps clever ways to do hash access and/or REs. What's on the table?

On a related tack, is there hope for "native" support for outboard
DSPs, graphics cards, etc? Apple's CoreImage and CoreVideo frameworks
will provide access to these in Tiger. It might be nicer to have a
Perlish (read, portable) Way To Do It.

Portable, cross-language solutions are emerging (e.g., OpenGL). It
may be that all Perl 6 should be doing is providing a switchboard.
Service providers (e.g., support libraries for graphics cards) can
subscribe, providing information on their services, limitations, and
performance characteristics. Apple does something like this, IIRC, in
their Driver Kit.

The switchboard would connect the Perl interpreter to any "subscribed"
service providers, defaulting to native processing if no hardware
assist is available. Apple matches user preferences to program
language capabilities (e.g., "Try for English, Klingon ...").
Something like this sort would be useful for situations where more
than one service provider exists.

-r