Iterator semantics

Patrick R. Michaud

unread,

Sep 9, 2008, 3:10:04 PM9/9/08

to perl6-l...@perl.org

I think my question can be best understood by example -- what
does the following produce?

my @a = 1,2,3,4,5;
for @a { .say; @a = (); }

My question is whether the change to @a inside the for loop
affects the iterator created at the beginning of the for loop.
In other words, would the above produce "1\n2\n3\n4\n5\n" or
"1\n" ?

My followup question is then:

my @a = 1,2,3,4,5;
my @b = 6,7,8;

for @a,@b { .say; @b = (); }

I have more examples involving various aspects of list and
iterator semantics, but the answers to the above will help guide
my questions.

Pm

Larry Wall

unread,

Sep 9, 2008, 3:50:02 PM9/9/08

to perl...@perl.org, perl6-l...@perl.org

On Tue, Sep 09, 2008 at 10:10:04AM -0500, Patrick R. Michaud wrote:
: I think my question can be best understood by example -- what

At the moment the design of Perl 6 (unlike certain FP languages) is
that any dependence on the *degree* of laziness is erroneous, except
insofar as infinite lists must have *some* degree of laziness in order
not to use up all your memory immediately. But any finite list can
do any amount of batching it pleases, so there's no foolproof way to
tell a lazy finite list from an eager list. Modifying the inputs to
a lazy list is basically asking for undefined behavior.

This seems to me like the most multicore-friendly default, though
perhaps not the most programmer friendly. It seems to me that any
programming involving cyclic lazy feeds must somehow have an "eager"
in it somewhere to prevent indeterminacy, but I don't know how to
write a compiler that enforces that offhand.

Larry

TSa

unread,

Sep 9, 2008, 4:03:45 PM9/9/08

to perl6-l...@perl.org

HaloO,

Patrick R. Michaud wrote:
> My question is whether the change to @a inside the for loop
> affects the iterator created at the beginning of the for loop.

Since Larry said that single assignment semantics is the ideal
we should strive for, I would opt for the iterator being unaffected
by the assignment to @a. When this happens the singly assigned
former content of @a is snaphot by the iterator. This also preserves
the former lazyness state. If possible the optimizer factors
out the assignments after the loop.

Regards, TSa.
--

"The unavoidable price of reliability is simplicity" -- C.A.R. Hoare
"Simplicity does not precede complexity, but follows it." -- A.J. Perlis
1 + 2 + 3 + 4 + ... = -1/12 -- Srinivasa Ramanujan

John M. Dlugosz

unread,

Sep 10, 2008, 12:03:25 AM9/10/08

to perl6-l...@perl.org

My take on it:

The 'for' loop does bind $_ to alias each item of the list in turn.
But, "the list" is not synonymous with the variable named @a. However,
the = operator operates on "the list" itself, not the container,
replacing the elements in the existing Array (or whatever) object. So,
the first iteration through the loop would indeed replace all the
elements in the object being iterated over.

The semantics of the 'for' loop are sequential. It is a
programming-flow construct, not a data-manipulation construct. So
laziness within the list is not an issue. If some elements are
delayed-evaluation, they get overwritten so that evaluation is never done.

The follow-up question is more interesting. The operation of @a,@b
makes a new List object. Just like =after= calling $c=$a+$b it does not
matter if you change $a. Imagine + being lazy with thousand-digit
numbers so it might avoid actually doing it 'till you needed (if ever)
the answer. It would be unacceptable if it appeared to bind to the
original arguments indefinitely.
If the comma concatenation is "lazy" it needs to be on a deeper level so
it does not seem to depend on the arguments not changing afterwards. It
might, in particular, copy all the primitive components out of @b which
includes range objects and other lazy lists, or arrange a copy-on-write
sharing with @b. I've done this kind of stuff with my C++ classes, so
I'm familiar with the scenario. That is, give the outward appearance of
"value semantics" but share storage or defer work internally. @a,@b
produces a well-defined value.

So, the first example prints "1" only, and the second prints 1 through 8.

--John

John M. Dlugosz

unread,

Sep 10, 2008, 12:15:48 AM9/10/08

to perl6-l...@perl.org

TSa Thomas.Sandlass-at-vts-systems.de |Perl 6| wrote:
> Since Larry said that single assignment semantics is the ideal
> we should strive for, I would opt for the iterator being unaffected
> by the assignment to @a. When this happens the singly assigned
> former content of @a is snaphot by the iterator.

That's what my C++ vararray class does with iterators. That way they
don't go stale if you manipulate the array during iteration.

Iterators are independent objects in Perl 6 that can be used in other
ways. I think it vaguly specifies in the synopses that the for loop
will use the default iterator for that List, provided by a member for
that purpose.

You can have different iterator types that work in different ways. You
can create which ever kind you need for the purpose.

The synopses does not specify the details of iterators. But you can
logically deduce that the default iterator needs to provide for rw (not
ref?) binding, since the for loop is documented to do that, if the for
loop uses the default iterator. The meta-law is that things work like
in Perl 5 unless specified otherwise, so the for loop needs to continue
to track the changing contents of the List object.

But I imagine you can create an iterator with various options: snapshot
or tracking, rw, readonly, or ref, and other attributes.

So if you really wanted to depend on those semantics, you could write
something like

my $it = @a.iterator(:snapshot);
while =$it -> { body-of-loop }

You could allow adverbs on the 'for' loop to pass through to the call to
iterator(), but that is probably too much syntactic sugar for the value
it gives.

--John

Daniel Ruoso

unread,

Sep 9, 2008, 8:50:47 PM9/9/08

to Patrick R. Michaud, perl6-l...@perl.org

Ter, 2008-09-09 às 10:10 -0500, Patrick R. Michaud escreveu:
> I think my question can be best understood by example -- what
> does the following produce?
> my @a = 1,2,3,4,5;
> for @a { .say; @a = (); }

The problem actually becomes more evident with

my @a = 1,2,3,4,5;
for @a { .say; @a[5] = 'five' }

because the first code actually replaces the content of the variable @a,
while the second will call .postcircumfix(5), which itself returns a
container which is then STOREd with 'five'.

The basic difference is that in

for @a {...}

we take an iterator that point to the array that is stored at that
moment in @a. If we replace the array stored in the variable @a, that
iterator would still point to the original array.

The second example actually modifies the object stored in the variable
'@a'. And that is a different issue.

daniel

John M. Dlugosz

unread,

Sep 11, 2008, 12:31:32 AM9/11/08

to perl6-l...@perl.org

Daniel Ruoso daniel-at-ruoso.com |Perl 6| wrote:
>
> The second example actually modifies the object stored in the variable
> '@a'. And that is a different issue.
>

I disagree. The assignment operator, as opposed to the binding operator
(:=) is an operator called on the Array object. It is the same as
calling = on each element in turn -- it changes the contents without
changing the identity of the container.

Larry Wall

unread,

Sep 11, 2008, 7:13:41 PM9/11/08

to perl...@perl.org, perl6-l...@perl.org

On Tue, Sep 09, 2008 at 08:50:02AM -0700, Larry Wall wrote:
: At the moment the design of Perl 6 (unlike certain FP languages) is

: that any dependence on the *degree* of laziness is erroneous, except
: insofar as infinite lists must have *some* degree of laziness in order
: not to use up all your memory immediately. But any finite list can
: do any amount of batching it pleases, so there's no foolproof way to
: tell a lazy finite list from an eager list. Modifying the inputs to
: a lazy list is basically asking for undefined behavior.
:
: This seems to me like the most multicore-friendly default, though
: perhaps not the most programmer friendly. It seems to me that any
: programming involving cyclic lazy feeds must somehow have an "eager"
: in it somewhere to prevent indeterminacy, but I don't know how to
: write a compiler that enforces that offhand.

Perhaps there is some middle ground. I was thinking about the
relationship of iterators with reactive programming this morning, and
it occurred to me that we tend to think of iterators as having
two states when perhaps they have more. In the imperative mindset,
when you ask an iterator if it has more elements, it either says "yes" or
"no". But in the reactive mindset, the two valid answers are "yes" and
"I dunno yet; hang on...".

If we are to combine these paradigms, then iterators have at least three
valid answers:

No, there is nothing to return here, ever (see "EOF").
Yes, there's something here that is easy to return, and if you ask
me nicely I will return it immediately without blocking.
I don't know if there's anything to return because I'd have to ask
someone else, and that is likely to block.

So when you put something into a list context, some of the values
will be considered "easy", and some will be considered "hard".
The basic question is whether we treat those the same or differently
from a referential point of view. Obviously there's an argument for
consistency on the one hand, but on the other, it's rather like the
difference between value semantics and object semantics, which is
another paradigm crossing idea.

A given array may has some values that are easy and some that are
hard. The easy ones are the values that have already been calculated,
presumably, plus maybe any values that the first iterator spec says are
easy (and if those are all easy, go on to the next iterator spec).

My basic point is that it's easy to arrange single assignment semantics
for the easy values, and hard for the hard values. The thing that
makes it harder for the hard values is that you have to avoid having
two different iterators asking the same sub-iterator for values.

Suppose we have

my @a = 1..10; # assume easy
my @b = =$socket; # assume hard

for @a,@b {...}

If the for list sees @b as hard and just copies @b's iterator into its
own list for lazy evaluation, then it's going to end up iterating the
socket without loading @b. Contrariwise if @b is eagerly evaluated it
might clobber the other iterator in the for list. So those iterators
must be kept unified. It's not enough to copy out the iterators
from the array; access to the hard elements of the array must still
be modulated by the array container. A snapshot of the array container
will not do in this case.

So I think when you ask to interpolate an array into a list, the array
must return an iterator that refers to itself, not an iterator that
merely snapshots its values.

And I guess the fundamental underlying constraint is that a list cannot
be considered immutable unless its feeds can be considered immutable,
at least in some kind of idempotent sense. This conflicts with the
whole point of reactive programming, which is that you have to react
because don't know what's coming.

This seems like it's close to the fundamental difficulty we're trying
to solve here. And maybe the solution is much like the value/object
distinction, where lists get to *decide* for themselves where they
switch from easy eager immutable semantics to hard lazy reactive semantics.
And if we're pretty clear about the boundary between those, it
could work out much like the boundary between DFAable regexes and
side-effectful regexes as defined in S05. And maybe it's even the
same boundary in some theoretical sense.

As a first shot at that definition, I'll submit:

1 .. $n # easy
1 .. * # hard

On the other hand, I can argue that if the first expression is easy,
then the first $n elements of 1..* should also be considered easy, and
it's not hard till you try to get to the *. :)

It could also be that I'm confusing things here, of course, and that
1..* is something easy and immutable that nevertheless cannot be
calculated eagerly. More to think about...

Larry

Aristotle Pagaltzis

unread,

Sep 12, 2008, 5:46:05 AM9/12/08

to perl6-l...@perl.org

* Larry Wall <la...@wall.org> [2008-09-11 21:20]:

> As a first shot at that definition, I'll submit:
>
> 1 .. $n # easy
> 1 .. * # hard
>
> On the other hand, I can argue that if the first expression is
> easy, then the first $n elements of 1..* should also be
> considered easy, and it's not hard till you try to get to the
> *. :)
>
> It could also be that I'm confusing things here, of course, and
> that 1..* is something easy and immutable that nevertheless
> cannot be calculated eagerly. More to think about...

In some sense, from the reactive programming perspective `1..*`
is actually easier than `1..$n`. For the latter’s iterator the
answer to “do you have another element” implies a conditional
somewhere, whereas for the former’s it’s trivially “yes.”

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>

Eric Wilhelm

unread,

Sep 13, 2008, 3:16:27 AM9/13/08

to perl6-l...@perl.org

Hi Larry,

# from Larry Wall
# on Thursday 11 September 2008 12:13:

>So when you put something into a list context, some of the values
>will be considered "easy", and some will be considered "hard".
>The basic question is whether we treat those the same or differently

>from a referential point of view. ...

>The easy ones are the values that have already been calculated,

>presumably...

>
>Suppose we have
>
> my @a = 1..10; # assume easy
> my @b = =$socket; # assume hard
>
> for @a,@b {...}
>
>If the for list sees @b as hard and just copies @b's iterator into its
>own list for lazy evaluation, then it's going to end up iterating the
>socket without loading @b. Contrariwise if @b is eagerly evaluated it
>might clobber the other iterator in the for list. So those iterators
>must be kept unified. It's not enough to copy out the iterators
>from the array; access to the hard elements of the array must still
>be modulated by the array container. A snapshot of the array
> container will not do in this case.

If a lazy list is an array with an iterator welded onto the end, then:

1 .. 10 is just: 1, 2, 3, weld_here iter(4, thru => 10)
1 .. * is just: 1, 2, 3, weld_here iter(4)

And because an iterator doesn't go backwards, the array has to remember
the previous values to allow me to maintain the "just an array"
abstraction, so after I asked for @a[3], the internal state is like:

1 .. * becomes: 1, 2, 3, 4, weld_here iter(5)

>As a first shot at that definition, I'll submit:
>
> 1 .. $n # easy
> 1 .. * # hard
>
>On the other hand, I can argue that if the first expression is easy,
>then the first $n elements of 1..* should also be considered easy, and
>it's not hard till you try to get to the *. :)
>
>It could also be that I'm confusing things here, of course, and that
>1..* is something easy and immutable that nevertheless cannot be
>calculated eagerly.

But you can take a copy of the "@a = 1 .. *" however you like as long
as "just an array" holds such that @a[41] is always the same regardless
of whether you (internally) have to kick the welded-on iter 27 times to
get that element or have already passed it. So in this case you could
(internally) end up with two copies each holding their own iter() in
different states while still giving the same result at the "just an
array" level. But this is just an optimization on the general case you
stated of "must be modulated by the array container" and it is an
optimization which is only possible for predictable iters.

So, perhaps the case is:

1 .. $n # bounded and predictable
1 .. * # unbounded and predictable
=$handle # bounded and unpredictable
=$socket # unbounded and unpredictable

By "unpredictable", I'm meaning simply that the value of a given element
cannot be calculated independently by replicating a (pure) function
f($i). Perhaps a filehandle isn't a thorough example of that though?

What about:

my @a = 1..*;
my @b = =$socket;
for @a,@b {...; @a = something(); ...}

or:

my @a = 1..*;
my @b = =$socket;
my @c = (@a, @b);
for @c {...; @a = something(); ...}

In the first case, changing @a changes the for() iterator but in the
second, the for() will still be nibbling on 1..*, right? So the
welded-on-iter is basically by-reference until I do something under
the "just an array" paradigm which breaks the weld?

@a = 1..*;
@a[-1] = 9; # @a = (9) now ?

That's just my thoughts from what I understand and sorry if introducing
welding into the analogy causes bits of molten metal to go flying
around ;-)

--Eric
--
"You can't win. You can't break even. You can't quit."
--Ginsberg's Restatement of the Three Laws of Thermodynamics
---------------------------------------------------
http://scratchcomputing.com
---------------------------------------------------

Daniel Ruoso

unread,

Sep 12, 2008, 6:33:44 PM9/12/08

to Larry Wall, perl...@perl.org, perl6-l...@perl.org

Qui, 2008-09-11 às 12:13 -0700, Larry Wall escreveu:
> And I guess the fundamental underlying constraint is that a list cannot
> be considered immutable unless its feeds can be considered immutable,
> at least in some kind of idempotent sense. This conflicts with the
> whole point of reactive programming, which is that you have to react
> because don't know what's coming.

This is actually something I was talking in the IRC this week. The
amount of polymorphism Perl 6 supports makes it quite impossible to
detect if the feeds can be considered immutable or not (in terms of
concept, I mean, runtime tips could allow optimizations).

But one thing needs to be clear, "List" is immutable as a type, meaning
that the API for List only allows you to read from it, not write to it,
but it doesn't necessarily means that it is immutable as an instance,
because the List might have a "live" backend.

Since List is not a native type, the interpreter doesn't really have any
control on what it does to provide its values, and that's what I mean by
saying that we can't infer if the feeds can or cannot be considered
immutables.

> This seems like it's close to the fundamental difficulty we're trying
> to solve here. And maybe the solution is much like the value/object
> distinction, where lists get to *decide* for themselves where they
> switch from easy eager immutable semantics to hard lazy reactive semantics.
> And if we're pretty clear about the boundary between those, it
> could work out much like the boundary between DFAable regexes and
> side-effectful regexes as defined in S05. And maybe it's even the
> same boundary in some theoretical sense.

The problem is that this concept should apply to the entire chain, this
means that it can only be considered easy if all the feeds on the chain
are easy, and it is too easy for it to be considered hard... for
instance, is a 'map' considered easy or hard?

In the end, that means that most of the time "easy" feeds will be made
dirty by hard feeds and all the chain will be made lazy, so we have
little gain.

In SMOP, I'm probably going to presume that everything needs to be lazy,
then even:

my @o = grep { /\d+/ } =$*IN;
my @a = (1,2,(3,4,@o));
my @b = (1,2,(3,4,@a));

Will still allow @b to be seen in slice context, where you would see @a
also in slice context, because @a was not eagerly evaluated when
composing @b, and eventually @o might never be evaluated.

I think this is consistent with the spec that says somewhere that the
only operators that imply eager evaluation are the short-circuit boolean
operators (besides the 'eager' operator, of course, and the use of lazy
values in void context).

Of course the spec only says that it should be lazy with the feed
operators, but in SMOP I tend to think that all this evaluation will be
lazy.

daniel

Ashley Winters

unread,

Sep 13, 2008, 5:53:10 AM9/13/08

to perl6-l...@perl.org

On Fri, Sep 12, 2008 at 11:33 AM, Daniel Ruoso <dan...@ruoso.com> wrote:
> In SMOP, I'm probably going to presume that everything needs to be lazy,
> then even:
>
> my @o = grep { /\d+/ } =$*IN;
> my @a = (1,2,(3,4,@o));
> my @b = (1,2,(3,4,@a));

Can only one array "have" the iterator? If not, that makes for =$*IN {
=$*IN if $needs_skipping } look like a forbidden idiom. On the other
hand, if it's really really lazy, then I could run the following two
statements without reading anything out of the iterator

my @digitlike = grep { /\d+/ } =$*IN;
my @hexlike = grep { /<hexadecimal>+/ } =$*IN;

But, if the taking of an iterator causes the array to autoactualize
(made this word up to parallel autovivify), what happens to the other
lazy array? Does the iterator buffer its contents to be served up to
other readers, or is the $iterator.next function
first-come-first-serve? Or is there some possible "batching" behavior
that allows one of the iterators to gobble more than is used?

given a non-deterministic iterator which would return <1 2 3 a b c 5 6
7 d e f 8 9 0>
say @digitlike[0..4];
say @hexlike[0..4]; # what's this? <1 2 3 a b> or <7 d e f 8> or
<> or undefined behavior?

In the event you picked <1 2 3 a b>, when did the @hexlike grep block
get called?

- Ashley Winters