Snapshottable re-iterable iterators

Beni Cherniavsky

Jun 17, 2003, 8:29:14 AM

From time to time I wanted to iterate the same iterator more than once
- but destructive iterators don't allow this. So I wrote a class for
wrapping destructive iterators. It gives you an iterable, whose
__iter__ goes over the values of the underlying iterator - but those
are only requested once. To make it fancy, I made sure that when you
release the iterable and iterate with all your iterators on it past
any point, no references remain to values older than it (I used a
linked list built from two-item lists).
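A minimal Python 3 sketch of such a wrapper, assuming the design described above. The names `ReIterable` and `_CellIterator` are hypothetical and this is not the code from reiter.py. Values live in a singly linked list of two-item `[value, next_cell]` lists. The wrapper keeps only the head cell and each iterator keeps only its current cell, so once the wrapper is dropped, cells that every iterator has passed become unreachable. (In Python 3, `.next()` is spelled `__next__` and called through the built-in `next()`.)

```python
class ReIterable:
    """Hypothetical sketch: make a one-shot iterator iterable many times.

    Values are cached in a singly linked list of two-item lists
    [value, next_cell]. The wrapper holds only the head cell and each
    iterator holds only its current cell, so after the wrapper is dropped,
    cells that every iterator has passed become garbage.
    """

    def __init__(self, iterable):
        self._source = iter(iterable)
        self._head = [None, None]  # sentinel cell; its value slot is unused

    def __iter__(self):
        # Every call starts a fresh iterator at the beginning of the list.
        return _CellIterator(self._source, self._head)


class _CellIterator:
    def __init__(self, source, cell):
        self._source = source
        self._cell = cell  # last cell this iterator has already returned

    def __iter__(self):
        return self

    def __next__(self):
        nxt = self._cell[1]
        if nxt is None:
            # No iterator has reached this point yet: request the value once.
            # If the source is exhausted, its StopIteration propagates.
            nxt = [next(self._source), None]
            self._cell[1] = nxt
        self._cell = nxt
        return nxt[0]


squares = ReIterable(x * x for x in range(4))
print(list(squares))  # [0, 1, 4, 9]
print(list(squares))  # [0, 1, 4, 9]  (served from the cache)
```

Note that while the wrapper itself is alive, its head cell keeps every fetched value reachable; the memory is reclaimed only as described, after the wrapper is released.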

Then I realized that I don't always want this multiple-iteration
ability from the start. I wanted (and built) an iterator that could be
"snapshotted" at any moment - and the "snapshot" is a new iterator,
returning the same values from this point in time. It's similar in
spirit to the `fork` system call. It makes it easy to, e.g., implement
lookahead on a stream - you fork the stream iterator, iterate on the
"child" stream, and then continue with the "parent" stream.

Looking for a good method name for this snapshotting, I thought of
`__iter__` - that's where I ask for your comments. I got an iterator
that is destructive in the sense that calling ``.next()`` on it
advances it irreversibly - but it is an iterable at the same time and
calling ``.__iter__()`` on it creates a new iterator, with the same
future.
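A minimal Python 3 sketch of the proposed behaviour, for readers who want to see it concretely. The class name `SnapIter` is hypothetical and this is not the code from reiter.py; it assumes the linked-list-of-cells layout from the first paragraph. All copies share one chain of `[value, next_cell]` cells, so each value is requested from the underlying iterator only once, and `__iter__` returns a new iterator positioned at the same point instead of returning `self`.

```python
class SnapIter:
    """Hypothetical sketch: an iterator whose __iter__ returns a snapshot.

    All snapshots share a singly linked list of [value, next_cell] cells,
    so each underlying value is requested at most once.
    """

    def __init__(self, iterable):
        self._source = iter(iterable)
        self._cell = [None, None]  # sentinel: the last cell consumed so far

    def __next__(self):
        nxt = self._cell[1]
        if nxt is None:
            # The first copy to reach this point fetches the value for all.
            nxt = [next(self._source), None]  # StopIteration propagates
            self._cell[1] = nxt
        self._cell = nxt
        return nxt[0]

    def __iter__(self):
        # The design in question: return a copy with the same future, not self.
        copy = SnapIter.__new__(SnapIter)
        copy._source = self._source
        copy._cell = self._cell
        return copy


# Lookahead: peek ahead on a snapshot, then continue on the original.
stream = SnapIter("abcd")
peek = iter(stream)             # snapshot with the same future as stream
print(next(peek), next(peek))   # a b
print(next(stream))             # a
```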

So the question is: is it a good idea to violate the iterator protocol
in this way - being an iterator but returning a "copy" of self from
`__iter__` rather than self? On one hand, it seems cute. On the
other, it makes it hard to avoid the forking when you don't want it.
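One concrete way the forking becomes hard to avoid: many standard tools call `iter()` on their arguments for you. The common pairing idiom `zip(it, it)` relies on both arguments being the same iterator, but with a forking `__iter__`, `zip` receives two independent copies. `ForkOnIter` and `pairs` below are hypothetical stand-ins written for this illustration; to keep the sketch short, the class buffers values in a plain list (so, unlike the linked-list design, it never frees them).

```python
class ForkOnIter:
    """Hypothetical stand-in: an iterator whose __iter__ returns a copy."""

    def __init__(self, iterable, _shared=None, _pos=0):
        # _shared = [source_iterator, list of values fetched so far]
        self._shared = _shared if _shared is not None else [iter(iterable), []]
        self._pos = _pos

    def __next__(self):
        source, buf = self._shared
        if self._pos == len(buf):
            buf.append(next(source))  # each value is fetched only once
        value = buf[self._pos]
        self._pos += 1
        return value

    def __iter__(self):
        return ForkOnIter(None, self._shared, self._pos)  # a copy, not self


def pairs(it):
    """Group an iterator into pairs using the usual zip(it, it) idiom."""
    return list(zip(it, it))


print(pairs(iter("abcd")))        # [('a', 'b'), ('c', 'd')]
print(pairs(ForkOnIter("abcd")))  # [('a', 'a'), ('b', 'b'), ('c', 'c'), ('d', 'd')]
```

With a forking `__iter__`, the only way to share one position is to avoid anything that calls `iter()` and to call `next()` directly.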

If I go with a method other than `__iter__`, the best
alternatives seem to be `copy` and `fork`.

The code is available at:

http://www.technion.ac.il/~cben/python/reiter.py

If you didn't understand what I'm talking about, the doctest example
there should make it clear.

--
Beni Cherniavsky <cb...@tx.technion.ac.il>
If I don't hack on it, who will? And if I don't GPL it, what am I?
And why not now?

Michael Chermside

Jun 17, 2003, 11:54:43 AM

Beni Cherniavsky writes:
[... should the "snapshot-this-iterator" function be named __iter__? ...]

> On one hand, it seems cute. On the
> other, it makes it hard to avoid the forking when you don't want it.

I think you've summed it up nicely yourself in the above quote.

Argument 1: it makes it hard to avoid the forking when you don't want it.
Therefore, you should avoid the name "__iter__".

Argument 2: It seems cute.
Therefore, you should avoid the name "__iter__". "Cute" is a red flag
here warning you that it's a dangerously misleading choice of name.

So call it "snapshot()", or "copy()", but not "__iter__()".
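A sketch of this suggestion in Python 3: a conventional iterator whose `__iter__` returns `self`, plus an explicit `snapshot()` method. The class name `SnapshotIterator` is hypothetical; the shared chain of `[value, next_cell]` cells is an assumption that mirrors the linked list described in the original post. A `for` loop now consumes the iterator as usual, and forking happens only when explicitly requested.

```python
class SnapshotIterator:
    """Hypothetical sketch: a normal iterator plus an explicit snapshot()."""

    def __init__(self, iterable):
        self._source = iter(iterable)
        self._cell = [None, None]  # sentinel: last cell consumed

    def __iter__(self):
        return self  # standard iterator protocol: no implicit fork

    def __next__(self):
        nxt = self._cell[1]
        if nxt is None:
            # Fetched once from the source, then shared by all snapshots.
            nxt = [next(self._source), None]
            self._cell[1] = nxt
        self._cell = nxt
        return nxt[0]

    def snapshot(self):
        """Return an independent iterator with the same future values."""
        twin = SnapshotIterator.__new__(SnapshotIterator)
        twin._source = self._source
        twin._cell = self._cell
        return twin


it = SnapshotIterator(range(5))
next(it)                 # consume 0
ahead = it.snapshot()
print(list(ahead))       # [1, 2, 3, 4]  (only the snapshot is consumed)
print(next(it))          # 1
```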

-- Michael Chermside

Erik Max Francis

Jun 17, 2003, 4:40:00 PM

Beni Cherniavsky wrote:

> From time to time I wanted to iterate the same iterator more than once
> - but destructive iterators don't allow this. So I wrote a class for
> wrapping destructive iterators. It gives you an iterable, whose
> __iter__ goes over the values of the underlying iterator - but those
> are only requested once.

Neat!

> So the question is: is it a good idea to violate the iterator protocol
> in this way - being an iterator but returning a "copy" of self from
> `__iter__` rather than self? On one hand, it seems cute. On the
> other, it makes it hard to avoid the forking when you don't want it.
>
> If I go with a method other than `__iter__`, the best
> alternatives seem to be `copy` and `fork`.

__iter__ is a confusing name, since __iter__ is already the name of the
special method that returns an iterator.

fork and copy also seem ill-advised to me, since both have different
meanings, even in Python (re: os.fork, copy.copy). I would think reiter
would be fine.

--
Erik Max Francis && m...@alcyone.com && http://www.alcyone.com/max/
__ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
/ \ Punctuality is the virtue of the bored.
\__/ Evelyn Waugh

David Abrahams

Jun 17, 2003, 9:22:18 PM

to Andy Koenig

Beni Cherniavsky <cb...@techunix.technion.ac.il> writes:

> From time to time I wanted to iterate the same iterator more than once
> - but destructive iterators don't allow this. So I wrote a class for
> wrapping destructive iterators. It gives you an iterable, whose
> __iter__ goes over the values of the underlying iterator - but those
> are only requested once. To make it fancy, I made sure that when you
> release the iterable and iterate with all your iterators on it past
> any point, no references remain to values older than it (I used a
> linked list built from two-item lists).
>
> Then I realized that I don't always want this multiple-iteration
> ability from the start. I wanted (and built) an iterator that could be
> "snapshotted" at any moment - and the "snapshot" is a new iterator,
> returning the same values from this point in time. It's similar in
> spirit to the `fork` system call. It makes it easy to, e.g., implement
> lookahead on a stream - you fork the stream iterator, iterate on the
> "child" stream, and then continue with the "parent" stream.
>
> Looking for a good method name for this snapshotting, I thought of
> `__iter__` - that's where I ask for your comments. I got an iterator
> that is destructive in the sense that calling ``.next()`` on it
> advances it irreversibly - but it is an iterable at the same time and
> calling ``.__iter__()`` on it creates a new iterator, with the same
> future.

This is a very nice solution to a problem I once wrestled with... if
you don't mind the memory cost, of course!

> So the question is: is it a good idea to violate the iterator protocol
> in this way - being an iterator but returning a "copy" of self from
> `__iter__` rather than self? On one hand, it seems cute. On the
> other, it makes it hard to avoid the forking when you don't want it.

When don't you want it (other than to save memory)?

After all, normally, once you execute:

    for x in iterable:
        ...

iterable is thereafter useless. I think semantically, most code would
never be able to detect the difference.
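One common pattern where the difference is detectable: consuming part of an iterator in a `for` loop, breaking out, and then continuing with the same iterator (for example, skipping a header). With a forking `__iter__`, the loop runs on a copy, so the original has not advanced. `CopyingIter` and `body_after_header` are hypothetical names for this illustration; the class is a deliberately minimal stand-in for the proposed design that buffers values in a list.

```python
class CopyingIter:
    """Hypothetical minimal stand-in: __iter__ returns a copy, not self."""

    def __init__(self, iterable, _shared=None, _pos=0):
        # _shared = [source_iterator, list of values fetched so far]
        self._shared = _shared if _shared is not None else [iter(iterable), []]
        self._pos = _pos

    def __next__(self):
        source, buf = self._shared
        if self._pos == len(buf):
            buf.append(next(source))  # fetch each value once
        self._pos += 1
        return buf[self._pos - 1]

    def __iter__(self):
        return CopyingIter(None, self._shared, self._pos)


def body_after_header(lines):
    """Skip lines up to and including the first blank one; return the rest."""
    for line in lines:
        if line == "":
            break
    return list(lines)  # relies on the loop having advanced `lines`


msg = ["Subject: hi", "", "hello", "bye"]
print(body_after_header(iter(msg)))         # ['hello', 'bye']
print(body_after_header(CopyingIter(msg)))  # ['Subject: hi', '', 'hello', 'bye']
```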

> If I go with a method other than `__iter__`, the best
> alternatives seem to be `copy` and `fork`.

I think you've done exactly the right thing. Beautiful idea!

Incidentally, Andrew Koenig once invented a similar iterator over
linked lists for C++.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com