
resetting an iterator


Jan

May 20, 2009, 2:35:47 PM
Wouldn't it be easy for Python to implement generating functions so
that the iterators they return are equipped with a __reset__() method?

Here is the context of this question.

Python documentation defines an "iterator" as an object ITERATOR
having methods __next__() and __iter__() such that the call
ITERATOR.__iter__() returns the object itself, and once a call
ITERATOR.__next__() raises StopIteration, every subsequent such call
does the same.

Python iterators model "generating Turing machines", i.e.
deterministic Turing machines which take no input, have a
separation symbol # in the output alphabet, and have a one-way-infinite
output tape on which the output head prints and moves only to the
right. The machine may have a halting state. A word is said to be
"generated by M" if it appears on the output tape delimited by
separation symbols # (independently of whether M halts or computes
infinitely). Generating Turing machines provide a characterization of
recursively enumerable languages: a language is generated by a
generating Turing machine iff it is accepted by a Turing machine.

Turing machines can take as input and run other Turing Machines --
similarly Python functions can take other functions (including
iterators) as parameters and call them. HOWEVER, this symmetry breaks
down: a Turing machine which takes a generating Turing machine M as
input can run M for a number of steps and then RESET M and run it
again, while iterators as currently defined in Python do not have a
reset method. (I realize that instead of resetting an old iterator, one
can make a new iterator, but it is not as elegant.)

(Contrary to Python's philosophy that there should be one-- and
preferably only one --obvious way to do it) there are several ways to
define iterators:

Jan

May 20, 2009, 2:48:07 PM
On May 20, 2:35 pm, Jan <jan.pl...@plattsburgh.edu> wrote:

OOPS, I pressed some keys and the message went out before it was
finished.
Here is the last fragment:

So, one can define iterators by defining a class whose objects have
methods __iter__ and __next__ -- with this approach it is easy to add
some __reset__ method. But the easiest way to define iterators is by
constructing a generating function (with yield expressions); this
function returns iterators, although without __reset__. Another way to
define an iterator is to define a callable and use
iter(CALLABLE, SENTINEL) -- the resulting iterator will not have
__reset__. I do not know how Python is implemented, but I believe that
in the last two cases Python could produce improved iterators with
__reset__ at almost no additional cost.
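
The three construction routes described above can be sketched side by side. This is only an illustration in Python 3 syntax: the names Countdown, countdown_gen, and countdown_callable are hypothetical, and the reset() method is an ad hoc addition, not part of the iterator protocol.

```python
class Countdown:
    """Class-based iterator: __iter__ returns self, __next__ produces values."""

    def __init__(self, start):
        self.start = start
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    def reset(self):
        # Hypothetical extra method; easy to add because we own the state.
        self.current = self.start


def countdown_gen(start):
    """Generator function: every call returns a fresh iterator."""
    while start > 0:
        yield start
        start -= 1


def countdown_callable(start):
    """Build a zero-argument callable for iter(CALLABLE, SENTINEL)."""
    state = {"n": start + 1}

    def step():
        state["n"] -= 1
        return state["n"]

    return step


a = Countdown(3)
first = list(a)
a.reset()
second = list(a)

from_generator = list(countdown_gen(3))
from_callable = list(iter(countdown_callable(3), 0))  # stops when step() returns 0
```

Only the class-based version has state that the author can reach to rewind; in the other two the state lives inside a frame or closure, so "reset" in practice means calling countdown_gen or countdown_callable again.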

Jan

Jan

May 20, 2009, 2:58:31 PM
On May 20, 2:48 pm, Jan <jan.pl...@plattsburgh.edu> wrote:

Iterators can also be produced by iter(ITERABLE), which
could manufacture them with a __reset__.

Jan


Alan G Isaac

May 20, 2009, 3:29:01 PM
Jan wrote:
> Wouldn't it be easy for Python to implement generating functions so
> that the iterators they return are equipped with a __reset__() method?

Use ``send``:
http://docs.python.org/whatsnew/2.5.html#pep-342-new-generator-features

Remember, there may be no underlying sequence object for
an iterator, and if you want one in that case, you should build it.
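
A minimal sketch of what a send-based reset might look like, assuming PEP 342 generators. The generator name counter and the "reset" command string are made up for illustration; nothing here is a built-in reset facility.

```python
def counter(limit):
    """Yield 0, 1, ..., limit - 1; start over from 0 when sent "reset"."""
    i = 0
    while i < limit:
        command = yield i      # send(value) makes this expression return value
        if command == "reset":
            i = 0              # the next yield (returned by send) is 0 again
        else:
            i += 1


gen = counter(3)
seen = [next(gen), next(gen)]
restarted = gen.send("reset")
rest = list(gen)
```

This only works while the generator is still suspended at a yield. Once it has finished and raised StopIteration, send() cannot revive it, so the generator itself has to be written with resetting in mind.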

Alan Isaac

Terry Reedy

May 20, 2009, 4:45:59 PM
to pytho...@python.org
Jan wrote:
> Wouldn't it be easy for Python to implement generating functions so
> that the iterators they return are equipped with a __reset__() method?

No. Such a method would have to poke around in the internals of the
__next__ function in implementation specific ways. The values used to
initialize that function might have changed, so 'reset' would have to be
carefully defined.

def squares():
    start = int(input("enter starting int:"))
    stop = int(input("enter stopping int"))
    for i in range(start,stop):
        yield i*i

What does 'reset' mean here?

> Here is the context of this question.
>
> Python documentation defines an "iterator" as an object ITERATOR
> having methods __next__() and __iter__() such that the call
> ITERATOR.__iter__() returns the object itself,

This is so that 'iter(iterator) is iterator', so that functions can take
either an iterable or iterator as an argument and proceed without
checking which it got.

> and once a call ITERATOR. __next__() raises StopIteration every
> such subsequent call does the same.

After returning objects for some number of calls, which might be unbounded.

The protocol is basically one method with defined behavior. It is
intentionally minimal so it can be used as the universal within-Python
object stream protocol. Multiple ways of getting iterators are in line
with this purpose.

Terry Jan Reedy

Steven D'Aprano

May 20, 2009, 8:47:47 PM
On Wed, 20 May 2009 11:35:47 -0700, Jan wrote:

> Wouldn't it be easy for Python to implement generating functions so that
> the iterators they return are equipped with a __reset__() method?

No.

import os

def gen():
    for name in os.listdir('.'):
        yield open(name).read()
        os.remove(name)


How would you "easily" reset this generator so that it returns the same
values each time?

That's an extreme example, but as a general rule, generators/iterators
are one-shot: having consumed a value, you can't get it back again
without re-creating it from scratch, or possibly not even then. Here's a
less destructive example:

import time

def gen():
    for i in xrange(10):
        yield time.time()
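
The one-shot behaviour can be seen in a small sketch. Python 3 syntax is assumed, and itertools.count() is used here as a deterministic stand-in for an external, ever-changing source such as time.time(); the names ticks and gen are illustrative only.

```python
import itertools

ticks = itertools.count()  # stand-in for a source whose values never repeat


def gen():
    for _ in range(3):
        yield next(ticks)


g = gen()
first_pass = list(g)    # consumes the generator
second_pass = list(g)   # nothing left: an exhausted generator stays exhausted
fresh = list(gen())     # a new generator runs, but sees later values
```

Re-creating the generator restarts the code, not the data: fresh picks up where the source left off rather than repeating first_pass.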

--
Steven

Duncan Booth

May 21, 2009, 4:46:38 AM

While I agree 100% with your 'No', I don't actually agree with your
explanation.

Why are you assuming that resetting a generator should return the same
values each time? Why shouldn't resetting put it back to some known state,
but allow the resulting output to vary?

For example, that's what happens in the .Net universe with Linq: a Linq
expression is roughly equivalent to a Python iterable and every time you
iterate over it you can get a different set of results.

The following code would, I think, work except that generators don't allow
you to add attributes:

import time
from functools import wraps

def resettable(generator):
    @wraps(generator)
    def wrapper(*args, **kw):
        def __reset__():
            gen = generator(*args, **kw)
            gen.__reset__ = __reset__
            return gen
        return __reset__()
    return wrapper

@resettable
def gen(n):
    for i in xrange(n):
        yield i, time.time()


it = gen(3)
for v in it: print v
it = it.__reset__()
for v in it: print v


The simpler solution is just to reconstruct the iterator yourself.
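
Since generator objects refuse new attributes, one way around that is a small wrapper object that holds the generator and knows how to rebuild it. The sketch below is an assumption-laden illustration in Python 3 syntax, not a standard facility: Resettable, reset(), and pairs are hypothetical names.

```python
class Resettable:
    """Iterator wrapper that can rebuild its underlying generator."""

    def __init__(self, genfunc, *args, **kwargs):
        # Remember how to call the generator function again later.
        self._make = lambda: genfunc(*args, **kwargs)
        self._gen = self._make()

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._gen)

    def reset(self):
        # Not a true rewind: the generator function is simply called again,
        # so the output may differ if it depends on outside state.
        self._gen = self._make()


def pairs(n):
    for i in range(n):
        yield i, i * i


it = Resettable(pairs, 3)
first = list(it)
it.reset()
second = list(it)
```

This is "reconstruct the iterator yourself" with the reconstruction hidden behind a method, which is about as far as a generic reset can go without knowing what the generator does.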

--
Duncan Booth http://kupuguy.blogspot.com

norseman

May 21, 2009, 2:33:41 PM
to pytho...@python.org
Terry Reedy wrote:

> Jan wrote:
>> Wouldn't it be easy for Python to implement generating functions so
>> that the iterators they return are equipped with a __reset__() method?
>
> No. Such a method would have to poke around in the internals of the
> __next__ function in implementation specific ways. The values used to
> initialize that function might have changed, so 'reset' would have to be
> carefully defined.
>
> def squares():
> start = int(input("enter starting int:"))
> stop = int(input("enter stopping int"))
> for i in range(start,stop):
> yield i*i
>
> What does 'reset' mean here?

I don't understand. Why would one use a reset here in the first place?
One simply re-runs this. The output could be collected into a list,
which one might want to reset, re-list.
bad example?


>
>> Here is the context of this question.
>>
>> Python documentation defines an "iterator" as an object ITERATOR
>> having methods __next__() and __iter__() such that the call
>> ITERATOR.__iter__() returns the object itself,
>

> This is so that 'iter(iterator) is iterator',

You would do well to clarify the previous line.
To me it is the same as a face is a face
function(x) is x six of one, half dozen of the other
gobble-d-goop, double talk so what?
Not trying to pick a fight - just pointing out the need for some
clarity. Had a math teacher that proclaimed any function(x) that only
returned x was (hummm can't print that here). Something like useless.


> so that functions can take
> either an iterable or iterator as an argument and proceed without
> checking which it got.

OK.

>
>> and once a call ITERATOR. __next__() raises StopIteration every
>> such subsequent call does the same.
>

if it can't be restarted (serial line input) then don't, else do.
serial line, ethernet, etc are basically pipes. But a list itself can be
re-wound, be it a file on disk, something in memory, on tape -- whatever

may have to reach back past the 'pipe' to the 'iterator' there to
restart but being able to restart (elegantly) restartables is the point.

> After returning objects for some number of calls, which might be unbounded.
>
> The protocol is basically one method with defined behavior. It is
> intentionally minimal so it can be used as the universal within-Python
> object stream protocol. Multiple ways of getting iterators are in line
> with this purpose.

I think clarity is also needed here. Different disciplines will
interpret the above differently. Stream means river, channel, pipe,
singular direction to me. Once the water flows past there is no hope of
getting those molecules back in same order. A list of things in a
container (variable or file) is not a stream and can be rewound. The
whole concept of random access hinges on this.
To me this boils down to two distinct concepts.
one being stream as in here it comes there it goes, never to return.
The stream is not rewindable. The container you put it in might be.
one being sequential reading of finite number of randomly accessible
things. This being inherently rewindable.
Testing which is simple enough and can set the ability to rewind.
Having it 'built-in' will reduce the problems generated by another
'do-it-yourself' design by person that may or may not have thought
things out. The old - "I took the carburettor off the Olds but it
doesn't work on my Hugo. Can you help me?" would be avoided.
Really - rewind is better if it is builtin and performs where it should.
The docs should explain what will or will not happen and why.
Preferably in plain language. :)

In short: are you telling us the reset() can't do in background the
exact same thing that the docs tell the users to do? It's a lot simpler
to move file descriptors and pointers from the inside.
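
The stream-versus-container distinction can be sketched as follows, in Python 3 syntax. io.StringIO stands in for a file on disk here; the variable names are illustrative.

```python
import io

stream = io.StringIO("alpha\nbeta\n")      # stands in for a file on disk
first = [line.strip() for line in stream]
stream.seek(0)                             # rewind the underlying container
second = [line.strip() for line in stream]

one_shot = (ch for ch in "ab")             # a generator: no position to seek to
can_rewind = hasattr(one_shot, "seek")
```

The rewind lives on the file object (seek), not in the iterator protocol, which is why it exists for files and not for generators.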

I often get so close to things I forget to look up too.


>
> Terry Jan Reedy
>

Terry Reedy

May 21, 2009, 6:21:37 PM
to pytho...@python.org
I will clarify by starting over with current definitions.

Ob is an iterator iff next(ob) either returns an object or raises
StopIteration and continues to raise StopIteration on subsequent calls.

Ob is an iterable iff iter(ob) returns an iterator.

It is intentional that the protocol definitions be minimal, so that they
can be used as widely as possible.

As a convenience, the definition of iterators is given a slight
complication. They are defined as a subcategory of iterables, with the
requirement that iter(iterator) be that same iterator. This means
that iterators need the following boilerplate:
    def __iter__(self): return self
The extra burden is slight since most iterators are based on builtins or
generator functions or expressions, which add the boilerplate
automatically. The convenience is that one may write

def f(iterable_or_iterator):
    it = iter(iterable_or_iterator)
    ...

instead of

def f(iterable_or_iterator):
    if is_iterable(iterable_or_iterator):
        it = iter(iterable_or_iterator)
    else:
        it = iterable_or_iterator

In particular, the internal function that implements for loops can do
the former.

In other words, a small bit of boilerplate added to iterators, mostly
automatically, saves boilerplate in the use of iterators and iterables.
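
A short sketch of that convenience in action, in Python 3 syntax. The function name first_two is hypothetical; the point is only that one code path serves both kinds of argument.

```python
def first_two(iterable_or_iterator):
    # iter() is safe on either: an iterator simply returns itself.
    it = iter(iterable_or_iterator)
    return [next(it), next(it)]


data = [10, 20, 30]
it = iter(data)

same_object = iter(it) is it     # the boilerplate __iter__ returns self
from_list = first_two(data)      # a fresh iterator is made from the list
from_iter = first_two(it)        # the existing iterator is consumed in place
leftover = list(it)              # only what first_two did not take
```

The difference shows up in leftover: passing an iterator shares its position with the caller, while passing the list does not.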

When the protocols were defined, there was discussion about whether or
not to require 'continue to raise StopIteration'. For instance, an
iterator that returns objects derived from external input might not have
any new external input now but expect to get some in the future. It was
decided that such iterators should either wait and block the thread or
return a 'Not now' indicator such as None. StopIteration should
consistently mean 'Done, over and out' so for loops, for instance, would
know to exit.

The OP proposes that StopIteration should instead mean 'Done until
reset', without defining 'reset'. Some comments:
* This would complicate the protocol.
* There are real use cases, and reiterability is a real issue. But ...
* Depending on the meaning, resetting may or may not be possible.
* When it is possible, it can potentially be done today with a .send()
method.
* Many use cases are easier with a new iterator. For instance

for i in iterable: block1()
for i in iterable: block2()

is easier to write than

it = iter(iterable)
for i in it: block1()
it.reset()
for i in it: block2()

with little relative time saving in the second case, for practical
problems, to compensate for the extra boilerplate.

Terry Jan Reedy

norseman

May 21, 2009, 7:00:45 PM
to pytho...@python.org

Done unless you put the data pointer back to offset zero

> reset', without defining 'reset'. Some comments:
> * This would complicate the protocol.
> * There are real use cases, and reiterability is a real issue. But ...
> * Depending on the meaning, resetting may or may not be possible.
> * When it is possible, it can potentially be done today with a .send()
> method.
> * Many use cases are easier with a new iterator. For instance
>
> for i in iterable: block1()
> for i in iterable: block2()
>
> is easier to write than
>
> it = iter(iterable)
> for i in it: block1()
> it.reset()
> for i in it: block2()
>
> with little relative time saving in the second case, for practical
> problems, to compensate for the extra boilerplate.
>


while testing:
    for i in it:
        code
    it.reset()

> Terry Jan Reedy
>

Terry Reedy

May 21, 2009, 7:08:42 PM
to pytho...@python.org

And if there is no data pointer?

J. Cliff Dyer

May 22, 2009, 9:46:33 AM
to Jan, pytho...@python.org
On Wed, 2009-05-20 at 11:35 -0700, Jan wrote:
> Wouldn't it be easy for Python to implement generating functions so
> that the iterators they return are equipped with a __reset__() method?
>
> Here is the context of this question.
>
> Python documentation defines an "iterator" as an object ITERATOR
> having methods __next__() and __iter__() such that the call
> ITERATOR.__iter__() returns the object itself, and once a call
> ITERATOR. __next__() raises StopIteration every such subsequent call
> does the same.

You don't need a reset method. There is no hard and fast rule that
__iter__ must return the object itself. It just needs to return an
iterator. For example:

>>> l = [1,2,3]
>>> l.__iter__()
<listiterator object at 0x7fd0da315850>
>>> l is l.__iter__()
False

Just create a class with an __iter__ method that returns a reset
iterator object.


class X(object):
    def __init__(self, max=3):
        self.counter = 0
        self.max = max
    def __iter__(self):
        return self
    def next(self):
        if self.counter < self.max:
            self.counter += 1
            return self.counter
        else:
            raise StopIteration

class Y(object):
    def __iter__(self):
        return X()

In this setup, X has the problem you are trying to avoid, but Y behaves
as a resettable iterable.

>>> x = X()
>>> for c in x:
...     print c
...
1
2
3
>>> for c in x:
...     print c
...
>>> y = Y()
>>> for c in y:
...     print c
...
1
2
3
>>> for c in y:
...     if c < 3:
...         print c
...
1
2
>>> for c in y:
...     print c
...
1
2
3

Cheers,
Cliff


Jan

May 22, 2009, 1:54:59 PM
On May 22, 9:46 am, "J. Cliff Dyer" <j...@sdf.lonestar.org> wrote:

> You don't need a reset method.  There is no hard and fast rule that
> __iter__ must return the object itself.  It just needs to return an
> iterator.  

I disagree.
If ITERATOR is a true iterator, ITERATOR.__iter__() must return
ITERATOR.
If ITERABLE is an iterable (but not necessarily an iterator)
ITERABLE.__iter__() must return an iterator.

> For example:
>
> >>> l = [1,2,3]
> >>> l.__iter__()
> <listiterator object at 0x7fd0da315850>
> >>> l is l.__iter__()
> False

[1,2,3] is an iterable but not an iterator, so this False result is
expected.
Compare this with the following.

>>> ii = iter([1,2,3]) # ii is an iterator.
>>> next(ii)
1
>>> jj = ii.__iter__() # call __iter__ method on an iterator
>>> ii is jj
True
>>> next(jj)
2

> Just create a class with an __iter__ method that returns a reset
> iterator object.
>
> class X(object):
>     def __init__(self, max=3):
>         self.counter = 0
>         self.max = max
>     def __iter__(self):
>         return self
>     def next(self):
>         if self.counter < self.max:
>             self.counter += 1
>             return self.counter
>         else:
>             raise StopIteration
>
> class Y(object):
>     def __iter__(self):
>         return X()
>
> In this setup, X has the problem you are trying to avoid, but Y behaves
> as a resettable iterable.

> y = Y()

This does not work.

With this, y is not an iterator, and not even an iterable.

> for c in y:

This produces an error because by definition of for-loops
it is executed the same way as:

temp_iterator = iter(y) # temp_iterator is y
while True:
    try:
        print(next(temp_iterator)) # temp_iterator does not support __next__()
    except StopIteration:
        break

Jan

J. Cliff Dyer

May 22, 2009, 4:13:39 PM
to Jan, pytho...@python.org
On Fri, 2009-05-22 at 10:54 -0700, Jan wrote:
> On May 22, 9:46 am, "J. Cliff Dyer" <j...@sdf.lonestar.org> wrote:
>
> > You don't need a reset method. There is no hard and fast rule that
> > __iter__ must return the object itself. It just needs to return an
> > iterator.
>
> I disagree.
> If ITERATOR is a true iterator, ITERATOR.__iter__() must return
> ITERATOR.
> If ITERABLE is an iterable (but not necessarily an iterator)
> ITERABLE.__iter__() must return an iterator.
>

You are correct: It is an iterable, not an iterator. However, that's
not a disagreement with me. It may not be an iterator (and I probably
should have said so) but it works, and it solves the OP's problem.

Did you try running my code? I did. It works on my computer. What
error message did you get?

Cheers,
Cliff


J. Clifford Dyer

May 22, 2009, 8:31:22 PM
to Jan, pytho...@python.org
On Fri, 2009-05-22 at 10:54 -0700, Jan wrote:
> This produces an error because by definition of for-loops
> it is executed the same way as:
>
> temp_iterator = iter(y) # temp_iterator is y
> while True:
>     try:
>         print(next(temp_iterator)) # temp_iterator does not support __next__()
>     except StopIteration:
>         break
>

I think this is where you missed my point.

iter(y) actually returns an instance of class X, which does support
iteration. And it returns a new X each time, thus resetting the
iterator.

That exact setup might or might not support your use case. I don't
know, because you haven't described it. However, whatever you need done
to X to get it back in shape to reiterate over can be done in
Y.__iter__().
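
For readers on Python 3, where the iterator method is spelled __next__ rather than next, the same iterable/iterator split might be sketched like this. The class names CounterIterator and CounterIterable are hypothetical replacements for X and Y.

```python
class CounterIterator:
    """One-shot iterator: counts 1..maximum, then stays exhausted."""

    def __init__(self, maximum=3):
        self.counter = 0
        self.maximum = maximum

    def __iter__(self):
        return self

    def __next__(self):
        if self.counter < self.maximum:
            self.counter += 1
            return self.counter
        raise StopIteration


class CounterIterable:
    """Iterable: hands out a brand-new iterator on every iter() call."""

    def __iter__(self):
        return CounterIterator()


x = CounterIterator()
x_first, x_second = list(x), list(x)

y = CounterIterable()
y_first, y_second = list(y), list(y)
```

Any setup needed to make a second pass meaningful belongs in CounterIterable.__iter__, since that is the code that runs each time a loop starts.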

Honestly, do you care if it's an iterator or an iterable, so long as
python can handle the job?

