Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

pushback iterator

26 views
Skip to first unread message

Matus

unread,
May 17, 2009, 10:39:38 AM5/17/09
to pytho...@python.org
Hallo pylist,

I searches web and python documentation for implementation of pushback
iterator but found none in stdlib.

problem:
========
when you parse a file, often you have to read a line from parsed file
before you can decide if you want that line it or not. if not, it would
be a nice feature to be able po push the line back into the iterator, so
nest time when you pull from iterator you get this 'unused' line.

solution:
=========
I found a nice and fast solution somewhere on the net:

---------------------------------------------------------------------
class Pushback_wrapper( object ):
def __init__( self, it ):
self.it = it
self.pushed_back = [ ]
self.nextfn = it.next

def __iter__( self ):
return self

def __nonzero__( self ):
if self.pushed_back:
return True

try:
self.pushback( self.nextfn( ) )
except StopIteration:
return False
else:
return True

def popfn( self ):
lst = self.pushed_back
res = lst.pop( )
if not lst:
self.nextfn = self.it.next
return res

def next( self ):
return self.nextfn( )

def pushback( self, item ):
self.pushed_back.append( item )
self.nextfn = self.popfn
---------------------------------------------------------------------

proposal:
=========
as this is (as I suppose) common problem, would it be possible to extend
the stdlib of python (ie itertools module) with a similar solution so
one do not have to reinvent the wheel every time pushback is needed?


thx, Matus

Mike Kazantsev

unread,
May 17, 2009, 11:22:22 AM5/17/09
to pytho...@python.org
On Sun, 17 May 2009 16:39:38 +0200
Matus <mat...@gmail.com> wrote:

> I searches web and python documentation for implementation of pushback
> iterator but found none in stdlib.
>
> problem:
> ========
> when you parse a file, often you have to read a line from parsed file
> before you can decide if you want that line it or not. if not, it would
> be a nice feature to be able po push the line back into the iterator, so
> nest time when you pull from iterator you get this 'unused' line.
>

...


>
> proposal:
> =========
> as this is (as I suppose) common problem, would it be possible to extend
> the stdlib of python (ie itertools module) with a similar solution so
> one do not have to reinvent the wheel every time pushback is needed?

Sounds to me more like an iterator with a cache - you can't really pull
the line from a real iterable like generator function and then just push
it back.
If this "iterator" is really a list then you can use it as such w/o
unnecessary in-out operations.

And if you're "pushing back" the data for later use you might just as
well push it to dict with the right indexing, so the next "pop" won't
have to roam thru all the values again but instantly get the right one
from the cache, or just get on with that iterable until it depletes.

What real-world scenario am I missing here?

--
Mike Kazantsev // fraggod.net

signature.asc

Mike Kazantsev

unread,
May 17, 2009, 12:15:52 PM5/17/09
to Matus, pytho...@python.org
Somehow, I got the message off the list.

On Sun, 17 May 2009 17:42:43 +0200
Matus <mat...@gmail.com> wrote:

> > Sounds to me more like an iterator with a cache - you can't really pull
> > the line from a real iterable like generator function and then just push
> > it back.
>

> true, that is why you have to implement this iterator wrapper

I fail to see much point of such a dumb cache, in most cases you
shouldn't iterate again and again thru the same sequence, so what's
good hardcoding (and thus, encouraging) such thing will do?

Besides, this wrapper breaks iteration order, since it's cache is LIFO
instead of FIFO, which should rather be implemented with deque instead
of list.

> > If this "iterator" is really a list then you can use it as such w/o
> > unnecessary in-out operations.
>

> of course, it is not a list. you can wrap 'real' iterator using this
> wrapper (), and voila, you can use pushback method to 'push back' item
> received by next method. by calling next again, you will get pushed back
> item again, that is actually the point.

Wrapper differs from "list(iterator)" in only one thing: it might not
make it to the end of iterable, but if "pushing back" is common
operation, there's a good chance you'll make it to the end of the
iterator during execution, dragging whole thing along as a burden each
time.

> > And if you're "pushing back" the data for later use you might just as
> > well push it to dict with the right indexing, so the next "pop" won't
> > have to roam thru all the values again but instantly get the right one
> > from the cache, or just get on with that iterable until it depletes.
> >
> > What real-world scenario am I missing here?
> >
>

> ok, I admit that that the file was not good example. better example
> would be just any iterator you use in your code.

Somehow I've always managed to avoid such re-iteration scenarios, but
of course, it could be just my luck ;)

signature.asc

Luis Alberto Zarrabeitia Gomez

unread,
May 17, 2009, 12:47:35 PM5/17/09
to Mike Kazantsev, pytho...@python.org

Quoting Mike Kazantsev <mk.fr...@gmail.com>:

> And if you're "pushing back" the data for later use you might just as
> well push it to dict with the right indexing, so the next "pop" won't
> have to roam thru all the values again but instantly get the right one
> from the cache, or just get on with that iterable until it depletes.
>
> What real-world scenario am I missing here?

Other than one he described in his message? Neither of your proposed solutions
solves the OP's problem. He doesn't have a list (he /could/ build a list, and
thus defeat the purpose of having an iterator). He /could/ use alternative data
structures, like the dictionary you are suggesting... and he is, he is using his
pushback iterator, but he has to include it over and over.

Currently there is no good "pythonic" way of building a functions that decide to
stop consuming from an iterator when the first invalid input is encountered:
that last, invalid input is lost from the iterator. You can't just abstract the
whole logic inside the function, "something" must leak.

Consider, for instance, the itertools.dropwhile (and takewhile). You can't just
use it like

i = iter(something)
itertools.dropwhile(condition, i)
# now consume the rest

Instead, you have to do this:

i = iter(something)
i = itertools.dropwhile(condition, i)
# and now i contains _another_ iterator
# and the first one still exists[*], but shouldn't be used
# [*] (assume it was a parameter instead of the iter construct)

For parsing files, for instance (similar to the OP's example), it could be nice
to do:

f = file(something)
lines = iter(f)
parse_headers(lines)
parse_body(lines)
parse_footer(lines)

which is currently impossible.

To the OP: if you don't mind doing instead:

f = file(something)
rest = parse_headers(f)
rest = parse_body(rest)
rest = parse_footer(rest)

you could return itertools.chain([pushed_back], iterator) from your parsing
functions. Unfortunately, this way will add another layer of itertools.chain on
top of the iterator, you will have to hope this will not cause a
performace/memory penalty.

Cheers,

--
Luis Zarrabeitia
Facultad de Matem�tica y Computaci�n, UH
http://profesores.matcom.uh.cu/~kyrie

--
Participe en Universidad 2010, del 8 al 12 de febrero de 2010
La Habana, Cuba
http://www.universidad2010.cu

Matus

unread,
May 17, 2009, 4:17:17 PM5/17/09
to Luis Alberto Zarrabeitia Gomez, Mike Kazantsev, pytho...@python.org

that is basically one of many possible scenarios I was referring to.
other example would be:

----------------------------------------------------
iter = Pushback_wrapper( open( 'my.file' ).readlines( ) )
for line in iter:
if is_outer_scope( line ):
'''
do some processing for this logical scope of file. there is only fet
outer scope lines
'''
continue

for line in iter:
'''
here we expect 1000 - 2000 lines of inner scope and we do not want to
run is_outer_scope()
for every line as it is expensive, so we decided to reiterate
'''
if is_inner_scope( line ):
'''
do some processing for this logical scope of file untill outer scope
condition occurs
'''
elif is_outer_scope( line ):
iter.pushback( line )
break
else:
'''flush line'''
----------------------------------------------------

Simon Forman

unread,
May 22, 2009, 7:33:41 PM5/22/09
to


By coincidence I had need of just such a pushback iterator just the
other day. I was playing with queues and generators after reading
David Beazley's "Generator Tricks For Systems Programmers" [1] and I
needed an iterator that could be used with this generator function:

def pushQueueNonblocking(queue, iterator):
for item in iterator:
try:
queue.put(item, False)
except Queue.Full:
raise Queue.Full(item)

If the queue is full I pass item taken from the iterator but not yet
pushed onto the queue up to the caller in the Full exception so it
doesn't get lost. I wrote a wrapper function that retried putting
items on the queue after sleeping for a certain time if the queue is
full.

def pushQueueRetry(queue, iterator, retry_delay=1):
iterator = IteratorWithPushback(iterator)
while True:
try:
pushQueueNonblocking(queue, iterator)
except Queue.Full, err:
iterator.pushback(err.args[0])
if retry_delay >= 0:
sleep(retry_delay)
else:
break

Originally I used an itertools.chain() solution, but when I tried the
function with sleep of 0 (the queue was being get() from once a second
in another thread) I hit a funny error.

Because the put() thread was running, and retrying, much faster than
the get() thread, a huge "linked list" of itertools.chain generator
objects was stored up in memory. When the get() thread finally
unblocked the queue and the call to put() succeeded, the
pushQueueNonblocking() would loop and call next() on the chain object,
which would call next() on the next chain object, which called next()
on the next next chain object, etc...

That linked list of chain iterators was longer than
sys.getrecursionlimit().

Anyway, FWIW here's the IteratorWithPushback class that I whanged up:


class IteratorWithPushback:

def __init__(self, initial_iterator):
self.iter = iter(initial_iterator)
self.last = None
self.has_last = False

def __iter__(self):
return self

def next(self):
if self.has_last:
item = self.last
self.has_last = False
self.last = None # Don't hang on to it.
else:
item = self.iter.next()
return item

def pushback(self, item):
if self.has_last:
raise Exception("I'm full!")
self.last = item
self.has_last = True

As for putting something like this in the standard library, I'm not
sure. I've never needed one before and it's easy enough to write.
*shrug*


Regards,
~Simon

[1] http://www.dabeaz.com/generators/Generators.pdf

0 new messages