
Nested generators is the python equivalent of unix pipe cmds.


Steven D. Majewski

Aug 3, 2001, 1:44:21 AM
to pytho...@python.org

## more on combining generators
## (inspired by previous discussion and more recent one on python-dev)
## Nested generators is the python equivalent of
## unix pipeline commands. -- Steve

from __future__ import generators

## You need:

## a generator: (you can also start with a list)

def Ints():
    n = 0
    while 1:
        yield n
        n += 1


## one or more filters:

def Test( gen, test ):
    for x in gen:
        if test(x): yield x

## a terminator:

## ( by condition... )

def Quit( gen, test ):
    for x in gen:
        if test(x): raise StopIteration
        else: yield x

## ( or by count... )

def Count( gen, n ):
    for x in gen:
        yield x
        n -= 1
        if n == 0 : break


## shorthand names so the lines don't get too long:

odd = lambda x: Test( x, lambda y: y % 2 )
enough = lambda x: Quit( x, lambda y: y > 100 )
notdiv3 = lambda x: Test( x, lambda y: y % 3 )

## examples:

print "\n odd ints that are not divisible by 3 under 100:"
for i in notdiv3( enough( odd( Ints() ))):
    print i


print "\n first 20 odd ints not divisible by 3:"
for i in Count( notdiv3(odd(Ints())), 20 ):
    print i


# recursive file iterator as a generator:

from os import listdir, path

def Files( start ):
    for file in listdir( start ):
        file = path.join( start, file )
        if path.isfile( file ): yield file
        elif path.isdir(file):
            for more in Files( file ):
                yield more


isGif = lambda s: s.lower().find('.gif') >= 0

# This is MUCH nicer than using os.path.walk() with a callback!

## print the first 20 gif files I can find in your cwd...
for f in Count(Test( Files('.'), isGif ), 20 ): print f
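
For readers on a current interpreter, here is a minimal sketch of the same pipeline, assuming Python 3.7 or later. There the `from __future__ import generators` line is unnecessary, `print` is a function, and (per PEP 479) a `raise StopIteration` inside a generator body surfaces as a `RuntimeError`, so the terminator below uses a plain `return` instead. The lowercase names are assumptions made for this sketch and are not part of the original post.

```python
def ints():
    """Endless source: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1

def test(gen, pred):
    """Filter: pass along only the items for which pred(x) is true."""
    for x in gen:
        if pred(x):
            yield x

def quit_when(gen, pred):
    """Terminator: end the pipeline at the first item for which pred(x) is true."""
    for x in gen:
        if pred(x):
            return  # ends the generator; stands in for `raise StopIteration`
        yield x

def count(gen, n):
    """Terminator: pass along at most n items."""
    if n <= 0:
        return  # guard added in this sketch so n == 0 yields nothing
    for x in gen:
        yield x
        n -= 1
        if n == 0:
            break

# Shorthand stages, mirroring the ones in the post.
odd = lambda g: test(g, lambda y: y % 2)
enough = lambda g: quit_when(g, lambda y: y > 100)
notdiv3 = lambda g: test(g, lambda y: y % 3)

under_100 = list(notdiv3(enough(odd(ints()))))
first_20 = list(count(notdiv3(odd(ints())), 20))
print(under_100[:5])   # [1, 5, 7, 11, 13]
print(first_20[-1])    # 59
```

Each stage pulls items from the stage inside it only on demand, which is why the endless `ints()` source is safe as long as some terminator sits further out in the chain.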


Tim Hochberg

Aug 3, 2001, 10:22:16 AM

"Steven D. Majewski" <sd...@Virginia.EDU> wrote in message

[Much cool generator stuff deleted]

> # This is MUCH nicer than using os.path.walk() with a callback!
>
> ## print the first 20 gif files I can find in your cwd...
> for f in Count(Test( Files('.'), isGif ), 20 ): print f

Very cool!

Although I do think it would be easier to read if the order of the test and
the generator was reversed:

for f in Count(20, Test(isGif, Files('.'))): print f

I may just have a weak mind, but on the first version, I lost track of who
the 20 belonged to by the time I got to the end.

-tim
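
A minimal sketch of the reordered signatures, assuming Python 3. Putting the predicate or the limit first matches the built-in `filter(function, iterable)`, so each argument sits next to the name it belongs to. The sample file names are made up for illustration.

```python
def Test(pred, gen):
    """Filter with the predicate first, like the built-in filter()."""
    for x in gen:
        if pred(x):
            yield x

def Count(n, gen):
    """Pass along at most n items; the limit is now the first argument."""
    if n <= 0:
        return
    for x in gen:
        yield x
        n -= 1
        if n == 0:
            break

isGif = lambda s: s.lower().find('.gif') >= 0

# Hypothetical file names standing in for Files('.').
names = ['a.gif', 'b.txt', 'C.GIF', 'd.gif']

first_two = list(Count(2, Test(isGif, names)))
print(first_two)  # ['a.gif', 'C.GIF']

# The built-in filter() takes its arguments in the same order.
same = list(Count(2, filter(isGif, names)))
```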


Steven D. Majewski

Aug 3, 2001, 2:06:28 PM
to Tim Hochberg, pytho...@python.org

On Fri, 3 Aug 2001, Tim Hochberg wrote:

> "Steven D. Majewski" <sd...@Virginia.EDU> wrote in message
>
> > # This is MUCH nicer than using os.path.walk() with a callback!
> >
> > ## print the first 20 gif files I can find in your cwd...
> > for f in Count(Test( Files('.'), isGif ), 20 ): print f
>
> Very cool!
>
> Although I do think it would be easier to read if the order of the test and
> the generator was reversed:
>
> for f in Count(20, Test(isGif, Files('.'))): print f
>
> I may just have a weak mind, but on the first version, I lost track of who
> the 20 belonged to by the time I got to the end.

I think you're probably right about swapping the arg order making it
more readable, Tim.

Those generators, or something similar, are probably good candidates
for a standard generator utility module.

The Files generator could also take multiple tests, so they wouldn't
have to be nested, and maybe also default parameter args for some
common things like file-extension, glob-patterns, dates, etc.
( Basically the sort of stuff in the unix 'find' command.
Long ago, I recall someone doing a python equivalent of unix 'file'
matching 'magic numbers' -- that might be a nice addition. )


-- Steve
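
One way the multi-test idea might look, as a sketch assuming Python 3. The signature `files(start, *tests, pattern=None)` and the `nonempty` predicate are hypothetical, chosen only for illustration. `os.walk` and `fnmatch.fnmatch` are standard-library calls; whether `fnmatch.fnmatch` is case-sensitive depends on the platform.

```python
import fnmatch
import os
import tempfile

def files(start, *tests, pattern=None):
    """Yield paths of files under start that pass every test.

    tests   -- predicates that each take a full path
    pattern -- optional glob-style pattern matched against the file name
    """
    for dirpath, dirnames, filenames in os.walk(start):
        for name in filenames:
            if pattern is not None and not fnmatch.fnmatch(name, pattern):
                continue
            full = os.path.join(dirpath, name)
            if all(t(full) for t in tests):
                yield full

def nonempty(p):
    """Example predicate: the file has at least one byte."""
    return os.path.getsize(p) > 0

# Build a small throwaway tree so the example does not depend on the cwd.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    contents = {
        "a.gif": "x",
        "b.txt": "x",
        "empty.gif": "",
        os.path.join("sub", "c.gif"): "x",
    }
    for rel, text in contents.items():
        with open(os.path.join(root, rel), "w") as fh:
            fh.write(text)
    found = sorted(os.path.relpath(p, root)
                   for p in files(root, nonempty, pattern="*.gif"))

print(found)
```

Because the tests are passed as separate arguments, nothing has to be nested: adding a date or size check means appending one more predicate.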


Terry Reedy

Aug 3, 2001, 3:50:13 PM

"Tim Hochberg" <tim.ho...@ieee.org> wrote in message
news:ssya7.78137$Cy.11...@news1.rdc1.az.home.com...

> Although I do think it would be easier to read if the order of the test and
> the generator was reversed:
>
> for f in Count(20, Test(isGif, Files('.'))): print f
>
> I may just have a weak mind, but on the first version, I lost track of who
> the 20 belonged to by the time I got to the end.

I noticed the same. A genpipe wrapping class would give us the very
clear:

for f in gen(Files('.')).Test(isGif).Count(20)

My untested idea for genpipe:

class genpipe:
    def __init__(self, generator):
        self.gen = generator  # sanity check omitted
    def Test(self, pred):
        self.gen = _Test(self.gen, pred)  # ditto for Count and other filters
    def __iter__(self):
        return self.gen.next

Terry J. Reedy


Steven D. Majewski

Aug 3, 2001, 4:27:38 PM
to Terry Reedy, pytho...@python.org

On Fri, 3 Aug 2001, Terry Reedy wrote:

>
> I noticed the same. A genpipe wrapping class would give us the very
> clear:
>
> for f in gen(Files('.')).Test(isGif).Count(20)
>
> My untested idea for genpipe:

>========<

Thanks, Terry.
Tested and corrected:

>
> class genpipe:
>     def __init__(self, generator):
>         self.gen = generator #sanity check omitted
>     def Test(self, pred):
>         self.gen = _Test(self.gen, pred) # ditto for Count and other filters

-->         return self

>     def __iter__(self):

-->         return self.gen

##>         return self.gen.next


>>> for x in Genpipe( Files('.')).Test( isGif ).Count(10):
...     print x
...
./aaTristan/usd.gif
./aaTristan/welcome_dude2.gif
./Desktop/coffee-bg.gif
./Desktop/New Dowloads/xlsappdoc/data-classes.GIF
./Desktop/New Dowloads/xlsappdoc/graph-overlay-classes.GIF
./Desktop/sdm7g-3.GIF
./Documents/Dilbert/941202.gif
./Documents/Dilbert/adding-staff.gif
./Documents/Dilbert/behind-schedule.gif
./Documents/Dilbert/d941231.gif


-- Steve
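
The two corrections above can be sketched in runnable form, assuming Python 3. Each stage method returns `self` so calls can be chained, and `__iter__` returns the iterator itself rather than its bound `next` method. Using the built-in `filter` and `itertools.islice` for the stages is a substitution made for this sketch, as are the sample file names.

```python
from itertools import islice

class Genpipe:
    """Wrap an iterable so pipeline stages can be chained left to right."""

    def __init__(self, source):
        self.gen = iter(source)  # accepts lists as well as generators

    def Test(self, pred):
        self.gen = filter(pred, self.gen)
        return self  # returning self is what makes chaining work

    def Count(self, n):
        self.gen = islice(self.gen, n)
        return self

    def __iter__(self):
        return self.gen  # must be an iterator object

# Hypothetical file names standing in for Files('.').
names = ["a.gif", "b.txt", "C.GIF", "d.gif"]
is_gif = lambda s: s.lower().endswith(".gif")

result = list(Genpipe(names).Test(is_gif).Count(2))
print(result)  # ['a.gif', 'C.GIF']
```

Without the `return self` lines, `Genpipe(names).Test(is_gif)` would evaluate to `None` and the following `.Count(2)` would fail with an `AttributeError`.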


Steven D. Majewski

Aug 3, 2001, 5:00:25 PM
to Terry Reedy, pytho...@python.org


... and if you define an __or__ method ( "|" operator), you can make it
look more like a real unix pipeline :

class Gen:
    def __init__( self, generator ):
        self.generator = generator
    def __iter__( self ):
        return self.generator

class Genpipe(Gen):
    def Test( self, pred ):
        self.generator = Test( self.generator, pred )
        return self
    def Count( self, n ):
        self.generator = Count( self.generator, n )
        return self
    def __or__( self, other ):
        if callable(other):
            self.generator = Test( self.generator, other )
        return self

# not an all-the-options-complete implementation!

>>> for x in Genpipe(Files('.')).Count( 100 ) | isGif :
...     print x

I find the separation by space-operator-space to make it more readable,
but I think that overriding of the __or__ operator might be a bit
confusing -- If anything an __and__ ( "&" ) or a shift ">>" might
make a bit more logical sense.


Perhaps "+" should be concatenation of generators:

gen1 + gen2 + gen3

means do gen1 until empty, then gen2, ...

"|" could be alternation: one from column (generator) A, one from column B

We need one for the generator equivalent of 'zip' :

genA <op> genB

generates (A0,B0), (A1,B1), (A2,B2), ...

-- Steve
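
A sketch of how these operator ideas could be wired up, assuming Python 3. The choices here are assumptions, not anything settled in the thread: `>>` applies a filter, `+` concatenates, `|` alternates and stops as soon as either side runs out, and `&` is picked arbitrarily for the unnamed zip operator.

```python
from itertools import chain

class Gen:
    """Iterable wrapper with operator-based composition (illustrative only)."""

    def __init__(self, source):
        self.it = iter(source)

    def __iter__(self):
        return self.it

    def __rshift__(self, pred):
        # gen >> pred : keep only the items for which pred(x) is true
        return Gen(filter(pred, self.it))

    def __add__(self, other):
        # gen1 + gen2 : everything from gen1, then everything from gen2
        return Gen(chain(self.it, other))

    def __or__(self, other):
        # genA | genB : A0, B0, A1, B1, ... until either side is exhausted
        def alternate(a, b):
            for x, y in zip(a, b):
                yield x
                yield y
        return Gen(alternate(self.it, iter(other)))

    def __and__(self, other):
        # genA & genB : (A0, B0), (A1, B1), ...
        return Gen(zip(self.it, other))

concat = list(Gen([1, 2]) + Gen([3, 4]))
alt = list(Gen("abc") | Gen("xyz"))
pairs = list(Gen([0, 1, 2]) & Gen("abc"))
odds = list(Gen(range(10)) >> (lambda n: n % 2))

print(concat)  # [1, 2, 3, 4]
print(alt)     # ['a', 'x', 'b', 'y', 'c', 'z']
print(pairs)   # [(0, 'a'), (1, 'b'), (2, 'c')]
print(odds)    # [1, 3, 5, 7, 9]
```

Python's operator precedence still applies: `+` binds tighter than `>>`, which binds tighter than `&`, which binds tighter than `|`, so mixed expressions may need parentheses.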


Alex Martelli

Aug 6, 2001, 12:50:29 PM
"Tim Hochberg" <tim.ho...@ieee.org> wrote in message
news:ssya7.78137$Cy.11...@news1.rdc1.az.home.com...
...

> Although I do think it would be easier to read if the order of the test and
> the generator was reversed:
>
> for f in Count(20, Test(isGif, Files('.'))): print f

Agreed, but please don't spell it 'Count' -- 'take(20, whatever)' appears
to me the 'obviously right way' to express "take the first 20 items of
whatever" (Haskell...), while 'count' (in whatever case) suggests to me
a function that counts things (and returns the number it has counted)...


Alex
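
A minimal sketch of `take`, assuming Python 3 and the standard-library `itertools.islice`. This version returns a lazy iterator so it can sit in the middle of a pipeline; returning an iterator rather than a list is a choice made for this sketch.

```python
from itertools import count, islice

def take(n, iterable):
    """Return an iterator over the first n items of iterable."""
    return islice(iterable, n)

# First five odd integers from an endless source.
first_odds = list(take(5, (n for n in count() if n % 2)))
print(first_odds)  # [1, 3, 5, 7, 9]
```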
