A simple generator application

Doug Fort

Oct 12, 2002, 1:16:31 PM
I'm interested in generators. I've read Dr. David Mertz's 'Charming
Python' articles (http://gnosis.cx/publish/tech_index_cp.html), and
the discussion here on c.l.p.

I've been looking for an excuse to actually use a generator. I need to put
include guards on a bunch of C++ header files. So I started out to create
a generator that does what os.path.walk does, with yield in place of the
callback 'visitor' function.

It turned out to be not that simple, but I've attached what I've got. I
feel that this is worth cluttering up the group with because it's simple
enough to grasp, but it actually does something.

Since this is a new way of thinking for me, I'm interested in how this
task could be done more cleanly.
--
Doug Fort, Programmer
http://www.dougfort.net

Doug Fort

Oct 12, 2002, 1:18:59 PM
the attachment

Doug Fort

Oct 12, 2002, 1:22:40 PM
<can't get PAN to attach, here's the text>

#!/usr/bin/env python
"""
A generator that walks a directory tree while returning pathnames

I release this code to the public. Do what you want with it. No warranty
or support is implied or available.
"""
from __future__ import generators # needed for Python 2.2

__copyright__ = "Copyright (C) Doug Fort -- Released to the public domain"
__author__ = "Doug Fort"
__version__ = 0, 0, 1

import os
import stat
import glob

def dirWalker(top):
    """
    a generator that walks a directory tree

    This code is based on os.path.walk, with the callback function
    replaced by a yield and recursion replaced by crude iteration
    """
    dirstack = [top]
    while len(dirstack) > 0:
        top = dirstack.pop()
        try:
            names = os.listdir(top)
        except os.error:
            # skip an unreadable directory; a 'return' here would end
            # the whole walk, not just this branch
            continue
        yield top, names
        for name in names:
            name = os.path.join(top, name)
            try:
                st = os.lstat(name)
            except os.error:
                continue
            # lstat does not follow symlinks, so linked dirs are not walked
            if stat.S_ISDIR(st[stat.ST_MODE]):
                dirstack.append(name)

def fileWalker(top, pattern):
    """
    a generator to return all files in a directory tree
    whose names match a (glob) pattern.

    for example: '*.py' should find all python scripts

    pattern a valid argument to glob
    """
    walker = dirWalker(top)
    for dirname, dircontent in walker:
        dirpattern = os.path.join(dirname, pattern)
        for fullname in filter(os.path.isfile, glob.glob(dirpattern)):
            yield fullname

if __name__ == "__main__":
    # test driver -- looks for a starting dir in sys.argv[1]
    import sys

    walker = dirWalker(sys.argv[1])
    for dirname, dircontent in walker:
        print dirname

    print "filewalker"
    walker = fileWalker(sys.argv[1], "*.py")
    for filename in walker:
        print filename

holger krekel

Oct 12, 2002, 1:44:54 PM
Doug Fort wrote:
> I've been looking for an excuse to actually use a generator. I need to put
> include guards on a bunch of C++ header files. So I started out to create
> a generator that does what os.path.walk does, with yield in place of the
> callback 'visitor' function.
>
> It turned out to be not that simple, but I've attached what I've got. I
> feel that this is worth cluttering up the group with because it's simple
> enough to grasp, but it actually does something.
>
> Since this is a new way of thinking for me, I'm interested in how this
> task could be done more cleanly.

It's already quite nice.

But in 'fileWalker' I would try to avoid "duplicate" file I/O.

First you are calling dirWalker, which calls 'os.listdir',
and then for each (yielded) result you call 'glob.glob', which
scans the directory again.

regards,

holger
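
A minimal sketch of the point above, written for a current Python 3 rather than the Python 2.2 used in the thread. It assumes that matching the names already returned by os.listdir with the standard fnmatch module is an acceptable substitute for glob.glob; the names dir_walker and file_walker are illustrative, not taken from the original post. One difference worth knowing: glob skips names that start with a dot for a pattern such as '*.py', while fnmatch does not.

```python
import fnmatch
import os


def dir_walker(top):
    """Yield (dirname, names) for top and every directory below it."""
    dirstack = [top]
    while dirstack:
        current = dirstack.pop()
        try:
            names = os.listdir(current)
        except OSError:
            continue  # unreadable directory: skip it and keep walking
        yield current, names
        for name in names:
            path = os.path.join(current, name)
            # The islink test mirrors the original's lstat: symlinked
            # directories are not descended into.
            if os.path.isdir(path) and not os.path.islink(path):
                dirstack.append(path)


def file_walker(top, pattern):
    """Yield files under top whose names match pattern.

    The directory listing produced by dir_walker is reused, so each
    directory is scanned only once.
    """
    for dirname, names in dir_walker(top):
        for name in fnmatch.filter(names, pattern):
            fullname = os.path.join(dirname, name)
            if os.path.isfile(fullname):
                yield fullname
```

Usage would look like `for path in file_walker(".", "*.h"): ...`, which fits the include-guard task that started the thread.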

Erik Max Francis

Oct 13, 2002, 1:10:54 AM
Doug Fort wrote:

> __copyright__ = "Copyright (C) Doug Fort -- Released to the public domain"

Note that placing a work in the public domain explicitly waives all
copyrights, so this notice is faulty and could be the subject of further
confusion. Either it's copyrighted by you or it's public domain, but
not both.

--
Erik Max Francis / m...@alcyone.com / http://www.alcyone.com/max/
__ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE
/ \ A good indignation brings out all one's powers.
\__/ Ralph Waldo Emerson
Church / http://www.alcyone.com/pyos/church/
A lambda calculus explorer in Python.

Oren Tirosh

Oct 13, 2002, 9:46:28 AM
On Sat, Oct 12, 2002 at 05:22:40PM +0000, Doug Fort wrote:
> def dirWalker(top):
>     """
>     a generator that walks a directory tree
>
>     This code is based on os.path.walk, with the callback function
>     replaced by a yield and recursion replaced by crude iteration
>     """
>     dirstack = [top]
>     while len(dirstack) > 0:
...

Why remove the recursion? It's the easiest way to write this code. I find
it much more straightforward than maintaining an explicit stack.

Recursive generators are a fantastic tool. They combine the power of
recursion with a simple, "linear" output format. They're also quite
efficient because yielding a value is almost an order of magnitude faster
than calling a Python function. Unless your recursion is really deep,
yielding the values up the chain is not going to cause any significant
overhead.

Oren
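
A minimal sketch of what a recursive version of the walker might look like, written for a current Python 3; the name dir_walker is illustrative and this is not code posted by anyone in the thread. A common stumbling block with recursive generators is that calling the generator function recursively only creates a new generator object: its values have to be looped over and re-yielded, otherwise nothing from the subdirectories ever reaches the caller.

```python
import os


def dir_walker(top):
    """Recursively yield (dirname, names) for top and its subdirectories."""
    try:
        names = os.listdir(top)
    except OSError:
        return  # in the recursive form this ends only the current branch
    yield top, names
    for name in names:
        path = os.path.join(top, name)
        if os.path.isdir(path) and not os.path.islink(path):
            # Re-yield everything the recursive call produces. A bare
            # dir_walker(path) would build a generator and discard it.
            for item in dir_walker(path):
                yield item
```

The explicit inner for loop is the form that also works on the Python 2.2 of the thread; from Python 3.3 onward it can be shortened to `yield from dir_walker(path)`.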

Doug Fort

Oct 13, 2002, 12:42:09 PM
On Sun, 13 Oct 2002 09:46:28 +0000, Oren Tirosh wrote:

> On Sat, Oct 12, 2002 at 05:22:40PM +0000, Doug Fort wrote:
>> def dirWalker(top):
>>     """
>>     a generator that walks a directory tree
>>
>>     This code is based on os.path.walk, with the callback function
>>     replaced by a yield and recursion replaced by crude iteration
>>     """
>>     dirstack = [top]
>>     while len(dirstack) > 0:
> ...
>
> Why remove the recursion? It's the easiest way to write this code. I find
> it much more straightforward than maintaining an explicit stack.
>

I couldn't get the generator to work with recursion. I'd be grateful for
an example.

Tim Peters

Oct 13, 2002, 3:59:45 PM
[Doug Fort]

> I couldn't get the generator to work with recursion. I'd be grateful
> for an example.

There's an example of a recursive tree-node generator in PEP 255. It's also
in Lib/test/test_generators.py, in the pep_tests doctest. They're very easy
to write once you get the hang of them, but it seems to require an "aha!"
breakthrough at the start. It took me about 10 years to convince Guido it
was easy <wink>.

Unlike Oren, though, I prefer an explicit stack for this purpose, as, e.g.,
it's easy to switch from depth-first to breadth-first just by popping from
the other end. Having an explicit list of pending directories also allows
easy enhancements like sorting display based on functions applied to the
directory names and/or contents.
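
A minimal sketch of the explicit-stack variant described above, written for a current Python 3. It assumes collections.deque (which did not exist in Python 2.2) for cheap pops from either end; the breadth_first flag and the name dir_walker are illustrative, not from the thread. Sorting the listing is one simple instance of ordering the pending work.

```python
import os
from collections import deque


def dir_walker(top, breadth_first=False):
    """Yield (dirname, names), depth-first by default.

    Switching traversal order only changes which end of the pending
    queue the next directory is taken from.
    """
    pending = deque([top])
    while pending:
        current = pending.popleft() if breadth_first else pending.pop()
        try:
            names = sorted(os.listdir(current))
        except OSError:
            continue  # unreadable directory: skip it and keep walking
        yield current, names
        for name in names:
            path = os.path.join(current, name)
            if os.path.isdir(path) and not os.path.islink(path):
                pending.append(path)
```

Because the pending directories live in an ordinary container, other policies (for example, visiting the largest directories first) would only need a different rule for choosing the next entry.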


Avery Andrews

Oct 12, 2002, 6:55:26 PM

On Sat, 12 Oct 2002, Doug Fort wrote:

> I'm interested in generators. I've read Dr. David Mertz's 'Charming
> Python' articles (http://gnosis.cx/publish/tech_index_cp.html), and
> the discussion here on c.l.p.
>
> I've been looking for an excuse to actually use a generator. I need to put
> include guards on a bunch of C++ header files. So I started out to create
> a generator that does what os.path.walk does, with yield in place of the
> callback 'visitor' function.
>

Here's one that I just wrote that actually does something: it combines
two calls to an interface to AmziProlog into one convenient package. The
context is looking at sequentially numbered lines of text where some line
numbers appear twice (variants, errors, etc.). 'next_line_num' is a
'deterministic' Prolog call that produces one answer; 'retrieve_num_line'
is a 'nondeterministic' one that produces multiple answers. The results
of the .run and .calls methods are basically copies of the argument with
the capital letters filled in by whatever the Prolog engine can come up
with, if anything, that makes the statements true; the argument positions
are then accessible by indexing:

#
# generator to smooth out call-redo cycle. Might be better
# if the Prolog were also rethought.
#
def LineGetter(engine, linenum=0):
    #
    # Loops indefinitely until it kills itself
    #
    engine.clearCall()
    while 1:
        lino = engine.run("next_line_num(%d,X)" % linenum)
        if lino is None:
            return
        linenum = int(lino[2])
        for result in engine.calls("retrieve_num_line(%d,T,I,N,L)" % linenum):
            yield result[1:6]

Dilton McGowan II

Oct 14, 2002, 12:41:31 AM
Guess I missed your point Avery, maybe it's overly intellectually
stimulating. Doug wrote a piece of code that does something programmers need
to do every day, iterate directory trees and work with files. (Though I
agree with Oren about recursion, also Erik made a good point about the
copyright.)

I'm addressing your comment that your code *does* something. I grant that it
*may* do something.

Instead of taking aim, try helping load the ammo.

"Avery Andrews" <and...@pcug.org.au> wrote in message
news:Pine.GSO.4.21.02101...@supreme.pcug.org.au...