I've been looking for an excuse to actually use a generator. I need to put
include guards on a bunch of C++ header files. So I started out to create
a generator that does what os.path.walk does, with yield in place of the
callback 'visitor' function.
It turned out to be not that simple, but I've attached what I've got. I
feel that this is worth cluttering up the group with because it's simple
enough to grasp, but it actually does something.
Since this is a new way of thinking for me, I'm interested in how this
task could be done more cleanly.
--
Doug Fort, Programmer
http://www.dougfort.net
#!/usr/bin/env python
"""
A generator that walks a directory tree while returning pathnames
I release this code to the public. Do what you want with it. No warranty
or support is implied or available.
"""
from __future__ import generators # needed for Python 2.2
__copyright__ = "Copyright (C) Doug Fort -- Released to the public domain"
__author__ = "Doug Fort"
__version__ = 0, 0, 1
import os
import stat
import glob
def dirWalker(top):
    """
    a generator that walks a directory tree

    This code is based on os.path.walk, with the callback function
    replaced by a yield and recursion replaced by crude iteration
    """
    dirstack = [top]
    while len(dirstack) > 0:
        top = dirstack.pop()
        try:
            names = os.listdir(top)
        except os.error:
            # skip an unreadable directory, but keep walking the rest
            # (a 'return' here would end the whole walk)
            continue
        yield top, names
        for name in names:
            name = os.path.join(top, name)
            try:
                st = os.lstat(name)
            except os.error:
                continue
            if stat.S_ISDIR(st[stat.ST_MODE]):
                dirstack.append(name)

def fileWalker(top, pattern):
    """
    a generator to return all files in a directory tree
    whose names match a (glob) pattern.
    for example: '*.py' should find all python scripts

    pattern -- a valid argument to glob
    """
    walker = dirWalker(top)
    for dirname, dircontent in walker:
        dirpattern = os.path.join(dirname, pattern)
        for fullname in filter(os.path.isfile, glob.glob(dirpattern)):
            yield fullname

if __name__ == "__main__":
    """test driver -- looks for a starting dir in sys.argv[1]"""
    import sys
    walker = dirWalker(sys.argv[1])
    for dirname, dircontent in walker:
        print dirname
    print "filewalker"
    walker = fileWalker(sys.argv[1], "*.py")
    for filename in walker:
        print filename

It's already quite nice.
But in 'fileWalker' I would try to avoid "duplicate" file I/O.
First you call dirWalker, which calls 'os.listdir',
and then for each (yielded) result you call 'glob.glob', which
scans the directory again.
regards,
holger
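
One way to act on this suggestion, as a minimal sketch rather than anything posted in the thread: match the pattern against the names that were already listed, using the standard fnmatch module instead of glob. The function name files_matching is hypothetical, and the code is written for current Python 3 rather than the Python 2.2 of the original script. It accepts any iterable of (dirname, names) pairs, such as the ones dirWalker yields.

```python
import fnmatch
import os


def files_matching(listings, pattern):
    """Yield full paths of files whose names match a glob-style pattern.

    listings is any iterable of (dirname, names) pairs, for example the
    pairs produced by the dirWalker generator from the original post.
    """
    for dirname, names in listings:
        # fnmatch.filter applies the pattern to names already in memory,
        # so the directory is not listed a second time.
        for name in fnmatch.filter(names, pattern):
            fullname = os.path.join(dirname, name)
            # One check per matching name is still needed to skip directories.
            if os.path.isfile(fullname):
                yield fullname
```

One difference to be aware of: glob skips names that begin with a dot unless the pattern itself begins with a dot, while fnmatch does not treat them specially, so the two approaches can disagree on hidden files.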
> __copyright__ = "Copyright (C) Doug Fort -- Released to the public
> domain"
Note that placing a work in the public domain explicitly waives all
copyrights, so this notice is faulty and could be the subject of further
confusion. Either it's copyrighted by you or it's public domain, but
not both.
--
Erik Max Francis / m...@alcyone.com / http://www.alcyone.com/max/
__ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE
/ \ A good indignation brings out all one's powers.
\__/ Ralph Waldo Emerson
Church / http://www.alcyone.com/pyos/church/
A lambda calculus explorer in Python.
Why remove the recursion? It's the easiest way to write this code. I find
it much more straightforward than maintaining an explicit stack.
Recursive generators are a fantastic tool. They combine the power of
recursion with a simple, "linear" output format. They're also quite
efficient because yielding a value is almost an order of magnitude faster
than calling a Python function. Unless your recursion is really deep,
yielding the values up the chain is not going to cause any significant
overhead.
Oren
> On Sat, Oct 12, 2002 at 05:22:40PM +0000, Doug Fort wrote:
>> def dirWalker(top):
>> """
>> a generator that walks a directory tree
>>
>> This code is based on os.path.walk, with the callback function
>> replaced by a yield and recursion replaced by crude iteration
>> """
>> dirstack = [top]
>> while len(dirstack) > 0:
> ...
>
> Why remove the recursion? It's the easiest way to write this code. I find
> it much more straightforward than maintaining an explicit stack.
>
I couldn't get the generator to work with recursion. I'd be grateful for
an example.
There's an example of a recursive tree-node generator in PEP 255. It's also
in Lib/test/test_generators.py, in the pep_tests doctest. They're very easy
to write once you get the hang of them, but it seems to require an "aha!"
breakthrough at the start. It took me about 10 years to convince Guido it
was easy <wink>.
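
For the directory-walking case discussed in this thread, a recursive generator might look like the sketch below. It is not the PEP 255 example and not code from any poster; walk_recursive is a hypothetical name. The point that usually needs the "aha!" is that a recursive call only creates a new generator object: the caller has to loop over it and re-yield each item, which is what "yielding the values up the chain" refers to. The explicit for loop is used so that the same form works in the Python 2.2 of the thread and in current Python 3.

```python
import os


def walk_recursive(top):
    """Yield (dirname, names) for top and every directory below it."""
    try:
        names = os.listdir(top)
    except OSError:
        # Unreadable directory: end this branch only; callers keep going.
        return
    yield top, names
    for name in names:
        path = os.path.join(top, name)
        # Skip symbolic links so a link cycle cannot cause endless recursion.
        if os.path.isdir(path) and not os.path.islink(path):
            # The recursive call returns a generator; iterate over it and
            # re-yield each item to pass it up to our own caller.
            for item in walk_recursive(path):
                yield item
```

Note that a bare return here ends only the current branch, which matches the behaviour of os.path.walk, whereas in an iterative version the same statement would stop the whole walk.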
Unlike Oren, though, I prefer an explicit stack for this purpose, as, e.g.,
it's easy to switch from depth-first to breadth-first just by popping from
the other end. Having an explicit list of pending directories also allows
easy enhancements like sorting display based on functions applied to the
directory names and/or contents.
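
The trade-off described above can be sketched as follows. This is an illustration under assumptions, not code from the thread: walk_pending and its breadth_first parameter are hypothetical names, and collections.deque is a later addition to the standard library, so the sketch targets current Python 3. Sorting the names of each directory stands in for the "sorting display" enhancement.

```python
import os
from collections import deque


def walk_pending(top, breadth_first=False):
    """Yield (dirname, names), keeping pending directories in a deque."""
    pending = deque([top])
    while pending:
        # Popping from the right gives depth-first order (a stack);
        # popping from the left gives breadth-first order (a queue).
        dirname = pending.popleft() if breadth_first else pending.pop()
        try:
            # Sorting makes the visiting order predictable.
            names = sorted(os.listdir(dirname))
        except OSError:
            # Skip an unreadable directory and carry on with the rest.
            continue
        yield dirname, names
        for name in names:
            path = os.path.join(dirname, name)
            if os.path.isdir(path) and not os.path.islink(path):
                pending.append(path)
```

Switching strategies is a one-argument change for the caller, for example list(walk_pending(".", breadth_first=True)).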
On Sat, 12 Oct 2002, Doug Fort wrote:
> I'm interested in generators. I've read Dr. David Mertz's 'Charming
> Python' articles (http://gnosis.cx/publish/tech_index_cp.html), and
> the discussion here on c.l.p.
>
> I've been looking for an excuse to actually use a generator. I need to put
> include guards on a bunch of C++ header files. So I started out to create
> a generator that does what os.path.walk does, with yield in place of the
> callback 'visitor' function.
>
Here's one that I just wrote that actually does something: it combines
two calls to an interface to AmziProlog into one convenient package. The
context is looking at sequentially numbered lines of text where some line
numbers appear twice (variants, errors, etc.). 'next_line_num' is a
'deterministic' Prolog call that produces one answer;
'retrieve_num_line' is a 'nondeterministic' one that produces multiple
answers. The results of the .run and .calls methods are basically
copies of the argument with the capital letters filled out by whatever
the Prolog engine can come up with, if anything, that makes the statements
true; the argument positions are then accessible by indexing:
#
# generator to smooth out call-redo cycle. Might be better
# if the Prolog were also rethought.
#
def LineGetter(engine, linenum=0):
    #
    # Loops indefinitely until it kills itself
    #
    engine.clearCall()
    while (1):
        lino=engine.run("next_line_num(%d,X)"%linenum)
        if lino is None:
            return
        linenum = int(lino[2])
        for result in engine.calls("retrieve_num_line(%d,T,I,N,L)"%linenum):
            yield result[1:6]

I'm addressing your comment that your code *does* something. I grant that it
*may* do something.
Instead of taking aim, try helping load the ammo.
"Avery Andrews" <and...@pcug.org.au> wrote in message
news:Pine.GSO.4.21.02101...@supreme.pcug.org.au...