Request for feedback on my first Python program

Scott Meyers

May 30, 2003, 2:14:45 AM

I'm a C++ programmer who's writing his very first Python program. This
means the program is going to be gross, and I apologize for that in
advance. I don't really have anybody I can show it to for feedback, so I'm
hoping I can get some comments here. If there is a better place for me to
seek guidance, please let me know.

The program is supposed to read a file containing directory and file names,
one per line. (The file can also contain comment and blank lines, which
should be ignored.) For each file or directory name, the program should
print out whether it's a directory, a file, or neither. That's it.

Here's the code; you may want to hold your nose:

import sys
import os
import string

# Error status codes
BadInvocation = 1

def usage(programName):
    print "Usage: " + os.path.basename(programName) + " [file]"
    sys.exit(BadInvocation)

# Take a list of strings, return an equivalent list where the following
# strings have been removed:
# - those with no non-whitespace characters
# - those whose first non-whitespace character is a "#"
def getMeaningfulLines(lines):
    nonCommentLines = []
    for i in range(0,len(lines)):
        try:
            firstToken = string.split(lines[i])[0]
            if firstToken[0] != "#":
                nonCommentLines.append(lines[i])
        except IndexError:
            continue
    return nonCommentLines

# if this language had a main(), it'd be here...
if len(sys.argv) != 2: usage(sys.argv[0])

lines = getMeaningfulLines(open(sys.argv[1]).readlines())
for i in lines:
    print i + " is a ",
    if os.path.isdir(i): print "directory"
    elif os.path.isfile(i): print "file"
    else: print "non-directory and non-file"

In addition to the numerous stylistic gaffes I'm sure I've made, the
program also doesn't work correctly. Given this input file,

d:\
d:\temp\foo.py

I get this output:

d:\
is a non-directory and non-file
d:\temp\foo.py
is a non-directory and non-file

Aside from being ugly (how do I get rid of the newline that follows each
directory or file name?), the problem is that the first entry IS a
directory and the second one IS a file. So clearly I'm doing something
wrong. Any idea what it is?

Thanks very much in advance.

Scott
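A minimal sketch of the failure described above, written in Python 3 rather than the Python 2 used throughout this thread. It creates a throwaway directory and listing file with tempfile (so it does not depend on d:\), reads the listing with readlines(), and uses repr() to show that every line keeps its trailing newline, which makes os.path.isdir() and os.path.isfile() look for a name that does not exist. The helper name classify is illustrative.

```python
import os
import tempfile


def classify(path):
    """Return 'directory', 'file', or 'non-directory and non-file'."""
    if os.path.isdir(path):
        return "directory"
    elif os.path.isfile(path):
        return "file"
    return "non-directory and non-file"


# Throwaway directory plus a listing file naming that directory and itself.
tmpdir = tempfile.mkdtemp()
listing = os.path.join(tmpdir, "paths.txt")
with open(listing, "w") as f:
    f.write(tmpdir + "\n")
    f.write(listing + "\n")

with open(listing) as f:
    raw_lines = f.readlines()  # every element still ends with '\n'

for line in raw_lines:
    # repr() makes the hidden newline visible; stripping it fixes the lookup.
    print(repr(line), "->", classify(line))
    print(repr(line.strip()), "->", classify(line.strip()))
```

The same newline is also what splits each line of the original output in two.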

Andreas Jung

May 30, 2003, 2:40:37 AM

It's nice to see that one of the world's most famous C++ guys uses Python
too ...welcome :-)

Andreas

--On Thursday, 29 May 2003 23:14 -0700 Scott Meyers
<Use...@aristeia.com> wrote:

> --
> http://mail.python.org/mailman/listinfo/python-list

Achim Domma

May 30, 2003, 3:17:40 AM

"Scott Meyers" <Use...@aristeia.com> wrote in message
news:MPG.194096b51...@news.hevanet.com...

> hoping I can get some comments here. If there is a better place for me to
> seek guidance, please let me know.

I think you have found the right place! ;-)

> Here's the code; you may want to hold your nose:

[...]

Here is how I would have written your code. Might not be perfect, but should
give you some ideas on how to use python:

from __future__ import generators
from os.path import isdir,isfile,basename
from sys import argv, exit

if len(argv)!=2:
    print "Usage: " + basename(argv[0]) + " [file]"
    exit(1)

def meaningfulLines(lines):
    # the file object passed can be used as iterator
    # it iterates over the lines in the file
    for line in lines:
        # strip removes the whitespaces
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        # see http://www.python.org/doc/2.2.2/whatsnew/node5.html for
        # details about generators
        yield line

# meaningfulLines returns a generator which can be used
# to iterate over
for line in meaningfulLines(file(argv[1])):
    print line,"is a",
    if isdir(line): print "directory"
    elif isfile(line): print "file"
    else: print "non-directory and non-file"

The output is as expected, so I don't know why your code does not work for
you.
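The generator version carries over to Python 3 almost unchanged: the built-in file() no longer exists (open() replaces it), and "from __future__ import generators" is no longer needed. A sketch under those assumptions, with io.StringIO standing in for a real file object, since both yield one line per iteration:

```python
import io


def meaningful_lines(lines):
    """Yield each line stripped of surrounding whitespace, skipping
    blank lines and lines whose first non-blank character is '#'.

    'lines' can be any iterable of strings; an open file object works
    and is read one line at a time rather than all at once.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        yield line


sample = io.StringIO("d:\\\n\n   # a comment\nd:\\temp\\foo.py\n")
for path in meaningful_lines(sample):
    print(path)
```

With a real listing file the loop would read "with open(name) as f:" followed by "for path in meaningful_lines(f):".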

regards,
Achim

PS.: Very nice to see more and more top C++ experts interested in Python!

Paul Rubin

May 30, 2003, 3:29:36 AM

Scott Meyers <Use...@aristeia.com> writes:

> I'm a C++ programmer who's writing his very first Python program. This
> means the program is going to be gross, and I apologize for that in
> advance. I don't really have anybody I can show it to for feedback, so I'm
> hoping I can get some comments here. If there is a better place for me to
> seek guidance, please let me know.
>
> The program is supposed to read a file containing directory and file names,
> one per line. (The file can also contain comment and blank lines, which
> should be ignored.) For each file or directory name, the program should
> print out whether it's a directory, a file, or neither. That's it.
>
> Here's the code; you may want to hold your nose:
>
> import sys
> import os
> import string
>
> # Error status codes
> BadInvocation = 1
>
> def usage(programName):
>     print "Usage: " + os.path.basename(programName) + " [file]"
>     sys.exit(BadInvocation)
>
> # Take a list of strings, return an equivalent list where the following
> # strings have been removed:
> # - those with no non-whitespace characters
> # - those whose first non-whitespace character is a "#"

> def getMeaningfulLines(lines): ...
> ...

I'd write that function like this:

def is_meaningful(line):
    # return 1 if line is meaningful, 0 otherwise
    line = line.strip() # remove leading and trailing whitespace
    return (line and line[0] != '#')

And then

> lines = getMeaningfulLines(open(sys.argv[1]).readlines())

becomes

# read the lines and discard the non-meaningful ones
lines = filter(is_meaningful, open(sys.argv[1]).readlines())
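One caveat with this approach: filter() only decides which lines to keep, so the surviving lines still end in '\n' and still need strip() before they reach os.path.isdir(). In Python 3, filter() also returns a lazy iterator rather than a list. A Python 3 sketch (sample data invented) in which is_meaningful returns a real bool and a list comprehension filters and strips in the same pass:

```python
def is_meaningful(line):
    """Return True unless the line is blank or a '#' comment."""
    line = line.strip()  # remove leading and trailing whitespace
    return bool(line) and not line.startswith("#")


lines = ["d:\\\n", "\n", "  # comment\n", "d:\\temp\\foo.py\n"]

# filter() keeps whole lines, newline included; list() forces the iterator.
kept = list(filter(is_meaningful, lines))

# A comprehension can filter and strip in one step.
paths = [line.strip() for line in lines if is_meaningful(line)]

print(kept)
print(paths)
```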

For

> # if this language had a main(), it'd be here...
> if len(sys.argv) != 2: usage(sys.argv[0])

Generally I like to supply a main() anyway and call it:

def main():
    bla blah
    bla blah

main()

> Aside from being ugly (how do I get rid of the newline that follows each
> directory or file name?), the problem is that the first entry IS a
> directory and the second one IS a file. So clearly I'm doing something
> wrong. Any idea what it is?

Maybe there was an extra space or something.

Achim Domma

May 30, 2003, 3:34:32 AM

"Scott Meyers" <Use...@aristeia.com> wrote in message
news:MPG.194096b51...@news.hevanet.com...

> Aside from being ugly (how do I get rid of the newline that follows each

> directory or file name?), the problem is that the first entry IS a
> directory and the second one IS a file. So clearly I'm doing something
> wrong. Any idea what it is?

The problem is the newline at the end of the line. If you remove it, for
example with 'strip', your code will work.

Achim

Ben Finney

May 30, 2003, 3:38:41 AM

On Thu, 29 May 2003 23:14:45 -0700, Scott Meyers wrote:
> I'm a C++ programmer who's writing his very first Python program.

Glad to see you've come across to try the waters here, Scott. Those
of us who grew up on your C++ books were wondering when you'd show :-)

Before I try debugging your code:

> # if this language had a main(), it'd be here...

This is done in Python with:

if( __name__ == '__main__' ):
    stuff_to_do_when_called_as_a_script()

This promotes reuse of *.py files as modules, since the same file can be
either run as a script or 'import'ed.
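To see the idiom in action, the Python 3 sketch below writes a tiny module to a temporary file and loads it twice: once by import (via importlib), where __name__ is the module's name and the guarded block is skipped, and once with runpy.run_path(..., run_name="__main__"), which gives the code the same __name__ that "python checkpaths.py" would. The module name checkpaths, the describe() function and the RAN_AS_SCRIPT flag are hypothetical.

```python
import importlib.util
import os
import runpy
import tempfile

# Source of a hypothetical module "checkpaths"; RAN_AS_SCRIPT records
# whether the guarded block executed.
SOURCE = '''
RAN_AS_SCRIPT = False

def describe():
    return "reusable library code"

if __name__ == "__main__":
    RAN_AS_SCRIPT = True
'''

path = os.path.join(tempfile.mkdtemp(), "checkpaths.py")
with open(path, "w") as f:
    f.write(SOURCE)

# 1. Imported: __name__ is "checkpaths", so the guarded block is skipped.
spec = importlib.util.spec_from_file_location("checkpaths", path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

# 2. Run as a script: runpy sets __name__ to "__main__".
script_globals = runpy.run_path(path, run_name="__main__")

print("imported:", module.RAN_AS_SCRIPT)
print("as script:", script_globals["RAN_AS_SCRIPT"])
print(module.describe())
```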

--
\ "Probably the earliest flyswatters were nothing more than some |
`\ sort of striking surface attached to the end of a long stick." |
_o__) -- Jack Handey |
http://bignose.squidly.org/ 9CFE12B0 791A4267 887F520C B7AC2E51 BD41714B

Ben Finney

May 30, 2003, 3:51:13 AM

On Thu, 29 May 2003 23:14:45 -0700, Scott Meyers wrote:

> lines = getMeaningfulLines(open(sys.argv[1]).readlines())
> for i in lines:
>     print i + " is a ",
>     if os.path.isdir(i): print "directory"
>     elif os.path.isfile(i): print "file"
>     else: print "non-directory and non-file"

The readlines() method doesn't strip line endings. Try this:

lines = getMeaningfulLines(open(sys.argv[1]).readlines())
for i in lines:
    filename = i.strip()
    print filename + " is a ",
    if os.path.isdir(filename): print "directory"
    elif os.path.isfile(filename): print "file"
    else: print "non-directory and non-file"

The strip() method will strip all leading and trailing whitespace from
the string, where whitespace includes space, tab, CR, LF, FF, VT.
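A small Python 3 illustration (the sample path is invented) of how strip(), rstrip() and rstrip() with an explicit character set differ. In Python 3, str.strip() with no argument removes every character for which str.isspace() is true, which also covers some non-ASCII Unicode whitespace, not just the ASCII characters listed above.

```python
raw = " \t d:\\temp\\foo.py \r\n"

both_ends = raw.strip()        # leading and trailing whitespace removed
right_only = raw.rstrip()      # trailing whitespace only; keeps " \t "
eol_only = raw.rstrip("\r\n")  # only CR/LF characters at the end

print(repr(both_ends))
print(repr(right_only))
print(repr(eol_only))

# Vertical tab and form feed count as whitespace too.
print(repr("\v\fabc\f".strip()))
```

rstrip("\r\n") is the narrower choice when trailing spaces in a line might be significant.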

--
\ "God forbid that any book should be banned. The practice is as |
`\ indefensible as infanticide." -- Dame Rebecca West |
_o__) |

Martin Franklin

May 30, 2003, 3:46:08 AM

Scott Meyers wrote:
> I'm a C++ programmer who's writing his very first Python program. This
> means the program is going to be gross, and I apologize for that in
> advance. I don't really have anybody I can show it to for feedback, so I'm
> hoping I can get some comments here. If there is a better place for me to
> seek guidance, please let me know.
>
> The program is supposed to read a file containing directory and file names,
> one per line. (The file can also contain comment and blank lines, which
> should be ignored.) For each file or directory name, the program should
> print out whether it's a directory, a file, or neither. That's it.
>
> Here's the code; you may want to hold your nose:
>
> import sys
> import os
> import string

As of Python 2.0 the string module is not needed (strings have methods...)

>
> # Error status codes
> BadInvocation = 1

# conventionally all CAPS for constants
BADINVOCATION = 1

>
> def usage(programName):
>     print "Usage: " + os.path.basename(programName) + " [file]"

    print "Usage:", os.path.basename(programName), "[file]"
    # Adding strings together is normally bad practice; a comma inserts a space, so...

>     sys.exit(BadInvocation)
>
> # Take a list of strings, return an equivalent list where the following
> # strings have been removed:
> # - those with no non-whitespace characters
> # - those whose first non-whitespace character is a "#"
> def getMeaningfulLines(lines):
>     nonCommentLines = []
>     for i in range(0,len(lines)):

lines is a python list object so you can iterate over its items directly

def getMeaningfulLines(lines):
    nonCommentLines = []
    for line in lines:
        # call line.strip() to get rid of unwanted \n\r at EOL
        # in fact this is what was causing your program to fail....
        line = line.strip()
        if line.startswith("#") or not line:
            continue
        else:
            nonCommentLines.append(line)
    return nonCommentLines

>         try:
>             firstToken = string.split(lines[i])[0]
>             if firstToken[0] != "#":
>                 nonCommentLines.append(lines[i])
>         except IndexError:
>             continue
>     return nonCommentLines
>
>
> # if this language had a main(), it'd be here...

# it does..
if __name__=="__main__":
    # this part only executed when called as first arg to python
    # not when imported

> if len(sys.argv) != 2: usage(sys.argv[0])
>
> lines = getMeaningfulLines(open(sys.argv[1]).readlines())
> for i in lines:
>     print i + " is a ",
>     if os.path.isdir(i): print "directory"
>     elif os.path.isfile(i): print "file"
>     else: print "non-directory and non-file"
>
> In addition to the numerous stylistic gaffes I'm sure I've made, the
> program also doesn't work correctly. Given this input file,
>
> d:\
> d:\temp\foo.py
>
> I get this output:
>
> d:\
> is a non-directory and non-file
> d:\temp\foo.py
> is a non-directory and non-file
>
> Aside from being ugly (how do I get rid of the newline that follows each
> directory or file name?), the problem is that the first entry IS a
> directory and the second one IS a file. So clearly I'm doing something
> wrong. Any idea what it is?
>
> Thanks very much in advance.
>
> Scott

So here it is (tested on my linux machine only....)

import sys
import os

# Error status codes
# conventionally all CAPS for constants
BADINVOCATION = 1

def usage(programName):
    print "Usage:", os.path.basename(programName), "[file]"
    sys.exit(BADINVOCATION)

# Take a list of strings, return an equivalent list where the following
# strings have been removed:
# - those with no non-whitespace characters
# - those whose first non-whitespace character is a "#"
def getMeaningfulLines(lines):
    nonCommentLines = []

    for line in lines:
        line = line.strip()
        if line.startswith("#") or not line:
            continue
        else:
            nonCommentLines.append(line)
    return nonCommentLines

# if this language had a main(), it'd be here...

# it does.. we just spell it like so:-
if __name__=="__main__":
    # this part only executed when called as first arg to python
    # not when this module is imported

    if len(sys.argv) != 2:
        usage(sys.argv[0])

    lines = getMeaningfulLines(open(sys.argv[1]).readlines())
    for i in lines:
        print i, "is a",

        if os.path.isdir(i):
            print "directory"
        elif os.path.isfile(i):
            print "file"
        else:
            print "non-directory and non-file"

HTH
Martin

Bernard Delmée

May 30, 2003, 3:59:23 AM

> # if this language had a main(), it'd be here...

In python, the customary idiom would be:

if __name__ == '__main__':
    # main script or call to function

This way your script works both standalone and as a module if needed.

> Aside from being ugly (how do I get rid of the newline that follows
> each directory or file name?), the problem is that the first entry IS
> a directory and the second one IS a file. So clearly I'm doing
> something wrong. Any idea what it is?

As shown in your output, the newlines are what cause the "ugly" *and*
wrong result. Every line returned by readlines() ends with a NL which
you need to strip before accessing/testing the file it represents.

So instead of
> for i in lines:
>     # ...
You could say
for l in lines:
    i = l.strip()
    # ...

Now, when can we expect "effective python" ?-)
But then python does not nearly have the pitfalls and idiosyncrasies
of C++, which one can only truly comprehend after reading your very
fine books. Thanks for these.

Regards,

Bernard.

Steven D'Aprano

May 30, 2003, 4:59:57 AM

Apologies if this reply arrives twice, I've had some problems with
posting news articles today.

Scott Meyers <Use...@aristeia.com> wrote in message news:<MPG.194096b51...@news.hevanet.com>...

> I'm a C++ programmer who's writing his very first Python program. This
> means the program is going to be gross, and I apologize for that in
> advance. I don't really have anybody I can show it to for feedback, so I'm
> hoping I can get some comments here. If there is a better place for me to
> seek guidance, please let me know.

I don't pretend that this is the best or most Pythonic way of doing
what you ask, since I'm just a beginner myself.

(Tested but not guaranteed totally bug-free.)

Regards,

Steven.

[start code]

import sys, os, string

# Error status codes
BadInvocation = 1

def usage(programName):
    "Print usage information and die."
    print "Usage: %s [file]" % os.path.basename(programName)
    sys.exit(BadInvocation)

def fileKind(pathname):
    """Given a file path, return the kind of file it is.
    Currently only distinguishes between directories, regular files, and
    everything else."""
    if os.path.isdir(pathname):
        return "directory"
    elif os.path.isfile(pathname):
        return "file"
    else:
        return "something else"

def listFileKinds(files):
    """Given a list of file paths, list each one and the kind of file
    it is."""
    for line in files:
        line = string.strip(line) # delete leading/trailing whitespace
        if (not line) or (string.split(line)[0][0] == "#"):
            # ignore blank lines and comments
            continue
        print "%s is a %s." % (line, fileKind(line))

if __name__ == "__main__":

    if len(sys.argv) != 2:
        usage(sys.argv[0])

    else:
        listFileKinds(open(sys.argv[1]).readlines())

[end code]

Max M

May 30, 2003, 5:03:55 AM

Scott Meyers wrote:
> I'm a C++ programmer who's writing his very first Python program. This
> means the program is going to be gross, and I apologize for that in
> advance. I don't really have anybody I can show it to for feedback, so I'm
> hoping I can get some comments here. If there is a better place for me to
> seek guidance, please let me know.

This is probably the best place around.

I would only do a few things differently. The "getMeaningfulLines()" was
way more complicated than it needed to be.

###################################

import sys
import os

# Error status codes
BADINVOCATION = 1

def usage(programName):
    print "Usage: " + os.path.basename(programName) + " [file]"
    sys.exit(BADINVOCATION)

# Take a list of strings, return an equivalent list where the following
# strings have been removed:
# - those with no non-whitespace characters
# - those whose first non-whitespace character is a "#"
def getMeaningfulLines(lines):
    meaningfulLines = []
    for line in lines:
        line = line.strip()
        if line and line[0] != '#':
            meaningfulLines.append(line)
    return meaningfulLines

# This is how a "main" is normally done in Python.
if __name__ == '__main__':

    if len(sys.argv) != 2: usage(sys.argv[0])

    paths = getMeaningfulLines(open(sys.argv[1]).readlines())
    for path in paths:
        fType = "non-directory and non-file"
        if os.path.isdir(path):
            fType = "directory"
        elif os.path.isfile(path):
            fType = "file"
        print "%s is a %s" % (path, fType)

--

hilsen/regards Max M Rasmussen, Denmark

http://www.futureport.dk/
The future, science, skepticism and transhumanism

Ganesan R

May 30, 2003, 4:33:11 AM

>>>>> "Scott" == Scott Meyers <Use...@aristeia.com> writes:

> I'm a C++ programmer who's writing his very first Python program.

Are you "the" Scott Meyers? Wow. Welcome :-).

The python idiom to iterate through lines in a file is

for line in file:
    do_something()

readlines() will slurp in the whole file into memory.

> if firstToken[0] != "#":
> nonCommentLines.append(lines[i])

The python idiom for this is if lines[i].startswith("#")

So you can do something like

for line in file:
    if line and line.startswith("#"):
        line.rstrip() # strip \n and other white space
        ....

Ganesan

--
Ganesan R (rganesan at debian dot org) | http://www.debian.org/~rganesan/
1024D/5D8C12EA, fingerprint F361 84F1 8D82 32E7 1832 6798 15E0 02BA 5D8C 12EA

Max M

May 30, 2003, 5:31:04 AM

Ganesan R wrote:

> readlines() will slurp in the whole file into memory.

That is most likely not a problem in this case.

> for line in file:
> if line and line.startswith("#"):
> line.rstrip() # strip \n and other white space

The spec said where the first non-whitespace character is #. This means
that there can be whitespace characters before the #.

So it is better to strip first to remove those.

Also strip() returns a string, it doesn't mutate the one you are in:

for line in file:
    line = line.strip()
    if line and line.startswith("#"):
        # etc.

also:

if line and line.startswith("#"):

Will only give you the comments, it should be:

if line and not line.startswith("#"):

But you could just as well write:

if line and line[0] != "#":
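Both corrections are easy to check directly. In the Python 3 sketch below (sample lines invented, helper name meaningful illustrative), calling strip() without rebinding leaves the original string untouched, an indented comment is not caught by startswith("#") until the line has been stripped, and the corrected test keeps only non-comment lines.

```python
def meaningful(lines):
    """Return stripped lines that are neither blank nor '#' comments."""
    result = []
    for line in lines:
        line = line.strip()  # rebind: strip() returns a new string
        if line and not line.startswith("#"):
            result.append(line)
    return result


original = "   # indented comment\n"
original.strip()           # result thrown away; strings are immutable
print(repr(original))      # unchanged

print(original.startswith("#"))           # the leading spaces hide the '#'
print(original.strip().startswith("#"))   # visible once stripped

print(meaningful(["   # c\n", "\n", "d:\\\n"]))
```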

Cliff Wells

May 30, 2003, 5:18:29 AM

On Thu, 2003-05-29 at 23:14, Scott Meyers wrote:
> I'm a C++ programmer who's writing his very first Python program. This
> means the program is going to be gross, and I apologize for that in
> advance. I don't really have anybody I can show it to for feedback, so I'm
> hoping I can get some comments here. If there is a better place for me to
> seek guidance, please let me know.

Welcome. There is no better place.

> The program is supposed to read a file containing directory and file names,
> one per line. (The file can also contain comment and blank lines, which
> should be ignored.) For each file or directory name, the program should
> print out whether it's a directory, a file, or neither. That's it.

Not too bad for a first shot, but for something this simple, I don't
think I'd bother having a function to parse the data:

import sys
import os

# Error status codes

returnCodes = {
    'Bad Invocation': 1,
}

# ----------------------------------------------------------------------

def usage(programName):
    print "Usage: " + os.path.basename(programName) + " [file]"
    sys.exit(returnCodes['Bad Invocation'])

# ----------------------------------------------------------------------
if __name__ == '__main__':

    if len(sys.argv) != 2:
        usage(sys.argv[0])

    for line in open(sys.argv[1], 'r'):
        line = line.strip()
        if not line or line[0] == '#':
            continue

        print line, "is a",

        if os.path.isdir(line):
            print "directory"
        elif os.path.isfile(line):
            print "file"
        else:
            print "non-directory and non-file"

Regards,
Cliff

Ganesan R

May 30, 2003, 5:50:54 AM

>>>>> "Max" == Max M <ma...@mxm.dk> writes:

> Ganesan R wrote:
>> readlines() will slurp in the whole file into memory.

> That is most likely not a problem in this case.

Right. Others already pointed out that the trailing new line was the
problem.

>> for line in file:
>> if line and line.startswith("#"):
>> line.rstrip() # strip \n and other white space

> The spec said where the first non-whitespace character is #. This
> means that there can be whitespace characters before the #.

> So it is better to strip first to remove those.

You're right.

> Also strip() returns a string, it doesn't mutate the one you are in:

Right again. Thanks. Moral: don't post untested code to the list (as if I am
going to follow that one :-)

Giles Brown

May 30, 2003, 6:50:17 AM

Scott Meyers <Use...@aristeia.com> wrote in message news:<MPG.194096b51...@news.hevanet.com>...

> I'm a C++ programmer who's writing his very first Python program. This
> means the program is going to be gross, and I apologize for that in
> advance. I don't really have anybody I can show it to for feedback, so I'm
> hoping I can get some comments here. If there is a better place for me to
> seek guidance, please let me know.
>
> The program is supposed to read a file containing directory and file names,
> one per line. (The file can also contain comment and blank lines, which
> should be ignored.) For each file or directory name, the program should
> print out whether it's a directory, a file, or neither. That's it.

As people will have pointed out by now (I am looking at google groups)
you are leaving the '\n' in your lines so that when you print it out
you are ending up with a new line in the middle of your print out.
(Try printing repr(i) within the loop to check).

Anyway just for comparison here's my version (uses 2.2 features):

from os.path import isdir, isfile
import sys

def main(path):
    for line in open(path):
        line = line.strip()
        if line[:1] == '#' or not line:
            # Blank or comment
            continue
        print line + " is a",
        if isdir(line): print "directory"
        elif isfile(line): print "file"
        else: print "non-directory and non-file"

if __name__ == '__main__':
    main(*sys.argv[1:])

Notable differences.
1) I'm processing the lines as I get them from the file.
I could gather them up and process them later, but that would need more memory.
2) I'm using the "*args" calling technique to ensure that an error is reported
if main is called with the wrong number of arguments. It would print usage,
but for quick and dirty scripts it highlights wrong argument count well enough.
3) I'm using string methods rather than the string module. String methods are
the future (I hear).
4) I'm using the standard "if __name__ == '__main__':" idiom to indicate the
main function.
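Point 2 can be seen in a few lines. In this Python 3 sketch, main() and run() are hypothetical stand-ins: run() unpacks the argument list the same way main(*sys.argv[1:]) does, and Python itself raises TypeError when the count is wrong. The sketch catches TypeError only to make that visible; as noted above, the raw traceback is usually enough for a quick script. One caveat of the trick is that a TypeError raised inside main() for an unrelated reason would look exactly the same.

```python
def main(path):
    """Hypothetical entry point that needs exactly one argument."""
    return "processing " + path


def run(argv):
    """Unpack argv[1:] into main(), as in main(*sys.argv[1:])."""
    try:
        return main(*argv[1:])
    except TypeError as exc:
        # Wrong argument count: Python reports it, no manual len() check.
        return "bad invocation: %s" % exc


print(run(["checkpaths.py", "list.txt"]))
print(run(["checkpaths.py"]))
print(run(["checkpaths.py", "a.txt", "b.txt"]))
```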

Cheers,
Giles Brown

Andrew Walkingshaw

May 30, 2003, 7:09:43 AM

In article <MPG.194096b51...@news.hevanet.com>, Scott Meyers wrote:

> Here's the code; you may want to hold your nose:

I can't talk, I'm a physicist and not a "real" programmer, but here's
how I would approach this problem:

import sys, os

BADINVOCATION = 1

def usage(programName):
    # strings are immutable, so summing strings is generally considered
    # bad style: it's much slower than using string interpolation
    # (though this is a real nitpick in this case)
    print("Usage: %s [file]" % os.path.basename(programName))
    sys.exit(BADINVOCATION)

def isMeaningful(line):
    if line != "" and line[0] != "#":
        return True
    else:
        return False

def main():

    if len(sys.argv) != 2:
        usage(sys.argv[0])

    f = open(sys.argv[1], "r")

    for line in f:
        s = line.lstrip().rstrip()
        if isMeaningful(s):
            if os.path.isdir(s):
                print "%s is a directory." % s
            elif os.path.isfile(s):
                print "%s is a file." % s
            else:
                print "%s is neither a file nor a directory." % s

if __name__ == "__main__":
    main()

- Andrew

--
Andrew Walkingshaw | andrew...@lexical.org.uk

Anand Pillai

May 30, 2003, 7:50:18 AM

Hi Scott,

The main problem of your code is in the lines,

lines = getMeaningfulLines((open(sys.argv[1])).readlines())

for i in lines:
    print i + " is a ",
    ...

readlines() returns lines from the file up to and including the newline character,
similar to fgets() in C. So what you read in your code is not d:\
or d:\temp\foo.py but 'd:\<many spaces>\n' and
'd:\temp\foo.py<many spaces>\n'.

To fix this you need to modify the code as follows.

for i in lines:
    i = string.strip(i) # strip all spaces & newlines
    print i + " is a ",
    ...

Now your code works correctly.

Apart from that the main() function can also be defined in
the program.

Here is a modified program with some changes I thought
made it better.

# scott.py - Scott Meyers' first Effective Python program :-)

import sys
import os
import string

# Error status codes
BadInvocation = 1

def usage(programName):
    print "Usage: " + os.path.basename(programName) + " [file]"
    sys.exit(BadInvocation)

# Take a list of strings, return an equivalent list where the following
# strings have been removed:
# - those with no non-whitespace characters
# - those whose first non-whitespace character is a "#"
def getMeaningfulLines(lines):

    nonCommentLines = []

    for line in lines:
        try:
            firstToken = string.split(line)[0]
            if firstToken[0] != "#":
                nonCommentLines.append(line)
        except IndexError:
            continue

    return nonCommentLines

def whatami(str):

    if os.path.isdir(str): return 'directory'
    elif os.path.isfile(str): return 'file'
    else: return 'non-directory and non-file'

# Here is the main()
def main():

    # if this language had a main(), it'd be here...
    if len(sys.argv) != 2: usage(sys.argv[0])

    lines = getMeaningfulLines((open(sys.argv[1])).readlines())
    for i in lines:
        i = string.strip(i)
        print i, 'is a', whatami(i)

if __name__=="__main__":
    main()

Glad to help you pythonically yours,

Anand Pillai

Scott Meyers <Use...@aristeia.com> wrote in message news:<MPG.194096b51...@news.hevanet.com>...

Stephen D Evans

May 30, 2003, 8:15:31 AM

"Scott Meyers" <Use...@aristeia.com> wrote in message

<snip>

String twiddling can get difficult to maintain. In your program only a
single strip() is required to remove leading and trailing whitespace.

Regular expressions work "out of the box" in Python. The "re" module
is well documented. Your getMeaningfulLines() function can be replaced by
a test for a match object:

# Tested on Python 2.2.2 and Python 2.3b1
import sys
import os
import re

# Error status codes
BADINVOCATION = 1

def usage(programName):
    print "Usage: " + os.path.basename(programName) + " [file]"
    sys.exit(BADINVOCATION)

def test(filename):
    valid_pattern = re.compile('\s*[^#\s]')

    #Python 2.3 use
    #for line in file(filename, 'rt'):

    for line in file(filename, 'rt').xreadlines():
        if valid_pattern.match(line): # meaningful line
            line = line.strip()
            type = os.path.isdir(line) and "directory"
            type = type or os.path.isfile(line) and "file"
            type = type or "non-directory and non-file"
            print line, "is a", type

if __name__ == '__main__':

    if len(sys.argv) != 2:
        usage(sys.argv[0])

    test(sys.argv[1])
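A Python 3 adaptation of the regular-expression approach (file() and xreadlines() no longer exist there; a file object is iterated directly). Two changes are assumptions of this sketch rather than part of the post: the pattern is written as a raw string, since recent Python 3 versions warn about the unrecognized escape \s in an ordinary string literal, and the and/or chain is replaced by a plain if/elif helper. The names is_meaningful, file_kind and report are illustrative.

```python
import os
import re

# re.match() anchors at the start: optional whitespace, then one
# character that is neither '#' nor whitespace.
VALID_PATTERN = re.compile(r"\s*[^#\s]")


def is_meaningful(line):
    """True when the first non-whitespace character exists and is not '#'."""
    return VALID_PATTERN.match(line) is not None


def file_kind(path):
    if os.path.isdir(path):
        return "directory"
    if os.path.isfile(path):
        return "file"
    return "non-directory and non-file"


def report(filename):
    """Print the kind of every meaningful path listed in 'filename'."""
    with open(filename) as f:
        for line in f:
            if is_meaningful(line):
                path = line.strip()
                print(path, "is a", file_kind(path))
```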

Aahz

May 30, 2003, 9:22:18 AM

In article <3ed70fb5$0$6526$afc3...@sisyphus.news.be.easynet.net>,

Bernard Delmée <bde...@advalvas.REMOVEME.be> wrote:
>
>Now, when can we expect "effective python" ?-)

In roughly twelve months, taking into account all the post-production
work. (Yes, I'm writing it.)
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

"In many ways, it's a dull language, borrowing solid old concepts from
many other languages & styles: boring syntax, unsurprising semantics,
few automatic coercions, etc etc. But that's one of the things I like
about it." --Tim Peters on Python, 16 Sep 93

Aahz

May 30, 2003, 9:26:11 AM

In article <slrnbdeevn.694...@athena.jcn.srcf.net>,

Andrew Walkingshaw <andrew...@lexical.org.uk> wrote:
>
>def main():
>    if len(sys.argv) != 2:
>        usage(sys.argv[0])
>    f = open(sys.argv[1], "r")

This is generally a Bad Idea; your functions should be generic and
sys.argv handling should be done under "if __name__=='__main__':".

Andrew Walkingshaw

May 30, 2003, 10:44:24 AM

In article <bb7m5j$282$1...@panix1.panix.com>, Aahz wrote:
> In article <slrnbdeevn.694...@athena.jcn.srcf.net>,
> Andrew Walkingshaw <andrew...@lexical.org.uk> wrote:
>>
>>def main():
>>    if len(sys.argv) != 2:
>>        usage(sys.argv[0])
>>    f = open(sys.argv[1], "r")
>
> This is generally a Bad Idea; your functions should be generic and
> sys.argv handling should be done under "if __name__=='__main__':".

Thanks; hadn't considered that point. (Part of the value of trying to
be helpful in c.l.py, I've found, is that the most experienced
programmers take time to correct the efforts to help as well as
answering the question in the original post!)

Am I right in thinking that this is because this main()
is useless in the event of reusing this program as a module? That would
seem to be the obvious case where this would blow up nastily.

Hence, I guess the program should read:

def main(filename):
    f = open(filename, "r")
    [...]

if "__name__" == "__main__":

    if len(sys.argv) != 2:
        usage(sys.argv[0])

    main(sys.argv[1])

- or something similar. Please correct me if I'm still misguided :)
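One slip in the revised version: if "__name__" == "__main__": compares two different string literals, so it is always False and main() would never run. The guard needs the unquoted variable __name__. The Python 3 sketch below (file name and the CALLED flag are hypothetical) runs both spellings as a top-level script with runpy.run_path() and reports whether the guarded block executed.

```python
import os
import runpy
import tempfile

BUGGY = 'CALLED = False\nif "__name__" == "__main__":\n    CALLED = True\n'
FIXED = 'CALLED = False\nif __name__ == "__main__":\n    CALLED = True\n'


def guarded_block_runs(source):
    """Run 'source' as the main script; return whether the guard fired."""
    path = os.path.join(tempfile.mkdtemp(), "script.py")
    with open(path, "w") as f:
        f.write(source)
    # run_name="__main__" gives the code the same __name__ that
    # "python script.py" would.
    return runpy.run_path(path, run_name="__main__")["CALLED"]


print("quoted:", guarded_block_runs(BUGGY))
print("unquoted:", guarded_block_runs(FIXED))
```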

Peter Hansen

May 30, 2003, 12:38:34 PM

Aahz wrote:
>
> In article <slrnbdeevn.694...@athena.jcn.srcf.net>,
> Andrew Walkingshaw <andrew...@lexical.org.uk> wrote:
> >
> >def main():
> >    if len(sys.argv) != 2:
> >        usage(sys.argv[0])
> >    f = open(sys.argv[1], "r")
>
> This is generally a Bad Idea; your functions should be generic and
> sys.argv handling should be done under "if __name__=='__main__':".

I don't entirely agree. Although I can see calling main as

if __name__ == '__main__':
    main(sys.argv[1:])

anything more than this seems less readable. This has the added
advantage of preventing readers and maintainers from getting confused
because of the differences between code at module level (where what feels
like a local is actually module-global), and in a function like main()
where locals are very clearly local. I've seen __main__ blocks which
are far too ugly to read, mainly because they are at module level instead
of function-local.

I'm actually inconsistent about this myself, but I don't think
the advice Aahz gives here is necessarily a de facto standard.
(Or is it? I haven't sampled enough code to know for sure.)
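A Python 3 sketch of the shape described above: all the work lives in main(argv), which receives sys.argv[1:], keeps its variables function-local, and returns an exit status instead of exiting, so it can also be called from other code or a test. BADINVOCATION is carried over from earlier posts; file_kind and the rest are illustrative.

```python
import os
import sys

BADINVOCATION = 1


def file_kind(path):
    if os.path.isdir(path):
        return "directory"
    if os.path.isfile(path):
        return "file"
    return "non-directory and non-file"


def main(argv):
    """argv excludes the program name, as in main(sys.argv[1:])."""
    if len(argv) != 1:
        prog = os.path.basename(sys.argv[0])
        print("Usage: %s [file]" % prog, file=sys.stderr)
        return BADINVOCATION
    with open(argv[0]) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                print(line, "is a", file_kind(line))
    return 0
```

The module-level part then shrinks to a guard whose only statement is sys.exit(main(sys.argv[1:])).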

-Peter

Michele Simionato

May 30, 2003, 12:53:31 PM

Andrew Walkingshaw <andrew...@lexical.org.uk> wrote in message news:<slrnbdeevn.694...@athena.jcn.srcf.net>...

I am a physicist too, and not a "real" programmer, but it seems to me
you could improve your script by using "line.strip()" instead of
"line.lstrip().rstrip()".

HTH,

Michele

Aahz

May 30, 2003, 1:35:33 PM

In article <slrnbderi8.694...@athena.jcn.srcf.net>,

Andrew Walkingshaw <andrew...@lexical.org.uk> wrote:
>
>Hence, I guess the program should read:
>
>def main(filename):
>    f = open(filename, "r")
>    [...]
>
>if "__name__" == "__main__":
>    if len(sys.argv) != 2:
>        usage(sys.argv[0])
>    main(sys.argv[1])
>
>- or something similar. Please correct me if I'm still misguided :)

More-or-less, but see my next response to Peter.

Aahz

May 30, 2003, 1:37:33 PM

In article <3ED7890A...@engcorp.com>,

Peter Hansen <pe...@engcorp.com> wrote:
>Aahz wrote:
>> In article <slrnbdeevn.694...@athena.jcn.srcf.net>,
>> Andrew Walkingshaw <andrew...@lexical.org.uk> wrote:
>>>
>>>def main():
>>>    if len(sys.argv) != 2:
>>>        usage(sys.argv[0])
>>>    f = open(sys.argv[1], "r")
>>
>> This is generally a Bad Idea; your functions should be generic and
>> sys.argv handling should be done under "if __name__=='__main__':".
>
>I don't entirely agree. Although I can see calling main as
>
>    if __name__ == '__main__':
>        main(sys.argv[1:])
>
>anything more than this seems less readable. This has the added
>advantage of preventing readers and maintainers from getting confused
>because of the differences between code at module level (where what feels
>like a local is actually module-global), and in a function like main()
>where locals are very clearly local. I've seen __main__ blocks which
>are far too ugly to read, mainly because they are at module level instead
>of function-local.

No real argument, but I think that if you're doing that kind of
processing in main(), it should be called _main() to indicate that it's
private. For example, you probably should not be calling open in a
main() function unless it's explicitly designed to only handle disk
files. (Which is the case in this example; I'm giving more generic
advice.)

Skip Montanaro

May 30, 2003, 1:00:32 PM

>> >def main():
>> >    if len(sys.argv) != 2:
>> >        usage(sys.argv[0])
>> >    f = open(sys.argv[1], "r")
>>
>> This is generally a Bad Idea; your functions should be generic and
>> sys.argv handling should be done under "if __name__=='__main__':".

Peter> I don't entirely agree. Although I can see calling main as

Peter>     if __name__ == '__main__':
Peter>         main(sys.argv[1:])

Peter> anything more than this seems less readable.

One pattern I picked up from some of the scripts in the Python distribution
is to extract the program name at module scope, refer to it in the
docstring, and substitute it in the usage() function. Of course, main()
gets called with sys.argv[1:] as its sole parameter. For an example, look
at Tools/scripts/db2pickle.py in the Python distribution:

http://tinyurl.com/d2a3
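Something in the spirit of that pattern, offered as a guess rather than a copy of db2pickle.py: the program name is computed once at module scope, the module docstring mentions it through a %(PROG)s placeholder, and usage() fills it in with `__doc__ % globals()`. PROG, usage() and the exit status 2 are illustrative choices.

```python
"""%(PROG)s: report whether each listed path is a file or a directory

Usage: %(PROG)s file
"""
import os
import sys

# Computed once at module scope, so the docstring and usage() agree.
PROG = os.path.basename(sys.argv[0])


def usage(out=sys.stderr):
    """Write the module docstring with the program name substituted."""
    out.write(__doc__ % globals())


def main(args):
    """Called as main(sys.argv[1:]); returns an exit status."""
    if len(args) != 1:
        usage()
        return 2
    return 0
```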

Skip

Bengt Richter

May 30, 2003, 6:01:03 PM

On Thu, 29 May 2003 23:14:45 -0700, Scott Meyers <Use...@aristeia.com> wrote:
*the* Scott Meyers? ;-)
You should take the following with a grain of salt. There are a number of
Python book authors who post here, Alex Martelli being one of the most
prolific and authoritative (as you may have noticed if you've been lurking a while ;-)
And then there are the developers, and the overlaps among all of the above.

>I'm a C++ programmer who's writing his very first Python program. This
>means the program is going to be gross, and I apologize for that in
>advance. I don't really have anybody I can show it to for feedback, so I'm
>hoping I can get some comments here. If there is a better place for me to
>seek guidance, please let me know.
>
>The program is supposed to read a file containing directory and file names,
>one per line. (The file can also contain comment and blank lines, which
>should be ignored.) For each file or directory name, the program should
>print out whether it's a directory, a file, or neither. That's it.
>

>Here's the code; you may want to hold your nose:
>

> import sys
> import os
if you want to access a few names unqualified, you can use an alternative import format, e.g.,

from os.path import isdir, isfile, basename

this will let you write isdir(xxx) instead of os.path.isdir(xxx), and so forth.

> import string
Unless you are trying to be 'way backwards compatible, it will be unusual
to need to import the string module. The builtin str class is the basis of
most strings, and its methods are accessible via the bound method notation
of <string expression>.method. E.g., 'abc'.upper() or sometimes with an argument.

Interactively, help(str) will show you. help(xxx) or help('xxx') will often
get you what you need to know about xxx, especially if the code author has
been good about doc strings (of which more mention below, but see
http://www.python.org/peps/pep-0257.html for extensive info).

import sys, os
works too, though your separate-line style above is preferred style (see PEP 8
at http://www.python.org/peps/pep-0008.html).

>
> # Error status codes
> BadInvocation = 1

There are a few naming conventions. Capitalized words joined together (CapWords) are
often used for a class name. Constant "declarations" are often all caps, sometimes
with underscores separating the words, e.g., BAD_INVOCATION = 1

>
> def usage(programName):
The first statement inside a function def has special magic. If you
put a string literal there (and it may continue over several lines), it
will become the documentation string for the function, and will show up
in interactive help and also be accessible programmatically for whatever purpose, e.g.,
"""Prints usage info and exits."""

Triple quotes allow quoting unescaped single or double quotes (except at end where
there would be ambiguity) and also newlines, allowing multiple line doc strings, but
they're the convention even for single line doc strings.
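A small illustration, where the function itself is a made-up example: the docstring is stored on the function object as `__doc__`, which is what help() and pydoc read, and `inspect.getdoc()` returns it with the indentation cleaned up.

```python
import inspect


def last_component(path):
    """Return the last component of a slash-separated path.

    A hypothetical helper; only its docstring matters for this example.
    """
    return path.rstrip("/").split("/")[-1]


# The raw docstring is an ordinary attribute of the function object.
first_line = last_component.__doc__.splitlines()[0]

# getdoc() strips the common leading indentation, as help() displays it.
cleaned = inspect.getdoc(last_component)
print(cleaned)
```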

>     print "Usage: " + os.path.basename(programName) + " [file]"

Concatenating strings with '+' is fine for small expressions, but it is an inefficient
way to accumulate a long string (see the FAQ). String formatting very close to C-style printf
formatting is available with the '%' operator. Thus an alternative way to print the above is

print "Usage: %s [file]" % os.path.basename(programName)

BTW, the right-hand argument to % above is shorthand for a length-1 tuple; since a value that
happened to be a tuple would instead be unpacked as the whole argument list, the safe way to write it is
print "Usage: %s [file]" % (os.path.basename(programName),)

BTW2, without the trailing comma after a single element, it would just be a parenthesized expression,
not a tuple.

The way you wrote it is nice and clear, though, so it wins.
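To make the tuple caveat concrete: all three spellings of the usage line agree, but once the value being substituted is itself a tuple, only the explicit one-element tuple form works. The path and the point tuple below are just sample data.

```python
import os.path

name = os.path.basename("/usr/local/bin/dfreport.py")

a = "Usage: " + name + " [file]"       # concatenation
b = "Usage: %s [file]" % name          # bare value on the right of %
c = "Usage: %s [file]" % (name,)       # explicit one-element tuple

point = (3, 4)
safe = "point = %s" % (point,)         # the tuple is the single argument
try:
    "point = %s" % point               # tuple unpacked: 2 values, 1 slot
    unsafe_ok = True
except TypeError:
    unsafe_ok = False

print(a)
print(safe)
```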

>     sys.exit(BadInvocation)
This can also be spelled
    raise SystemExit, BadInvocation
or
    raise SystemExit(BadInvocation)

without having to import sys
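A quick way to see that the spellings are equivalent: sys.exit() simply raises SystemExit, so both can be caught and their exit code inspected. This uses the `raise SystemExit(...)` form, which also works in current Python, where the `raise X, y` comma syntax no longer exists.

```python
import sys

BadInvocation = 1


def usage(programName):
    """Print usage info and exit with status BadInvocation."""
    print("Usage: %s [file]" % programName)
    raise SystemExit(BadInvocation)


codes = []
for attempt in (lambda: usage("dfreport.py"), lambda: sys.exit(BadInvocation)):
    try:
        attempt()
    except SystemExit as e:
        # e.code carries the status the interpreter would exit with
        codes.append(e.code)

print(codes)
```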

>
> # Take a list of strings, return an equivalent list where the following
> # strings have been removed:
> # - those with no non-whitespace characters
> # - those whose first non-whitespace character is a "#"

Since you're writing this, you might as well make a docstring out of it that will be
seen by the help feature and pydoc.

> def getMeaningfulLines(lines):
      """
      Take a list of strings, return an equivalent list where the following
      strings have been removed:
      - those with no non-whitespace characters
      - those whose first non-whitespace character is a "#"
      """

      return [line for line in lines if line.strip() and not line.lstrip().startswith('#')]

The single line above can replace the following nine, as explained below:

>     nonCommentLines = []
>     for i in range(0,len(lines)):
>         try:
>             firstToken = string.split(lines[i])[0]
>             if firstToken[0] != "#":
>                 nonCommentLines.append(lines[i])
>         except IndexError:
>             continue
>     return nonCommentLines

The oneliner, using a list comprehension, again is:
return [line for line in lines if line.strip() and not line.lstrip().startswith('#')]

in the above,
(1) [line for line in lines <condition>] builds a new list from the lines list,
but including only those that satisfy the condition, which is a short-circuited
logical expression
(2a) if line.strip() tests that the line is not a zero-length string '' after removing whitespace
from the beginning and end. '' and a number of other "empty" things, e.g., (), [], {}, have a
False value when tested with if, which makes some expressions concise.
(2b) and not line.lstrip().startswith('#') tests that the line, after whitespace has been
stripped from the left, does not begin with '#'
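Spelled out as an explicit loop, the comprehension does the same thing as this. The sample input is made up and mirrors the d:\ example from the original post.

```python
def meaningful_loop(lines):
    """Explicit-loop version of the list comprehension."""
    kept = []
    for line in lines:
        # non-blank, and first non-whitespace character is not '#'
        if line.strip() and not line.lstrip().startswith('#'):
            kept.append(line)
    return kept


def meaningful_listcomp(lines):
    return [line for line in lines
            if line.strip() and not line.lstrip().startswith('#')]


sample = ["d:\\\n", "   \n", "  # a comment\n", "d:\\temp\\foo.py\n"]
print(meaningful_listcomp(sample))
```

Note that the surviving lines still end in '\n'; filtering alone does not strip them.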

Putting too much into one line can of course get Perlish, but once you are familiar with list
comprehensions, the above is pretty straightforward. For multi-megabyte strings, you may want
to write something that works more like a chain of pipes, so as to avoid huge buffering.
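One way to get that pipe-like behaviour is a generator that pulls lines from the file one at a time instead of calling readlines(). A sketch with an illustrative name; in Python 2.2, generators needed `from __future__ import generators`, and they are standard from 2.3 on.

```python
import io


def meaningful(lines):
    """Lazily yield stripped lines that are neither blank nor '#' comments."""
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith('#'):
            yield stripped


# Any iterable of lines works: an open file, a list, or an in-memory buffer.
source = io.StringIO(u"# comment\n\n   \nzdir\n  dfreport.py  \n")
for name in meaningful(source):
    print(name)
```

Unlike getMeaningfulLines, this version also strips each line, so the trailing newline never reaches the caller.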

>
>
> # if this language had a main(), it'd be here...

No reason you can't have it. It's a common idiom.

def main():
>     if len(sys.argv) != 2: usage(sys.argv[0])
>

>     lines = getMeaningfulLines(open(sys.argv[1]).readlines())
>     for i in lines:
>         print i + " is a ",
>         if os.path.isdir(i): print "directory"
>         elif os.path.isfile(i): print "file"
>         else: print "non-directory and non-file"

However, with the above def main(), executing the file as a script will
only execute the _definition_ of the main() function, without calling it.
It's sort of like compiling the constructor for an executable object and executing
the constructor, but not calling the resulting callable object.
(Note the important fact that a definition is compiled to produce an executable
_definition_, which is dynamically executed according to its context like any
other statement.) Here it would get executed if we imported this program or if
we ran it with the interpreter, and at the end, executing the _definition_
def main(): ... would leave main locally bound to the actual function, ready to
be called, but not called. Thus you could write

if foo:
    def bar(): print "bar executed"
else:
    def baz(): print 'baz executed' # (single quotes are usual, unless same is being quoted).

and they would not both be bound to their respective names until this had been executed at least
twice, with different logical values of foo.
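This can be checked by executing the same source twice in one namespace, with exec() standing in for running the module and foo as a made-up flag:

```python
SOURCE = '''
if foo:
    def bar(): return "bar executed"
else:
    def baz(): return "baz executed"
'''

namespace = {}
bound = []
for foo in (True, False):
    namespace["foo"] = foo
    exec(SOURCE, namespace)  # runs the if/else, binding one name each time
    bound.append(sorted(n for n in ("bar", "baz") if n in namespace))

print(bound)
```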

Ok, back to getting main's function called, as opposed to just having the name bound to
the ready-to-call function.

We could write
main()
at the same level of indentation as def main():...
This would execute the code right after executing the def. Useful for postponing execution
until all the def's of potentially interdependent functions have been executed, so that
they're all defined before trying to run them.

But if you want your file to serve as an importable module as well as a runnable script, the
typical idiom is to make the execution of main (or test, or whatever) dependent on the environment
in which it is being executed. When run from the command line, the global name __name__ will have
a value of '__main__' (only coincidental convention relates it to our main() function's name, BTW),
whereas if you import, the importing process will cause __name__ to have the name being imported.
Thus we can write

if __name__ == '__main__':
    main()

And since this only gets executed when run, typically interactively, it is common to put usage printing
below this if also. That way it's left out of the imported module's namespace when imported.
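The effect of the guard can be seen without a command line by running the same source under two different values of __name__ with `runpy.run_path()` (available since Python 2.7). The module text, the calls list and the temporary file are stand-ins for a real script.

```python
import os
import runpy
import tempfile

MODULE_SOURCE = '''
calls = []

def main():
    calls.append("main ran")

if __name__ == "__main__":
    main()
'''


def run_module(run_name):
    """Execute MODULE_SOURCE from a temp file with the given __name__."""
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(MODULE_SOURCE)
        # run_path returns the executed module's globals
        return runpy.run_path(path, run_name=run_name)["calls"]
    finally:
        os.remove(path)


print(run_module("__main__"))  # as if run from the command line
print(run_module("dfreport"))  # as if imported under that name
```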

>
>In addition to the numerious stylistic gaffes I'm sure I've made, the
>program also doesn't work correctly. Given this input file,
>
> d:\
> d:\temp\foo.py
>
>I get this output:
>
> d:\
> is a non-directory and non-file
> d:\temp\foo.py
> is a non-directory and non-file
>
>Aside from being ugly (how do I get rid of the newline that follows each
>directory or file name?), the problem is that the first entry IS a
>directory and the second one IS a file. So clearly I'm doing something
>wrong. Any idea what it is?

The newline is being passed to isdir and isfile, and neither sees such strings
as valid file paths, so getting rid of the whitespace on both sides of the
path string should fix it. That is what str.strip is for.
>>> str.strip
<method 'strip' of 'str' objects>

You can invoke it like
>>> str.strip(' foo bar ')
'foo bar'

or you can invoke it as a bound method implicitly operating on the string instance it's bound to
>>> ' foo bar '.strip()
'foo bar'

So we'll use that

> lines = getMeaningfulLines(open(sys.argv[1]).readlines())
> for i in lines:

      i = i.strip()

>     print i + " is a ",
>     if os.path.isdir(i): print "directory"
>     elif os.path.isfile(i): print "file"
>     else: print "non-directory and non-file"

Note that os.path. can be left off in the above if the from ... import was used.
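The effect of the trailing newline can be reproduced directly. This sketch uses a throwaway temporary file as the path being tested:

```python
import os
import tempfile

# Create a real, empty file to test against.
fd, path = tempfile.mkstemp()
os.close(fd)

line = path + "\n"  # what readlines() hands back
with_newline = os.path.isfile(line)
stripped = os.path.isfile(line.strip())

os.remove(path)
print("isfile with newline: %s, after strip(): %s" % (with_newline, stripped))
```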

>
>Thanks very much in advance.

You're welcome. Hope I haven't misled. Someone will jump in to clarify if so, I'm sure.
Perhaps I better check on what's been posted since last night when I started this ;-)

To summarize, I made the mods, and a few more, including allowing "-" as file name for
stdin. If you run python interactively and import dfreport, and then type help(dfreport)
you can see the effect of doc strings. I like to catch all the exceptions, to end neatly.
As an afterthought, I added an optional trailing '-rrx' option for re-raise-exception, so
a traceback will be printed.

====< dfreport.py >=========================================
"""
This module file may be run as a script or be imported.
Run as a script it takes a file name argument, and prints a report
reflecting dir, file, and other references of the file's non-blank,
non-comment lines. Run without arguments for this plus usage.

Importing makes two functions available (getMeaningfulLines,
and getDirFileReportLines), which can be used programmatically.
See below in interactive help output for this module.
"""
import sys
from os.path import isdir, isfile, basename

def getMeaningfulLines(lines):
    """
    Take a list of strings, return an equivalent list where the following
    strings have been removed:
    - those with no non-whitespace characters
    - those whose first non-whitespace character is a "#"
    """
    return [line for line in lines
            if line.strip() and not line.lstrip().startswith('#')]

def getDirFileReportLines(filepath):
    r"""
    For each non-whitespace, non-#-headed line in file filepath
    (or stdin if filepath is a single minus character)
    return in a list a corresponding report line with
    " is a <kind>\n" appended to the original line stripped
    of enclosing whitespace, where <kind> may be:
        - directory
        - file
        - non-directory and non-file
    reflecting what is found in the os's file system.
    """
    reportLines = []
    if filepath=='-': fp = sys.stdin
    else: fp = file(filepath)
    for line in getMeaningfulLines(fp.readlines()):
        line = line.strip()
        if isdir(line): kind = "directory"
        elif isfile(line): kind = "file"
        else: kind = "non-directory and non-file"
        reportLines.append('%s is a %s\n' % (line, kind))
    return reportLines

def main(argv):
    """Invoked only when module is run as script"""

    # Error status codes
    BAD_INVOCATION = 1

    def usage(programName, exitcode=BAD_INVOCATION):
        """Prints module doc string plus usage info and exits."""
        print "Usage: [python] " + basename(programName) + " [-h | file | -]"
        print " '-h' for further info, '-' to use stdin as input file"
        print " NB: Some windows versions require [python] explicitly for IO redirection"
        raise SystemExit, exitcode

    try:
        argc = len(argv)
        if argc==1 or argc==2 and argv[1]=='-h':
            print __doc__; usage(argv[0])
        elif argc==2:
            print ''.join(getDirFileReportLines(argv[1]))
        else:
            raise UserWarning, 'Bad Usage: "%s"'%' '.join(argv)
    except SystemExit: raise # pass exits through
    except Exception, e:
        # print the name and message of any standard exception before usage
        print '%s: %s' % (e.__class__.__name__, e)
        if isinstance(e, IOError): usage(argv[0], e.errno) # pass IO errno's
        else: usage(argv[0])
    except:
        print 'Nonstandard Exception %r: %r' % sys.exc_info()[:2]
        usage(argv[0])

if __name__ == '__main__':
    main(sys.argv)
============================================================

Piping test lines from the console via cat:

[14:50] C:\pywk\clp>cat |python dfreport.py -
# comment
# ws-prefixed comment, and next line is some spaces

#last had spaces
zdir
dfreport.py
fnord
^Z
zdir is a directory
dfreport.py is a file
fnord is a non-directory and non-file

It should also run as originally intended.

[14:52] C:\pywk\clp>dfreport.py blah
IOError: [Errno 2] No such file or directory: 'blah'
Usage: [python] dfreport.py [-h | file | -]
'-h' for further info, '-' to use stdin as input file
NB: Some windows versions require [python] explicitly for IO redirection

if blah is a file with useful content ;-)

And importing as a module, you can access defined functions, e.g.,
(note help access to doc strings, try help(dfreport) for whole module docs):

>>> import dfreport
>>> help(dfreport.getMeaningfulLines)
Help on function getMeaningfulLines in module dfreport:

getMeaningfulLines(lines)
    Take a list of strings, return an equivalent list where the following
    strings have been removed:
    - those with no non-whitespace characters
    - those whose first non-whitespace character is a "#"

>>> dfreport.getMeaningfulLines([
... '# not this one',
... ' #nor this one',
... ' ',
... 'this one should be first',
... ' white space is preserved '])
['this one should be first', ' white space is preserved ']

HTH

Regards,
Bengt Richter

Scott Meyers

May 30, 2003, 8:12:58 PM

On 30 May 2003 03:50:17 -0700, Giles Brown wrote:
> 2) I'm using the "*args" calling technique to ensure that an error is reported
> if main is called with the wrong number of arguments. It would print usage,
> but for quick and dirty scripts it highlights wrong argument count well enough.

Can you tell me where I can read more about this? I've tried looking it
up, but I didn't find anything, probably because all I have to
syntactically go on is the asterisk...

Thanks,

Scott

Aahz

May 30, 2003, 8:42:50 PM

In article <MPG.1941934e2...@news.hevanet.com>,

Section 5.3.4 of the language reference:
http://www.python.org/doc/current/ref/calls.html
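Giles's code isn't quoted here, so this is only a guess at the shape of the technique: a leading asterisk in a call unpacks a sequence into positional arguments, so something like main(*sys.argv[1:]) fails with a TypeError whenever the argument count doesn't match main's signature. The run() wrapper and its usage message are hypothetical.

```python
def main(filename):
    """Takes exactly one argument, like the original script."""
    return "would process %s" % filename


def run(argv):
    """Hypothetical wrapper: unpack argv (minus the program name) into main()."""
    try:
        return main(*argv[1:])
    except TypeError:
        # Wrong argument count; note that a TypeError raised *inside* main
        # would also land here, which is one reason this is quick-and-dirty.
        return "usage: %s file" % argv[0]


print(run(["dfreport.py", "list.txt"]))
print(run(["dfreport.py"]))
print(run(["dfreport.py", "a", "b"]))
```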

Scott Meyers

May 30, 2003, 8:44:54 PM

Many thanks to everyone for their very helpful suggestions, many of which
I've incorporated. I have a couple of additional little questions:
- I'm used to a compiler complaining if I reference a variable without
declaring it. Python doesn't do this, nor can I find mention of some
kind of warning mode that will cause it to do it. I've also read a
tiny little bit about pyChecker. Is it common to use such tools, or do
real Python programmers go bareback?
- Is it better to get into the habit of (a) importing modules and using
fully qualified names or (b) importing selective names from modules and
using them without qualification? That is, which is generally better?

import sys # a
sys.exit(1)

from sys import exit # b
exit(1)

Finally, my ultimate goal with this project is to read a list of file and
directory names from a file, create an archive (e.g., zip file) containing
copies of the specified files and directories, encrypt the archive, and ftp
it to a remote machine. I've found libraries for working with zip files
and for doing ftping, but I didn't see any standard library for encrypting a
file. Is there one? Also, a cursory glance at the zipfile module
documentation didn't reveal a way to add the contents of a directory
(recursively -- including all its subdirectories and files) to a zip file.
Did I overlook something? I figure that if there's no way to do it, I can
always find a way to coax Python into invoking winzip or pkzip as an
external process, but I'm hoping I can do everything via Python libraries.

Again, thanks for all your help so far. I welcome any comments you have on
how I should best approach this problem.

Scott

Erik Max Francis

May 30, 2003, 8:50:31 PM

Scott Meyers wrote:

> - I'm used to a compiler complaining if I reference a variable without
>   declaring it. Python doesn't do this, nor can I find mention of some
>   kind of warning mode that will cause it to do it. I've also read a
>   tiny little bit about pyChecker. Is it common to use such tools, or do
>   real Python programmers go bareback?

It's hard to see how you would do this in Python, since Python lacks
declarations. It's legal to reference a variable if you've already
defined it; it isn't otherwise (you get a NameError exception).
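To make the timing concrete: the check happens when a line runs, not when it is compiled, so a function that mentions an undefined name is defined without complaint and only fails when called. (undefined_name below is deliberately never assigned.)

```python
def report():
    return undefined_name  # never assigned anywhere


outcome = "defined fine"  # executing the def above raised nothing
try:
    report()
except NameError as e:
    outcome = "NameError when called: %s" % e

print(outcome)
```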

PyChecker can help with some systematic errors like this, and is always
worth running on any large project just as a sanity check, but
ultimately dynamic programming involves a different approach.

> - Is it better to get into the habit of (a) importing modules and using
>   fully qualified names or (b) importing selective names from modules and
>   using them without qualification? That is, which is generally better?
>
> import sys # a
> sys.exit(1)
>
> from sys import exit # b
> exit(1)

It's mostly a matter of style; I personally prefer the former (unless
that creates hugely long names in the code). I would tend to say that
the latter approach is appropriate when you only need a few things from
the module in question. If you're using quite a bit of it, the former
notation makes more sense (except, of course, when it results in long
names; then importing an intermediate package would probably be
warranted).

--
Erik Max Francis && m...@alcyone.com && http://www.alcyone.com/max/
__ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
/ \ Heaven ne'er helps the man who will not act.
\__/ Sophocles

Chad Netzer

May 30, 2003, 9:44:24 PM

On Fri, 2003-05-30 at 17:44, Scott Meyers wrote:

> - I'm used to a compiler complaining if I reference a variable without
> declaring it. Python doesn't do this, nor can I find mention of some
> kind of warning mode that will cause it to do it.

As a follow up to what Erik said, it is not uncommon to see people
initialize the most important variables or attributes to a default
value, and then reassign them later. As you know, in C++ it has become
common to declare variables right before you first use them (as opposed
to the beginning of a function). Well, in Python, variable names
aren't associated with a type, and so don't require that initialization.

The lack of declarations give us a bit more time to think up good
variable names. :)

> I've also read a
> tiny little bit about pyChecker. Is it common to use such tools, or do
> real Python programmers go bareback?

I usually go bareback, because PyChecker wasn't around when I started.
But it is helpful for the occasional typo, and many other buglets not
associated with design.

> - Is it better to get into the habit of (a) importing modules and using
> fully qualified names or (b) importing selective names from modules and
> using them without qualification?

Aha! Well, I like to think of it like this. If the module is providing a
number of related services, and I need many of them, I tend to import
it, and access its functions.

ie.

import math
math.sqrt(2.0)

If it is more of a container for a grab-bag of functions, and I need
only a few, I import the ones I need.

ie.

from math import sqrt

Well, actually, I got tricky there, because math can go either way. :)

Note that you can easily bind new names to module members for shortcuts,
when you need them.

ie. I may do an "import math" and use its functions. But then I have a
function that uses a math function a lot (say sqrt()). I can easily
make the sqrt() local, without having to change my other code.

ie.
import math

def some_function():

    # I use sqrt() a lot, and don't want to always type math.sqrt()
    sqrt = math.sqrt

    # Now I can use sqrt() as if I had done "from math import sqrt"

So, it is a personal style thing, but over the years I find myself
tending to import the module, rather than its functions. Less
namespace pollution. I make shortcut names when I need them.

Note that, due to implementation issues (mainly), if you want to use a
function in a loop, it is faster to make it a local variable to avoid
the module lookup on each iteration:

ie:

import math

# I use sqrt() in the loop, and want to gain a little bit of speed
sqrt = math.sqrt

for i in range(1000):
    sqrt(i)

That is faster than if I had done:

for i in range(1000):
    math.sqrt(i)

These kinds of micro-optimizations are kind of going out of favor, as the
implementation(s) gain in performance (and computers gain even more).
You should do local rebinding more for code clarity and style than for
speed.
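If you want to measure rather than guess, the timeit module can compare the two forms. A sketch only: absolute numbers, and even the size of the difference, depend on the interpreter and machine, so treat any result as local to your setup.

```python
import timeit

# timeit compiles setup and stmt into one function, so the names bound in
# setup ("math", "sqrt") are locals there, much like the rebinding example.
SETUP = "import math; sqrt = math.sqrt"

qualified = timeit.timeit("for i in range(1000): math.sqrt(i)",
                          setup=SETUP, number=200)
aliased = timeit.timeit("for i in range(1000): sqrt(i)",
                        setup=SETUP, number=200)

print("math.sqrt: %.4fs   sqrt alias: %.4fs" % (qualified, aliased))
```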

> Finally, my ultimate goal with this project is to read a list of file and
> directory names from a file, create an archive (e.g., zip file) containing
> copies of the specified files and directories, encrypt the archive, and ftp
> it to a remote machine. I've found libraries for working with zip files
> and for doing ftping, but I didn't see any standard library for encryping a
> file. Is there one?

Not really any "standard" one (ie shipped with the standard sources),
mainly due to the normal concerns with distributing encryption
implementations. But PyCrypto is commonly used and available, and there are
a number of other implementations (some done wholly in Python.)

> Also, a cursory glance at the zipfile module
> documentation didn't reveal a way to add the contents of a directory
> (recursively -- including all its subdirectories and files) to a zip file.
> Did I overlook something?

You could use os.path.walk(). It calls a user-supplied function once for each
directory in a tree, passing it the directory name and a list of the names in it.
Your function then adds those paths to the zipfile.

The cookbook also has a "walktree" generator, with some additional
flexibility.

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/161542
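For reference, a sketch of the same idea using os.walk(), the generator-based successor to os.path.walk() (os.walk is new in Python 2.3, and os.path.walk is gone in Python 3). zip_tree is a made-up name; empty directories are not recorded, and the archive should not be written inside the tree being zipped.

```python
import os
import zipfile


def zip_tree(top, archive_path):
    """Write every file under directory `top`, recursively, into a new zip file.

    Archive names are relative to top's parent, so the archive keeps the
    top-level directory name (e.g. "tree/sub/a.txt").
    """
    top = os.path.abspath(top)  # also drops any trailing separator
    parent = os.path.dirname(top)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, dirnames, filenames in os.walk(top):
            for name in filenames:
                full = os.path.join(dirpath, name)
                zf.write(full, os.path.relpath(full, parent))
```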

--

Chad Netzer
(any opinion expressed is my own and not NASA's or my employer's)

Max M

May 31, 2003, 5:19:33 AM

Scott Meyers wrote:

> - I'm used to a compiler complaining if I reference a variable without
> declaring it. Python doesn't do this, nor can I find mention of some
> kind of warning mode that will cause it to do it. I've also read a
> tiny little bit about pyChecker. Is it common to use such tools, or do
> real Python programmers go bareback?

Unit testing and agile methodologies are the usual route for Python
programmers. As for not having to declare variables... well, you had
better get used to it ;-) Soon you won't even notice, and then you will
start enjoying it.
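As a taste of that route, a minimal unittest sketch for the getMeaningfulLines() function discussed in this thread (reproduced here so the example stands alone; the test cases are illustrative):

```python
import unittest


def getMeaningfulLines(lines):
    return [line for line in lines
            if line.strip() and not line.lstrip().startswith('#')]


class GetMeaningfulLinesTest(unittest.TestCase):

    def test_drops_blank_and_comment_lines(self):
        lines = ["# comment", "   # indented comment", "   ", "", "zdir"]
        self.assertEqual(getMeaningfulLines(lines), ["zdir"])

    def test_keeps_original_whitespace(self):
        self.assertEqual(getMeaningfulLines([" x \n"]), [" x \n"])

    def test_hash_after_text_is_kept(self):
        self.assertEqual(getMeaningfulLines(["a # b"]), ["a # b"])


# Run the tests explicitly; a standalone test file would usually end with
# unittest.main() instead.
suite = unittest.TestLoader().loadTestsFromTestCase(GetMeaningfulLinesTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```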

> - Is it better to get into the habit of (a) importing modules and using
> fully qualified names or (b) importing selective names from modules and
> using them without qualification? That is, which is generally better?
>
> import sys # a
> sys.exit(1)
>
> from sys import exit # b
> exit(1)

If I need to use functions defined in a module, I import the module, and
use it as "an object". This prevents namespace pollution. If I need to
use an object from a module I import the object.

So in this case it would be a)

> I figure that if there's no way to do it, I can
> always find a way to coax Python into invoking winzip or pkzip as an
> external process, but I'm hoping I can do everything via Python libraries.

I would use popen() and make the os do the encryption and zipping. It is
very easy.

Mel Wilson

May 31, 2003, 7:57:29 AM

In article <MPG.19419ac79...@news.hevanet.com>,

Scott Meyers <Use...@aristeia.com> wrote:
> - Is it better to get into the habit of (a) importing modules and using
> fully qualified names or (b) importing selective names from modules and
> using them without qualification? That is, which is generally better?
>
> import sys # a
> sys.exit(1)

This is generally better, unless the fully qualified name
gets ridiculously long. B is vulnerable to

> from sys import exit # b

... 1000 lines of code ...
def exit (something):
... some code ...
... 1000 more lines of code ...
> exit(1)

I doubt that anyone would actually do this with exit, but
something less well known could be covered up this way. Of
course, nothing actually prevents `sys = 17` and its
consequences, either.
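A small demonstration of that hazard, with os.path.join standing in for exit so the example can run safely (the replacement join is hypothetical):

```python
import os.path
from os.path import join  # style (b): unqualified name


# ... much later in the same module, someone defines a function of the same name ...
def join(*parts):
    return "shadowed"


print(join("a", "b"))          # the later def silently replaced the import
print(os.path.join("a", "b"))  # the qualified name, style (a), is unaffected
```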

Regards. Mel.

Raseliarison nirinA

May 31, 2003, 8:16:59 AM

"Scott Meyers" writes:

> - Is it better to get into the habit of (a) importing modules and using
> fully qualified names or (b) importing selective names from modules
> and using them without qualification? That is, which is generally
> better?
>
> import sys # a
> sys.exit(1)
>
> from sys import exit # b
> exit(1)
>

i cannot tell which is better. personally, i choose between them following
the context.
for example, with the form (a), one can use the same function names in
several modules. and when i use Tkinter, i always call:

from Tkinter import *

sometimes, the built-in function __import__ is useful:

SYS = __import__('sys')
QUIT = getattr(SYS,'exit')
QUIT(1)

if you want only one function from a module, you can also bind it to a name
of your choice, like this:

from sys import exit as QUIT
QUIT(1)
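All of these spellings end up bound to the very same function object, which is easy to confirm without actually exiting. `importlib.import_module()`, shown last, is the newer (Python 2.7 and later) and more readable alternative to calling __import__ by hand.

```python
import importlib
import sys
from sys import exit as QUIT

SYS = __import__('sys')
QUIT2 = getattr(SYS, 'exit')
QUIT3 = importlib.import_module('sys').exit

# Same module object, same function object, several different names.
print(SYS is sys, QUIT is sys.exit, QUIT2 is sys.exit, QUIT3 is sys.exit)
```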

also, take a look at the imp module:

>>> import imp
>>> help(imp)
Help on built-in module imp:

NAME
imp

FILE
(built-in)

DESCRIPTION
This module provides the components needed to build your own
__import__ function.

--
nirinA
--

Sean Legassick

May 31, 2003, 8:52:17 PM

In message <slrnbdeevn.694...@athena.jcn.srcf.net>, Andrew
Walkingshaw <andrew...@lexical.org.uk> writes

>def isMeaningful(line):
>    if line != "" and line[0] != "#":
>        return True
>    else:
>        return False

Yuck!

def isMeaningful(line):
    return line != "" and line[0] != "#";

No offence intended, but I've seen this too often in real code not to
point out its redundancy here.

Sean

--
Sean K. Legassick
se...@datamage.net
informage - http://informage.net - mouthing off from the fairest cape

Gerhard Haering

May 31, 2003, 10:11:06 PM

* Sean Legassick <se...@datamage.net> [2003-06-01 02:52 +0200]:

> In message <slrnbdeevn.694...@athena.jcn.srcf.net>, Andrew
> Walkingshaw <andrew...@lexical.org.uk> writes

> >def isMeaningful(line):
> >    if line != "" and line[0] != "#":
> >        return True
> >    else:
> >        return False
>

> Yuck!
>
> def isMeaningful(line):
>     return line != "" and line[0] != "#";
>
> No offence intended, but I've seen this too often in real code not to
> point out its redundancy here.

No offence intended, but your code is redundant as well ;-)

def isMeaningful(line):
    return not line.startswith("#")

And omit the trailing semicolon, you filthy C/C++/Java programmer :-P

Gerhard
--
http://ghaering.de/

Sean Legassick

May 31, 2003, 11:30:39 PM

In message <2003060102...@mephisto.ghaering.de>, Gerhard Haering
<g...@ghaering.de> writes

>No offence intended, but your code is redundant as well ;-)
>
>def isMeaningful(line):
>    return not line.startswith("#")

Indeed, that is cleaner.

I specifically wanted to point out that

    if X:
        return True
    else:
        return False

is a particularly ugly and surprisingly common redundancy - in many
languages...

>And omit the trailing semicolon, you filthy C/C++/Java programmer :-P

...including these that are indeed more familiar to me.

I've been trying to wean myself off those darn semicolons when doing
Python, but sometimes they slip through. As I find myself liking and
using Python more and more every day I suspect the slips will become
fewer.

Bengt Richter

Jun 1, 2003, 12:20:05 AM

On Sun, 1 Jun 2003 05:30:39 +0200, Sean Legassick <se...@datamage.net> wrote:

>In message <2003060102...@mephisto.ghaering.de>, Gerhard Haering
><g...@ghaering.de> writes
>>No offence intended, but your code is redundant as well ;-)
>>
>>def isMeaningful(line):
>>    return not line.startswith("#")
>
>Indeed, that is cleaner.
>

but at the expense of being wrong ;-)

def isMeaningful(line):
    assert line==line.strip() # to document the assumption
    return bool(line) and not line.startswith("#")
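The difference shows up on an empty string, which Gerhard's version treats as meaningful. The function names below are labels for this comparison only; note also that assert statements are skipped when Python runs with -O, so the assert documents the assumption rather than enforcing it.

```python
def is_meaningful_gerhard(line):
    return not line.startswith("#")


def is_meaningful_bengt(line):
    assert line == line.strip()  # to document the assumption
    return bool(line) and not line.startswith("#")


for candidate in ("", "# comment", "zdir"):
    print(repr(candidate), is_meaningful_gerhard(candidate), is_meaningful_bengt(candidate))
```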

Regards,
Bengt Richter

Erik Max Francis

Jun 1, 2003, 12:21:55 AM

Sean Legassick wrote:

> I specifically wanted to point out that
>
>     if X:
>         return True
>     else:
>         return False
>
> is a particularly ugly and surprisingly common redundancy - in many
> languages...

What you write here is not necessarily redundant, provided that X is not
a Boolean. There may be more compact ways to write it, but that doesn't
mean that the above code is wrong.
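One concrete case of a non-Boolean X: Python's `and` returns one of its operands, so a bare `return line and line[0] != "#"` hands back '' (not False) for an empty line, while the if/else form always returns a real bool. Sean's `line != ""` version sidesteps this because its first operand is already a comparison. The function names are illustrative.

```python
def bare(line):
    # `and` returns its first false operand, or else its last operand
    return line and line[0] != "#"


def explicit(line):
    if line and line[0] != "#":
        return True
    else:
        return False


def sean(line):
    return line != "" and line[0] != "#"


for candidate in ("", "# c", "zdir"):
    print(repr(candidate), repr(bare(candidate)), explicit(candidate), sean(candidate))
```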

--
Erik Max Francis && m...@alcyone.com && http://www.alcyone.com/max/
__ San Jose, CA, USA && 37 20 N 121 53 W && &tSftDotIotE
/ \ I am a gentlemen: I live by robbing the poor.
\__/ George Bernard Shaw

Sean Legassick

Jun 1, 2003, 11:21:50 AM

In message <3ED97F63...@alcyone.com>, Erik Max Francis
<m...@alcyone.com> writes

>Sean Legassick wrote:
>
>> I specifically wanted to point out that
>>
>>     if X:
>>         return True
>>     else:
>>         return False
>>
>> is a particularly ugly and surprisingly common redundancy - in many
>> languages...
>
>What you write here is not necessarily redundant, provided that X is not
>a Boolean. There may be more compact ways to write it, but that doesn't
>mean that the above code is wrong.

Uh-huh, point taken. I was thinking of the case where X evaluates out to
a boolean, as in the previous example.

Aahz

Jun 1, 2003, 12:02:56 PM

In article <mailman.1054429562...@python.org>,

Sean Legassick <se...@datamage.net> wrote:
>In message <slrnbdeevn.694...@athena.jcn.srcf.net>, Andrew
>Walkingshaw <andrew...@lexical.org.uk> writes
>>
>>def isMeaningful(line):
>>    if line != "" and line[0] != "#":
>>        return True
>>    else:
>>        return False
>
>Yuck!
>
>def isMeaningful(line):
>    return line != "" and line[0] != "#";
>
>No offence intended, but I've seen this too often in real code not to
>point out its redundancy here.

It is *NOT* redundant; it is making clear that True/False are the only
possible return values. With your rewrite, reading and understanding
the code is necessary to clarify the intent.

Sean Legassick

Jun 1, 2003, 12:37:55 PM

In message <bbd83g$aiu$1...@panix1.panix.com>, Aahz <aa...@pythoncraft.com>
writes

>It is *NOT* redundant; it is making clear that True/False are the only
>possible return values. With your rewrite, reading and understanding
>the code is necessary to clarify the intent.

Eeek. I know python is supposed to be readable, but you seem to be
suggesting that it's useful to convey information to someone who can't
read a simple boolean expression.

If you really need that level of self-documentation, I would put it in a
comment / docstring, not in redundant code.
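Below is a sketch of this alternative, which puts the True/False promise in a docstring instead of an if/else. The docstring wording is an assumption; nobody posted it in the thread.

```python
def isMeaningful(line):
    """Return True if line is non-empty and not a comment, else False.

    Assumes line has already been stripped of surrounding whitespace.
    """
    return line != "" and line[0] != "#"
```

Both operands of `and` here are comparisons, so the expression can only evaluate to True or False. The docstring's promise therefore holds for this particular body.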

Jack Diederich

Jun 1, 2003, 12:33:17 PM

On Sun, Jun 01, 2003 at 12:02:56PM -0400, Aahz wrote:
> In article <mailman.1054429562...@python.org>,
> Sean Legassick <se...@datamage.net> wrote:
> >In message <slrnbdeevn.694...@athena.jcn.srcf.net>, Andrew
> >Walkingshaw <andrew...@lexical.org.uk> writes
> >>
> >>def isMeaningful(line):
> >>    if line != "" and line[0] != "#":
> >>        return True
> >>    else:
> >>        return False
> >
> >Yuck!
> >
> >def isMeaningful(line):
> >    return line != "" and line[0] != "#";
> >
> >No offence intended, but I've seen this too often in real code not to
> >point out its redundancy here.
>
> It is *NOT* redundant; it is making clear that True/False are the only
> possible return values. With your rewrite, reading and understanding
> the code is necessary to clarify the intent.

+1 for good advice

If the docstring says "returns true if some_condition, false otherwise"
invariably someone will look at the code and use the side effect that it
isn't returning 1/0 for evil purposes[1]. Reversing the order of comparison
shouldn't break any code, so it is good defensive programming to always
explicitly return a bool.

The snippet in question will always return a bool, but IMO this is rarely
the case so it is best to be explicit.
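The concern rests on the fact that Python's `and` returns one of its operands rather than a bool. It returns the first false operand, or the last operand if all are true. With non-Boolean operands, swapping the order changes the value returned even though its truth does not. Below is a small sketch with made-up values. `explicitAnd` is a hypothetical helper.

```python
# `and` yields the first false operand, or the last operand if all are true.
name, count = "abc", 5

assert (name and count) == 5        # both true: the last operand comes back
assert (count and name) == "abc"    # same operands, reversed: a different value
assert ("" and count) == ""         # first operand false: it is returned as-is


def explicitAnd(a, b):
    # Normalizing with bool() makes the result independent of operand order.
    return bool(a and b)


assert explicitAnd(name, count) is True
assert explicitAnd(count, name) is True

# When both operands are comparisons, the result is a bool either way.
line = "x"
assert (line != "" and line[0] != "#") is True
```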

-jack

[1] shoot the person who depended on the undocumented side effect

David Glass

Jun 1, 2003, 1:33:59 PM

How about this as a compromise (using Python 2.2; maybe works with
earlier versions):

def isMeaningful(line):
    return bool(line != "" and line[0] != "#")
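Below is a sketch of this version applied to the task from the original post. The sample lines are made up. The function also needs an assumption: each line must be stripped before the test, so that `line[0]` is the first non-blank character. Stripping also removes the trailing newline that `readlines()` leaves on each line.

```python
def isMeaningful(line):
    return bool(line != "" and line[0] != "#")


# Hypothetical input: the kind of file the original program reads.
raw = ["d:\\\n", "\n", "   # a comment\n", "d:\\temp\\foo.py\n"]

# Strip first: this removes the trailing newline and leading indentation,
# so blank lines become "" and indented comments start with "#".
names = [line.strip() for line in raw if isMeaningful(line.strip())]
assert names == ["d:\\", "d:\\temp\\foo.py"]
```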

- David

Fernando Perez

Jun 2, 2003, 2:50:05 PM

Martin Franklin wrote:

> as of Python 2.0 the string module is not needed (strings have methods...)

Not true. Until strings have _all_ the methods in the string module, we'll
still need it:

In [8]: import string

In [9]: s='abc'

In [10]: string.zfill(s,9)
Out[10]: '000000abc'

In [11]: s.zfill(9)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)

?

AttributeError: 'str' object has no attribute 'zfill'
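On an interpreter whose str type lacks a zfill method, a small helper can fall back to padding by hand. Below is a minimal sketch. `manualZfill` and `zfill` are hypothetical names. The sign handling is meant to match `str.zfill` on versions that have it: a leading "+" or "-" stays in front of the zeros, and the string is never truncated.

```python
def manualZfill(s, width):
    # Pad a numeric string with zeros on the left; never truncate.
    padding = width - len(s)
    if padding <= 0:
        return s
    sign = ""
    if s[:1] in ("+", "-"):
        # Keep a leading sign in front of the zeros, e.g. "-42" -> "-0042".
        sign, s = s[0], s[1:]
    return sign + "0" * padding + s


def zfill(s, width):
    # Prefer the string method when this interpreter provides it.
    if hasattr(s, "zfill"):
        return s.zfill(width)
    return manualZfill(s, width)
```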

Best,

f.