Request for criticism - tell me how to become more Pythonic

Jon

unread,

Sep 8, 2001, 12:25:45 AM9/8/01

to

I'm making an effort to make myself use Python at the moment, and decided
that the best way was to write a moderately sized script that would use
several features of the language. To this end, I've written a modestly sized
program which assists in encoding albums to Ogg Vorbis files (included
with this message). If anyone has the time, I'd be interested in comments
about my style: what I am doing right, and what I need to improve. My
comments:

1) The program is all in one big file at the moment. My next step would
probably be to modularise the program into seperate source files.

2) This script is about the fifth incremental iteration: from the first
which barely worked with no features, no functions, very basic parsing
facilities, and little flexibility. I'm very impressed with how easy it
was to get up and running with Python -- particularly with the ability to
mess about and look up __doc__ strings in the interactive interpreter. In
doing this, I noticed that dictionaries don't seem to have any __doc__
strings (at least in 2.0, which is the version I have available to me). Is
there a reason for this? I'm also suprised and impressed with how easy it
was to keep the code from decending into spaghetti hell.

3) I'm not very happy with the command line parsing section of the code.
The supplied getopt function seems to do very little for you -- is there
a more Pythonic way of processing the command line that I am not aware
of?

4) I love list comprehensions :)

5) I realise that I'm not dealing with errors in any systematic way. How
should I do this?

6) Yes, I'm using too many global variables.

7) Yes, I like long variable and function names.

Thanks for your time.

Here is the code (modulo line wrapping):

###############################################################################
#!/usr/bin/env python

###############################################################################
# Ogg Vorbis encoding script -- designed to be used after using CDParanoia -B
# to rip an album. Create an 'info.txt' file detailing the album and track
# information, and this script will handle the encoding chores.
#
# Handles track & album comments. Handles multi-artist albums.
# Handles command line arguments.
# THIS CODE IS PUBLIC DOMAIN AND COMES WITH NO WARRANTY.
###############################################################################
# This is the format that this program expects 'info.txt' to have:
#
# album artist
# album name
# album creation date
# album publisher
# optional album comment
# ...
#
# track name
# optional track comment
# ...
# ...
#
# If the album artist is 'Various', then the track information should be:
#
# track artist : track name
# optional track comment
# ...
#
# The colon is the field boundary indicator. Whitespace doesn't matter for
# the first four lines... it is used thereafter to distinguish between track
# information and comments. Album comments will be added to every track in
# that album -- track comments only to that individual track.
#
# Example 1) Single artist album:
# ----------------------------------------------------------------------------
# Van Morrison
# Astral Weeks
# 1968
# Warner Bros. Records Inc.
#
# Astral Weeks
# In The Beginning
# Beside You
# Sweet Thing
# Cyprus Avenue
# The Way Young Lovers Do
# Afterwards
# Madame George
# Ballerina
# Slim Slow Slider
# ----------------------------------------------------------------------------
#
# Example 2) Multiple artist album:
# ----------------------------------------------------------------------------
# Various
# Romeo + Juliet
# 1996
# Twentieth Century Fox Film Corporation
# Soundtrack to the movie 'Romeo + Juliet'
#
# Garbage: #1 Crush
# Everclear: Local God
# Gavin Friday: Angel
# One Inch Punch: Pretty Piece Of Flesh
# Des'ree: Kissing You
# Butthole Surfers: Whatever (I Had A Dream)
# The Cardigans: Lovefool
# Kym Mazelle: Young Hearts Run Free
# Quindon Tarver: Everybody's Free (To Feel Good)
# Mundy: To You I Bestow
# Radiohead: Talk Show Host
# Stina Nordenstam: Little Star
# The Wannadies: You And Me Song
# ----------------------------------------------------------------------------
###############################################################################
import os
import sys

the_program_title = "Silly little Ogg Vorbis encoding script"

###############################################################################
# process the command line, changing any default options as necessary
# Isn't there a nicer way to do this?

def process_command_line():
# these are the default values for the options you can change
options = {
'screen_width': 80,
'description_file': "info.txt",
'encoding_command': "oggenc",
'general_options': "-b 128",
'source_directory': ".",
'source_template': "track%(track index)02d.cdda.wav",
'destination_base': "/home/jon/encoded",
'destination_template':
"%(track index)02d - %(artist)s - %(track)s.ogg",
'illegal_file_chars': "=\"$%\\/*:?",
'replace_file_char': "_",
}

# these are the short forms you can use on the command line
short_form = {
'screen_width': 'w',
'description_file': 'f',
'encoding_command': 'e',
'general_options': 'o',
'source_directory': 'd',
'source_template': 'p',
'destination_base': 'b',
'destination_template': 't',
'illegal_file_chars': 'i',
'replace_file_char': 'r',
}
#----------------------------------------------------------------------
from getopt import getopt

short_options = "".join([short_form[opt]+':' for opt in options.keys()])
long_options = [ opt +'=' for opt in options.keys()]

# parse the command line options using getopt
try:
(matched_list, remainder) = \
getopt(sys.argv[1:], short_options, long_options)

lookup_table = {}
for opt in options.keys():
lookup_table['-'+ short_form[opt]] = opt
lookup_table['--'+ opt ] = opt

for (option, value) in matched_list:
options[lookup_table[option]] = value

#----------------------------------------------------------------------
# that's processed all the matched arguments. Now we just have the
# end of the command line to deal with. In our case, we want there
# to be at most one argument left, which will be the new value for
# 'description_file'.
if len(remainder) > 1:
raise "Too many arguments on command line"
if len(remainder) == 1:
print remainder[0]
options['description_file'] = remainder[0]

# post processing -- we may want to change the format of some options
options['screen_width'] = int(options['screen_width'])

#----------------------------------------------------------------------
# finally, turn the options into global variables so all the other
# functions can use them easily
for (option, value) in options.items():
globals()[option] = value

except:
# if there was a problem, print out some help.
print the_program_title
print "-"*len(the_program_title)
print
print sys.argv[0]," [--options] [information file name]"
print
print "Possible command line arguments:"

for option in options.keys():
print "\t-%s\t--%s" % (short_form[option], option)

raise "Command line formatting error"

###############################################################################
# extracts a comment if one is present

def parse_possible_comment(input):
comment_lines = []
while input and input[0] and input[0][0].isspace():
comment_lines.append(input.pop(0).strip())

return "\n".join(comment_lines)

###############################################################################
# first extract the general album information

def parse_album_information(input):
header_fields = ('artist', 'album', 'date', 'publisher')

if len(input)<len(header_fields):
raise "Input error: Album description truncated"

disk_info = {}
for field in header_fields:
if not input[0]:
raise "Input error: '"+field+"' missing"

disk_info[field] = input.pop(0).strip()

# check to see if we are encoding a multi-artist album
if disk_info['artist'].upper() == "VARIOUS":
disk_info['multi artist'] = 1
del(disk_info['artist']) # artist differs for each track
else:
disk_info['multi artist'] = 0

# get any optional album comment
comment = parse_possible_comment(input)
if comment: disk_info['album comment'] = comment

return disk_info

###############################################################################
# now we should just be left with the track data.
# the first line is the track name, followed (optionally) by track comments
# - comments have whitespace at the start of a line.

def parse_track_list(input, disk_info):
tracks = []
track_number = 1

# keep on going until end of file, or until we hit a blank line
while input and input[0]:
# extract the track information
track_info = {}

# if multiple artist, the line contains both artist and track
# title information, otherwise it just contains the track title
if disk_info['multi artist']:
fields = input.pop(0).split(":",1)
if len(fields) != 2:
raise "Error: ':' missing in track description"
track_info['artist'] = fields[0].strip()
track_info['track'] = fields[1].strip()
else:
track_info['track'] = input.pop(0).strip()

# extract any track comments
comment = parse_possible_comment(input)
if comment: track_info['track comment'] = comment

# store the track index and increment
track_info['track index'] = track_number
track_number += 1

# append track information to the track list
tracks.append(track_info)

if not tracks:
raise "Input error: No track names specified in file"

return tracks

###############################################################################
# display the disk information to the user

def display_album_information(disk_info):
if not disk_info['multi artist']:
print "Artist: ",disk_info['artist']
print "Album Title: ", disk_info['album']
print "Date: ", disk_info['date']
print "Publisher: ", disk_info['publisher']
if disk_info.has_key('album comment'):
print "Album Comment:"
print disk_info['album comment']

###############################################################################
# display the track list to the user

def display_tracklist(disk_info, tracks):
print "There are %d tracks:" % (len(tracks),)
print

if disk_info['multi artist']:
fields = ('track index', 'artist', 'track')
else: fields = ('track index', 'track',)

width = {}
for field in fields:
width[field] = max([len(str(track[field])) for
track in tracks])

for track in tracks:
print " - ".join([str(track[field]).center(width[field]) for
field in fields])

###############################################################################
# creates the directory that the encoded files will be stored in

def create_destination_directory(disk_info):
destination = destination_base

if not disk_info['multi artist']:
destination = os.path.join(destination,
strip_illegal(disk_info['artist']))
destination = os.path.join(destination,
strip_illegal(disk_info['album']))

try:
os.makedirs(destination)
except:
pass
# actually we only want to ignore directory already existing...
# how do I do that?

return destination

###############################################################################
# we need to strip illegal characters out of the destination filenames

def strip_illegal(source):
dest = source

for char in illegal_file_chars:
dest = dest.replace(char, replace_file_char)

return dest

###############################################################################
# generates the encoder command line from the track and album information

encoder_option_format = { 'encoded name': '-o "%s"',
'artist': '-a "%s"',
'album': '-l "%s"',
'date': '-d "%s"',
'track index': '-N "%s"',
'track': '-t "%s"',
'publisher': '-c "publisher=%s"',
'album comment': '-c "album_comment=%s"',
'track comment': '-c "comment=%s"',
}

def build_command_line(info, general_options, destination):
# build the name of the encoded file
encoded_name = destination_template % info
encoded_name = strip_illegal(encoded_name)
encoded_name = os.path.join(destination, encoded_name)

info['encoded name'] = encoded_name

# build the command line required to encode the file
command_line = encoding_command
command_line += ' ' + general_options

for (info_name, info_value) in info.items():
if encoder_option_format.has_key(info_name):
command_line += ' ' + \
encoder_option_format[info_name] % (info_value,)

# the last option to be added is the name of the source .wav file
command_line += ' ' + source_template % info

return command_line

###############################################################################
# main routines:
###############################################################################

def parse_description_file():
print "-"*screen_width
print "Reading album description file '%s'..." % (description_file,),
input = [a.rstrip() for a in open(description_file, 'r').readlines()]
print "done"
print "-"*screen_width

# input is now a list of strings, each containing one line of the
# album description file, with the trailing whitespace stripped away.
# (initial whitespace is not stripped because it has syntactic value).

# first parse the general album information
disk_info = parse_album_information(input)
display_album_information(disk_info)

# skip any blank lines between the album information and the track list
while input and not input[0]:
input.pop(0)

# parse the track information
print "-"*screen_width
tracklist_info = parse_track_list(input, disk_info)
display_tracklist(disk_info, tracklist_info)

return (disk_info, tracklist_info)

###############################################################################

def check_source_files(disk_info, tracklist_info):
print "-"*screen_width
print "Checking that all the source files are present..."

file_list = os.listdir(source_directory)

for track_info in tracklist_info:
# merge the information for the current track
info = {}
info.update(disk_info)
info.update(track_info)

source_file = source_template % info
if source_file not in file_list:
raise "File Error: '"+source_file+"' not found"

###############################################################################

def encode_the_tracks(disk_info, tracklist_info):
print "-"*screen_width
# create the destination directory to store the encoded tracks
destination = create_destination_directory(disk_info)

for track_info in tracklist_info:
# merge the information for the current track
info = {}
info.update(disk_info)
info.update(track_info)

# construct the command to be used to encode the file
command_line = build_command_line(info,
general_options, destination)

# run the command and encode the track
os.system(command_line)

###############################################################################

if __name__ == '__main__':
# process any command line options
process_command_line()

print the_program_title. center(screen_width)
print ("-"*len(the_program_title)).center(screen_width)
print

# first read and process the album description file.
(disk_info, tracklist_info) = parse_description_file()

check_source_files(disk_info, tracklist_info)

# then encode the tracks with the options specified
encode_the_tracks(disk_info, tracklist_info)

print "-"*screen_width
print
print "Completed successfully. Come again soon".center(screen_width)
print "---------------------------------------".center(screen_width)

###############################################################################

John Hunter

unread,

Sep 8, 2001, 6:40:26 PM9/8/01

to

My guess is that you are a C programmer, not a C++ programmer. Right?

The main 'pythonic' thing that is missing here is object orientation.
An OO programmer would probably have a class for album which supplies
methods like 'get_track_list', a class for track which supplies
methods like 'get_title' and 'get_length', a class for comments, a
class for command line options, a class for the description file, and
so forth.

Best regards,
JDH

Paul Boddie

unread,

Sep 10, 2001, 6:14:18 AM9/10/01

to

John Hunter <jdhu...@nitace.bsd.uchicago.edu> wrote in message news:<1rbsklm...@video.bsd.uchicago.edu>...

But it really depends on what class of activity one is writing code
for, though, doesn't it? For small, quick tasks, procedural code is
frequently quicker to write and more understandable for other people,
especially for "data processing" tasks.

I would accept, however, that should the author want to include the
functionality in other applications, or to develop the existing
functionality further, an object-oriented solution would be worth
attempting.

Paul

Andrew Dalke

unread,

Sep 10, 2001, 2:06:26 PM9/10/01

to

Jon:

> If anyone has the time, I'd be interested in comments
> about my style: what I am doing right, and what I need
> to improve.

You've a lot of code, and much of it needs a bit of changing.
And I can't give it enough time to think about how to make
the whole thing more Pythonic. What follows is only
addressing parts.

Turn

options = {
'screen_width': 80,
'description_file': "info.txt",

...

into
options = {
'screen_width': (80, "w"),
'description_file': ("info.txt", "f"),
...
and move the initialization of 'options' to module
scope (makes it easier to understand what 'process_command_line'
does and reduces typo errors from having to repeat the key
twice). Could also move short_options and long_options to module
scope.

Much as you like list comprehensions,

short_options = "".join([short_form[opt]+':' for opt in options.keys])

the following is easier for me to understand

short_options = ":".join(options.keys) + ":"

The code all inside the try block

try:
(matched_list, remainder) = \
getopt(sys.argv[1:], short_options, long_options)

[lots of code]
except:
...

should have much less of the code in the try block, and should
not do a blanket except:

try:
matched_list, remainder = getopt(sys.argv[1:] ... )
except getopt.error, err:
usage(sys.stderr)
raise SystemExit(str(err))

(Note that this means having a 'usage' function which takes
an output file handle. Very useful since it also allows you
to have a --help command line option, which then just calls
usage(sys.stdout).)

def usage(outfile):
print >>outfile, the_program_title
print >>outfile, "-"*len(the_program_title)
print >>outfile,
print >>outfile, sys.argv[0]," [--options] [information file name]"
print >>outfile,
print >>outfile, "Possible command line arguments:"

keys = options.key() # Note that I'm sorting the keys
keys.sort()
for key in keys:
default, short_form = options[key]
print >>outfile, "\t-%s\t--%s" % (short_form, key)

In the main sys.argv parsing loop, I'll often raise a SystemExit
exception rather than

raise "Too many arguments on command line"

That should not be done if you want to use the parsing code as
part of a larger library. In that case, your function would
need to take the argv to use (instead of looking up sys.argv[1:])
and would need to declare its own exception class, which would
be explicitly caught in the main(). You're not interested in
that for this code so I won't go into details.

Try replacing your argument processing to

args = {}
for k, v in matched_list:
if k.startswith("--"):
assert options.has_key(k[2:]), k
args[k[2:]] = v
elif k.startswith("-"):
assert options.has_key(k[1:]), k
args[k[1:]] = v
else:
raise AssertionError(k)

(the assertion checks arise from a general desire to double
check that the input really is parsed correctly. I've never
had getopt fail on me to produce those exceptions.)

then replace your post-processing to

for k, (default, short_form) in options.items():
if args.has_key(k):
if isinstance(default, type(0)):
args[k] = int(args[k])
elif isinstance(default, type("")): # You don't need these
pass # lines for your code,
elif isinstance(default, type(0.0)): # since you only have
args[k] = float(args[k]) # strings and ints.
else:
args[k] = default

(In 2.2 you can replace this with

for k, (default, short_form) in options.items():
if k in args:
args[k] = type(default)(args[k])
else:
args[k] = default

> # finally, turn the options into global variables so all the other
> # functions can use them easily
> for (option, value) in options.items():
> globals()[option] = value

Very non-Pythonic. Just pass the list of arguments around.
Better yet, I have my main code know which functions use which
arguments.

You use string exceptions:

> if len(input)<len(header_fields):
> raise "Input error: Album description truncated

These are deprecated. If you don't want to make your own
exception type, then for these sorts of errors I like using
AssertionError, as in

assert len(input) >= len(header_files), "Input error: Alb ..."

The variable name 'input' in things like
> def parse_album_information(input):

isn't descriptive enough. It looks like a list of lines, but
other implementations could use, say, a file handle. If you
say 'input_lines' then it's easier to understand.

You could also recast your input code as a reader:

for disk_info in DiskInfoReader(open(description_file, 'r')):
tracklist_info = parse_track_list(disk_info)
display_tracklist(disk_info, tracklist_info)

# Switch to using an iterator for 2.2
class DiskInfoReader:
def __init__(self, file):
self.file = file
self._n = 0
def __getitem__(self, i):
assert self._n == i, "forward iteration only"
...
self._n += 1
return disk_info

but I don't understand your code well enough to do the
refactoring for that, or even if it's appropriate (there
seem to be two different chunks of code that alternate
consuming from input_lines.)

Don't use 'except:', as in
> try:
> os.makedirs(destination)
> except:

Check for the minimal error you can, which is os.error in
this case.

build_command_line should ensure the info_value passed to
the 'encoder_option_format[info_name]' string doesn't contain
a " or other escape character. I always find escapeing
characters for the shell an annoyance. I've only found one
function in the std. Python libs that helps - commands.mkarg,
and it always adds a space at the front.

Uggh, I knew this would take a long time to write. That's
why few others have responded to this thread.

Andrew
da...@dalkescientific.com

Jon

unread,

Sep 10, 2001, 5:21:50 PM9/10/01

to

In article <9nj0tk$v2o$1...@slb2.atl.mindspring.net>,
da...@dalkescientific.com says...

> Jon:
> > If anyone has the time, I'd be interested in comments
> > about my style: what I am doing right, and what I need
> > to improve.
>

Thanks to everyone that responded to my post (particularly Andrew Dalke,
who seems to have spent far too much of his own precious time reading my
horrible code :).

I've suddenly become a lot busier than I expected, so I won't have time
to read everything you have written for the last couple of days. However,
a couple of points --

1) This wasn't supposed to indicate how I write an immaculate, designed
from the beginning piece of code. It was a challenge I set myself, to see
whether I could write something in Python in a way that seemed 'obvious'
to me, and actually have it work (which it did :). As others have pointed
out, there are many ways I could refactor the code -- and as soon as my
workload drops I probably will.

Responses to Andrew's comments:

2)

> You could also recast your input code as a reader

Very good idea, and the skeleton code after is just the pointer I was
looking for. As someone else noted, I'm orginally a C coder, and haven't
really immersed myself in OO (apart from an optional software engineering
course I took back when I was an undergrad). From my brief skimming,
Python's approach to OO feels more like Pascal (or Modula-2) than C++ --
which is a good thing :).

3)

>> for (option, value) in options.items():
>> globals()[option] = value
>Very non-Pythonic.

Sorry :). And looking at it with the benefit of sleep, you are right --
it's ugly and I should be ashamed. I still think Python is funky for
allowing you to do this, however.

4) Particular thanks for your suggestions regarding how to deal with
command line parsing -- again, I did it in the way that seemed obvious to
me at the time. I am sure that, after close reading, your way will become
obvious...

'it won't take you more than a week of analysis to conclude that this is
obvious' as a lecturer of mine once said... :)

John Hunter

unread,

Sep 11, 2001, 9:32:59 PM9/11/01

to

>>>>> "Paul" == Paul Boddie <pa...@boddie.net> writes:
Paul> But it really depends on what class of activity one is
Paul> writing code for, though, doesn't it? For small, quick
Paul> tasks, procedural code is frequently quicker to write and
Paul> more understandable for other people, especially for "data
Paul> processing" tasks.

Absolutely, and I did not mean to be critical. I just wanted to point
out to the author, who wants to expand his python reach, that OO is a
useful strategy that python supports and his code lacks. Lots of
great procedural code powers some of the best software in the world.

JDH