I wrote the following program mainly for educational purpose of improving my Python programming skills -- would Pythonistas here please look at it and point out areas that could use improvement?
This thing analyzes the stored load average record file, calculates simple moving average and produces Gnuplot program to plot the thing.
Constructive criticism is welcome.
#!/usr/bin/python
import re import sys import tempfile import subprocess import os
def extrfromfile(self): vre = re.compile("(\d+-\d+-\d+) (\d\d:\d\d) (\d+\.\d+)") for line in self.fo: res = vre.search(line) if res: self.reslist.append({'day':res.group(1),\ 'time':res.group(2),\ 'val':float(res.group(3))})
def calc_sma(self, smalen=4): if smalen == 0: raise AssertionError, "Moving Average sample length cannot be 0" if not isinstance(smalen, int): raise AssertionError, "Moving Average sample length has to be integer" total = 0 total = sum( [ x['val'] for x in self.reslist[0:smalen] ] ) sma = total / smalen smaidx = int(smalen/2) self.smalist.append((self.reslist[0]['day'],\ self.reslist[0]['time'],\ self.reslist[0]['val'],\ self.reslist[0]['val'])) for i in range(smalen, len(self.reslist)): curval = self.reslist[i]['val'] self.maxval = max(self.maxval, curval) total += curval total -= self.reslist[i - smalen]['val'] sma = total / smalen smaidx += 1 self.smalist.append((self.reslist[smaidx]['day'],\ self.reslist[smaidx]['time'],\ self.reslist[smaidx]['val'],\ sma))
self.gprogfile.write("""set terminal png size %d, %d set style line 1 lt 1 lw 2 set style line 2 lt 2 lw 2 set output "%s" set xtics(%s) set yrange [0:%f] set y2range [0:%f] plot "%s" using 1 with lines ls 1 title "orig" axes x1y1, "%s" using 2 with lines ls 2 title "Moving Average" axes x1y2 """ % (self.graphlen, self.graphheight, self.outfname, labelstr, float(maxval), float(maxval),\ self.gdata[1], self.gdata[1]) ) self.gprogfile.close() for elem in self.smalist: self.gdatafile.write("%f, %f\n" % (elem[2], elem[3])) self.gdatafile.close()
def plot(self): self._writefiles() gplot = subprocess.Popen(self.gnuplot + " " + self.gprog[1],\ shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE) print "Plotting data (%s)..." % self.outfname so, se = gplot.communicate() if se: print "Gnuplot problem output:", se os.remove(self.gprog[1]) os.remove(self.gdata[1])
if __name__ == "__main__": try: fname = sys.argv[1] except IndexError: print "Specify filename with data as first argument." sys.exit(1) except Exception, e: print "Error:", e sys.exit(2) values = MovingAvg(fname) values.extrfromfile() values.calc_sma(smalen = 10) (reslist, smalist, maxval) = values.return_results() gp = GnuplotWrapper(smalist, maxval) gp.plot()
On Fri, 23 Jan 2009 12:17:41 +0100, mk <mrk...@gmail.com> wrote: >Hello everyone,
>I wrote the following program mainly for educational purpose of improving my >Python programming skills -- would Pythonistas here please look at it and >point out areas that could use improvement?
>This thing analyzes the stored load average record file, calculates simple >moving average and produces Gnuplot program to plot the thing.
>Constructive criticism is welcome.
>#!/usr/bin/python
>import re >import sys >import tempfile >import subprocess >import os
>class MovingAvg(object):
You should have a docstring here. It should describe the overall purpose of the MovingAvg class and it should probably document the meaning of each attribute instances of the class have.
This isn't very good. Your MovingAvg class is otherwise relatively general- purpose and re-usable, but this makes it unusable instead. You shouldn't use print to report errors in general code and you shouldn't turn specific exceptions into SystemExit. This kind of whole-program error handling belongs in a different layer if your code. Your moving average code should just focus on computing a moving average.
Also, filenames are rather clunky. Another improvement here would be to pass in a file object which the calling code has already opened. Or, better still, just pass in an iterator of data. Do the file handling and the parsing in a different layer. Then you'll be able to re-use your moving average class with data files in a different format - all you'll have to do is write another parser for the new format.
There should be a docstring here describing what this method does.
> vre = re.compile("(\d+-\d+-\d+) (\d\d:\d\d) (\d+\.\d+)") > for line in self.fo: > res = vre.search(line) > if res: > self.reslist.append({'day':res.group(1),\ > 'time':res.group(2),\ > 'val':float(res.group(3))})
The trailing backslashes on the previous lines are superfluous. The code is just as valid without them.
> def calc_sma(self, smalen=4):
Another missing docstring. Make sure you also document the meaning of the parameter the method accepts.
> if smalen == 0: > raise AssertionError, "Moving Average sample length >cannot be 0" > if not isinstance(smalen, int): > raise AssertionError, "Moving Average sample length >has to be integer"
The conventional way to write the previous lines would be:
assert smallen != 0, "Moving Average sample length cannot be 0" assert isinstance(smallen, int), "Moving Average Sample length has to be integer"
However, I would just leave them out. If you document the meaning of the smalen parameter, then callers will know it can't be 0 and must be an int.
> total = 0
Superfluous line above.
> total = sum( [ x['val'] for x in self.reslist[0:smalen] ] ) > sma = total / smalen > smaidx = int(smalen/2) > self.smalist.append((self.reslist[0]['day'],\ > self.reslist[0]['time'],\ > self.reslist[0]['val'],\ > self.reslist[0]['val']))
Superfluous backslashes again.
> for i in range(smalen, len(self.reslist)): > curval = self.reslist[i]['val'] > self.maxval = max(self.maxval, curval) > total += curval > total -= self.reslist[i - smalen]['val'] > sma = total / smalen > smaidx += 1 > self.smalist.append((self.reslist[smaidx]['day'],\ > self.reslist[smaidx]['time'],\ > self.reslist[smaidx]['val'],\ > sma))
Generally, I wonder whether MovingAvg should just be a function rather than a class. The only place it maintains state it does so confusingly. I would just make the input data a parameter to calc_sma and have it return the results it computes. Drop the rest of the class and make calc_sma a free function.
You should *avoid* using shell=True with subprocess.Popen. There's no reason to use it here, as far as I can tell.
> print "Plotting data (%s)..." % self.outfname > so, se = gplot.communicate() > if se: > print "Gnuplot problem output:", se > os.remove(self.gprog[1]) > os.remove(self.gdata[1])
Reporting the gnuplot failure with print is not the best thing. Reporting errors to users should be done at a separate layer. This is your gnuplot wrapper - its job is to wrap gnuplot, not to talk to the user. If there's a problem, make it available via some documented API (for example, the plot method might raise an exception if there is a gnuplot problem). Handle the exception at a higher level in your application where you know it's correct to print things for the user to read.
>if __name__ == "__main__": > try: > fname = sys.argv[1] > except IndexError: > print "Specify filename with data as first argument." > sys.exit(1) > except Exception, e: > print "Error:", e > sys.exit(2)
What is the second exception handler doing? There is no expected other exception from the code being protected, so you should just get rid of this handler. If, through some crazy accident, another exception gets raised, Python will take care of reporting it to the user, and it will do so with far more information - information that will make your job of debugging the problem vastly easier.
> The most significant thing missing from this code is unit tests. > Developing automated tests for your code will help you learn a lot.
Thanks for all the remarks, I'll restructure my code. Probably the biggest mistake was not keeping "moving average" class and others focused on its respective jobs only.
However, unit tests are a tougher cookie, do you have any resource that's truly worth recommending for learning unit tests? The only resource I found that was talking at some length about it was "Dive into Python" and I was wondering if there's anything better out there...
On Fri, 23 Jan 2009 16:57:16 +0100, mk <mrk...@gmail.com> wrote: >Jean-Paul Calderone wrote: ><snip> >>The most significant thing missing from this code is unit tests. >>Developing automated tests for your code will help you learn a lot.
>Thanks for all the remarks, I'll restructure my code. Probably the biggest >mistake was not keeping "moving average" class and others focused on its >respective jobs only.
>However, unit tests are a tougher cookie, do you have any resource that's >truly worth recommending for learning unit tests? The only resource I found >that was talking at some length about it was "Dive into Python" and I was >wondering if there's anything better out there...
mk wrote: > Jean-Paul Calderone wrote: > <snip> >> The most significant thing missing from this code is unit tests. >> Developing automated tests for your code will help you learn a lot.
> However, unit tests are a tougher cookie, do you have any resource > that's truly worth recommending for learning unit tests? The only > resource I found that was talking at some length about it was "Dive into > Python" and I was wondering if there's anything better out there...
Do the "Dive In" thing. You'll quickly develop instincts. Start with the simplest possible test of your function, run it, and fix your code until it passes. Add another test, and loop. The simplest possible test will unfortunately need a bit of bolerplate, but adding tests is easy.
to start:
import unittest import moving_average as ma
class MovingAverageTest(unittest.TestCase): def test_simplest(self): 'Check the mechanics of getting a simple result' self.assertEqual(ma.moving_average(3, []), [])
if __name__ == '__main__': unittest.main()
Once that works, add another test: def test_simple(self): self.assertAlmostEqual(ma.moving_average(3, [1, 2, 3]), [2])
Once that works, check something a bit easier to get wrong: def test_floating(self): 'Check we have reasonable results' self.assertAlmostEqual(ma.moving_average(3, [1, 1, 3])[0], 1.6666667)
and, as they so often say, "lather, rinse, repeat."
mk wrote: > Jean-Paul Calderone wrote: > <snip> >> The most significant thing missing from this code is unit tests. >> Developing automated tests for your code will help you learn a lot.
> Thanks for all the remarks, I'll restructure my code. Probably the > biggest mistake was not keeping "moving average" class and others > focused on its respective jobs only.
I would consider writing the moving-average 'function' as a generator function. First version takes a sequence and window size as inputs. Second, slightly more difficult, takes an iterator and window size as inputs. An exponentially-weighted average generator would, of course, take an iterator and weight as input.
Weinberg's Second Law: If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization.