
My first ever Python program, comments welcome


Lipska the Kat

Jul 21, 2012, 3:08:07 PM
Greetings Pythoners

A short while back I posted a message that described a task I had set
myself. I wanted to implement the following bash shell script in Python

Here's the script

sort -nr $1 | head -${2:-10}

This script takes a filename and an optional number of lines to display,
sorts the lines in reverse numerical order (that's the -r) and prints them
to standard out. If no number of lines is given the script prints 10 lines.

Here's the file.

50 Parrots
12 Storage Jars
6 Lemon Currys
2 Pythons
14 Spam Fritters
23 Flying Circuses
1 Meaning Of Life
123 Holy Grails
76 Secret Policemans Balls
8 Something Completely Differents
12 Lives of Brian
49 Spatulas


... and here's my very first attempt at a Python program
I'd be interested to know what you think, you can't hurt my feelings
just be brutal (but fair). There is very little error checking as you
can see and I'm sure you can crash the program easily.
'Better' implementations most welcome

#! /usr/bin/env python3.2

import fileinput
from sys import argv
from operator import itemgetter

l=[]
t = tuple
filename=argv[1]
lineCount=10

with fileinput.input(files=(filename)) as f:
    for line in f:
        t=(line.split('\t'))
        t[0]=int(t[0])
        l.append(t)
    l=sorted(l, key=itemgetter(0))

try:
    inCount = int(argv[2])
    lineCount = inCount
except IndexError:
    #just catch the error and continue
    None

for c in range(lineCount):
    t=l[c]
    print(t[0], t[1], sep='\t', end='')

Thanks

Lipska


--
Lipska the Kat: Troll hunter, Sandbox destroyer
and Farscape dreamer of Aeryn Sun.

Ian Foote

Jul 21, 2012, 3:34:48 PM
to pytho...@python.org
> t = tuple

What is this line supposed to do? If you're trying to make an empty
tuple, you can write:
t = ()
But I don't think this is needed at all.
> filename=argv[1]
> lineCount=10
>
> with fileinput.input(files=(filename)) as f:
>     for line in f:
>         t=(line.split('\t'))
>         t[0]=int(t[0])
>         l.append(t)
>     l=sorted(l, key=itemgetter(0))
>
> try:
>     inCount = int(argv[2])
>     lineCount = inCount
I don't think you need to split this into two lines here.
try:
    lineCount = int(argv[2])
should work.
> except IndexError:
>     #just catch the error and continue
>     None
I would use pass instead of None here - I want to "do nothing" rather
than create a None object.
> for c in range(lineCount):
>     t=l[c]
>     print(t[0], t[1], sep='\t', end='')
>
> Thanks
>
> Lipska
>

My only other point is that you might find it helpful to use slightly
more verbose names than l or t - it's not immediately obvious to the
reader what these are intended to represent.
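
As a rough sketch of the points above (the names parse_line_count,
DEFAULT_LINE_COUNT and the args parameter are made up for illustration and
are not from the original program), the optional count could be read like
this:

```python
DEFAULT_LINE_COUNT = 10  # hypothetical name for the default of 10 lines


def parse_line_count(args):
    """Return the optional line count from args, or the default.

    args stands in for sys.argv: [script, filename, optional count].
    """
    line_count = DEFAULT_LINE_COUNT
    try:
        # One assignment is enough; no intermediate variable needed.
        line_count = int(args[2])
    except IndexError:
        # No count was supplied: do nothing and keep the default.
        pass
    return line_count
```

Note that a non-numeric count would still raise ValueError here, which is
probably what you want the user to see.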

Regards,
Ian

MRAB

Jul 21, 2012, 3:40:46 PM
to pytho...@python.org
> t = tuple

What's the purpose of this line?

> filename=argv[1]
> lineCount=10
>
> with fileinput.input(files=(filename)) as f:
>     for line in f:
>         t=(line.split('\t'))
>         t[0]=int(t[0])
>         l.append(t)
>     l=sorted(l, key=itemgetter(0))
Shorter is:

l.sort(key=itemgetter(0))

>
> try:
>     inCount = int(argv[2])
>     lineCount = inCount
You may as well say:

lineCount = int(argv[2])

> except IndexError:
>     #just catch the error and continue
>     None
The do-nothing statement is:

pass

>
> for c in range(lineCount):
>     t=l[c]
If there are fewer than 'lineCount' lines, this will raise IndexError.
You could do this instead:

for t in l[ : lineCount]:
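
A small illustration of the difference, using made-up sample rows rather
than the real file: a slice that runs past the end of the list simply stops
early, while an index past the end raises IndexError.

```python
rows = [(1, "Meaning Of Life"), (2, "Pythons"), (6, "Lemon Currys")]
line_count = 10  # deliberately larger than len(rows)

# Slicing past the end is safe: it returns whatever is there.
first_rows = rows[:line_count]

# Indexing past the end is not.
try:
    rows[line_count - 1]
    index_failed = False
except IndexError:
    index_failed = True
```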

Dave Angel

Jul 21, 2012, 4:10:51 PM
to Lipska the Kat, pytho...@python.org
On 07/21/2012 03:08 PM, Lipska the Kat wrote:
> Greetings Pythoners
>
> A short while back I posted a message that described a task I had set
> myself. I wanted to implement the following bash shell script in Python
>

You already have comments from Ian and MRAB, and I'll try to point out
only things that they did not.

Congratulations on getting your first program running. And when reading
the following, remember that getting it right is more important than
getting it pretty.
> l=[]

I prefer to initialize an empty collection just before the loop that's
going to fill it. Then if you later decide to generalize some other
part of the code, it's less likely to break. So I'd move this line to
right before the for loop.

> t = tuple

Even if you were going to use this initialization later, it doesn't do
what you think it does. It doesn't create a tuple, it just makes
another reference to the class. If you had wanted an empty tuple, you
should either do t=tuple(), or better t=()
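
A quick way to see the difference (the names empty_a, empty_b and is_class
are only for this illustration):

```python
t = tuple          # binds the name t to the tuple class itself
empty_a = tuple()  # calls the class: an empty tuple
empty_b = ()       # the literal form, same result

is_class = t is tuple  # True: t is just another name for the class
```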

> filename=argv[1]
> lineCount=10
>

I'd suggest getting into the habit of doing all your argv parsing in one
place. So check for argv[2] here, rather than inside the loop below.
Eventually you're going to have code complex enough to use an argument
parsing library. And of course, something to tell your user what the
arguments are supposed to be.
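
One possible shape for this, sketched with the standard argparse module.
The argument names filename and line_count, the help strings and the sample
file name are assumptions for the example, not part of the original script:

```python
import argparse


def parse_args(argv=None):
    """Parse all command-line arguments in one place.

    argv=None makes argparse read sys.argv[1:]; passing a list makes
    the function easy to try out interactively.
    """
    parser = argparse.ArgumentParser(
        description="Sort lines numerically and print the first few.")
    parser.add_argument("filename",
                        help="file whose lines start with a number")
    parser.add_argument("line_count", nargs="?", type=int, default=10,
                        help="how many lines to print (default: 10)")
    return parser.parse_args(argv)


args = parse_args(["shopping.txt", "3"])  # hypothetical file name
```

Running the script with -h then prints a usage message built from the help
strings, which covers the "tell your user" part.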

> with fileinput.input(files=(filename)) as f:

fileinput is much more general than you want for processing a single
file. That may be deliberate, if you're picturing somebody using
wildcards on their input. But if so, you should probably use a
different name, something that indicates plural.

>     for line in f:
>         t=(line.split('\t'))
>         t[0]=int(t[0])
>         l.append(t)


>     l=sorted(l, key=itemgetter(0))
>

Your sample data has duplicate numbers. So you really ought to decide
how you'd like such lines sorted in the output. Your present code
simply preserves the present order of such lines. But if you remove the
key parameter entirely, the default sort order will sort with t[0] as
primary key, and t[1] as tie-breaker. That'd probably be what I'd do,
after trying to clarify with the client what the desired sort order was.
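
To make the two behaviours concrete, here is a small sketch with three rows
taken from the sample data (the variable names are made up):

```python
from operator import itemgetter

rows = [[12, "Storage Jars"], [6, "Lemon Currys"], [12, "Lives of Brian"]]

# Keyed on the number only: the sort is stable, so the two 12s
# stay in their input order.
by_number = sorted(rows, key=itemgetter(0))

# No key: lists compare element by element, so the text breaks the tie.
by_number_then_text = sorted(rows)
```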

> try:
>     inCount = int(argv[2])
>     lineCount = inCount
> except IndexError:
>     #just catch the error and continue
>     None
>
> for c in range(lineCount):
>     t=l[c]
>     print(t[0], t[1], sep='\t', end='')
>
> Thanks
>
> Lipska
>
>

A totally off-the-wall query. Are you using a source control system,
such as git ? It can make you much braver about refactoring a working
program.

--

DaveA

Steven D'Aprano

Jul 21, 2012, 8:32:39 PM
On Sat, 21 Jul 2012 20:40:46 +0100, MRAB wrote:

> On 21/07/2012 20:08, Lipska the Kat wrote:
>> l=sorted(l, key=itemgetter(0))
>
> Shorter is:
>
> l.sort(key=itemgetter(0))

Shorter, and the semantics are subtly different.

The sorted function returns a copy of the input list.

The list.sort method sorts the list in place.



--
Steven

Steven D'Aprano

Jul 21, 2012, 8:56:55 PM
On Sat, 21 Jul 2012 16:10:51 -0400, Dave Angel wrote:

>> with fileinput.input(files=(filename)) as f:
>
> fileinput is much more general than you want for processing a single
> file. That may be deliberate, if you're picturing somebody using
> wildcards on their input. But if so, you should probably use a
> different name, something that indicates plural.

Also, fileinput is more a convenience module than a serious production
quality tool. It works, it does the job, but it can be slow. From the
source:

Performance: this module is unfortunately one of the slower ways of
processing large numbers of input lines.


--
Steven

MRAB

Jul 21, 2012, 9:56:20 PM
to pytho...@python.org
Since the result is bound to the original name, the
result is the same.

Chris Angelico

Jul 21, 2012, 9:59:38 PM
to pytho...@python.org
On Sun, Jul 22, 2012 at 11:56 AM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> Since the result is bound to the original name, the
> result is the same.

Yes, assuming there are no other refs.

>>> a=[3,2,1]
>>> b=a
>>> a=sorted(a)
>>> a
[1, 2, 3]
>>> b
[3, 2, 1]

ChrisA

Dave Angel

Jul 21, 2012, 10:01:42 PM
to pytho...@python.org
On 07/21/2012 09:56 PM, MRAB wrote:
> On 22/07/2012 01:32, Steven D'Aprano wrote:
> Since the result is bound to the original name, the
> result is the same.
>

In this particular program, yes. But if there's another variable bound
to the same list, then the fact that there's a new object from sorted()
makes a difference.
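
A short sketch of both cases side by side, with throwaway names: the
in-place sort is visible through every name bound to the list, while
sorted() leaves the old list alone.

```python
a = [3, 2, 1]
b = a                  # second name for the same list object
a.sort()               # in place: b sees the change too
same_object = a is b   # still one list

c = [3, 2, 1]
d = c
c = sorted(c)          # new list bound to c; d still names the old one
```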



--

DaveA

rusi

Jul 21, 2012, 10:55:47 PM
On Jul 22, 1:10 am, Dave Angel <d...@davea.name> wrote:

> A totally off-the-wall query.  Are you using a source control system,
> such as git ?  It can make you much braver about refactoring a working
> program.

Question in a similar vein: What development environment do you use?
My impression is that the majority of pythonistas use a non-IDE editor
like vi or emacs.
I've been using emacs for 20 years and python-mode of emacs is very
useful but I am increasingly concerned that emacs is refusing to move
with the times.

Which is why I am particularly curious how an ol Java-head finds
eclipse+python (http://pydev.org/ )

Peter Otten

Jul 22, 2012, 3:56:50 AM
to pytho...@python.org
Note that (filename) is not a tuple, just a string surrounded by superfluous
parens.

>>> filename = "foo.bar"
>>> (filename)
'foo.bar'
>>> (filename,)
('foo.bar',)
>>> filename,
('foo.bar',)

You are lucky that FileInput() tests if its files argument is just a single
string.

>     for line in f:
>         t=(line.split('\t'))
>         t[0]=int(t[0])
>         l.append(t)
>     l=sorted(l, key=itemgetter(0))
>
> try:
>     inCount = int(argv[2])
>     lineCount = inCount
> except IndexError:
>     #just catch the error and continue
>     None
>
> for c in range(lineCount):
>     t=l[c]
>     print(t[0], t[1], sep='\t', end='')
>

I prefer a more structured approach even for such a tiny program:

- process all commandline args
- read data
- sort
- clip extra lines
- write data

I'd break it into these functions:

def get_commandline_args():
    """Recommended library: argparse.
    Its FileType can deal with stdin/stdout.
    """

def get_quantity(line):
    return int(line.split("\t", 1)[0])

def sorted_by_quantity(lines):
    """Leaves the lines intact, so you don't
    have to reassemble them later on."""
    return sorted(lines, key=get_quantity)

def head(lines, count):
    """Have a look at itertools.islice() for a more
    general approach"""
    return lines[:count]

if __name__ == "__main__":
    # protecting the script body allows you to import
    # the script as a library into other programs
    # and reuse its functions and classes.
    # Also: play nice with pydoc. Try
    # $ python -m pydoc -w ./yourscript.py

    args = get_commandline_args()
    with args.infile as f:
        lines = sorted_by_quantity(f)
    with args.outfile as f:
        f.writelines(head(lines, args.line_count))

Note that if you want to handle large files gracefully you need to recombine
sorted_by_quantity() and head() (have a look at heapq.nsmallest() which was
already mentioned in the other thread).
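
A minimal sketch of that recombination, assuming tab-separated lines as in
the functions above. The name smallest_lines and the sample lines are
invented for the example:

```python
import heapq


def get_quantity(line):
    # The number before the first tab, as an int.
    return int(line.split("\t", 1)[0])


def smallest_lines(lines, count):
    """Return the count lines with the smallest quantity, in order.

    heapq.nsmallest() accepts any iterable and keeps only count items
    in memory at a time, so the whole file never has to be sorted.
    """
    return heapq.nsmallest(count, lines, key=get_quantity)


sample = ["50\tParrots\n", "2\tPythons\n",
          "123\tHoly Grails\n", "1\tMeaning Of Life\n"]
result = smallest_lines(iter(sample), 2)
```

heapq.nlargest() takes the same arguments and would match the -r of the
original shell script.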

Mark Lawrence

Jul 22, 2012, 4:14:34 AM
to pytho...@python.org
Wouldn't describe myself as "an ol Java-head" but I disliked eclipse 10
years ago. I tried it again earlier this year and still disliked it.
It's like entering a legless cart horse for the Derby or the Grand National.

YMMV.

--
Cheers.

Mark Lawrence.

Lipska the Kat

Jul 22, 2012, 4:37:11 AM
On 21/07/12 21:10, Dave Angel wrote:
> On 07/21/2012 03:08 PM, Lipska the Kat wrote:
>> Greetings Pythoners
>>
>> A short while back I posted a message that described a task I had set
>> myself. I wanted to implement the following bash shell script in Python
>>

snip

>>
>>
>
> A totally off-the-wall query. Are you using a source control system,
> such as git ? It can make you much braver about refactoring a working
> program.

Thanks for your comments, I've taken them on board.
I'm most familiar with cvs and svn for source control. I've also
used Microsoft source safe. I generally just use what's given to me by
whoever is paying me and don't worry too much about the details. Many in
the Linux world seem to use git. Seeing as I've been using Linux at home
since the early days of slackware I suppose I'd better look into it.
Strangely enough I've never had a paid job using Linux. I've worked on
multiuser UNIX systems and Sun Workstations but never Linux so I guess
the need has never arisen.

The Java world seems to largely use Maven to manage their code. It's a
bit of a headbanger and I've never used it. Ant and FTP are my current
faves at home. Primitive but good enough for my personal and business sites.

Andrew Berg

Jul 22, 2012, 4:49:12 AM
to comp.lang.python
On 7/22/2012 3:37 AM, Lipska the Kat wrote:
> Many in
> the Linux world seem to use git. Seeing as I've been using Linux at home
> since the early days of slackware I suppose I'd better look into it.
There are Mercurial (aka Hg) and Bazaar as well for DVCS. AFAIK, git,
Mercurial, and Bazaar are all fine choices and the one to use will
mainly boil down to personal preference. I prefer Mercurial myself.
--
CPython 3.3.0b1 | Windows NT 6.1.7601.17803

Lipska the Kat

Jul 22, 2012, 5:20:35 AM
On 22/07/12 03:55, rusi wrote:
> On Jul 22, 1:10 am, Dave Angel<d...@davea.name> wrote:
>
>> A totally off-the-wall query. Are you using a source control system,
>> such as git ? It can make you much braver about refactoring a working
>> program.
>
> Question in a similar vein: What development environment do you use?
> My impression is that the majority of pythonistas use a non-IDE editor
> like vi or emacs.
> I've been using emacs for 20 years and python-mode of emacs is very
> useful but I am increasingly concerned that emacs is refusing to move
> with the times.

My current 'Python development environment' is gedit with line numbering
turned on and a terminal window to run chmodded scripts :-)

>
> Which is why I am particularly curious how an ol Java-head finds
> eclipse+python (http://pydev.org/ )

Python and eclipse ... Noooooooooooooooooooooooooooooooooooooooooo ;-)

Well I have to say that I've used Eclipse with the myEclipse plugin for
a number of years now and although it has its moments it has earned me
LOADS of MONEY so I can't really criticise it. I have Eclipse installed
on a Windows box so I may try the plugin ... but actually I'm really
enjoying doing things the 'old fashioned way' again.

I'm going to do a 'proper OO' version of the shell script to learn about
wiring different modules together ... I find the official documentation
hard to navigate though.

Chris Angelico

Jul 22, 2012, 6:17:18 AM
to pytho...@python.org
On Sun, Jul 22, 2012 at 6:49 PM, Andrew Berg <bahamut...@gmail.com> wrote:
> On 7/22/2012 3:37 AM, Lipska the Kat wrote:
>> Many in
>> the Linux world seem to use git. Seeing as I've been using Linux at home
>> since the early days of slackware I suppose I'd better look into it.
> There are Mercurial (aka Hg) and Bazaar as well for DVCS. AFAIK, git,
> Mercurial, and Bazaar are all fine choices and the one to use will
> mainly boil down to personal preference. I prefer Mercurial myself.

Agreed. I poked around with Bazaar a bit this year, and it seems to
lack some features. But certainly hg and git are both excellent
choices, with bzr not significantly far behind. I prefer git,
personally; on Windows, though, I would recommend hg.

Probably the best feature of any of them (one which, I believe, is now
standard in all three) is 'bisect' with a command. It's "git bisect
run", or "hg bisect -c", or "bzr bisect run". You can search back
through a huge time period without any human interaction. I did that a
while ago with a docs-building problem; the current state wouldn't
successfully generate its docs from a fresh start, even though it
could update them from a previous state. It took 45 minutes (!) of
chuggity-chug compilation to find the actual cause of the problem, and
no effort from me (since "make doc" already gave the right exit
codes). Use source control now; you'll reap the benefits later!

ChrisA

David

Jul 22, 2012, 6:46:25 AM
to Lipska the Kat, pytho...@python.org
On 22/07/2012, Lipska the Kat <lip...@lipskathekat.com> wrote:
> On 21/07/12 21:10, Dave Angel wrote:
>>
>> A totally off-the-wall query. Are you using a source control system,
>> such as git ? It can make you much braver about refactoring a working
>> program.
>
> Thanks for your comments, I've taken them on board.
> I'm most familiar with cvs and svn for source control. I've also
> used Microsoft source safe. I generally just use what's given to me by
> whoever is paying me and don't worry too much about the details. Many in
> the Linux world seem to use git. Seeing as I've been using Linux at home
> since the early days of slackware I suppose I'd better look into it.

What Dave said. I used CVS briefly and then git and its GUI tools for
the last 5 years.
Took me a while to get comfortable with it, but now it turns managing complex,
evolving text files into fun and I cannot imagine working without its power and
flexibility. First thing I do on any programming task: git init

Lipska the Kat

Jul 22, 2012, 8:36:50 AM
On 22/07/12 11:17, Chris Angelico wrote:
> On Sun, Jul 22, 2012 at 6:49 PM, Andrew Berg<bahamut...@gmail.com> wrote:
>> On 7/22/2012 3:37 AM, Lipska the Kat wrote:
>>> Many in
>>> the Linux world seem to use git.

snip

> Use source control now; you'll reap the benefits later!


From 'sudo apt-get install git' to 'git add *.py' was about 5 minutes,
and that included reading the basic documentation. POSITIVELY the
fastest install and the least trouble from any source control app ever.

rusi

Jul 22, 2012, 12:18:26 PM
On Jul 22, 2:20 pm, Lipska the Kat <lip...@lipskathekat.com> wrote:

> Well I have to say that I've used Eclipse with the myEclipse plugin for
> a number of years now and although it has its moments it has earned me
> LOADS of MONEY so I can't really criticise it.

I've probably tried to use eclipse about 4 times in the last 8 years.
Always run away in terror.
Still I'm never sure whether eclipse is stupid or I am...

First time I'm hearing of myEclipse. Thanks. What does it have/do that
standard eclipse (JDT?) does not?


> Python and eclipse ... Noooooooooooooooooooooooooooooooooooooooooo ;-)


Very curious about this. You made 'Loads of money' with eclipse but
want to stay away from it?
Simply cannot make out this thing called 'java-programmer-culture'...

Lipska the Kat

Jul 22, 2012, 1:23:26 PM
On 22/07/12 17:18, rusi wrote:
> On Jul 22, 2:20 pm, Lipska the Kat<lip...@lipskathekat.com> wrote:
>
>> Well I have to say that I've used Eclipse with the myEclipse plugin for
>> a number of years now and although it has its moments it has earned me
>> LOADS of MONEY so I can't really criticise it.
>
> I've probably tried to use eclipse about 4 times in the last 8 years.
> Always run away in terror.
> Still I'm never sure whether eclipse is stupid or I am...
>
> First time I'm hearing of myEclipse. Thanks. What does it have/do that
> standard eclipse (JDT?) does not?

Eclipse for Java supports development of 'standard' Java applications,
that is, applications that use the standard Java distribution (JDK).
MyEclipse adds support for J2EE and a bunch of other stuff. I used it
mainly for jsp syntax highlighting, HTML syntax highlighting and Servlet
development. It's marketed as a J2EE and web development IDE. It comes
with an embedded Tomcat server and some versions support common
frameworks such as Spring and Hibernate. Struts is supported by default
I think although I always stayed away from frameworks when I could. I
preferred to write Java rather than XML :-) Check out
http://www.myeclipseide.com/ for an example of marketing bling.

>> Python and eclipse ... Noooooooooooooooooooooooooooooooooooooooooo ;-)

> Very curious about this. You made 'Loads of money' with eclipse but
> want to stay away from it?
> Simply cannot make out this thing called 'java-programmer-culture'...

How dare you sir, I'm not a Java programmer I am a 'retired' software
engineer ;-)

Heh heh, Nothing to do with Eclipse, just another thing to get my head
around. For work and Java IMHO you can't beat eclipse... at the moment
I'm getting my head around git, reminding myself of C, learning Python
and re-learning make. Enough already; but if there's a python plugin I
guess I'll get around to it eventually

Ivan@work

Jul 23, 2012, 3:12:02 AM
> l=[]

You can do without this, see below.

> t = tuple

This initialization does nothing. Assignment t=(line.split('\t')) makes
`t` a list (not a tuple), discarding any previous value. And you don't
really need t:

> with fileinput.input(files=(filename)) as f:
>     for line in f:
>         t=(line.split('\t'))
>         t[0]=int(t[0])
>         l.append(t)

List comprehension is your friend, and now you don't need to initialize
l to an empty list.

with open(filename) as f:
    l = [line.split('\t') for line in f]

The first element of each row is now a string, but it's easy to fix:

> l=sorted(l, key=itemgetter(0))

Use in-place sorting and cast the sorting element to int

l.sort(key=lambda t: int(t[0]))


> inCount = int(argv[2])
> lineCount = inCount

lineCount = int(argv[2]) works just fine


>
> for c in range(lineCount):
>     t=l[c]
>     print(t[0], t[1], sep='\t', end='')

Whenever you write "for i in range(n)" you're (probably) doing it wrong.
Here you can use list slicing, and as a bonus the program doesn't bomb
when lineCount is greater than len(l)

for t in l[:lineCount]:
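
Put together, the suggestions above might look something like this. It is
only a sketch: the function name first_lines is made up, and it takes any
iterable of tab-separated lines (such as an open file) so it can be tried
without a file on disk.

```python
def first_lines(lines, line_count=10):
    """Return up to line_count rows, sorted by their leading number."""
    # List comprehension: one [number_text, rest] list per line.
    rows = [line.split('\t') for line in lines]
    # In-place sort, converting the first field to int for comparison.
    rows.sort(key=lambda row: int(row[0]))
    # Slicing never raises, even if line_count > len(rows).
    return rows[:line_count]


sample = ["50\tParrots\n", "12\tStorage Jars\n", "6\tLemon Currys\n"]
for row in first_lines(sample, 2):
    # The text field still ends in '\n', hence end=''.
    print(row[0], row[1], sep='\t', end='')
```

With a real file you would call it as first_lines(f, lineCount) inside the
"with open(filename) as f:" block.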

rusi

Jul 24, 2012, 1:13:30 AM
On Jul 22, 10:23 pm, Lipska the Kat <lip...@lipskathekat.com> wrote:

> Heh heh, Nothing to do with Eclipse, just another thing to get my head
> around. For work and Java IMHO you can't beat eclipse...
> at the moment I'm getting my head around git,

Bumped into this yesterday. Seems like a good aid to git-comprehension
https://github.com/git/git/blob/master/contrib/completion/git-prompt.sh

> reminding myself of C, learning Python
> and re-learning make. Enough already; but if there's a python plugin I
> guess I'll get around to it eventually

Seems like a strange combo. It should be
(C&make)|(python&X)|(Java&Ant)
where X could range from
Setup http://docs.python.org/distutils/setupscript.html
to distribute http://guide.python-distribute.org/
to scons http://www.scons.org/

Why burden yourself by making the '|'s into '&'s?

Lipska the Kat

Jul 24, 2012, 7:34:14 AM
On 24/07/12 06:13, rusi wrote:
> On Jul 22, 10:23 pm, Lipska the Kat<lip...@lipskathekat.com> wrote:
>
>> Heh heh, Nothing to do with Eclipse, just another thing to get my head
>> around. For work and Java IMHO you can't beat eclipse...
>> at the moment I'm getting my head around git,
>
> Bumped into this yesterday. Seems like a good aid to git-comprehension
> https://github.com/git/git/blob/master/contrib/completion/git-prompt.sh

eek ... now that's a shell script to be proud of isn't it .. and it
works [lipska@ubuntu fileio (master)]$ impressive. Good find, thanks.

>> reminding myself of C, learning Python
>> and re-learning make. Enough already; but if there's a python plugin I
>> guess I'll get around to it eventually
>
> Seems like a strange combo. It should be
> (C&make)|(python&X)|(Java&Ant)
> where X could range from

%-}
Well that's the joy of life in semi-retirement. I can do as I please.
I read something somewhere the other day about living longer if you
retire early, well I retired early but if I live another 50 years it
won't be long enough to learn everything I want to.

Java&Ant everyday
C&make a while back

python& well I sort of got sidetracked by python
... and then I got sidetracked by git !!!