
I need help speeding up an app that reads football scores and generates rankings


jocknerd

May 2, 2007, 3:00:22 PM

About 10 years ago, I wrote a C app that would read scores from
football games and calculate rankings based on the outcome of the
games. In fact, I still use this app. You can view my rankings at
http://members.cox.net/jocknerd/football.

A couple of years ago, I got interested in Python and decided to
rewrite my app in Python. I got it to work but it's painfully slow
compared to the C app. I have a file containing scores of over 1500
high school football games for last season. With my Python app, it
takes about 3 minutes to process the rankings. With my C app, it
processes the rankings in less than 15 seconds.

The biggest difference in my two apps is the C app uses linked lists.
I feel my Python app is doing too many lookups which is causing the
bottleneck.

I'd love some feedback regarding how I can improve the app. I'd like
to drop the C app eventually. It's really ugly. My goal is to
eventually get the data stored in PostgreSQL and then have a Django
powered site to process and display my rankings.

You can download the source code from http://members.cox.net/jocknerd/downloads/fbratings.py
and the data file from http://members.cox.net/jocknerd/downloads/vhsf2006.txt

Thanks!

Marc 'BlackJack' Rintsch

May 2, 2007, 3:16:56 PM

> The biggest difference in my two apps is the C app uses linked lists.
> I feel my Python app is doing too many lookups which is causing the
> bottleneck.

Then replace those linear searches you wrote in Python with a dictionary.

Ciao,
Marc 'BlackJack' Rintsch

Arnaud Delobelle

May 2, 2007, 4:49:38 PM

On May 2, 4:00 pm, jocknerd <jeff.s...@gmail.com> wrote:
> About 10 years ago, I wrote a C app that would read scores from
> football games and calculate rankings based on the outcome of the
> games. In fact, I still use this app. You can view my rankings at
> http://members.cox.net/jocknerd/football.
>
> A couple of years ago, I got interested in Python and decided to
> rewrite my app in Python. I got it to work but it's painfully slow
> compared to the C app. I have a file containing scores of over 1500
> high school football games for last season. With my Python app, it
> takes about 3 minutes to process the rankings. With my C app, it
> processes the rankings in less than 15 seconds.
>
> The biggest difference in my two apps is the C app uses linked lists.
> I feel my Python app is doing too many lookups which is causing the
> bottleneck.
>
> I'd love some feedback regarding how I can improve the app. I'd like
> to drop the C app eventually. It's really ugly. My goal is to
> eventually get the data stored in PostgreSQL and then have a Django
> powered site to process and display my rankings.
>
> You can download the source code from http://members.cox.net/jocknerd/downloads/fbratings.py
> and the data file from http://members.cox.net/jocknerd/downloads/vhsf2006.txt
>
> Thanks!

A simple improvement is to change your list of teams ('teamlist') to a
dictionary of teams (call it, say, 'teamdict') mapping team names to
teams.

You have lots of

    #Some code
    for row in teamlist:
        if teamname == row['name']:
            #Do something with row

These can all be replaced with:

    #Some code
    row = teamdict[teamname]
    #Do something with row

(Although I wouldn't call it 'row' but rather 'team')

That may speed up your code significantly.
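
A minimal sketch of that change, for illustration only: the sample teams and the 'rating' field are made up and are not taken from fbratings.py. The dictionary is built once from the existing list, and every later lookup is a single hash-table access instead of a scan.

```python
# Hypothetical sample data; the real teamlist is built from the score file.
teamlist = [
    {'name': 'Central', 'rating': 50.0},
    {'name': 'Northside', 'rating': 50.0},
    {'name': 'Lakeview', 'rating': 50.0},
]

def find_team_linear(teamname):
    """Old approach: scan the whole list until the name matches."""
    for row in teamlist:
        if teamname == row['name']:
            return row
    return None

# New approach: build the mapping once from the existing list.
teamdict = dict((team['name'], team) for team in teamlist)

# Each lookup is now one dictionary access.
team = teamdict['Northside']

# The dict holds the same objects as the list, so both see this update.
team['rating'] += 1.5
```

With a few hundred teams and over 1500 games, the linear version compares names many thousands of times per pass, while the dictionary version does one lookup per request.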

Moreover you can make the main loop (in calcTeamRatings) faster by
avoiding looking up a team each time you need some info on it.

Finally I would change your schedule list to a list of tuples rather
than a list of dictionaries: each game in the schedule would be a
tuple (team1, team2, ratio) and wouldn't include the actual team
scores, as you don't seem to use them in your calcTeamRatings function
(that means moving the ratio calculation into the loop that creates
the schedule).
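
A rough sketch of the tuple-based schedule follows. The ratio formula here (team1's share of the total points) is only a placeholder assumption, since the real calculation lives in fbratings.py, and the sample games and the helper name score_ratio are hypothetical.

```python
# Hypothetical parsed games: (team1 name, score1, team2 name, score2).
games = [
    ('Central', 21, 'Northside', 14),
    ('Lakeview', 7, 'Central', 28),
    ('Northside', 0, 'Lakeview', 0),
]

def score_ratio(score1, score2):
    """Placeholder ratio (an assumption): team1's share of the points."""
    total = score1 + score2
    if total == 0:
        return 0.5  # scoreless tie: split evenly
    return float(score1) / total

# Compute the ratio once while building the schedule and drop the raw scores.
schedule = [(t1, t2, score_ratio(s1, s2)) for t1, s1, t2, s2 in games]

for team1, team2, ratio in schedule:
    # Tuple unpacking replaces three dictionary lookups per game.
    print(team1, team2, round(ratio, 3))
```

Because the rating loop usually runs many passes over the schedule, doing the ratio arithmetic once up front removes it from every pass.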

Disclaimer: I only looked at your code superficially and I don't claim
to understand it!

HTH

--
Arnaud

Terry Reedy

May 2, 2007, 4:38:17 PM
to pytho...@python.org

"jocknerd" <jeff...@gmail.com> wrote in message
news:1178118022.8...@h2g2000hsg.googlegroups.com...

| About 10 years ago, I wrote a C app that would read scores from
| football games and calculate rankings based on the outcome of the
| games. In fact, I still use this app. You can view my rankings at
| http://members.cox.net/jocknerd/football.
|
| A couple of years ago, I got interested in Python and decided to
| rewrite my app in Python. I got it to work but it's painfully slow
| compared to the C app. I have a file containing scores of over 1500
| high school football games for last season. With my Python app, it
| takes about 3 minutes to process the rankings. With my C app, it
| processes the rankings in less than 15 seconds.

A ratio of 12 to 1 is not bad. However....

| The biggest difference in my two apps is the C app uses linked lists.
| I feel my Python app is doing too many lookups which is causing the
| bottleneck.

You have to do as many lookups as you have to do, but looking up teams by
name in a linear scan of a list is about the slowest way possible. Replace
'teamlist' with a dict 'teams' keyed by team name. Replace
'lookupTeam(team)' with 'if team not in teams: addTeam(team)' and delete the
lookupTeam function. Similarly, 'lookupTeamRate(team)' becomes
'teams[team]['rate']' (and delete the function), and
'updateTeamRate(team,rate)' becomes 'teams[team]['rate'] = rate' (and delete
the function). The same goes for updateTeamRating and anything else using
teamlist. In many places, multiple lookups in teams could be eliminated.
For instance, 'team1 = teams[g['team1']]'. Then use 'team1' to manipulate
its rating and other attributes.
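
Put together, those replacements might look roughly like this. It is a sketch under assumptions: the record layout, the 'games' counter, and the recordGame helper are hypothetical, and only the names 'teams', 'addTeam', and the 'rate' key follow the text above.

```python
teams = {}  # team name -> team record

def addTeam(name):
    # Hypothetical record layout; the real one is defined in fbratings.py.
    teams[name] = {'name': name, 'rate': 0.0, 'games': 0}

def recordGame(g):
    """Register both teams of game g if needed and bind each one once."""
    for key in ('team1', 'team2'):
        if g[key] not in teams:  # replaces lookupTeam(team)
            addTeam(g[key])
    # Look each team up a single time, then work on the local names.
    team1 = teams[g['team1']]
    team2 = teams[g['team2']]
    team1['games'] += 1
    team2['games'] += 1
    return team1, team2

team1, team2 = recordGame({'team1': 'Central', 'team2': 'Northside'})
team1['rate'] = 12.5  # replaces updateTeamRate('Central', 12.5)
```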

Minor point. Multiple functions do 'localvar = <expression>; return
localvar'. The simpler 'return <expression>' will be slightly faster.
Your comments and function name eliminate any documentary need for the
otherwise useless local var.

Function calls are relatively slow in Python. So calling
def totalPtsGame (score1, score2): return score1 + score2
is slower than simply adding the scores 'in place'.
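
One way to check the call overhead on your own machine is the standard timeit module. This is a sketch, not a benchmark: the scores and the repeat count are arbitrary, and the absolute numbers depend on the interpreter and hardware.

```python
import timeit

def totalPtsGame(score1, score2):
    return score1 + score2

# Names that the timed statements are allowed to see.
namespace = {'totalPtsGame': totalPtsGame, 'score1': 21, 'score2': 14}

calls = 200000
t_call = timeit.timeit('totalPtsGame(score1, score2)',
                       globals=namespace, number=calls)
t_inline = timeit.timeit('score1 + score2',
                         globals=namespace, number=calls)

print('function call: %.4f s' % t_call)
print('inline add:    %.4f s' % t_inline)
```

The difference per call is small, so this only matters for functions called in the innermost loop.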

Terry Jan Reedy


You can also use the profiler (for example, the standard library's cProfile
module) to find where the time is going.
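
As a sketch of what profiling could look like: the workload below is a made-up stand-in for the real rating calculation, and slow_lookup and calc_ratings are hypothetical names. Only cProfile, pstats, and io are real standard-library modules.

```python
import cProfile
import io
import pstats

def slow_lookup(teamlist, name):
    """Linear scan, standing in for the lookups in the original program."""
    for row in teamlist:
        if row['name'] == name:
            return row
    return None

def calc_ratings():
    # Stand-in workload: 300 teams, one lookup per team.
    teamlist = [{'name': 'team%d' % i} for i in range(300)]
    for i in range(300):
        slow_lookup(teamlist, 'team%d' % i)

profiler = cProfile.Profile()
profiler.enable()
calc_ratings()
profiler.disable()

# Collect the report in a string, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
print(report)
```

For a whole script, running `python -m cProfile -s cumulative fbratings.py` gives the same kind of report without changing the code.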

Gabriel Genellina

May 2, 2007, 5:09:23 PM

On Wed, 02 May 2007 12:16:56 -0300, Marc 'BlackJack' Rintsch
<bj_...@gmx.net> wrote:

As an example: using a Team object instead of a dictionary, and using
teamlist (not a good name now) as a dictionary of Team objects indexed by
name:

    def lookupTeam (teamname):
        team = teamlist.get(teamname)
        if team is None:
            teamlist[teamname] = team = Team(teamname)
        return team

    def updateTeamStats (tname1, score1, tname2, score2):
        team1 = lookupTeam (tname1)
        team2 = lookupTeam (tname2)

        team1.pf += score1
        team1.pa += score2
        if (score1 > score2):
            team1.won += 1
        elif (score1 < score2):
            team1.lost += 1
        else:
            team1.tied += 1

        team2.pf += score2
        team2.pa += score1
        if (score1 < score2):
            team2.won += 1
        elif (score1 > score2):
            team2.lost += 1
        else:
            team2.tied += 1

Then you should realize that those last two blocks are too similar, and
you can make a function of it. And then you realize that in fact they act
on a Team object, so you should make a Team method...
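
Following that through, one possible shape is sketched below. The Team constructor and the method name record_game are assumptions. The attribute names (pf, pa, won, lost, tied) and the two functions come from the code above.

```python
class Team(object):
    def __init__(self, name):
        # Assumed initial state: no points and no games yet.
        self.name = name
        self.pf = self.pa = 0
        self.won = self.lost = self.tied = 0

    def record_game(self, points_for, points_against):
        """Update this team's totals and record for one game."""
        self.pf += points_for
        self.pa += points_against
        if points_for > points_against:
            self.won += 1
        elif points_for < points_against:
            self.lost += 1
        else:
            self.tied += 1

teamlist = {}  # team name -> Team object

def lookupTeam(teamname):
    team = teamlist.get(teamname)
    if team is None:
        teamlist[teamname] = team = Team(teamname)
    return team

def updateTeamStats(tname1, score1, tname2, score2):
    # The two near-identical blocks collapse into one method call each.
    lookupTeam(tname1).record_game(score1, score2)
    lookupTeam(tname2).record_game(score2, score1)

updateTeamStats('Central', 21, 'Northside', 14)
```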

--
Gabriel Genellina
