Newsgroups: comp.lang.python
From: Paul Rubin <http://phr...@NOSPAM.invalid>
Date: 18 Jan 2008 09:58:57 -0800
Local: Fri, Jan 18 2008 12:58 pm
Subject: Re: Efficient processing of large numeric data file
David Sanders <dpsand...@gmail.com> writes:
> The data files are large (~100 million lines), and this code takes a
> long time to run (compared to just doing wc -l, for example).

wc is written in carefully optimized C and will almost certainly
run faster than any python program.

> Am I doing something very inefficient? (Any general comments on my
> pythonic (or otherwise) style are also appreciated!) Is
> "line.split()" efficient, for example?

Your implementation's efficiency is not too bad. Stylistically it's
not quite fluent but there's nothing to really criticize--you may
develop a more concise style with experience, or maybe not.
One small optimization you could make is to use collections.defaultdict
to hold the counters instead of a regular dict, so you can get rid of
the test for whether a key is in the dict.

Keep an eye on your program's memory consumption as it runs. The

If I were writing this program and didn't have to run it too often,
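
For illustration, the collections.defaultdict suggestion above might look roughly like the sketch below. The original program is not shown here, so the function name count_first_fields, the sample lines, and the choice to count the first whitespace-separated field of each line are all assumptions made up for this example, not the poster's actual code.

```python
from collections import defaultdict


def count_first_fields(lines):
    """Count how often each first whitespace-separated field occurs.

    Hypothetical helper: the real input format is not known here.
    """
    # defaultdict(int) creates missing keys with value 0 on first access,
    # so no "if key in counts" test is needed before incrementing.
    counts = defaultdict(int)
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        counts[fields[0]] += 1
    return counts


# Made-up sample data standing in for lines read from the data file.
sample = ["3 0.5\n", "7 1.2\n", "3 0.9\n", "\n"]
print(dict(count_first_fields(sample)))
```

For a real file, passing the open file object instead of a list iterates line by line, so the file itself is never loaded into memory all at once; the counter dict still grows with the number of distinct keys.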