Profiling Lundh's widefinder

0 views
Skip to first unread message

Michael Kremer

unread,
Nov 29, 2007, 7:30:23 PM11/29/07
to ghop-...@googlegroups.com
Profiling results of Profile Fredrik Lundh's single-threaded
"widefinder" implementation:

I benchmarked and profiled the algorithm on an 2.4 GHz Intel Core 2
Duo using the cProfile module in Python 2.6, with the same log file as
Lundh. The plain single-threaded implementation took about 1.5 seconds
to scan a million lines. The profiling results are fairly obvious –
the overwhelming majority of the processing time is spent matching
lines to the regular expression, and to a lesser degree, grouping the
results. The primary bottlenecks are in no particular order: file I/O,
the regular expression matching, and the preprocessing to eliminate
strings early.

File I/O:
In order to be processed, the log file must be opened and iterated
over. Lundh's implementation did this by opening the file as binary
instead of text.
file("o10k.ap", "rb")
In his multi-threaded implementation, Lundh memory maps the file, once
for each process. However, this approach is still effective in a
single-threaded implementation.
filemap = mmap.mmap(

fileobj.fileno(),

os.path.getsize(FILE),

access=mmap.ACCESS_READ
)
This optimization nearly doubled the benchmark speed on my testing
machine to a speedy 0.7 seconds, and so I profiled the code again.One
possible problem with this approach is the amount of memory required
for this operation on a large log file.


Reading through the comments and other blog posts on the same topic, I
read about several additional optimizations. Andrew Dalke was able to
further optimize the code using mxTextTools external library. He also
was able to further optimize Lundh's code using only the standard
libraries. I benchmarked some of his implementations, only to find
that they were slower on my machine than the mmapped single-threaded
implementation. It is probably that at total processing time under
second, the benchmarking is unreliable from machine to machine.

Links:
Original blog post:
http://effbot.org/zone/wide-finder.htm
Generates log file:
http://svn.effbot.org/public/stuff/sandbox/wide-finder/getdata.py
Lundh's single-threaded implementation
http://svn.effbot.org/public/stuff/sandbox/wide-finder/wf-2.py
Andrew Dalke's More notes on Wide Finder
http://www.dalkescientific.com/writings/diary/archive/2007/10/07/wide_finder.html

mmap_profile.txt
lundh_profile.txt
mmap_dalke_profile.txt
wf-profile.py
Reply all
Reply to author
Forward
0 new messages