Since I don't want to deal with maintaining a blog (spammers and such),
I'll discuss my implementation here.
The README (also in the git source tree) explains a few things:
http://bogomips.org/wf2/README
Basically it's all very UNIX-y and combines several small
(interchangeable) tools into something big and powerful.
Git repository:
http://bogomips.org/wf2/wf2.git
PERFORMANCE
-----------
Not yet tested on a full run with the mawk/gawk workers. Perl was kind
of disappointing yesterday, partially because I picked a bad partition
size (32 workers, 48 partitions, so I was only running on 16 at the
end :X):
real 47:07.7
user 18:02:01.3
sys 32:19.5
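
As a rough sanity check on those numbers, here is a small Python sketch. It is not part of wf2 and the function names are made up for illustration. It assumes every partition takes about the same time (a simplification), and it treats (user + sys) / real as the average number of busy CPUs.

```python
import math

def wave_utilization(workers, partitions):
    """Return (waves, busy workers in the last wave, overall utilization),
    assuming every partition takes the same time (a simplification)."""
    waves = math.ceil(partitions / workers)
    last_wave = partitions - (waves - 1) * workers
    return waves, last_wave, partitions / (waves * workers)

def seconds(stamp):
    """Convert 'H:MM:SS.s' or 'MM:SS.s' to seconds."""
    total = 0.0
    for part in stamp.split(":"):
        total = total * 60 + float(part)
    return total

# 32 workers, 48 partitions: the second wave only has 16 partitions left.
waves, last_wave, util = wave_utilization(workers=32, partitions=48)
print(waves, last_wave, util)

# Average number of busy CPUs over the Perl run reported above.
busy = (seconds("18:02:01.3") + seconds("32:19.5")) / seconds("47:07.7")
print(round(busy, 1))
```

Under the equal-duration assumption the model gives 2 waves, 16 busy workers in the last one, and 75% utilization. The measured times work out to roughly 23.6 busy CPUs on average, which is in the same ballpark for 32 workers. A partition count that is a multiple of the worker count would avoid the half-empty last wave in this model.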
On my x86 Centrino, gawk takes about 30% less time than Perl, and mawk
takes around 60% less time. I hope I'll be able to do a full run later
today.
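
For reference, "X% less time" translates into a speedup factor as below. This is only arithmetic on the Centrino figures above; whether the same ratios hold on the full run is untested.

```python
def speedup(fraction_less_time):
    """Speedup factor implied by taking `fraction_less_time` less wall time."""
    return 1.0 / (1.0 - fraction_less_time)

# Figures quoted above for the x86 Centrino, relative to the Perl worker.
for name, saved in [("gawk", 0.30), ("mawk", 0.60)]:
    print(f"{name}: about {speedup(saved):.2f}x the speed of the Perl worker")
```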
I'm not I/O bound at all on reads, so the pipe from cat-range-exec to
the worker process isn't showing up as a bottleneck, either. Since I
use GNU Make, temporary files are used, so that may have an impact on
performance under Solaris, too. (I'm used to Linux, which does async
write(2) by default.)
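
To illustrate the idea of piping a byte range of one big file into a worker, here is a minimal Python sketch. It is a guess at the general shape only: the real cat-range-exec is written in C (see src/cat-range-exec.c and the zero-copy headers in the listing below), and the function name, argument order, and buffer size here are hypothetical, not its actual interface.

```python
import subprocess

def cat_range_exec(path, offset, length, argv, stdout=None, bufsize=1 << 16):
    """Feed bytes [offset, offset + length) of `path` to the stdin of the
    command `argv` and return its exit status.

    Hypothetical interface for illustration; this copies through userspace
    instead of using any zero-copy mechanism.
    """
    with open(path, "rb") as src:
        src.seek(offset)
        proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=stdout)
        remaining = length
        try:
            while remaining > 0:
                chunk = src.read(min(bufsize, remaining))
                if not chunk:  # hit EOF before the range ended
                    break
                proc.stdin.write(chunk)
                remaining -= len(chunk)
            proc.stdin.close()  # signal EOF to the worker
        except BrokenPipeError:
            pass  # the worker exited before reading the whole range
        return proc.wait()
```

Something like `cat_range_exec("big.log", 0, 1 << 30, ["sh", "worker.sh"])` would hand the first GiB of a (hypothetical) `big.log` to one worker, so several such calls with different offsets can run side by side on the same file.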
I'm probably going to lose on LOC counts, or win them, depending
on how creatively we count :)
$ make total-loc-and-yes-I-mean-all-of-it
git ls-files | xargs wc
    1     1      5 .gitignore
  353  3044  18497 COPYING                         # I used some trivial code from git
   31   127    809 Makefile                        # includes tasks to report LOCs :)
   85   371   2498 README                          # Same as the above link
   39   189   1393 doc/cat-range-exec.txt          # should become a manpage
   80   263   1880 doc/reference.rb                # Tim's implementation
   24   110    773 doc/split-print.txt             # should become a manpage
   55   138   1299 home-ssh.rb                     # example for use with my home network
# perl/ and ruby/ both contain alternative
# worker/reducer implementations all using the same core:
   27   115    653 perl/reducer-txt.perl
   27   110    642 perl/reducer.perl
   49   157   1110 perl/worker-txt.perl
   45   150   1077 perl/worker.perl
   42   134    911 ruby/reducer.rb
   55   162   1289 ruby/worker.rb
   37   134   1219 runner.mk                       # *** THE MOST IMPORTANT PART ***
   27    86    522 sh/reducer.sh                   # mostly awk
   60   235   1503 sh/worker.sh                    # mostly awk
    5     5     67 src/.gitignore
   54   192   1579 src/Makefile                    # Yes, I can also use make to build stuff :)
   66   205   1576 src/cat-range-exec.c            # Important for single big files
   30   121    929 src/compat/splice.h             # Linux-only
  168   531   3458 src/io_util.h                   # git do_x_or_die routines
   97   288   2363 src/os_compat.h                 # some headers and utilities
   71   233   1715 src/split-print.c               # like split(1) w/o copying data
    9    18    188 src/sys_detect.c                # I didn't feel like using autotools
   13    41    285 src/validate-cat-range-exec.sh  # test script
   13    75    484 src/validate-split-print.sh     # test script
  154   496   3902 src/zero_copy_io.h              # some redundant code for portability
   45   145   1017 src/zero_copy_io_linux.h        # Linux-only stuff
   58   188   1382 src/zero_copy_io_solaris.h      # Solaris-only, broken, not used
   68   180   1519 wf2-mk                          # The main UI
 1888  8244  56544 total
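
Since the outcome depends on how creatively we count, here is a small Python sketch that regroups the line counts above by top-level directory. It is not one of the Makefile's reporting tasks. The numbers are copied from the listing, and the grouping itself is just one arbitrary way to slice them.

```python
from collections import defaultdict

# (lines, path) pairs copied from the `wc` listing above.
LOC = [
    (1, ".gitignore"), (353, "COPYING"), (31, "Makefile"), (85, "README"),
    (39, "doc/cat-range-exec.txt"), (80, "doc/reference.rb"),
    (24, "doc/split-print.txt"), (55, "home-ssh.rb"),
    (27, "perl/reducer-txt.perl"), (27, "perl/reducer.perl"),
    (49, "perl/worker-txt.perl"), (45, "perl/worker.perl"),
    (42, "ruby/reducer.rb"), (55, "ruby/worker.rb"),
    (37, "runner.mk"), (27, "sh/reducer.sh"), (60, "sh/worker.sh"),
    (5, "src/.gitignore"), (54, "src/Makefile"),
    (66, "src/cat-range-exec.c"), (30, "src/compat/splice.h"),
    (168, "src/io_util.h"), (97, "src/os_compat.h"),
    (71, "src/split-print.c"), (9, "src/sys_detect.c"),
    (13, "src/validate-cat-range-exec.sh"),
    (13, "src/validate-split-print.sh"),
    (154, "src/zero_copy_io.h"), (45, "src/zero_copy_io_linux.h"),
    (58, "src/zero_copy_io_solaris.h"), (68, "wf2-mk"),
]

def lines_by_directory(entries):
    """Sum line counts per top-level directory ('.' for top-level files)."""
    totals = defaultdict(int)
    for lines, path in entries:
        top = path.split("/")[0] if "/" in path else "."
        totals[top] += lines
    return dict(totals)

by_dir = lines_by_directory(LOC)
for directory in sorted(by_dir):
    print(f"{by_dir[directory]:5d} {directory}")
print(f"{sum(by_dir.values()):5d} total")
```

The per-directory sums add back up to the 1888 lines reported by `wc`. Counting only one worker/reducer pair plus runner.mk and wf2-mk, for example, gives a much smaller number than counting all of src/ and COPYING.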
--
Eric Wong