GNU Make-based parallelization

Eric Wong

Jun 8, 2008, 8:28:50 PM

to wide-finder

Since I don't want to have to deal with maintaining a blog (spammers
and such), I'll discuss my implementation here.

The README (also in the git source tree) explains a few things:

http://bogomips.org/wf2/README

Basically it's all very UNIX-y and combines several small
(interchangeable) tools into something big and powerful.

Git repository: http://bogomips.org/wf2/wf2.git

PERFORMANCE
-----------

Not tested on a full run with the mawk/gawk workers. Perl was kind of
disappointing yesterday, partially because I picked a bad partition
count (32 workers, 48 partitions, so only 16 workers were busy for the
last batch of partitions :X)

real 47:07.7
user 18:02:01.3
sys 32:19.5
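
The partition arithmetic above can be sketched in a few lines of shell.
This is only an idealized model: it assumes every partition costs the
same, so work proceeds in lockstep "waves" (a real make -j run schedules
jobs as slots free up). The function name waves is made up for this
illustration.

```shell
# waves NPARTS NJOBS
# Print how many lockstep waves are needed and how many workers are busy
# or idle during the final wave, assuming equal-cost partitions.
waves() {
    nparts=$1 njobs=$2
    n_waves=$(( (nparts + njobs - 1) / njobs ))      # ceiling division
    last=$(( nparts - (n_waves - 1) * njobs ))       # partitions left for the final wave
    echo "waves=$n_waves busy_in_last_wave=$last idle_in_last_wave=$(( njobs - last ))"
}

waves 48 32     # waves=2 busy_in_last_wave=16 idle_in_last_wave=16
waves 128 32    # waves=4 busy_in_last_wave=32 idle_in_last_wave=0
```

With 48 partitions half of the 32 workers sit idle during the second
wave; a partition count that is a multiple of the worker count avoids
that in this simple model.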

On my x86 Centrino, gawk takes about 30% less time than Perl, and mawk
takes around 60% less time. I hope I'll be able to do a full run later
today.

I'm not I/O bound at all on reads, so the pipe from cat-range-exec to
the worker process isn't showing up as a bottleneck, either. Since I
use GNU Make, temporary files are used, so that may have an impact on
performance under Solaris, too. (I'm used to Linux, which does async
write(2) by default.)
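
For readers unfamiliar with the idea of piping one byte range of a big
file into a worker, here is a rough stand-in built from tail and head.
It is not cat-range-exec and makes no claim about that tool's options;
the function name cat_range and its argument order are invented for
this sketch, and head -c is assumed to be available (it is on GNU and
BSD systems but is not required by POSIX).

```shell
# cat_range FILE START LEN
# Write LEN bytes of FILE to stdout, starting at 0-based byte offset START.
# Note: this cuts at raw byte offsets; a real log splitter would also have
# to move each boundary to the next newline so no line is split in two.
cat_range() {
    tail -c "+$(( $2 + 1 ))" "$1" | head -c "$3"
}

# Hypothetical usage: feed the first MiB of a log to an awk worker.
#   cat_range access.log 0 1048576 | awk '{ n++ } END { print n }'
```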

I'm probably going to lose LOC counts or win them depending
on how creatively we count :)

$ make total-loc-and-yes-I-mean-all-of-it
git ls-files | xargs wc
    1     1     5 .gitignore
  353  3044 18497 COPYING                        # I used some trivial code from git
   31   127   809 Makefile                       # includes tasks to report LOCs :)
   85   371  2498 README                         # Same as the above link
   39   189  1393 doc/cat-range-exec.txt         # should become a manpage
   80   263  1880 doc/reference.rb               # Tim's implementation
   24   110   773 doc/split-print.txt            # should become a manpage
   55   138  1299 home-ssh.rb                    # example for use with my home network

# perl/ and ruby/ both contain alternative
# worker/reducer implementations all using the same core:
   27   115   653 perl/reducer-txt.perl
   27   110   642 perl/reducer.perl
   49   157  1110 perl/worker-txt.perl
   45   150  1077 perl/worker.perl
   42   134   911 ruby/reducer.rb
   55   162  1289 ruby/worker.rb

   37   134  1219 runner.mk                      # *** THE MOST IMPORTANT PART ***
   27    86   522 sh/reducer.sh                  # mostly awk
   60   235  1503 sh/worker.sh                   # mostly awk

    5     5    67 src/.gitignore
   54   192  1579 src/Makefile                   # Yes, I can also use make to build stuff :)
   66   205  1576 src/cat-range-exec.c           # Important for single big files
   30   121   929 src/compat/splice.h            # Linux-only
  168   531  3458 src/io_util.h                  # git do_x_or_die routines
   97   288  2363 src/os_compat.h                # some headers and utilities
   71   233  1715 src/split-print.c              # like split(1) w/o copying data
    9    18   188 src/sys_detect.c               # I didn't feel like using autotools
   13    41   285 src/validate-cat-range-exec.sh # test script
   13    75   484 src/validate-split-print.sh    # test script
  154   496  3902 src/zero_copy_io.h             # some redundant code for portability
   45   145  1017 src/zero_copy_io_linux.h       # Linux-only stuff
   58   188  1382 src/zero_copy_io_solaris.h     # Solaris-only, broken, not used
   68   180  1519 wf2-mk                         # The main UI
 1888  8244 56544 total

--
Eric Wong

Eric Wong

Jun 8, 2008, 9:23:15 PM

to wide-finder

On Jun 8, 5:28 pm, Eric Wong <normalper...@gmail.com> wrote:
> PERFORMANCE
> -----------
>
> Not tested on a full run with the mawk/gawk workers. Perl was kind of
> disappointing yesterday, partially because I picked a bad partition

Running now. Either mawk is really fast and starving on I/O or
something is broken... I'm using mmap + madvise(MADV_SEQUENTIAL)
right now to do I/O with cat-range-exec and 32 processes, and
the load average is only around 5...

Last night with the perl run I was able to bring the load avg
to 32 easily.

Eric Wong

Jun 8, 2008, 9:41:11 PM

to wide-finder

Ok, mawk did finish. Unfortunately I was using the Perl reducer
instead of the awk + sh version, so that's a little slower
than awk+sh, but faster than pure Perl.

This is with 32 processes and 128 parts:
real 27:28.7
user 1:32:44.0
sys 23:15.6

I'm trying 128 processes and 128 parts now, but it's
not pegging the CPUs at all, either...

Eric Wong

Jun 8, 2008, 9:57:55 PM

to wide-finder

Nope, I cancelled that one. I've switched to using a read()/write()
loop since mmap doesn't seem to be performing very well (probably
due to VM lock contention) and it's using more CPU.

Eric Wong

Jun 8, 2008, 10:06:26 PM

to wide-finder

Ok, much better:

real 9:55.6
user 2:12:22.3
sys 24:41.3

A good chunk of the time was spent in the reduce phase, so I'll have
to optimize that.
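
One rough way to read these time(1) figures is (user + sys) / real,
which approximates how many CPUs were busy on average over the whole
run. The helper names to_seconds and busy_cpus below are invented for
this sketch, and the ratio is only an average: it cannot say which
phase was under-using the machine.

```shell
# to_seconds [[H:]M:]S  -> seconds with one decimal, e.g. 9:55.6 -> 595.6
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i
                           printf "%.1f\n", s }'
}

# busy_cpus REAL USER SYS  -> average number of busy CPUs, (user + sys) / real
busy_cpus() {
    awk -v r="$(to_seconds "$1")" -v u="$(to_seconds "$2")" -v s="$(to_seconds "$3")" \
        'BEGIN { printf "%.1f\n", (u + s) / r }'
}

busy_cpus 47:07.7 18:02:01.3 32:19.5   # Perl, 32 workers / 48 parts:  23.6
busy_cpus 27:28.7 1:32:44.0  23:15.6   # mawk + mmap, 32 / 128:         4.2
busy_cpus 9:55.6  2:12:22.3  24:41.3   # mawk + read()/write(), 32/128: 15.8
```

The 4.2 for the mmap run lines up with the load average of about 5
reported earlier in the thread, and 15.8 out of 32 workers is
consistent with a sizeable part of the run being spent in a less
parallel reduce phase.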

I also noticed read()/write() are only using an 8K buffer, which is
pretty small by today's standards.

Eric Wong

Jun 8, 2008, 10:26:07 PM

to wide-finder

Command-line used was

git revision 618d8e31dfe6286e443fc7a16d606be2dfdfa7ad
( http://bogomips.org/wf2/wf2.git )

V=1 NJOBS=32 NPARTS=128 IMPL=sh time ./wf2-mk /wf1/data/logs/O.all > ,sh 2>&1

Eric Wong

Jun 9, 2008, 12:12:58 AM

to wide-finder

I'll probably be busy with other things the next few days, but this
was a nice break from the rest of the things I've been avoiding work
on :)

I've written up some API documentation for writing alternative
backends to runner.mk here:

http://bogomips.org/wf2/doc/runner_api.txt

The ideas behind the runner.mk, cat-range-exec and split-print combo
actually have great usage scenarios on some projects I'm working
on at $DAY_JOB, so I look forward to using them there.

I've also been drifting away from extremely powerful languages like
Perl and Ruby in my life and forcing myself to use more specialized
languages like awk, so this was a good way to use more of it :)

I've already implemented things like runner.mk several times (it's
only a few lines of Make!) but cat-range-exec/split-print provide a
great replacement for split(1), which I used in the past.
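
For anyone who hasn't seen the pattern, here is a minimal sketch of
what a Make-driven runner of this general kind might look like. It is
not the actual runner.mk: the makefile name sketch.mk, the *.in/*.out
naming, the run_parts wrapper and the wc -l stand-in "worker" are all
hypothetical. It relies on GNU Make features (wildcard, pattern rules,
-j).

```shell
# run_parts DIR NJOBS
# Write a tiny makefile into DIR and let GNU Make run one job per *.in
# partition, at most NJOBS at a time. Each result lands in a matching
# *.out file (Make tracks work through files, hence the temporary files).
run_parts() {
    dir=$1 jobs=$2
    cat > "$dir/sketch.mk" <<'EOF'
# Hypothetical stand-in for a runner makefile: one target per partition.
ins  := $(wildcard *.in)
outs := $(ins:.in=.out)
all: $(outs)
# Placeholder worker: count lines. Write to a temp name and rename, so an
# interrupted job never leaves a half-written target behind.
%.out: %.in ; wc -l < $< | tr -d ' ' > $@.tmp && mv $@.tmp $@
EOF
    make -s -C "$dir" -f sketch.mk -j "$jobs" all
}

# Hypothetical usage: run_parts ./parts 32
```

Make does the scheduling: it keeps up to NJOBS recipes running and
skips any partition whose output is already up to date, which is why a
runner like this can stay so short.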

I'll be thinking about optimizing the reduce/merge phase but
probably won't get around to implementing anything for at
least a few days.

Some ideas I'll float (this is for my sh+awk[1] implementation):

gzipping intermediate output files. gzip helped enormously on
another project that also used temporary files, but the temporary
files here are already fairly small..
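
A minimal sketch of the gzip idea, assuming each intermediate file
holds tab-separated "key<TAB>count" lines (the real intermediate format
is not described in this thread). The function names write_part and
reduce_parts are made up for the example.

```shell
# write_part NAME
# Compress a worker's stdout into NAME.gz (fast compression level).
write_part() {
    gzip -1 > "$1.gz"
}

# reduce_parts FILE.gz...
# Decompress all parts as one stream and sum the counts per key.
reduce_parts() {
    gzip -dc "$@" |
    awk -F '\t' '{ sum[$1] += $2 }
                 END { for (k in sum) print k "\t" sum[k] }' |
    LC_ALL=C sort        # awk's for-in order is unspecified; sort for stable output
}

# Hypothetical usage:
#   some_worker < part.0 | write_part out.0
#   reduce_parts out.*.gz
```

Whether this pays off depends on how much of the reduce time is I/O;
with intermediate files that are already small, the gain may be
modest.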

sorting them first by key + gzip would reduce size even more, but
sorting by value (numeric) would allow a smarter reduce to not need
to read the entire file...

I also can't help but think there's a good use for sort -m in
here somewhere, but the most obvious use is way too I/O intensive...
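
For reference, the "obvious" sort -m reduce might look roughly like
the sketch below. It assumes every partition file is already sorted by
key (with LC_ALL=C sort -k1,1) and holds tab-separated
"key<TAB>count" lines; both the format and the function name
reduce_sorted are assumptions. It still reads every line of every
file, which is the I/O cost mentioned above.

```shell
# reduce_sorted FILE...
# Merge key-sorted partition files and sum the counts of equal keys.
# Because the merged stream is sorted, only one running total is kept
# in memory instead of a table of all keys.
reduce_sorted() {
    LC_ALL=C sort -m -t "$(printf '\t')" -k1,1 "$@" |
    awk -F '\t' '
        NR > 1 && $1 != key { print key "\t" sum; sum = 0 }   # key changed: flush
        { key = $1; sum += $2 }
        END { if (NR > 0) print key "\t" sum }                # flush the last key
    '
}

# Hypothetical usage: reduce_sorted out.0 out.1 out.2
```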

Perhaps I should also figure out what Mauricio's 7-line
O(n * log(m)) does and if I can implement it in awk...

[1] Perhaps I'll rename it to the SHAWK-er :)
