Wide Finder 2 results (OCaml implementation with trivial parallelism)

35 views
Skip to first unread message

Mauricio Fernandez

unread,
Jun 8, 2008, 8:54:56 PM6/8/08
to wide-...@googlegroups.com
(Tim: I don't know if/when I'll be able to blog about this, so feel free to
link to this message.)

Here are the results from my last full run (and actually only the 4th
successful one, since I ran into issues with fork() and the 32 bit address
space):

Language: OCaml
LoCs: 124 + ~20 (depends on how the code is partitioned into modules)
http://eigenclass.org/misc/wf2_multicore.ml

real 8m12.155s
user 137m24.981s
sys 19m52.597s

I have introduced a couple modifications since that could save one minute or
two (real time). Unfortunately, the machine has been pretty busy with
rwaldin's debugging and it's too late in my locale now, so this will remain
hypothetical for a while.

I have just learned about the canonical results at
/wf1/data/code/twbray/coolstack.log, and fortunately the output is identical,
as was the case in the first actual run. Not bad :)

The program is mostly I/O-bound, which is why the CPU time is not that large
compared to the wall-clock time. I haven't optimized I/O properly: the program
just uses a number of workers which perform line-oriented(!) I/O in separate
regions of the file. It still manages to read at >100MB/s, and the workers are
waiting for the data from disk half of the time.

I didn't use anything fancy to achieve concurrency this time, just good old
fork+pipe, wrapped in a neat "invoke" function (due to Jon Harrop) of type
('a -> 'b) -> 'a -> unit -> 'b

(For non-ML speakers: this is a function that takes a function, applies it to
the given argument in another process, and returns a function that will block
until the result is ready. You get parallelism by applying the "invoke"
function repeatedly, thereby launching several processes at once.)

I am pretty fond of the simple 7-line O(n * log(m)) function that keeps the
top m elements from a collection of n items (the original Ruby implementation
and most others are O(n * m * log(m)) or worse). Generating the reports for
the 42GB file takes 5s in my OCaml program, so it might take up to 1 minute in
most of the other solutions. The time will however remain nearly constant if
the top 100 or 1000 items are requested; the original Ruby solution could take
10 and 100 times longer (easily several minutes to a couple hours!) than with
10 elements.

--
Mauricio Fernandez - http://eigenclass.org

Mauricio Fernandez

unread,
Jun 8, 2008, 9:07:17 PM6/8/08
to wide-...@googlegroups.com
On Mon, Jun 09, 2008 at 02:54:56AM +0200, Mauricio Fernandez wrote:
> Here are the results from my last full run (and actually only the 4th
> successful one, since I ran into issues with fork() and the 32 bit address
> space):
>
> Language: OCaml
> LoCs: 124 + ~20 (depends on how the code is partitioned into modules)
> http://eigenclass.org/misc/wf2_multicore.ml
>
> real 8m12.155s
> user 137m24.981s
> sys 19m52.597s

I forgot to add a pointer to the 68-line, single-core, minimalistic OCaml
implementation equivalent to the original Ruby one. It should be able to
process the 42GB dataset in around 2 hours, compared to ~25H for Ruby
(I haven't actually executed it, but it's essentially the same code as in the
multicore case, and it can only take less than the CPU time of the latter
since there's no interprocess communication cost --- I/O is largely irrelevant
in this case).

Here it is: http://eigenclass.org/misc/wf2_simple.ml

Reply all
Reply to author
Forward
0 new messages