wf2-multicore2.ml (OCaml), 135 + ~20 LoCs
real 7m1.812s
user 129m0.202s
sys 14m43.638s
The latency of the merge stage is masked by processing the results as they
arrive (the workers perform at different speeds).
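
For illustration, merging results as they arrive could look roughly like the sketch below. This is not the code from wf2-multicore2.ml: merge_into, merge_as_they_arrive, and simulated_workers are hypothetical names, the partial results are assumed to be hash tables of counts, and the workers are simulated with an in-memory queue instead of real processes.

```ocaml
(* Hypothetical sketch: fold each worker's partial counts into a running
   total as soon as that result is available, instead of waiting for all
   workers and merging at the end. *)

(* Add every (key, count) pair of [partial] into [total]. *)
let merge_into total partial =
  Hashtbl.iter
    (fun key n ->
       let prev = try Hashtbl.find total key with Not_found -> 0 in
       Hashtbl.replace total key (prev + n))
    partial

(* [next_result ()] stands in for "block until some worker finishes";
   it returns None once every worker has reported. *)
let merge_as_they_arrive next_result =
  let total = Hashtbl.create 1024 in
  let rec loop () =
    match next_result () with
    | None -> total
    | Some partial -> merge_into total partial; loop ()
  in
  loop ()

(* Build a table from an association list (test/demo helper). *)
let of_list pairs =
  let t = Hashtbl.create 16 in
  List.iter (fun (k, v) -> Hashtbl.replace t k v) pairs;
  t

(* Stand-in for real workers: a queue of precomputed partial tables,
   handed out one at a time in arrival order. *)
let simulated_workers parts =
  let q = Queue.create () in
  List.iter (fun p -> Queue.add (of_list p) q) parts;
  fun () -> if Queue.is_empty q then None else Some (Queue.pop q)
```

Since each merge happens while the slower workers are still running, only the merge of the last result to arrive adds to the wall-clock time.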
Now, regarding I/O... Tim got a 150 MB/s sequential read speed with 76% CPU
usage in Bonnie[1], but I'm getting a mere ~100 MB/s when reading O.all
sequentially... with mmap(!).
When reading O.10m (hot cache, no disk activity, as verified with iostat) with
mmap, I get ~120 MB/s in the first run and ~440 MB/s in the second one; in both
cases 100% of one core is used. This is all quite strange.
[1] I took a look at the bonnie-64-read-only tree; there's nothing fancy in
the "Reading intelligently" phase, it just gets (up to) 16384-byte chunks with
read(2).
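
A rough OCaml equivalent of that kind of chunked sequential read is sketched below, as a baseline to compare against mmap. It is not Bonnie's code: count_bytes and mb_per_s are hypothetical names, and it uses the standard library's buffered input instead of calling read(2) directly, so the runtime's channel buffering may issue differently sized read(2) calls underneath. Timing is left to the caller.

```ocaml
(* Sketch: read a file sequentially, asking for up to 16384 bytes at a
   time, and return the total number of bytes seen. *)
let chunk_size = 16384

let count_bytes path =
  let ic = open_in_bin path in
  let buf = Bytes.create chunk_size in
  let rec loop total =
    (* [input] returns 0 only at end of file; it may return fewer
       bytes than requested, which is fine for counting. *)
    match input ic buf 0 chunk_size with
    | 0 -> total
    | n -> loop (total + n)
  in
  let total =
    try loop 0 with e -> close_in_noerr ic; raise e
  in
  close_in ic;
  total

(* Throughput in MB/s, taking 1 MB = 1e6 bytes (an assumption; the
   figures quoted above do not say which convention they use). *)
let mb_per_s ~bytes ~seconds = float_of_int bytes /. 1e6 /. seconds
```

Timing count_bytes over the same file with a wall-clock timer and passing the result to mb_per_s would give a figure that can be set against the mmap numbers.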
--
Mauricio Fernandez - http://eigenclass.org