My latest OCaml implementation, with some incremental refinements on the
previous one:
real 5m35.658s
user 98m5.770s
sys 5m58.114s
(31 workers)
I switched from line-oriented to block I/O, which brings the line count to 150
plus about 30 (mainly for the higher-order function that applies a function in
a separate process). At this point, the speed is largely determined by how
much of the data is still in the buffer cache, so there is considerable
variance across executions.
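For illustration only, here is a minimal sketch of what a higher-order
function for running a function in a separate process can look like. It is
not the code from my implementation: the name in_child_process is made up,
and it assumes a Unix-like system, the unix library, and a result that
contains no closures (so that Marshal can serialize it).

```ocaml
(* Hypothetical helper: run [f x] in a forked child and return a thunk that
   blocks until the result is available. Requires the unix library. *)
let in_child_process (f : 'a -> 'b) (x : 'a) : unit -> 'b =
  (* Flush buffered output so the child does not emit it a second time. *)
  flush_all ();
  let (rd, wr) = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      (* Child: compute the result and marshal it into the pipe. *)
      Unix.close rd;
      let code =
        try
          let oc = Unix.out_channel_of_descr wr in
          Marshal.to_channel oc (f x) [];
          close_out oc;
          0
        with _ -> 1
      in
      exit code
  | pid ->
      (* Parent: keep only the read end; the result is read on demand. *)
      Unix.close wr;
      let ic = Unix.in_channel_of_descr rd in
      fun () ->
        let result : 'b = Marshal.from_channel ic in
        close_in ic;
        ignore (Unix.waitpid [] pid);
        result
```

Starting one such child per chunk of the input and forcing the thunks
afterwards gives a simple fork/join scheme. If the child fails, the parent
sees End_of_file when it forces the thunk.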
Small improvements should be possible at the cost of higher line counts; there
are two obvious changes that might make this particular program faster:
single-copy read() (currently, the data is first read into a buffer on the
stack and then copied to the destination string) and mmap-based I/O.
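As a rough idea of the mmap-based variant, the sketch below maps a file and
scans it in place without copying it into an OCaml string. It is an
assumption about how this could be done, not something I have measured: it
uses Unix.map_file and Fun.protect (so OCaml 4.08 or later with the unix
library), and count_newlines_mmap is a made-up example that only counts
newlines.

```ocaml
(* Hypothetical example: count newlines in a file through a memory mapping.
   Requires the unix library. *)
let count_newlines_mmap (path : string) : int =
  let fd = Unix.openfile path [Unix.O_RDONLY] 0 in
  Fun.protect ~finally:(fun () -> Unix.close fd) (fun () ->
    let len = (Unix.fstat fd).Unix.st_size in
    (* Mapping an empty file fails on some systems, so handle it here. *)
    if len = 0 then 0
    else begin
      (* Map the whole file read-only (shared = false) as a char array. *)
      let data =
        Bigarray.array1_of_genarray
          (Unix.map_file fd Bigarray.char Bigarray.c_layout false [| len |])
      in
      let count = ref 0 in
      for i = 0 to len - 1 do
        if Bigarray.Array1.unsafe_get data i = '\n' then incr count
      done;
      !count
    end)
```

With this approach the kernel pages the data in on demand, so there is no
intermediate buffer at all; whether that beats a single-copy read() here
would have to be measured.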
--
Mauricio Fernandez - http://eigenclass.org