More results (real 7m02s) and ? about I/O performance of the T2000

Mauricio Fernandez

Jun 10, 2008, 9:27:45 PM

to wide-...@googlegroups.com

The small changes I referred to in my previous post were successful, and I got
these times in a lucky run yesterday:

wf2-multicore2.ml (OCaml), 135 + ~20 LoCs

real 7m1.812s
user 129m0.202s
sys 14m43.638s

The latency of the merge stage is masked by processing the results as they
arrive (the workers perform at different speeds).
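A minimal sketch of that idea, in Python rather than the OCaml of wf2-multicore2.ml, and not taken from it: partial results are merged in completion order, so the merge overlaps with the workers that are still running. The names count_chunk and merged_counts, the thread pool, and the "first field is the key" input format are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed


def count_chunk(lines):
    """Worker: count occurrences of the first field in one chunk of lines."""
    return Counter(line.split()[0] for line in lines if line.strip())


def merged_counts(chunks, workers=4):
    """Merge each partial result as soon as its worker finishes."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(count_chunk, chunk) for chunk in chunks]
        # as_completed yields futures in completion order, not submission
        # order, so a slow worker does not hold up merging the fast ones.
        for done in as_completed(futures):
            total.update(done.result())
    return total
```

With threads this only shows the control flow; a real run would presumably use processes, as the original does.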

Now, regarding I/O... Tim got a 150MB/s sequential read speed with 76% CPU
usage in Bonnie[1], but I'm getting a mere ~100 MB/s when reading O.all
sequentially... with mmap(!).

When reading O.10m (hot cache, no disk activity, as verified with iostat) with
mmap, I get ~120MB/s in the first run, ~440MB/s in the second one; in both
cases 100% of one core is used. This is all quite strange.
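For anyone who wants to reproduce this kind of measurement, here is a rough sketch (not the code used for the numbers above): it maps a file read-only, touches every page by counting newlines, and reports MB/s. The function name mmap_scan and the 1 MiB step are arbitrary choices for illustration. Calling it twice on the same file gives a first-run and a second-run figure.

```python
import mmap
import os
import time


def mmap_scan(path, step=1 << 20):
    """Scan a file through a read-only mapping; return (newlines, MB/s)."""
    size = os.path.getsize(path)
    if size == 0:
        return 0, 0.0  # mmap cannot map an empty file
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            start = time.perf_counter()
            newlines = 0
            for off in range(0, size, step):
                # Slicing copies the bytes, which forces the pages to be read.
                newlines += m[off:off + step].count(b"\n")
            elapsed = max(time.perf_counter() - start, 1e-9)
    return newlines, size / 1e6 / elapsed
```

The per-slice copy adds CPU work of its own, so the absolute figures are only comparable between runs of the same script.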

[1] I took a look at the bonnie-64-read-only tree; there's nothing fancy in
the "Reading intelligently" phase, it just reads (up to) 16384-byte chunks
with read(2).
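The same access pattern is easy to approximate outside Bonnie. The sketch below is not Bonnie's code: it only mirrors the 16384-byte read(2) loop described in the footnote, using os.read, and the name read_scan is made up for the example.

```python
import os
import time

CHUNK = 16384  # chunk size mentioned in the footnote


def read_scan(path, chunk=CHUNK):
    """Read a file sequentially with read(2); return (bytes_read, MB/s)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        total = 0
        start = time.perf_counter()
        while True:
            buf = os.read(fd, chunk)  # returns up to `chunk` bytes, b"" at EOF
            if not buf:
                break
            total += len(buf)
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.close(fd)
    return total, total / 1e6 / elapsed
```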
--
Mauricio Fernandez - http://eigenclass.org

Tim Bray

Jun 10, 2008, 10:48:03 PM

to wide-...@googlegroups.com

Mauricio, you going to update the results page? And maybe update the
thread it points to? Or even better, write a considered blog entry
talking about all this stuff and link to that? -T

Eric Wong

Jun 11, 2008, 1:33:18 AM

to wide-finder

On Jun 10, 6:27 pm, Mauricio Fernandez <m...@acm.org> wrote:
> The small changes I referred to in my previous post were successful, and I got
> these times in a lucky run yesterday:
>
> wf2-multicore2.ml (OCaml), 135 + ~20 LoCs
>
> real 7m1.812s
> user 129m0.202s
> sys 14m43.638s

Wow! Running mine with an improved reduce phase now :)

> The latency of the merge stage is masked by processing the results as they
> arrive (the workers perform at different speeds).
>
> Now, regarding I/O... Tim got a 150MB/s sequential read speed with 76% CPU
> usage in Bonnie[1], but I'm getting a mere ~100 MB/s when reading O.all
> sequentially... with mmap(!).

I didn't get very good results with mmap (I was using MAP_PRIVATE, maybe
MAP_SHARED is less VM-intensive?). I was using madvise MADV_SEQUENTIAL too.
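One way to compare the two mapping modes side by side is sketched below. It is an illustration only, not the code used in this thread: the helper name map_for_sequential_read is hypothetical, and madvise is applied only where the platform and Python version (3.8 or later) expose it. Whether MAP_SHARED actually costs less on the T2000 is something only a measurement can settle.

```python
import mmap


def map_for_sequential_read(fileobj, shared=True):
    """Map a whole file read-only, as MAP_SHARED or MAP_PRIVATE."""
    flags = mmap.MAP_SHARED if shared else mmap.MAP_PRIVATE
    m = mmap.mmap(fileobj.fileno(), 0, flags=flags, prot=mmap.PROT_READ)
    # Hint that pages will be touched in order; skipped where unsupported.
    if hasattr(m, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
        m.madvise(mmap.MADV_SEQUENTIAL)
    return m
```

Timing a full scan of each mapping on the same hot-cache file would separate the cost of the mapping mode from disk effects.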

I'm also feeding my mawks through (what appears to be) a 4K pipe buffer, and
I can't get the load average even above 16 right now. At least that's the
pipe buffer size bash/ksh are reporting (when they were compiled).

Too bad {posix_,}fadvise doesn't appear to be an option on Solaris, and
sendfile{,v} + socketpair doesn't work either. I'll try using socketpair()
again in a bit, too, since it lets me pick bigger buffers than pipe().
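For reference, the socketpair() approach can be sketched as follows. This is an illustration, not the script from this thread: a Python child that counts lines stands in for mawk, the name count_lines_via_socketpair and the 1 MiB default are made up, and the kernel is free to clamp or round the requested buffer size.

```python
import socket
import subprocess
import sys


def count_lines_via_socketpair(data, bufsize=1 << 20):
    """Feed `data` to a child's stdin over a socketpair with big buffers."""
    parent, child = socket.socketpair()
    # Request larger kernel buffers than a default pipe would give.
    parent.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    child.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    worker = subprocess.Popen(
        [sys.executable, "-c",
         "import sys; print(sum(1 for _ in sys.stdin.buffer))"],
        stdin=child.fileno(),          # child end becomes the worker's stdin
        stdout=subprocess.PIPE,
    )
    child.close()                      # parent keeps only its own end
    parent.sendall(data)
    parent.close()                     # closing signals EOF to the worker
    out, _ = worker.communicate()
    return int(out)
```

Reading the sizes back with getsockopt() shows what the kernel actually granted.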
