I'm off for now.
-Justin
Should we agree, for benchmarking purposes, that approximate results
are OK for Wide Finder 2? I.e. go ahead and split on spaces? -Tim
> Should we agree, for benchmarking purposes, that approximate results
> are OK for Wide Finder 2? I.e. go ahead and split on spaces?
I suspect that the difference between approximate results and correct
results is likely to affect either speed or beauty of a given
implementation.
I wouldn't find an approximate solution objectionable, but would be much
more impressed by one that is beautiful, uses the performance aspects of
the 32-core system well, and is also correct.
-Justin
FWIW, I did a quick run against O.10m (which involves no actual I/O) with a
regexp-based version and one that splits on spaces; the difference is below
5%, with both running in around 20s (my best time is ~18.5s with some
additional micro-optimization), so the choice of parser is not really very
important as far as optimizations go. Switching from line-oriented to block
I/O makes a larger difference, but I don't need that in order to be
I/O-bound on O.all anyway (as in: iostat showing 100% use on c0t2d0 and
c0t3d0).
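
For readers unfamiliar with the line-oriented versus block I/O distinction
mentioned above, here is a minimal sketch of the block-reading side. It is
not taken from any Wide Finder entry: the function name `lines_from_blocks`
and the 1 MiB default block size are assumptions made for illustration.

```python
import io

def lines_from_blocks(stream, block_size=1 << 20):
    """Yield complete lines (newline stripped) from fixed-size binary reads.

    Hypothetical helper: the name and default block size are illustrative.
    """
    tail = b""                        # partial line carried over between blocks
    while True:
        block = stream.read(block_size)
        if not block:
            break
        parts = (tail + block).split(b"\n")
        tail = parts.pop()            # last piece may be an incomplete line
        yield from parts
    if tail:                          # final line without a trailing newline
        yield tail

# Usage with an in-memory stream and a deliberately tiny block size.
sample = io.BytesIO(b"a b\nc d\ne f")
block_lines = list(lines_from_blocks(sample, block_size=4))
```

The only subtle part is carrying the unfinished last line of each block over
to the next read; everything else is a plain split on newlines.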
IMO there's nothing wrong with the reference implementation, and a couple of
incorrectly handled lines out of over 200 million aren't worth the bother of
running the Ruby script for >24H and having everybody run their programs
again.
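
To make the regexp-versus-split comparison concrete, here is a rough sketch
of the two parsing styles. It assumes an Apache combined-log layout; the
pattern, the function names and both sample lines are invented for
illustration and are not taken from the reference implementation or from
anyone's entry. The second sample has a space inside the request URI, which
is one kind of line where the two approaches can disagree.

```python
import re

# Assumed layout: host ident user [time] "METHOD URI PROTO" status bytes ...
LINE_RE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (.*?) (\S+)" (\d{3}) (\S+)'
)

def parse_regex(line):
    """Stricter parse: the URI may contain spaces inside the quoted request."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    host, _time, _method, uri, _proto, status, size = m.groups()
    return host, uri, status, size

def parse_split(line):
    """Approximate parse: trust fixed field positions after a space split."""
    f = line.split(" ")
    if len(f) < 10:
        return None
    return f[0], f[6], f[8], f[9]

# Invented sample lines, not taken from the real data set.
GOOD = '1.2.3.4 - - [01/Jan/2008:00:00:00 -0800] "GET /ongoing/a.html HTTP/1.1" 200 512 "-" "UA"'
ODD = '1.2.3.4 - - [01/Jan/2008:00:00:00 -0800] "GET /ongoing/a b.html HTTP/1.1" 404 - "-" "UA"'
```

On a line like GOOD both functions return the same fields. On ODD the split
version shifts every field after the URI by one position, so it reports a
truncated URI and takes the protocol token for the status.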
--
Mauricio Fernandez - http://eigenclass.org
> Anyway, I think it should be handled because, as you say, this is about
> concurrency. Concurrency is real-world, simple, common applications.
> Taking on a problem that is too simple allows for easy optimizations that
> don't have much to do with concurrency but have a very noticeable impact on
> performance.
I think the same argument applies in the other direction: unless
someone can think of a clever way of parallelising the parsing of an
individual line, correct versus loose parsing is just an optimisation.
(That's only true if the cost of correct processing is so high as to
require task decomposition as well as data decomposition, of
course. That probably doesn't apply here, and in any case I think some
people are already decomposing reading and parsing/processing.)
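
As a sketch of what data decomposition means here: the input is cut into
chunks, each chunk is parsed and counted independently, and the partial
counts are merged. The function names, the field index and the worker count
below are assumptions for illustration. Threads are used only to show the
structure; in CPython they would not give real CPU parallelism, so an actual
entry would use processes or a different runtime.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_chunk(lines):
    """Parse and count one chunk independently of all the others."""
    hits = Counter()
    for line in lines:
        fields = line.split(" ")      # loose parse; a strict one would slot in here
        if len(fields) > 6:
            hits[fields[6]] += 1      # assumed position of the request URI
    return hits

def count_parallel(lines, workers=4):
    """Data decomposition: split the input, count each part, merge the results."""
    size = max(1, (len(lines) + workers - 1) // workers)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_chunk, chunks):
            total.update(partial)
    return total
```

Replacing the body of the loop in `count_chunk` with a stricter parser
changes the per-line cost but not the shape of the decomposition, which is
the point being made above.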
James
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
ja...@tartarus.org uncertaintydivision.org