Hi there!
Thanks for taking the time to look into my proposal. Let me try to
respond to all of your messages in one go.
2017-07-04 6:09 GMT+02:00 Ville Voutilainen <
ville.vo...@gmail.com>:
> Crochemore-Perrin is a constant-space algorithm, whereas Boyer-Moore isn't. That
> would seem nice-to-have for uses that do not wish to have the dynamic
> allocation.
> In other words, Boyer-Moore would win every benchmark you throw at these
> algorithms, except the one where you can't run Boyer-Moore at all. :)
Exactly. This is not a race to find the fastest search algorithm. It's
about providing a somewhat decent string searching algorithm that
works well on resource-constrained systems.
2017-07-04 6:46 GMT+02:00 Ville Voutilainen <
ville.vo...@gmail.com>:
> On 4 July 2017 at 07:33, <
zamaz...@gmail.com> wrote:
> Seems handy to have when BM can't be used. But, here's a question: is
> there some reason why a string search can't use TW internally? It can't use BM due to
> the preprocessing, but why not TW? Why would it need to be a separate searcher?
So the problem with all of the existing searchers is that they are
defined to only do comparisons for equality. As Two-Way's
preprocessing algorithm depends on having a total order on the
alphabet, my proposal adds a searcher that may depend on std::less<>.
I have to confess, I am not an expert when it comes to benchmarking
string search algorithms. Question: are there some kind of
standardised test vectors that represent realistic workloads any of
you want me to use?
Anyway, as a quick experiment, I benchmarked the running time of
various string searching algorithms when given the following input:
Needle: 'A' * 999 + 'B'
Haystack: 'A' * 1000000
This means we'll be searching for a kilobyte-sized needle in a
megabyte-sized haystack, where the needle and haystack are structured
in such a way that a naive algorithm would perform very poorly. There
will obviously be no full match.
When running string searchers on FreeBSD 12.0 in VirtualBox on my 2,8
GHz Intel i7 Macbook Pro (late 2013 model), I observe the following
average running times for searching (i.e., not counting the
preprocessing phase), when built with Clang 4.0 at -O3:
Naive std::search(): 629.4 milliseconds (avg of 100 iterations)
libc++'s std::experimental::boyer_moore_searcher: 5.916 milliseconds
(avg of 10000 iterations)
libc++'s std::experimental::boyer_moore_horspool_searcher: 3.494
milliseconds (avg of 10000 iterations)
My implementation of std::crochemore_perrin_searcher: 1.079
milliseconds (avg of 10000 iterations)