automated test for sub-string search algorithms...

Chris M. Thomasson

unread,

Feb 20, 2010, 5:16:16 PM2/20/10

to

This code includes my sub-string search algorithm, the automated test, and a
stress test for my algorithm:

http://clc.pastebin.com/f5cb3bbd2

This testing framework helped me find several bugs in my algorithm. I have
fixed them and so far, so good... I will continue to blast the shi% out of
my algorithm with this test and report any errors here.

The automated test randomly generates a comparand string, then it builds a
source string and inserts the comparand at random positions within it. It
records all the offsets, and the count of matches. Now, you can use this
information to verify that your search algorithm is getting everything
correct. Please refer to the `strstr_test_create()' function which generates
the random data. Then refer to the `strstr_test_xstrstr()' function which
actually verifies that my algorithm is getting things right.

AFAICT, it's a fairly decent way to blast your sub-string search algorithm
with tremendous forms of random data.

Enjoy!

BTW, if you have any improvements in mind, please share all of them.

;^)

Chris M. Thomasson

unread,

Feb 20, 2010, 8:11:45 PM2/20/10

to

"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:BaZfn.2632$BD2...@newsfe14.iad...

> This code includes my sub-string search algorithm, the automated test, and
> a stress test for my algorithm:
>
>
> http://clc.pastebin.com/f5cb3bbd2

[...]

There is a little bug on `line 68':

printf("source(%u): %s\n",
(unsigned long)src_size,
src);

The first formatter is not compatible with the damn argument! Here, let me
fix that:

http://clc.pastebin.com/f32655568

Sorry about that crap. Anyway, I changed a couple of things. One, I funnel
all debug messages through a function macro `DBG_PRINTF()' which
automatically disables output if `NDEBUG' is defined. Two, I made a little
tweak to the search algorithm itself. It now searches from left-to-right
after it determines that a rightmost character matches the rightmost
character from the comparand.

IMHO, this simple little test program comes in handy when you are trying to
debug a sub-string search algorithm. I can play around with all sorts of
improvements and see exactly where each one of them fail. One thing I am
exploring is reading 32/64-bits worth of characters at a time. This allows
me reduce the total number of comparisons...

Chris M. Thomasson

unread,

Feb 22, 2010, 7:28:29 AM2/22/10

to

"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:BaZfn.2632$BD2...@newsfe14.iad...

> This code includes my sub-string search algorithm, the automated test, and
> a stress test for my algorithm:
>
>
> http://clc.pastebin.com/f5cb3bbd2
>
>
> This testing framework helped me find several bugs in my algorithm. I have
> fixed them and so far, so good... I will continue to blast the shi% out of
> my algorithm with this test and report any errors here.

Here is latest version of my sub-string search algorithm:

http://clc.pastebin.com/f5c1d3e5a

This one allows search to skip many more characters than before. I don't
think its like Boyer-Moore because I am not using two tables. Humm...

Any thoughts?

Chris M. Thomasson

unread,

Feb 23, 2010, 6:40:39 PM2/23/10

to

"Chris M. Thomasson" <n...@spam.invalid> wrote in message

news:uLugn.5580$Cw3...@newsfe21.iad...

I would say it's close to the following algorithm:

http://en.wikipedia.org/wiki/Boyer-Moore-Horspool_algorithm

I think I am initializing the hash table a little bit differently. For
instance, my bad character hash hit is zero, while the Horspool algorithm
would have the length of the comparand stored in the table.