http://clc.pastebin.com/f5cb3bbd2
This testing framework helped me find several bugs in my algorithm. I have
fixed them and so far, so good... I will continue to blast the shi% out of
my algorithm with this test and report any errors here.
The automated test randomly generates a comparand string, then it builds a
source string and inserts the comparand at random positions within it. It
records all the offsets, and the count of matches. Now, you can use this
information to verify that your search algorithm is getting everything
correct. Please refer to the `strstr_test_create()' function which generates
the random data. Then refer to the `strstr_test_xstrstr()' function which
actually verifies that my algorithm is getting things right.
AFAICT, it's a fairly decent way to blast your sub-string search algorithm
with tremendous forms of random data.
Enjoy!
BTW, if you have any improvements in mind, please share all of them.
;^)
There is a little bug on `line 68':
printf("source(%u): %s\n",
(unsigned long)src_size,
src);
The first formatter is not compatible with the damn argument! Here, let me
fix that:
http://clc.pastebin.com/f32655568
Sorry about that crap. Anyway, I changed a couple of things. One, I funnel
all debug messages through a function macro `DBG_PRINTF()' which
automatically disables output if `NDEBUG' is defined. Two, I made a little
tweak to the search algorithm itself. It now searches from left-to-right
after it determines that a rightmost character matches the rightmost
character from the comparand.
IMHO, this simple little test program comes in handy when you are trying to
debug a sub-string search algorithm. I can play around with all sorts of
improvements and see exactly where each one of them fail. One thing I am
exploring is reading 32/64-bits worth of characters at a time. This allows
me reduce the total number of comparisons...
Here is latest version of my sub-string search algorithm:
http://clc.pastebin.com/f5c1d3e5a
This one allows search to skip many more characters than before. I don't
think its like Boyer-Moore because I am not using two tables. Humm...
Any thoughts?
I would say it's close to the following algorithm:
http://en.wikipedia.org/wiki/Boyer-Moore-Horspool_algorithm
I think I am initializing the hash table a little bit differently. For
instance, my bad character hash hit is zero, while the Horspool algorithm
would have the length of the comparand stored in the table.