It is indeed not easy to beat both heuristics at the same time: this
tells us about the power of the Boyer-Moore algorithm in practice.
However, I have this construction which theoretically beats both
heuristics and achieves the worst-case O(nm) running time:
T = (bb(ab)^(k-1))*
P = (ab)^k
for instance: T = bbababab bbababab bbababab bbababab ... (white spaces
are put for ease of read)
P = abababab
You can observe that the majority of shifts are shift by twos which
gives a O(nm) running time.
I'm sort of confused (as usual).
T=b*
P=a(b^k)
...bbbbbbbbbbbbbbb...
.....abbbbbbbb...
.....X^^^^^^^^...
The mismatch causes the character shift to attempt to shift "to line up
with the last occurrence of _b_ in P" where b is the character
mismatched in T (which happens to be a "b"). But this would require
shifting P *left* k positions. So, instead, it shifts by one position
right -- this means that a pure character shift performs poorly.
Suffix shift, though, says that we should line up the characters already
matched (b^k) with an occurrence of (b^k) further left; this is
impossible, so the suffix shift will shift the whole way through.
...or did I miss something?
There's another version of Boyer-Moore which is sometimes cited, and
which (supposedly) has worst-case linear run-time. In this version, the
suffix shift shifts to the left to align the already-matched suffix, but
also specifies that the character immediately to the left of that suffix
(i.e. the one which mismatched in T) must not be equal to the character
in P in the current position -- this ensures that repeats, like that in
the second example, are skipped over.
Using that example:
bbabababbbababab
..abababab..
........X^
Ordinary suffix shift would look for the next "b" in P, which is a shift
of two. The improved suffix shift would look for the next "(not-a)b" in
P, which doesn't occur at all, so it would shift right past.
This slight modification, as it turns out, was suggested by the diagram
on slide 167 -- the "ordinary" suffix shift is described in words, but
the "improved" suffix shift is what appears to be demonstrated (if you
assume that a!=c).
Robert
right.
>
> ...or did I miss something?
no.
>
>
> There's another version of Boyer-Moore which is sometimes cited, and
> which (supposedly) has worst-case linear run-time. In this version, the
> suffix shift shifts to the left to align the already-matched suffix, but
> also specifies that the character immediately to the left of that suffix
> (i.e. the one which mismatched in T) must not be equal to the character
> in P in the current position -- this ensures that repeats, like that in
> the second example, are skipped over.
>
> Using that example:
> bbabababbbababab
> ..abababab..
> ........X^
>
> Ordinary suffix shift would look for the next "b" in P, which is a shift
> of two. The improved suffix shift would look for the next "(not-a)b" in
> P, which doesn't occur at all, so it would shift right past.
>
> This slight modification, as it turns out, was suggested by the diagram
> on slide 167 -- the "ordinary" suffix shift is described in words, but
> the "improved" suffix shift is what appears to be demonstrated (if you
> assume that a!=c).
>
> Robert
That slight modification (looking for a no non-fit character in front of
a good suffix) makes the algorithm to perform linearly. The version
presented in class (which is the original version) does not require a
different character in front and therefore performs in O(nm). And yes,
there is no implicit assumption on character 'c' being the same or
different from 'a' in that slide).