Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

worst-case example for the Boyer-Moore algorithm

1,414 views
Skip to first unread message

Arash Farzan

unread,
Mar 24, 2009, 5:56:18 PM3/24/09
to
Some students observed that the example given in class does not produce
the worst-case behavior of the Boyer-Moore algorithm. In fact the
example given beats the suffix shift heuristic but the character jump
heuristic compensates for it.

It is indeed not easy to beat both heuristics at the same time: this
tells us about the power of the Boyer-Moore algorithm in practice.


However, I have this construction which theoretically beats both
heuristics and achieves the worst-case O(nm) running time:

T = (bb(ab)^(k-1))*
P = (ab)^k


for instance: T = bbababab bbababab bbababab bbababab ... (white spaces
are put for ease of read)
P = abababab


You can observe that the majority of shifts are shift by twos which
gives a O(nm) running time.

Robert Xiao

unread,
Mar 25, 2009, 12:02:41 AM3/25/09
to

I'm sort of confused (as usual).

T=b*
P=a(b^k)

...bbbbbbbbbbbbbbb...
.....abbbbbbbb...
.....X^^^^^^^^...
The mismatch causes the character shift to attempt to shift "to line up
with the last occurrence of _b_ in P" where b is the character
mismatched in T (which happens to be a "b"). But this would require
shifting P *left* k positions. So, instead, it shifts by one position
right -- this means that a pure character shift performs poorly.

Suffix shift, though, says that we should line up the characters already
matched (b^k) with an occurrence of (b^k) further left; this is
impossible, so the suffix shift will shift the whole way through.

...or did I miss something?


There's another version of Boyer-Moore which is sometimes cited, and
which (supposedly) has worst-case linear run-time. In this version, the
suffix shift shifts to the left to align the already-matched suffix, but
also specifies that the character immediately to the left of that suffix
(i.e. the one which mismatched in T) must not be equal to the character
in P in the current position -- this ensures that repeats, like that in
the second example, are skipped over.

Using that example:
bbabababbbababab
..abababab..
........X^

Ordinary suffix shift would look for the next "b" in P, which is a shift
of two. The improved suffix shift would look for the next "(not-a)b" in
P, which doesn't occur at all, so it would shift right past.

This slight modification, as it turns out, was suggested by the diagram
on slide 167 -- the "ordinary" suffix shift is described in words, but
the "improved" suffix shift is what appears to be demonstrated (if you
assume that a!=c).

Robert

Arash Farzan

unread,
Mar 25, 2009, 10:55:31 AM3/25/09
to

right.

>
> ...or did I miss something?

no.

>
>
> There's another version of Boyer-Moore which is sometimes cited, and
> which (supposedly) has worst-case linear run-time. In this version, the
> suffix shift shifts to the left to align the already-matched suffix, but
> also specifies that the character immediately to the left of that suffix
> (i.e. the one which mismatched in T) must not be equal to the character
> in P in the current position -- this ensures that repeats, like that in
> the second example, are skipped over.
>
> Using that example:
> bbabababbbababab
> ..abababab..
> ........X^
>
> Ordinary suffix shift would look for the next "b" in P, which is a shift
> of two. The improved suffix shift would look for the next "(not-a)b" in
> P, which doesn't occur at all, so it would shift right past.
>
> This slight modification, as it turns out, was suggested by the diagram
> on slide 167 -- the "ordinary" suffix shift is described in words, but
> the "improved" suffix shift is what appears to be demonstrated (if you
> assume that a!=c).
>
> Robert

That slight modification (looking for a no non-fit character in front of
a good suffix) makes the algorithm to perform linearly. The version
presented in class (which is the original version) does not require a
different character in front and therefore performs in O(nm). And yes,
there is no implicit assumption on character 'c' being the same or
different from 'a' in that slide).

ricky fan

unread,
Apr 15, 2021, 2:58:47 PM4/15/21
to
lkkl
0 new messages