m = 0
s1 = "me"
s2 = "locate me"
s1len = len(s1)
s2len = len(s2)
found = False
while m + s1len <= s2len:
if s1 == s2len[m:m+s1len]:
found = True
break
m += 1
> I would be surprised if it is the naive:
Yep -- it's "a mix between Boyer-Moore and Horspool with a few more
bells and whistles on the top", as documented and implemented in
Objects/stringlib/fastsearch.h in the Python sources and well discussed
and explained at http://effbot.org/zone/stringlib.htm .
Alex
> I would be surprised if it is the naive:
Why?
I guess it simply calls an appropriate C library function.
Ciao,
Marc 'BlackJack' Rintsch
Ok. Two questions:
1. Is "a in b" simply an alias for "b.find(a)"?
2. Is this algorithm exclusive to Python 2.5, or is it contained in 2.4
aswell?
> Alex Martelli wrote:
> > Tor Erik <tore...@gmail.com> wrote:
> >
> >> I would be surprised if it is the naive:
> >
> > Yep -- it's "a mix between Boyer-Moore and Horspool with a few more
> > bells and whistles on the top", as documented and implemented in
> > Objects/stringlib/fastsearch.h in the Python sources and well discussed
> > and explained at http://effbot.org/zone/stringlib.htm .
> >
> >
> > Alex
>
> Ok. Two questions:
>
> 1. Is "a in b" simply an alias for "b.find(a)"?
The 'in' operator can be minutely better optimized, but they share the
underlying algorithm (in 2.5).
> 2. Is this algorithm exclusive to Python 2.5, or is it contained in 2.4
> aswell?
It's 2.5 novelty. Look at the performance on the same machine (my 2.0
GHz MBP, MacOSX 10.4.7):
brain:~ alex$ python2.4 -mtimeit -s'x="foo";y="bar"*99+x+"baz"*77' 'x in
y'
100000 loops, best of 3: 9.04 usec per loop
brain:~ alex$ python2.4 -mtimeit -s'x="foo";y="bar"*99+x+"baz"*77'
'y.find(x)!=-1'
100000 loops, best of 3: 2.01 usec per loop
brain:~ alex$ python2.5 -mtimeit -s'x="foo";y="bar"*99+x+"baz"*77' 'x in
y'1000000 loops, best of 3: 0.452 usec per loop
brain:~ alex$ python2.5 -mtimeit -s'x="foo";y="bar"*99+x+"baz"*77'
'y.find(x)!=-1'
1000000 loops, best of 3: 0.842 usec per loop
find used to be way faster than 'in' -- now they share algorithms and
'in' can be more optimized (no need to track ``where'' it finds a match,
so to speak;-), so find is over twice as fast as it used to be, 'in' is
about 20 times as fast as it used to be, in this example -- it gets even
better if you look at larger and larger strings, e.g...:
brain:~ alex$ python2.4 -mtimeit -s'x="foo"*123;y="bar"*999+x+"baz"*777'
'x in y'
10000 loops, best of 3: 91.9 usec per loop
brain:~ alex$ python2.5 -mtimeit -s'x="foo"*123;y="bar"*999+x+"baz"*777'
'x in y'
100000 loops, best of 3: 3.01 usec per loop
here, we're going _30_ times as fast, not "just" 20;-).
Alex
> I would be surprised if it is the naive:
2.4 and earlier uses that algorithm (but with a better implementation).
And "naive" isn't really the right word here; on average, a brute force search is pretty
good for the find/index/in use case. Most fancy algorithms ignore the setup costs and
other overhead, which works well in theory, but not that well in practice.
</F>