William McBrine <wmcbr...@users.sf.net> wrote:
> Hi all,
>
> I'm pretty new to Python (a little over a month). I was wondering -- is
> something like this:
>
> s = re.compile('whatever')
>
> def t(whatnot):
>     return s.search(whatnot)
>
> for i in xrange(1000):
>     print t(something[i])
>
> significantly faster than something like this:
>
> def t(whatnot):
>     s = re.compile('whatever')
>     return s.search(whatnot)
>
> for i in xrange(1000):
>     result = t(something[i])
>
> ? Or is Python clever enough to see that the value of s will be the same
> on every call, and thus only compile it once?
The best way to answer these questions is always to try it out for yourself. Have a look at 'timeit.py' in the library: you can run it as a script to time simple things or import it from longer scripts.
C:\Python25>python lib/timeit.py -s "import re;s=re.compile('whatnot')" "s.search('some long string containing a whatnot')"
1000000 loops, best of 3: 1.05 usec per loop

C:\Python25>python lib/timeit.py -s "import re" "re.compile('whatnot').search('some long string containing a whatnot')"
100000 loops, best of 3: 3.76 usec per loop

C:\Python25>python lib/timeit.py -s "import re" "re.search('whatnot', 'some long string containing a whatnot')"
100000 loops, best of 3: 3.98 usec per loop
So it looks like it takes a couple of microseconds overhead if you don't pre-compile the regular expression. That could be significant if you have simple matches as above, or irrelevant if the match is complex and slow.
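For the "import it from longer scripts" route, here is a minimal sketch of the same comparison done with the timeit module from inside a script. The function names and the helper best_usec_per_call are made up for this illustration, and the code is written for a current Python rather than 2.5, so expect different absolute numbers from the ones above.

```python
import re
import timeit

TEXT = 'some long string containing a whatnot'
PATTERN = 'whatnot'

# Compiled once, at import time, like the global `s` in the question.
PRECOMPILED = re.compile(PATTERN)


def search_precompiled():
    """Search with the pattern object that was compiled once up front."""
    return PRECOMPILED.search(TEXT)


def search_compile_each_call():
    """Call re.compile() on every search, as in the second variant."""
    return re.compile(PATTERN).search(TEXT)


def best_usec_per_call(func, number=100000, repeat=3):
    """Return the best time per call, in microseconds, over `repeat` runs."""
    timings = timeit.repeat(func, number=number, repeat=repeat)
    return min(timings) / number * 1e6


def report():
    """Print one timing line per variant (hypothetical helper)."""
    for func in (search_precompiled, search_compile_each_call):
        print('%-26s %.2f usec per call' % (func.__name__,
                                            best_usec_per_call(func)))


if __name__ == '__main__':
    report()
```

Taking the minimum of several repeats mirrors the "best of 3" figure that the command-line script prints.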
You can also try measuring the compile time separately:
C:\Python25>python lib/timeit.py -s "import re" "re.compile('whatnot')"
100000 loops, best of 3: 2.36 usec per loop

C:\Python25>python lib/timeit.py -s "import re" "re.compile('<(?:p|div)[^>]*>(?P<pat0>(?:(?P<atag0>\\<a[^>]*\\>)\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>\\</a\\>)|\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>)</(?:p|div)>|(?P<pat1>(?:(?P<atag1>\\<a[^>]*\\>)\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>\\</a\\>)|\\<img[^>]+class\\s*=[^=>]*captioned[^>]+\\>)')"
100000 loops, best of 3: 2.34 usec per loop
It makes no difference whether you use a trivial regular expression or a complex one: Python remembers (if I remember correctly) the last 100 expressions it compiled, so after the first pass each re.compile() call in these loops is just a cache lookup, and the overhead will be pretty constant.
No, the Python compiler doesn't know anything about regular expression objects, so it compiles a call to the RE engine which is executed every time the function is called.
However, the re module keeps its own cache, so in fact the regular expression itself may only get compiled once regardless.
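A quick way to see the cache at work is to compile the same pattern twice and compare the results. This is only a sketch: the pattern is an arbitrary example, and getting the identical object back is a CPython implementation detail rather than a documented guarantee. re.purge() is the documented call for clearing the cache.

```python
import re

PATTERN = r'what(?:not|ever)'  # arbitrary example pattern

first = re.compile(PATTERN)
second = re.compile(PATTERN)

# In CPython the second call is served from the re module's internal
# cache, so both names refer to the very same compiled pattern object.
same_object = first is second
print(same_object)

# re.purge() clears that cache, so a compile after the purge has to
# build a fresh pattern object.
re.purge()
third = re.compile(PATTERN)
print(third is first)
```

The cache only saves the compilation itself; every call still pays for the function call and the cache lookup, which is the couple of microseconds measured above.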
Here's another approach that avoids the use of a global variable for the regular expression:
>>> def spam2(x, s=re.compile('nobody expects the Spanish Inquisition!')):
...     return s.search(x)
What happens now is that the regex is compiled by the RE engine once, when the def statement is executed (default values are evaluated at function definition time, not on each call), then stored as the default value for the argument s. If you don't supply another value for s when you call the function, the default regex is used. If you do, the value you supply is used instead.
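As a rough illustration of both cases, here is spam2 called with and without a second argument. The sample strings and the 'eggs' pattern are invented for the example, and __defaults__ is the attribute name in current Pythons (Python 2.5 spelled it func_defaults).

```python
import re


def spam2(x, s=re.compile('nobody expects the Spanish Inquisition!')):
    # The default for s was built once, when this def statement ran.
    return s.search(x)


# No second argument: the stored default pattern is used.
default_hit = spam2('nobody expects the Spanish Inquisition!')

# Explicit second argument: the caller's pattern replaces the default
# for this call only.
override_hit = spam2('spam, eggs and spam', re.compile('eggs'))

# The default lives on the function object, not in a global variable.
stored_default = spam2.__defaults__[0]
print(stored_default.pattern)
```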
"Steven D'Aprano" <st...@REMOVE-THIS-cybersource.com.au> wrote in message
news:13mq041tef27vad@corp.supernews.com...
| >>> def spam2(x, s=re.compile('nobody expects the Spanish Inquisition!')):
| ...     return s.search(x)
|
| I suspect that this will be not only the fastest solution, but also the
| most flexible.
"John Machin" <sjmac...@lexicon.net> wrote in message
news:ab88db50-ce4e-4298-bcec-079de67dbcb8@e25g2000prg.googlegroups.com...
| On Dec 23, 5:38 am, "Terry Reedy" <tjre...@udel.edu> wrote:
| > 'Most flexible' in a different way is
| >
| > def searcher(rex):
| >     crex = re.compile(rex)
| >     def _(txt):
| >         return crex.search(txt)
| >     return _
| >
|
| I see your obfuscatory ante and raise you several dots and
| underscores:
I will presume you are merely joking, but for the benefit of any beginning programmers reading this, the closure above is a standard functional idiom for partial evaluation of a function (in this case, re.search(crex, txt)).
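To make the "partial evaluation" reading concrete, here is a small sketch that puts the searcher closure next to functools.partial applied to re.search. The name searcher_partial and the sample strings are made up for this comparison; it relies on re.search accepting an already-compiled pattern as its first argument.

```python
import functools
import re


def searcher(rex):
    """Return a function that searches text for the pattern rex."""
    crex = re.compile(rex)  # compiled once, when searcher() is called

    def _(txt):
        # crex is remembered by the closure between calls
        return crex.search(txt)

    return _


def searcher_partial(rex):
    """Same idea via functools.partial: fix re.search's first argument."""
    return functools.partial(re.search, re.compile(rex))


find_whatnot = searcher('whatnot')
find_whatnot_too = searcher_partial('whatnot')

print(find_whatnot('a whatnot here').span())
print(find_whatnot_too('a whatnot here').span())
```

Either way the pattern argument is fixed once and only the text varies from call to call. The partial version still goes through the module-level re.search on each call, so it is unlikely to be the faster of the two.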
On Dec 23, 2:39 pm, "Terry Reedy" <tjre...@udel.edu> wrote:
> I will presume you are merely joking, but for the benefit of any
> beginning programmers reading this, the closure above is a standard
> functional idiom for partial evaluation of a function (in this case,
> re.search(crex,txt))

Semi-joking; I thought that your offering of this:

was somewhat over-complicated, and possibly slower than already-mentioned alternatives. The standard idiom etc etc it may be, but the OP was interested in getting overhead out of his re searching loop. Let's trim it a bit.
I get class Searcher(object) but can't for the life of me see why (except to be intentionally obtuse) one would use the def searcher(rex) pattern, which I assume you would call with searcher(r)(t), right?
On Dec 28, 7:53 am, "Matthew Franz" <mdfr...@gmail.com> wrote:
> I get class Searcher(object) but can't for the life of me see why
> (except to be intentionally obtuse) one would use the def
> searcher(rex) pattern which I assume you would call with
> searcher(r)(t) right?

The whole point of the thread was performance across multiple searches for the one pattern. Thus one would NOT do
    searcher(r)(t)
each time a search was required; one would do
    s = searcher(r)
ONCE, and then do
    s(t)
each time ...
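Spelled out as a short sketch (the sample strings are invented for the illustration), the intended usage looks something like this. Note that the inner function's name, _, is just an ordinary identifier.

```python
import re


def searcher(rex):
    crex = re.compile(rex)

    def _(txt):  # "_" is an ordinary function name here, nothing special
        return crex.search(txt)

    return _


texts = ['spam', 'a whatnot', 'eggs', 'whatnot again']  # made-up sample data

# Build the search function ONCE, outside the loop ...
s = searcher('whatnot')

# ... then call it for each string.
hits = [t for t in texts if s(t)]
print(hits)

# By contrast, searcher('whatnot')(t) inside the loop would call
# re.compile() and build a new inner function on every iteration.
```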
Thanks, that makes more sense. I got tripped up by the function returning a function thing and (for a while) thought _ was some sort of spooky special variable.