I was a trying to profile a small script and after shrinking the code
to the minimum I got a interesting profile difference.
Given two test functions test1 and test2, that only differs from an
extra level of indirection (i.e. find_substr), I wonder why I got a
timming difference >50%? What is the recommended way to factorize the
code? Should I write a big method containing everything?
#!/usr/bin/python
def find_substr(l):
return l[15:20]
def test1(t):
for i in xrange(1000000):
s = find_substr(t)
def test2(t):
for i in xrange(1000000):
sub = t[15:20]
import cProfile
t = "This a long string containing several things apart from the end"
cProfile.run("test1(t)")
cProfile.run("test2(t)")
----Profiles test1
Profiles results:
1000003 function calls in 0.666 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno
(function)
1 0.000 0.000 0.666 0.666 <string>:1(<module>)
1000000 0.260 0.000 0.260 0.000 test.py:3(find_substr)
1 0.406 0.406 0.666 0.666 test.py:7(test1)
1 0.000 0.000 0.000 0.000 {method 'disable' of
'_lsprof.Profiler' objects}
----Profile test2:
3 function calls in 0.248 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno
(function)
1 0.000 0.000 0.248 0.248 <string>:1(<module>)
1 0.248 0.248 0.248 0.248 test.py:12(test2)
1 0.000 0.000 0.000 0.000 {method 'disable' of
'_lsprof.Profiler' objects}
Thank you very much for the advice!
marc
That's because there is a function call overhead.
> I wonder why I got a
> timming difference >50%?
A very simple function body will (in terms of percentage) have larger
function overhead. With a slightly more complex function body, the body
will takes much more time than the function call overhead.
> What is the recommended way to factorize the
> code? Should I write a big method containing everything?
Look in the absolute numbers: 0.666 CPU seconds vs. 0.248 CPU seconds
over 1000000 loops means if you put everything into one big method
you'll only save 418 nanoseconds per loop. Even over 1000000 loops; you
only save 0.418 seconds. Is it worth optimizing?
> ----Profiles test1
> Profiles results:
> 1000003 function calls in 0.666 CPU seconds
>
> ----Profile test2:
> 3 function calls in 0.248 CPU seconds
I got a more striking difference: 5.291 CPU seconds vs 0.589 CPU
seconds. But knowing how the profiler works, this is to be expected.
Function call overhead become much (and I mean much) heavier with
profiler ON. I get a more sane result with timing manually:
import time
start = time.time()
test1(t)
print time.time() - start
start = time.time()
test2(t)
print time.time() - start
It's 1.186 vs 0.608, which is blink of an eye vs. blink of an eye.