
strange exponentiation performance


Jeff Davis

Nov 23, 2003, 6:31:37 AM
I was doing some thinking about exponentiation algorithms along with a
friend, and I happened upon some interesting results. Particularly, I was
able to outperform the ** operator in at least one case, with a recursive
algorithm. This leads me to believe that perhaps the ** operator should
tune its algorithm based on inputs or some such thing. Here is my data:


>>> def h(b,e):
...     if(e==0): return 1
...     if(e==1): return b
...     t = h(b,e >> 1)
...     return t*t*h(b,e & 1)
...

Above is the recursive exponentiation algorithm. I tried some test data
and it appears to work. This just popped into my head out of nowhere and I
optimized it with some trivial optimizations (I used e>>1 instead of e/2;
I used e&1 instead of e%2).
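
(As a quick added check, not in the original timing runs: the bit operations agree with the arithmetic versions for the non-negative integers used here.)

>>> e = 100000
>>> e >> 1 == e / 2, e & 1 == e % 2
(True, True)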

>>> def f(b,e):
...     n = 1
...     while(e>0):
...         if(e & 1): n = n * b
...         e >>= 1
...         b *= b
...     return n
...

I then made this algorithm which I thought basically unwrapped the
recursion in h(). It seems to work also.

Then, the more trivial exponentiation algorithm:

>>> def g(b,e):
...     n = 1
...     while(e>0):
...         n *= b
...         e -= 1
...     return n
...

For consistency, I wrapped ** in a function call:

>>> def o(b,e):
...     return b**e
...

then I made a test function to time the computation time:

>>> import time
>>> def test(func,b,e):
...     t1 = time.time()
...     x = func(b,e)
...     t2 = time.time()
...     print t2-t1
...


now, I compared:

>>> test(f,19,100000)
0.765542984009
>>> test(g,19,100000)
11.4481400251
>>> test(h,19,100000)
0.370787024498
>>> test(o,19,100000)
0.457986950874

now, g() was blown out of the water, as expected, but the others were
close enough for another test at a higher "e" value.

>>> test(f,19,500000)
8.02366995811
>>> test(h,19,500000)
3.66968798637
>>> test(o,19,500000)
5.29517292976
>>>

Now, that is the interesting part. How did ** not measure up to h()? It's
also interesting that f(), which is supposed to be a more efficient
version of h(), is lagging behind.

I would like help explaining the following:
(1) How did my less-than-perfectly-optimized recursive algorithm win
against the ** operator?
(2) How can I unwrap and optimize h()? From what I understand, recursion
is never supposed to be the most efficient. I suspect there are some
hidden inefficiencies in using while(), but I'd like to know the specifics.

If my algorithm h() is better, why can't ** use a quick test to change
algorithms based on inputs? Or is mine better in all cases?

BTW: python2.3.2 compiled with gcc 3.3.2 on linux2.4.19 all on debian &
i386. I have an AMD athlon xp 1800+.

I ran all test cases several times and the results were very consistent.

Also note that I'm not exactly an algorithms expert, I just happened upon
these results while chatting with a friend.

Regards,
Jeff Davis

Tim Peters

Nov 23, 2003, 2:45:36 PM
to pytho...@python.org
[Jeff Davis]

> I was doing some thinking about exponentiation algorithms along with a
> friend, and I happened upon some interesting results. Particularly, I
> was able to outperform the ** operator in at least one case, with a
> recursive algorithm. This leads me to believe that perhaps the **
> operator should tune its algorithm based on inputs or some such
> thing. Here is my data:

I'm taking the liberty of rewriting your functions to stop trying to squash
as much as they can on each line. It's important to do that if you want to
modify the code quickly when experimenting.

First you rediscovered the "left to right" binary exponentiation algorithm:

def h(b, e):
    if e == 0:
        return 1
    if e == 1:
        return b
    t = h(b, e >> 1)
    return t*t * h(b, e & 1)

When you tried to remove recursion from it, you rediscovered the "right to
left" binary exponentiation algorithm:

def f(b, e):
    n = 1
    while e > 0:
        if e & 1:
            n *= b
        e >>= 1
        b *= b
    return n

That's close to what Python's builtin ** does, but is missing an important
speed trick (explained later). Note that h and f aren't really the *same*
algorithm! f squares the current value of b on every trip through the loop.
h does not. If you print out all the inputs to all the multiplications,
you'll see that they're not at all the same.
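
A minimal tracing sketch makes the difference visible (tmul, h_traced, and f_traced are illustrative names, not anything from the posts above): route every multiplication through a helper that records its operands.

def tmul(x, y, log):
    # record the operands of every multiplication, then multiply
    log.append((x, y))
    return x * y

def h_traced(b, e, log):
    # same structure as h(), left-to-right
    if e == 0:
        return 1
    if e == 1:
        return b
    t = h_traced(b, e >> 1, log)
    return tmul(tmul(t, t, log), h_traced(b, e & 1, log), log)

def f_traced(b, e, log):
    # same structure as f(), right-to-left
    n = 1
    while e > 0:
        if e & 1:
            n = tmul(n, b, log)
        e >>= 1
        b = tmul(b, b, log)
    return n

log_h, log_f = [], []
h_traced(3, 10, log_h)
f_traced(3, 10, log_f)
print log_h    # the two operand sequences are quite different,
print log_f    # even though both end up computing 3**10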

Finally (I'll skip g(), as it isn't interesting):

def o(b, e):
    return b**e

> ...


> now, g() was blown out of the water, as expected, but the others were
> close enough for another test at a higher "e" value.
>
> >>> test(f,19,500000)
> 8.02366995811
> >>> test(h,19,500000)
> 3.66968798637
> >>> test(o,19,500000)
> 5.29517292976
> >>>

Your box is faster than mine. Here under 2.3.2 on Windows:

f 14.6648669941
h 6.53576912117
o 9.59630399437

> Now, that is the interesting part. How did ** not measure up to h()?

The left-to-right exponentiation algorithm can be faster than the
right-to-left one. The former is clumsier to code in C, though, and nobody
ever bothered to endure that pain in Python's implementation. If you search
the archives hard enough, you'll eventually find a thread about this from
Christian Tismer.

Neither algorithm is optimal for all inputs, BTW. Turns out optimal
exponentiation is a very hard problem; Knuth Volume 2 has an extensive
discussion of this ("Evaluation of Powers"); another algorithm based on
first factoring the exponent (expressing it as a product of primes) is
better "on average" than either binary method, but sometimes loses to them.

> It's also interesting that f(), which is supposed to be a more
> efficient version of h(), is lagging behind.

The biggest trick it's missing is that it *always* does

b *= b

even on the last trip through the loop. The value of b it computes on the
last trip is never used, and b has grown very large by that time so
computing b*b is *very* expensive then. Change f like so:

def f(b, e):
    n = 1
    while e > 0:
        if e & 1:
            n *= b
        e >>= 1
        if e:   # new guard: only square b if the result will be used
            b *= b
    return n

and then it's essentially identical to the builtin **:

f 9.5959586986
h 6.54231046447
o 9.59677081413

> I would like help explaining the following:
> (1) How did my less-than-perfectly-optimized recursive algorithm win
> against the ** operator?

It used left-to-right instead of right-to-left, which on this pair of inputs
is a better approach.

> (2) How can I unwrap and optimize h()?

Make it work left-to-right instead of right-to-left. That's tedious to do.
You can't get the same sequence of multiplication inputs by going
right-to-left, so the code will have to be more complicated. Start by
finding the highest bit set in e. For example,

def q(b, e):
    if e == 0:
        return 1
    if e == 1:
        return b
    e2, numbits = e, 0
    while e2:
        e2 >>= 1
        numbits += 1
    assert numbits >= 2
    result = b
    mask = 1L << (numbits - 2)
    for dummy in range(numbits - 1):
        result *= result
        if e & mask:
            result *= b
        mask >>= 1
    return result

That's bound to be a little faster than h on most inputs, because it also
optimizes away multiplications by 1 (well, unless b happens to be 1). Let's
see:

f 9.53526793946
o 9.53408287098
h 6.54592768903
q 6.11897031462

Yup, those matter, but not a whale of a lot. The savings in skipping
multiplications by 1 is proportional to the number of 0 bits in the
exponent. What *didn't* matter is whether it's recursive or iterative.

> From what I understand, recursion is never supposed to be the most
> efficient. I suspect there are some hidden inefficiencies in using
> while(), but I'd like to know the specifics.

Na, none of that matters. All these algorithms do a number of
multiplications proportional to log(e, 2), which is insignificantly tiny
compared to the magnitude of the result. Long-int multiplication is the
only thing that consumes significant time here, so the only thing that
matters to the time is the exact sequence of inputs fed to *. Even whether
you write the *driver* in Python or C or assembly language is insignificant
compared to that.
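
A minimal sketch backing up the log(e, 2) claim (h_count is an illustrative name; it just mirrors h() while counting multiplications):

import math

def h_count(b, e):
    # returns (b**e, number of multiplications performed), mirroring h()
    if e == 0:
        return 1, 0
    if e == 1:
        return b, 0
    t, c1 = h_count(b, e >> 1)
    u, c2 = h_count(b, e & 1)
    return t * t * u, c1 + c2 + 2

x, mults = h_count(19, 500000)
print mults, math.log(500000, 2)   # ~36 multiplications vs. log2(e) ~ 19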

> If my algorithm h() is better, why can't ** use a quick test to change
> algorithms based on inputs? Or is mine better in all cases?

See Knuth <wink>.

> ...


> Also note that I'm not exactly an algorithms expert, I just happened
> upon these results while chatting with a friend.

You did good! The first couple things you tried only nick the surface of
what's known about this problem. You *could* spend a year learning
everything that's known about it. In the end, if you implemented all that,
you could easily end up with 1000x more code, which sometimes ran faster.
To become an algorithm expert, you should do that <wink>. In real life, a
crucial complementary skill is learning when to say "good enough", and move
on. That's indeed why Python's implementation settled for the
easy-to-code-in-C right-to-left binary exponentiation method. If that part
of Python had been written in Python instead, I would have used the
easy-to-code-in-Python recursive left-to-right method instead. Sticking to
easy-to-code methods also has good influence on minimizing the # of bugs, of
course.


Jeff Davis

Nov 24, 2003, 1:35:23 AM
> I'm taking the liberty of rewriting your functions to stop trying to squash
> as much as they can on each line. It's important to do that if you want to
> modify the code quickly when experimenting.

Fair enough.

>
> First you rediscovered the "left to right" binary exponentiation algorithm:
>

Interesting.

<snip>

> When you tried to remove recursion from it, you rediscovered the "right to
> left" binary exponentiation algorithm:

Interesting. I suspected it wasn't a perfect unwrapping of the recursion,
because I didn't take the time to examine it carefully. Classic case of a
wonderful idea I had at 3:30 in the morning :)

<snip>


> That's close to what Python's builtin ** does, but is missing an important
> speed trick (explained later). Note that h and f aren't really the *same*
> algorithm! f squares the current value of b on every trip through the loop.
> h does not. If you print out all the inputs to all the multiplications,
> you'll see that they're not at all the same.
>

Also interesting.

<snip>

>
> Your box is faster than mine. Here under 2.3.2 on Windows:
>

That means I win, right ;)

<snip>

> exponentiation is a very hard problem; Knuth Volume 2 has an extensive
> discussion of this ("Evaluation of Powers"); another algorithm based on
> first factoring the exponent (expressing it as a product of primes) is
> better "on average" than either binary method, but sometimes loses to
> them.

Ahh... I'm on vol. 1. I saw that it would be a while before I made it to
page 100, so I actually loaned out vol. 2. I'll be getting that back now I
guess :)

<snip>

>
> def q(b, e):
>     if e == 0:
>         return 1
>     if e == 1:
>         return b
>     e2, numbits = e, 0
>     while e2:
>         e2 >>= 1
>         numbits += 1
>     assert numbits >= 2
>     result = b
>     mask = 1L << (numbits - 2)
>     for dummy in range(numbits - 1):
>         result *= result
>         if e & mask:
>             result *= b
>         mask >>= 1
>     return result
>

Now I understand what you mean about the right-left and left-right
versions. I only had a vague understanding before.

> That's bound to be a little faster than h on most inputs, because it
> also optimizes away multiplications by 1 (well, unless b happens to be
> 1). Let's see:

Might as well eliminate the multiplication by 1; for some reason I
assumed the computer could optimize that out somehow.

<snip>


> Yup, those matter, but not a whale of a lot. The savings in skipping
> multiplications by 1 is proportional to the number of 0 bits in the
> exponent. What *didn't* matter is whether it's recursive or iterative.

Yeah, I guess it's always the case: algorithm first, then optimize. I know
the rule, and I follow it when being paid (so as not to waste anyone's
money). When doing something more academic I always feel like I need to
break it down to the most simple operations so I understand what's going
on. To me, recursion hides what's going on to an extent, and I felt like I
didn't entirely understand the algorithm.

<snip>

>> If my algorithm h() is better, why can't ** use a quick test to change
>> algorithms based on inputs? Or is mine better in all cases?
>
> See Knuth <wink>.
>

Uh-oh, another problem that's more difficult than meets the eye. I'll be
back with more questions in a few years I guess ;)

Thanks,
Jeff

Tim Peters

Nov 24, 2003, 1:23:27 PM
to pytho...@python.org
[Tim]
>> ... Knuth Volume 2 has an extensive discussion of this ("Evaluation
>> of Powers"); another algorithm based on first factoring the exponent
>> (expressing it as a product of primes) is better "on average" than
>> either binary method, but sometimes loses to them.

[Jeff Davis]


> Ahh... I'm on vol. 1. I saw that it would be a while before I made it
> to page 100, so I actually loaned out vol. 2. I'll be getting that
> back now I guess :)

On second thought, Knuth may not be helpful here: he's overwhelmingly
concerned with minimizing the number of multiplications performed, which is
"interesting" only if all multiplications take the same amount of time.
With ordinary care in coding, the left-to-right and right-to-left binary
exponentiation methods do exactly the same number of multiplications, so
there's no basis for favoring one based on that alone.

But large-int multiplications vary wildly in expense. For example, it's
enormously more expensive to square a million-bit int than to multiply a
million-bit int by a 10-bit int. The left-to-right and right-to-left
methods *do* differ in expense when that's taken into account too
(left-to-right is usually cheaper, in essence because each step either
squares or multiplies by the *original* base, while each step in R-to-L
either squares or multiplies two ever-growing intermediate results), and I
don't think Knuth talks about this complication.
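
A minimal timing sketch makes that disparity concrete (illustration only; the exact numbers will vary by machine):

import time

big = 19 ** 500000          # a multi-million-bit long
small = 19

t1 = time.time()
big * big                   # square the huge number
t2 = time.time()
big * small                 # multiply the huge number by a small one
t3 = time.time()
print "big*big:  ", t2 - t1
print "big*small:", t3 - t2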

Offhand, I don't know of anyone who does talk about it! A few hours of
Googling on it may be an interesting project for you. One reason it may be
hard to find info about this is that raw exponentiation of unbounded
integers has few practical use cases. In real life, modular exponentiation
(Python's pow(a, b, c)) is overwhelmingly more common, and then all
intermediate results are bounded above by c, so there's an upper bound on
the expense of each multiply too, and then "all multiplications cost the
same" is closer to being true. If a is very small compared to c, though, I
still expect L-to-R to have an advantage.
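
For comparison, a sketch of modular exponentiation with a right-to-left loop (modpow is just an illustrative name; the built-in pow(a, b, c) is the thing to use in practice): reducing mod c after every multiplication keeps each intermediate result smaller than c.

def modpow(b, e, c):
    # right-to-left binary method, reducing mod c after each multiply
    b %= c
    n = 1
    while e > 0:
        if e & 1:
            n = n * b % c
        e >>= 1
        if e:               # same guard as in f(): skip the useless last squaring
            b = b * b % c
    return n

assert modpow(19, 500000, 1000003) == pow(19, 500000, 1000003)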

don't-forget-to-come-up-for-air<wink>-ly y'rs - tim

