math.Pow(N, 2) slowness

689 views
Skip to first unread message

S.Çağlar Onur

unread,
Feb 25, 2014, 11:14:06 PM2/25/14
to golan...@googlegroups.com
Hi,

I was profiling a code that generates complete graphs, where the vertices are points chosen randomly inside the hypercube and edges have the weight of the Euclidean distance between its endpoints. 

Profiling showed that I was spending a huge amount of time inside the math.Pow function. I tried to search the list and the issue tracker, but couldn’t find anything so I implemented the naive approach (just multiplying the numbers). That change improved my situation a lot, as I call the euclidean function millions of times.

I decided to write the following benchmark to see what’s going on.

{{{ — BEGIN — }}}

package pow

import (
"math"
"math/rand"
"testing"
"time"
)

type Vertex struct {
x, y, z, w float64
}

func Euclidean(first, second Vertex, unroll bool) float64 {
x := first.x - second.x
y := first.y - second.y
z := first.z - second.z
w := first.w - second.w
if unroll {
return math.Sqrt(x*x + y*y + z*z + w*w)
}
return math.Sqrt(math.Pow(x, 2) + math.Pow(y, 2) + math.Pow(z, 2) + math.Pow(w, 2))
}

func BenchmarkUnrollTrue(b *testing.B) {
vertices := make([]Vertex, b.N)
rand.Seed(time.Now().UnixNano())

for n := 0; n < b.N; n++ {
vertices[n].x = rand.Float64()
vertices[n].y = rand.Float64()
vertices[n].z = rand.Float64()
vertices[n].w = rand.Float64()
}

b.ResetTimer()
for n := 0; n < b.N; n++ {
Euclidean(vertices[0], vertices[n], true)
}
}

func BenchmarkUnrollFalse(b *testing.B) {
vertices := make([]Vertex, b.N)
rand.Seed(time.Now().UnixNano())

for n := 0; n < b.N; n++ {
vertices[n].x = rand.Float64()
vertices[n].y = rand.Float64()
vertices[n].z = rand.Float64()
vertices[n].w = rand.Float64()
}

b.ResetTimer()
for n := 0; n < b.N; n++ {
Euclidean(vertices[0], vertices[n], false)
}
}

{{{ — END — }}}

Running this on my 2GHz Intel Core i7 laptop (OSX 10.9.1) with 2 weeks old go showed following;

[sonur@kokai:~/g/pow] go test -bench=.
testing: warning: no tests to run
PASS
BenchmarkUnrollTrue 100000000         26.1 ns/op
BenchmarkUnrollFalse 10000000       269 ns/op
ok  _/Users/sonur/g/pow 34.602s

[sonur@kokai:~/g/pow] go version
go version devel +6d3bdbd27761 Thu Feb 06 09:11:00 2014 -0800 darwin/amd64

so I updated my system to the latest go but nothing really changed.

[sonur@kokai:~/g/pow] go version
go version devel +bca0032e86af Wed Feb 26 13:20:36 2014 +1100 darwin/amd64

[sonur@kokai:~/g/pow] go test -bench=.
testing: warning: no tests to run
PASS
BenchmarkUnrollTrue 100000000         24.8 ns/op
BenchmarkUnrollFalse 10000000       254 ns/op
ok  _/Users/sonur/g/pow 41.491s

Assuming the benchmark code is correct, I’m wondering whether this difference is expected or is this a regression?

Best,
-- 
S.Çağlar Onur <cag...@10ur.org>

Dave Cheney

unread,
Feb 25, 2014, 11:41:57 PM2/25/14
to golan...@googlegroups.com
Perhaps you are benchmarking the cost of 5 function calls vs 1 ?
Message has been deleted

S.Çağlar Onur

unread,
Feb 26, 2014, 12:22:57 AM2/26/14
to golan...@googlegroups.com
Hi Dave,

[Resending as for some reason my previous message has been deleted]


On Tuesday, February 25, 2014 11:41:57 PM UTC-5, Dave Cheney wrote:
Perhaps you are benchmarking the cost of 5 function calls vs 1 ?

So you are saying the 10x difference is caused by calling math.Pow 4 times hence function call overhead? I'm not trying to imply something but isn't that a lot? In my specific case not calling math.Pow decreased the the total runtime from ~5 min to well under 3 min for a complete graph with 65356 vertices and 2135670690 edges.

I checked the go build output by passing -gcflags -m and it does not mention anything about the Euclidean function, is this because go cannot inline non-leaf functions (if my terminology correct)?

S.Çağlar Onur

unread,
Feb 26, 2014, 12:29:16 AM2/26/14
to golan...@googlegroups.com
Hi,

Another data point, I just added following special case to src/pkg/math/pow.go and re-built it. 

[sonur@kokai:/opt/golang/src] hg diff
diff -r bca0032e86af src/pkg/math/pow.go
--- a/src/pkg/math/pow.go Wed Feb 26 13:20:36 2014 +1100
+++ b/src/pkg/math/pow.go Wed Feb 26 00:23:10 2014 -0500
@@ -41,6 +41,8 @@
  return 1
  case y == 1:
  return x
+ case y == 2:
+ return x*x
  case y == 0.5:
  return Sqrt(x)
  case y == -0.5:

and I think numbers are looking more reasonable now;

[sonur@kokai:~/g/pow] go test -bench=.
testing: warning: no tests to run
PASS
BenchmarkUnrollTrue 100000000        27.5 ns/op
BenchmarkUnrollFalse 50000000        46.9 ns/op
ok   _/Users/sonur/g/pow 54.729s

so based on this I'm assuming the 10x change is not the function call overhead.

Dave Cheney

unread,
Feb 26, 2014, 12:29:30 AM2/26/14
to S.Çağlar Onur, golang-nuts
On Wed, Feb 26, 2014 at 4:22 PM, S.Çağlar Onur <cag...@10ur.org> wrote:
Hi Dave,

[Resending as for some reason my previous message has been deleted]


On Tuesday, February 25, 2014 11:41:57 PM UTC-5, Dave Cheney wrote:
Perhaps you are benchmarking the cost of 5 function calls vs 1 ?

So you are saying the 10x difference is caused by calling math.Pow 4 times hence function call overhead? I'm not trying to imply something but isn't that a lot?

It was a guess. Probably it's wrong, but it should be easy for you to test.
 
In my specific case not calling math.Pow decreased the the total runtime from ~5 min to well under 3 min for a complete graph with 65356 vertices and 2135670690 edges.

I checked the go build output by passing -gcflags -m and it does not mention anything about the Euclidean function, is this because go cannot inline non-leaf functions (if my terminology correct)?

That is correct, if math.Pow called another function, is longer than a few lines (very poor rule of thumb, there are many reasons why a leaf function may not be inlined) or was not written in Go, it will not be inlined at the default inlining level. 
 

--
You received this message because you are subscribed to a topic in the Google Groups "golang-nuts" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-nuts/GOviOxfkt2E/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Dave Cheney

unread,
Feb 26, 2014, 12:35:05 AM2/26/14
to S.Çağlar Onur, golang-nuts
There you go. I was completely wrong, not unexpected. 

So the real cause is no fast path for x^2.

I recommend doing the following (in order)

1. raise an issue, golang.org/issue/new linking to this discussion and your diagnosis.
2. if you want to, propose a change with your fix. If Pow(x, 2) doesn't have a test case you will need to add one, as well as including a benchmark if the benchmarks in the math package don't already cover this.

Cheers

Dave


--

Charlie Dorian

unread,
Feb 26, 2014, 12:43:40 AM2/26/14
to golan...@googlegroups.com
Paraphrasing Michael Jones (in another discussion): Generally, using a general tool (Pow, which works for all real exponents) when all you really want is x**2, is very expensive.

Caleb Spare

unread,
Feb 26, 2014, 12:55:39 AM2/26/14
to Charlie Dorian, golang-nuts
(well, in Go, x*x)


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.

S.Çağlar Onur

unread,
Feb 26, 2014, 12:58:39 AM2/26/14
to golan...@googlegroups.com, S.Çağlar Onur
Hi Dave,


On Wednesday, February 26, 2014 12:35:05 AM UTC-5, Dave Cheney wrote:
There you go. I was completely wrong, not unexpected. 

So the real cause is no fast path for x^2.

I recommend doing the following (in order)


Been there, done that :)
 
1. raise an issue, golang.org/issue/new linking to this discussion and your diagnosis.

Will do in a minute.
 
2. if you want to, propose a change with your fix. If Pow(x, 2) doesn't have a test case you will need to add one, as well as including a benchmark if the benchmarks in the math package don't already cover this.

I think I would like to see people's reaction first before proposing a solution with test cases and benchmarks. I fear that others would object this idea given the fact that it's not possible to add every special case into Pow function's fast path (Eg; this change fixed the Pow(N, 2) but Pow(N, 3) suffers with they same symptom). Maybe one could argue that it's OK to add special cases for 2 and 3 by assuming they will be used more frequently.

Thanks.

Dave Cheney

unread,
Feb 26, 2014, 1:10:22 AM2/26/14
to "S.Çağlar Onur", golan...@googlegroups.com, "S.Çağlar Onur"


On 26 Feb 2014, at 16:58, S.Çağlar Onur <cag...@10ur.org> wrote:

Hi Dave,

On Wednesday, February 26, 2014 12:35:05 AM UTC-5, Dave Cheney wrote:
There you go. I was completely wrong, not unexpected. 

So the real cause is no fast path for x^2.

I recommend doing the following (in order)


Been there, done that :)
 
1. raise an issue, golang.org/issue/new linking to this discussion and your diagnosis.

Will do in a minute.
 
2. if you want to, propose a change with your fix. If Pow(x, 2) doesn't have a test case you will need to add one, as well as including a benchmark if the benchmarks in the math package don't already cover this.

I think I would like to see people's reaction first before proposing a solution with test cases and benchmarks. I fear that others would object this idea given the fact that it's not possible to add every special case into Pow function's fast path (Eg; this change fixed the Pow(N, 2) but Pow(N, 3) suffers with they same symptom). Maybe one could argue that it's OK to add special cases for 2 and 3 by assuming they will be used more frequently.


Yup. As you said, we can't add special cases for every integer, so possibly the line has been drawn already. Best to wait for a decision before proceeding. 

Michael Jones

unread,
Feb 26, 2014, 1:18:59 AM2/26/14
to S.Çağlar Onur, golang-nuts
This is a good illustration of where compilers can help. 

The general Pow(x,n) function is in reality, Exp(n*Log(x)) and that's a dozen multiplies. That means that specific code for Pow(x, i int), -256 <= i <= 256 will be faster. Notice that the source of Pow is a big switch on special cases, and adding a few more seems ok but obviously incomplete.

On the other hand, if we put exponentiation into a language, say x**2 a la FORTRAN, then the compiler has a chance to look at that 2 and make a rational judgement on how to evaluate it. Of course the preference will be something direct (x*x), as will be the case for x**15:

n*n = n^2
n^2*n = n^3
n^3*n^3 = n^6
n^6*n^6 = n^12
n^12*n^3 = n^15

​This derivation is explained in Knuth's TAOCP, though I'm in a hotel in Miami Beach and can't cite the page.


​Compilers are a great place to do this analysis. Macho compile-time libraries like Boost are an alternative place. The Pow() function is really too late, since the testing costs in every evaluation.​

Even if you special-case Pow(), it will still be slower than just doing x*x, so why not do that?

Michael


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.



--
Michael T. Jones | Chief Technology Advocate  | m...@google.com |  +1 650-335-5765

S.Çağlar Onur

unread,
Feb 26, 2014, 1:23:42 AM2/26/14
to golan...@googlegroups.com
Hi Charlie,

On Wednesday, February 26, 2014 12:43:40 AM UTC-5, Charlie Dorian wrote:
Paraphrasing Michael Jones (in another discussion): Generally, using a general tool (Pow, which works for all real exponents) when all you really want is x**2, is very expensive.

I completely agree with that but that was just an example. I just tried glibc's pow function which is pretty generic too and although it showed some slowness, it was not 10x like in our case. I'll raise an issue and see what will happen :)

Best,

Dave Cheney

unread,
Feb 26, 2014, 1:58:23 AM2/26/14
to "S.Çağlar Onur", golan...@googlegroups.com
Nice issue, good summary. 
--

Charlie Dorian

unread,
Feb 26, 2014, 2:41:14 AM2/26/14
to golan...@googlegroups.com
I guess I'm hoping to make the point that, if you want faster execution, it is worthwhile to think about how to avoid function calls and to minimize the number of operations. Even after you added a special case for Pow(x, 2) your benchmark showed that calling Pow(x, 2) took 1.71 times longer than just x*x.

Konstantin Khomoutov

unread,
Feb 26, 2014, 5:08:36 AM2/26/14
to S.Çağlar Onur, golan...@googlegroups.com
On Tue, 25 Feb 2014 22:23:42 -0800 (PST)
S.Çağlar Onur <cag...@10ur.org> wrote:

> > Paraphrasing Michael Jones (in another discussion): Generally,
> > using a general tool (Pow, which works for all real exponents) when
> > all you really want is x**2, is very expensive.
>
> I completely agree with that but that was just an example. I just
> tried glibc's pow function which is pretty generic too and although
> it showed some slowness, it was not 10x like in our case. I'll raise
> an issue and see what will happen :)

From what I see in the GNU glibc sources, implementations of IEEE754
pow() are coded in plain assembly for each supported architecture (or
at least popular ones, like x86 and amd64). Since Go's toolchain has
assembler you could possibly embark on borrowing this idea for your
arch and translate their assembler implementation to Go's assembler.

In case you're interested, you might look at the files whose names
match the "*pow*.S" pattern under the "sysdeps/$arch/fpu"
directories in the glibc source tree.

Note that if you're about to "mostly copy" the code targeting Go
itself, taking it from glibc is a bad idea as it's virally licensed
under the terms of LGPL and Go's ecosystem prefers BSD-style licensing.

Michael Jones

unread,
Feb 26, 2014, 9:40:57 AM2/26/14
to S.Çağlar Onur, golang-nuts
Function call overhead is non-zero. Doing a floating-point multiply is fast. So several functions calls, each of which has to navigate the switch table (if/else if/else if/...) and then do the multiply, is going to be slower. If your program is dominated by the computation, then yes, 10x is very possible. As has been stated by others, Pow(x,y) is capable of Pow(Pi,E) and all you want is X*X, so just compute X*X. (Depending on the larger algorithm, you might get away without the sqrt too --- if all you need is to order based on sqrt then ordering on the square of that will give you the same order.)


On Wed, Feb 26, 2014 at 12:08 AM, S.Çağlar Onur <cag...@10ur.org> wrote:
Hi Dave,


On Tuesday, February 25, 2014 11:41:57 PM UTC-5, Dave Cheney wrote:
Perhaps you are benchmarking the cost of 5 function calls vs 1 ?

So are you saying the ~10x difference is caused by calling the math.Pow 4 times and hence function call overhead? I'm not trying to imply something but isn't that sounds a lot. In my specific case not using math.Pow decreases my the total runtime from ~5min to well under 3min for a complete graph with 65356 vertices and 2135670690 edges. 

Also, passing "-gcflags -m" to go build shows nothing about the Euclidean function. Is it because go cannot inline non-leaf functions (if my terminology correct)?

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.

Charlie Dorian

unread,
Feb 26, 2014, 6:12:30 PM2/26/14
to golan...@googlegroups.com, S.Çağlar Onur
Actually, there are no fast paths in the special cases of Pow. It may look like there is one for Pow(x, 0.5), but it's not. The IEEE 754 standard says that Sqrt(-0.0) = -0.0, so Pow(-0.0, 0.5) does, too.

I was surprised when I learned that.

On Wednesday, February 26, 2014 12:35:05 AM UTC-5, Dave Cheney wrote:

minux

unread,
Feb 27, 2014, 1:31:55 AM2/27/14
to Charlie Dorian, golang-nuts, S.Çağlar Onur
On Wed, Feb 26, 2014 at 6:12 PM, Charlie Dorian <cldo...@gmail.com> wrote:
Actually, there are no fast paths in the special cases of Pow. It may look like there is one for Pow(x, 0.5), but it's not. The IEEE 754 standard says that Sqrt(-0.0) = -0.0, so Pow(-0.0, 0.5) does, too.
There do exist special cases for Pow(x, 0.5) and Pow(x, -0.5). Am I missing anything?

Rob Pike

unread,
Feb 27, 2014, 1:41:16 AM2/27/14
to minux, Charlie Dorian, golang-nuts, S.Çağlar Onur
I'm surprised there are special cases for 0 and 1. Those seem so
unlikely to be not worth the cost of checking. For 2 and .5, they seem
only useful to those not skilled in the art. (They'll never appear
dynamically, only with a constant exponent.)

-rob

Robert Johnstone

unread,
Feb 27, 2014, 9:14:51 AM2/27/14
to golan...@googlegroups.com
Hello,

There are two additional wrinkles to consider.

First, there is a faster algorithm for computing the power of a float when the power is known to be an integer.  The name escapes me for the moment, but you can find it in many mathematical libraries.

Second, if you add fast-paths to math.Pow to calculate the function using an alternative method, then be aware that the alternative method will not necessarily exhibit the same error.  This could introduce some difficult to solve bugs in floating point code.

For this particular use case, it is not clear to me why using math.Pow is preferable to simply writing x*x.

Michael Jones

unread,
Feb 27, 2014, 9:47:58 AM2/27/14
to Robert Johnstone, golang-nuts

"Binary Powering"

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.

Charlie Dorian

unread,
Feb 27, 2014, 2:37:52 PM2/27/14
to golan...@googlegroups.com, Charlie Dorian, S.Çağlar Onur
Yes. Pow(x, 0.5) [and Pow(x, -0.5)] returns the correct answer when x is 0.0 and when x is -0.0. The answers are equal but they are not identical.

Charlie Dorian

unread,
Feb 27, 2014, 2:40:33 PM2/27/14
to golan...@googlegroups.com, minux, Charlie Dorian, S.Çağlar Onur
The code was written to return correct values for Pow(x, y) for all values of x and y.

S.Çağlar Onur

unread,
Feb 27, 2014, 3:02:24 PM2/27/14
to golan...@googlegroups.com
Hi Micheal,


On Wednesday, February 26, 2014 9:40:57 AM UTC-5, Michael Jones wrote:
Function call overhead is non-zero. Doing a floating-point multiply is fast. So several functions calls, each of which has to navigate the switch table (if/else if/else if/...) and then do the multiply, is going to be slower. If your program is dominated by the computation, then yes, 10x is very possible. As has been stated by others, Pow(x,y) is capable of Pow(Pi,E) and all you want is X*X, so just compute X*X. (Depending on the larger algorithm, you might get away without the sqrt too --- if all you need is to order based on sqrt then ordering on the square of that will give you the same order.)

You don't have to convince me about using x*x because as I said before I agree with that statement. My initial question was about understanding the 10x difference between those two methods. I thought there could be a regression in math.Pow function but further discussion suggested otherwise so I moved on :)

Rob Pike

unread,
Feb 27, 2014, 3:13:00 PM2/27/14
to Charlie Dorian, golan...@googlegroups.com, minux, S.Çağlar Onur
I don't understand what you're saying here as a response to what I
said. Of course the code is written to be correct, but why are those
special cases necessary to make that possible?

-rob
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an

Charlie Dorian

unread,
Feb 27, 2014, 3:51:26 PM2/27/14
to golan...@googlegroups.com, Charlie Dorian, minux, S.Çağlar Onur
Well, for example, Pow(±0, ±0) = 1 (not 0), Pow(NaN, ±0) = 1 (not NaN), and Pow(-0, 1) = -0 (not 0). The IEEE 754 standard defines these values, which have changed from time to time.

Charlie Dorian

unread,
Feb 27, 2014, 4:14:10 PM2/27/14
to golan...@googlegroups.com, Charlie Dorian, minux, S.Çağlar Onur
Another example, Pow(0.999, NaN) = NaN, Pow(1, NaN) = 1, Pow(1.001, NaN) = NaN.

Harald Weidner

unread,
Feb 28, 2014, 9:24:31 AM2/28/14
to golan...@googlegroups.com
Hello,

Robert Johnstone <r.w.jo...@gmail.com>:

>First, there is a faster algorithm for computing the power of a float when
>the power is known to be an integer. The name escapes me for the moment,
>but you can find it in many mathematical libraries.

Exponentiation by squaring, see
http://en.wikipedia.org/wiki/Exponentiation_by_squaring

Also sometimes refered to as Square and Multiply, or Binary Exponentation.

Harald

egon

unread,
Feb 28, 2014, 9:32:17 AM2/28/14
to golan...@googlegroups.com, Charlie Dorian, minux, S.Çağlar Onur


On Thursday, February 27, 2014 10:51:26 PM UTC+2, Charlie Dorian wrote:
Well, for example, Pow(±0, ±0) = 1 (not 0), Pow(NaN, ±0) = 1 (not NaN), and Pow(-0, 1) = -0 (not 0). The IEEE 754 standard defines these values, which have changed from time to time.

This http://play.golang.org/p/GrL2QmJmrj looks very wrong...
I would expect only NaN from an operation involving NaN.

+ egon

Rob Pike

unread,
Feb 28, 2014, 11:29:11 AM2/28/14
to egon, golan...@googlegroups.com, Charlie Dorian, minux, S.Çağlar Onur
Pow(±0, ±0) = 1 seems nuts to me, about as plausible as 1 - 1 + 1 - 1
... equaling 1/2. Yet it does, I am told. I defer to the experts.

-rob

Jesper Louis Andersen

unread,
Feb 28, 2014, 11:41:25 AM2/28/14
to Rob Pike, egon, golan...@googlegroups.com, Charlie Dorian, minux, S.Çağlar Onur

On Fri, Feb 28, 2014 at 5:29 PM, Rob Pike <r...@golang.org> wrote:
Pow(±0, ±0) = 1 seems nuts to me, about as plausible as 1 - 1 + 1 - 1
... equaling 1/2. Yet it does, I am told. I defer to the experts.

Pow(0, 0) is mathematically undefined. But IEEE 754 is not math in the traditional sense, so it may give a definition, however insane that definition is. The series you refer to is called Grandi's Series and the Wikipedia article kind-of explains why that series is problematic:


One possible interpretation is that it is 1/2, but that is not the whole story :)


--
J.

Charlie Dorian

unread,
Feb 28, 2014, 1:05:19 PM2/28/14
to golan...@googlegroups.com, Charlie Dorian, minux, S.Çağlar Onur
There are other math functions that do not propagate a NaN argument. For examples, http://play.golang.org/p/1chu38skNd.
Reply all
Reply to author
Forward
0 new messages