Performance issue when using Pointers ( instead of Values ) of Vectors in functions?


Byron

Nov 12, 2009, 7:02:22 PM
to golang-nuts
I found a simple Go-based raytracer and thought it would be nice to
test its performance. The implementation is said to be about 2 times
slower than the C++ version.

By actually using pointers a little more I managed to increase
performance by 65%. One more source of speed loss would be ( in a C
world ) the use of value types in vector functions: the vectors are
copied into the function and the result is copied out of it.
In C++, one would also have some in-place functions that take
pointers to prevent excessive copying.

To my surprise, when using pointers in a dot-product function, the
performance _decreased_ by a factor of 2! To illustrate my change,
have a look at this diff:

http://gitorious.org/gotracer/gotracer/commit/3e8499814b6f66021025e457a6354c9d14e4a1c2

If the same function uses value types instead of pointers, it's as fast
as before:

http://gitorious.org/gotracer/gotracer/commit/87211c6baec27b003a051b1a04034ad9f528d2cc
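
For readers who cannot open the diffs, the two variants being compared look roughly like the sketch below. This is not the gotracer code: the `Vector` type, its field names, and the function names `DotValue` and `DotPointer` are assumptions made up for illustration.

```go
package main

import "fmt"

// Vector is a hypothetical 3-component vector; the field names are
// assumptions for illustration, not taken from the gotracer source.
type Vector struct {
	X, Y, Z float64
}

// DotValue receives both vectors by value: conceptually, each call
// copies two 24-byte structs into the callee.
func DotValue(a, b Vector) float64 {
	return a.X*b.X + a.Y*b.Y + a.Z*b.Z
}

// DotPointer receives pointers: only two addresses are passed, but
// every field access goes through a dereference, and the caller has
// to take the address of its vectors first.
func DotPointer(a, b *Vector) float64 {
	return a.X*b.X + a.Y*b.Y + a.Z*b.Z
}

func main() {
	a := Vector{1, 2, 3}
	b := Vector{4, 5, 6}
	fmt.Println(DotValue(a, b))     // 32
	fmt.Println(DotPointer(&a, &b)) // 32
}
```

Both functions compute the same result; the only difference is how the arguments reach the callee, which is exactly the part the thread is arguing about.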


If you want to have a look at the source, please feel free to clone
the project on gitorious:

http://gitorious.org/gotracer/gotracer

The branch named http://gitorious.org/gotracer/gotracer/commits/vector_with_pointers_slow
pinpoints the pointer issue; the branch
http://gitorious.org/gotracer/gotracer/commits/vector_with_values_fast
is the fix.

What do you think, is it supposed to be like that? I know some
high-performance C++ code where vectors/Point3 are preferably passed by
reference/pointer to increase performance, hence I consider this to
be possibly a language bug.

Thanks for your opinion on this.

Russ Cox

Nov 12, 2009, 7:50:57 PM
to Byron, golang-nuts
> To my surprise, when using pointers in a dot-product function, the
> performance _decreased_ by a factor of 2! To illustrate my change,
> have a look at this diff:

The most likely explanation is that the processor's memory
is sensitive to cache effects, and you can control those
effects by deciding when to use a pointer and when to
copy the values around.

Part of Go being a systems language is that you get control
over this.
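
One way to check which variant is faster on a given machine and compiler is to measure both. The following is a minimal sketch, not code from the thread: the `Vector` type and both function names are hypothetical, and `//go:noinline` is used only so that the call itself stays part of what is measured. It relies on `testing.Benchmark` from the standard library, which can be called from an ordinary program.

```go
package main

import (
	"fmt"
	"testing"
)

// Vector is a hypothetical 3-component vector used only for this sketch.
type Vector struct{ X, Y, Z float64 }

//go:noinline
func dotValue(a, b Vector) float64 {
	return a.X*b.X + a.Y*b.Y + a.Z*b.Z
}

//go:noinline
func dotPointer(a, b *Vector) float64 {
	return a.X*b.X + a.Y*b.Y + a.Z*b.Z
}

// sink keeps the results alive so the calls are not optimized away.
var sink float64

func main() {
	a, b := Vector{1, 2, 3}, Vector{4, 5, 6}

	byValue := testing.Benchmark(func(bm *testing.B) {
		for i := 0; i < bm.N; i++ {
			sink += dotValue(a, b)
		}
	})
	byPointer := testing.Benchmark(func(bm *testing.B) {
		for i := 0; i < bm.N; i++ {
			sink += dotPointer(&a, &b)
		}
	})

	fmt.Println("by value:  ", byValue)
	fmt.Println("by pointer:", byPointer)
}
```

The numbers depend heavily on the compiler version and the hardware, so no expected output is given here; a micro-benchmark like this also says little about cache behaviour in a full raytracer.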

Russ

Seer

Nov 12, 2009, 7:54:47 PM
to golang-nuts
Byron,

When I compile your Go program with varying sizes of chunkh and
chunkw, I get significant changes in measured performance (2x speedup
for 4x4 and 8x8 blocks vs. the 16x16 from your zip file). Hmmm... Seems
interesting!
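
For context, splitting an image into chunkw x chunkh blocks and rendering each block in its own goroutine could look like the sketch below. Only the names chunkw and chunkh come from the thread; the `Tile` type, the `Tiles` and `RenderAll` functions, the `shade` callback, and the 64x64 image size are assumptions for illustration and are not taken from the gotracer source.

```go
package main

import (
	"fmt"
	"sync"
)

// Tile is a hypothetical pixel rectangle covering [X0,X1) x [Y0,Y1).
type Tile struct{ X0, Y0, X1, Y1 int }

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// Tiles splits a width x height image into chunkw x chunkh blocks.
// Blocks on the right and bottom edges are clipped to the image.
// chunkw and chunkh are assumed to be positive.
func Tiles(width, height, chunkw, chunkh int) []Tile {
	var tiles []Tile
	for y := 0; y < height; y += chunkh {
		for x := 0; x < width; x += chunkw {
			tiles = append(tiles, Tile{x, y, minInt(x+chunkw, width), minInt(y+chunkh, height)})
		}
	}
	return tiles
}

// RenderAll shades every pixel, using one goroutine per tile. The tiles
// do not overlap, so the goroutines write to disjoint slice elements.
func RenderAll(width, height, chunkw, chunkh int, shade func(x, y int) float64) []float64 {
	img := make([]float64, width*height)
	var wg sync.WaitGroup
	for _, t := range Tiles(width, height, chunkw, chunkh) {
		wg.Add(1)
		go func(t Tile) {
			defer wg.Done()
			for y := t.Y0; y < t.Y1; y++ {
				for x := t.X0; x < t.X1; x++ {
					img[y*width+x] = shade(x, y)
				}
			}
		}(t)
	}
	wg.Wait()
	return img
}

func main() {
	// Smaller blocks mean more, shorter tasks for the scheduler.
	for _, c := range []int{4, 8, 16} {
		fmt.Printf("%dx%d blocks: %d tiles\n", c, c, len(Tiles(64, 64, c, c)))
	}
}
```

Smaller blocks give more tasks of less work each, which can change both load balancing and scheduling overhead; whether that helps depends on the machine.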

Antoine Chavasse

Nov 12, 2009, 7:58:44 PM
to Byron, golang-nuts
On Fri, Nov 13, 2009 at 1:02 AM, Byron <byro...@googlemail.com> wrote:

> To my surprise, when using pointers in a dot-product function, the
> performance _decreased_ by a factor of 2! To illustrate my change,
> have a look at this diff:
>
> http://gitorious.org/gotracer/gotracer/commit/3e8499814b6f66021025e457a6354c9d14e4a1c2


It makes a similar speed difference with gccgo as well. From a quick glance at the generated code for the RaySphere function, in the slow version there's a slew of additional code that allocates and copies stuff (two extra calls to __go_new and some code that looks like it initializes a struct with zeros). I don't know the details of what happens there, but it likely has to do with the fact that, since you're passing a pointer, the compiler does some extra work to ensure that the pointed-to value won't be collected.

Byron

Nov 13, 2009, 4:18:54 AM
to golang-nuts
Hmm, I couldn't reproduce your speedup on 64-bit Linux with 6g.
Adjusting the block size didn't have a measurable effect, neither for
4x4 nor for values up to 32x32, no matter which version ( value vs.
pointers ) I used.

To me this probably means that the task-splitting is not related to
the bottleneck.

Byron

Nov 13, 2009, 4:27:40 AM
to golang-nuts
@Russ
Even if some processor caches are involved in this, could it ever
justify a speed loss by a factor of 2? Have you seen Antoine's
discovery about the extra code being inserted in the slow version?
It's very hard for me to believe that passing around pointers and
dereferencing them is that slow, and I never experienced such dramatic
effects when using C++. If anything, the opposite happened: things got
faster when not copying by value all the time.

Regards,
Byron

Byron

Nov 13, 2009, 5:21:54 AM
to golang-nuts
To the master branch of the project, I added the C++ version of the
raytracer, which only runs on one core ( see http://gitorious.org/gotracer/gotracer/commits/master
).

The C++ version finishes in 0.65 seconds whereas Go takes 1.3 seconds
on one core, and 0.8 seconds on two cores. What I actually would like to
see is that Go is just 20 percent slower than C++ as claimed, just to
feel confident about it.

The one major difference is that the vector math operates on
references in C++, but on copies ( due to the pointer issue ) in Go.
An improvement is not possible as long as pointers in Go show this
'slowness', and I hope this project as well as the code give enough
hints to improve that on the implementation side.

Thank you .

Byron

Nov 13, 2009, 5:32:28 AM
to golang-nuts
I ran one more test to eliminate the possibility that all this is
related to goroutines.

In the branch http://gitorious.org/gotracer/gotracer/commits/no_go_routines
I removed the goroutine and re-added the pointer-based dot-product.
This still showed a performance loss, from formerly 1.3 s to 5 seconds.
This example clearly shows that something is terribly wrong on the
language side, as there is simply no excuse for a performance loss of
that dimension ( or is there ? ).

Thanks for looking into this :) .

Antoine Chavasse

Nov 13, 2009, 5:57:31 AM
to Byron, golang-nuts
I should have pasted the generated code somewhere; I'm at work right now so I don't have it at hand. But I looked a bit more into it yesterday and the extra code in the slow version was related to this: http://gcc.gnu.org/viewcvs/branches/gccgo/libgo/runtime/go-refcount.h?view=markup

Basically it seems that there is actually already at least a partial implementation of a garbage collector in gccgo based on reference counting (perhaps the start of an IBM recycler-like implementation?).
So when you take a pointer to a variable, it needs to increment a reference count, and to that effect it allocates a __go_refcount to track all the reference counting done by the function, as well as a __go_refcount_entry for the particular pointer that is having its reference count incremented (which explains the two calls to __go_new).

I've read the IBM paper about their recycler garbage collector (which is what Google aims to implement in Go) and it's based on reference counting, so I believe that there's always going to be an overhead with using pointers because of that. I suppose that the trade-off is that garbage collection itself is pretty fast and supposedly will play well with concurrency.

However, perhaps also the compiler will in time be able to be made clever enough to prove that some reference counting can safely be omitted in some situations, especially when involving only non-exported functions?
One example I can think of is that, unless I'm missing something, taking the address of a variable passed as a function parameter shouldn't require incrementing the reference count, because the caller is already holding it (except if the function is called as a goroutine, but then the necessary reference counting could be generated by the go statement).

Ian Lance Taylor

Nov 13, 2009, 10:49:38 AM
to Antoine Chavasse, Byron, golang-nuts
Antoine Chavasse <a.cha...@gmail.com> writes:

> However, perhaps also the compiler will in time be able to be made clever
> enough to prove that some reference counting can safely be omitted in some
> situations, especially when involving only non-exported functions?

Yes, that is a goal. In fact the current reference counting code in
gccgo is more or less broken. Fortunately it doesn't actually do
anything except waste time.

> One example I can think of is that, unless I'm missing something, taking the
> address of a variable passed as a function parameter shouldn't require
> incrementing the reference count, because the caller is already holding it
> (except if the function is called as a goroutine, but then the necessary
> reference counting could be generated by the go statement).

Yes, assuming the address of the variable does not escape.
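
To make "does not escape" concrete, here is a small sketch of the two cases. It is an illustration only: the `Vector` type, the function names, and the package-level variable `keep` are hypothetical, and the sketch says nothing about how gccgo's reference counting was actually implemented.

```go
package main

import "fmt"

// Vector is a hypothetical 3-component vector used only for this sketch.
type Vector struct{ X, Y, Z float64 }

// lengthSquared takes the address of its parameter but only uses the
// pointer locally. The pointer cannot outlive the call, so the address
// does not escape and no extra bookkeeping is needed in principle.
func lengthSquared(v Vector) float64 {
	p := &v
	return p.X*p.X + p.Y*p.Y + p.Z*p.Z
}

// keep is a package-level variable that outlives any single call.
var keep *Vector

// remember takes the address of its parameter and stores it in keep.
// Here the address does escape: v must stay alive after the function
// returns, so the runtime has to track it.
func remember(v Vector) {
	keep = &v
}

func main() {
	v := Vector{1, 2, 2}
	fmt.Println(lengthSquared(v)) // 9

	remember(v)
	v.X = 100           // changes only the caller's copy
	fmt.Println(keep.X) // 1
}
```

With the current gc toolchain, building with `-gcflags=-m` prints the compiler's escape-analysis decisions, which is a convenient way to see which of the two cases a given function falls into.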

Ian

Byron

Nov 13, 2009, 11:08:48 AM
to golang-nuts
This really wouldn't let me go, so I went back to the C++ version and
removed the "unfair competition" flag ( -O3 ) from the compiler arguments,
which changed the comparison quite a bit. Now the C++ version slowed from
0.65s to 1.5s ( single core ) whereas the Go version takes only 1.25s!
This in fact means Go is faster by default :) !

The pointer weirdness remains an issue though, but it might very well be
that using pointers won't make a big difference anyway. In the
unoptimized C++ version, things became only a tiny bit slower when I
changed the references used in the vector math functions to values
instead.

Okay, back to work, finally :) !