Comparing Go to C/C++: Business Card Raytracer

Karan Misra

Sep 24, 2013, 1:34:09 AM

to golan...@googlegroups.com

Hello,

I recently saw this post on Hacker News: http://fabiensanglard.net/rayTracing_back_of_business_card/index.php

I found the implementation fascinating and wanted to code the same up in Go: https://github.com/kid0m4n/gorays

The C++ version is here: https://gist.github.com/kid0m4n/6680629

Both of them produce the same (or similar looking) PPM image: http://i.imgur.com/yFicPrE.png

I have tried to make the Go version as optimal as possible (single goroutine)

Initial numbers (on my Late 2011 MBP 15, 2.2 GHz Quad Core (2675QM), 16 GB RAM, OS X 10.9, Go 1.1.2):

C++ version: 11.803 s

Go version: 28.883 s

The Go version can be installed using: go get github.com/kid0m4n/gorays

Pull requests are welcome.

Regards,

Karan

Karan Misra

Sep 24, 2013, 1:36:48 AM

to golan...@googlegroups.com

Just so that it is clear: The intention is to get Go as fast as possible. I know this is not Go's intended application right now, but why not?

Karan Misra

Sep 24, 2013, 3:04:46 AM

to golan...@googlegroups.com

I just tried with go 1.2rc1 and the time has improved to 25.644s

So that is definitely good news

Things I have tried in terms of optimizations:

Pointers instead of values for operating on vectors (no improvement)
float32 instead of float64 (no improvement)

Nigel Tao

Sep 24, 2013, 3:45:01 AM

to Karan Misra, golang-nuts

On Tue, Sep 24, 2013 at 5:04 PM, Karan Misra <kid...@gmail.com> wrote:
> Things I have tried in terms of optimizations:
>
> Pointers instead of values for operating on vectors (no improvement)
> float32 instead of float64 (no improvement)

Try pulling the os.Stdout.Write call out of your inner loop.

Robert Melton

Sep 24, 2013, 4:26:36 AM

to Nigel Tao, Karan Misra, golang-nuts

Just swapped it with a buffer, nice little 11% bump on my box.

--
Robert Melton

Sebastien Binet

Sep 24, 2013, 4:54:01 AM

to Robert Melton, Nigel Tao, Karan Misra, golang-nuts

also, w/o looking at any cpuprofile data (so pinch of salt needed),
you might want to remove the use of (the thread-safe) rand.Float64
here:
https://github.com/kid0m4n/gorays/blob/master/main.go#L60
and use a local rand instead.

see:
http://golang.org/src/pkg/math/rand/rand.go?s=4272:4294#L136
http://golang.org/src/pkg/math/rand/rand.go?s=4272:4294#L91
http://golang.org/src/pkg/math/rand/rand.go?s=4272:4294#L179

-s
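
A minimal sketch of the local-generator suggestion above. The helper names (newLocalRand, jitter) and the seed are illustrative assumptions, not code from gorays; rand.New, rand.NewSource and (*rand.Rand).Float64 are the standard math/rand calls. In the Go releases discussed in this thread, the package-level rand.Float64 is safe for concurrent use and pays for that on every call, while a private *rand.Rand does not.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newLocalRand returns a generator owned by the caller. It is NOT safe for
// concurrent use, so each goroutine would need its own instance.
func newLocalRand(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// jitter returns n pseudo-random offsets in [0, 1) drawn from rnd rather
// than from the shared package-level generator.
func jitter(rnd *rand.Rand, n int) []float64 {
	out := make([]float64, n)
	for i := range out {
		out[i] = rnd.Float64()
	}
	return out
}

func main() {
	// The seed value 1 is arbitrary; a fixed seed keeps runs reproducible.
	rnd := newLocalRand(1)
	fmt.Println(jitter(rnd, 3))
}
```

Whether this helps in a given program is best confirmed with a profile, as the post itself cautions.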

Karan Misra

Sep 24, 2013, 6:15:28 AM

to golan...@googlegroups.com, Nigel Tao, Karan Misra

That's weird. I had coded up a buffer-based implementation but had not seen any gains. I will try again.

Karan Misra

Sep 24, 2013, 6:16:49 AM

to golan...@googlegroups.com, Robert Melton, Nigel Tao, Karan Misra

You might be right on the money. I did examine the cpuprofile data and had seen the rand.Float64() highlighted.

Another issue, though: even during the 28 second run, only 40 or so samples were collected. I thought it was one sample every 10 ms?

Karan Misra

Sep 24, 2013, 6:40:57 AM

to golan...@googlegroups.com, Robert Melton, Nigel Tao, Karan Misra

Changed global rand to local rand. New time with go 1.2rc1: 23.816s. Previously: 25.644s. Improvement: 1.828s

Nice catch!

Karan Misra

Sep 24, 2013, 7:15:49 AM

to golan...@googlegroups.com, Nigel Tao, Karan Misra

Swapped with buffer, got some improvement:

Go 1.2rc1

Previous best: 23.816s

After change: 23.429s

Improvement: 0.387s (1.6 %)

Changeset: https://github.com/kid0m4n/gorays/commit/e9c418ec3a77d014ced05bcbd52f38aa3ef7c2af

quarnster

Sep 24, 2013, 7:33:39 AM

to golan...@googlegroups.com

How does it perform with gccgo?

Nigel Tao

Sep 24, 2013, 7:33:28 AM

to Karan Misra, golang-nuts

On Tue, Sep 24, 2013 at 9:15 PM, Karan Misra <kid...@gmail.com> wrote:
> Swapped with buffer, got some improvement:

You write:

buf.Write([]byte{byte(p.X), byte(p.Y), byte(p.Z)})

This is still allocating a byte slice per pixel, and that byte slice
might need to be garbage collected.

You don't need a bytes.Buffer. Just use a []byte directly.

i, buf := 0, make([]byte, 3 * *width * *height)
for y etc {
	for x etc {
		etc
		buf[i+0] = byte(p.X)
		buf[i+1] = byte(p.Y)
		buf[i+2] = byte(p.Z)
		i += 3
	}
}
if _, err := os.Stdout.Write(buf); err != nil {
	etc
}

BTW, you don't need to check n != buf.Len() || err != nil.
http://golang.org/pkg/io/#Writer says that "Write must return a
non-nil error if it returns n < len(p)".
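
A self-contained sketch of the pattern described above: fill one preallocated []byte inside the loops and call Write once at the end. The pixel type and the shade function are hypothetical stand-ins for the thread's vector type and ray tracing work, and the loop bounds are assumptions since the post elides them.

```go
package main

import (
	"fmt"
	"os"
)

// pixel is a stand-in for the thread's vector type; only the three colour
// channels matter here.
type pixel struct{ X, Y, Z float64 }

// shade is a hypothetical placeholder for the per-pixel ray tracing work.
func shade(x, y int) pixel {
	return pixel{X: float64(x % 256), Y: float64(y % 256), Z: 128}
}

// render fills one preallocated slice with 3 bytes per pixel, so the inner
// loop performs no allocation and no I/O.
func render(width, height int) []byte {
	buf := make([]byte, 3*width*height)
	i := 0
	for y := 0; y < height; y++ {
		for x := 0; x < width; x++ {
			p := shade(x, y)
			buf[i+0] = byte(p.X)
			buf[i+1] = byte(p.Y)
			buf[i+2] = byte(p.Z)
			i += 3
		}
	}
	return buf
}

func main() {
	const width, height = 4, 2
	fmt.Printf("P6 %d %d 255 ", width, height) // PPM header
	// io.Writer guarantees a non-nil error on a short write, so checking
	// err alone is sufficient.
	if _, err := os.Stdout.Write(render(width, height)); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
		os.Exit(1)
	}
}
```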

Karan Misra

Sep 24, 2013, 7:45:04 AM

to golan...@googlegroups.com, Karan Misra

Now we are talking:

Go1.2rc1

Previous: 23.816s

After: 22.818s

Improvement: 0.998s (4.2%)

Changeset: https://github.com/kid0m4n/gorays/commit/1d09eac86697d7f50cdf5866fd9a6988f4cf6e84

We are now at about 2x the runtime of the C++ version

Karan Misra

Sep 24, 2013, 7:46:03 AM

to golan...@googlegroups.com

Honestly, have not tried... let me give it a shot though (but it will be slower than these numbers as it will be gccgo 1.1.2)

David Symonds

Sep 24, 2013, 8:01:48 AM

to Karan Misra, golang-nuts

After these trivial changes, what does the CPU profile look like?

If you increase the image resolution to make it run for longer you'll
get a more detailed view of where the time is being spent.
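
For readers who have not collected a CPU profile before, here is a minimal sketch using runtime/pprof. The busyWork function is a hypothetical stand-in for the renderer, and writing into an in-memory buffer is only to keep the example self-contained; a real program would more likely pass a file from os.Create and inspect it with go tool pprof.

```go
package main

import (
	"bytes"
	"fmt"
	"math"
	"runtime/pprof"
)

// busyWork is a hypothetical stand-in for the renderer's hot loop.
func busyWork(n int) float64 {
	sum := 0.0
	for i := 1; i <= n; i++ {
		sum += math.Sqrt(float64(i))
	}
	return sum
}

// profileWork runs work while the CPU profiler is active and returns the
// number of profile bytes that were produced.
func profileWork(work func()) (int, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return 0, err
	}
	work()
	pprof.StopCPUProfile() // flushes the profile into buf
	return buf.Len(), nil
}

func main() {
	n, err := profileWork(func() { busyWork(50000000) })
	if err != nil {
		fmt.Println("could not start profile:", err)
		return
	}
	fmt.Println("profile bytes:", n)
}
```

A longer run simply gives the sampler more chances to fire, which is the point of the advice above.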

Karan Misra

Sep 24, 2013, 8:22:57 AM

to golan...@googlegroups.com, Karan Misra

Definitely going to give it a try now with a larger image (that is the reason I made it configurable :P)

Karan Misra

Sep 24, 2013, 8:32:59 AM

to golan...@googlegroups.com, Karan Misra

Aha, hunch paid off:

Go1.2rc1

Changeset: https://github.com/kid0m4n/gorays/commit/249f229ba8c769c38d7dc018acfdf29cc86d6e43

Before: 22.818s

After: 12.747s

Improvement: 10.1s (44.13%)

Origin version in OP: 28.883s

Improvement since OP: 16.136s (55.9%)

Original C version: 11.803s

Slowness: 0.944s

Slowness to C version: 7.9% (Go version is so close now)

I looked at the tracer function and unrolled the loops (can we call it that?). The same change still needs to be made in the C version, though. Any volunteers?

Yet to try with gccgo

Karan Misra

Sep 24, 2013, 8:55:11 AM

to golan...@googlegroups.com, Karan Misra

Hmm, this is weird... the .prof file generated is a mere 96 bytes long. Need to read up on the CPU profiler to see why it is not sampling correctly. I ran this on a 2000x2000 image.


andrey mirtchovski

Sep 24, 2013, 8:58:19 AM

to Karan Misra, golang-nuts

you're on a mac, right?

http://research.swtch.com/macpprof

Karan Misra

Sep 24, 2013, 9:08:48 AM

to golan...@googlegroups.com, Karan Misra

Yep. I was on the right track then. Running gorays on a dedicated Linux machine atm

Dan Kortschak

Sep 24, 2013, 9:30:17 AM

to Karan Misra, golan...@googlegroups.com, Robert Melton, Nigel Tao

Try replacing your calls to r.Float64() with this:

var seed = ^uint32(0)

func Rand() float64 {
	seed += seed
	seed ^= 1
	if int32(seed) < 0 {
		seed ^= 0x88888eef
	}
	return float64(seed%95)/float64(95)
}

Pilfered with modification from the regexp tests and proposed by Rob. It
knocks off another 5% for me.

Karan Misra

Sep 24, 2013, 10:00:01 AM

to golan...@googlegroups.com, Karan Misra, Robert Melton

Yep, works. But didn't get the 5% improvement though. Will try on a Linux machine and report back

Go1.2rc1

Changeset: https://github.com/kid0m4n/gorays/commit/9066519c24a092b7f672b71327f5c825f84a77a4

Before: 12.747s

After: 12.644s

Improvement: 0.103s (0.8%)

Origin version in OP: 28.883s

Improvement since OP: 16.239s (56.22%)

Original C version: 11.803s

Slowness: 0.841s

Slowness to C++ version: 7.12% (Go version is so close now)

Karan Misra

Sep 24, 2013, 10:03:58 AM

to golan...@googlegroups.com, Karan Misra, Robert Melton

0.3s improvement on Linux

Karan Misra

Sep 24, 2013, 10:17:03 AM

to golan...@googlegroups.com, Karan Misra, Robert Melton

go1.2rc1 pprof:

Total: 10580 samples
5420 51.2% 51.2% 5435 51.4% main.tracer
2672 25.3% 76.5% 10563 99.8% main.main
659 6.2% 82.7% 1347 12.7% math.Pow
444 4.2% 86.9% 683 6.5% github.com/kid0m4n/gorays/vector.Vector.Normalize
419 4.0% 90.9% 7287 68.9% main.sampler
248 2.3% 93.2% 248 2.3% math.Sqrt
200 1.9% 95.1% 308 2.9% math.ldexp
165 1.6% 96.7% 165 1.6% math.modf
142 1.3% 98.0% 174 1.6% math.normalize
134 1.3% 99.3% 200 1.9% math.frexp
32 0.3% 99.6% 32 0.3% math.Abs
13 0.1% 99.7% 13 0.1% math.Ceil
7 0.1% 99.8% 7 0.1% math.Frexp
7 0.1% 99.8% 7 0.1% math.Modf
6 0.1% 99.9% 6 0.1% runtime.newstack

pprof web output: http://i.imgur.com/BCopize.png

Donovan Hide

Sep 24, 2013, 10:28:45 AM

to Karan Misra, golang-nuts, Robert Melton

Hi,

try the --lines flag on pprof to get a bit more detail on main() bottlenecks. Also look at the linux perf tool:

https://perf.wiki.kernel.org/index.php/Main_Page

Cheers,

Donovan.


Karan Misra

Sep 24, 2013, 12:53:48 PM

to golan...@googlegroups.com, Karan Misra, Robert Melton

Thanks for the tip :)

Karan Misra

Sep 24, 2013, 6:25:35 PM

to golan...@googlegroups.com, Karan Misra, Robert Melton

Now that we are so close to C level performance, I tried to see how easy it would be to expand the solution to effectively use all cores on the host machine.

Needless to say, I was not disappointed.

Go1.2rc1

Changeset: https://github.com/kid0m4n/gorays/commit/7420ef3f94be2dd0d1887d98cdbec67a14a07f9f

Before: 12.644s

After: 2.360s

Improvement: 10.284s (81.33%)

Origin version in OP: 28.883s

Improvement since OP: 26.523s (91.83%)

Original C version: 11.803s

Improvement: 9.443s

Improvement to C++ version: 80% (Hurray!)

Must say, Go is proving to be very parallel in this particular use case. Any comments on the parallel version?

https://github.com/kid0m4n/gorays/blob/7420ef3f94be2dd0d1887d98cdbec67a14a07f9f/main.go

Dan Kortschak

Sep 24, 2013, 6:57:20 PM

to Karan Misra, golan...@googlegroups.com, Karan Misra, Robert Melton

Did you try chunking it into n chunks and starting a goroutine for each of the n chunks as needed? This would get rid of your per-row chan communication.
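
A sketch of what the chunking idea could look like, under stated assumptions: renderRow is a hypothetical placeholder for tracing one row, and the image is a flat []byte with one byte per pixel to keep the example short. Each goroutine owns a contiguous block of rows, so nothing is sent over a channel while rendering.

```go
package main

import (
	"fmt"
	"sync"
)

// renderRow is a hypothetical placeholder for tracing one image row; it
// writes only into its own slice of the output.
func renderRow(y, width int, row []byte) {
	for x := 0; x < width; x++ {
		row[x] = byte((x + y) % 256)
	}
}

// renderChunked splits the rows into n contiguous chunks (n >= 1) and
// starts one goroutine per chunk. Chunks never overlap, so no channel or
// lock is needed while rendering.
func renderChunked(width, height, n int) []byte {
	img := make([]byte, width*height)
	var wg sync.WaitGroup
	wg.Add(n)
	for c := 0; c < n; c++ {
		// Integer arithmetic spreads any remainder rows across chunks.
		start, end := c*height/n, (c+1)*height/n
		go func(start, end int) {
			defer wg.Done()
			for y := start; y < end; y++ {
				renderRow(y, width, img[y*width:(y+1)*width])
			}
		}(start, end)
	}
	wg.Wait()
	return img
}

func main() {
	img := renderChunked(8, 6, 4)
	fmt.Println(len(img), img[:8])
}
```

One trade-off worth measuring: rows of a ray-traced scene do not all cost the same, so fixed chunks can leave some goroutines idle while others are still working.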

Nigel Tao

Sep 24, 2013, 7:00:59 PM

to Karan Misra, golang-nuts, Robert Melton

On Wed, Sep 25, 2013 at 8:25 AM, Karan Misra <kid...@gmail.com> wrote:
> Now that we are so close to C level performance,

IIUC, your spheres spell out "Go" but the C code spells out "aek".
Does changing your "art" strings affect your running time?

Wes Freeman

Sep 24, 2013, 7:04:19 PM

to Nigel Tao, Karan Misra, golang-nuts, Robert Melton

I wondered this too.

Kevin Gillette

Sep 24, 2013, 8:37:30 PM

to golan...@googlegroups.com, Karan Misra, Robert Melton

I imagine it would have to. The description of the algorithm is such that if a ray does not encounter a sphere, it falls back to a very quick checkerboard color or sky gradient determination. The scene with the fewest spheres would therefore generate more quickly than the one with the most.

Karan Misra

Sep 24, 2013, 11:24:07 PM

to golan...@googlegroups.com, Karan Misra, Robert Melton

:)

I did: https://github.com/kid0m4n/gorays/blob/parallel/main.go

I tried 3 approaches in fact: https://github.com/kid0m4n/gorays/commits/parallel

But this turned out to be 2 seconds faster on an i7 2600 for a 2000x2000 image. So merged it into master.

Karan Misra

Sep 24, 2013, 11:29:07 PM

to golan...@googlegroups.com, Karan Misra, Robert Melton

The C/C++ version I used to get the 11.803s runtime was modified to generate exactly the same pattern as the Go one: https://gist.github.com/kid0m4n/6680629

(the original source does generate "aek")

However, after this commit it is no longer an apples-to-apples comparison: https://github.com/kid0m4n/gorays/commit/249f229ba8c769c38d7dc018acfdf29cc86d6e43

roger peppe

Sep 25, 2013, 4:50:42 AM

to Karan Misra, golang-nuts, Robert Melton

On 24 September 2013 23:25, Karan Misra <kid...@gmail.com> wrote:
> Must say, Go is proving to be very parallel in this particular usecase. Any
> comments on the parallel version?

I'd structure it slightly differently. There's no need to add to
the WaitGroup in every row.

Something like this, perhaps:
http://play.golang.org/p/3bq226CdPI

I see another ~6% speedup from that.
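
The sketch below is one way to structure the work so that the WaitGroup is incremented once per worker rather than once per row. It is an illustration written for this point, not a copy of the linked playground code; traceRow and the flat one-byte-per-pixel image are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// traceRow is a hypothetical placeholder for rendering one row.
func traceRow(y int, row []byte) {
	for x := range row {
		row[x] = byte((x * y) % 256)
	}
}

// renderRows starts a fixed set of workers that pull row numbers from a
// channel. The WaitGroup is incremented once, for the workers, rather than
// once per row.
func renderRows(width, height, workers int) []byte {
	img := make([]byte, width*height)
	rows := make(chan int)
	var wg sync.WaitGroup
	wg.Add(workers)
	for w := 0; w < workers; w++ {
		go func() {
			defer wg.Done()
			for y := range rows {
				traceRow(y, img[y*width:(y+1)*width])
			}
		}()
	}
	for y := 0; y < height; y++ {
		rows <- y
	}
	close(rows) // lets each worker's range loop finish
	wg.Wait()
	return img
}

func main() {
	img := renderRows(8, 6, 4)
	fmt.Println(len(img), img[8:16])
}
```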

Karan Misra

Sep 25, 2013, 9:09:58 PM

to golan...@googlegroups.com, Karan Misra, Robert Melton

Rog,

This definitely looks to be a better way of doing it: https://github.com/kid0m4n/gorays/commit/ddfe825f0902877c02467a4f65f46c4044bc7939

Thanks for the tip.

-k

Michael Jones

Sep 25, 2013, 10:17:43 PM

to Karan Misra, golang-nuts, Robert Melton

Karan, making this change:

if st == missUpward {
	//return vector.Vector{X: 0.7, Y: 0.6, Z: 1}.Scale(math.Pow(1-dir.Z, 4))
	p := 1 - dir.Z
	p = p * p // p**2
	p = p * p // p**4
	return vector.Vector{X: 0.7, Y: 0.6, Z: 1}.Scale(p)
}

saves 5.4% on runtime. Generally, using a general tool (Pow, which works for all real exponents) when all you really want is x**4, is very expensive.

Michael

(in Kuala Lumpur)
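
A small standalone check of the x**4 point, assuming nothing about gorays beyond the exponent: squaring twice takes two multiplications and should agree with math.Pow(x, 4) to within rounding. The helper names (pow4, closeTo, matchesPow4) are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// pow4 computes x**4 with two multiplications by squaring twice.
func pow4(x float64) float64 {
	x2 := x * x
	return x2 * x2
}

// closeTo reports whether a and b agree to within a relative tolerance.
func closeTo(a, b, tol float64) bool {
	return math.Abs(a-b) <= tol*math.Max(math.Abs(a), math.Abs(b))
}

// matchesPow4 compares the hand-written version against the general
// math.Pow routine for one input.
func matchesPow4(x float64) bool {
	return closeTo(pow4(x), math.Pow(x, 4), 1e-12)
}

func main() {
	x := 0.37
	fmt.Println(pow4(x), math.Pow(x, 4), matchesPow4(x))
}
```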


--

Michael T. Jones | Chief Technology Advocate | m...@google.com | +1 650-335-5765

Karan Misra

Sep 25, 2013, 10:28:59 PM

to golan...@googlegroups.com, Karan Misra, Robert Melton

Thanks Michael. I love how these numbers are adding up :)

Changeset: https://github.com/kid0m4n/gorays/commit/527e08317c9307316e2a7a8e9379cf40778eeaa1

I am summarizing the various changes in a blog post :)

Michael Jones

Sep 25, 2013, 10:30:50 PM

to Karan Misra, golang-nuts, Robert Melton

There is also something strange about width and height. They do not scale in such a way that the same image is rendered at higher resolution. Instead, a MUCH easier image is rendered because the bulk of the image is away from the reflective spheres.

See the two attached images. "Small" was rendered in the default 512x512 size. "Large" was at 2000x2000 then reduced in Photoshop to 512x512. Note the increase in the number of "single ray intersection" cases.

small.jpg

large.jpg

Karan Misra

Sep 25, 2013, 10:35:54 PM

to golan...@googlegroups.com, Karan Misra, Robert Melton

Yep. I have noticed that before. I am working towards a fix which will also scale the objects along with the image size.

Karan Misra

Sep 25, 2013, 10:36:17 PM

to golan...@googlegroups.com, Karan Misra, Robert Melton

Or a pull request :P :)

Michael Jones

Sep 26, 2013, 5:54:25 AM

to Karan Misra, golang-nuts, Robert Melton

Further, you compute an expensive Pow for "p" that you may not need because of an intervening return. Moving it below the if/return saves 5.98% of overall runtime:

if st == missDownward {
	h = h.Scale(0.2)
	fc := vector.Vector{X: 3, Y: 3, Z: 3}
	if int(math.Ceil(h.X)+math.Ceil(h.Y))&1 == 1 {
		fc = vector.Vector{X: 3, Y: 1, Z: 1}
	}
	return fc.Scale(b*0.2 + 0.1)
}

//moving here after possible return saves 5.98% of execution time
p := math.Pow(l.DotProduct(r.Scale(sf)), 99)
return vector.Vector{X: p, Y: p, Z: p}.Add(sampler(h, r).Scale(0.5))

...but that's not all. That Pow() of x**99 is expensive. Doing it explicitly saves an additional 2.33% of execution time.

//computing x**99 explicitly saves a further 2.33% of execution time
//p := math.Pow(l.DotProduct(r.Scale(sf)), 99)
p := l.DotProduct(r.Scale(sf))
p33 := p * p        // p**2
p33 = p33 * p33     // p**4
p33 = p33 * p33     // p**8
p33 = p33 * p33     // p**16
p33 = p33 * p33     // p**32
p33 = p33 * p       // p**33
p = p33 * p33 * p33 // p**99
return vector.Vector{X: p, Y: p, Z: p}.Add(sampler(h, r).Scale(0.5))

Michael

P.S. Note the way I calculate p**99 saves multiplies compared to p64*p32*p2*p. This is why I'd rather see exponentiation in a language. It allows the compiler to optimize the multiply chains.
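
To make the multiply count in the P.S. concrete, here is a standalone sketch of both chains. Going through x**33 and cubing it takes 8 multiplications, while combining x**64, x**32, x**2 and x takes 9. The function names and the tolerance are illustrative assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// pow99 computes x**99 with 8 multiplications: five squarings reach x**32,
// one multiply gives x**33, and two more cube it (33 * 3 = 99).
func pow99(x float64) float64 {
	p := x * x // x**2
	p = p * p  // x**4
	p = p * p  // x**8
	p = p * p  // x**16
	p = p * p  // x**32
	p = p * x  // x**33
	return p * p * p
}

// pow99Binary uses the binary decomposition 99 = 64 + 32 + 2 + 1, which
// needs 6 squarings plus 3 combining multiplies, 9 in total.
func pow99Binary(x float64) float64 {
	p2 := x * x
	p4 := p2 * p2
	p8 := p4 * p4
	p16 := p8 * p8
	p32 := p16 * p16
	p64 := p32 * p32
	return p64 * p32 * p2 * x
}

// relClose reports whether a and b agree to within a relative tolerance.
func relClose(a, b, tol float64) bool {
	return math.Abs(a-b) <= tol*math.Max(math.Abs(a), math.Abs(b))
}

// matchesMathPow compares pow99 with the general-purpose math.Pow.
func matchesMathPow(x float64) bool {
	return relClose(pow99(x), math.Pow(x, 99), 1e-12)
}

func main() {
	x := 0.9
	fmt.Println(pow99(x), pow99Binary(x), math.Pow(x, 99), matchesMathPow(x))
}
```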

kar...@thoughtworks.com

Oct 2, 2013, 6:23:14 AM

to golan...@googlegroups.com, Karan Misra, Robert Melton

Michael,

Thanks for the tip. This and another optimization saved a further 12% execution time, making the single threaded Go version faster than the C++ version.

I have summarized the various steps and approaches taken for everyone's benefit: https://kidoman.com/programming/go-getter.html

Hope it is helpful.

-k

Karan Misra

Oct 2, 2013, 6:35:52 AM

to golan...@googlegroups.com, Karan Misra, Robert Melton

Michael,

Thanks to your tip and one further optimization, a further 12.01 % improvement was made which made the Go version faster than the C++ version in the single-threaded test.

I have summarized this and all other optimizations in this blog post: https://kidoman.com/programming/go-getter.html

Hope it is useful for everyone.

Jan Mercl

Oct 2, 2013, 6:39:28 AM

to Karan Misra, golang-nuts, Robert Melton

On Wed, Oct 2, 2013 at 12:35 PM, Karan Misra <kid...@gmail.com> wrote:

Very good job!

-j

Karan Misra

Oct 2, 2013, 7:38:59 AM

to golan...@googlegroups.com, Karan Misra, Robert Melton

Merci :)

jfcg...@gmail.com

Oct 2, 2013, 9:39:57 AM

to golan...@googlegroups.com, Karan Misra

Hi Karan,

Tiny patch for making it flexible for any size text array to render:

--- main.go Wed Oct 2 15:49:36 2013
+++ main.go Wed Oct 2 15:59:44 2013
@@ -31,11 +31,13 @@
 }
 func makeObjects() []object {
-	objects := make([]object, 0, len(art)*len(art[0]))
-	for k := 18; k >= 0; k-- {
-		for j := 8; j >= 0; j-- {
-			if string(art[j][18-k]) != " " {
-				objects = append(objects, object{k: k, j: 8 - j})
+	nr := len(art)
+	nc := len(art[0])
+	objects := make([]object, 0, nr*nc)
+	for k := nc - 1; k >= 0; k-- {
+		for j := nr - 1; j >= 0; j-- {
+			if art[j][nc-1-k] != ' ' {
+				objects = append(objects, object{k: k, j: nr - 1 - j})
 			}
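
For reference, a self-contained sketch of the function as it would read with the patch applied. The art strings and the object struct here are illustrative assumptions, and unlike the patch this version takes art as a parameter so it can be run on its own. It assumes art is non-empty and every row has the same length.

```go
package main

import "fmt"

// art is a hypothetical sphere layout; any rectangular set of rows works.
var art = []string{
	" **  ",
	"*  * ",
	" **  ",
}

// object mirrors the struct used in the patch: k is the column offset and
// j the row offset of one sphere.
type object struct{ k, j int }

// makeObjects derives the loop bounds from the art itself instead of
// hard-coding 19 columns and 9 rows.
func makeObjects(art []string) []object {
	nr := len(art)
	nc := len(art[0])
	objects := make([]object, 0, nr*nc)
	for k := nc - 1; k >= 0; k-- {
		for j := nr - 1; j >= 0; j-- {
			if art[j][nc-1-k] != ' ' {
				objects = append(objects, object{k: k, j: nr - 1 - j})
			}
		}
	}
	return objects
}

func main() {
	fmt.Println(len(makeObjects(art)), "spheres")
}
```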

Karan Misra

Oct 2, 2013, 2:13:37 PM

to golan...@googlegroups.com, Karan Misra, jfcg...@gmail.com

Cool :) Patched in...

Karan Misra

Oct 2, 2013, 8:42:25 PM

to golan...@googlegroups.com, Karan Misra, jfcg...@gmail.com

Guys,

So I took the plunge and ported over the "optimizations" to C++, and reran the benchmarks:

https://kidoman.com/programming/go-getter-part-2.html

The optimized C++ version: https://github.com/kid0m4n/gorays/blob/master/ref/rays.cpp

The optimization diffs: https://github.com/kid0m4n/gorays/compare/bbb8395aa999883a595267fd0230087b1ddf646c...940c91f601ef840e6d75ddf272ab6cd3eb8d5531

At this point, they are essentially running the same algorithms as well. When rendering a 2048x2048 image, C++ wins by a margin of around 22 seconds.

Is it worth looking at why there is such a gap? (knowns / unknowns) Worth investing time optimizing Go to bridge the gap / lower it?

I know this is not Go's target market (I guess), but why shouldn't it be the fastest m*f* language out there?

-k
(go fanatic)

andrey mirtchovski

Oct 2, 2013, 8:56:16 PM

to Karan Misra, golang-nuts, jfcg...@gmail.com

In my opinion, there's no point in optimizing unless you have a
particular goal in mind. this is not a game you can win on today's
internet. you saw what happened in the reddit thread: negativity and
outright dismissal without even considering that the point was not to
compare two languages, but to learn how to optimize one of them
(that's why the title was worded in a way that avoided the direct
comparison with C++, yet most people still latched only on that).

> I know this is not Go's target market (I guess), but why shouldn't it be the
> fastest m*f* language out there?

to many (most?) of its users, Go is already "the most" programming
language in many categories. simplicity, clarity, and generality are
the key to Go, and I'd be reluctant to give them up for a minute
improvement in a contrived microbenchmark. besides, I think your
program is impressively fast as it is.

Michael Jones

Oct 2, 2013, 9:05:33 PM

to andrey mirtchovski, Karan Misra, golang-nuts, jfcg...@gmail.com

C compilers are very good. At some point you are simply benchmarking compiler (as opposed to language) maturity. One of the first replies to the original post was "Have you tried this with GCCGO?" You might want to do that...


Ugorji Nwoke

Oct 2, 2013, 9:12:41 PM

to golan...@googlegroups.com, Karan Misra, jfcg...@gmail.com

On Wednesday, October 2, 2013 8:42:25 PM UTC-4, Karan Misra wrote:

> Guys,
>
> So I took the plunge and ported over the "optimizations" to C++, and reran the benchmarks:
>
> https://kidoman.com/programming/go-getter-part-2.html

Your graph on the page above seems mislabeled. It looks like Go is fastest.

Karan Misra

Oct 2, 2013, 9:40:43 PM

to Ugorji Nwoke, golan...@googlegroups.com, jfcg...@gmail.com

I checked it again. It looks right.

Go is faster than the unoptimized C++ version and when utilizing > 1 core.


Karan Misra

Oct 2, 2013, 9:43:26 PM

to Michael Jones, andrey mirtchovski, golang-nuts, jfcg...@gmail.com

True. But then, these tests can also act as a litmus test for the maturity level of the Go compiler.

For example, I was looking at which methods were being inlined in the vector package. Definitely saw some weird signals there.

Also, had totally forgotten about gccgo. Will definitely give it a spin.


Ugorji Nwoke

Oct 2, 2013, 9:57:34 PM

to golan...@googlegroups.com, Ugorji Nwoke, jfcg...@gmail.com

Your graph has Go at 21.51, C++ at 58.15, and C++ optimized at 90.46. I'm saying this based on the color coding legend at bottom of the graph. From those numbers, it seems Go is faster than both.

Also, can you explain what each of those bars are? Are the numbers all using one core? What does C++ optimized mean - is it C++ compiled with -O3, or is it the new C++ code written to have logic parity with the latest version of the Go code?

Gustavo Niemeyer

Oct 3, 2013, 1:05:17 PM

to Michael Jones, andrey mirtchovski, Karan Misra, golang-nuts, jfcg...@gmail.com

Not to mention the original code was optimized to be printed on a
business card. A ray tracer on a business card. Hard to beat.

gustavo @ http://niemeyer.net

Kevin Gillette

Oct 3, 2013, 10:31:47 PM

to golan...@googlegroups.com, Michael Jones, andrey mirtchovski, Karan Misra, jfcg...@gmail.com

Agreed. I've no doubt that the original could have been much more highly optimized, but for example, manual loop unrolling would be contrary to the goal of getting the most functional value per unit of code volume; as soon as we expanded the code beyond (un-gofmt'd, minified) business-card size, the comparison to the original is moot.

How efficient and functional can we get a Go raytracer, at an equivalent-to-original text display size, when restricted to the size of a business card?

Karan Misra

Oct 4, 2013, 11:10:12 PM

to golan...@googlegroups.com, Michael Jones, andrey mirtchovski, Karan Misra, jfcg...@gmail.com

That would be a very interesting question to answer... but I guess the property which makes go such a nice language to read/understand will stand against it as well.

I have also rerun the benchmarks again with C++ multi-threading and further optimizations: https://kidoman.com/programming/go-getter-part-3.html

Having a hard time getting a decent run out of gccgo. It is slower by a factor of 5 (gcc 4.8.1) when compared to go 1.2rc1 for the latest version of gorays

Michael Jones

Oct 5, 2013, 12:24:33 AM

to Karan Misra, jfcg...@gmail.com, andrey mirtchovski, golan...@googlegroups.com

Good work, Karan. You've learned much about fitting a tiny program to Go's metaphors. I am concerned about the gccgo performance. It should be 10% better at least. May be a MP structure problem. Would not be surprised since your scaling is sublinear for such an easily parallelizable task.

Karan Misra

Oct 10, 2013, 2:04:03 AM

to golan...@googlegroups.com, Karan Misra, jfcg...@gmail.com, andrey mirtchovski

Let me try to see the scaling using real cores. I had run my tests on a 2600, which has 4 real cores and 4 HT cores. That would definitely affect the scalability.

Michael Jones

Oct 10, 2013, 2:15:17 AM

to Karan Misra, golang-nuts, Serhat Şevki Dinçer, andrey mirtchovski

Try it with 2x the hyperthreaded total:

4 cpus + 4 HT cpus, try GOMAXPROCS=16

That tends to be optimum for me.
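
For anyone wanting to try this from code rather than the environment variable, a minimal sketch using the runtime package. The setProcs helper and its factor parameter are illustrative; runtime.NumCPU counts logical CPUs, so on a machine with 4 cores plus hyperthreading a factor of 2 gives 16.

```go
package main

import (
	"fmt"
	"runtime"
)

// setProcs sets GOMAXPROCS to factor times the number of logical CPUs and
// returns the previous and new settings. factor is assumed to be >= 1.
func setProcs(factor int) (prev, now int) {
	now = factor * runtime.NumCPU()
	prev = runtime.GOMAXPROCS(now)
	return prev, now
}

func main() {
	prev, now := setProcs(2)
	fmt.Printf("GOMAXPROCS: %d -> %d (logical CPUs: %d)\n", prev, now, runtime.NumCPU())
}
```

At the time of this thread (Go 1.2) GOMAXPROCS defaulted to 1 and had to be raised explicitly; since Go 1.5 it defaults to the number of logical CPUs. Whether oversubscribing helps is something to measure per program.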

Karan Misra

Oct 10, 2013, 2:31:34 AM

to Michael Jones, golang-nuts, Serhat Şevki Dinçer, andrey mirtchovski

For such a CPU intensive activity (the entire CPU maxes out currently with GOMAXPROCS=9) would increasing GOMAXPROCS to 16 help?

Regards,
Karan 'kid0m4n' Misra

CEO, Erodov Media Pvt Ltd

Phone: +91 809 555 0069
Website: www.erodov.com

Michael Jones

Oct 10, 2013, 9:34:59 AM

to Karan Misra, golang-nuts, Serhat Şevki Dinçer, andrey mirtchovski

It should not. I knew that would be a contentious email. But it seems to help my programs...

Job van der Zwan

Oct 12, 2013, 5:17:44 AM

to golan...@googlegroups.com, Karan Misra, Serhat Şevki Dinçer, andrey mirtchovski

Any ideas why? Load balancing?

Michael Jones

Oct 12, 2013, 9:53:54 AM

to Job van der Zwan, golang-nuts, Karan Misra, Serhat Şevki Dinçer, andrey mirtchovski

No. Feel free to disregard my anecdotal evidence. Maybe it is just something about my Mac's (OS X's) processor management properties.

Job van der Zwan

Oct 12, 2013, 11:17:27 AM

to golan...@googlegroups.com, Job van der Zwan, Karan Misra, Serhat Şevki Dinçer, andrey mirtchovski

Well, if the difference pops up consistently, wouldn't finding the minimal algorithm that reproduces the quirk be useful feedback for the people working on the scheduler, for example?

atomly

Oct 14, 2013, 12:20:23 PM

to Michael Jones, Karan Misra, golang-nuts, Serhat Şevki Dinçer, andrey mirtchovski

On Thu, Oct 10, 2013 at 9:34 AM, Michael Jones <m...@google.com> wrote:

> I knew that would be a contentious email.

Pardon the pun. :P

:: atomly ::

[ ato...@atomly.com : www.atomly.com : http://blog.atomly.com/ ...
[ atomiq records : new york city : +1.347.692.8661 ...
[ e-mail atomly-new...@atomly.com for atomly info and updates ...
