Comparing Go to C/C++: Business Card Raytracer

Karan Misra

Sep 24, 2013, 1:34:09 AM
to golan...@googlegroups.com
Hello,

I found the implementation fascinating and wanted to code the same up in Go: https://github.com/kid0m4n/gorays
The C++ version is here: https://gist.github.com/kid0m4n/6680629

Both of them produce the same (or similar looking) PPM image: http://i.imgur.com/yFicPrE.png

I have tried to make the Go version as optimal as possible (single goroutine)

Initial numbers (on my Late 2011 MBP 15, 2.2 GHz quad-core (2675QM), 16 GB RAM, OS X 10.9, Go 1.1.2):

C++ version: 11.803 s
Go version: 28.883 s

The Go version can be installed using: go get github.com/kid0m4n/gorays

Pull requests are welcome.

Regards,
Karan

Karan Misra

Sep 24, 2013, 1:36:48 AM
to golan...@googlegroups.com
Just so that it is clear: The intention is to get Go as fast as possible. I know this is not Go's intended application right now, but why not?

Karan Misra

Sep 24, 2013, 3:04:46 AM
to golan...@googlegroups.com
I just tried with go 1.2rc1 and the time has improved to 25.644s

So that is definitely good news

Things I have tried in terms of optimizations:
  1. Pointers instead of values for operating on vectors (no improvement)
  2. float32 instead of float64 (no improvement)

Nigel Tao

Sep 24, 2013, 3:45:01 AM
to Karan Misra, golang-nuts
On Tue, Sep 24, 2013 at 5:04 PM, Karan Misra <kid...@gmail.com> wrote:
> Things I have tried in terms of optimizations:
>
> Pointers instead of values for operating on vectors (no improvement)
> float32 instead of float64 (no improvement)

Try pulling the os.Stdout.Write call out of your inner loop.

Robert Melton

Sep 24, 2013, 4:26:36 AM
to Nigel Tao, Karan Misra, golang-nuts
Just swapped it with a buffer, nice little 11% bump on my box.

--
Robert Melton

Sebastien Binet

Sep 24, 2013, 4:54:01 AM
to Robert Melton, Nigel Tao, Karan Misra, golang-nuts
also, w/o looking at any cpuprofile data (so pinch of salt needed),
you might want to remove the use of (the thread-safe) rand.Float64
here:
https://github.com/kid0m4n/gorays/blob/master/main.go#L60
and use a local rand instead.

see:
http://golang.org/src/pkg/math/rand/rand.go?s=4272:4294#L136
http://golang.org/src/pkg/math/rand/rand.go?s=4272:4294#L91
http://golang.org/src/pkg/math/rand/rand.go?s=4272:4294#L179
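A minimal sketch of the "local rand" idea, under stated assumptions: the package-level rand.Float64 goes through a shared, lock-protected source, while a generator created with rand.New belongs to the caller. The helper names newLocalRand and jitter are made up for this illustration and are not taken from gorays.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newLocalRand returns a generator owned by the caller. It is not safe for
// concurrent use, so a parallel renderer would create one per goroutine.
func newLocalRand(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// jitter draws n samples in [0, 1) from the local generator instead of the
// package-level rand.Float64.
func jitter(rnd *rand.Rand, n int) []float64 {
	out := make([]float64, n)
	for i := range out {
		out[i] = rnd.Float64()
	}
	return out
}

func main() {
	// The seed value is arbitrary; a fixed seed makes runs repeatable.
	rnd := newLocalRand(1)
	fmt.Println(jitter(rnd, 3))
}
```

In a ray tracer the generator would typically be created once outside the pixel loop and passed down to whatever function needs random offsets.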

-s

Karan Misra

Sep 24, 2013, 6:15:28 AM
to golan...@googlegroups.com, Nigel Tao, Karan Misra
That's weird. I had coded up a buffer-based implementation but had not seen any gains. I will try again.

Karan Misra

Sep 24, 2013, 6:16:49 AM
to golan...@googlegroups.com, Robert Melton, Nigel Tao, Karan Misra
You might be right on the money. I did examine the cpuprofile data and had seen the rand.Float64() highlighted.

Another issue though, even during the 28 second run, only 40 or so samples were collected. I thought it was one sample every 10 ms?

Karan Misra

Sep 24, 2013, 6:40:57 AM
to golan...@googlegroups.com, Robert Melton, Nigel Tao, Karan Misra
Changed global rand to local rand. New time with go 1.2rc1: 23.816s (previously 25.644s), an improvement of 1.828s.

Nice catch!

Karan Misra

Sep 24, 2013, 7:15:49 AM
to golan...@googlegroups.com, Nigel Tao, Karan Misra
Swapped with buffer, got some improvement:

Go 1.2rc1
Previous best: 23.816s
After change: 23.429s
Improvement: 0.387s (1.6%)

quarnster

Sep 24, 2013, 7:33:39 AM
to golan...@googlegroups.com
How does it perform with gccgo?

Nigel Tao

Sep 24, 2013, 7:33:28 AM
to Karan Misra, golang-nuts
On Tue, Sep 24, 2013 at 9:15 PM, Karan Misra <kid...@gmail.com> wrote:
> Swapped with buffer, got some improvement:

You write:

buf.Write([]byte{byte(p.X), byte(p.Y), byte(p.Z)})

This is still allocating a byte slice per pixel, and that byte slice
might need to be garbage collected.

You don't need a bytes.Buffer. Just use a []byte directly.

i, buf := 0, make([]byte, 3 * *width * *height)
for y etc {
    for x etc {
        etc
        buf[i+0] = byte(p.X)
        buf[i+1] = byte(p.Y)
        buf[i+2] = byte(p.Z)
        i += 3
    }
}
if _, err := os.Stdout.Write(buf); err != nil {
    etc
}

BTW, you don't need to check n != buf.Len() || err != nil.
http://golang.org/pkg/io/#Writer says that "Write must return a
non-nil error if it returns n < len(p)".
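The preallocated []byte approach above can be sketched as a small runnable program. This is an illustration only: shade is a hypothetical stand-in for the per-pixel tracing work, and the "P6 width height 255" header is the usual binary PPM preamble rather than something quoted from gorays.

```go
package main

import (
	"fmt"
	"os"
)

// shade is a hypothetical placeholder for tracing one pixel.
func shade(x, y int) (r, g, b byte) {
	return byte(x), byte(y), byte(x ^ y)
}

// pixels fills one preallocated slice with 3 bytes per pixel, so the inner
// loop performs no allocation and no I/O.
func pixels(width, height int) []byte {
	buf := make([]byte, 3*width*height)
	i := 0
	for y := 0; y < height; y++ {
		for x := 0; x < width; x++ {
			buf[i], buf[i+1], buf[i+2] = shade(x, y)
			i += 3
		}
	}
	return buf
}

func main() {
	const width, height = 4, 4
	fmt.Printf("P6 %d %d 255 ", width, height)
	// A single Write for the whole image; checking err alone is enough.
	if _, err := os.Stdout.Write(pixels(width, height)); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The trade-off is memory: the whole image sits in RAM until the final write, which is 3 bytes per pixel.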

Karan Misra

Sep 24, 2013, 7:45:04 AM
to golan...@googlegroups.com, Karan Misra
Now we are talking:

Go1.2rc1
Previous: 23.816s
After: 22.818s
Improvement: 0.998s (4.2%)

We are now at about 2x the running time of the C++ version

Karan Misra

Sep 24, 2013, 7:46:03 AM
to golan...@googlegroups.com
Honestly, have not tried... let me give it a shot though (but it will be slower than these numbers as it will be gccgo 1.1.2)

David Symonds

Sep 24, 2013, 8:01:48 AM
to Karan Misra, golang-nuts
After these trivial changes, what does the CPU profile look like?

If you increase the image resolution to make it run for longer you'll
get a more detailed view of where the time is being spent.
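For reference, a minimal sketch of collecting a CPU profile with the standard runtime/pprof package. The profiler samples at roughly 100 Hz, so a run of a few seconds yields only a few hundred samples, which is why a longer run gives a more detailed picture. busyWork and profileTo are hypothetical names used only for this example; the output file name is arbitrary.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"runtime/pprof"
)

// busyWork is a hypothetical stand-in for the render loop.
func busyWork(n int) float64 {
	sum := 0.0
	for i := 1; i <= n; i++ {
		sum += math.Sqrt(float64(i))
	}
	return sum
}

// profileTo runs work while a CPU profile is written to path.
func profileTo(path string, work func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	// Deferred calls run last-in first-out: the profile is stopped and
	// flushed before the file is closed.
	defer pprof.StopCPUProfile()
	work()
	return nil
}

func main() {
	if err := profileTo("cpu.prof", func() { busyWork(200000000) }); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("profile written to cpu.prof")
}
```

The resulting file can then be opened with go tool pprof together with the binary that produced it.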

Karan Misra

Sep 24, 2013, 8:22:57 AM
to golan...@googlegroups.com, Karan Misra
Definitely going to give it a try now with a larger image (that is the reason I made it configurable :P)

Karan Misra

Sep 24, 2013, 8:32:59 AM
to golan...@googlegroups.com, Karan Misra
Aha, hunch paid off:

Go1.2rc1
Before: 22.818s
After: 12.747s
Improvement: 10.1s (44.13%)
Original version in OP: 28.883s
Improvement since OP: 16.136s (55.9%)
Original C version: 11.803s
Slowness: 0.944s
Slowness to C version: 7.9% (Go version is so close now)

I looked at the tracer function and unrolled the loops (can we call it that?) Need to do the same change in the C version though. Any volunteers?
Yet to try with gccgo

Karan Misra

Sep 24, 2013, 8:55:11 AM
to golan...@googlegroups.com, Karan Misra
Hmm, this is weird... the .prof file generated is a mere 96 bytes long. Need to read up on the CPU profiler to see why it is not sampling correctly. I ran this on a 2000x2000 image.


On Tuesday, September 24, 2013 5:31:48 PM UTC+5:30, David Symonds wrote:

andrey mirtchovski

Sep 24, 2013, 8:58:19 AM
to Karan Misra, golang-nuts
you're on a mac, right?

http://research.swtch.com/macpprof

Karan Misra

Sep 24, 2013, 9:08:48 AM
to golan...@googlegroups.com, Karan Misra
Yep. I was on the right track then. Running gorays on a dedicated Linux machine atm

Dan Kortschak

Sep 24, 2013, 9:30:17 AM
to Karan Misra, golan...@googlegroups.com, Robert Melton, Nigel Tao
Try replacing your calls to r.Float64() with this:

var seed = ^uint32(0)

func Rand() float64 {
    seed += seed
    seed ^= 1
    if int32(seed) < 0 {
        seed ^= 0x88888eef
    }
    return float64(seed%95) / float64(95)
}

Pilfered with modification from the regexp tests and proposed by Rob. It
knocks off another 5% for me.

Karan Misra

Sep 24, 2013, 10:00:01 AM
to golan...@googlegroups.com, Karan Misra, Robert Melton
Yep, works. But didn't get the 5% improvement though. Will try on a Linux machine and report back

Go1.2rc1
Before: 12.747s
After: 12.644s
Improvement: 0.103s (0.8%)
Original version in OP: 28.883s
Improvement since OP: 16.239s (56.22%)
Original C version: 11.803s
Slowness: 0.841s
Slowness to C++ version: 7.12% (Go version is so close now)

Karan Misra

Sep 24, 2013, 10:03:58 AM
to golan...@googlegroups.com, Karan Misra, Robert Melton
0.3s improvement on Linux

Karan Misra

Sep 24, 2013, 10:17:03 AM
to golan...@googlegroups.com, Karan Misra, Robert Melton
go1.2rc1 pprof:

Total: 10580 samples
    5420  51.2%  51.2%     5435  51.4% main.tracer
    2672  25.3%  76.5%    10563  99.8% main.main
     659   6.2%  82.7%     1347  12.7% math.Pow
     444   4.2%  86.9%      683   6.5% github.com/kid0m4n/gorays/vector.Vector.Normalize
     419   4.0%  90.9%     7287  68.9% main.sampler
     248   2.3%  93.2%      248   2.3% math.Sqrt
     200   1.9%  95.1%      308   2.9% math.ldexp
     165   1.6%  96.7%      165   1.6% math.modf
     142   1.3%  98.0%      174   1.6% math.normalize
     134   1.3%  99.3%      200   1.9% math.frexp
      32   0.3%  99.6%       32   0.3% math.Abs
      13   0.1%  99.7%       13   0.1% math.Ceil
       7   0.1%  99.8%        7   0.1% math.Frexp
       7   0.1%  99.8%        7   0.1% math.Modf
       6   0.1%  99.9%        6   0.1% runtime.newstack

Donovan Hide

Sep 24, 2013, 10:28:45 AM
to Karan Misra, golang-nuts, Robert Melton
Hi,

try the --lines flag on pprof to get a bit more detail on main() bottlenecks. Also look at the linux perf tool:


Cheers,
Donovan.



Karan Misra

Sep 24, 2013, 12:53:48 PM
to golan...@googlegroups.com, Karan Misra, Robert Melton
Thanks for the tip :)

Karan Misra

Sep 24, 2013, 6:25:35 PM
to golan...@googlegroups.com, Karan Misra, Robert Melton
Now that we are so close to C level performance, I tried to see how easy it would be to expand the solution to effectively use all cores on the host machine.

Needless to say, I was not disappointed.

Go1.2rc1
Before: 12.644s
After: 2.360s
Improvement: 10.284s (81.33%)
Original version in OP: 28.883s
Improvement since OP: 26.523s (91.83%)
Original C version: 11.803s
Improvement: 9.443s
Improvement to C++ version: 80% (Hurray!)

Must say, Go is proving to be very parallel in this particular use case. Any comments on the parallel version?

Dan Kortschak

Sep 24, 2013, 6:57:20 PM
to Karan Misra, golan...@googlegroups.com, Karan Misra, Robert Melton
Did you try chunking it into n chunks and starting a goroutine for each of the n chunks as needed? This would get rid of your per-row chan communication.
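A rough sketch of the chunking idea, assuming the image is a flat byte slice and each goroutine owns a contiguous band of rows. renderRow and renderChunked are hypothetical names; this is not the code from the gorays repository.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// renderRow is a hypothetical stand-in for tracing one image row.
func renderRow(y, width int, row []byte) {
	for x := 0; x < width; x++ {
		row[x] = byte(x + y)
	}
}

// renderChunked splits the rows into n contiguous bands and starts one
// goroutine per band. Each goroutine writes only to its own part of img,
// so no channel traffic is needed while rendering.
func renderChunked(width, height, n int) []byte {
	img := make([]byte, width*height)
	var wg sync.WaitGroup
	wg.Add(n)
	for c := 0; c < n; c++ {
		lo, hi := c*height/n, (c+1)*height/n
		go func(lo, hi int) {
			defer wg.Done()
			for y := lo; y < hi; y++ {
				renderRow(y, width, img[y*width:(y+1)*width])
			}
		}(lo, hi)
	}
	wg.Wait()
	return img
}

func main() {
	img := renderChunked(8, 8, runtime.NumCPU())
	fmt.Println(len(img), "bytes rendered")
}
```

One caveat with equal-sized bands: rows that hit many spheres cost more than rows of sky, so some goroutines may finish well before others.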

Nigel Tao

Sep 24, 2013, 7:00:59 PM
to Karan Misra, golang-nuts, Robert Melton
On Wed, Sep 25, 2013 at 8:25 AM, Karan Misra <kid...@gmail.com> wrote:
> Now that we are so close to C level performance,

IIUC, your spheres spell out "Go" but the C code spells out "aek".
Does changing your "art" strings affect your running time?

Wes Freeman

Sep 24, 2013, 7:04:19 PM
to Nigel Tao, Karan Misra, golang-nuts, Robert Melton
I wondered this too.

Kevin Gillette

Sep 24, 2013, 8:37:30 PM
to golan...@googlegroups.com, Karan Misra, Robert Melton
I imagine it would have to. The description of the algorithm is such that if a ray does not encounter a sphere, it falls back to a very quick checkerboard color or sky gradient determination. The scene with the fewest spheres would therefore generate more quickly than the one with the most.

Karan Misra

Sep 24, 2013, 11:24:07 PM
to golan...@googlegroups.com, Karan Misra, Robert Melton
:)


But this turned out to be 2 seconds faster on an i7 2600 for a 2000x2000 image. So merged it into master.

Karan Misra

Sep 24, 2013, 11:29:07 PM
to golan...@googlegroups.com, Karan Misra, Robert Melton
The C/C++ version I used to get the 11.803s runtime was modified to generate exactly the same pattern as the Go one: https://gist.github.com/kid0m4n/6680629
(while the original source does generate aek)

it is no longer an apples to apples comparison.

roger peppe

Sep 25, 2013, 4:50:42 AM
to Karan Misra, golang-nuts, Robert Melton
On 24 September 2013 23:25, Karan Misra <kid...@gmail.com> wrote:
> Must say, Go is proving to be very parallel in this particular usecase. Any
> comments on the parallel version?

I'd structure it slightly differently. There's no need to add to
the WaitGroup in every row.

Something like this, perhaps:
http://play.golang.org/p/3bq226CdPI

I see another ~6% speedup from that.
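One possible shape for this, sketched under assumptions and not necessarily what the playground link contains: a fixed set of workers pulls row numbers from a channel, and the WaitGroup is incremented once for the workers rather than once per row. traceRow and renderWorkers are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// traceRow is a hypothetical stand-in for tracing one image row.
func traceRow(y, width int, row []byte) {
	for x := 0; x < width; x++ {
		row[x] = byte(x * y)
	}
}

// renderWorkers starts a fixed number of workers that receive row indices
// from a channel. The WaitGroup counts workers, not rows.
func renderWorkers(width, height, workers int) []byte {
	if workers < 1 {
		workers = 1 // with no receivers the sends below would block forever
	}
	img := make([]byte, width*height)
	rows := make(chan int)
	var wg sync.WaitGroup
	wg.Add(workers)
	for w := 0; w < workers; w++ {
		go func() {
			defer wg.Done()
			for y := range rows {
				traceRow(y, width, img[y*width:(y+1)*width])
			}
		}()
	}
	for y := 0; y < height; y++ {
		rows <- y
	}
	close(rows) // lets each worker's range loop finish
	wg.Wait()
	return img
}

func main() {
	img := renderWorkers(8, 8, runtime.NumCPU())
	fmt.Println(len(img), "bytes rendered")
}
```

Handing out rows one at a time keeps the workers evenly loaded even when some rows are much more expensive than others, at the cost of one channel operation per row.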

Karan Misra

Sep 25, 2013, 9:09:58 PM
to golan...@googlegroups.com, Karan Misra, Robert Melton
Rog,

This definitely looks to be a better way of doing it: https://github.com/kid0m4n/gorays/commit/ddfe825f0902877c02467a4f65f46c4044bc7939

Thanks for the tip.

-k

Michael Jones

Sep 25, 2013, 10:17:43 PM
to Karan Misra, golang-nuts, Robert Melton
Karan, making this change:

if st == missUpward {
    //return vector.Vector{X: 0.7, Y: 0.6, Z: 1}.Scale(math.Pow(1-dir.Z, 4))
    p := 1 - dir.Z
    p = p * p
    p = p * p
    return vector.Vector{X: 0.7, Y: 0.6, Z: 1}.Scale(p)
}

saves 5.4% on runtime. Generally, using a general tool (Pow, which works for all real exponents) when all you really want is x**4, is very expensive.

Michael
(in Kuala Lumpur)





--
Michael T. Jones | Chief Technology Advocate  | m...@google.com |  +1 650-335-5765

Karan Misra

Sep 25, 2013, 10:28:59 PM
to golan...@googlegroups.com, Karan Misra, Robert Melton
Thanks Michael. I love how these numbers are adding up :)


I am summarizing the various changes in a blog post :)

Michael Jones

Sep 25, 2013, 10:30:50 PM
to Karan Misra, golang-nuts, Robert Melton
There is also something strange about width and height. They do not scale in such a way that the same image is rendered at higher resolution. Instead, a MUCH easier image is rendered because the bulk of the image is away from the reflective spheres.

See the two attached images. "Small" was rendered in the default 512x512 size. "Large" was at 2000x2000 then reduced in Photoshop to 512x512. Note the increase in the number of "single ray intersection" cases.
small.jpg
large.jpg

Karan Misra

Sep 25, 2013, 10:35:54 PM
to golan...@googlegroups.com, Karan Misra, Robert Melton
Yep. I have noticed that before. I am working towards a fix which will also scale the objects along with the image size.

Karan Misra

Sep 25, 2013, 10:36:17 PM
to golan...@googlegroups.com, Karan Misra, Robert Melton
Or a pull request :P :)

Michael Jones

Sep 26, 2013, 5:54:25 AM
to Karan Misra, golang-nuts, Robert Melton
Further, you compute an expensive Pow for "p" that you may not need because of an intervening return. Moving it below the if/return saves 5.98% of overall runtime:

if st == missDownward {
    h = h.Scale(0.2)
    fc := vector.Vector{X: 3, Y: 3, Z: 3}
    if int(math.Ceil(h.X)+math.Ceil(h.Y))&1 == 1 {
        fc = vector.Vector{X: 3, Y: 1, Z: 1}
    }
    return fc.Scale(b*0.2 + 0.1)
}

// moving here after possible return saves 5.98% of execution time
p := math.Pow(l.DotProduct(r.Scale(sf)), 99)
return vector.Vector{X: p, Y: p, Z: p}.Add(sampler(h, r).Scale(0.5))

...but that's not all. That Pow() of x**99 is expensive. Doing it explicitly saves an additional 2.33% of execution time.

// computing x**99 explicitly saves a further 2.33% of execution time
// p := math.Pow(l.DotProduct(r.Scale(sf)), 99)
p := l.DotProduct(r.Scale(sf))
p33 := p * p        // p**2
p33 = p33 * p33     // p**4
p33 = p33 * p33     // p**8
p33 = p33 * p33     // p**16
p33 = p33 * p33     // p**32
p33 = p33 * p       // p**33
p = p33 * p33 * p33 // p**99
return vector.Vector{X: p, Y: p, Z: p}.Add(sampler(h, r).Scale(0.5))

Michael

P.S. Note the way I calculate p**99 saves multiplies compared to p64*p32*p2*p. This is why I'd rather see exponentiation in a language. It allows the compiler to optimize the multiply chains.
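The two fixed-exponent tricks from this thread can be checked in isolation with a small standalone program. pow4 and pow99 are illustrative names; the multiplication chain follows the one shown above (x**33 cubed, eight multiplications in total, against nine for the x**64 * x**32 * x**2 * x route once the squarings are counted).

```go
package main

import (
	"fmt"
	"math"
)

// pow4 computes x**4 with two multiplications.
func pow4(x float64) float64 {
	x *= x
	return x * x
}

// pow99 computes x**99 as (x**33)**3, using eight multiplications.
func pow99(x float64) float64 {
	p := x * x // x**2
	p *= p     // x**4
	p *= p     // x**8
	p *= p     // x**16
	p *= p     // x**32
	p *= x     // x**33
	return p * p * p // x**99
}

func main() {
	x := 0.98 // arbitrary sample value
	fmt.Println(pow4(x), math.Pow(x, 4))
	fmt.Println(pow99(x), math.Pow(x, 99))
}
```

The results can differ from math.Pow in the last few bits because the rounding happens at different points, which is usually irrelevant for shading but worth knowing before comparing images byte for byte.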

kar...@thoughtworks.com

Oct 2, 2013, 6:23:14 AM
to golan...@googlegroups.com, Karan Misra, Robert Melton
Michael,

Thanks for the tip. This and another optimization saved a further 12% execution time, making the single threaded Go version faster than the C++ version.

I have summarized the various steps and approaches taken for everyone's benefit: https://kidoman.com/programming/go-getter.html

Hope it is helpful.

-k

Karan Misra

Oct 2, 2013, 6:35:52 AM
to golan...@googlegroups.com, Karan Misra, Robert Melton
Michael,

Thanks to your tip and one further optimization, a further 12.01% improvement was made which made the Go version faster than the C++ version in the single-threaded test.

I have summarized this and all other optimizations in this blog post: https://kidoman.com/programming/go-getter.html



Hope it is useful for everyone.

Jan Mercl

Oct 2, 2013, 6:39:28 AM
to Karan Misra, golang-nuts, Robert Melton
On Wed, Oct 2, 2013 at 12:35 PM, Karan Misra <kid...@gmail.com> wrote:

Very good job!

-j

Karan Misra

Oct 2, 2013, 7:38:59 AM
to golan...@googlegroups.com, Karan Misra, Robert Melton
Merci :)

jfcg...@gmail.com

Oct 2, 2013, 9:39:57 AM
to golan...@googlegroups.com, Karan Misra
Hi Karan,

Tiny patch for making it flexible for any size text array to render:

--- main.go Wed Oct  2 15:49:36 2013
+++ main.go Wed Oct  2 15:59:44 2013
@@ -31,11 +31,13 @@
 }
 
 func makeObjects() []object {
- objects := make([]object, 0, len(art)*len(art[0]))
- for k := 18; k >= 0; k-- {
- for j := 8; j >= 0; j-- {
- if string(art[j][18-k]) != " " {
- objects = append(objects, object{k: k, j: 8 - j})
+ nr := len(art)
+ nc := len(art[0])
+ objects := make([]object, 0, nr*nc)
+ for k := nc - 1; k >= 0; k-- {
+ for j := nr - 1; j >= 0; j-- {
+ if art[j][nc-1-k] != ' ' {
+ objects = append(objects, object{k: k, j: nr - 1 - j})
  }
  }
  }
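The patched loop can be tried on its own with a small runnable sketch. Two things here are assumptions made for the illustration: the art is passed in as a parameter instead of being read from a package-level variable, and the two-row pattern is invented. As in the patch, every row is expected to be as long as the first one.

```go
package main

import "fmt"

// object mirrors the two fields used by the patch above.
type object struct{ k, j int }

// makeObjects derives the loop bounds from the art itself, so any
// rectangular pattern works. Rows shorter than art[0] would cause an
// index-out-of-range panic.
func makeObjects(art []string) []object {
	nr := len(art)
	if nr == 0 {
		return nil
	}
	nc := len(art[0])
	objects := make([]object, 0, nr*nc)
	for k := nc - 1; k >= 0; k-- {
		for j := nr - 1; j >= 0; j-- {
			if art[j][nc-1-k] != ' ' {
				objects = append(objects, object{k: k, j: nr - 1 - j})
			}
		}
	}
	return objects
}

func main() {
	// A hypothetical 2x3 pattern; the real program uses a larger one.
	art := []string{
		"* *",
		" * ",
	}
	fmt.Println(makeObjects(art))
}
```

Comparing a byte against ' ' directly, as the patch does, also avoids building a one-character string for every cell.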

Karan Misra

Oct 2, 2013, 2:13:37 PM
to golan...@googlegroups.com, Karan Misra, jfcg...@gmail.com
Cool :) Patched in...

Karan Misra

Oct 2, 2013, 8:42:25 PM
to golan...@googlegroups.com, Karan Misra, jfcg...@gmail.com
Guys,

So I took the plunge and ported over the "optimizations" to C++, and reran the benchmarks:



At this point, they are essentially running the same algorithms as well. When rendering a 2048x2048 image, C++ wins by a margin of around 22 seconds.
Is it worth looking at why there is such a gap? (knowns / unknowns) Worth investing time optimizing Go to bridge the gap / lower it?

I know this is not Go's target market (I guess), but why shouldn't it be the fastest m*f* language out there?

-k
(go fanatic)

andrey mirtchovski

Oct 2, 2013, 8:56:16 PM
to Karan Misra, golang-nuts, jfcg...@gmail.com
In my opinion, there's no point in optimizing unless you have a
particular goal in mind. this is not a game you can win on today's
internet. you saw what happened in the reddit thread: negativity and
outright dismissal without even considering that the point was not to
compare two languages, but to learn how to optimize one of them
(that's why the title was worded in a way that avoided the direct
comparison with C++, yet most people still latched only on that).

> I know this is not Go's target market (I guess), but why shouldn't it be the
> fastest m*f* language out there?

to many (most?) of its users, Go is already "the most" programming
language in many categories. simplicity, clarity, and generality are
the key to Go, and I'd be reluctant to give them up for a minute
improvement in a contrived microbenchmark. besides, I think your
program is impressively fast as it is.

Michael Jones

Oct 2, 2013, 9:05:33 PM
to andrey mirtchovski, Karan Misra, golang-nuts, jfcg...@gmail.com
C compilers are very good. At some point you are simply benchmarking compiler (as opposed to language) maturity. One of the first replies to the original post was "Have you tried this with GCCGO?" You might want to do that...



Ugorji Nwoke

Oct 2, 2013, 9:12:41 PM
to golan...@googlegroups.com, Karan Misra, jfcg...@gmail.com


On Wednesday, October 2, 2013 8:42:25 PM UTC-4, Karan Misra wrote:
> Guys,
>
> So I took the plunge and ported over the "optimizations" to C++, and reran the benchmarks:



Your graph on the page above seems mislabeled. It looks like Go is fastest.  

Karan Misra

Oct 2, 2013, 9:40:43 PM
to Ugorji Nwoke, golan...@googlegroups.com, jfcg...@gmail.com
I checked it again. It looks right. 

Go is faster than the unoptimized C++ version and when utilizing > 1 core. 

Sent from my iPhone

Karan Misra

Oct 2, 2013, 9:43:26 PM
to Michael Jones, andrey mirtchovski, golang-nuts, jfcg...@gmail.com
True. But then, these tests can also act as a litmus test for the maturity level of the Go compiler. 

For example, I was looking at which methods were being inlined in the vector package. Definitely saw some weird signals there.

Also, had totally forgotten about gccgo. Will definitely give it a spin. 

Sent from my iPhone

Ugorji Nwoke

Oct 2, 2013, 9:57:34 PM
to golan...@googlegroups.com, Ugorji Nwoke, jfcg...@gmail.com
Your graph has Go at 21.51, C++ at 58.15, and C++ optimized at 90.46. I'm saying this based on the color coding legend at bottom of the graph. From those numbers, it seems Go is faster than both.

Also, can you explain what each of those bars are? Are the numbers all using one core? What does C++ optimized mean - is it C++ compiled with -O3, or is it the new C++ code written to have logic parity with the latest version of the Go code?

Gustavo Niemeyer

Oct 3, 2013, 1:05:17 PM
to Michael Jones, andrey mirtchovski, Karan Misra, golang-nuts, jfcg...@gmail.com
Not to mention the original code was optimized to be printed on a
business card. A ray tracer on a business card. Hard to beat.
gustavo @ http://niemeyer.net

Kevin Gillette

Oct 3, 2013, 10:31:47 PM
to golan...@googlegroups.com, Michael Jones, andrey mirtchovski, Karan Misra, jfcg...@gmail.com
Agreed. I've no doubt that the original could have been much more highly optimized, but for example, manual loop unrolling would be contrary to the goal of getting the most functional value per unit of code volume; as soon as we expanded the code beyond (un-gofmt'd, minified) business-card size, the comparison to the original is moot.

How efficient and functional can we get a Go raytracer, at an equivalent-to-original text display size, when restricted to the size of a business card?

Karan Misra

Oct 4, 2013, 11:10:12 PM
to golan...@googlegroups.com, Michael Jones, andrey mirtchovski, Karan Misra, jfcg...@gmail.com
That would be a very interesting question to answer... but I guess the property which makes go such a nice language to read/understand will stand against it as well.

I have also rerun the benchmarks again with C++ multi-threading and further optimizations: https://kidoman.com/programming/go-getter-part-3.html

Having a hard time getting a decent run out of gccgo. It is slower by a factor of 5 (gcc 4.8.1) when compared to go 1.2rc1 for the latest version of gorays

Michael Jones

Oct 5, 2013, 12:24:33 AM
to Karan Misra, jfcg...@gmail.com, andrey mirtchovski, golan...@googlegroups.com

Good work, Karan. You've learned much about fitting a tiny program to Go's metaphors. I am concerned about the gccgo performance. It should be 10% better at least. May be an MP structure problem. Would not be surprised since your scaling is sublinear for such an easily parallelizable task.

Karan Misra

Oct 10, 2013, 2:04:03 AM
to golan...@googlegroups.com, Karan Misra, jfcg...@gmail.com, andrey mirtchovski
Let me try to see the scaling using real cores. I had run my tests on a 2600, which has 4 real cores and 4 HT cores. That would definitely affect the scalability.

Michael Jones

Oct 10, 2013, 2:15:17 AM
to Karan Misra, golang-nuts, Serhat Şevki Dinçer, andrey mirtchovski
Try it with 2x the hyperthreaded total:

4 cpus + 4 HT cpus, try GOMAXPROCS=16

That tends to be optimum for me.
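Whether a larger setting helps is easy to measure rather than argue about. Below is a rough timing sketch, assuming a purely CPU-bound workload; spin and timeWith are hypothetical names, and the iteration counts are arbitrary. For context, Go releases of that era defaulted GOMAXPROCS to 1, while current releases default to the number of logical CPUs.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// spin is a hypothetical CPU-bound task standing in for tracing a band of rows.
func spin(n int) float64 {
	s := 0.0
	for i := 1; i <= n; i++ {
		s += 1 / float64(i)
	}
	return s
}

// timeWith runs tasks goroutines under the given GOMAXPROCS setting,
// restores the previous setting, and returns the elapsed wall-clock time.
func timeWith(procs, tasks, work int) time.Duration {
	prev := runtime.GOMAXPROCS(procs)
	defer runtime.GOMAXPROCS(prev)

	var wg sync.WaitGroup
	wg.Add(tasks)
	start := time.Now()
	for t := 0; t < tasks; t++ {
		go func() {
			defer wg.Done()
			spin(work)
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	n := runtime.NumCPU()
	for _, procs := range []int{1, n, 2 * n} {
		fmt.Printf("GOMAXPROCS=%d: %v\n", procs, timeWith(procs, 2*n, 20000000))
	}
}
```

Timings from a single run are noisy, so repeating each setting a few times and comparing the best results gives a fairer picture.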

Karan Misra

Oct 10, 2013, 2:31:34 AM
to Michael Jones, golang-nuts, Serhat Şevki Dinçer, andrey mirtchovski
For such a CPU intensive activity (the entire CPU maxes out currently with GOMAXPROCS=9) would increasing GOMAXPROCS to 16 help?

Regards,
Karan 'kid0m4n' Misra

CEO, Erodov Media Pvt Ltd
Phone: +91 809 555 0069
Website: www.erodov.com

Michael Jones

Oct 10, 2013, 9:34:59 AM
to Karan Misra, golang-nuts, Serhat Şevki Dinçer, andrey mirtchovski
It should not. I knew that would be a contentious email. But it seems to help my programs...

Job van der Zwan

Oct 12, 2013, 5:17:44 AM
to golan...@googlegroups.com, Karan Misra, Serhat Şevki Dinçer, andrey mirtchovski
Any ideas why? Load balancing?

Michael Jones

Oct 12, 2013, 9:53:54 AM
to Job van der Zwan, golang-nuts, Karan Misra, Serhat Şevki Dinçer, andrey mirtchovski
No. Feel free to disregard my anecdotal evidence. Maybe it is just something about my Mac's (OS X's) processor management properties.

Job van der Zwan

Oct 12, 2013, 11:17:27 AM
to golan...@googlegroups.com, Job van der Zwan, Karan Misra, Serhat Şevki Dinçer, andrey mirtchovski
Well, if the difference pops up consistently, wouldn't finding the minimal algorithm that reproduces the quirk be useful feedback for the people working on the scheduler, for example?

atomly

Oct 14, 2013, 12:20:23 PM
to Michael Jones, Karan Misra, golang-nuts, Serhat Şevki Dinçer, andrey mirtchovski
On Thu, Oct 10, 2013 at 9:34 AM, Michael Jones <m...@google.com> wrote:
> I knew that would be a contentious email.

Pardon the pun.  :P 

:: atomly ::

[ ato...@atomly.com : www.atomly.com  : http://blog.atomly.com/ ...
[ atomiq records : new york city : +1.347.692.8661 ...
[ e-mail atomly-new...@atomly.com for atomly info and updates ...