Inquiring about compiler optimization flag in Go

ProGrammer

Feb 3, 2013, 4:29:50 PM
to golan...@googlegroups.com

Hello

From http://golang.org/cmd/gc/ and http://dave.cheney.net/category/golang, I have only been able to find one flag related to optimization, which is -B.

I wanted to inquire if there are other optimization flags in Go. I would be grateful for an answer.

Thanks.

Rémy Oudompheng

Feb 3, 2013, 4:33:33 PM
to ProGrammer, golan...@googlegroups.com
Hello,

The -B flag is not an optimization flag; it makes the compiler break
the language specification. There are two optimization flags I know
of:
-N: disable optimizations
-l: disable inlining
multiple -l: make inlining more aggressive (may break runtime.Callers)
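
For illustration, these flags are passed to the compiler through the go
tool's -gcflags option; a rough sketch (mypkg is a placeholder, and the
exact spelling may differ between releases):

    go build -gcflags="-N -l" mypkg     # disable optimizations and inlining (useful under a debugger)
    go build -gcflags="-l -l -l" mypkg  # repeated -l: more aggressive inlining (may break runtime.Callers)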

Rémy.

Aram Hăvărneanu

Feb 3, 2013, 4:37:39 PM
to Rémy Oudompheng, ProGrammer, golan...@googlegroups.com
> The -B flag is not an optimization flag, it makes the compiler break
> the language specification. There are two optimization flags I know
> of:
> -N disable optimizations
> -l disable inlining
> multiple -l: make inlining more aggressive (may break runtime.Callers)

I'd argue these are not optimization flags either; rather they are
flags useful when debugging the compiler.

--
Aram Hăvărneanu

ProGrammer

Feb 3, 2013, 4:46:56 PM
to golan...@googlegroups.com, Rémy Oudompheng, ProGrammer


On Sunday, February 3, 2013 3:37:39 PM UTC-6, Aram Hăvărneanu wrote:
> The -B flag is not an optimization flag, it makes the compiler break
> the language specification. There are two optimization flags I know
> of:
>  -N disable optimizations
>  -l disable inlining
>  multiple -l: make inlining more aggressive (may break runtime.Callers)

So -l and -N are most likely to only reduce optimization. So, is there no way to improve performance?

Rémy Oudompheng

Feb 3, 2013, 4:48:16 PM
to ProGrammer, golan...@googlegroups.com
On 2013/2/3 ProGrammer <sparsh...@gmail.com> wrote:
>
>
This is because, if there were a way to improve performance correctly,
it would be enabled by default.

Rémy.

ProGrammer

Feb 3, 2013, 4:50:21 PM
to golan...@googlegroups.com, ProGrammer

Thanks a lot for your reply.

Dave Cheney

Feb 3, 2013, 4:50:05 PM
to ProGrammer, golan...@googlegroups.com, Rémy Oudompheng, ProGrammer
Would you please describe your problem ("my program may be slower than I would like"), not your proposed solution ("turn on magic go-faster switches").

As an observation, the Go compilers produce, by default, the best code that can be generated safely.

Cheers

Dave

Sparsh Mittal

Feb 3, 2013, 5:18:32 PM
to Dave Cheney, golan...@googlegroups.com, Rémy Oudompheng


On Sun, Feb 3, 2013 at 3:50 PM, Dave Cheney <da...@cheney.net> wrote:
> Would you please describe your problem ("my program may be slower than I would like"), not your proposed solution ("turn on magic go-faster switches").
>
> As an observation, the Go compilers produce, by default, the best code that can be generated safely.
>
> Cheers

I am doing concurrent programming. In each iteration, I issue, say, P goroutines, and at the end of the iteration I synchronize them using a WaitGroup.
It scales well up to 2 goroutines, but because of this synchronization bottleneck it does not scale well to 4 or 8. So I was wondering if I could do something.
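
Schematically, the pattern is roughly the following (P, iterations, and work are placeholders, not the real code):

    package main

    import (
        "runtime"
        "sync"
    )

    const (
        P          = 4   // goroutines issued per iteration (placeholder)
        iterations = 100 // outer iterations (placeholder)
    )

    // work stands in for whatever each goroutine computes in one iteration.
    func work(iter, id int) {
        _ = iter * id
    }

    func main() {
        runtime.GOMAXPROCS(P)
        for iter := 0; iter < iterations; iter++ {
            var wg sync.WaitGroup
            wg.Add(P)
            for i := 0; i < P; i++ {
                go func(id int) {
                    defer wg.Done()
                    work(iter, id)
                }(i)
            }
            wg.Wait() // barrier: the iteration ends only when all P goroutines are done
        }
    }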

Dustin Sallings

Feb 3, 2013, 5:33:25 PM
to golan...@googlegroups.com
Sparsh Mittal <sparsh...@gmail.com>
writes:

> I am doing concurrent programming. In each iteration, I issue say P
> goroutines and then at end of iteration, I synchronize them using
> WaitGroup.
> So it scales well till 2 goroutine, but because of this
> synchronization bottleneck, it does not scale well to 4 or 8. So, I
> was wondering, if I could do something.

The number of ways these goroutines could have bottlenecks of their own
is at least as large as the total number of programs ever written, yet
you didn't describe this most important part, or even include some
profiler output.

I'm guessing you're doing something that doesn't scale, but I'm only
guessing that because it's not scaling. I can't guess *what* you're
doing that doesn't scale without seeing the code.

--
dustin

Ian Lance Taylor

Feb 3, 2013, 11:57:01 PM
to ProGrammer, golan...@googlegroups.com
You can use gccgo. It generates faster code in some cases.
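
If it is installed, a typical invocation looks roughly like this (a sketch; gccgo is assumed to be on the PATH, and flag details may differ between versions):

    gccgo -O2 -o myprog main.go
    # or, through the go tool:
    go build -compiler=gccgo -gccgoflags="-O2" mypkg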

Ian

John Nagle

Feb 4, 2013, 3:15:33 AM
to golan...@googlegroups.com
How many CPUs do you have available? How many Go threads
(not goroutines) do you have running?

Do all the parallel tasks run about the same length of time?
Or are most done while the slow one bottlenecks the group?

Highly parallel number-crunching is non-trivial. You
need to think about things like cache space and the
order in which you traverse arrays. Going
through large 2D arrays the wrong way on multiple CPUs can
be slower than using only one CPU.
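
For example (a toy sketch, not your actual code), with a [][]float64 the inner loop should run over the innermost index:

    // sumRowMajor walks a in row-major order: the inner index varies fastest,
    // so memory accesses are sequential and cache-friendly.
    func sumRowMajor(a [][]float64) float64 {
        var sum float64
        for i := range a {
            for j := range a[i] {
                sum += a[i][j]
            }
        }
        return sum
    }

    // sumColMajor walks the same data column-first; every access jumps to a
    // different row, which can be much slower on large arrays, and worse still
    // when several CPUs are competing for the cache.
    func sumColMajor(a [][]float64) float64 {
        var sum float64
        for j := range a[0] {
            for i := range a {
                sum += a[i][j]
            }
        }
        return sum
    }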

John Nagle



Philipp Schumann

Feb 4, 2013, 6:15:16 AM
to golan...@googlegroups.com, Dave Cheney, Rémy Oudompheng
In addition to the other replies -- you may be aware of this, but for me it was an eye-opener... runtime.GOMAXPROCS defaults to 1 apparently, meaning that for your goroutines to actually fully utilize multiple cores you'd need to set it to a higher number (such as runtime.NumCPU()).
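
A minimal sketch of what that looks like near the top of main:

    package main

    import "runtime"

    func main() {
        runtime.GOMAXPROCS(runtime.NumCPU()) // allow goroutines to run on all logical CPUs
        // ... start goroutines ...
    }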

Sparsh Mittal

Feb 4, 2013, 9:44:07 AM
to Philipp Schumann, golan...@googlegroups.com, Dave Cheney, Rémy Oudompheng


On Mon, Feb 4, 2013 at 5:15 AM, Philipp Schumann <philipp....@gmail.com> wrote:
> In addition to the other replies -- you may be aware of this, but for me it was an eye-opener... runtime.GOMAXPROCS defaults to 1 apparently, meaning that for your goroutines to actually fully utilize multiple cores you'd need to set it to a higher number (such as runtime.NumCPU()).

Thanks. Yes, I do call runtime.GOMAXPROCS(numberOfThreads), where numberOfThreads is the number of threads I want to parallelize it on.

When numberOfThreads is 2, it is fine; but when it is 4, only 250% CPU is utilized (maybe due to a serialization bottleneck?), and similarly for numberOfThreads > 4.

So, should I set it even larger? My runtime.NumCPU() is 32, but I want to parallelize on only 2, 4, 8, and 16. Please let me know.


Dmitry Vyukov

Feb 4, 2013, 9:47:42 AM
to Sparsh Mittal, Philipp Schumann, golang-nuts, Dave Cheney, Rémy Oudompheng
Try:
go test -blockprofile=/tmp/block.prof

It will write a goroutine blocking profile, which can show why goroutines
are not running in parallel.
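
A sketch of the workflow (assuming a toolchain new enough to have the flag; the test binary is left in the current directory, named after the package):

    go test -blockprofile=/tmp/block.prof
    go tool pprof --text mypkg.test /tmp/block.prof   # shows where goroutines spend time blocked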

Sparsh Mittal

Feb 4, 2013, 9:47:32 AM
to John Nagle, golan...@googlegroups.com

> How many CPUs do you have available? How many Go threads
> (not goroutines) do you have running?

Thanks. I have a total of 24 CPUs available, but I want to parallelize the problem only on 2, 4, 8, and 16.
For example, for 4 goroutines I see CPU utilization of only 250%, although it is >180% for 2.

> Do all the parallel tasks run about the same length of time?
> Or are most done while the slow one bottlenecks the group?

Theoretically, they do exactly the same amount of work.

> Highly parallel number-crunching is non-trivial. You
> need to think about things like cache space and the
> order in which you traverse arrays. Going
> through large 2D arrays the wrong way on multiple CPUs can
> be slower than using only one CPU.

Based on my knowledge, I have tried to structure the for loop to minimize cache misses.


Nate Finch

Feb 4, 2013, 10:33:04 AM
to golan...@googlegroups.com
If you could post some example code that shows the problem, we might be able to help.  But it's pretty hard to help when we don't know what the code is doing.

This sounds like a problem with the algorithm, not with Go's compiler. Probably there are some resources that multiple goroutines are waiting for, so adding further goroutines doesn't help.

Dave Cheney

Feb 4, 2013, 3:26:23 PM
to Nate Finch, golan...@googlegroups.com
Yes, code please. 

For example you may be using resources which themselves are internally serialised (resources not mentioned to avoid speculation). 

Dave

matt

Feb 5, 2013, 4:56:46 AM
to golan...@googlegroups.com, John Nagle
Definitely require code. How much work is being done in each goroutine before each WaitGroup barrier? Have you tried profiling your code to make sure you are not implicitly synchronizing due to a call to some other thread-safe resource (e.g. a call to rand)?

Compiler optimizations are very likely not going to help you much. If your program isn't scaling with 6g, I find it hard to believe that gccgo will make it "scale", though it may make it faster.
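
To illustrate the rand point above (a hedged sketch, not your code): the package-level math/rand functions share a single source protected by a lock, so many goroutines calling them will serialize on that lock; one workaround is a generator per goroutine:

    package main

    import (
        "math/rand"
        "time"
    )

    func worker(id int) {
        // rand.Float64() and friends lock a shared global source;
        // a private *rand.Rand per goroutine avoids that contention.
        r := rand.New(rand.NewSource(time.Now().UnixNano() + int64(id)))
        _ = r.Float64()
    }

    func main() {
        for i := 0; i < 4; i++ {
            go worker(i)
        }
        time.Sleep(time.Second) // crude wait, just for the sketch
    }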

// Matt

hs

Feb 5, 2013, 7:23:12 AM
to golang-nuts


On Feb 4, 9:47 am, Dmitry Vyukov <dvyu...@google.com> wrote:
> On Mon, Feb 4, 2013 at 6:44 PM, Sparsh Mittal <sparsh0mit...@gmail.com> wrote:
>
> Try:
> go test -blockprofile=/tmp/block.prof
>
> It will write goroutine blocking profile, it can show why goroutines
> are not running in parallel.

This test flag is not in go1.0.3, so one needs to use hg tip.

Can someone give me a sample output to give me an idea about it?

Dmitry Vyukov

Feb 5, 2013, 7:34:33 AM
to hs, golang-nuts