Use rand.New (a generator created this way is not mutexed, so each
goroutine can own one), and use crypto/rand.Int to seed it (using
math/rand to seed itself is probably not going to give you much more
randomness!).
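For concreteness, a minimal sketch of that advice; the helper name newRand is mine, not from the thread:

package main

import (
    crand "crypto/rand"
    "fmt"
    "math"
    "math/big"
    mrand "math/rand"
)

// newRand builds a math/rand generator that is private to one
// goroutine (rand.New generators carry no mutex, unlike the
// package-level rand functions) and seeds it from crypto/rand
// rather than from math/rand itself.
func newRand() (*mrand.Rand, error) {
    seed, err := crand.Int(crand.Reader, big.NewInt(math.MaxInt64))
    if err != nil {
        return nil, err
    }
    return mrand.New(mrand.NewSource(seed.Int64())), nil
}

func main() {
    r, err := newRand()
    if err != nil {
        panic(err)
    }
    fmt.Println(r.Float64())
}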
I changed Kyle's version to add your "and also split the task 64 times more than expected" change:

pi_eval := random_thread(*nb_process * 64, *nb_iter)

The results are now nice and linear, with the CPU usage looking nice and steady: the program immediately jumps to N CPUs and runs steadily until the end. (A sketch of the oversplit pattern follows the timings below.)
babar:big mtj$ ./pi -n 1000000000 -g 1
Pi Estimation: 3.141580
Elapsed Time: 42.173500 seconds
babar:big mtj$ ./pi -n 1000000000 -g 2
Pi Estimation: 3.141636
Elapsed Time: 21.433001 seconds
babar:big mtj$ ./pi -n 1000000000 -g 3
Pi Estimation: 3.141605
Elapsed Time: 14.288153 seconds
babar:big mtj$ ./pi -n 1000000000 -g 4
Pi Estimation: 3.141662
Elapsed Time: 10.962000 seconds
babar:big mtj$ ./pi -n 1000000000 -g 5
Pi Estimation: 3.141560
Elapsed Time: 8.907828 seconds
babar:big mtj$ ./pi -n 1000000000 -g 6
Pi Estimation: 3.141544
Elapsed Time: 7.273181 seconds
babar:big mtj$ ./pi -n 1000000000 -g 7
Pi Estimation: 3.141570
Elapsed Time: 6.314425 seconds
babar:big mtj$ ./pi -n 1000000000 -g 8
Pi Estimation: 3.141534
Elapsed Time: 5.559293 seconds
babar:big mtj$ ./pi -n 1000000000 -g 9
Pi Estimation: 3.141607
Elapsed Time: 5.541590 seconds
babar:big mtj$ ./pi -n 1000000000 -g 10
Pi Estimation: 3.141619
Elapsed Time: 5.508037 seconds
babar:big mtj$ ./pi -n 1000000000 -g 11
Pi Estimation: 3.141611
Elapsed Time: 5.497377 seconds
babar:big mtj$ ./pi -n 1000000000 -g 12
Pi Estimation: 3.141615
Elapsed Time: 5.486157 seconds
babar:big mtj$ ./pi -n 1000000000 -g 13
Pi Estimation: 3.141604
Elapsed Time: 5.493715 seconds
babar:big mtj$ ./pi -n 1000000000 -g 14
Pi Estimation: 3.141674
Elapsed Time: 5.464580 seconds
babar:big mtj$ ./pi -n 1000000000 -g 15
Pi Estimation: 3.141544
Elapsed Time: 5.460681 seconds
babar:big mtj$ ./pi -n 1000000000 -g 16
Pi Estimation: 3.141614
Elapsed Time: 5.508967 seconds

Thank you for the advice. This seems like an area for development of the runtime code.
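For anyone reproducing this, a rough sketch of the oversplit pattern follows; Kyle's actual code is not in the thread, so every name and detail here is an assumption:

package main

import (
    "fmt"
    mrand "math/rand"
)

// estimatePi splits nbIter samples across nbChunks goroutines. With
// nbChunks well above GOMAXPROCS (64x in the runs above), the scheduler
// always has small pending chunks to hand to an idle thread, which is
// what makes the scaling linear.
func estimatePi(nbChunks, nbIter int) float64 {
    results := make(chan int64, nbChunks)
    perChunk := nbIter / nbChunks
    for c := 0; c < nbChunks; c++ {
        go func(seed int64) {
            // Seed per goroutine; in real code seed from crypto/rand
            // as in the earlier sketch. A counter keeps this example
            // self-contained and deterministic.
            r := mrand.New(mrand.NewSource(seed))
            var hits int64
            for i := 0; i < perChunk; i++ {
                x, y := r.Float64(), r.Float64()
                if x*x+y*y <= 1 {
                    hits++
                }
            }
            results <- hits
        }(int64(c) + 1)
    }
    var total int64
    for c := 0; c < nbChunks; c++ {
        total += <-results
    }
    return 4 * float64(total) / float64(perChunk*nbChunks)
}

func main() {
    fmt.Println(estimatePi(8*64, 100000000))
}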
Indeed, but what you say suggests that we misunderstand each other. I have 8 CPUs. I ran the program with the number of goroutines ranging from 1 to N, explicitly setting runtime.GOMAXPROCS to that number each time. Physical observation showed, goroutines against CPUs in use (as I described carefully, case by case):

1 on 1
2 on 2
3 on 3
4 on 2
5 on 1, then 5
6 on 1, then 6
7 on 1, then 7
8 on 8
9 on 9, shared on 8
10 on 10, shared on 8

As I said then and again now, it was cases like the fourth (4 max procs, 4 goroutines, and 100% CPU on only two CPUs for the whole execution) that puzzled me. It still does. Why is 4 threads on 4 CPUs not granular enough?
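In code, the setup above amounts to something like this (a sketch; only the -g and -n flags come from the transcript, the rest is assumed):

package main

import (
    "flag"
    "runtime"
)

var (
    nbGoroutines = flag.Int("g", 1, "number of goroutines, also used as GOMAXPROCS")
    nbIter       = flag.Int("n", 1000000000, "total iterations")
)

func main() {
    flag.Parse()
    // One OS thread per goroutine, so in principle each worker
    // can occupy its own CPU for the whole run.
    runtime.GOMAXPROCS(*nbGoroutines)
    // ... launch *nbGoroutines workers and collect their results ...
}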
Seems like a bug. A round-robin scheduler would not have this problem. A greedy scheduler would not have this problem. A "just launch them all" null scheduler would not have this problem. So I conjecture that it is, in fact, a problem.
Hi! It works fine, and the same holds with the runtime.Gosched() frequency greatly reduced (to once every 1e6 iterations). It does not work in the other extreme case of just one call after each goroutine dispatch. I am concerned that this demonstrates a problem.

if i%1e4 == 0 {
    runtime.Gosched()
}
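In context, that yield sits inside the hot loop like this (a sketch of the pi worker discussed in this thread; the surrounding names are mine):

package main

import (
    "fmt"
    mrand "math/rand"
    "runtime"
)

// worker throws nbIter darts at the unit square and reports how many
// land inside the unit circle. The loop makes no system calls, so
// under the current scheduler it must yield explicitly now and then.
func worker(r *mrand.Rand, nbIter int, results chan<- int64) {
    var hits int64
    for i := 0; i < nbIter; i++ {
        if i%1e4 == 0 {
            runtime.Gosched() // let other goroutines (and GC) run
        }
        x, y := r.Float64(), r.Float64()
        if x*x+y*y <= 1 {
            hits++
        }
    }
    results <- hits
}

func main() {
    results := make(chan int64, 1)
    go worker(mrand.New(mrand.NewSource(1)), 1e6, results)
    fmt.Println(<-results)
}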
> The first goroutine starts its calculations. Then the second goroutine
> allocates the rand object; at this point the allocation requests a
> garbage collection. The third and fourth goroutines are successfully
> parked (or not yet started), but the first goroutine can't stop (it
> never checks for a pending GC during its calculations), so it just
> continues calculating until completion. Once the first goroutine
> finishes its calculations, GC starts (though at this point all
> goroutines might already have finished). The 3 remaining goroutines
> start calculating after GC, so the total execution time is 2x what it
> might be. Moreover, the problematic sequence may repeat 2 more times,
> in which case execution time is 4x - the goroutines effectively run
> sequentially.
> Such a situation is unlikely to happen in a server-like application,
> but it can easily happen in parallel computation programs.
Thanks for the analysis. This is definitely a known problem with the
current scheduler: a goroutine which executes a long running
calculation, one which makes no system calls, is not preempted. With
the current scheduler, you will get better parallelism in a long running
calculation if you occasionally call runtime.Gosched. But that is
really a bug which needs to be fixed.
Ian
> On 2 Nov., 17:25, Ian Lance Taylor <i...@google.com> wrote:
>
>
>> With the current scheduler, you will get better parallelism in a long running
>> calculation if you occasionally call runtime.Gosched. But that is
>> really a bug which needs to be fixed.
>
> I thought about this some more, and for me one question remains: is
> this an easy bug to fix, or does it require additional work, making it
> more of a long-term problem for folks doing benchmarks and
> measuring performance? If the latter is the case, then I think the
> issue should be documented in a FAQ somewhere, along with the
> workaround for it.
It is not a simple bug to fix.
http://golang.org/doc/go_faq.html#Why_GOMAXPROCS
Ian