[runtime] Scheduling overhead incurred by the runtime


Deepak Sirone

Jun 23, 2020, 12:34:02 PM
to golang-nuts
I have a benchmark (process A) which launches another process (process B), and the two send messages back and forth through a pair of sockets, one for sending and the other for receiving. Each process has two goroutines, one handling the sending side and one handling the receiving side.

Process A sends a message to B using the "encoding/gob" package to write to the socket which is subsequently decoded. After sending a message, A waits for a response from B. B does some processing on the message and sends a reply back.

The round-trip time for getting a response seemed to be highly variable. I found that the runtime is taking up the bulk of the running time. Also, after a message arrives from B, the waiting goroutine takes a long time to wake up, sometimes 0.5x to 1x of the actual round-trip time.

Initially the round trip time is around 200 microseconds and it drops to 20 microseconds as more messages are sent. The runtime seems to "learn" the best way to minimize the running time eventually.

How can I reduce the initial overhead incurred by the Go runtime? Is there a way to enforce or tweak the scheduling given the above scenario?

Benchmark info from pprof:

      flat  flat%   sum%        cum   cum%
   93.87ms 33.56% 33.56%    98.80ms 35.33%  syscall.Syscall
   79.91ms 28.57% 62.14%    79.91ms 28.57%  runtime.futex
    7.15ms  2.56% 64.69%    54.69ms 19.55%  runtime.findrunnable
    6.30ms  2.25% 66.94%     6.62ms  2.37%  runtime.runqgrab
    3.22ms  1.15% 68.10%     3.22ms  1.15%  runtime.(*randomEnum).next
    2.68ms  0.96% 69.05%     3.96ms  1.42%  runtime.deferreturn
    2.32ms  0.83% 69.88%     2.38ms  0.85%  runtime.lock
    2.30ms  0.82% 70.71%     2.30ms  0.82%  runtime.memmove
    2.24ms   0.8% 71.51%     5.77ms  2.06%  runtime.mallocgc
    2.21ms  0.79% 72.30%     2.22ms  0.79%  runtime.unlock
    2.02ms  0.72% 73.02%     2.25ms   0.8%  time.now
    2.01ms  0.72% 73.74%     8.63ms  3.09%  runtime.runqsteal
    1.78ms  0.64% 74.37%     1.80ms  0.64%  runtime.casgstatus
    1.63ms  0.58% 74.96%     3.38ms  1.21%  runtime.exitsyscall
    1.60ms  0.57% 75.53%    33.05ms 11.82%  runtime.stopm
    1.56ms  0.56% 76.09%     3.16ms  1.13%  runtime.selectgo
    1.53ms  0.55% 76.63%    29.86ms 10.68%  runtime.notesleep
    1.38ms  0.49% 77.13%     1.41ms   0.5%  runtime.newdefer
    1.30ms  0.46% 77.59%     7.74ms  2.77%  fmt.(*pp).doPrintf
    1.17ms  0.42% 78.01%     1.95ms   0.7%  runtime.mapaccess2
    1.03ms  0.37% 78.38%    75.87ms 27.13%  runtime.schedule
       1ms  0.36% 78.74%    53.46ms 19.11%  runtime.notewakeup
    0.83ms   0.3% 79.03%     2.34ms  0.84%  runtime.deferproc
    0.78ms  0.28% 79.31%     1.64ms  0.59%  runtime.chanrecv
    0.73ms  0.26% 79.57%     4.42ms  1.58%  encoding/gob.(*Encoder).encodeStruct

Thanks,
Deepak Sirone

Robert Engels

Jun 23, 2020, 3:20:12 PM
to Deepak Sirone, golang-nuts
You need to keep the goroutines “hot”: use a polling design...

On Jun 23, 2020, at 11:33 AM, Deepak Sirone <deepaks...@gmail.com> wrote:


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/b6c8a38d-5351-4abd-9755-b40bfafd256co%40googlegroups.com.

Deepak Sirone

Jun 25, 2020, 12:33:56 AM
to golang-nuts
Does that mean that I should put the sockets in non-blocking mode and set read deadlines on them? And also use the default case in select statements?

Robert Engels

Jun 25, 2020, 1:11:37 AM
to Deepak Sirone, golang-nuts

On Jun 24, 2020, at 11:34 PM, Deepak Sirone <deepaks...@gmail.com> wrote:



Robert Engels

Jun 25, 2020, 1:17:30 AM
to Deepak Sirone, golang-nuts
But it is probably easier to set an immediate deadline (0) to effect a poll, and I think you’ll get the same behavior. I haven’t tried it.

The downside is that you are going to burn a CPU at 100% per socket, but you will go fast. I haven’t looked into the intervals in a while, but I think a deadline of 0 will prevent the scheduler from getting involved.


On Jun 25, 2020, at 12:10 AM, Robert Engels <ren...@ix.netcom.com> wrote:



Jesper Louis Andersen

Jun 25, 2020, 4:32:01 AM
to Deepak Sirone, golang-nuts
On Tue, Jun 23, 2020 at 6:33 PM Deepak Sirone <deepaks...@gmail.com> wrote:

Process A sends a message to B using the "encoding/gob" package to write to the socket which is subsequently decoded. After sending a message, A waits for a response from B. B does some processing on the message and sends a reply back.


Things I would consider:

* How large are the messages? If they are small, you are measuring a switching overhead. Consider batching work into larger groups so the switching overhead amortizes over multiple messages.
* Usually, encoding is much faster than decoding. In a typical environment, the encoder already has type information, so it can just lay out data and stream it. The decoder has to inspect the bytes and branch on them to recover the type information.
* Consider pipelining. If you run stop-and-go, you will be limited by the switching time of the operating system. Feed the pipe from the sender, handle messages on the receiver side. If messages vary by work, consider out-of-order processing with a tagging scheme (see e.g., the 9p protocol for a simple way into this).
* Look into micro-batching tricks. They tend to perform well when there is processing overhead.
* Addendum: If you stop-and-go, you are cooperating between the A/B process pair. This is bound to give you skewed benchmarks because the speed of one side severely affects the behavior of the other side. 
* If the two processes live on the same machine, why even keep them separate? Two goroutines with a channel between them are probably a better use of the hardware. If the two processes ought to live on separate hardware, the above notion of a (bandwidth*)delay becomes far more important, since network latencies are measured in milliseconds in most cases.

In general: I prefer changing the strategy over tuning for speed. It is bound to yield much better results in the long run in my experience.