cmd/trace: Indirect reporting of OS Latency?

Rich Moyse

Feb 11, 2022, 11:05:16 PM
to golang-nuts
TLDR

I'd be grateful if someone could either confirm or rule out the effect of OS, Go runtime, and/or hypervisor latency on the output of a goroutine trace.  The trace report displays empty gaps of 3-14ms where periodic goroutines are expected to execute, since a goroutine is triggered by each 1ms timer tick.  Attempting to select items within these gaps returns only five "counter" items - nothing else - and those counters suggest that nothing is running.  Perhaps external preemption interrupts the running Go process, suspending the Go runtime and causing these gaps?  If so, can these trace gaps be attributed to OS and/or hypervisor scheduler preemption?

A recent (2/4/2022) go-nuts summary mentions a post in the thread "Issues when using time.Ticker microsecond duration", whose posts discuss a range of causes, including OS-induced latency; this thread, by contrast, attempts to understand how external latency sources affect the trace report.

In addition to the problem description below, the Go code and trace screen images are available via this gist, while the trace file is available here.  The Go program was compiled first with 1.18beta1 and later with 1.17.  It runs in an Ubuntu 16.04 Desktop VM managed by Hyper-V on a physical machine running Windows 10 Pro.  The program limits itself to the interplay between two goroutines: a single sending goroutine and a single receiving one.
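
For context, the trace was captured over HTTP.  One common way to wire that up looks roughly like the sketch below; this is a simplified illustration and may differ from the actual code in the gist (the listen address and capture duration are just examples):

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/ handlers, including /debug/pprof/trace
    )

    func main() {
        // Serve the pprof/trace endpoints in the background; a trace can then be captured with e.g.
        //   curl -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=5'
        //   go tool trace trace.out
        go func() {
            log.Println(http.ListenAndServe("localhost:6060", nil))
        }()

        // ... start the sender/receiver goroutines (sketched below) ...
        select {} // keep main alive
    }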

These goroutines share an unbuffered channel.  The sender is throttled only by the capacity of the receiver to consume its messages, while the receiver is gated by a timer that ticks at 1ms intervals.  The sending code executes in ~600ns and then blocks waiting for the receiver.  The receiver executes in ~8us, starting after the ticker fires, until it blocks waiting for the timer's next tick.  Because both execute so quickly, the Go runtime scheduler never intervenes to preemptively interrupt either goroutine.  Moreover, the garbage collector doesn't run, because only this pair of goroutines is active and the message communicated between them is an empty struct.
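
Roughly, the goroutine pair looks like this (a simplified sketch, not the exact code from the gist):

    package main

    import "time"

    func main() {
        ch := make(chan struct{}) // unbuffered: each send blocks until the receiver is ready

        // Sender: throttled only by the receiver; each send completes quickly
        // and then blocks on the channel.
        go func() {
            for {
                ch <- struct{}{}
            }
        }()

        // Receiver: gated by a 1ms ticker; consumes one message per tick.
        tick := time.NewTicker(time.Millisecond)
        defer tick.Stop()
        for range tick.C {
            <-ch
        }
    }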

In general, the regular rhythm of the receiver's timer appears in the "Goroutine analysis" view of the trace report as evenly spaced markings at nearly 1ms intervals.  However, these ruler-like markings are sporadically absent, resulting in gaps of 3-14ms.  During these gaps neither the receiving goroutine nor its coupled sender executes.  Selecting the rectangular gap area reports only "5 items" as selected.  These 5 items, displayed by the "Counter Samples (5)" tab, indicate that no goroutines are "Running", nor are any "Runnable".  Here are all the counters reported for the selected gap at ~502ms into the trace:

Counter     Series     Time                Value
Goroutines  GCWaiting  502.32961800000004  0
Goroutines  Runnable   502.32961800000004  0
Goroutines  Running    502.32961800000004  0
Threads     InSyscall  501.252301          0
Threads     Running    501.252301          1

Since the program was running trace to collect the data needed to produce its report, perhaps the trace itself induced some Go runtime latency?  To rule out this potential latency source, the trace package and its associated HTTP server goroutine were removed.  Instead, latency events were detected and reported directly by the receiving goroutine: it computed the latency between timer ticks by remembering the prior tick time and subtracting it from the newly delivered one, and it printed, via fmt.Print, a message whenever the computed latency exceeded 5ms.  Its source, main_latencycompute.go, is provided by the same gist.
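
The core of that check looks roughly like this (a simplified sketch of main_latencycompute.go; the full source in the gist may differ in details):

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        ch := make(chan struct{})
        go func() { // sender, unchanged
            for {
                ch <- struct{}{}
            }
        }()

        tick := time.NewTicker(time.Millisecond)
        defer tick.Stop()

        // Receiver: remember the previous tick time and report any tick-to-tick gap over 5ms.
        prev := <-tick.C
        for now := range tick.C {
            if gap := now.Sub(prev); gap > 5*time.Millisecond {
                fmt.Printf("latency between ticks: %v (> 5ms)\n", gap)
            }
            prev = now
            <-ch
        }
    }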

Before running the altered program, I half expected that removing the overhead of collecting trace data would eliminate the latency gaps depicted in its reporting.  However, after a bit of waiting, the altered program began issuing sporadic latency messages.  Furthermore, opening a new terminal window or moving existing windows with the mouse seemed to make the altered program generate latency messages more frequently.  On noticing this behavior, it seemed likely to me that the empty gaps in the trace report represent latency introduced by the OS scheduler as it preemptively stops and then restarts the Go program.

After observing these latency results with the 1.18beta1 build of main_latencycompute.go described above, it was recompiled with Go 1.17.  The 1.17 build also generated latency messages.  To determine how similar the latency values were between the two versions, the following comparison was performed:

  • The concurrent execution of the 1.18beta1 and 1.17 versions of main_latencycompute.go for 100 minutes on the same Desktop VM mentioned above.  The Desktop system was left alone - no end user interaction while the test ran.

  • A second concurrent execution of the 1.18beta1 and 1.17 versions of main_latencycompute.go for 100 minutes under the same conditions.  

  • The latency message values were aggregated for each version over the combined 200 minute run time.

  • The percentage difference between the versions was computed using this calculator and these values:  1.18beta1: 6707233630ns, 1.17: 6876776367ns.

The percentage difference was ~2.5%, while the difference in total latency between them was ~170ms (over 200min).  These figures suggest, at least to me, that there's no significant difference in detected latency between the two compiler versions; therefore, the latency gaps visible in the trace tool's report aren't specific to 1.18beta1.
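
For reference, the arithmetic behind those figures (assuming the calculator computes percentage difference relative to the mean of the two values):

    |6876776367 - 6707233630| = 169,542,737ns ≈ 170ms
    169542737 / ((6707233630 + 6876776367) / 2) ≈ 0.025, i.e. ~2.5%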

As an additional observation, there's a discrepancy between the counter values reported by the "View trace" ("/trace") web page and those shown above from the "Goroutine analysis" view for a specific goroutine ("/trace?goid=8").  While the goroutine-specific page above displays a "Threads Running" value of 1 between timeline values 502.400-512.200ms, the corresponding "Threads Running" value reported by "/trace" is 0.  Since the raw data produced by tracing aligns with the "View trace" report, and some filtering/summation of that data is performed to produce the goroutine-specific report, it's likely that the goroutine-specific report incorrectly displays a running thread when in reality there isn't one.  Here are the counters for the same gap as reported by "/trace":

Counter     Series     Time                Value
Goroutines  GCWaiting  502.32961800000004  0
Goroutines  Runnable   502.32961800000004  0
Goroutines  Running    502.32961800000004  0
Threads     InSyscall  502.33236           0
Threads     Running    502.33236           0

If it can be confirmed that the trace data values above definitively indicate externally induced latency, it would greatly benefit developers if the trace report were updated to label such gaps as external to the executing program.

robert engels

Feb 11, 2022, 11:34:29 PM
to Rich Moyse, golang-nuts
I suggest you use the Linux scheduler tracing facilities, e.g. perf sched, to diagnose (see https://www.brendangregg.com/perf.html#SchedulerAnalysis).

But you may also have to do this outside the VM - to trace the scheduling of the VM itself.

In general, timing at 1ms intervals with a 2% error is probably all you can expect running in a VM - especially if you don’t know (or manage) the load on the bare metal. 2% on 1ms is 20 us - and a raw bare metal scheduler will probably only get you to 5-6 us.

You can simplify your analysis by testing on bare metal first.




Rich Moyse

Feb 12, 2022, 9:49:29 PM
to golang-nuts
Robert, thanks for your reply - especially the link to perf sched!  I plan to run the Go program on a physical Windows machine.  I'll post my results once done.

Robert Engels

Feb 12, 2022, 11:50:16 PM
to Rich Moyse, golang-nuts
A Windows machine won't have 'perf sched'.

I don't think Windows has similar low-level scheduling tracing - at least not that I remember.

I suggest installing Linux on a separate partition and direct booting. 

Linux has a LOT of scheduling tracing/tuning options - some require a custom kernel. 

Have fun!

