Hi all,
I am taking latency measurements of a subprocess in Go. However, I found that the measured latency of the Go program is unstable, especially when there is a delay/gap between the runs of the process I want to measure. Please see the example at
https://github.com/klauspost/reedsolomon/issues/180. It also happens with other subprocesses besides the Reed-Solomon encoding.
I suspect it is caused by the Go runtime scheduler. Other researchers who build their prototypes in Go have observed the same issue. Here is what they wrote in the EPaxos Revisited paper (NSDI '21):
"When investigating the factors that limit server throughput, we observed several anomalies as servers neared saturation. Seemingly innocuous changes could have a large impact on throughput. For example, increasing the number of available cores caused throughput to drop. We believe that some of the issues may come from unexpected behavior of Go’s thread scheduler. .... We suspect that Go occasionally deschedules that thread in order to run less-critical threads that listen for incoming messages."
Do you have any recommendations on how to make the runtime scheduler more deterministic?
I tried running the program with GOMAXPROCS=1 to limit CPU usage, hoping to stop the Go scheduler from switching between CPUs. However, the documentation says GOMAXPROCS only limits the number of OS threads that can execute Go code simultaneously, so the operating system can still move that thread from one CPU to another. Is there any way to pin a Go program to a specific CPU?
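As far as I know the Go standard library has no portable CPU-affinity API, so the sketch below assumes Linux and calls the `sched_getaffinity`/`sched_setaffinity` system calls directly through `syscall.Syscall`. It combines `runtime.GOMAXPROCS(1)` with `runtime.LockOSThread()`, so that the goroutine being measured stays on one OS thread, and then restricts that thread to a single CPU. The helper names (`cpuMask`, `getAffinity`, `setAffinity`, `firstCPU`, `pinCurrentThread`) are hypothetical. Only the locked thread is pinned: other runtime threads (GC workers, sysmon) are not.

```go
//go:build linux

package main

import (
	"fmt"
	"runtime"
	"syscall"
	"unsafe"
)

// cpuMask is a 1024-bit CPU affinity bitmap (same size as glibc's cpu_set_t).
type cpuMask [16]uint64

// getAffinity reads the affinity of the calling OS thread
// (pid 0 means "the calling thread" for sched_getaffinity on Linux).
func getAffinity() (cpuMask, error) {
	var m cpuMask
	_, _, errno := syscall.Syscall(syscall.SYS_SCHED_GETAFFINITY,
		0, unsafe.Sizeof(m), uintptr(unsafe.Pointer(&m)))
	if errno != 0 {
		return m, errno
	}
	return m, nil
}

// setAffinity restricts the calling OS thread to the CPUs set in m.
func setAffinity(m cpuMask) error {
	_, _, errno := syscall.Syscall(syscall.SYS_SCHED_SETAFFINITY,
		0, unsafe.Sizeof(m), uintptr(unsafe.Pointer(&m)))
	if errno != 0 {
		return errno
	}
	return nil
}

// firstCPU returns the lowest CPU number set in m, or -1 if m is empty.
func firstCPU(m cpuMask) int {
	for i, word := range m {
		for b := 0; b < 64; b++ {
			if word&(1<<uint(b)) != 0 {
				return i*64 + b
			}
		}
	}
	return -1
}

// pinCurrentThread locks the calling goroutine to its OS thread and
// restricts that thread to the first CPU it is currently allowed to use.
// The goroutine should stay locked afterwards: if it were unlocked, the
// single-CPU thread would go back into the runtime's thread pool.
func pinCurrentThread() (int, error) {
	runtime.LockOSThread()
	allowed, err := getAffinity()
	if err != nil {
		return -1, err
	}
	cpu := firstCPU(allowed)
	if cpu < 0 {
		return -1, fmt.Errorf("empty affinity mask")
	}
	var one cpuMask
	one[cpu/64] |= 1 << uint(cpu%64)
	return cpu, setAffinity(one)
}

func main() {
	// GOMAXPROCS caps how many threads run Go code at once; it does not
	// choose which CPU they run on, hence the explicit affinity call.
	prev := runtime.GOMAXPROCS(1)
	cpu, err := pinCurrentThread()
	if err != nil {
		fmt.Println("pinning failed:", err)
		return
	}
	fmt.Printf("GOMAXPROCS %d -> 1, goroutine pinned to CPU %d\n", prev, cpu)
	// ... run and time the latency-sensitive code here ...
}
```

The `golang.org/x/sys/unix` package wraps the same system call as `unix.SchedSetaffinity` with a `unix.CPUSet` type, which is the more idiomatic choice outside a self-contained sketch. To pin every thread of the process instead, it can be started under the Linux `taskset` utility (e.g. `taskset -c 2 ./prog`).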
Thank you.