golang scheduler and audio/media


Scott Cotton

unread,
Sep 9, 2018, 6:53:55 AM9/9/18
to golang-dev
Hi all,

I wanted to bring your attention to this discussion on the PortAudio mailing list regarding the role of OS real-time priority threads and Go scheduling.

It is somewhat related also to this golang wiki page about game libraries.

And this issue about scheduling performance using LockOSThread.

Given these things, I am curious: are there any thoughts about the possibility of treating threads with real-time priority differently?  Am I perhaps wrong to say that using cgo to call back to Go on a "foreign" thread has some problems or overhead associated with it when that thread has special OS scheduling properties, like real-time priority?

Any thoughts appreciated,

Best,
Scott  

Ian Lance Taylor

unread,
Sep 9, 2018, 8:38:49 AM9/9/18
to Scott Cotton, golang-dev
I'm not aware of any proposals in this area.

Using cgo to call back to Go does have some overhead currently. Some
of that could be eliminated without too much work, but some would
remain. As far as I know these problems are independent of whether
the thread has special OS scheduling properties.

Ian

robert engels

unread,
Sep 9, 2018, 9:33:12 AM9/9/18
to Ian Lance Taylor, Scott Cotton, golang-dev
I would like to chime in here and state that any HPC or HFT system needs "thread" pinning for optimum performance. The simplicity of goroutines is great, but in practice they are lacking several critical features. (Yes, I've read the FAQ, and disagree.)

They need names - and the runtime can append .N to avoid name collisions. This is solely for debugging and logging - it is VERY difficult to debug/postmortem complex, highly concurrent Go applications.

You should be able to group goroutines to share the same thread, and you need to be able to pin threads to cores.

I realize this flies in the face of “cloud computing” because you usually don’t (and can’t) worry about these details, but to be a great systems language, Go needs these features.

While we're at it, Go needs volatile variables with 'happens before' relationships like Java's. It makes sharing much easier - and I know… don't share data - but that's just not feasible for really high-performance systems.

Ian Lance Taylor

unread,
Sep 9, 2018, 9:57:52 AM9/9/18
to robert engels, Scott Cotton, golang-dev
On Sun, Sep 9, 2018 at 6:33 AM, robert engels <ren...@ix.netcom.com> wrote:
>
> They need names - and the runtime can append .N to avoid name collision. This is solely for debugger and logging - it is VERY difficult to debug/postmortem complex, highly concurrent Go applications.

This is quite unlikely to change. For debugging and logging purposes,
I think that current best practice is to pass a context.Context value.


> You should be able to group goroutines to share the same thread, and you need to be able to pin threads to cores.

I agree that this seems likely to be required, though the details are unclear.


> while we’re at it, it needs volatile variables with ‘happens before’ relationships like Java. It makes code sharing much easier - and I know… don’t share data - just not feasible for really high performance systems.

As far as I can tell Java-style volatile variables don't give you
anything you can't already get using the sync/atomic package. What am
I missing?

Ian

Scott Cotton

unread,
Sep 9, 2018, 9:58:33 AM9/9/18
to Ian Lance Taylor, golang-dev
Hi Ian,

Good to know cgo->Go overhead can be reduced in any case :)

For audio/media it seems the issue is variance in scheduling latency, which is traditionally handled by projects like Apple AudioUnits, JACK, and Google's Android Oboe by invoking higher or "real-time" priority OS threads.

It seems much more a case of reducing timing jitter than reducing the actual number of CPU cycles needed to get something, like cgo->Go, done.  I would think the CPU cycles would often be relatively deterministic.

I think LockOSThread could be used in these contexts, but it would be impossible to do without risk on the first scheduling of a foreign, specially scheduled or real-time thread.  It would also apparently have the problems in the issue cited below.

For audio, an additional complication is that all the suppliers of systems like the above which use real-time threads say you can't do much in them, including, for example, scheduling something in Go.  I am a bit skeptical of these claims, because from them I could deduce, for example, that you can't safely record anything in any language, because recording may eventually either allocate memory or write to disk in the real-time thread, or under the timing constraints implied by the real-time thread; yet recorders based on these systems exist.  So some of it may be hype :)

Scott






  
--
Scott Cotton
President, IRI France SAS


Ian Lance Taylor

unread,
Sep 9, 2018, 10:05:08 AM9/9/18
to Scott Cotton, golang-dev
On Sun, Sep 9, 2018 at 6:58 AM, Scott Cotton <w...@iri-labs.com> wrote:
>
> I think LockOsThread could be used in these contexts, but it would be
> impossible to do without risk on the first scheduling of a foreign specially
> scheduled or real-time thread. It would also apparently have the problems
> in the issue cited below.

I don't know much about all this; I'll just note that when calling Go from a thread that was not started by Go, the Go code will start in a goroutine that is locked to that thread. You don't need to use LockOSThread yourself for that case, so there shouldn't be any scheduling issue. Of course any new goroutines that you start will run on different, newly created, threads.

Ian

robert engels

unread,
Sep 9, 2018, 10:13:30 AM9/9/18
to Ian Lance Taylor, Scott Cotton, golang-dev
Yes, sync/atomic works, but you need to use it on both the reader and the writer, and the code becomes pretty obtuse and possibly error prone. You can't use it with bools, so you need to create an atomic.Bool type, which is essentially what you do for all of them, so then at least you can use read, increment, and write methods on the struct for consistency.

But yes, it works, just not simply. In retrospect, the stdlib could just offer these wrapper structs - although the problem is that they won't be inlined, as I don't think Go inlines more than a single level, and the struct methods would be making another call (to atomic.AddXXX), which stops inlining… but I could be wrong here.
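
Something like the following is what I mean (just a sketch; the package and type names are made up, nothing like this is in the stdlib):

package atomicx

import "sync/atomic"

// Bool is the kind of wrapper discussed above: a boolean flag backed by an
// int32 so it can be read and written with sync/atomic, giving the reader
// and writer the usual atomic happens-before guarantees.
type Bool struct {
    v int32
}

// Set atomically stores the flag.
func (b *Bool) Set(val bool) {
    var n int32
    if val {
        n = 1
    }
    atomic.StoreInt32(&b.v, n)
}

// Get atomically loads the flag.
func (b *Bool) Get() bool {
    return atomic.LoadInt32(&b.v) != 0
}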

robert engels

unread,
Sep 9, 2018, 10:15:19 AM9/9/18
to Ian Lance Taylor, Scott Cotton, golang-dev
Also, context.Context works, but anyone building a complex application is going to need to do this. When something has that degree of reach, it should be a prime candidate for handling in the language/platform.

Scott Cotton

unread,
Sep 12, 2018, 8:39:57 PM9/12/18
to Ian Lance Taylor, golang-dev
Thanks Ian,

For audio, there is a tendency to have user-land but OS-privileged layer code that uses special thread scheduling.  For example, AAudio (SCHED_FIFO) and Apple CoreAudio (I'm not sure about the details of how it relates to Darwin's scheduler, but it is "real time" according to Apple) do this.  There is a strong consensus that this is necessary for reliable scheduling of real-time audio (although I haven't personally had any apparent scheduling-related problems myself outside of a real-time thread context).

At any rate, there are different levels of interaction with Go implied by this.

At the level of unprivileged access, Go would need to operate on threads supplied by the above systems.  Presumably, this would be via cgo->Go calls.  Ian: I was wondering if the improvements you suggested were related to setting up the goroutine on the foreign thread the first time, or w.r.t. checking the pointers and everything for the Go GC?

At the level of privileged access, Go could potentially offer a replacement for things like AAudio and CoreAudio.  It could use the native interface (either cgo or syscalls, depending) to create such specially scheduled threads, and then use cgo->Go to start goroutines on them.  In this case, I would imagine it would be nice to be able to have M:N goroutines to threads.  To my understanding, this is not currently possible with goroutines locked to threads, and it probably would violate some safety assumptions made for foreign threads in other types of applications.  But in this case, Go would control the "foreign" thread creation.

The M:N idea would, in my estimation, also be useful in the case of unprivileged access.  It would, I guess, mostly take the form of the old GOMAXPROCS=1 type behaviour.

My question to golang-dev as a whole is whether it seems feasible to try to improve interoperability with special OS thread scheduling characteristics, perhaps along the lines above, and whether anyone knows of other applications in the category of special OS thread scheduling (not CPU affinity) that would benefit.

Best
Scott

 




Ian Lance Taylor

unread,
Sep 12, 2018, 9:10:32 PM9/12/18
to Scott Cotton, golang-dev
On Wed, Sep 12, 2018 at 5:39 PM, Scott Cotton <w...@iri-labs.com> wrote:
>
> Presumably, this would be via cgo->go calls. Ian:
> Was wondering
> if the improvements you suggested were related to setting up the Goroutine
> on the
> foreign thread the first time, or w.r.t. checking the pointers and
> everything for Go gc?

I was referring to setting up the goroutine on the non-Go thread. I
believe that could be faster. For example, see the TODO in the
comment for dropm in runtime/proc.go.

Ian

robert engels

unread,
Sep 12, 2018, 9:24:09 PM9/12/18
to Scott Cotton, Ian Lance Taylor, golang-dev
As a comparison, Java threads have a platform-agnostic "priority". On some platforms, you can map this priority to an OS priority via config. Without that, JNI code is used to call pthread_setschedparam().

In these cases Java has one native thread per Java thread (original Java used green threads like Go).

I think for Go to support this, as I stated before, you need to be able to assign goroutines to "groups", and then set the CPU mask and thread priority for the group.

This could be done with stdlib functions, with nothing changed in the language syntax:

runtime.AssignCurrentGo(group string)
runtime.AssignCpuMask(group string, mask []int)
runtime.AssignCpuPriority(group string, priority int)

Most likely priority should be logical, with a mapping performed by the runtime. For Go's simplicity you might be able to get away with LOW, NORMAL, HIGH, REALTIME.
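
Purely as illustration (none of these runtime functions exist; the stubs below just mirror the proposed signatures so the sketch compiles):

package main

// Stand-ins for the proposed (non-existent) runtime API above.
func AssignCurrentGo(group string)                 {}
func AssignCpuMask(group string, mask []int)       {}
func AssignCpuPriority(group string, priority int) {}

// Logical priority levels, to be mapped to OS priorities by the runtime.
const (
    LOW = iota
    NORMAL
    HIGH
    REALTIME
)

func main() {
    done := make(chan struct{})
    go func() {
        // Join the "audio" group, pin it to core 2, and raise its priority.
        AssignCurrentGo("audio")
        AssignCpuMask("audio", []int{2})
        AssignCpuPriority("audio", REALTIME)
        // ... render audio buffers here ...
        close(done)
    }()
    <-done
}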



Scott Cotton

unread,
Sep 13, 2018, 4:28:17 AM9/13/18
to robert engels, Ian Lance Taylor, golang-dev
On 13 September 2018 at 03:24, robert engels <ren...@ix.netcom.com> wrote:
As a comparison, Java threads have a platform agnostic “priority”. On some platforms, you can map this priority to an OS priority via config. Without that, JNI code is used to call pthread_setschedparam()

In these cases Java has one native thread per Java thread (original Java used green threads like Go).

I think for go to support this, as I stated before, you need to be able to assign goroutines to “groups”, and then set the cpu mask, and thread priority for the group.

OK, apologies - I didn't get before that the idea of groups was what I was looking for.

 

This could be done with stdlib functions, and nothing changed in the language syntax,

runtime.AssignCurrentGo(group string)
runtime.AssignCpuMask(group string, mask []int)
 
runtime.AssignCpuPriority(group string,priority int)

most likely priority should be logical, with a mapping performed by the runtime. For Go’s simplicity you might be able to get away with LOW,NORMAL,HIGH,REALTIME


This sounds fine to me now for the audio stuff (this stuff takes some time for me to digest). 

I think a review of scheduling with all GOOS values could help answer the question of the best way to represent priority in this API.

Looking at go tool dist list, there are the following platforms for which I have no idea how that would work or where to look:
plan9, nacl, js, windows.

I understand there is some talk of Raspberry Pi being added as well.  Is that so?  If so, is it effectively Linux w.r.t. OS thread scheduling?

Other platforms?  like iOS?

Scott

 



Robert Engels

unread,
Sep 13, 2018, 8:36:33 AM9/13/18
to Scott Cotton, Ian Lance Taylor, golang-dev
No apologies - that was just my idea - it doesn’t exist!  I don’t think it was well received but you never know. It’s hard to write really high performance applications without it though - that’s why the OS facilities are there in the first place. 

Bryan C. Mills

unread,
Sep 13, 2018, 8:54:24 AM9/13/18
to w...@iri-labs.com, Ian Lance Taylor, golang-dev
I would expect that you could set up the following structure fairly easily:
  • From a privileged C thread, invoke a cgo-exported Go function. The Go function can loop (without returning) to perform whatever real-time work is needed, using buffered channels to communicate with the rest of the program (and thereby avoid blocking the privileged thread).
  • In other goroutines, perform any background work that does not need real-time scheduling (such as pre-rendering or decoding chunks of audio).
FWIW, I have done a couple of experiments with real-time audio in Go in the past. In 2013 it was possible to get acceptable latency characteristics for interactive performance on a Linux desktop machine (using the ALSA C API) without any special scheduling, provided that the main loop did not allocate. Given the GC latency improvements since then, I would be surprised if the “do not allocate” proviso is even still needed.
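
Roughly, a minimal sketch of that shape (my illustration, untested; it assumes a C host that starts the privileged thread and calls the exported RunAudio function once, and a build with -buildmode=c-archive or c-shared):

package main

import "C"

// buffers is filled by ordinary goroutines doing non-real-time work
// (decoding, pre-rendering); the buffering keeps the callback from blocking.
var buffers = make(chan []float32, 8)

//export RunAudio
func RunAudio() {
    // Called once from the privileged C thread; the runtime locks this
    // goroutine to that thread for the duration of the call.
    for buf := range buffers {
        _ = buf // hand buf to the device / host audio API here
    }
}

func main() {
    // Producer side: pre-render chunks in normal goroutines.
    go func() {
        for {
            buffers <- make([]float32, 256)
        }
    }()
    select {}
}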


Robert Engels

unread,
Sep 13, 2018, 9:04:06 AM9/13/18
to Bryan C. Mills, w...@iri-labs.com, Ian Lance Taylor, golang-dev
I would be surprised if it were needed for audio as well, given the GC pause times, and also that the OS drivers are buffered.

I was referring more to HPC systems and cache locality. Hard to achieve with thousands of Goroutines if you can’t group and isolate them. 

Scott Cotton

unread,
Sep 13, 2018, 9:04:16 AM9/13/18
to Bryan C. Mills, Ian Lance Taylor, golang-dev
On 13 September 2018 at 14:53, Bryan C. Mills <bcm...@google.com> wrote:
I would expect that you could set up the following structure fairly easily:
  • From a privileged C thread, invoke a cgo-exported Go function. The Go function can loop (without returning) to perform whatever real-time work is needed, using buffered channels to communicate with the rest of the program (and thereby avoid blocking the privileged thread).
Yes, I think that could work in the context of an OS-privileged Go program providing audio.  However, using buffered channels to keep the non-real-time parts of the program out of the timing path seems less flexible and less reliable to me than the group idea.
 
  • In other goroutines, perform any background work that does not need real-time scheduling (such as pre-rendering or decoding chunks of audio).
FWIW, I have done a couple of experiments with real-time audio in Go in the past. In 2013 it was possible to get acceptable latency characteristics for interactive performance on a Linux desktop machine (using the ALSA C API) without any special scheduling, provided that the main loop did not allocate. Given the GC latency improvements since then, I would be surprised if the “do not allocate” proviso is even still needed.


Cool!  Did you do duplex with ALSA?  Duplex latency is much more demanding than, say, the latency of user interface interaction.  The former, such as in VoIP, is related to what we hear, which has much more fine-grained timing requirements than, say, 30-60 fps game interaction.

Scott 
 


Scott Cotton

unread,
Sep 13, 2018, 9:16:41 AM9/13/18
to Robert Engels, Bryan C. Mills, Ian Lance Taylor, golang-dev
I think there is some over-hype in audio real-time requirements.  Most places, such as this post and github.com/google/oboe, cite restrictions which are more or less equivalent to common restrictions for hardware implementability of software.

However, in reality, variance in CPU cycles for memory lookups due to caching, variance in compiler optimisations, branch prediction, etc. can yield huge variance in the real time it takes to get something done, even with restrictions such as those in the post above.

That said, last I checked (which was quite a while ago), a jiffy in Linux was 1/100 of a second, which by itself is plenty to cause a glitch in a low-latency audio app.

I think Go's low latency GC is great for audio, but it does run in the context of OS scheduling, and so must be subject to OS thread scheduling latency limitations in any case.


Scott






Bryan C. Mills

unread,
Sep 13, 2018, 9:23:24 AM9/13/18
to w...@iri-labs.com, Ian Lance Taylor, golang-dev
On Thu, Sep 13, 2018 at 9:04 AM Scott Cotton <w...@iri-labs.com> wrote:
On 13 September 2018 at 14:53, Bryan C. Mills <bcm...@google.com> wrote:
I would expect that you could set up the following structure fairly easily:
  • From a privileged C thread, invoke a cgo-exported Go function. The Go function can loop (without returning) to perform whatever real-time work is needed, using buffered channels to communicate with the rest of the program (and thereby avoid blocking the privileged thread).
Yes I think that could work in the context of an OS privileged go program providing audio.  However, the buffering of channels to avoid non-realtime scheduling timing of the program seems less flexible and less reliable than the group idea to me.

Buffered channels are already in the language, and already useful independent of realtime scheduling. It would be nice to see how far we can get with existing features before we propose to add new ones. 🙂
 
  • In other goroutines, perform any background work that does not need real-time scheduling (such as pre-rendering or decoding chunks of audio).
FWIW, I have done a couple of experiments with real-time audio in Go in the past. In 2013 it was possible to get acceptable latency characteristics for interactive performance on a Linux desktop machine (using the ALSA C API) without any special scheduling, provided that the main loop did not allocate. Given the GC latency improvements since then, I would be surprised if the “do not allocate” proviso is even still needed.


Cool!  did you do duplex with ALSA?  Duplex latency is much more demanding than say latency of user interface interaction.  The former, such as in VoIP, is related to what we hear, which has much for fine grained timing requirements than say 30-60fps game interaction.

My experiment was only half-duplex, but it's a live instrument (a just-tempered synthesizer using the computer keyboard as input), so the end-to-end latency has similar constraints to VoIP. (If the delay between the keyboard input and audio output gets too long, the instrument becomes more-or-less unplayable.)

It looks like I was using 5ms latency, which is comparable to what I use for ASIO instruments.



Scott Cotton

unread,
Sep 13, 2018, 9:34:36 AM9/13/18
to Bryan C. Mills, Ian Lance Taylor, golang-dev
5 ms looks to me like a common latency spot:  it's close to a power-of-2 buffer size for FFT at common sample rates, it is doable pretty reliably on unloaded systems independent of OS scheduling priority, and it is small enough to not be terribly irritating to play interactively.  But I would think a professional music studio would offer much lower latency, and a latency-sensitive musician may not like 5 ms very much.
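
For concreteness (my arithmetic, assuming a 44.1 kHz sample rate): 0.005 s * 44100 samples/s ≈ 220 samples, just under a 256-sample (2^8) buffer.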

Probably just good enough for mass market interactive music apps.

Scott




Scott Cotton

unread,
Sep 13, 2018, 9:45:59 AM9/13/18
to Bryan C. Mills, Ian Lance Taylor, golang-dev
On 13 September 2018 at 15:22, Bryan C. Mills <bcm...@google.com> wrote:
On Thu, Sep 13, 2018 at 9:04 AM Scott Cotton <w...@iri-labs.com> wrote:
On 13 September 2018 at 14:53, Bryan C. Mills <bcm...@google.com> wrote:
I would expect that you could set up the following structure fairly easily:
  • From a privileged C thread, invoke a cgo-exported Go function. The Go function can loop (without returning) to perform whatever real-time work is needed, using buffered channels to communicate with the rest of the program (and thereby avoid blocking the privileged thread).
Yes I think that could work in the context of an OS privileged go program providing audio.  However, the buffering of channels to avoid non-realtime scheduling timing of the program seems less flexible and less reliable than the group idea to me.

Buffered channels are already in the language, and already useful independent of realtime scheduling. It would be nice to see how far we can get with existing features before we propose to add new ones. 🙂

Perhaps; will have to see how some other things advance, and what bandwidth allows.

I think the relationship between Go's scheduler and OS scheduling has long been a concern, and improvements have already been proposed, at least informally.  The consensus so far seems to me to be that it would be a good thing to address, not just for audio.

So I would like to propose not de-prioritising this while waiting on a project that doesn't have sufficient resources to plan releases.

Scott 
 


Ralph Corderoy

unread,
Sep 13, 2018, 11:17:06 AM9/13/18
to Scott Cotton, golan...@googlegroups.com
Hi Scott,

> That said, last I checked (which is quite a while ago) a JIFFY in
> linux was 1/100 second, which by itself is plenty to cause a glitch in
> a low latency audio APP.

time(7) on Linux here says

The value of HZ varies across kernel versions and hardware platforms.
On i386 the situation is as follows: on kernels up to and including
2.4.x, HZ was 100, giving a jiffy value of 0.01 seconds; starting with
2.6.0, HZ was raised to 1000, giving a jiffy of 0.001 seconds. Since
kernel 2.6.13, the HZ value is a kernel configuration parameter and can
be 100, 250 (the default) or 1000, yielding a jiffies value of,
respectively, 0.01, 0.004, or 0.001 seconds. Since kernel 2.6.20, a
further frequency is available: 300, a number that divides evenly for
the common video frame rates (PAL, 25 HZ; NTSC, 30 HZ).

...

Before Linux 2.6.21, the accuracy of timer and sleep system calls
(see below) was also limited by the size of the jiffy.

Since Linux 2.6.21, Linux supports high-resolution timers (HRTs),
optionally configurable via CONFIG_HIGH_RES_TIMERS. On a system
that supports HRTs, the accuracy of sleep and timer system calls is
no longer constrained by the jiffy, but instead can be as accurate
as the hardware allows (microsecond accuracy is typical of modern
hardware). You can determine whether high-resolution timers are
supported by checking the resolution returned by a call to
clock_getres(2) or looking at the "resolution" entries in
/proc/timer_list.

Seems it's 300 here by default on Linux 4.18.6-arch1-1-ARCH x86_64.

$ sudo -i sh -c 'grep ^jiff /proc/timer_list; sleep 1; grep ^jiff /proc/timer_list'
jiffies: 4328100819
jiffies: 4328100819
jiffies: 4328100819
jiffies: 4328100819
jiffies: 4328101124
jiffies: 4328101124
jiffies: 4328101124
jiffies: 4328101124
$ dc -e '4328101124 4328100819-p'
305
$

/proc/timer_list also shows hrtimer_interrupt is the event_handler, with
the notional resolution of 1 ns backing that up.
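
For what it's worth, the same resolution check can be done from Go (a Linux-only sketch using golang.org/x/sys/unix, untested here):

package main

import (
    "fmt"

    "golang.org/x/sys/unix"
)

func main() {
    // clock_getres(2) reports the advertised timer resolution; with
    // high-resolution timers this is typically 1 ns regardless of HZ.
    var res unix.Timespec
    if err := unix.ClockGetres(unix.CLOCK_MONOTONIC, &res); err != nil {
        panic(err)
    }
    fmt.Printf("CLOCK_MONOTONIC resolution: %d s %d ns\n", res.Sec, res.Nsec)
}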

--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy

Scott Cotton

unread,
Sep 13, 2018, 1:23:37 PM9/13/18
to Ralph Corderoy, golang-dev
Hi Ralph,

Thanks for the update.

Looks like Linux is taking care to adapt its scheduling to media.

Also, 250 Hz is now the default, giving a 4 ms jiffy, which by itself is no longer enough to cause a glitch in a 5 ms latency app, but it is close.

Scott

Scott Cotton

unread,
Sep 13, 2018, 7:17:35 PM9/13/18
to golang-dev


On Thursday, 13 September 2018 15:45:59 UTC+2, Scott Cotton wrote:


On 13 September 2018 at 15:22, Bryan C. Mills <bcm...@google.com> wrote:
On Thu, Sep 13, 2018 at 9:04 AM Scott Cotton <w...@iri-labs.com> wrote:
On 13 September 2018 at 14:53, Bryan C. Mills <bcm...@google.com> wrote:
I would expect that you could set up the following structure fairly easily:
  • From a privileged C thread, invoke a cgo-exported Go function. The Go function can loop (without returning) to perform whatever real-time work is needed, using buffered channels to communicate with the rest of the program (and thereby avoid blocking the privileged thread).
Yes I think that could work in the context of an OS privileged go program providing audio.  However, the buffering of channels to avoid non-realtime scheduling timing of the program seems less flexible and less reliable than the group idea to me.

Buffered channels are already in the language, and already useful independent of realtime scheduling. It would be nice to see how far we can get with existing features before we propose to add new ones. 🙂

Also, I didn't propose that; Robert Engels did, for HPC/HFT, because he said you can't do that without pinning threads to CPUs and the like.

The question of supporting widely used OS capabilities like scheduling characteristics is also a chicken-and-egg thing.  If Go doesn't interface nicely with widely used OS capabilities, it will weaken its prospects for the applications which need them.  For the case of audio, if Go can't say it supports goroutines in specially scheduled OS threads, then my impression is it won't be taken seriously for audio, and then there would be fewer Go users doing audio.  Given the recent use of surveys and statistics to drive development, if that were the case it might be used as an argument against taking action to support it in the future.  But IMO discouraging support of widely used OS functionality is not really the disposition a general-purpose language should take, independent of my interests in it.


Scott
 



Scott Cotton

unread,
Sep 13, 2018, 7:36:35 PM9/13/18
to golang-dev


On Thursday, 13 September 2018 15:04:06 UTC+2, Robert Engels wrote:
I would be surprised it would be needed for audio as well, given the GC pause times, and also that the OS drivers are buffered. 

I was referring more to HPC systems and cache locality. Hard to achieve with thousands of Goroutines if you can’t group and isolate them. 

The OS driver buffers for low-latency audio represent a real-time duration below the default OS thread scheduling latency.  For Go, I believe the up-front latency cost would be the OS thread scheduling latency plus the GC pause times.  The GC latency improvements are great, and enable a lot, but the GC operates within the context of OS thread scheduling.  A Go app with 1 ms GC latency would, by default on Linux, see a 0.004 s scheduling delay plus the GC latency should the OS need to prioritise something else first - roughly 5 ms in the worst case.

In other languages (Java, C), a 0.004 s latency resulting from scheduling happens less often because of support for thread scheduling.  That is, they are more reliable than full Go with goroutines.  It is unfortunate to have to do cgo with no goroutines, and to have the sigaltstack overhead and whatnot associated with it, on a callback given to a host sound system to run on a high-priority thread.

I very much like your idea of adding grouping, scheduling class, and affinity to the runtime, for audio.

Best,
Scott

robert engels

unread,
Sep 13, 2018, 11:46:26 PM9/13/18
to Scott Cotton, golang-dev
The Linux kernel can perform context switches in under 5 µs on "standard hardware". In the case of equal priorities, I believe the standard time slice is 100 ms (although 64 ms on many systems). So without scheduling priority control, if a program needs anything under 100 ms, it is not even close to being guaranteed on a general-purpose Linux install.

Scott Cotton

unread,
Sep 14, 2018, 4:35:06 PM9/14/18
to robert engels, golang-dev
Hi Robert and All,

Ralph gave us info on the jiffy in Linux scheduling.  Although your conclusions are in line with his data, the numbers are a little bit off.  The default jiffy (roughly: the time that the scheduler gives one thread to occupy a CPU core without interruption) is 0.004 seconds.  It was 0.01 seconds up to the 2.4.x kernels, dropped to 0.001 seconds with 2.6.0, and since 2.6.13 the default has been 0.004 seconds (HZ=250).

This is why reliable low-latency audio uses special thread priorities.  They are used in Android AAudio and Apple CoreAudio for this purpose, which I would classify as widespread use in real applications.  Although you can get low latency in benchmarks on unloaded machines where you don't have to wait for a jiffy, this is not considered reliable in audio systems unless you have dedicated hardware.  For similar reasons, it is also not considered reliable to have syscalls like sigaltstack in cgo callbacks on audio processing/rendering threads.  This PortAudio mailing list message is a good description.

In answer to the question of how far we can go in audio without scheduling priorities in the Go runtime, it seems to me there are the following known limitations:
1. No goroutines, and sigaltstack/cgo->Go overhead (which involves a syscall), in callbacks on host-supplied real-time threads.
2. Go's runtime can't distinguish OS thread scheduling differences in any way.

A simple conclusion is that low-latency audio apps in Go are unreliable on most platforms as the runtime+cgo mechanism stands today.  It doesn't matter how well something is programmed in pure Go or how smartly the work is divided between cgo and Go.  It doesn't even matter if someone benchmarks their system and claims to have had reliable low latency (like me and Bryan), because these issues are caused by the relationship between Go and the host OS thread scheduling, and by the widespread need for special thread priorities in audio systems.

I have a goal of making reliable low latency audio apps in Go. I think it is a reasonable goal since Go is a good general purpose language, with both low and high level features. But I think these issues together are a stopper for reliable latency under about twice a jiffy, which doesn't really quite fall into the low latency category.

Ian suggested the TODO on dropm in runtime/proc.go.  This would help with issue 1.  I have started looking at it, and so far it seems (the code is pretty deep to learn overnight, so take this with a grain of salt) that in any event the cgo->Go and Go->cgo directions would involve sigaltstack even with the improvement suggested in the TODO.  Any runtime/proc.go gurus willing to comment?

Robert suggested adding runtime functions to define thread priorities and affinities for groups of goroutines.
This would solve issue 2 and to some extent obviate a need for solving 1.  Ian agreed that something like that was necessary but details were unclear.

I have started looking at how to make progress on that more concretely.  I have asked for help w.r.t. plan9, windows, and the various js host targets (where I guess this functionality shouldn't be supported), with no response as of yet.

Best,
Scott











Robert Engels

unread,
Sep 14, 2018, 10:57:36 PM9/14/18
to Scott Cotton, golang-dev
I don't think your numbers are correct. The minimum is 1 ms, but the default is closer to 100 ms, and there are a lot of settings to control it.

This is a great read https://notes.shichao.io/lkd/ch4/#timeslice

Sent from my iPhone

robert engels

unread,
Sep 14, 2018, 11:28:39 PM9/14/18
to Scott Cotton, golang-dev
One other note: if you use a Go thread/goroutine, it is still going to be subject to GC pauses - which can vary greatly - even with OS scheduling support.

This is a completely different problem, but if it can’t be solved, the OS priority changes won’t matter.

I think for very low-latency audio, you need native threads, with no dynamic memory allocation, that communicate with the Go threads via shared memory/queue.

Or Go needs a lower pause GC, like Azul Zing, but that is proprietary.

The Go folks might want to investigate the Shenandoah collector in OpenJDK - in early tests it's pretty amazing, and it's open source :)

Scott Cotton

unread,
Sep 15, 2018, 4:43:35 AM9/15/18
to robert engels, golang-dev
Hi Robert,

On 15 September 2018 at 05:28, robert engels <ren...@ix.netcom.com> wrote:
One other note, if you use a Go thread/routine - it is still going to be subject to GC pauses - which can vary greatly. even with OS scheduling support.

This is a completely different problem, but if it can’t be solved, the OS priority changes won’t matter.

GC Pauses are solvable by program design.

 

I think for very low-latency audio, you need native threads, with no dynamic memory allocation, that communicate with the Go threads via shared memory/queue.

One can't communicate between native threads and Go threads without invoking OS scheduling latency, so this has the limitations stated before regarding thread priorities.

 

 
Or Go needs a lower pause GC, like Azul Zing, but that is proprietary.


To my knowledge, Go has the best GC in terms of latency there is, but I've not studied the GCs to which you are referring.

Robert Engels

unread,
Sep 15, 2018, 7:41:17 AM9/15/18
to Scott Cotton, golang-dev
I'm sorry, but none of what you stated is true. Unless you do no dynamic memory allocation in ANY thread, all goroutines are going to be subject to the GC pause. You can delicately share memory between Go and a native thread: just use an unsafe off-heap allocation so the memory is not subject to GC. Go pause times are great but nowhere near state of the art.

Sent from my iPhone

Scott Cotton

unread,
Sep 15, 2018, 7:52:34 AM9/15/18
to Robert Engels, golang-dev
Hi Robert,

On 15 September 2018 at 13:41, Robert Engels <ren...@ix.netcom.com> wrote:
I’m sorry but none of what you stated is true.

I don't find that statement constructive.

 
Unless you do no dynamic memory allocation in ANY thread, all goroutines are going to be subject to the GC pause. You can delicately share memory between Go and a native thread.

Thus one option is to avoid dynamic memory allocation; GC pauses are also a function of the amount of memory in the heap and the size of the pointer graph, which is something the programmer can work with.

 
Just use an unsafe off heap allocation so the memory is not subject to GC. Go pause times are great but no where near state of the art. 



I'll take the recent keynote at ISMM as authoritative on this question for now.


Scott

Scott Cotton

unread,
Sep 15, 2018, 7:59:03 AM9/15/18
to golang-dev


On Saturday, 15 September 2018 04:57:36 UTC+2, Robert Engels wrote:
I don’t think your numbers are correct. The minimum is 1 ms but the default is closer to 100 ms, but there are a lot of settings to control it. 

The numbers came from https://groups.google.com/d/msg/golang-dev/EVwSXv8JTsk/a8AzEl8tCAAJ and appear correct to me.  A 0.1 second time slice does not make sense to me.
Thanks for the link, it is a nice read.  Nothing in there that I can see says anything about 0.1 second time slicing.

robert engels

unread,
Sep 15, 2018, 4:12:15 PM9/15/18
to Scott Cotton, golang-dev
On Sep 15, 2018, at 6:52 AM, Scott Cotton <w...@iri-labs.com> wrote:

Hi Robert,

On 15 September 2018 at 13:41, Robert Engels <ren...@ix.netcom.com> wrote:
I’m sorry but none of what you stated is true. 

I don't find that statement constructive.

 

I am not sure why. I was simply stating that the statements you made were not true. Honestly, your statement is more offensive if you think about it.

Unless you do no dynamic memory allocation in ANY thread, all goroutines are going to be subject to the GC pause. You can delicately share memory between Go and a native thread. 

Thus one option is to avoid dynamic memory allocation, and GC pauses are also a function of the amount of memory in the heap and the size of the pointer graph, which something the programmer can work with.


GC pauses in Go are not based on the heap size. The "pause" time is based on the number of active threads and stack depth (coupled with the root object set). Still, if the GC is running a lot, it will starve (compete with) the application threads for CPU, making them "slower" due to scheduling, but this is not a pause.

And as I said, you can allocate off-heap memory, not subject to GC, to be shared with the native thread, and run those threads at real-time priority. This is a common technique in many Java libraries, and it works in Go as well. I personally don't use the technique any more because the pause time is no longer based on heap size (it was previously). It does avoid the overhead of converting memory layout (e.g. strings) between the two sides.
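
A minimal sketch of the Go side of that (illustration only; error handling and the handoff protocol to the native thread are omitted):

package main

/*
#include <stdlib.h>
*/
import "C"

const frames = 256

func main() {
    // Allocate a float32 buffer outside the Go heap; the GC never scans,
    // moves, or frees this memory.
    p := C.malloc(C.size_t(frames * 4)) // 4 bytes per float32
    defer C.free(p)

    // View the C memory as a Go slice without copying.
    buf := (*[frames]float32)(p)[:]

    // A native real-time thread could work on the same memory via p, while
    // ordinary goroutines fill buf; only the handoff protocol needs care.
    for i := range buf {
        buf[i] = 0
    }
}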

 
Just use an unsafe off heap allocation so the memory is not subject to GC. Go pause times are great but no where near state of the art. 



I'll take the recent keynote at ISSM as authoritative on this question for now.


I read the presentation.  Go currently claims, in the 2018 presentation, two pauses of less than 500 µs per second. The pause times in Azul Zing are under 100 µs for more than 1 TB of heap, and typically under 10 µs. The Azul GC often requires large heaps/head-room to be efficient - 20+ GB is not uncommon - which is not the typical Go environment. Even OpenJDK Shenandoah with 100+ GB heaps has pauses of less than 500 µs.

So I will state it again, Go GC is very, very good, but it is not state of the art. It is close.

Most importantly though, this really has nothing to do with the requirements for real-time audio. I was attempting to explain how you could do it.

If you review https://making.pusher.com/golangs-real-time-gc-in-theory-and-practice/ you will see Go has 7 ms allocation pauses - probably too much based on what you've stated. I've run those tests on my machine using Go 1.11 and I see similar 7 ms pause times (my Java times using standard G1 are in the 28 ms range). This is a direct link to the relevant code: main.go

I was only trying to be helpful, and I don't appreciate being called out for stating something is untrue; I don't think that is productive.

robert engels

unread,
Sep 15, 2018, 4:33:33 PM9/15/18
to Scott Cotton, golang-dev
Also, the report states:

"Linux's CFS scheduler does not directly assign timeslices to processes, but assigns processes a proportion of the processor. The amount of processor time that a process receives is a function of the load of the system. This assigned proportion is further affected by each process's nice value. The nice value acts as a weight, changing the proportion of the processor time each process receives. Processes with higher nice values (a lower priority) receive a deflationary weight, yielding them a smaller proportion of the processor, and vice versa.

With the CFS scheduler, whether the process runs immediately (preempting the currently running process) is a function of how much of a proportion of the processor the newly runnable processor has consumed. If it has consumed a smaller proportion of the processor than the currently executing process, it runs immediately"

If you run the numbers with everything running at default priority and default nice value, and two CPU bound SCHED_NORMAL processes, you will see that the timeslice is very close to 100 ms as I stated - meaning a process will run for 100 ms before it is pre-empted to run the other - leading to 100 ms scheduling delays.

That being said, most processes are not CPU bound but IO bound, so the scheduler attempts to run the processes in a "fair" fashion while trying to minimize context switches, because they are inefficient and lead to poor use of the CPU cache.

From the linux source code:

/*
* default timeslice is 100 msecs (used only for SCHED_RR tasks).
* Timeslices get refilled after they expire.
*/
#define RR_TIMESLICE (100 * HZ / 1000)

So for two SCHED_RR programs at equal priority, it is 100 ms.

Scott Cotton

unread,
Sep 15, 2018, 5:17:40 PM9/15/18
to robert engels, golang-dev
Hi Robert,

On 15 September 2018 at 22:12, robert engels <ren...@ix.netcom.com> wrote:


On Sep 15, 2018, at 6:52 AM, Scott Cotton <w...@iri-labs.com> wrote:

Hi Robert,

On 15 September 2018 at 13:41, Robert Engels <ren...@ix.netcom.com> wrote:
I’m sorry but none of what you stated is true. 

I don't find that statement constructive.

 

I am not sure why. I was simply stating the statement you made were not true. Honestly, your statement is more offensive if you think about it.

I don't read this interchange that way.  You said that "none of what" I said is true.  In the context of this thread, the scope of "what I said" may refer to many many things, as there are lots of interchanges and I said many things.  You may thus have been referring to everything I've brought up.  I don't know.

In any event, when I might think what someone else says is not true, I would word the response as "I don't understand that" or "that doesn't make sense to me", because claiming that what another person says is not true - moreover maybe even all of it - asserts authority over "the" truth over the other person. To me, there is always a possibility of miscommunication or ambiguity, especially with a stranger in a public forum, and such assertions aren't helpful.

For these reasons, I did not find this statement constructive.  I am sorry if stating that offended you.  I also don't know what you mean by my statement being more offensive.  I'd like to invite you to explain more off line, as my intention is only to direct the discussion toward something fruitful, and any tension between you and I seems a distraction from the ultimate goal and purpose of this thread and list.

Of course, I appreciate that our views differ and learn what I can from it.  I would appreciate it if that sentiment went many ways.

[...]

i was only trying to be helpful, and I don’t appreciate being called out for stating something is untrue, and I don’t think that is productive.

Again, I'm sorry you didn't find that to be productive.   I'll continue the discussion on my end without reference to this interchange in hopes you will either accept my response above, or work it out offline.

Best,
Scott


robert engels

unread,
Sep 15, 2018, 5:47:03 PM9/15/18
to Scott Cotton, golang-dev
I was only referring to the three claims you made in the email I was responding to. I could have said it differently, and maybe this exchange would have been avoided. Sometimes that happens in email; I am sorry, and I'll try to keep that in mind for the future.

Scott Cotton

unread,
Sep 15, 2018, 5:57:27 PM9/15/18
to robert engels, golang-dev
On 15 September 2018 at 22:12, robert engels <ren...@ix.netcom.com> wrote:


On Sep 15, 2018, at 6:52 AM, Scott Cotton <w...@iri-labs.com> wrote:

Hi Robert,

On 15 September 2018 at 13:41, Robert Engels <ren...@ix.netcom.com> wrote:
[...] 
Unless you do no dynamic memory allocation in ANY thread, all goroutines are going to be subject to the GC pause. You can delicately share memory between Go and a native thread. 

Thus one option is to avoid dynamic memory allocation, and GC pauses are also a function of the amount of memory in the heap and the size of the pointer graph, which something the programmer can work with.


GC pauses in Go are not based on the the heap size. The “pause” time is based on number of active threads and stack depth (coupled with the root object set). Still, if the GC is running a lot, it will starve (compete with) the CPU from the application threads making them “slower” due to scheduling, but this is not a pause.

I am not a GC expert, but my point is only that the programmer has a pretty reasonable amount of control over the work presented to the GC, especially in contexts where memory can be pre-allocated and the program has a dedicated task.

 

And I said, you can allocate off heap memory to be shared with the native thread that is not subject to GC pauses, and run these threads in real time priority. This is a common technique in many Java libraries and it works in Go as well. I personally don’t use the technique because the pause time is no longer based on heap size (it was previously). It does avoid the overhead of converting “memory layout (e.g. strings)” between the sides.

This may be worth looking at.  My impression is still that the relationship between the Go runtime and OS-privileged special thread scheduling is the main thing that needs to be considered.  It is not clear to me that any communication between an OS-privileged special thread and a user goroutine, whether by sharing memory as above or otherwise, addresses the scheduling problem.
 

 

 
Just use an unsafe off heap allocation so the memory is not subject to GC. Go pause times are great but no where near state of the art. 



I'll take the recent keynote at ISSM as authoritative on this question for now.


I read the presentation.  Go currently claims pauses times in the presentation for 2018, are 2 less than 500 us sec pauses per second. The pauses times in Azul Zing are under the 100 us for more than 1 TB of heap, and typically under 10 usec. The Azul GC often requires large heaps/head-room, 20+ GB is not uncommon, to be efficient, which is not the typically Go environment. Even the OpenJDK Shenandoah with 100+ Gb heaps have pauses less than 500 usec.

So I will state it again, Go GC is very, very good, but it is not state of the art. It is close.

Close enough for me.
 

Most importantly though, this really has nothing to do with the requirements for real-time audio. I was attempting to explain how you could do it.

Just curious, have you done it?

 

If you review https://making.pusher.com/golangs-real-time-gc-in-theory-and-practice/ you will see Go has 7 ms allocation pauses. probably too much based on what you’ve stated. I’ve run those tests on my machine using Go 1.11 and I see similar 7 ms pauses times (my Java times using standard G1 are in the 28 ms range). This is a direct link to the relevant code main.go


I have read that some time ago, it is interesting.  Thanks for bringing it up.
 
Scott

robert engels

unread,
Sep 15, 2018, 6:36:01 PM9/15/18
to Scott Cotton, golang-dev
Yes, if you have an isolated program/service that is possible. In the context of a complex 'MIDI' player, with a GUI and lots of services, it is very hard to write GC-free code - people try that in Java too, and it usually leads to very subtle, hard-to-find bugs, especially in concurrent systems. The basic technique is object pooling/re-use - very difficult to do error-free in a concurrent environment. This is why platforms like the LMAX Disruptor exist, but even then, as soon as you use a char[] and incorrectly retain a reference, it will get stomped on.

I believe the best solution - and it would probably work well enough - is the 'group/real-time' enhancements I presented before, but I wouldn't count on the timeline; thus I offered the solution you could do now.

Even if Go adopted my simple API, it is not that simple... When the goroutines/threads have varying priorities it can lead to starvation, and threads/routines not able to reach a safe point (for stack recording). So often implementations try to run all of the internal threads at a higher-priority than all user threads, but then the GC work blocks the application mutators instead of running concurrently… So there needs to be a way to temporarily boost their priority when needed… Sounds simple but there can be lots of race conditions.

I was using real-time threads in Java without JRTS (Java real-time system) very early on, maybe one of the earliest to do so, and needed to work with the Azul staff A LOT tracking down very subtle, but devastating bugs/crashes.

Scott Cotton

unread,
Sep 15, 2018, 6:38:46 PM9/15/18
to golang-dev

So there's something to the 0.1 s time slicing after all.  My Linux scheduler knowledge is quite dated; last time I looked there was a jiffy and it was 0.01 seconds for CPU-bound switching.

I don't think that an interactive music or VoIP app will appear on a Linux kernel with only 2 processes very often. And as load factors into the equation, I would guess that as the number of processes/threads increases, the switching rate increases, making the time allocated to a process/thread smaller.  Maybe the jiffy now means the lower bound of that time allocation?

Although these numbers are relevant to the analysis, I think that whatever the numbers are in practice, the bottom line is that OS-privileged threads like the SCHED_FIFO threads in AAudio are higher priority and widely used, and the result is increased reliability. How to make that work with Go is not clear, as any work delegated to an unprivileged Go process will no longer be scheduled by the OS at high priority.

Scott

Scott Cotton

unread,
Sep 16, 2018, 5:47:27 AM9/16/18
to robert engels, golang-dev
On 16 September 2018 at 00:35, robert engels <ren...@ix.netcom.com> wrote:
Yes, if you have an isolated program/service that is possible. In the context of a complex ‘midi’ player, with a GUI and lots of services - it is very hard to write GC free code - people try that in Java too and it usually leads to very subtle hard to find bugs especially in concurrent systems. The basic technique is object pool/re-use - very difficult to do error free in a concurrent environment. This is why platforms like LMAX disrupter exists, but even then, as soon as you use a char[] and incorrectly retain a reference, it will get stomped on.

I believe the best solution, and it would probably work well enough is the ‘group/real-time’ enhancements I presented before, but I wouldn’t count on the timeline - thus I offered the solution you could do now.

Thanks.  I'm not so worried about the timeline at this point so much as about negative feedback on the priority over time.  If there are 10x more (pure) Go cloud users than performance media users today, it doesn't mean there would be tomorrow if this worked well.  But citing surveys to justify priorities doesn't really allow for such reasoning.
The problem is somewhat related to this article as a way of ranking needs.
 

Even if Go adopted my simple API, it is not that simple... When the goroutines/threads have varying priorities it can lead to starvation, and threads/routines not able to reach a safe point (for stack recording). So often implementations try to run all of the internal threads at a higher-priority than all user threads, but then the GC work blocks the application mutators instead of running concurrently… So there needs to be a way to temporarily boost their priority when needed… Sounds simple but there can be lots of race conditions.

Indeed it is not simple under the hood.  Another potential example of difficulty I'm wondering about is whether sched_yield() would be necessary for using high priority threads.
 

I was using real-time threads in Java without JRTS (Java real-time system) very early on, maybe one of the earliest to do so, and needed to work with the Azul staff A LOT tracking down very subtle, but devastating bugs/crashes.

I didn't know Azul had staff for, or ventured into, real-time stuff.  Sounds like you've got a lot of experience that most of us (including me) might not have.  Would love to know more.

Best
Scott

 



Scott Cotton

unread,
Sep 17, 2018, 5:12:21 AM9/17/18
to golang-dev
Hi all,

After looking at the runtime and giving some thought to the scheduling of special-priority threads, it has occurred to me that there might be a simple solution to making the full Go runtime work (channels, goroutines, etc.) with special OS thread scheduling.

The idea would be to use pthread_attr_setinheritsched and set it to PTHREAD_INHERIT_SCHED (maybe under some kind of option)
when creating a new thread.

Although actually getting this working in the runtime looks like a hefty task, and although it would not allow mixing different thread scheduling in one Go runtime, it seems to me it would allow a Go program to run with all threads specially scheduled, provided that it was launched by a thread/process with the desired OS scheduling - thus enabling the possibility of using Go in contexts like a real-time audio processing chain which uses special OS thread scheduling.

It also seems like it would be much simpler than modifying the runtime to know about OS scheduling priorities.

Any thoughts appreciated.

Best,
Scott

Robert Engels

unread,
Sep 17, 2018, 8:48:05 AM9/17/18
to Scott Cotton, golang-dev
I would think you could do that now, just start the program, on linux at least, using 
chrt. 
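(For example, something along the lines of: sudo chrt -f 50 ./yourgoprogram, where the SCHED_FIFO priority 50 and the program name are just placeholders.)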

There are quite a few internal threads, but they should inherit the priority as well, since that is the default.
The problem is that if the internal runtime is already using priorities for its scheduler, all sorts of bad things might happen. 

Scott Cotton

unread,
Sep 17, 2018, 10:48:21 AM9/17/18
to Robert Engels, golang-dev
On 17 September 2018 at 14:47, Robert Engels <ren...@ix.netcom.com> wrote:
I would think you could do that now, just start the program, on linux at least, using 
chrt. 

There are quite a few internal threads, but they should inherit the priority as well, since that is the default.

Yes.  My reading of pthread_attr_setinheritsched from man7.org is that by default the scheduling is 
not inherited, except in the case of the bug described at the bottom:

"""
BUGS         top
       As at glibc 2.8, if a thread attributes object is initialized using
       pthread_attr_init(3), then the scheduling policy of the attributes
       object is set to SCHED_OTHER and the scheduling priority is set to 0.
       However, if the inherit-scheduler attribute is then set to
       PTHREAD_EXPLICIT_SCHED, then a thread created using the attribute
       object wrongly inherits its scheduling attributes from the creating
       thread.  This bug does not occur if either the scheduling policy or
       scheduling priority attribute is explicitly set in the thread
       attributes object before calling pthread_create(3).
"""

I think the only way to actually do it correctly, and ensure the behaviour is as desired, would be to introduce calls that make the result independent of bugs like the one above. 


 
The problem is that if the internal runtime is already using priorities for its scheduler, all sorts of bad things might happen. 

It doesn't look to me like pthread scheduling is currently manipulated in the runtime.  It is, however, opaque because of the optimisations and trampolining around pthread function calls.  My assessment is only from perusing and grepping for where it would seem such things would occur, but with all the assembly related to pthreads in the runtime, I could have missed something.
If OS scheduling inheritance were under an option, environment variable, or build tag, then it wouldn't in principle prevent future or other efforts to introduce OS scheduling into the runtime.

Scott 

robert engels

unread,
Sep 17, 2018, 11:06:51 AM9/17/18
to Scott Cotton, golang-dev
According to the docs,

The default setting of the inherit-scheduler attribute in a newly initialized thread attributes object is PTHREAD_INHERIT_SCHED

so if the runtime doesn’t manipulate it, the scheduling should be inherited.

Scott Cotton

unread,
Sep 17, 2018, 11:20:16 AM9/17/18
to robert engels, golang-dev
Yep, thanks my bad.

Scott

r...@golang.org

unread,
Sep 17, 2018, 2:10:31 PM9/17/18
to golang-dev
>> I read the presentation.  For 2018, Go claims two pauses of less than 500 µs per second. The pause times in Azul Zing are under 100 µs for heaps of more than 1 TB, and typically under 10 µs. The Azul GC often requires large heaps/head-room (20+ GB is not uncommon) to be efficient, which is not the typical Go environment. Even the OpenJDK Shenandoah collector with 100+ GB heaps has pauses of less than 500 µs.

I had not seen these Java latency numbers in the literature; could you provide a reference to both the Azul and the Shenandoah numbers?



robert engels

unread,
Sep 17, 2018, 2:40:06 PM9/17/18
to r...@golang.org, golang-dev
For Zing, see https://www.azul.com/products/zing/java-performance/ and note that these numbers matched our internal tests, although at times, given certain workloads, the pauses were closer to 100 µs.

For Shenandoah you need to look at the pause times in the logs here: https://www.youtube.com/watch?v=qBQtbkmURiQ (go to minute 8:18). The slides are presented elsewhere but I could not find them offhand.

Like I’ve said before though, the “pause time” is only one part of the story: the pause time might only be 1 µs, but if it happens 500,000 times a second, what is the effective “pause time”? It depends on the application, because what you are doing is lowering the progress of the application threads to the degree that they are effectively “paused”.

This is why, in the other GC tests I referenced, the Go “pause time” is 7-8 ms in the given case: even though the actual pauses are far smaller (probably on the order of 1 ms), it takes 7-8 ms to complete the user operation. 

robert engels

unread,
Sep 17, 2018, 5:35:43 PM9/17/18
to r...@golang.org, golang-dev
As a follow-up, I downloaded and built OpenJDK for Java11 with Shenandoah.

I ran the tests at https://github.com/WillSewell/gc-latency-experiment on a quiet machine (macOS, Core i7, 3.4 GHz, 4 cores, 8 threads), slightly modified to perform 10 runs and report the run time of each run.

Here are the results:

Go 1.11:

Worst push time:  5.777907ms  run time  981.361142ms
Worst push time:  6.306577ms  run time  752.262192ms
Worst push time:  7.438668ms  run time  723.050672ms
Worst push time:  9.169415ms  run time  749.984075ms
Worst push time:  7.070763ms  run time  727.326469ms
Worst push time:  7.218757ms  run time  728.34274ms
Worst push time:  6.865207ms  run time  723.579475ms
Worst push time:  7.135745ms  run time  724.002589ms
Worst push time:  9.262009ms  run time  727.544747ms
Worst push time:  7.54652ms  run time  729.091587ms

JDK 11 with G1GC (times in ms)

Worst push time: 15.90627, run time 881
Worst push time: 15.679716, run time 896
Worst push time: 12.650266, run time 738
Worst push time: 12.516753, run time 718
Worst push time: 13.078774, run time 746
Worst push time: 12.578543, run time 724
Worst push time: 11.879806, run time 744
Worst push time: 12.496375, run time 724
Worst push time: 12.188031, run time 729
Worst push time: 12.646902, run time 735

JDK 11 with Shenandoah (times in ms)

Worst push time: 4.316621, run time 582
Worst push time: 3.613893, run time 577
Worst push time: 4.353042, run time 517
Worst push time: 4.33344, run time 502
Worst push time: 4.069009, run time 506
Worst push time: 3.959577, run time 501
Worst push time: 3.949561, run time 516
Worst push time: 3.726912, run time 503
Worst push time: 1.304127, run time 472
Worst push time: 1.436347, run time 489

I am no longer able to test with Zing as I am no longer with the company that had the license.

As I said, Go GC is very, very good, but not state of the art.

Scott Cotton

unread,
Sep 17, 2018, 6:08:33 PM9/17/18
to golang-dev
Just wanted to include the related go-nuts message from Ian for potential follow up:

On Mon, Sep 17, 2018 at 10:39 AM, Scott Cotton <w...@iri-labs.com> wrote:
>
> Wanted to ask about the Go runtime use of threads.  Specifically, suppose
> I've got an app in mind that would run OS-priveleged and use specially
> scheduled threads, like SCHED_RR in linux for example.
>
> One could do this with chrt or calling from a process/thread at the desired
> scheduling priority/type (as pointed out on a related thread in golang-dev)
>
> The question is: does this as of go1.11 interfere with Go runtime internal
> prioritising of threads?
> The other question is:  may it one day interfere with Go runtime internal
> prioritising of threads?

Using specially scheduled threads should not be a problem if the Go
code that runs on those threads reliably calls runtime.LockOSThread.
If not, then it's hard to say.
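
For concreteness, here is a minimal sketch of that LockOSThread pattern (my own illustration, not Ian's code; Linux, and the SCHED_FIFO priority of 50 is an arbitrary placeholder that needs suitable privileges):

package main

/*
#include <pthread.h>
#include <sched.h>

// set_fifo gives the calling OS thread SCHED_FIFO priority prio.
// Returns 0 on success or an error number on failure.
static int set_fifo(int prio) {
	struct sched_param p;
	p.sched_priority = prio;
	return pthread_setschedparam(pthread_self(), SCHED_FIFO, &p);
}
*/
import "C"

import (
	"fmt"
	"runtime"
)

// rtWorker runs latency-sensitive work on a single, specially scheduled OS thread.
func rtWorker(work chan func()) {
	runtime.LockOSThread() // this goroutine now owns its OS thread
	defer runtime.UnlockOSThread()
	if rc := C.set_fifo(50); rc != 0 {
		fmt.Println("could not set SCHED_FIFO:", rc)
	}
	for f := range work {
		f()
	}
}

func main() {
	work := make(chan func())
	done := make(chan struct{})
	go rtWorker(work)
	work <- func() {
		fmt.Println("hello from the locked (real-time) thread")
		close(done)
	}
	<-done
	close(work)
}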

Scott Cotton

unread,
Sep 17, 2018, 6:30:11 PM9/17/18
to golang-dev
And here is a follow up


On Tuesday, 18 September 2018 00:08:33 UTC+2, Scott Cotton wrote:
Just wanted to include the related go-nuts message from Ian for potential follow up:

On Mon, Sep 17, 2018 at 10:39 AM, Scott Cotton <w...@iri-labs.com> wrote:
>
> Wanted to ask about the Go runtime use of threads.  Specifically, suppose
> I've got an app in mind that would run OS-privileged and use specially
> scheduled threads, like SCHED_RR in linux for example.
>
> One could do this with chrt or calling from a process/thread at the desired
> scheduling priority/type (as pointed out on a related thread in golang-dev)
>
> The question is: does this as of go1.11 interfere with Go runtime internal
> prioritising of threads?
> The other question is:  may it one day interfere with Go runtime internal
> prioritising of threads?

Using specially scheduled threads should not be a problem if the Go
code that runs on those threads reliably calls runtime.LockOSThread.
If not, then it's hard to say.


If, in user Go code (not the runtime), one calls runtime.LockOSThread, then sets a priority, and then creates another goroutine,
I would have thought that that thread may create another thread.  Given the inheritance of scheduling priority, if this were the case, then
the protection w.r.t. scheduling priority would leak.   Any thoughts?  Is there some guard against an
m creating a new m if it is associated with a g via LockOSThread?  I couldn't find one, but it's not easy to verify
things like this in the runtime without spending some time playing with the code.

Second, a command like chrt, and the lack of guarding against scheduling priority inheritance in the runtime now,
would normally have implied to me that it is ok to call chrt, so I was a bit surprised.

But at any rate, for what I'm looking at doing, something like chrt without LockOSThread would be much more interesting.  
Perhaps running the Go test suite in an OS-privileged context, replacing the executables with chrt wrappers, would be a good
place to start examining this.  

Keith Randall

unread,
Sep 17, 2018, 6:41:25 PM9/17/18
to w...@iri-labs.com, golang-dev
We changed the runtime to never spawn a new OS thread from a thread that is currently locked with LockOSThread.
We have a dedicated clean thread that can spawn new OS threads when needed.
We did this precisely because people do strange things to OS threads after doing a LockOSThread, and we don't want to copy those strange things to unrelated OS threads.






Scott Cotton

unread,
Sep 18, 2018, 8:09:28 AM9/18/18
to golang-dev
Hi Keith,

thanks, makes sense to me and it is indeed helpful to see the merge commit and issue/related issues.  It gives me a lot of context that I was lacking before.

Hi all,

Just to try to help prevent this from getting buried in the various exchanges, I'm pulling back to the top the following about
go runtime inheriting OS scheduler parameters, like via chrt on linux.

from Ian:
Using specially scheduled threads should not be a problem if the Go
code that runs on those threads reliably calls runtime.LockOSThread.
If not, then it's hard to say.


from me:
a command like chrt, and the lack of guarding against scheduling priority inheritance in the runtime now,
would normally have implied to me that it is ok to call chrt, so I was a bit surprised.

But at any rate, for what I'm looking at doing, something like chrt without LockOSThread would be much more interesting.  
Perhaps running the Go test suite in an OS-privileged context, replacing the executables with chrt wrappers, would be a good
place to start examining this. [+update: I'm looking at how to do this.  If a cross-platform way of doing it were devised, would a TryBot be available for such a test?]

Best,
Scott

David Chase

unread,
Sep 18, 2018, 11:39:54 AM9/18/18
to w...@iri-labs.com, golang-dev
Thanks for raising the Go GC latency issue. That's a bug, not expected behavior.
(A rapidly-allocating goroutine will be taxed to keep it from getting ahead of the garbage collector.
In this case, the tax is far higher than expected.  Details/cause TBD).
Also, it's great that this is a simple, fast reproducer.

robert engels

unread,
Sep 18, 2018, 11:58:30 AM9/18/18
to David Chase, w...@iri-labs.com, golang-dev
Yeah, that’s the point though. The GC in Go is not as efficient as the Shenandoah collector, so it effectively needs to pause/stall the app/mutators to catch up. Likewise, the G1 collector is not as good as Shenandoah, and its pauses are even longer.

That is why I originally pointed the Go authors to Shenandoah: since it is open source, I am fairly certain its techniques could be adopted (although I know there is some discussion about its write barriers being undesirable).

David Chase

unread,
Sep 18, 2018, 12:00:10 PM9/18/18
to ren...@ix.netcom.com, w...@iri-labs.com, golang-dev
This is not a technique problem. It is a bug, and it can be fixed without reengineering the Go garbage collector to use Brooks pointers.

robert engels

unread,
Sep 18, 2018, 12:08:43 PM9/18/18
to David Chase, w...@iri-labs.com, golang-dev
That’s good news.

Curious though, how do you know that? I look at it from the fact that if a mutator thread only allocates and “frees”, a single collector thread would need to be able to collect the garbage as fast as the allocator produces it (which is hard given basically free allocation costs), so I would think that finding, collecting, and compacting the garbage cannot be as cheap as the allocation. Then you throw in that the runtime might use other threads for housekeeping, leading to CPU availability issues.

Even non-GC allocators like malloc can perform very poorly at times due to fragmentation and the allocation profile.

I am not questioning that it could be a bug, I was just wondering how you KNOW it is? (technical curiosity)

David Chase

unread,
Sep 18, 2018, 1:40:49 PM9/18/18
to ren...@ix.netcom.com, Scott Cotton, golang-dev
If you eyeball a trace, I'd estimate that overall for that benchmark, there's about equal parts GC work and "real" work,
and that benchmark is very GC-heavy.  For most programs the time spent in GC is much lower (and can be made lower
by increasing GOGC, at the expense of a larger memory footprint).
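
To make that knob concrete (an illustrative example, not part of the benchmark): with GOGC=100 and 50 MB live, the collector targets roughly a 100 MB heap; raising it to GOGC=400 targets roughly 250 MB and so triggers collections far less often. It can also be set from code via runtime/debug:

package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// Equivalent to running with GOGC=400: the next collection is targeted
	// at roughly 5x the live heap, trading memory footprint for less GC work.
	old := debug.SetGCPercent(400)
	fmt.Println("previous GOGC setting:", old)
	// ... allocation-heavy or latency-sensitive work ...
}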

For the sake of argument and easy math, assume that each slice allocation and initialization uses 1 µs of real work.
(The actually observed average is some mix of GC+real and is measured at 1 µs even with this bug, but this is also on a "4" processor box, but it's not using them properly during assists, so mumble.  Call it 1 µs).

If we time-sliced GC infinitely well and ran it continuously on a single processor, with equal parts spent on GC and real work, we'd observe a latency of 2 µs.
This simple work argument gets us nowhere near a 5000 µs latency for this benchmark -- but it's not a problem of the GC "keeping up".

One thing to watch out for in discussions of the Go garbage collector is that "keep up" is with respect to a goal heap size; based on observed behavior, the GC predicts when it needs to start in order to finish so that the peak heap size is LIVE * (100 + GOGC)/100.  When predictions are ideal, even for a heavily allocating program, there should be (modulo bugs) no need to draft the mutator to help with GC work, and on a 4 processor laptop running a single mutator thread it should be easy for the GC to "keep up" because it has 75% of the available CPU to use.  But, sometimes, if the estimate is wrong, the mutator will be drafted to perform GC work whenever it allocates memory.

So anyhow, I think this is (at least) a time/work-slicing problem of some sort.  It looks like mark assist is handed a large lump of work that does not get split up like it should, and threads that should be helping with GC instead sit idle.  (If large objects exist, splitting up their associated work is a latency problem for all garbage collectors; there are known solutions, but there are bugs).

And yes, there are some lumpy bits of work that don't time-slice well, but the largest of those is supposed to be smaller than 100 µs long.

Scott Cotton

unread,
Sep 18, 2018, 1:43:18 PM9/18/18
to golang-dev
Another thanks for bringing it up.

Looking at the test itself, it seems not to be so much a burden or worry to avoid such behaviour if you have 
control over the whole program.  Clearly, allocating like in the test in a latency-sensitive context, processing
media/audio in real time, would be undesirable even if the GC pulled off the magic to reduce latencies well below
Go's current scores.  But if it's in a library where someone else controls main, then who knows.
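
To make that concrete, this is the sort of control I mean: a minimal sketch (hypothetical names and sizes) where all buffers are allocated up front and the steady-state processing loop allocates nothing, so almost no work is presented to the GC.

package main

const (
	frameSize = 256 // samples per callback, assumed
	numBufs   = 8   // depth of the preallocated pool, assumed
)

type processor struct {
	bufs [numBufs][]float32
	next int
}

func newProcessor() *processor {
	p := &processor{}
	for i := range p.bufs {
		p.bufs[i] = make([]float32, frameSize) // all allocation happens here
	}
	return p
}

// process reuses a preallocated buffer instead of calling make each time.
func (p *processor) process(in []float32) []float32 {
	out := p.bufs[p.next]
	p.next = (p.next + 1) % numBufs
	for i := range in {
		out[i] = in[i] * 0.5 // placeholder gain stage
	}
	return out
}

func main() {
	p := newProcessor()
	in := make([]float32, frameSize)
	_ = p.process(in) // steady state: zero allocations per call
}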

Best,
Scott

robert engels

unread,
Sep 18, 2018, 1:58:12 PM9/18/18
to David Chase, Scott Cotton, golang-dev
Thanks for the color, makes sense. Other than those slides that were offered earlier, is there a current white paper describing the Go GC in detail? It seems to have changed (improved) a lot, and I'm curious about the internals, but I don't really want to dig through the code … :)

Scott Cotton

unread,
Sep 18, 2018, 9:06:42 PM9/18/18
to golang-dev
Hi all,

I wanted to summarise my take-home points from this very productive thread:

1) Go's GC is a great enabler for latency sensitive real-time audio/media processing

2) Nonetheless, there are some issues around the runtime/language and low-latency real-time audio APIs, as it is commonly considered unreliable to do this outside of C, because
  a) Callbacks on a real-time thread of an audio system can't call things like sigaltstack in cgo->go
  b) Go's runtime doesn't let you directly play with OS thread scheduling

3) We have some solutions for b)
   - a) use LockOSThread()
   - b) although received with some skepticism, there is no apparent reason one can't set the thread scheduling from the calling context or via OS calls (preliminary tests on my part work fine, BTW)

4) FWIW, I also have in mind a solution for a) in the case of audio, as the callbacks all have a similar form and just transfer data, so perhaps cgo->go can be bypassed completely (see the sketch at the end of this message).

3 a) is suboptimal due to the inherent limitations of LockOSThread and the current Go runtime scheduling of locked threads.

If 3 b) continues to work and 4) works out, then the only remaining question is really Go's GC, which looks very promising in this regard to me.

So although current Go real-time audio apps are considered unreliable in low-latency situations compared to more widely used systems, the prospects of Go working well with solutions 3 b) and 4) look very good to me now.  They didn't before I learned everything that was shared here.  And no changes to the runtime are necessary, other than perhaps a change in the perception of it: it already allows inheriting (and even setting) scheduling properties from the OS and the calling context.

I'm quite happy with those results.
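
For the curious, the kind of thing I have in mind for 4) is a single-producer/single-consumer ring over memory both sides can see: the audio callback only writes samples and advances an index, and an ordinary goroutine drains it. Below is my own rough sketch of the Go side only, with placeholder sizes; a real version would place the buffer in shared/off-heap memory for the C side.

package main

import (
	"fmt"
	"sync/atomic"
)

// spscRing is a single-producer/single-consumer ring of float32 samples.
// The producer (e.g. an audio callback) only advances head; the consumer
// (a normal goroutine) only advances tail, so one atomic load/store per
// side is enough and no locks are needed.
type spscRing struct {
	buf  []float32
	head uint64 // written by producer only
	tail uint64 // written by consumer only
}

func newRing(size int) *spscRing { return &spscRing{buf: make([]float32, size)} }

// push writes one sample; returns false if the ring is full.
func (r *spscRing) push(v float32) bool {
	head := atomic.LoadUint64(&r.head)
	tail := atomic.LoadUint64(&r.tail)
	if head-tail == uint64(len(r.buf)) {
		return false // full; a real callback would drop or overwrite
	}
	r.buf[head%uint64(len(r.buf))] = v
	atomic.StoreUint64(&r.head, head+1)
	return true
}

// pop reads one sample; returns false if the ring is empty.
func (r *spscRing) pop() (float32, bool) {
	head := atomic.LoadUint64(&r.head)
	tail := atomic.LoadUint64(&r.tail)
	if head == tail {
		return 0, false
	}
	v := r.buf[tail%uint64(len(r.buf))]
	atomic.StoreUint64(&r.tail, tail+1)
	return v, true
}

func main() {
	r := newRing(1024) // placeholder capacity
	r.push(0.25)
	if v, ok := r.pop(); ok {
		fmt.Println("got sample:", v)
	}
}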

Thanks all & regards
Scott 

robert engels

unread,
Sep 18, 2018, 9:31:47 PM9/18/18
to Scott Cotton, golang-dev
I would also search for “android real-time audio issues”. I know that there were problems reported there; people who created “effects software” always complained, especially when compared to iOS.

As far as I know, they solved these issues (most likely with a native driver layer in the Linux kernel?), but still, Android is Java, and it didn’t even have some of the lower-level bridges to C/native that Go has, so I would expect the same techniques could be applied to Go, though you probably lose cross-platform portability doing real-time audio anyway.

I would review those problems and solutions.


Scott Cotton

unread,
Sep 19, 2018, 7:40:25 AM9/19/18
to robert engels, golang-dev
On 19 September 2018 at 03:31, robert engels <ren...@ix.netcom.com> wrote:
I would also search for “android real-time audio issues”. I know that there were problems reported there; people who created “effects software” always complained, especially when compared to iOS.

Yes, I'm quite aware of that.  iOS/Darwin is far from simple for performance audio, but very much ahead of other OSs.  You can, for example, synchronise devices (mic/speaker) to a single hardware audio clock there and treat effects/processors the same as I/O, which doesn't appear possible to me anywhere else.  
 

As far as I know, they solved these issues (most likely with a native driver layer in the Linux kernel?), but still, Android is Java, and it didn’t even have some of the lower-level bridges to C/native that Go has, so I would expect the same techniques could be applied to Go, though you probably lose cross-platform portability doing real-time audio anyway.

I would review those problems and solutions.


Of course.  I've been doing a lot of just that.  With Android there are lots of audio APIs, so it takes time and I'm still learning, but the big picture appears to be that AAudio is the solution, and things like oboe are the best option 
in terms of compatibility between AAudio and older Android (OpenSL ES).

AAudio looks good to me, but still behind iOS in terms of OS<->hardware interface and duplex synchronisation. The Android Audio HAL appears to be moving forward too, with things like DMA 
which help.
 
I think most "performance" audio apps use the Android NDK and C/C++ for all but the highest level;
not much actual Java runtime is in the picture I see, despite it being Android. 

Best,
Scott



Scott Cotton

unread,
Sep 21, 2018, 3:34:00 PM9/21/18
to golang-dev
Hi all,

In case anyone wants to follow, here is an issue to track progress: https://github.com/zikichombo/sio/issues/17

Best,
Scott